#562 |
|
Mar 2010
3·137 Posts |
Oh, one more thing. Could increasing threads to 512 actually increase GPU load and reduce the CPU bottleneck? Right now, to get GPU utilization close to 92%, I need to launch 3 separate instances of mfaktc.
|
|
|
|
|
|
#563 |
|
"Oliver"
Mar 2005
Germany
11×101 Posts |
Hi Karl,
I don't think so. Keep in mind that we're talking about threads per block. If enough resources are available (for mfaktc, the number of registers is the limiting factor), the GPU can run multiple blocks per multiprocessor.
Oliver
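To make the "multiple blocks per multiprocessor" point concrete, here is a rough sketch of how the register budget caps the number of resident blocks. The per-multiprocessor limits are the published ones for compute capability 2.0 (Fermi, e.g. GTX 470/480). The register count per thread is a made-up value for illustration, not mfaktc's real usage, and the model ignores register allocation granularity and shared memory.

```python
# Rough occupancy model: how many blocks fit on one multiprocessor (SM)?
# Per-SM limits are those published for compute capability 2.0 (Fermi).
REGS_PER_SM = 32768
MAX_THREADS_PER_SM = 1536
MAX_BLOCKS_PER_SM = 8

# Hypothetical register usage per thread, chosen only for illustration.
REGS_PER_THREAD = 32


def resident_blocks(threads_per_block, regs_per_thread=REGS_PER_THREAD):
    """Blocks that can be resident on one SM at the same time."""
    by_registers = REGS_PER_SM // (regs_per_thread * threads_per_block)
    by_threads = MAX_THREADS_PER_SM // threads_per_block
    return min(by_registers, by_threads, MAX_BLOCKS_PER_SM)


def resident_threads(threads_per_block, regs_per_thread=REGS_PER_THREAD):
    """Total threads resident on one SM for a given block size."""
    return resident_blocks(threads_per_block, regs_per_thread) * threads_per_block


if __name__ == "__main__":
    for tpb in (64, 128, 256, 512):
        print(tpb, resident_blocks(tpb), resident_threads(tpb))
```

Under these assumptions, 256 and 512 threads per block end up with the same number of resident threads, because the register file is the limit either way. Only very small blocks lose out, since they hit the cap on blocks per multiprocessor first.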
|
|
|
|
|
#564 | |
|
"Oliver"
Mar 2005
Germany
2127₈ Posts |
Hi Karl,
Quote:
64 is a little slower in some cases; all other values are within the measurement error on my GTX 470.
Oliver
|
|
|
|
|
|
#565 |
|
Mar 2010
3×137 Posts |
So increasing threads per block to 512 doesn't give any speed advantage over 256?
OK. But is the reason a single instance of mfaktc can't utilize the GPU up to 99% that not all of the code is executed on the GPU?
|
|
|
|
|
#566 | |
|
"Oliver"
Mar 2005
Germany
11×101 Posts |
Quote:
Another reason might be short-running assignments (less than a few seconds per class), because between two classes the internal queues for working blocks run empty.
May I know
- your CPU
- your GPU
- your typical assignments (e.g. M83.xxx.xxx from 2^64 to 2^65)
- your OS
- 32 or 64 bit version of mfaktc
- are you using default settings in mfaktc.ini
Oliver
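The effect of the queues running empty between classes can be sketched with simple arithmetic: if the GPU idles for a fixed gap after each class, short classes lose a larger share of their time. The function and the 0.1 s gap below are assumptions for illustration only, not measured mfaktc values.

```python
def gpu_busy_fraction(class_seconds, gap_seconds):
    """Fraction of wall time the GPU is busy when each class of
    class_seconds is followed by an idle gap of gap_seconds."""
    return class_seconds / (class_seconds + gap_seconds)


if __name__ == "__main__":
    GAP = 0.1  # hypothetical idle time between two classes, in seconds
    for seconds_per_class in (0.5, 1.0, 5.0, 30.0):
        print(seconds_per_class, round(gpu_busy_fraction(seconds_per_class, GAP), 4))
```

The longer each class runs, the closer the busy fraction gets to 1, which is consistent with short assignments showing lower GPU load.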
|
|
|
|
|
|
#567 |
|
Mar 2010
110011011₂ Posts |
Sure!
Q6600 @ 3.0 GHz
GTX 480 @ 811/1622/3800
Typical assignments are a bit tricky: anything PrimeNet assigns via manual request.
Windows 7 x64
Using 32-bit mfaktc.
Code:
SievePrimes=5000
SievePrimesAdjust=1
NumStreams=10
CPUStreams=5
WorkFile=worktodo.txt
Checkpoints=1
Stages=0
StopAfterFactor=0
Last fiddled with by Karl M Johnson on 2011-02-06 at 13:42
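For comparing settings like these against the defaults, a small key=value reader is enough. This is a minimal sketch, not mfaktc's own parser. The function name `parse_ini` is hypothetical, and treating `#` as a comment marker is an assumption about the file format.

```python
def parse_ini(text):
    """Parse simple key=value lines into a dict of strings.

    Assumption: '#' starts a comment; blank and malformed lines are skipped.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


# The settings quoted in the post above.
KARL_INI = """\
SievePrimes=5000
SievePrimesAdjust=1
NumStreams=10
CPUStreams=5
WorkFile=worktodo.txt
Checkpoints=1
Stages=0
StopAfterFactor=0
"""

if __name__ == "__main__":
    for key, value in parse_ini(KARL_INI).items():
        print(f"{key} = {value}")
```

Values are kept as strings, so a numeric setting such as SievePrimes needs an explicit `int(...)` before it can be compared.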
|
|
|
|
|
#568 |
|
Apr 2010
Over the rainbow
2×1,303 Posts |
Core 2 Duo 8300 @ 2.83 GHz
GTX 460 @ 715/1430/3600
Typical assignments are a bit tricky: anything PrimeNet assigns via manual request (from 65,66 to 69,70, for M80XXXXXX). Speed is around 80M/s when the assignment is low and 104M/s when it is high. With 2 assignments, total speed is about 130-140M/s.
Windows XP
Using the 32-bit version
Last fiddled with by firejuggler on 2011-02-06 at 13:31
|
|
|
|
|
#569 |
|
May 2008
Åsane, Bergen, Norway
3·5 Posts |
2x i7 920 @ 2.66 + 1x i7 2600K @ 4430, all with one GTX 460 @ stock
XP64
Manual assignments edited to xx,71
Speed: 130M~140M, one app per GPU
Works like a dream!
Last fiddled with by Ungelovende on 2011-02-06 at 14:31 Reason: typø |
|
|
|
|
|
#570 |
|
"Oliver"
Mar 2005
Germany
10001010111₂ Posts |
Hi!
Karl: since you're running 64-bit Windows, I recommend trying the 64-bit version of mfaktc; the sieve (CPU) runs more efficiently in 64-bit mode (~33% faster on Linux on my C2D and i7). You might want to adjust the upper TF limit, too. Take a look at the README.txt.
Maybe your OCed GTX 480 is just a little bit too fast. ...did I really write that? Hardware can never be too fast.
The flag "--use_fast_math" isn't used... it only affects floating-point numbers, and mfaktc primarily depends on integers.
Oliver
|
|
|
|
|
#571 |
|
Mar 2010
3·137 Posts |
The thing is, I'm currently on the most suitable drivers, which do not slow down CUDA/OpenCL apps, intentionally or not.
The good thing is that, with some cudart shenanigans, I can run apps compiled under 26x.xx on my 258.96. The bad thing is that only 32-bit apps can be run like that.
Regarding the flag: heh, I didn't know that. I was NOT impressed when I read that Sandy Bridge's AVX only works on floating-point values; integers don't get any benefit from it. Well, at least SB has a better performance-per-clock ratio.
Last fiddled with by Karl M Johnson on 2011-02-06 at 19:36 Reason: Yes.
|
|
|
|
|
#572 |
|
Apr 2010
Over the rainbow
A2E₁₆ Posts |
A small thing: I tested mfaktc a bit further, and it seems there is a limit on the exponent you can use. If it is above 4294967295 (yeah, 11 times the billion digit project), it enters the famous loop:
got assignment: exp=4294967295 bit_min=1 bit_max=60
WARNING: exponent is not prime! Ignoring this assignment!
(4294967295=2^32-1)
Last fiddled with by firejuggler on 2011-02-07 at 23:13
|
|
|