Oh, one more thing. Could increasing threads to 512 actually increase GPU load and reduce the CPU bottleneck? Right now, to get GPU utilization close to 92%, I need to launch 3 separate instances of mfaktc.
|
Hi Karl,
I don't think so. Remember that we're talking about threads [B]per block[/B]. If enough resources are available (for mfaktc, the number of registers is the limiting factor), the GPU can run multiple blocks per multiprocessor. Oliver |
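To illustrate the point about threads per block versus blocks per multiprocessor, here is a rough sketch of the arithmetic. The hardware limits are the published values for compute capability 2.0 (Fermi, e.g. GTX 470/480); `REGS_PER_THREAD` is a made-up value chosen for illustration, not the real register usage of any mfaktc kernel, and the model ignores register allocation granularity and shared memory.

```python
# Simplified model of how many blocks can be resident on one multiprocessor (SM).
# Limits for compute capability 2.0 (Fermi):
REGS_PER_SM = 32768          # 32-bit registers per SM
MAX_THREADS_PER_SM = 1536    # resident threads per SM
MAX_BLOCKS_PER_SM = 8        # resident blocks per SM

# Hypothetical register usage per thread (assumption, not measured on mfaktc).
REGS_PER_THREAD = 32


def resident_blocks(threads_per_block, regs_per_thread=REGS_PER_THREAD):
    """Blocks that fit on one SM: the tightest of the three limits wins."""
    by_registers = REGS_PER_SM // (threads_per_block * regs_per_thread)
    by_threads = MAX_THREADS_PER_SM // threads_per_block
    return min(by_registers, by_threads, MAX_BLOCKS_PER_SM)


def resident_threads(threads_per_block, regs_per_thread=REGS_PER_THREAD):
    """Total threads resident on one SM for a given block size."""
    return resident_blocks(threads_per_block, regs_per_thread) * threads_per_block


if __name__ == "__main__":
    for tpb in (64, 128, 192, 256, 384, 512):
        print(tpb, resident_blocks(tpb), resident_threads(tpb))
```

Under these assumed numbers, 128, 256 and 512 threads per block all end up with the same number of resident threads per multiprocessor, because smaller blocks are simply scheduled in larger numbers. With 64 threads per block the limit of 8 blocks per multiprocessor caps the count earlier. Real kernels will differ, so treat this as an illustration of the mechanism only.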
Hi Karl,
[QUOTE=TheJudger;251228]Anyway I'll test it with the current version again, but I think I already know the result...[/QUOTE] I just tested THREADS_PER_BLOCK = 64, 128, 192, 256, 384 and 512. 64 is a little bit slower in some cases; all other values are within measurement error on my GTX 470. Oliver |
So increasing threads per block to 512 doesn't give any speed advantage over 256?
OK. But is the reason a single instance of mfaktc can't utilize the GPU up to 99% that the whole code isn't executed on the GPU? |
[QUOTE=Karl M Johnson;251496]But the reason why single instance of mfakt cant utilize GPU up to 99% is that the whole code isnt executed on the GPU ?[/QUOTE]
Yep, that's the most likely reason. Another reason might be short-running assignments (less than a few seconds per class), because the internal queues for work blocks run empty between two classes. May I know:
- your CPU
- your GPU
- your typical assignments (e.g. M83.xxx.xxx from 2^64 to 2^65)
- your OS
- whether you use the 32- or 64-bit version of mfaktc
- whether you are using the default settings in mfaktc.ini
Oliver |
Sure!
Q6600 @ 3.0 GHz
GTX 480 @ 811/1622/3800
Typical assignments are a bit tricky: anything PrimeNet assigns via manual request.
Windows 7 x64
Using 32-bit mfaktc.
[code]SievePrimes=5000
SievePrimesAdjust=1
NumStreams=10
CPUStreams=5
WorkFile=worktodo.txt
Checkpoints=1
Stages=0
StopAfterFactor=0[/code]
Btw, when compiling mfaktc, is the "--use_fast_math" flag used? |
Core 2 Duo 8300 @ 2.83 GHz
GTX 460 @ 715/1430/3600
Typical assignments are a bit tricky: anything PrimeNet assigns via manual request (from 65,66 to 69,70; for M80XXXXXX the speed is around 80M/s when the assignment is low and 104M/s when it is high; with 2 assignments, the total speed is about 130-140M/s).
Windows XP
Using the 32-bit version |
2x i7 920 @ 2.66 + 1x i7 2600K @ 4430, all with one GTX 460 @ stock
XP64
Manual assignments edited to xx,71
Speed 130M~140M, one app per GPU
Works like a dream! :smile: |
Hi!
Karl: since you're running 64-bit Windows, I recommend trying the 64-bit version of mfaktc; the sieve (CPU) runs more efficiently in 64-bit mode (~33% faster on Linux on my C2D and i7). You might want to adjust the upper TF limit, too. Take a look at the README.txt. Maybe your OCed GTX 480 is just a little bit too fast. :devil: ...did I really write that? Hardware can never be too fast. :wink: The flag "--use_fast_math" isn't used... it only affects floating-point numbers, and mfaktc primarily depends on integers. Oliver |
The thing is, I'm currently on the most suitable drivers, which do not cause slowdowns in CUDA/OpenCL apps, intentionally or not.
The good thing is that, by doing some cudart shenanigans, I can run apps compiled under 26x.xx on my 258.96. The bad thing is that only 32-bit apps can be run like that. Regarding the flag thing: heh, well, I didn't know that. I was NOT impressed when I read that Sandy Bridge's AVX only works on FPs. Integers don't get any benefit from it. Well, at least SB has a better performance-per-clock ratio. |
A small thing: I tested mfaktc a bit further, and it seems that there is a limit on the exponent you can use. If it is above 4294967295 (yeah, 11 times the billion-digit project), it enters the famous loop:
got assignment: exp=4294967295 bit_min=1 bit_max=60
WARNING: exponent is not prime! Ignoring this assignment!
(4294967295=2^32-1) |
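The numbers in that report can be checked by hand. The sketch below confirms that 4294967295 is the largest value an unsigned 32-bit integer can hold and that it is composite, which matches the "exponent is not prime" warning. That mfaktc stores the exponent in a 32-bit unsigned variable is an assumption suggested by the limit, not something the thread confirms. The helper `trial_factor` is a hypothetical name and uses plain trial division.

```python
UINT32_MAX = 2**32 - 1  # largest value of an unsigned 32-bit integer


def trial_factor(n):
    """Return the prime factors of n (with multiplicity) by trial division."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)  # whatever remains is itself prime
    return factors


if __name__ == "__main__":
    print(UINT32_MAX)                 # 4294967295
    print(trial_factor(UINT32_MAX))   # [3, 5, 17, 257, 65537]
```

So 2^32-1 is the product of the five known Fermat primes and cannot be a valid Mersenne exponent in any case; a prime exponent just below that value would be needed to test where the limit really lies.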