mersenneforum.org

mfaktc: a CUDA program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=12827)

Karl M Johnson 2011-02-05 04:42

Oh, one more thing. Could increasing threads to 512 actually increase GPU load? And reduce the CPU bottleneck? Right now, to get GPU utilization close to 92%, I need to launch 3 separate instances of mfaktc.

TheJudger 2011-02-05 15:01

Hi Karl,

I don't think so. Keep in mind that we're talking about threads [B]per block[/B].
If there are enough resources available (for mfaktc the number of registers is the limiting factor) the GPU can run multiple blocks per multiprocessor.
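A rough sketch of that point, assuming a Fermi-class GPU (compute capability 2.0: 32768 registers, at most 1536 resident threads and at most 8 resident blocks per multiprocessor). The function names and the figure of 20 registers per thread are hypothetical, chosen only for illustration and not measured from mfaktc; register allocation granularity and shared memory are ignored.

```python
# Rough occupancy model for one multiprocessor (SM) on a Fermi-class GPU.
# The hardware limits are those of compute capability 2.0.
# regs_per_thread is a hypothetical input, NOT a measured mfaktc value.

def resident_blocks(threads_per_block, regs_per_thread,
                    regs_per_sm=32768, max_threads_per_sm=1536,
                    max_blocks_per_sm=8):
    """Number of blocks that fit on one SM at the same time."""
    by_registers = regs_per_sm // (regs_per_thread * threads_per_block)
    by_threads = max_threads_per_sm // threads_per_block
    return min(by_registers, by_threads, max_blocks_per_sm)


def resident_threads(threads_per_block, regs_per_thread):
    """Total threads resident on one SM for a given block size."""
    return threads_per_block * resident_blocks(threads_per_block,
                                               regs_per_thread)


if __name__ == "__main__":
    for tpb in (64, 128, 192, 256, 384, 512):
        print(tpb, resident_blocks(tpb, 20), resident_threads(tpb, 20))
```

Under these assumptions a larger block just means fewer blocks per multiprocessor, so the number of resident threads stops growing once the per-multiprocessor limits are reached; only very small blocks run into the cap of 8 blocks first.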

Oliver

TheJudger 2011-02-06 01:55

Hi Karl,

[QUOTE=TheJudger;251228]Anyway I'll test it with the current version again, but I think I already know the result...[/QUOTE]

I just tested THREADS_PER_BLOCK = 64, 128, 192, 256, 384 and 512.
64 is a little slower in some cases; all other values are within measurement error on my GTX 470.

Oliver

Karl M Johnson 2011-02-06 05:42

So increasing threads per block to 512 doesn't give any speed advantage over 256?
Ok.
But is the reason a single instance of mfaktc can't utilize the GPU up to 99% that not all of the code is executed on the GPU?

TheJudger 2011-02-06 12:43

[QUOTE=Karl M Johnson;251496]But is the reason a single instance of mfaktc can't utilize the GPU up to 99% that not all of the code is executed on the GPU?[/QUOTE]

Yep, that's the most likely reason.
Another reason might be short-running assignments (less than a few seconds per class), because the internal queues for work blocks run empty between two classes.
May I know:
- your CPU
- your GPU
- your typical assignments (e.g. M83.xxx.xxx from 2^64 to 2^65)
- your OS
- 32 or 64 bit version of mfaktc
- whether you are using the default settings in mfaktc.ini

Oliver

Karl M Johnson 2011-02-06 13:11

Sure!
Q6600 @ 3.0 GHz
GTX 480 @ 811/1622/3800
Typical assignments are a bit tricky. Anything PrimeNet assigns via manual request.
Windows 7 x64
Using 32 bit mfaktc.
[code]SievePrimes=5000
SievePrimesAdjust=1
NumStreams=10
CPUStreams=5
WorkFile=worktodo.txt
Checkpoints=1
Stages=0
StopAfterFactor=0[/code]
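For comparing settings between machines, the key=value lines above can be read into a dictionary. This is a minimal sketch: `parse_settings` is a hypothetical helper, not part of mfaktc, and treating lines that start with `#` as comments is an assumption about the file format.

```python
# Minimal reader for key=value settings such as the ones posted above.
# parse_settings is a hypothetical helper, not part of mfaktc.
# Assumption: blank lines and lines starting with '#' can be skipped.

def parse_settings(text):
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line
        settings[key.strip()] = value.strip()
    return settings


POSTED = """SievePrimes=5000
SievePrimesAdjust=1
NumStreams=10
CPUStreams=5
WorkFile=worktodo.txt
Checkpoints=1
Stages=0
StopAfterFactor=0"""

if __name__ == "__main__":
    for key, value in parse_settings(POSTED).items():
        print(key, "=", value)
```

Values are kept as strings so that entries like `WorkFile` need no special handling.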

Btw, when compiling mfaktc, is the "--use_fast_math" flag used?

firejuggler 2011-02-06 13:18

Core 2 Duo 8300 @ 2.83 GHz
GTX 460 @ 715/1430/3600
Typical assignments are a bit tricky. Anything PrimeNet assigns via manual request (from 65,66 to 69,70 for M80XXXXXX). Speed is around 80M/s on the low assignments and 104M/s on the high ones. With 2 assignments, total speed is about 130-140M/s.
Windows XP
Using 32 bit version

Ungelovende 2011-02-06 14:25

2x i7 920@2,66 + 1x i7 2600k@4430 all with one GTX 460@stock
XP64
Manual assignments edited to xx,71
Speed 130M~140M - one app per GPU

Works like a dream!:smile:

TheJudger 2011-02-06 17:59

Hi!

Karl: since you're running 64-bit Windows, I recommend trying the 64-bit version of mfaktc; the sieve (CPU) runs more efficiently in 64-bit mode (~33% faster on Linux on my C2D and i7). You might want to adjust the upper TF limit, too. Take a look at the README.txt.

Maybe your OCed GTX 480 is just a little bit too fast. :devil:
...did I really write that? Hardware can never be too fast. :wink:

The flag "--use_fast_math" isn't used... it only affects floating-point math, and mfaktc primarily depends on integers.


Oliver

Karl M Johnson 2011-02-06 19:34

The thing is, I'm currently on the most suitable drivers, the ones that don't slow down CUDA/OpenCL apps, intentionally or not.
The good thing is that, with some cudart shenanigans, I can run apps compiled for 26x.xx on my 258.96.
The bad thing is, well, only 32-bit apps can be run like that.

Regarding the flag thing: heh, well, didn't know that.
I was NOT impressed when I read that Sandy Bridge's AVX only works on floating-point values.
Integers don't get any benefit from it.
Well, at least SB has a better performance-per-clock ratio.

firejuggler 2011-02-07 23:07

A small thing: I tested mfaktc a bit further, and it seems there is a limit to the exponent you can use. If it is above 4294967295 (yeah, 11 times the billion-digit project), it enters the famous loop:
got assignment: exp=4294967295 bit_min=1 bit_max=60
WARNING: exponent is not prime! Ignoring this assignment!

(4294967295=2^32-1)
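A quick check of the number in that log line, as a sketch. 2^32-1 is the largest value an unsigned 32-bit integer can hold, and it is composite, so the warning is literally true for it. The `saturate_u32` helper is only an assumption about how a too-large exponent could end up as 2^32-1; it is not taken from the mfaktc source.

```python
# 2^32 - 1 is the largest unsigned 32-bit value.
UINT32_MAX = 2**32 - 1


def saturate_u32(n):
    """Clamp n to the unsigned 32-bit range (assumed behaviour, see above)."""
    return min(n, UINT32_MAX)


def factorize(n):
    """Prime factors of n by trial division, in ascending order."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


if __name__ == "__main__":
    exponent = saturate_u32(5_000_000_000)  # anything above 2^32 - 1
    print(exponent, factorize(exponent))
```

The factors are the first five Fermat numbers (3, 5, 17, 257, 65537), so any exponent that ends up as 2^32-1 is rejected as not prime.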

