mersenneforum.org  

2011-02-05, 04:42   #562
Karl M Johnson
Mar 2010
3×137 Posts

Oh, one more thing. Could increasing threads to 512 actually increase GPU load and reduce the CPU bottleneck? Right now, to get GPU utilization close to 92%, I need to launch 3 separate instances of mfaktc.

2011-02-05, 15:01   #563
TheJudger
"Oliver"
Mar 2005
Germany
11·101 Posts

Hi Karl,

I don't think so. Keep in mind that we're talking about threads per block.
If there are enough resources available (for mfaktc the number of registers is the limiting factor) the GPU can run multiple blocks per multiprocessor.

Oliver

2011-02-06, 01:55   #564
TheJudger
"Oliver"
Mar 2005
Germany
2127₈ Posts

Hi Karl,

Quote:
Originally Posted by TheJudger
Anyway I'll test it with the current version again, but I think I already know the result...

I just tested THREADS_PER_BLOCK = 64, 128, 192, 256, 384 and 512.
64 is slightly slower in some cases; all other values are within measurement error on my GTX 470.
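
A rough way to see why the block size matters so little here: the sketch below is a simplified occupancy estimate, not mfaktc code. The per-multiprocessor limits are those of compute capability 2.0 GPUs such as the GTX 470 (32768 registers, 1536 resident threads, 8 resident blocks). The figure of 32 registers per thread is purely hypothetical, and register allocation granularity is ignored.

```python
# Per-multiprocessor limits for compute capability 2.0 (Fermi, e.g. GTX 470).
REGS_PER_SM = 32768
MAX_THREADS_PER_SM = 1536
MAX_BLOCKS_PER_SM = 8


def resident_threads(threads_per_block, regs_per_thread):
    """Estimate how many threads one multiprocessor can keep resident.

    Simplified model: the number of resident blocks is the smallest of the
    register limit, the thread limit and the block limit.
    """
    by_regs = REGS_PER_SM // (regs_per_thread * threads_per_block)
    by_threads = MAX_THREADS_PER_SM // threads_per_block
    blocks = min(by_regs, by_threads, MAX_BLOCKS_PER_SM)
    return blocks * threads_per_block


# Hypothetical register usage per thread; the real mfaktc value is not
# stated in this thread.
REGS_PER_THREAD = 32

for tpb in (64, 128, 192, 256, 384, 512):
    print(tpb, resident_threads(tpb, REGS_PER_THREAD))
```

Under these assumptions, every block size from 128 to 512 keeps between 768 and 1024 threads resident per multiprocessor, while 64 is capped at 512 by the 8-block limit. That is at least consistent with 64 being the only setting that measured slower.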

Oliver

2011-02-06, 05:42   #565
Karl M Johnson
Mar 2010
3×137 Posts

So increasing threads per block to 512 doesn't give any speed advantage over 256?
OK.
But is the reason a single instance of mfaktc can't utilize the GPU up to 99% that not all of the code is executed on the GPU?

2011-02-06, 12:43   #566
TheJudger
"Oliver"
Mar 2005
Germany
2127₈ Posts

Quote:
Originally Posted by Karl M Johnson
But is the reason a single instance of mfaktc can't utilize the GPU up to 99% that not all of the code is executed on the GPU?

Yep, that's the most likely reason.
Another reason might be short-running assignments (less than a few seconds per class), because the internal queues for work blocks run empty between two classes.
May I know
- your CPU
- your GPU
- your typical assignments (e.g. M83.xxx.xxx from 2^64 to 2^65)
- your OS
- 32 or 64 bit version of mfaktc
- are you using default settings in mfaktc.ini

Oliver

2011-02-06, 13:11   #567
Karl M Johnson
Mar 2010
3×137 Posts

Sure!
Q6600 @ 3.0 GHz
GTX 480 @ 811/1622/3800
Typical assignments are a bit tricky: anything PrimeNet assigns via manual request.
Windows 7 x64
Using 32-bit mfaktc.
Code:
SievePrimes=5000
SievePrimesAdjust=1
NumStreams=10
CPUStreams=5
WorkFile=worktodo.txt
Checkpoints=1
Stages=0
StopAfterFactor=0
Btw, is the "--use_fast_math" flag used when compiling mfaktc?
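
For anyone who wants to compare settings like the ones above across machines, here is a minimal sketch that reads key=value lines into a dictionary. It is not mfaktc's own parser. The function name parse_ini is made up for this example, and treating lines that start with "#" as comments is an assumption.

```python
def parse_ini(text):
    """Parse simple key=value lines into a dict (hypothetical helper).

    Blank lines and lines starting with '#' are skipped (assumed comment
    syntax). Purely numeric values are converted to int.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line
        value = value.strip()
        settings[key.strip()] = int(value) if value.isdigit() else value
    return settings


# The settings quoted in the post above.
EXAMPLE = """\
SievePrimes=5000
SievePrimesAdjust=1
NumStreams=10
CPUStreams=5
WorkFile=worktodo.txt
Checkpoints=1
Stages=0
StopAfterFactor=0
"""

settings = parse_ini(EXAMPLE)
print(settings["SievePrimes"], settings["WorkFile"])
```

Two parsed dictionaries can then be compared key by key to spot which settings differ between two setups.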

Last fiddled with by Karl M Johnson on 2011-02-06 at 13:42

2011-02-06, 13:18   #568
firejuggler
Apr 2010
Over the rainbow
2600₁₀ Posts

Core 2 Duo 8300 @ 2.83 GHz
GTX 460 @ 715/1430/3600
Typical assignments are a bit tricky: anything PrimeNet assigns via manual request (from 65,66 to 69,70 for M80XXXXXX). Speed is around 80M/s when the assignment is low and 104M/s when it is high. With 2 assignments, total speed is about 130-140M/s.
Windows XP
Using the 32-bit version

Last fiddled with by firejuggler on 2011-02-06 at 13:31

2011-02-06, 14:25   #569
Ungelovende
May 2008
Åsane, Bergen, Norway
3×5 Posts

2x i7 920@2,66 + 1x i7 2600k@4430 all with one GTX 460@stock
XP64
Manual assignments edited to xx,71
Speed 130M~140M - one app per GPU

Works like a dream!

Last fiddled with by Ungelovende on 2011-02-06 at 14:31 Reason: typø

2011-02-06, 17:59   #570
TheJudger
"Oliver"
Mar 2005
Germany
11·101 Posts

Hi!

Karl: since you're running 64-bit Windows, I recommend trying the 64-bit version of mfaktc; the sieve (CPU) runs more efficiently in 64-bit mode (~33% faster on Linux on my C2D and i7). You might want to adjust the upper TF limit, too. Take a look at the README.txt.

Maybe your OCed GTX 480 is just a little bit too fast.
...did I really write that? Hardware can never be too fast.

The flag "--use_fast_math" isn't used... it only affects floating-point operations, and mfaktc primarily depends on integers.
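
To illustrate the integer-only nature of the work: trial factoring of a Mersenne number 2^p - 1 comes down to checking candidates q = 2kp + 1 with the test 2^p mod q == 1. The sketch below is plain Python, not mfaktc's CUDA kernels, and the function name trial_factor is made up for this example. It also does no sieving, so it can report composite divisors.

```python
def trial_factor(p, bit_min, bit_max):
    """Return divisors q of 2**p - 1 with 2**bit_min <= q < 2**bit_max.

    Candidates have the form q = 2*k*p + 1 and must be 1 or 7 mod 8.
    Only integer arithmetic is involved.
    """
    lo, hi = 1 << bit_min, 1 << bit_max
    # Smallest k >= 1 such that 2*k*p + 1 >= lo (ceiling division).
    k = max(1, -(-(lo - 1) // (2 * p)))
    found = []
    while True:
        q = 2 * k * p + 1
        if q >= hi:
            break
        # pow(2, p, q) is modular exponentiation on integers.
        if q % 8 in (1, 7) and pow(2, p, q) == 1:
            found.append(q)
        k += 1
    return found


# 2**11 - 1 = 2047 = 23 * 89
print(trial_factor(11, 4, 5))
print(trial_factor(11, 6, 7))
```

The two calls search the ranges 2^4 to 2^5 and 2^6 to 2^7 for divisors of 2^11 - 1, in the same spirit as an assignment "from 2^64 to 2^65", only at toy scale.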


Oliver

2011-02-06, 19:34   #571
Karl M Johnson
Mar 2010
411₁₀ Posts

The thing is, I'm currently on the most appropriate drivers, which do not cause slowdowns in CUDA/OpenCL apps, intentionally or not.
The good thing is that, by doing some cudart shenanigans, I can run apps compiled under 26x.xx on my 258.96.
The bad thing is that, well, only 32-bit apps can be run like that.

Regarding the flag thing: heh, well, didn't know that.
I was NOT impressed when I read that Sandy Bridge's AVX only works on FPs.
Integers don't get any benefit from it.
Well, at least SB has a better perf per clock ratio.

Last fiddled with by Karl M Johnson on 2011-02-06 at 19:36 Reason: Yes.

2011-02-07, 23:07   #572
firejuggler
Apr 2010
Over the rainbow
2³×5²×13 Posts

A small thing: I tested mfaktc a bit further, and it seems there is a limit to the exponent you can use. If it is above 4294967295 (yeah, 11 times the billion digit project), it enters the famous loop:
got assignment: exp=4294967295 bit_min=1 bit_max=60
WARNING: exponent is not prime! Ignoring this assignment!

(4294967295=2^32-1)
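
A quick check of the number in that log, as a small Python sketch (the helper name factorize is made up here). 4294967295 is the largest value an unsigned 32-bit integer can hold, which would fit with the exponent being kept in a 32-bit variable, though that is a guess and not something confirmed in this thread. The "not prime" warning itself is accurate for this value.

```python
UINT32_MAX = 2**32 - 1  # largest unsigned 32-bit value


def factorize(n):
    """Factor n by simple trial division (fine for numbers this small)."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


print(UINT32_MAX)
print(factorize(UINT32_MAX))
```

The factors are 3, 5, 17, 257 and 65537, so 2^32 - 1 is indeed composite and the assignment is rejected for that reason.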

Last fiddled with by firejuggler on 2011-02-07 at 23:13

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.