mersenneforum.org  

Old 2011-04-21, 20:59   #760
TheJudger
 
"Oliver"
Mar 2005
Germany

1111₁₀ Posts

Quote:
Originally Posted by Xyzzy View Post
Code:
╔═════════╤════════╤════════╤════════╤════════╤════════╤══════╗
║instances│cpu_load│gpu_load│ave_rate│cpu_temp│gpu_temp│ time ║
╟─────────┼────────┼────────┼────────┼────────┼────────┼──────╢
║        0│      0%│      0%│     n/a│    29°C│    32°C│   n/a║
║        1│     26%│     52%│  190M/s│    54°C│    52°C│ 9m10s║
║        2│     51%│     92%│  173M/s│    63°C│    62°C│10m08s║
║        3│     76%│     95%│  121M/s│    68°C│    65°C│12m29s║
║        4│    100%│     97%│   97M/s│    71°C│    66°C│14m44s║
╚═════════╧════════╧════════╧════════╧════════╧════════╧══════╝
Is it sane to use the average rate to determine overall throughput?
As a rough estimate it might be OK, but you should keep an eye on SievePrimes, too. A higher SievePrimes value means more candidates are removed by sieving on the CPU, so each class has fewer candidates.
A better estimate is the time per class.
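
As a rough sketch of how the two estimates can differ, the figures from the quoted table can be compared in a few lines of Python. This assumes that the "time" column is the wall-clock time each instance needs for the same fixed job, which the table itself does not state. The function names are made up for illustration.

```python
# Figures copied from the quoted table: instances -> (average rate in M/s, time).
# Assumption: "time" is the wall-clock time each instance needs for the same job.
RUNS = {
    1: (190, "9m10s"),
    2: (173, "10m08s"),
    3: (121, "12m29s"),
    4: (97, "14m44s"),
}


def to_seconds(text):
    """Convert a string such as '9m10s' to seconds."""
    minutes, seconds = text.rstrip("s").split("m")
    return int(minutes) * 60 + int(seconds)


def rate_throughput(instances):
    """Combined candidate rate in M/s (average rate times instance count)."""
    rate, _ = RUNS[instances]
    return rate * instances


def jobs_per_hour(instances):
    """Jobs finished per hour across all instances, based on elapsed time."""
    _, elapsed = RUNS[instances]
    return instances * 3600 / to_seconds(elapsed)


if __name__ == "__main__":
    for n in sorted(RUNS):
        rate_gain = rate_throughput(n) / rate_throughput(1)
        time_gain = jobs_per_hour(n) / jobs_per_hour(1)
        print(f"{n} instance(s): {rate_throughput(n)}M/s (x{rate_gain:.2f}), "
              f"{jobs_per_hour(n):.2f} jobs/h (x{time_gain:.2f})")
```

Under that assumption, going from one to four instances raises the time-based figure by a factor of about 2.5, while the rate-based figure rises by a factor of about 2.0. A gap like that would fit a changing SievePrimes between runs, although the table alone cannot confirm it.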

Quote:
Originally Posted by Xyzzy View Post
Code:
╔═════════╤════════════════╗
║instances│   throughput   ║
╟─────────┼────────────────╢
║        1│190 × 1 = 190M/s║
║        2│173 × 2 = 346M/s║
║        3│121 × 3 = 363M/s║
║        4│ 97 × 4 = 388M/s║
╚═════════╧════════════════╝
We interpret the data above to be that the CPU is filling the GPU "bucket" faster than the GPU can empty the "bucket". With 2 or more instances the GPU load is nearly topped out.
Yep, but as mentioned above, with more CPU power you can do more sieving, and this reduces the number of candidates per class.


Oliver
Old 2011-04-22, 01:21   #761
Xyzzy
 
"Mike"
Aug 2002

2·23·179 Posts

Quote:
Higher SievePrimes means more candidates are removed with sieving on CPU so each class has fewer candidates.
We have tested raising "SievePrimes" from 25,000 up to 1,000,000 without any difference in performance. The CPU load and CPU memory usage did not increase. Perhaps we are doing something wrong?
Old 2011-04-22, 10:08   #762
Brain
 
Dec 2009
Peine, Germany

331 Posts
CUDA 4.0 driver out

The CUDA 4.0 driver is out (see attachment, driver 270.XX). Could anybody compile mfaktc for Win7 with 4.0? I'd like to test if there's gonna be any speedup.
[Attached image: mfaktc_0.16_Screeny_CUDA40.png]
Old 2011-04-22, 11:37   #763
ET_
Banned
 
"Luigi"
Aug 2002
Team Italia

2×3×11×73 Posts

Quote:
Originally Posted by Brain View Post
The CUDA 4.0 driver is out (see attachment, driver 270.XX). Could anybody compile mfaktc for Win7 with 4.0? I'd like to test if there's gonna be any speedup.
Is it enough to update drivers to 4.0 if you still have 3.0 sdk?

Luigi
Old 2011-04-22, 11:46   #764
Brain
 
Dec 2009
Peine, Germany

14B₁₆ Posts

Quote:
Originally Posted by ET_ View Post
Is it enough to update drivers to 4.0 if you still have 3.0 sdk?
I only updated the driver and was surprised by the mfaktc output...

Find attached the updated lib from CUDA 4.0.12 RC2 toolkit. It will be needed when somebody does the recompile.
[Attached file: cudart64_40_12.zip (161.3 KB)]
Old 2011-04-22, 11:59   #765
TheJudger
 
"Oliver"
Mar 2005
Germany

11·101 Posts

Quote:
Originally Posted by Xyzzy View Post
We have tested raising "SievePrimes" from 25,000 up to 1,000,000 without any difference in performance. The CPU load and CPU memory usage did not increase. Perhaps we are doing something wrong?
Unless you've modified the code, you can't increase SievePrimes to 1,000,000. In mfaktc 0.16 it is limited to 100,000. Did you set SievePrimesAdjust to 0 for your tests? Otherwise mfaktc will adjust SievePrimes automatically during the run, and both starting values (25k and 100k for SievePrimes) will end up at the same value after some time.
Hint: You can see the actual SievePrimes value in the "per class status output".

No matter what you set SievePrimes to, CPU load will always be 100% of one core. Memory usage does not depend on SievePrimes.


Oliver
Old 2011-04-22, 12:23   #766
nucleon
 
Mar 2003
Melbourne

5×103 Posts

Quote:
Originally Posted by Brain View Post
I only updated the driver and was surprised by the mfaktc output...
Brain - how's your performance? I found a drop of roughly 10% when I upgraded win7 64bit driver to ver 270+

-- Craig
Old 2011-04-22, 12:42   #767
Christenson
 
Dec 2010
Monticello

5·359 Posts

Quote:
Originally Posted by TheJudger View Post
Unless you've modified the code, you can't increase SievePrimes to 1,000,000. In mfaktc 0.16 it is limited to 100,000. Did you set SievePrimesAdjust to 0 for your tests? Otherwise mfaktc will adjust SievePrimes automatically during the run, and both starting values (25k and 100k for SievePrimes) will end up at the same value after some time.
Hint: You can see the actual SievePrimes value in the "per class status output".

No matter what you set SievePrimes to, CPU load will always be 100% of one core. Memory usage does not depend on SievePrimes.


Oliver
Oliver:
Xyzzy is probably no more immune to typos than the rest of us, and it's easy to confuse a 1 followed by five zeros with a 1 followed by six zeros unless a thousands separator is used.
Remember that you were in the process of making mfaktc sleep and yield to the operating system instead of busy-waiting, but we don't have that yet, so to first order, if the CPU is keeping the GPU fed, changing SievePrimes will make no difference in performance.
My Windows machine with the slow GPU still needs that upgrade. I still have to order the i7 Ubuntu machine with the GTX 570.
Eric
Old 2011-04-22, 13:07   #768
TheJudger
 
"Oliver"
Mar 2005
Germany

1111₁₀ Posts

Hi Eric,

Quote:
Originally Posted by Christenson View Post
Remember that you were in the process of making mfaktc sleep and yield to the operating system instead of busy-waiting, but we don't have that yet, so to first order, if the CPU is keeping the GPU fed, changing SievePrimes will make no difference in performance.
That depends on how you define performance. If your CPU can keep the GPU busy all the time at different values of SievePrimes, the GPU throughput remains the same. But a higher SievePrimes removes more candidates before any GPU work is done. So if the CPU can keep the GPU busy with a higher SievePrimes, the runtime per class/exponent will be lower while the GPU rate remains the same.
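
A toy sketch of that effect, not mfaktc's actual sieve: it counts how many factor candidates q = 2·k·p + 1 survive when they are sieved with the first N odd primes. The function names, the range of k and the sieve sizes are all made up for illustration. The sketch ignores mfaktc's classes and the condition q ≡ ±1 (mod 8), and it uses far fewer primes than a real SievePrimes setting.

```python
def first_primes(count):
    """Return the first `count` odd primes using simple trial division."""
    primes = []
    n = 3
    while len(primes) < count:
        if all(n % p for p in primes if p * p <= n):
            primes.append(n)
        n += 2
    return primes


def surviving_candidates(exponent, k_max, sieve_primes):
    """Count k in 1..k_max for which q = 2*k*exponent + 1 survives the sieve."""
    primes = first_primes(sieve_primes)
    survivors = 0
    for k in range(1, k_max + 1):
        q = 2 * k * exponent + 1
        # A candidate is dropped as soon as one sieve prime divides it.
        # (q == p keeps q if it happens to be one of the sieve primes itself.)
        if all(q % p != 0 or q == p for p in primes):
            survivors += 1
    return survivors


if __name__ == "__main__":
    EXPONENT = 43112609  # a known prime exponent, chosen only for illustration
    K_MAX = 20000        # hypothetical range of k values
    for sieve_primes in (10, 100, 1000):
        left = surviving_candidates(EXPONENT, K_MAX, sieve_primes)
        print(f"{sieve_primes:5d} sieve primes: {left} of {K_MAX} candidates left")
```

The number of survivors shrinks as more primes are used, so fewer candidates are left for the GPU. Each additional prime p only removes about 1/p of what remains, so the gain per extra prime gets smaller.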

Oliver
Old 2011-04-22, 13:44   #769
Brain
 
Dec 2009
Peine, Germany

331 Posts

Quote:
Originally Posted by nucleon View Post
Brain - how's your performance? I found a drop of roughly 10% when I upgraded win7 64bit driver to ver 270+
I cannot see any great differences compared with my last benchmark run: see attached file. (Win7 64bit)
[Attached image: mfaktc_0.16_Screeny_CUDA40_Bench_Cropped.png]
Old 2011-04-22, 16:05   #770
Brain
 
Dec 2009
Peine, Germany

513₈ Posts

Running 2 instances only, I can confirm a slight drop of 5 to 10%.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.