mersenneforum.org > Great Internet Mersenne Prime Search > Hardware > GPU Computing
2011-01-09, 11:19   #12
Ralf Recker

Quote:
Originally Posted by em99010pepe View Post
Ralf Recker,

First of all, thank you. Second, can you post the specs of your machine (memory, hard drives, DVD-R, etc.)? I want to do some energy-efficiency calculations, so I need to know how many components, and which kinds, the machine has in order to estimate its energy consumption.
A cheap µATX board (Series 3 chipset), 4 GB of DDR2-800 RAM, a Core 2 Quad Q9550 (E0 stepping) @ 3.4 GHz (400 MHz × 8.5) and an MSI N460GTX Cyclone (1 GB version; the cooler is ugly but effective). The GPU ran at its standard (factory-overclocked) settings of 725 MHz core / 1450 MHz shaders / 900 MHz RAM (GDDR5) @ 1.012 V core. (Overclocking doesn't currently work under Linux, but the card crunched without any problems at 850 MHz / 1700 MHz / 1000 MHz under Windows.) The temperatures during the test were around 45-48 °C with the case closed.

The Q9550 runs at 1.1625 V @ its stock 2.83 GHz clock and at 1.20 V @ 3.4 GHz. It's possible to run 3.4 GHz at lower voltages, but I wanted to eliminate any risk of returning wrong results while crunching for PRPNet or LLRnet.

Last fiddled with by Ralf Recker on 2011-01-09 at 11:32
2011-01-09, 11:31   #13
Ralf Recker

Quote:
Originally Posted by msft View Post
GTX460:
5*2^1282755+1 is prime! Time : 4491.564 sec.
5*2^1320487+1 is prime! Time : 4447.951 sec.
I have two questions: Which OS and driver did you use? And was the CPU otherwise idle, or under load? The runtime differences compared to my GTX 460 are significant; if your CPU was under load, they could be caused by GPU starvation...

Last fiddled with by Ralf Recker on 2011-01-09 at 11:31
2011-01-09, 13:24   #14
msft

Dear Mr. Jean Penné
Happy New Year to you! I hope this year will be the happiest and best for you.
Yours sincerely,
Shoichiro Yamada

Last fiddled with by msft on 2011-01-09 at 13:31
2011-01-09, 13:30   #15
msft

Hi Ralf Recker,
Quote:
Originally Posted by Ralf Recker View Post
Which OS and drivers did you use? Was the CPU otherwise idle or under load? [...]
OS: Ubuntu Linux, 2.6.28-19-server kernel
Driver: devdriver_3.2_linux_64_260.19.26.run
mprime was running at the same time.
2011-01-09, 13:49   #16
Ralf Recker

Quote:
Originally Posted by msft View Post
OS:Linux ubuntu 2.6.28-19-server
driver:devdriver_3.2_linux_64_260.19.26.run
exec mprime same time.
After my initial test (256.53 drivers, CUDA SDK 3.1) I installed the 260.19.26 dev drivers and then the 260.19.29 drivers (with CUDA SDK 3.2), and experienced a significant slowdown of the CWPSieve (CUDA) workunits from BOINC/PrimeGrid: from 2.5 minutes/WU to around 3.75 minutes/WU with the CPU idle, or even 4.5 minutes/WU with the CPU under load (sieving for NFS@Home). Since (older?) versions of Ken-g6's PPSieve (CUDA) and TPSieve (CUDA) also work much better with the 256 drivers, I've downgraded the drivers to 256.53 again (and the SDK to 3.1).

Last fiddled with by Ralf Recker on 2011-01-09 at 13:56
2011-01-09, 14:12   #17
Karl M Johnson

Quote:
Originally Posted by Ralf Recker View Post
I've installed the 260.19.26 dev drivers and the 260.19.29 drivers (and the CUDA SDK 3.2) after my initial test and experienced a significant slowdown [...] so I've downgraded the drivers to 256.53 again (and the SDK to 3.1).
Yes, NV messed something up in 260.xx.
OpenCL and CUDA apps (at least those using the driver API) compiled with 260.xx don't work on older drivers (like 258.96).
OpenCL suffers a major slowdown (up to 25%) in performance, while CUDA loses up to 5%.
Those are my observations on the applications I use.
Curiously, using the 260.xx drivers with the 3.1 toolkit/SDK doesn't fix anything, so it's definitely the drivers.
258.96 is the last proper driver for Windows; I use it.
For Linux, 256.53 is likewise the last proper one.

Last fiddled with by Karl M Johnson on 2011-01-09 at 14:18
2011-01-09, 15:54   #18
msft

$ time ./llrCUDA -q5*2^1282755+1

real 19m42.907s
user 8m58.070s
sys 7m21.650s
$ cat lresults.txt
5*2^1282755+1 is prime! Time : 1182.847 sec.
Attached Files
File Type: gz llrCUDA.0.11.tar.gz (102.8 KB, 143 views)
2011-01-09, 16:10   #19
Ralf Recker

Looks like it has reached the speed of a single CPU core.

ralf@quadriga ~/llrcuda.0.11 $ time ./llrCUDA -q5*2^1282755+1 -d
Starting Proth prime test of 5*2^1282755+1, FFTLEN = 131072 ; a = 3
5*2^1282755+1, bit: 40000 / 1282757 [3.11%]. Time per bit: 0.819 ms.

To quote myself:

Quote:
Quick comparison: Time per bit on the CPU: ~0.812 ms.
(Core 2 Quad Q9550 @ 3.6 GHz - Single Core)

Update:

ralf@quadriga ~/llrcuda.0.11 $ time ./llrCUDA -q5*2^1282755+1 -d
Starting Proth prime test of 5*2^1282755+1, FFTLEN = 131072 ; a = 3
5*2^1282755+1 is prime! Time : 1056.916 sec.

real 17m37.006s
user 4m51.894s
sys 9m27.603s
ralf@quadriga ~/llrcuda.0.11 $ cat lresults.txt
Bit 32105 / 1282757
5*2^1282755+1 is prime! Time : 1056.916 sec.
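[Editor's note: as a sanity check, the total runtime above is consistent with the per-bit estimate from the progress line. The Proth test performs roughly one iteration per bit of the exponent, so dividing total time by the iteration count shown in the progress line gives the average per-bit cost. A minimal sketch of the arithmetic; the figures are taken from the post above:]

```python
# Average per-bit cost of the llrCUDA run on 5*2^1282755+1.
total_seconds = 1056.916   # final time reported by llrCUDA
bits = 1282757             # iteration count from the progress line

ms_per_bit = total_seconds / bits * 1000
print(f"{ms_per_bit:.3f} ms per bit")  # ~0.824 ms, close to the 0.819 ms progress estimate
```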

Last fiddled with by Ralf Recker on 2011-01-09 at 16:29
2011-01-09, 17:18   #20
Ralf Recker

Quote:
Originally Posted by Ralf Recker View Post
Looks like it reached the speed of a single CPU core [...]
5*2^1282755+1 is prime! Time : 1056.916 sec.
Update 2:

Code compiled for sm_20 shows no significant difference:

ralf@quadriga ~/llrcuda.0.11 $ time ./llrCUDA -q5*2^1282755+1 -d
Starting Proth prime test of 5*2^1282755+1, FFTLEN = 131072 ; a = 3
5*2^1282755+1 is prime! Time : 1053.793 sec.

while code compiled for sm_21 is noticeably slower: 0.865 ms per bit.
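[Editor's note: the size of the sm_21 penalty can be quantified from the two per-bit figures quoted in this post (0.819 ms for the earlier build, 0.865 ms for the sm_21 build); a small sketch of that calculation, not part of the original post:]

```python
# Relative slowdown of the sm_21 build, from the quoted per-bit timings.
baseline_ms_per_bit = 0.819   # progress-line estimate, earlier build
sm21_ms_per_bit = 0.865       # progress-line estimate, sm_21 build

slowdown_pct = (sm21_ms_per_bit / baseline_ms_per_bit - 1) * 100
print(f"sm_21 build is about {slowdown_pct:.1f}% slower")  # ~5.6%
```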

Last fiddled with by Ralf Recker on 2011-01-09 at 17:20
2011-01-09, 18:24   #21
em99010pepe

Quote:
Originally Posted by Ralf Recker View Post
Looks like it reached the speed of a single CPU core
The next step is to be 4× faster than the CPU.

EDIT: Where can I find a list of the TDPs of all Nvidia GTX cards?

Last fiddled with by em99010pepe on 2011-01-09 at 19:02
2011-01-09, 19:40   #22
em99010pepe

For

GPU: GTX 470, TDP 215 W
CPU: Q9550, TDP 121 W (4 cores at 3.6 GHz)
CPU timing for 5*2^1282755+1 of 1056 secs

the GPU client would have to be 7.1 times faster than the CPU version to achieve the same watts-per-candidate over 24 hours as the CPU LLR client. That is the break-even point for this particular pair, Q9550 and GTX 470.
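[Editor's note: the 7.1× figure follows from equating energy per tested candidate: the CPU finishes 4 candidates per 1056 s at 121 W, so a 215 W GPU matches its joules-per-candidate only at a speedup of 215 × 4 / 121. A minimal sketch of that arithmetic, with variable names that are mine, not the poster's:]

```python
# Break-even speedup for the GPU to match the CPU's energy per candidate.
gpu_tdp_w = 215.0     # GTX 470 TDP
cpu_tdp_w = 121.0     # Q9550 TDP (4 cores at 3.6 GHz)
cpu_time_s = 1056.0   # one candidate per core in this time
cores = 4

# CPU energy per candidate: 4 candidates finish every cpu_time_s seconds.
cpu_joules = cpu_tdp_w * cpu_time_s / cores
# GPU at speedup k spends gpu_tdp_w * cpu_time_s / k joules per candidate;
# setting that equal to cpu_joules gives k = gpu_tdp_w * cores / cpu_tdp_w.
k = gpu_tdp_w * cores / cpu_tdp_w
print(f"break-even speedup: {k:.1f}x")  # ~7.1x
```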

Last fiddled with by em99010pepe on 2011-01-09 at 20:00