mersenneforum.org  

Old 2013-08-09, 07:56   #1948
Robert_JD
Sep 2010
So Cal
2×5² Posts

Quote:
Originally Posted by Karl M Johnson
2.55 ms/iter vs 2.94 ms/iter(M47).
I'm using the latest and greatest WHQL of 320.49.
It must be the toolkit, not the drivers.
Neat, thanks!

Notice the difference, btw:
Code:
26.09.2012  00:46        26,093,928 cufft64_50_35.dll
11.07.2013  14:06        74,730,784 cufft64_55.dll
Yeah, I'm not that surprised that the latest CUDA toolkit's official support for arch sm_35 would be a factor in the increased throughput.

Old 2013-09-20, 15:23   #1949
Karl M Johnson
Mar 2010
3·137 Posts

There's a 327.23 Forceware available with a WHQL cert.
So far so good.

Old 2013-09-26, 17:36   #1950
Nipal
"Vladimir"
Sep 2013
Russia
C₁₆ Posts

Where did you download "cufft64_55.dll"?

Old 2013-09-26, 18:10   #1951
James Heinrich
"James Heinrich"
May 2004
ex-Northern Ontario
6543₈ Posts

Quote:
Originally Posted by Nipal
Where did you download "cufft64_55.dll"?
The cuFFT DLLs ship with the CUDA Toolkit, but for end users they are more readily available from my site:
http://download.mersenne.ca/CUDAPm1/

Old 2013-09-29, 03:16   #1952
Karl M Johnson
Mar 2010
411₁₀ Posts

Is it just me, or did LL iteration timings get very stable?
Check the several hundred thousand iterations from my latest assignment, where the difference between each ten thousand iterations is up to ±0.0002 ms.
Of course, it only happens when the computer is idle, but still hooked up to the monitor.
Code:
Iteration 100000 M( 64805113 )C, 0x18b0f00abb99d3f4, n = 3686400, CUDALucas v2.03 err = 0.1333 (0:30 real, 3.0452 ms/iter, ETA 54:43:44)
Iteration 110000 M( 64805113 )C, 0x6f27dd99b13938b5, n = 3686400, CUDALucas v2.03 err = 0.1333 (0:31 real, 3.0452 ms/iter, ETA 54:43:12)
Iteration 120000 M( 64805113 )C, 0xcf4a8317507eeaf3, n = 3686400, CUDALucas v2.03 err = 0.1333 (0:30 real, 3.0452 ms/iter, ETA 54:42:41)
Iteration 130000 M( 64805113 )C, 0x86e6484ad79b494c, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:31 real, 3.0452 ms/iter, ETA 54:42:13)
Iteration 140000 M( 64805113 )C, 0x19fed478e449b2e2, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:30 real, 3.0450 ms/iter, ETA 54:41:29)
Iteration 150000 M( 64805113 )C, 0x9c5d589f19c5503d, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:31 real, 3.0453 ms/iter, ETA 54:41:17)
Iteration 160000 M( 64805113 )C, 0x825888f60d078fcd, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:30 real, 3.0453 ms/iter, ETA 54:40:47)
Iteration 170000 M( 64805113 )C, 0x37beacd26114c04d, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:31 real, 3.0453 ms/iter, ETA 54:40:16)
Iteration 180000 M( 64805113 )C, 0x16f5a8d8c22484dc, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:30 real, 3.0454 ms/iter, ETA 54:39:52)
Iteration 190000 M( 64805113 )C, 0x3fa0368e6bf8340a, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:31 real, 3.0453 ms/iter, ETA 54:39:17)
Iteration 200000 M( 64805113 )C, 0x2d6de102809a2b23, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:30 real, 3.0454 ms/iter, ETA 54:38:52)
Iteration 210000 M( 64805113 )C, 0x53d3437a65c5a3e4, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:31 real, 3.0454 ms/iter, ETA 54:38:19)
Iteration 220000 M( 64805113 )C, 0x5faf1dab8c8b256c, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:30 real, 3.0454 ms/iter, ETA 54:37:49)
Iteration 230000 M( 64805113 )C, 0xdc1482a76e83f687, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:31 real, 3.0451 ms/iter, ETA 54:37:04)
Iteration 240000 M( 64805113 )C, 0xec301d099bf46f2a, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:30 real, 3.0453 ms/iter, ETA 54:36:45)
Iteration 250000 M( 64805113 )C, 0x02d98303e5aadc2f, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:30 real, 3.0454 ms/iter, ETA 54:36:20)
Iteration 260000 M( 64805113 )C, 0xe09ece2eb63e9cbd, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:31 real, 3.0452 ms/iter, ETA 54:35:38)
Iteration 270000 M( 64805113 )C, 0x2c62ce5814d75190, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:30 real, 3.0450 ms/iter, ETA 54:34:53)
Iteration 280000 M( 64805113 )C, 0x1fc0351a4a9109a4, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:31 real, 3.0454 ms/iter, ETA 54:34:49)
Iteration 290000 M( 64805113 )C, 0xc25a5b393753c4ff, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:30 real, 3.0453 ms/iter, ETA 54:34:13)
Iteration 300000 M( 64805113 )C, 0xbfccde3394e09673, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:31 real, 3.0452 ms/iter, ETA 54:33:32)
Iteration 310000 M( 64805113 )C, 0x7350af823bd9ed75, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:30 real, 3.0454 ms/iter, ETA 54:33:15)
Iteration 320000 M( 64805113 )C, 0xcf8b1ba62275c510, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:31 real, 3.0454 ms/iter, ETA 54:32:48)
Iteration 330000 M( 64805113 )C, 0xff0296223c6986f9, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:30 real, 3.0452 ms/iter, ETA 54:32:06)
Iteration 340000 M( 64805113 )C, 0x4f8495853deb6417, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:31 real, 3.0452 ms/iter, ETA 54:31:31)
Iteration 350000 M( 64805113 )C, 0xcdd6e1bd0ecef59d, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:30 real, 3.0453 ms/iter, ETA 54:31:06)
Iteration 360000 M( 64805113 )C, 0xaea20be130c9dc7b, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:31 real, 3.0454 ms/iter, ETA 54:30:46)
Iteration 370000 M( 64805113 )C, 0x289389b1890ed2fa, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:30 real, 3.0455 ms/iter, ETA 54:30:21)
Iteration 380000 M( 64805113 )C, 0xcace87ee23554ad5, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:31 real, 3.0454 ms/iter, ETA 54:29:46)
Iteration 390000 M( 64805113 )C, 0xa06e9fc2bc3ab339, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:30 real, 3.0452 ms/iter, ETA 54:28:59)
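
For anyone who wants to put a number on "stable", here is a minimal Python sketch that pulls the ms/iter figures out of log lines like the ones above and reports their spread. The regular expression and the function name `timing_spread` are my own assumptions, not part of CUDALucas; the three sample lines are copied from the log above and include its fastest and slowest readings.

```python
import re

# Three lines copied from the log above; paste more lines to analyze a full run.
LOG = """\
Iteration 100000 M( 64805113 )C, 0x18b0f00abb99d3f4, n = 3686400, CUDALucas v2.03 err = 0.1333 (0:30 real, 3.0452 ms/iter, ETA 54:43:44)
Iteration 140000 M( 64805113 )C, 0x19fed478e449b2e2, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:30 real, 3.0450 ms/iter, ETA 54:41:29)
Iteration 370000 M( 64805113 )C, 0x289389b1890ed2fa, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:30 real, 3.0455 ms/iter, ETA 54:30:21)
"""

# Assumed pattern: a decimal number immediately followed by " ms/iter".
MS_PER_ITER = re.compile(r"([0-9.]+) ms/iter")


def timing_spread(log_text):
    """Return (min, max, max - min) of the ms/iter values found in log_text."""
    timings = [float(m.group(1)) for m in MS_PER_ITER.finditer(log_text)]
    if not timings:
        raise ValueError("no ms/iter values found")
    lo, hi = min(timings), max(timings)
    return lo, hi, hi - lo


lo, hi, spread = timing_spread(LOG)
print(f"min {lo:.4f} ms/iter, max {hi:.4f} ms/iter, spread {spread:.4f} ms")
```

A spread that stays within a few ten-thousandths of a millisecond over hundreds of thousands of iterations is what the posts here describe as an idle, well-behaved system.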

Old 2013-09-29, 03:36   #1953
Mark Rose
"/X\(‘-‘)/X\"
Jan 2013
2·5·293 Posts

Quote:
Originally Posted by Karl M Johnson
Is it just me, or did LL iteration timings get very stable?
Check the several hundred thousand iterations from my latest assignment, where the difference between each ten thousand iterations is up to ±0.0002 ms.
Of course, it only happens when the computer is idle, but still hooked up to the monitor.
I see the same thing. Timings are very stable when not using the machine.

Old 2013-09-30, 02:35   #1954
LaurV
Romulan Interpreter
Jun 2011
Thailand
3×3,221 Posts

It was like that from the beginning, even for the versions with powers of two. If you have the patience to go through the last 100 pages of this topic, you will find repeated discussions where I argued that "non-constant" times mean that something is wrong with your system (from the simple "not well balanced", like being CPU-bottlenecked, having too much GPU power for your CPU, or bad settings for priorities and/or affinities, to more serious things like heat problems, throttling, etc.).

Old 2013-09-30, 10:14   #1955
Karl M Johnson
Mar 2010
3×137 Posts

Quote:
Originally Posted by LaurV
It was like that from the beginning, even for the versions with powers of two. If you have the patience to go through the last 100 pages of this topic, you will find repeated discussions where I argued that "non-constant" times mean that something is wrong with your system (from the simple "not well balanced", like being CPU-bottlenecked, having too much GPU power for your CPU, or bad settings for priorities and/or affinities, to more serious things like heat problems, throttling, etc.).
Must have been the throttling in my case.

Old 2013-10-04, 09:14   #1956
Karl M Johnson
Mar 2010
3×137 Posts

By the way, why is this thread not sticky?

Also, I've noticed a curious behavior of CUDALucas: after you properly close it and restart from the latest checkpoint, the error rate is lower than it was.
Is there an explanation for this?

Last fiddled with by Karl M Johnson on 2013-10-04 at 09:36

Old 2013-10-04, 10:04   #1957
LaurV
Romulan Interpreter
Jun 2011
Thailand
25BF₁₆ Posts

That is normal. The error you see is the MAXIMUM recorded since the program started, so that an iteration producing crap cannot slip by unnoticed. Also, the error is only checked once every bunch of iterations, not at each iteration, because checking it at every iteration is costly. To check the error at every iteration (I have always highly recommended it!), launch the program with the "-t" switch. You will notice a time penalty of 1% to 10% (depending on your card), and you will also notice a "curious behavior": your error grows faster in the beginning (because it is checked at every iteration, and the maximum error is what gets shown on screen every 10k iterations or so, when the log line is printed). Launch with "-c 100" to see what's going on (only for fun or for didactic purposes, otherwise printing to the screen wastes a lot of time): the error never "decreases", it only "increases", because the maximum error is always kept. This is normal.
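
To make the "maximum since start" behavior concrete, here is a small Python sketch. It is only a toy model under my own assumptions, not CUDALucas code: the per-iteration errors are random numbers, and `reported_errors`, `check_every` and `report_every` are hypothetical names. It shows why the displayed value can only grow during a run, why checking every iteration reaches a given level at least as early as sampling does, and why a restart can show a lower number.

```python
import random


def reported_errors(per_iter_errors, check_every, report_every):
    """Yield the running maximum error at each report point.

    per_iter_errors: made-up round-off error for each iteration.
    check_every: 1 mimics checking at every iteration (what the post
                 attributes to -t); larger values inspect only every
                 check_every-th iteration.
    report_every: how often a log line would be printed.
    """
    running_max = 0.0
    for i, err in enumerate(per_iter_errors, start=1):
        if i % check_every == 0:
            running_max = max(running_max, err)
        if i % report_every == 0:
            yield running_max


random.seed(1)
errors = [random.uniform(0.05, 0.14) for _ in range(1000)]

# Same run, checked at every iteration versus every 50th iteration.
every = list(reported_errors(errors, check_every=1, report_every=100))
sampled = list(reported_errors(errors, check_every=50, report_every=100))

# "Restart" at iteration 500: the running maximum starts again from zero.
after_restart = list(reported_errors(errors[500:], check_every=1, report_every=100))

print("every iteration:", [f"{e:.4f}" for e in every])
print("every 50th:     ", [f"{e:.4f}" for e in sampled])
print("after restart:  ", [f"{e:.4f}" for e in after_restart])
```

In this model the first value printed after the restart can never exceed the value the uninterrupted run shows at the same iteration, which matches the lower error seen after resuming from a checkpoint.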

edit: To clarify: "-t" is highly recommended for first-time tests. It will save you headaches later. I don't recommend "-t" for DC (especially on fast cards), where the best "error check" is the final residue matching the original test. In case of a mismatch, you didn't lose much time (~15 hours for a 30M DC on a gtx580, for example). In the long run, "-t" for DC on fast cards may be counter-productive. Example: if your penalty with -t is about 7%, and you run one DC in x hours, then you will run 20 DCs without -t in 20x hours. If one DC produces a mismatch and you re-run the test, watching the residues you had at the initial run, you lose another x hours, so in the end you spent 21x hours to clear 20 exponents. If you run with -t, you may catch the error immediately, resume, and lose no time, but because each test is 7% slower, you will need 20 × 1.07x hours to clear 20 exponents, which is about 21.4x, so longer. And, like it or not, you WILL produce bad residues, no matter how good you think your card is. One in 20 is reasonable, statistically.

Therefore, for first-time tests, you will be sorry when I find that prime that you missed because your card produced a bad residue, which escaped undetected because you didn't want to lose time with "-t".
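
The DC arithmetic above can be written out as a short Python sketch. The function names are hypothetical, and the model carries the same simplifying assumptions as the example: a mismatch costs exactly one full re-run, and with -t an error is caught and resumed at no extra cost.

```python
def hours_without_t(n_tests, hours_per_test, n_mismatches):
    """Total hours when each mismatch costs one complete re-run."""
    return (n_tests + n_mismatches) * hours_per_test


def hours_with_t(n_tests, hours_per_test, penalty):
    """Total hours when every test is slower by `penalty` (0.07 means 7%)."""
    return n_tests * hours_per_test * (1 + penalty)


def break_even_penalty(n_tests, n_mismatches):
    """-t penalty at which both strategies take the same total time."""
    return n_mismatches / n_tests


x = 1.0  # hours per DC; total time is then expressed in multiples of x
without = hours_without_t(20, x, 1)
with_t = hours_with_t(20, x, 0.07)

print(f"without -t: {without:.1f}x hours")
print(f"with -t:    {with_t:.1f}x hours")
print(f"break-even penalty: {break_even_penalty(20, 1):.0%}")
```

Under these assumptions, with one bad residue per 20 tests, -t only pays off for DC when its penalty on your card is below 5%.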

Last fiddled with by LaurV on 2013-10-04 at 10:22

Old 2013-10-04, 10:29   #1958
Karl M Johnson
Mar 2010
633₈ Posts

Good explanation, LaurV.
I always run with the -t flag, as both the memory and the GPU are overclocked.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.