[QUOTE=Karl M Johnson;348836]2.55 ms/iter vs 2.94 ms/iter (M47).
I'm using the latest and greatest WHQL of 320.49. It must be the toolkit, not the drivers. Neat, thanks! Notice the difference, btw: [CODE]26.09.2012  00:46  26,093,928 cufft64_50_35.dll
11.07.2013  14:06  74,730,784 cufft64_55.dll[/CODE][/QUOTE] Yeah, I'm not surprised that the latest CUDA toolkit's official support for the sm_35 architecture would be a factor in the increased throughput. :smile: |
There's a 327.23 ForceWare driver available with a WHQL cert.
So far so good. |
Where did you download "cufft64_55.dll" ?
|
[QUOTE=Nipal;354280]Where did you download "cufft64_55.dll" ?[/QUOTE]They are contained in the [url=https://developer.nvidia.com/cuda-downloads]CUDA Toolkit[/url], but for end users the DLLs are more readily available from my site:
[url]http://download.mersenne.ca/CUDAPm1/[/url] |
Is it just me or did LL iteration timings get very stable?
Check the several hundred thousand iterations from my latest assignment, where the ms/iter figure differs by only about ±0.0002 ms from one ten-thousand-iteration block to the next. Of course, it only happens when the computer is idle, but still hooked up to the monitor. [CODE]Iteration 100000 M( 64805113 )C, 0x18b0f00abb99d3f4, n = 3686400, CUDALucas v2.03 err = 0.1333 (0:30 real, 3.0452 ms/iter, ETA 54:43:44)
Iteration 110000 M( 64805113 )C, 0x6f27dd99b13938b5, n = 3686400, CUDALucas v2.03 err = 0.1333 (0:31 real, 3.0452 ms/iter, ETA 54:43:12)
Iteration 120000 M( 64805113 )C, 0xcf4a8317507eeaf3, n = 3686400, CUDALucas v2.03 err = 0.1333 (0:30 real, 3.0452 ms/iter, ETA 54:42:41)
Iteration 130000 M( 64805113 )C, 0x86e6484ad79b494c, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:31 real, 3.0452 ms/iter, ETA 54:42:13)
Iteration 140000 M( 64805113 )C, 0x19fed478e449b2e2, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:30 real, 3.0450 ms/iter, ETA 54:41:29)
Iteration 150000 M( 64805113 )C, 0x9c5d589f19c5503d, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:31 real, 3.0453 ms/iter, ETA 54:41:17)
Iteration 160000 M( 64805113 )C, 0x825888f60d078fcd, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:30 real, 3.0453 ms/iter, ETA 54:40:47)
Iteration 170000 M( 64805113 )C, 0x37beacd26114c04d, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:31 real, 3.0453 ms/iter, ETA 54:40:16)
Iteration 180000 M( 64805113 )C, 0x16f5a8d8c22484dc, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:30 real, 3.0454 ms/iter, ETA 54:39:52)
Iteration 190000 M( 64805113 )C, 0x3fa0368e6bf8340a, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:31 real, 3.0453 ms/iter, ETA 54:39:17)
Iteration 200000 M( 64805113 )C, 0x2d6de102809a2b23, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:30 real, 3.0454 ms/iter, ETA 54:38:52)
Iteration 210000 M( 64805113 )C, 0x53d3437a65c5a3e4, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:31 real, 3.0454 ms/iter, ETA 54:38:19)
Iteration 220000 M( 64805113 )C, 0x5faf1dab8c8b256c, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:30 real, 3.0454 ms/iter, ETA 54:37:49)
Iteration 230000 M( 64805113 )C, 0xdc1482a76e83f687, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:31 real, 3.0451 ms/iter, ETA 54:37:04)
Iteration 240000 M( 64805113 )C, 0xec301d099bf46f2a, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:30 real, 3.0453 ms/iter, ETA 54:36:45)
Iteration 250000 M( 64805113 )C, 0x02d98303e5aadc2f, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:30 real, 3.0454 ms/iter, ETA 54:36:20)
Iteration 260000 M( 64805113 )C, 0xe09ece2eb63e9cbd, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:31 real, 3.0452 ms/iter, ETA 54:35:38)
Iteration 270000 M( 64805113 )C, 0x2c62ce5814d75190, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:30 real, 3.0450 ms/iter, ETA 54:34:53)
Iteration 280000 M( 64805113 )C, 0x1fc0351a4a9109a4, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:31 real, 3.0454 ms/iter, ETA 54:34:49)
Iteration 290000 M( 64805113 )C, 0xc25a5b393753c4ff, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:30 real, 3.0453 ms/iter, ETA 54:34:13)
Iteration 300000 M( 64805113 )C, 0xbfccde3394e09673, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:31 real, 3.0452 ms/iter, ETA 54:33:32)
Iteration 310000 M( 64805113 )C, 0x7350af823bd9ed75, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:30 real, 3.0454 ms/iter, ETA 54:33:15)
Iteration 320000 M( 64805113 )C, 0xcf8b1ba62275c510, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:31 real, 3.0454 ms/iter, ETA 54:32:48)
Iteration 330000 M( 64805113 )C, 0xff0296223c6986f9, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:30 real, 3.0452 ms/iter, ETA 54:32:06)
Iteration 340000 M( 64805113 )C, 0x4f8495853deb6417, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:31 real, 3.0452 ms/iter, ETA 54:31:31)
Iteration 350000 M( 64805113 )C, 0xcdd6e1bd0ecef59d, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:30 real, 3.0453 ms/iter, ETA 54:31:06)
Iteration 360000 M( 64805113 )C, 0xaea20be130c9dc7b, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:31 real, 3.0454 ms/iter, ETA 54:30:46)
Iteration 370000 M( 64805113 )C, 0x289389b1890ed2fa, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:30 real, 3.0455 ms/iter, ETA 54:30:21)
Iteration 380000 M( 64805113 )C, 0xcace87ee23554ad5, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:31 real, 3.0454 ms/iter, ETA 54:29:46)
Iteration 390000 M( 64805113 )C, 0xa06e9fc2bc3ab339, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:30 real, 3.0452 ms/iter, ETA 54:28:59)[/CODE] |
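One way to check the stability claim is to pull the ms/iter field out of each log line and look at the spread. This is a minimal Python sketch, assuming the log format shown in the post above; the regex and the function name `timing_spread` are my own and not part of CUDALucas. The three sample lines are copied from the log and appear to hold its lowest and highest timings.

```python
import re

# Three lines copied from the log above; paste the full log to check every line.
LOG = """\
Iteration 100000 M( 64805113 )C, 0x18b0f00abb99d3f4, n = 3686400, CUDALucas v2.03 err = 0.1333 (0:30 real, 3.0452 ms/iter, ETA 54:43:44)
Iteration 140000 M( 64805113 )C, 0x19fed478e449b2e2, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:30 real, 3.0450 ms/iter, ETA 54:41:29)
Iteration 370000 M( 64805113 )C, 0x289389b1890ed2fa, n = 3686400, CUDALucas v2.03 err = 0.1367 (0:30 real, 3.0455 ms/iter, ETA 54:30:21)
"""

# Matches the number right before " ms/iter" (assumed log layout).
MS_PER_ITER = re.compile(r"([0-9]+\.[0-9]+) ms/iter")


def timing_spread(log_text):
    """Return (min, max, max - min) of the ms/iter values found in log_text."""
    timings = [float(m.group(1)) for m in MS_PER_ITER.finditer(log_text)]
    if not timings:
        raise ValueError("no 'ms/iter' fields found")
    lo, hi = min(timings), max(timings)
    return lo, hi, hi - lo


lo, hi, spread = timing_spread(LOG)
print(f"min {lo:.4f}  max {hi:.4f}  spread {spread:.4f} ms/iter")
```

For these lines the spread is 0.0005 ms, i.e. roughly ±0.00025 ms around the midpoint, which is in line with the figure quoted in the post.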
[QUOTE=Karl M Johnson;354483]Is it just me or did LL iteration timings get very stable?
Check the several hundred thousand iterations from my latest assignment, where the difference between each ten thousand iterations is up to +-0.0002ms. Of course, it only happens when the computer is idle, but still hooked up to the monitor.[/QUOTE] I see the same thing. Timings are very stable when I'm not using the machine. |
It was like that from the beginning, even for versions with powers of two. If you have the patience to go through the last 100 pages of this topic, you will find repeated discussions where I argued that "non-constant" times mean something is wrong with your system: from the simple "not well balanced" cases (CPU-bottlenecked, too much GPU power for the CPU you have, bad priority and/or affinity settings) to more serious things like heat problems, throttling, etc.
|
[QUOTE=LaurV;354554]It was like that from the beginning, even for versions with powers of two. If you have the patience to go through the last 100 pages of this topic, you will find repeated discussions where I argued that "non-constant" times mean something is wrong with your system: from the simple "not well balanced" cases (CPU-bottlenecked, too much GPU power for the CPU you have, bad priority and/or affinity settings) to more serious things like heat problems, throttling, etc.[/QUOTE]
Must have been the throttling in my case. |
By the way, why is this thread not sticky? :smile:
Also, I've noticed a curious behavior of CUDALucas: after you properly close it and restart from the latest checkpoint, the error rate is lower than it was. Is there an explanation for this? |
That is normal. The error rate you see is the MAXIMUM registered since the program started, so that an iteration producing crap will not escape unnoticed. Also, the error is only checked once every so many iterations, not at each iteration, because checking it at every iteration is costly. To check the error at every iteration (which I have always highly recommended!), launch the program with the "-t" switch. You will notice a time penalty of 1% to 10% (depending on your card), and you will also notice a "curious behavior": your error grows faster in the beginning, because it is checked at every iteration and the maximum is what gets shown each time the log line is printed (every 10k iterations or so). Launch with "-c 100" to see what's going on (only for fun or for didactic purposes, since printing to the screen wastes a lot of time): the error never decreases, it only increases, because the maximum is always kept. This is normal.
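The "maximum since start" behaviour can be illustrated with a toy model. This is only a sketch of the idea described above, not CUDALucas code: the class `ErrorMonitor`, its `check_every` parameter, and the sample error values are all invented for illustration.

```python
class ErrorMonitor:
    """Toy model: remembers the largest round-off error seen since start-up."""

    def __init__(self, check_every=100):
        self.check_every = check_every  # hypothetical: sample every N iterations
        self.max_err = 0.0              # starts from zero on every (re)start

    def observe(self, iteration, err):
        # Only sampled iterations are inspected; the rest go unchecked.
        if iteration % self.check_every == 0:
            self.max_err = max(self.max_err, err)
        return self.max_err


# Made-up per-iteration errors with a spike at iteration 150.
errors = {i: 0.10 for i in range(1, 401)}
errors[150] = 0.18
errors[200] = 0.14

sparse = ErrorMonitor(check_every=100)  # periodic checks
every = ErrorMonitor(check_every=1)     # like "-t": check every iteration
for i in range(1, 401):
    sparse.observe(i, errors[i])
    every.observe(i, errors[i])

print(sparse.max_err)  # 0.14: the spike at 150 was never sampled
print(every.max_err)   # 0.18: once seen, the maximum never decreases

# Restarting from a checkpoint creates a fresh monitor, so the displayed
# maximum starts over and can be lower than it was before the restart.
restarted = ErrorMonitor(check_every=100)
for i in range(401, 601):
    restarted.observe(i, 0.10)
print(restarted.max_err)  # 0.1
```

The last part mirrors the question asked earlier in the thread: after a restart the counter begins again from zero, so a lower displayed error does not mean the computation itself became more accurate.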
edit: To clarify: "-t" is highly recommended for first-time tests. It will save you headaches later. I don't recommend "-t" for DC (especially on fast cards), where the best "error check" is the final residue matching the original test. In case of a mismatch, you didn't lose much time (~15 hours for a 30M DC on a GTX 580, for example). In the long run, "-t" for DC on fast cards may be counter-productive. Example: if your penalty with -t is about 7%, and you run one DC in x hours, then you will run 20 DCs without -t in 20x hours. If one DC produces a mismatch and you re-run that test, watching the residues you had in the initial run, you lose another x hours, so in the end you spent 21x hours to clear 20 exponents. If you run with -t, you may catch the error immediately and resume without losing any time, but because each test is 7% slower, you will need 20 * 1.07x hours to clear 20 exponents, which is about 21.4x, so longer. And, like it or not, you WILL produce bad residues, no matter how good you think your card is. One in 20 is reasonable, statistically. Therefore, for first-time tests, you will be sorry when I find the prime that you missed because your card produced a bad residue, which escaped undetected because you didn't want to lose time with "-t" :razz: |
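The arithmetic in this example can be written out explicitly. A small Python sketch under the post's own numbers (20 DCs, one mismatch, 7% penalty); the function names are mine, and it assumes that -t catches every error immediately, so nothing has to be re-run.

```python
def hours_without_t(n_tests, bad_tests, hours_per_test=1.0):
    """Total time when every bad residue costs one full re-run."""
    return (n_tests + bad_tests) * hours_per_test


def hours_with_t(n_tests, penalty, hours_per_test=1.0):
    """Total time when every test runs slower by `penalty` (0.07 = 7%).

    Assumes errors are caught at once, so no test is re-run.
    """
    return n_tests * (1.0 + penalty) * hours_per_test


# The numbers from the post, with x = 1 time unit per DC.
without_t = hours_without_t(20, 1)
with_t = hours_with_t(20, 0.07)
print(f"without -t: {without_t:.1f}x   with -t: {with_t:.1f}x")
# prints "without -t: 21.0x   with -t: 21.4x"
```

Under these assumptions the break-even point is where the fraction of bad tests equals the penalty: with one bad DC in 20 (5%) and a 7% penalty, skipping -t comes out slightly ahead. The model only counts hours, so it says nothing about a first-time test whose bad residue goes undetected.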
Good explanation, [B]LaurV[/B].
I always run with the -t flag, as both the memory and the GPU are overclocked. |