It would seem that despite the severely limited fp64 performance of high-end Pascal GPUs like the 1080 and Titan X, they're a match for the Kepler Titans after all.
I suspect it has something to do with higher memory clocks and bandwidth, the two-generation gap, and the smaller fabrication process :smile: Of course, the GP100 PCI-E Tesla GPU, which is scheduled for release in Q4 2016, should be even better at double-precision floating-point calculations, given that its fp64 performance is 1/2 of fp32, but it will be pricey as usual. |
[QUOTE=Karl M Johnson;440823]Cheers!
Some benchmark data with the new version below. It looks consistent with airsquirrels' timings (3.65ms/iter on same GPU and Mersenne candidate), but is a bit faster due to factory overclocking. Not quite sure why, but the drivers do not select the max clock profile for the memory, so I had to manually overclock it to default value (5.006GHz)
Robert_JD, would you kindly benchmark your Titan X on M76042667 with 4096K FFT size, for comparison?
[CODE]
| Aug 27 06:44:21 | M76042667 34000 0x166818e8f1bcad6d | 4096K 0.21875 3.3098 6.61s | 2:23:46:54 0.04% |
| Aug 27 06:44:28 | M76042667 36000 0x41398e04d2474533 | 4096K 0.21875 3.3072 6.61s | 2:23:40:16 0.04% |
| Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done |
| Aug 27 06:44:34 | M76042667 38000 0x6c6d2d1ec54c8443 | 4096K 0.21875 3.3109 6.62s | 2:23:34:35 0.04% |
| Aug 27 06:44:41 | M76042667 40000 0xa1b6980c147b1945 | 4096K 0.21875 3.3098 6.61s | 2:23:29:23 0.05% |
| Aug 27 06:44:47 | M76042667 42000 0x3952b7b74e553555 | 4096K 0.21875 3.3079 6.61s | 2:23:24:33 0.05% |
[/CODE][/QUOTE]
Hello, Karl! Wow, nice result! Could you please run a benchmark for M332220523 ([url]http://www.mersenneforum.org/showthread.php?t=13185[/url])? |
Hiya Lorenzo!
Here's the benchmark data:
[CODE]
Using threads: square 256, splice 256.
Starting M332220523 fft length = 19208K
| Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done |
| Aug 27 13:02:36 | M332220523 3000 0x3821367649db99f0 | 19208K 0.11328 18.3241 0:19 | 72:16:28:31 0.00% |
| Aug 27 13:02:54 | M332220523 4000 0xd257f742cbc095ef | 19208K 0.11328 18.3239 0:18 | 72:03:05:48 0.00% |
| Aug 27 13:03:13 | M332220523 5000 0xb6a474948e0d0da4 | 19208K 0.12109 18.3275 0:18 | 71:19:08:00 0.00% |
| Aug 27 13:03:31 | M332220523 6000 0x553f2f945aac8b5f | 19208K 0.12109 18.3260 0:19 | 71:13:48:01 0.00% |
| Aug 27 13:03:49 | M332220523 7000 0x6f41d82484cef29c | 19208K 0.12109 18.3255 0:18 | 71:09:58:59 0.00% |
| Aug 27 13:04:08 | M332220523 8000 0x1fa05e9cbb4a8709 | 19208K 0.12109 18.3265 0:18 | 71:07:07:49 0.00% |
| Aug 27 13:04:26 | M332220523 9000 0x53b63e5e6e665bf6 | 19208K 0.12109 18.3228 0:19 | 71:04:52:21 0.00% |
[/CODE]
Titan Black had better timings in those ranges (12.77xx ms/iter, according to Robert_JD's benchmark). Guess non-Tesla Pascals can't shine everywhere at once, but it's too early to say without a different compute capability benchmark, namely Titan X (P). |
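To put the two ms/iter figures above in perspective, here is a rough sketch of how a per-iteration timing translates into total run time. It assumes the standard Lucas-Lehmer test, which needs p - 2 squarings for exponent p, and that the timing stays constant for the whole run. The function name `ll_runtime_days` is made up for illustration and is not part of CUDALucas, and 12.77 is the truncated "12.77xx" figure quoted above.

```python
def ll_runtime_days(exponent: int, ms_per_iter: float) -> float:
    """Estimate the wall-clock days for a full Lucas-Lehmer test.

    Assumes p - 2 iterations and a constant time per iteration.
    """
    iterations = exponent - 2
    seconds = iterations * ms_per_iter / 1000.0
    return seconds / 86400.0  # 86400 seconds per day


# Timings taken from the thread for M332220523 at the 19208K FFT length.
gtx_1080_days = ll_runtime_days(332220523, 18.3241)
titan_black_days = ll_runtime_days(332220523, 12.77)

print(f"GTX 1080:    {gtx_1080_days:.1f} days")
print(f"Titan Black: {titan_black_days:.1f} days")
```

With these inputs the estimate comes out at about 70.5 days for the 1080 versus about 49.1 days for the Titan Black, which is in the same ballpark as the ETA column in the log.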
2 Attachment(s)
[QUOTE=Karl M Johnson;440823]Cheers!
Some benchmark data with the new version below. It looks consistent with airsquirrels' timings (3.65ms/iter on same GPU and Mersenne candidate), but is a bit faster due to factory overclocking. Not quite sure why, but the drivers do not select the max clock profile for the memory, so I had to manually overclock it to default value (5.006GHz)
Robert_JD, would you kindly benchmark your Titan X on M76042667 with 4096K FFT size, for comparison?
Here are the DLLs, if anyone else wants to give this new binary a go: [URL]http://www108.zippyshare.com/v/Wg0AFopm/file.html[/URL]
[CODE]
| Aug 27 06:44:21 | M76042667 34000 0x166818e8f1bcad6d | 4096K 0.21875 3.3098 6.61s | 2:23:46:54 0.04% |
| Aug 27 06:44:28 | M76042667 36000 0x41398e04d2474533 | 4096K 0.21875 3.3072 6.61s | 2:23:40:16 0.04% |
| Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done |
| Aug 27 06:44:34 | M76042667 38000 0x6c6d2d1ec54c8443 | 4096K 0.21875 3.3109 6.62s | 2:23:34:35 0.04% |
| Aug 27 06:44:41 | M76042667 40000 0xa1b6980c147b1945 | 4096K 0.21875 3.3098 6.61s | 2:23:29:23 0.05% |
| Aug 27 06:44:47 | M76042667 42000 0x3952b7b74e553555 | 4096K 0.21875 3.3079 6.61s | 2:23:24:33 0.05% |
[/CODE][/QUOTE]
Here is the pair of M76042667 benchmarks you requested: the first was run at the default memory clock of 9028, while the second was slightly overclocked via NVIDIA Inspector to approximately 9500. Notice that the residues match. I tried to overclock to an even 10000, but unfortunately all I got were zero or near-zero residues. :yucky: |
2.516 ms/iter for M76 makes Titan X faster than Titan Black :smile:
And since the residues match, well, congrats, money well spent! [SIZE=1]If only it had 1/2 fp64 performance of fp32, instead of 1/32.[/SIZE] |
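For anyone who wants to check "the residues match" without eyeballing two logs, here is a minimal sketch. It assumes log rows in the format shown earlier in the thread (exponent, iteration, 64-bit hex residue). The helper names `parse_residues` and `mismatches` are hypothetical, `run_a` reuses two rows posted above, and `run_b` and `run_bad` are made-up stand-ins for a second run and for the all-zero failure mode, not real output.

```python
import re

# Matches e.g. "M76042667 34000 0x166818e8f1bcad6d" inside a log row.
ROW = re.compile(r"M(\d+)\s+(\d+)\s+(0x[0-9a-fA-F]{16})")


def parse_residues(log_text):
    """Map (exponent, iteration) -> residue for every row found in the log."""
    return {
        (int(m.group(1)), int(m.group(2))): m.group(3).lower()
        for m in ROW.finditer(log_text)
    }


def mismatches(log_a, log_b):
    """Return the (exponent, iteration) keys present in both logs whose residues differ."""
    a, b = parse_residues(log_a), parse_residues(log_b)
    return sorted(key for key in a.keys() & b.keys() if a[key] != b[key])


# Two rows copied from the benchmark posted earlier in the thread.
run_a = """
| Aug 27 06:44:21 | M76042667 34000 0x166818e8f1bcad6d | 4096K 0.21875 3.3098 6.61s |
| Aug 27 06:44:28 | M76042667 36000 0x41398e04d2474533 | 4096K 0.21875 3.3072 6.61s |
"""
# Hypothetical second run: same residues, as a healthy card should produce.
run_b = "M76042667 34000 0x166818e8f1bcad6d\nM76042667 36000 0x41398e04d2474533\n"
# Hypothetical bad run: a zero residue, like the failed overclock described above.
run_bad = "M76042667 34000 0x166818e8f1bcad6d\nM76042667 36000 0x0000000000000000\n"

print(mismatches(run_a, run_b))
print(mismatches(run_a, run_bad))
```

An empty list means every iteration the two logs have in common agrees. Only iterations present in both logs are compared, so the runs need the same reporting interval.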
[QUOTE=Karl M Johnson;440844]Hiya Lorenzo!
Here's the benchmark data:
[CODE]
Using threads: square 256, splice 256.
Starting M332220523 fft length = 19208K
| Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done |
| Aug 27 13:02:36 | M332220523 3000 0x3821367649db99f0 | 19208K 0.11328 18.3241 0:19 | 72:16:28:31 0.00% |
| Aug 27 13:02:54 | M332220523 4000 0xd257f742cbc095ef | 19208K 0.11328 18.3239 0:18 | 72:03:05:48 0.00% |
| Aug 27 13:03:13 | M332220523 5000 0xb6a474948e0d0da4 | 19208K 0.12109 18.3275 0:18 | 71:19:08:00 0.00% |
| Aug 27 13:03:31 | M332220523 6000 0x553f2f945aac8b5f | 19208K 0.12109 18.3260 0:19 | 71:13:48:01 0.00% |
| Aug 27 13:03:49 | M332220523 7000 0x6f41d82484cef29c | 19208K 0.12109 18.3255 0:18 | 71:09:58:59 0.00% |
| Aug 27 13:04:08 | M332220523 8000 0x1fa05e9cbb4a8709 | 19208K 0.12109 18.3265 0:18 | 71:07:07:49 0.00% |
| Aug 27 13:04:26 | M332220523 9000 0x53b63e5e6e665bf6 | 19208K 0.12109 18.3228 0:19 | 71:04:52:21 0.00% |
[/CODE]
Titan Black had better timings in those ranges (12.77xx ms/iter, according to Robert_JD's benchmark). Guess non-Tesla Pascals can't shine everywhere at once, but it's too early to say without a different compute capability benchmark, namely Titan X (P).[/QUOTE]
Yes, it doesn't look as good as it did at the 4096K length. Anyway, thank you very much! |
1 Attachment(s)
[QUOTE=Karl M Johnson;440844]Hiya Lorenzo!
Here's the benchmark data:
[CODE]
Using threads: square 256, splice 256.
Starting M332220523 fft length = 19208K
| Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done |
| Aug 27 13:02:36 | M332220523 3000 0x3821367649db99f0 | 19208K 0.11328 18.3241 0:19 | 72:16:28:31 0.00% |
| Aug 27 13:02:54 | M332220523 4000 0xd257f742cbc095ef | 19208K 0.11328 18.3239 0:18 | 72:03:05:48 0.00% |
| Aug 27 13:03:13 | M332220523 5000 0xb6a474948e0d0da4 | 19208K 0.12109 18.3275 0:18 | 71:19:08:00 0.00% |
| Aug 27 13:03:31 | M332220523 6000 0x553f2f945aac8b5f | 19208K 0.12109 18.3260 0:19 | 71:13:48:01 0.00% |
| Aug 27 13:03:49 | M332220523 7000 0x6f41d82484cef29c | 19208K 0.12109 18.3255 0:18 | 71:09:58:59 0.00% |
| Aug 27 13:04:08 | M332220523 8000 0x1fa05e9cbb4a8709 | 19208K 0.12109 18.3265 0:18 | 71:07:07:49 0.00% |
| Aug 27 13:04:26 | M332220523 9000 0x53b63e5e6e665bf6 | 19208K 0.12109 18.3228 0:19 | 71:04:52:21 0.00% |
[/CODE]
Titan Black had better timings in those ranges (12.77xx ms/iter, according to Robert_JD's benchmark). Guess non-Tesla Pascals can't shine everywhere at once, but it's too early to say without a different compute capability benchmark, namely Titan X (P).[/QUOTE]
I benchmarked M332220523 out of curiosity, to find out how much faster or slower the Titan X Pascal is compared to the Titan Black Keplers. The latter are demonstrably faster. |
I can confirm that the binary [URL="http://mersenneforum.org/member.php?u=10788"]Robert_JD[/URL] kindly provided indeed works and gives correct results on stable hardware.
|
[QUOTE=Karl M Johnson;440961]I can confirm that the binary [URL="http://mersenneforum.org/member.php?u=10788"]Robert_JD[/URL] kindly provided indeed works and gives correct results on stable hardware.[/QUOTE]
Since I updated to the latest driver my results do seem to be consistently correct. Previously the errors were intermittent so it will take some time before I have confidence in this. This is on two different 1080 Founders edition systems. |
[QUOTE=airsquirrels;440983]Since I updated to the latest driver my results do seem to be consistently correct. Previously the errors were intermittent so it will take some time before I have confidence in this. This is on two different 1080 Founders edition systems.[/QUOTE]
Anybody tested mfaktc for the 980? :smile: |
(tangential to the current discussion)
I recently tried again to find settings that would work on my MSI 580. I got good -memtest results, but it never finished -r 1. I went as low as 775 MHz core (Factory OC 833), 1500 MHz RAM, with substantially increased Vcore. What I don't know is if the string of error 30 results which ended these attempts were error-driven, or if this was just another timeout situation. This card is still great with mfaktc, and can run at 900 MHz or higher doing that.
Meanwhile, the faithful Gigabyte 460 is OC to 833 MHz core, RAM clocked down to 1700. It has churned out a 40.9M DC every 3.5 days for weeks, with perhaps one mismatch, which I haven't checked back on. It never does the timeout thing. Is that only 580s?
I just ran -r 1 again, with the settings above. It lasted some minutes, but ended this way:
[QUOTE]Using threads: square 32, splice 256.
Starting self test M58404433 fft length = 3150K
Running careful round off test for 1000 iterations.
If average error > 0.25, or maximum error > 0.35, the test will restart with a longer FFT.
Iteration 100, average error = 0.16701, max error = 0.24219
Iteration 200, average error = 0.19105, max error = 0.25000
Iteration 300, average error = 0.19933, max error = 0.25781
Iteration 400, average error = 0.20379, max error = 0.25781
Iteration 500, average error = 0.20633, max error = 0.26563
Iteration 600, average error = 0.20793, max error = 0.25781
Iteration 700, average error = 0.20877, max error = 0.25781
CUDALucas.cu(1878) : cudaSafeCall() Runtime API error 30: unknown error.
CUDALucas.cu(1532) : cudaSafeCall() Runtime API error 30: unknown error.
E:\CUDA\CUDALucas 2.05.1\CUDALucas 2.05.1>[/QUOTE]
I also got the "driver has stopped working and recovered" message. I wish I could get this card to do CuLu, but previous experience has not been good, with proven bad results on some runs. |
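As a quick sanity check on the self-test output quoted above, here is a small sketch that applies the thresholds stated in the log (average error > 0.25 or maximum error > 0.35) to each reported line. It is a literal reading of that message, not CUDALucas's actual logic, and the name `roundoff_violations` is hypothetical.

```python
import re

# Matches lines like "Iteration 100, average error = 0.16701, max error = 0.24219".
LINE = re.compile(
    r"Iteration (\d+), average error = ([0-9.]+), max error = ([0-9.]+)"
)


def roundoff_violations(log_text, avg_limit=0.25, max_limit=0.35):
    """Return (iteration, avg, max) for every reported line that exceeds a limit."""
    bad = []
    for m in LINE.finditer(log_text):
        iteration = int(m.group(1))
        avg_err = float(m.group(2))
        max_err = float(m.group(3))
        if avg_err > avg_limit or max_err > max_limit:
            bad.append((iteration, avg_err, max_err))
    return bad


# The round-off lines from the quoted self test.
SELF_TEST_LOG = """
Iteration 100, average error = 0.16701, max error = 0.24219
Iteration 200, average error = 0.19105, max error = 0.25000
Iteration 300, average error = 0.19933, max error = 0.25781
Iteration 400, average error = 0.20379, max error = 0.25781
Iteration 500, average error = 0.20633, max error = 0.26563
Iteration 600, average error = 0.20793, max error = 0.25781
Iteration 700, average error = 0.20877, max error = 0.25781
"""

print(roundoff_violations(SELF_TEST_LOG))
```

None of the seven reported lines crosses either limit, so on this reading the run was within round-off tolerance right up to the point where error 30 killed it.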