Old 2016-08-27, 05:52   #2509
Karl M Johnson
 
Mar 2010
19B₁₆ Posts

It would seem that despite the severely limited fp64 performance of high-end Pascal GPUs like the 1080 and Titan X, they're a match for the Kepler Titans after all.
I suspect it has something to do with higher memory clocks and bandwidth, a two-generation gap, and a smaller fabrication process.
Of course, the GP100 PCI-E Tesla GPU, which is scheduled for release in Q4 2016, should be even better at double-precision floating-point calculations, given that its fp64 performance equals 1/2 of fp32, but it will be pricey as usual.
Old 2016-08-27, 08:27   #2510
Lorenzo
 
Aug 2010
Republic of Belarus
2×89 Posts

Quote:
Originally Posted by Karl M Johnson
Cheers!

Some benchmark data with the new version below.
It looks consistent with airsquirrels' timings (3.65ms/iter on same GPU and Mersenne candidate), but is a bit faster due to factory overclocking.

Not quite sure why, but the drivers do not select the max clock profile for the memory, so I had to manually overclock it to default value (5.006GHz)

Robert_JD, would you kindly benchmark your Titan X on M76042667 with 4096K FFT size, for comparison?

Code:
|  Aug 27  06:44:21  |  M76042667     34000  0x166818e8f1bcad6d  |  4096K  0.21875   3.3098    6.61s  |   2:23:46:54   0.04%  |
|  Aug 27  06:44:28  |  M76042667     36000  0x41398e04d2474533  |  4096K  0.21875   3.3072    6.61s  |   2:23:40:16   0.04%  |
|   Date     Time    |   Test Num     Iter        Residue        |    FFT   Error     ms/It     Time  |       ETA      Done   |
|  Aug 27  06:44:34  |  M76042667     38000  0x6c6d2d1ec54c8443  |  4096K  0.21875   3.3109    6.62s  |   2:23:34:35   0.04%  |
|  Aug 27  06:44:41  |  M76042667     40000  0xa1b6980c147b1945  |  4096K  0.21875   3.3098    6.61s  |   2:23:29:23   0.05%  |
|  Aug 27  06:44:47  |  M76042667     42000  0x3952b7b74e553555  |  4096K  0.21875   3.3079    6.61s  |   2:23:24:33   0.05%  |
Hello, Karl!) Wow, nice result! Could you please run a benchmark for M332220523 (http://www.mersenneforum.org/showthread.php?t=13185)?
Old 2016-08-27, 10:09   #2511
Karl M Johnson
 
Mar 2010
3·137 Posts

Hiya Lorenzo!

Here's the benchmark data:
Code:
Using threads: square 256, splice 256.
Starting M332220523 fft length = 19208K
|   Date     Time    |   Test Num     Iter        Residue        |    FFT   Error     ms/It            Time  |       ETA      Done   |
|  Aug 27  13:02:36  | M332220523      3000  0x3821367649db99f0  | 19208K  0.11328  18.3241         0:19  |  72:16:28:31   0.00%  |
|  Aug 27  13:02:54  | M332220523      4000  0xd257f742cbc095ef  | 19208K  0.11328  18.3239         0:18  |  72:03:05:48   0.00%  |
|  Aug 27  13:03:13  | M332220523      5000  0xb6a474948e0d0da4  | 19208K  0.12109  18.3275         0:18  |  71:19:08:00   0.00%  |
|  Aug 27  13:03:31  | M332220523      6000  0x553f2f945aac8b5f  | 19208K  0.12109  18.3260         0:19  |  71:13:48:01   0.00%  |
|  Aug 27  13:03:49  | M332220523      7000  0x6f41d82484cef29c  | 19208K  0.12109  18.3255         0:18  |  71:09:58:59   0.00%  |
|  Aug 27  13:04:08  | M332220523      8000  0x1fa05e9cbb4a8709  | 19208K  0.12109  18.3265         0:18  |  71:07:07:49   0.00%  |
|  Aug 27  13:04:26  | M332220523      9000  0x53b63e5e6e665bf6  | 19208K  0.12109  18.3228         0:19  |  71:04:52:21   0.00%  |
Titan Black had better timings in those ranges (12.77xx ms/iter, according to Robert_JD's benchmark).
Guess non-Tesla Pascals can't shine everywhere at once, but it's too early to say without a different compute capability benchmark, namely Titan X (P).
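For a rough sense of scale, the two per-iteration timings mentioned in this post can be turned into total run times. This is only a back-of-the-envelope sketch in Python: it assumes a Lucas-Lehmer test of M(p) takes about p iterations and that ms/iter stays constant for the whole run, and the function name is made up for illustration.

```python
def ll_days(exponent: int, ms_per_iter: float) -> float:
    """Estimated wall-clock days for ~exponent iterations at a constant rate."""
    seconds = exponent * ms_per_iter / 1000.0
    return seconds / 86400.0


P = 332220523
pascal_days = ll_days(P, 18.32)       # roughly the ms/It column in the log above
titan_black_days = ll_days(P, 12.77)  # "12.77xx ms/iter" per Robert_JD's benchmark

print(f"Pascal card: {pascal_days:.1f} days")
print(f"Titan Black: {titan_black_days:.1f} days")
print(f"Slowdown:    {18.32 / 12.77:.2f}x")
```

The Pascal estimate comes out close to the ETA column in the log (about 71 days), so the constant-rate assumption looks reasonable for this run.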
Old 2016-08-27, 12:09   #2512
Robert_JD
 
Sep 2010
So Cal
2·5² Posts

Quote:
Originally Posted by Karl M Johnson
Cheers!

Some benchmark data with the new version below.
It looks consistent with airsquirrels' timings (3.65ms/iter on same GPU and Mersenne candidate), but is a bit faster due to factory overclocking.

Not quite sure why, but the drivers do not select the max clock profile for the memory, so I had to manually overclock it to default value (5.006GHz)

Robert_JD, would you kindly benchmark your Titan X on M76042667 with 4096K FFT size, for comparison?

Here are the DLLs, if anyone else wants to give this new binary a go: http://www108.zippyshare.com/v/Wg0AFopm/file.html



Code:
|  Aug 27  06:44:21  |  M76042667     34000  0x166818e8f1bcad6d  |  4096K  0.21875   3.3098    6.61s  |   2:23:46:54   0.04%  |
|  Aug 27  06:44:28  |  M76042667     36000  0x41398e04d2474533  |  4096K  0.21875   3.3072    6.61s  |   2:23:40:16   0.04%  |
|   Date     Time    |   Test Num     Iter        Residue        |    FFT   Error     ms/It     Time  |       ETA      Done   |
|  Aug 27  06:44:34  |  M76042667     38000  0x6c6d2d1ec54c8443  |  4096K  0.21875   3.3109    6.62s  |   2:23:34:35   0.04%  |
|  Aug 27  06:44:41  |  M76042667     40000  0xa1b6980c147b1945  |  4096K  0.21875   3.3098    6.61s  |   2:23:29:23   0.05%  |
|  Aug 27  06:44:47  |  M76042667     42000  0x3952b7b74e553555  |  4096K  0.21875   3.3079    6.61s  |   2:23:24:33   0.05%  |
Here are a pair of benchmarks of M76042667 that you requested. The first was run at the default memory clock of 9028, and the second was slightly overclocked via NVIDIA Inspector to approximately 9500. Notice that the residues match. I tried to overclock to an even 10000, but unfortunately all I got were ZERO or near-ZERO residues.
Attached Files
File Type: txt M76042667 BenchMark.txt (3.2 KB, 64 views)
File Type: txt M76042667 BenchMark.txt_#2.txt (3.2 KB, 100 views)
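The residue check described in this post can also be done mechanically. The sketch below is an assumption-laden illustration in Python, not part of CUDALucas: it parses rows in the table layout shown in the quoted log and reports checkpoints where two runs disagree. The function names are hypothetical, and the "overclocked" sample row with a zero residue is invented to mimic the failure described above.

```python
import re

# Matches data rows such as:
# |  Aug 27  06:44:21  |  M76042667     34000  0x166818e8f1bcad6d  |  4096K ...
ROW = re.compile(r"\|\s*M(\d+)\s+(\d+)\s+(0x[0-9a-fA-F]{16})\s*\|")


def residues(log_text: str) -> dict:
    """Map (exponent, iteration) -> residue for every data row in a log."""
    out = {}
    for line in log_text.splitlines():
        m = ROW.search(line)
        if m:
            out[(int(m.group(1)), int(m.group(2)))] = m.group(3).lower()
    return out


def mismatches(log_a: str, log_b: str) -> list:
    """Checkpoints present in both logs whose residues differ."""
    a, b = residues(log_a), residues(log_b)
    return sorted(k for k in a.keys() & b.keys() if a[k] != b[k])


stock = """
|  Aug 27  06:44:21  |  M76042667     34000  0x166818e8f1bcad6d  |  4096K  0.21875   3.3098    6.61s  |   2:23:46:54   0.04%  |
|  Aug 27  06:44:28  |  M76042667     36000  0x41398e04d2474533  |  4096K  0.21875   3.3072    6.61s  |   2:23:40:16   0.04%  |
"""
# Invented second run: the 36000 checkpoint is replaced by a zero residue.
overclocked = stock.replace("0x41398e04d2474533", "0x0000000000000000")

print(mismatches(stock, stock))        # no disagreements
print(mismatches(stock, overclocked))  # the corrupted checkpoint
```

Matching residues at the same iteration only show that two runs agree with each other; a checkpoint is only as trustworthy as the more stable of the two runs.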
Old 2016-08-27, 14:21   #2513
Karl M Johnson
 
Mar 2010
633₈ Posts

2.516 ms/iter for M76 makes the Titan X faster than the Titan Black.
And since the residues match, well, congrats, money well spent!
If only its fp64 performance were 1/2 of fp32 instead of 1/32.
Old 2016-08-27, 21:10   #2514
Lorenzo
 
Aug 2010
Republic of Belarus
2×89 Posts

Quote:
Originally Posted by Karl M Johnson
Hiya Lorenzo!

Here's the benchmark data:
Code:
Using threads: square 256, splice 256.
Starting M332220523 fft length = 19208K
|   Date     Time    |   Test Num     Iter        Residue        |    FFT   Error     ms/It            Time  |       ETA      Done   |
|  Aug 27  13:02:36  | M332220523      3000  0x3821367649db99f0  | 19208K  0.11328  18.3241         0:19  |  72:16:28:31   0.00%  |
|  Aug 27  13:02:54  | M332220523      4000  0xd257f742cbc095ef  | 19208K  0.11328  18.3239         0:18  |  72:03:05:48   0.00%  |
|  Aug 27  13:03:13  | M332220523      5000  0xb6a474948e0d0da4  | 19208K  0.12109  18.3275         0:18  |  71:19:08:00   0.00%  |
|  Aug 27  13:03:31  | M332220523      6000  0x553f2f945aac8b5f  | 19208K  0.12109  18.3260         0:19  |  71:13:48:01   0.00%  |
|  Aug 27  13:03:49  | M332220523      7000  0x6f41d82484cef29c  | 19208K  0.12109  18.3255         0:18  |  71:09:58:59   0.00%  |
|  Aug 27  13:04:08  | M332220523      8000  0x1fa05e9cbb4a8709  | 19208K  0.12109  18.3265         0:18  |  71:07:07:49   0.00%  |
|  Aug 27  13:04:26  | M332220523      9000  0x53b63e5e6e665bf6  | 19208K  0.12109  18.3228         0:19  |  71:04:52:21   0.00%  |
Titan Black had better timings in those ranges (12.77xx ms/iter, according to Robert_JD's benchmark).
Guess non-Tesla Pascals can't shine everywhere at once, but it's too early to say without a different compute capability benchmark, namely Titan X (P).
Yes, it doesn't look as good as with the 4096K length. Anyway, thank you very much!)
Old 2016-08-28, 02:13   #2515
Robert_JD
 
Sep 2010
So Cal
50₁₀ Posts

Quote:
Originally Posted by Karl M Johnson
Hiya Lorenzo!

Here's the benchmark data:
Code:
Using threads: square 256, splice 256.
Starting M332220523 fft length = 19208K
|   Date     Time    |   Test Num     Iter        Residue        |    FFT   Error     ms/It            Time  |       ETA      Done   |
|  Aug 27  13:02:36  | M332220523      3000  0x3821367649db99f0  | 19208K  0.11328  18.3241         0:19  |  72:16:28:31   0.00%  |
|  Aug 27  13:02:54  | M332220523      4000  0xd257f742cbc095ef  | 19208K  0.11328  18.3239         0:18  |  72:03:05:48   0.00%  |
|  Aug 27  13:03:13  | M332220523      5000  0xb6a474948e0d0da4  | 19208K  0.12109  18.3275         0:18  |  71:19:08:00   0.00%  |
|  Aug 27  13:03:31  | M332220523      6000  0x553f2f945aac8b5f  | 19208K  0.12109  18.3260         0:19  |  71:13:48:01   0.00%  |
|  Aug 27  13:03:49  | M332220523      7000  0x6f41d82484cef29c  | 19208K  0.12109  18.3255         0:18  |  71:09:58:59   0.00%  |
|  Aug 27  13:04:08  | M332220523      8000  0x1fa05e9cbb4a8709  | 19208K  0.12109  18.3265         0:18  |  71:07:07:49   0.00%  |
|  Aug 27  13:04:26  | M332220523      9000  0x53b63e5e6e665bf6  | 19208K  0.12109  18.3228         0:19  |  71:04:52:21   0.00%  |
Titan Black had better timings in those ranges (12.77xx ms/iter, according to Robert_JD's benchmark).
Guess non-Tesla Pascals can't shine everywhere at once, but it's too early to say without a different compute capability benchmark, namely Titan X (P).
I benchmarked M332220523 out of curiosity, to see how much faster or slower the Titan X Pascal is than the Titan Black Kepler. The latter is demonstrably faster.
Attached Files
File Type: txt M332220523_BenchMark.txt (948 Bytes, 77 views)
Old 2016-08-29, 11:39   #2516
Karl M Johnson
 
Mar 2010
3×137 Posts

I can confirm that the binary Robert_JD kindly provided indeed works and gives correct results on stable hardware.
Old 2016-08-29, 15:55   #2517
airsquirrels
 
"David"
Jul 2015
Ohio
1005₈ Posts

Quote:
Originally Posted by Karl M Johnson
I can confirm that the binary Robert_JD kindly provided indeed works and gives correct results on stable hardware.
Since I updated to the latest driver, my results do seem to be consistently correct. Previously the errors were intermittent, so it will take some time before I have confidence in this. This is on two different 1080 Founders Edition systems.
Old 2016-08-29, 16:50   #2518
ET_
Banned
 
"Luigi"
Aug 2002
Team Italia
3²·5·107 Posts

Quote:
Originally Posted by airsquirrels
Since I updated to the latest driver my results do seem to be consistently correct. Previously the errors were intermittent so it will take some time before I have confidence in this. This is on two different 1080 Founders edition systems.
Anybody tested mfaktc for the 980? :)
Old 2016-08-31, 19:44   #2519
kladner
 
"Kieren"
Jul 2011
In My Own Galaxy!
2·3·1,693 Posts

(tangential to the current discussion)
I recently tried again to find settings that would work on my MSI 580. I got good -memtest results, but it never finished -r 1. I went as low as 775 MHz core (factory OC is 833) and 1500 MHz RAM, with substantially increased Vcore. What I don't know is whether the string of error 30 results that ended these attempts was error-driven, or whether this was just another timeout situation. This card is still great with mfaktc, and can run at 900 MHz or higher doing that.

Meanwhile, the faithful Gigabyte 460 is overclocked to 833 MHz core, with RAM clocked down to 1700. It has churned out a 40.9M DC every 3.5 days for weeks, with perhaps one mismatch, which I haven't checked back on. It never does the timeout thing. Is that only 580s?

I just ran -r 1 again, with the settings above. It lasted some minutes, but ended this way:
Quote:
Using threads: square 32, splice 256.
Starting self test M58404433 fft length = 3150K
Running careful round off test for 1000 iterations.
If average error > 0.25, or maximum error > 0.35,
the test will restart with a longer FFT.
Iteration 100, average error = 0.16701, max error = 0.24219
Iteration 200, average error = 0.19105, max error = 0.25000
Iteration 300, average error = 0.19933, max error = 0.25781
Iteration 400, average error = 0.20379, max error = 0.25781
Iteration 500, average error = 0.20633, max error = 0.26563
Iteration 600, average error = 0.20793, max error = 0.25781
Iteration 700, average error = 0.20877, max error = 0.25781
CUDALucas.cu(1878) : cudaSafeCall() Runtime API error 30: unknown error.
CUDALucas.cu(1532) : cudaSafeCall() Runtime API error 30: unknown error.

E:\CUDA\CUDALucas 2.05.1\CUDALucas 2.05.1>
I also got the "driver has stopped working and recovered" message.
I wish I could get this card to do CuLu, but previous experience has not been good, with proven bad results on some runs.
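The self-test banner quoted above states its own rule (average error > 0.25 or maximum error > 0.35 triggers a restart with a longer FFT), so the logged lines can be checked against it. The Python sketch below is a rough reading of that rule, not CUDALucas code: it assumes the average is the running value on the last reported line and that the maximum is the largest value reported on any line. The function name and the returned keys are made up.

```python
import re

ITER = re.compile(
    r"Iteration (\d+), average error = ([0-9.]+), max error = ([0-9.]+)"
)


def roundoff_summary(log_text, avg_limit=0.25, max_limit=0.35):
    """Summarise round-off lines against the thresholds quoted in the banner."""
    rows = [(int(i), float(a), float(m)) for i, a, m in ITER.findall(log_text)]
    if not rows:
        return None
    last_iteration, last_avg, _ = rows[-1]
    worst_max = max(m for _, _, m in rows)
    return {
        "last_iteration": last_iteration,
        "last_avg": last_avg,
        "worst_max": worst_max,
        "needs_longer_fft": last_avg > avg_limit or worst_max > max_limit,
    }


sample = """
Iteration 100, average error = 0.16701, max error = 0.24219
Iteration 200, average error = 0.19105, max error = 0.25000
Iteration 300, average error = 0.19933, max error = 0.25781
Iteration 400, average error = 0.20379, max error = 0.25781
Iteration 500, average error = 0.20633, max error = 0.26563
Iteration 600, average error = 0.20793, max error = 0.25781
Iteration 700, average error = 0.20877, max error = 0.25781
"""
print(roundoff_summary(sample))
```

On the seven lines from the log, neither limit is exceeded (the largest reported max error is 0.26563), which suggests, without proving, that the error 30 abort was not a round-off restart.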

Last fiddled with by kladner on 2016-08-31 at 19:45
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.