#2289
"James Heinrich"
May 2004
ex-Northern Ontario
3421₁₀ Posts
At the risk of cross-posting, I'll repeat my request here, since it's relevant to discussions in this thread.
It has been brought to my attention that my CUDALucas throughput page is actually quite inaccurate, which leads to inaccurate crossover-point estimates. Could everyone please run a quick benchmark for me? I'd like to validate (and update) the lookup table I use. Please run this simple benchmark on as many different GPUs as you have available and email the results to james@mersenne.ca (or PM me here if you prefer).
Code:
CUDALucas -info -cufftbench 1048576 8388608 1048576
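Since the results will arrive as pasted console output, here is a minimal sketch that pulls the size/time pairs out of `-cufftbench` output into a lookup table. The regular expression assumes the `CUFFT_Z2Z size= ... time= ... msec` line format shown in the GTX 580 results posted in this thread; `parse_cufftbench` is a hypothetical helper, not part of CUDALucas.

```python
import re

# Matches result lines such as "CUFFT_Z2Z size= 1048576 time= 0.508452 msec"
# (format taken from the -cufftbench output posted in this thread).
BENCH_LINE = re.compile(r"CUFFT_Z2Z size=\s*(\d+)\s+time=\s*([\d.]+)\s*msec")


def parse_cufftbench(text: str) -> dict[int, float]:
    """Return {FFT size: time in msec} for every CUFFT_Z2Z result found in text."""
    return {int(size): float(ms) for size, ms in BENCH_LINE.findall(text)}


if __name__ == "__main__":
    sample = """
    CUFFT_Z2Z size= 1048576 time= 0.508452 msec
    CUFFT_Z2Z size= 2097152 time= 1.031502 msec
    """
    for size, ms in sorted(parse_cufftbench(sample).items()):
        print(f"{size // 1024}K  {ms:.6f} ms")
```

Because the pattern does not depend on line breaks, it also works on output that was pasted as one long line.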
#2290
"James Heinrich"
May 2004
ex-Northern Ontario
11×311 Posts
Also, please run the benchmark with CUDALucas v2.04 if possible.
#2291
Romulan Interpreter
Jun 2011
Thailand
10010110110101₂ Posts
Quote:
In fact, what you really need there is a column with the clock speed at which the numbers were taken, since the results differ at different clock speeds.
Last fiddled with by LaurV on 2013-05-22 at 04:56
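A minimal sketch of that suggestion: store the reported clock alongside each timing so that results taken at different clocks can be compared. It assumes, purely for illustration, that FFT time scales as 1/clock; `BenchResult` and `normalize_to_clock` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class BenchResult:
    gpu: str
    clock_mhz: float   # "clockRate (MHz)" as printed by CUDALucas -info
    fft_size: int
    time_ms: float


def normalize_to_clock(r: BenchResult, ref_clock_mhz: float) -> float:
    """Estimate r.time_ms at ref_clock_mhz.

    ASSUMPTION: time scales as 1/clock; anything that does not follow this
    clock (e.g. memory bandwidth) is ignored.
    """
    return r.time_ms * r.clock_mhz / ref_clock_mhz


# Example: a GTX 580 result taken at 1646 MHz, rescaled to 1564 MHz.
r = BenchResult("GeForce GTX 580", 1646, 1048576, 0.508452)
print(f"{normalize_to_clock(r, 1564):.4f} ms")
```

Rescaling the 1646 MHz, 1M-point time this way gives about 0.5351 ms, close to the 0.535127 ms LaurV measured at 1564 MHz in his benchmark post further down the thread.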
#2292
"James Heinrich"
May 2004
ex-Northern Ontario
11·311 Posts
Quote:
I would appreciate your benchmark results (for as many different GPU families as you have), since the data from my own GTX 570 (and other users' results) show significant deviation from my posted benchmark data.
#2293
Romulan Interpreter
Jun 2011
Thailand
10010110110101₂ Posts
OK then...
Code:
e:\CudaLucas\CL1>cl204b4020x64 -d 1 -info -cufftbench 1048576 8388608 1048576
------- DEVICE 1 -------
name GeForce GTX 580
totalGlobalMem 1610416128
sharedMemPerBlock 49152
regsPerBlock 32768
warpSize 32
memPitch 2147483647
maxThreadsPerBlock 1024
maxThreadsDim[3] 1024,1024,64
maxGridSize[3] 65535,65535,65535
totalConstMem 65536
Compatibility 2.0
clockRate (MHz) 1646
textureAlignment 512
deviceOverlap 1
multiProcessorCount 16
CUFFT bench start = 1048576 end = 8388608 distance = 1048576
CUFFT_Z2Z size= 1048576 time= 0.508452 msec
CUFFT_Z2Z size= 2097152 time= 1.031502 msec
CUFFT_Z2Z size= 3145728 time= 1.925885 msec
CUFFT_Z2Z size= 4194304 time= 2.622479 msec
CUFFT_Z2Z size= 5242880 time= 3.118799 msec
CUFFT_Z2Z size= 6291456 time= 3.944277 msec
CUFFT_Z2Z size= 7340032 time= 4.284509 msec
CUFFT_Z2Z size= 8388608 time= 5.400013 msec
e:\CudaLucas\CL1>cl204b4020x64 -d 1 -info -cufftbench 1048576 8388608 1048576
------- DEVICE 1 -------
name GeForce GTX 580
totalGlobalMem 1610416128
sharedMemPerBlock 49152
regsPerBlock 32768
warpSize 32
memPitch 2147483647
maxThreadsPerBlock 1024
maxThreadsDim[3] 1024,1024,64
maxGridSize[3] 65535,65535,65535
totalConstMem 65536
Compatibility 2.0
clockRate (MHz) 1564
textureAlignment 512
deviceOverlap 1
multiProcessorCount 16
CUFFT bench start = 1048576 end = 8388608 distance = 1048576
CUFFT_Z2Z size= 1048576 time= 0.535127 msec
CUFFT_Z2Z size= 2097152 time= 1.085725 msec
CUFFT_Z2Z size= 3145728 time= 2.021942 msec
CUFFT_Z2Z size= 4194304 time= 2.746189 msec
CUFFT_Z2Z size= 5242880 time= 3.256758 msec
CUFFT_Z2Z size= 6291456 time= 4.151508 msec
CUFFT_Z2Z size= 7340032 time= 4.532980 msec
CUFFT_Z2Z size= 8388608 time= 5.721727 msec
e:\CudaLucas\CL1>
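These two runs make it possible to check the clock-speed point directly. If FFT time scaled exactly as 1/clock, every per-size time ratio between the runs would equal 1646/1564 ≈ 1.0524. The sketch below (timings copied from the output above; `slowdown_ratios` is a hypothetical helper) computes the ratios. They fall between roughly 1.044 and 1.060, i.e. within about 1% of the clock ratio, so linear clock scaling looks like a reasonable first approximation on this card, though one pair of runs is not much to go on.

```python
# Timings in msec copied from the two GTX 580 runs above (same card, same build).
SIZES = [1048576, 2097152, 3145728, 4194304, 5242880, 6291456, 7340032, 8388608]
T_1646 = [0.508452, 1.031502, 1.925885, 2.622479, 3.118799, 3.944277, 4.284509, 5.400013]
T_1564 = [0.535127, 1.085725, 2.021942, 2.746189, 3.256758, 4.151508, 4.532980, 5.721727]


def slowdown_ratios(fast, slow):
    """Per-size ratio of the lower-clock time to the higher-clock time."""
    return [s / f for f, s in zip(fast, slow)]


clock_ratio = 1646 / 1564
for size, ratio in zip(SIZES, slowdown_ratios(T_1646, T_1564)):
    print(f"{size // 1024:5d}K  time ratio {ratio:.4f}  vs clock ratio {clock_ratio:.4f}")
```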
#2294
"James Heinrich"
May 2004
ex-Northern Ontario
6535₈ Posts
Thanks. I can no longer edit my post, but my benchmark request now includes a request for the first 20000 iterations of
Code:
CUDALucas 57885161
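The ms/iter figure from such a run converts directly into an estimate of how long a full test takes. A Lucas-Lehmer test of M(p) needs p - 2 squaring iterations, so the remaining time is roughly (p - 2 - iteration) × ms/iter. This is a minimal sketch; `eta_seconds` and `hms` are hypothetical helpers, and CUDALucas's own ETA may be computed slightly differently. For the GTX 580 figure posted later in this thread (5.6789 ms/iter at iteration 30000) it gives 91:15:54, about half a minute off the 91:15:23 that CUDALucas printed.

```python
def eta_seconds(p: int, iteration: int, ms_per_iter: float) -> float:
    """Rough time remaining for a Lucas-Lehmer test of M(p).

    An LL test of M(p) takes p - 2 squaring iterations. CUDALucas's own ETA
    may be computed slightly differently, so treat this as an estimate.
    """
    return (p - 2 - iteration) * ms_per_iter / 1000.0


def hms(seconds: float) -> str:
    """Format seconds as H:MM:SS (hours are not wrapped into days)."""
    s = round(seconds)
    return f"{s // 3600}:{s % 3600 // 60:02d}:{s % 60:02d}"


# 5.6789 ms/iter at iteration 30000 is taken from the GTX 580 log later in this thread.
print(hms(eta_seconds(57885161, 30000, 5.6789)))  # prints 91:15:54
```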
#2295
"Mr. Meeseeks"
Jan 2012
California, USA
2³·271 Posts
It would be nice if there were a graph for DC and LL, along with P-1, in the monthly and weekly graphs, etc.
|
#2296
If I May
"Chris Halsall"
Sep 2002
Barbados
10011000100111₂ Posts
Quote:
The LL and DC work types were a bit of an afterthought; I hadn't even realized that graphing them might be interesting. I've just taken delivery of seven new servers I need to configure for a client, so this new graph won't be ready for a couple of weeks. But please consider it on my "ToDo" list.
#2297
Romulan Interpreter
Jun 2011
Thailand
7²·197 Posts
Quote:
, then that "polite" switch was wrong, then I had to delete the checkpoints between runs, etc., hehe... well... I am aging... (I did not want to change my ini files, so I gave command-line parameters.) Therefore the rows with "iteration 30k" and "40k" (the last two rows of each test) contain the correct timing, because the "20k" row was partially run with the wrong "polite" setting, until I changed it. The last test is a bit of FFT "tuning": the program selects quite a bad FFT length for this exponent. On the GTX 580, 3136K is much faster than 3200K (even faster than 3072K, no joke!)
Code:
>cl204b4020x64 -info -d 1 -c 10000 57885161
------- DEVICE 1 -------
name GeForce GTX 580
totalGlobalMem 1610416128
sharedMemPerBlock 49152
regsPerBlock 32768
warpSize 32
memPitch 2147483647
maxThreadsPerBlock 1024
maxThreadsDim[3] 1024,1024,64
maxGridSize[3] 65535,65535,65535
totalConstMem 65536
Compatibility 2.0
clockRate (MHz) 1564 <<<this is the default, the card is factory OC to 782MHz by Asus
textureAlignment 512
deviceOverlap 1
multiProcessorCount 16
mkdir: cannot create directory `backup1': File exists
Starting M57885161 fft length = 2880K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.50000 >= 0.35, increasing n from 2880K
Starting M57885161 fft length = 3072K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 80 < 1000 && err = 0.35156 >= 0.35, increasing n from 3072K
Starting M57885161 fft length = 3200K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration 100, average error = 0.08149, max error = 0.11719
Iteration 200, average error = 0.09383, max error = 0.11719
Iteration 300, average error = 0.09701, max error = 0.11719
Iteration 400, average error = 0.09848, max error = 0.10938
Iteration 500, average error = 0.09981, max error = 0.11719
Iteration 600, average error = 0.10009, max error = 0.11719
Iteration 700, average error = 0.10052, max error = 0.10938
Iteration 800, average error = 0.10090, max error = 0.11328
Iteration 900, average error = 0.10110, max error = 0.10938
Iteration 1000, average error = 0.10130 < 0.25 (max error = 0.12500), continuing test.
Iteration 10000 M( 57885161 )C, 0x76c27556683cd84d, n = 3200K, CUDALucas v2.04 Beta err = 0.1250 (1:05 real, 6.5045 ms/iter, ETA 104:33:36)
p
-polite 0
Iteration 20000 M( 57885161 )C, 0xfd8e311d20ffe6ab, n = 3200K, CUDALucas v2.04 Beta err = 0.1328 (0:58 real, 5.7869 ms/iter, ETA 93:00:29)
Iteration 30000 M( 57885161 )C, 0xce0d85ab0065a232, n = 3200K, CUDALucas v2.04 Beta err = 0.1289 (0:57 real, 5.6789 ms/iter, ETA 91:15:23)
Iteration 40000 M( 57885161 )C, 0x6746379dfc966410, n = 3200K, CUDALucas v2.04 Beta err = 0.1328 (0:57 real, 5.6825 ms/iter, ETA 91:17:54)
SIGINT caught, writing checkpoint. Estimated time spent so far: 4:32
>cl204b4020x64 -info -d 1 -c 10000 57885161
------- DEVICE 1 -------
<... snip values same as above test...>
clockRate (MHz) 1646
<... snip values same as above test...>
Iteration 900, average error = 0.10110, max error = 0.10938
Iteration 1000, average error = 0.10130 < 0.25 (max error = 0.12500), continuing test.
p
-polite 0
Iteration 10000 M( 57885161 )C, 0x76c27556683cd84d, n = 3200K, CUDALucas v2.04 Beta err = 0.1250 (0:56 real, 5.6304 ms/iter, ETA 90:30:33)
Iteration 20000 M( 57885161 )C, 0xfd8e311d20ffe6ab, n = 3200K, CUDALucas v2.04 Beta err = 0.1328 (0:54 real, 5.3918 ms/iter, ETA 86:39:27)
Iteration 30000 M( 57885161 )C, 0xce0d85ab0065a232, n = 3200K, CUDALucas v2.04 Beta err = 0.1289 (0:54 real, 5.3914 ms/iter, ETA 86:38:10)
Iteration 40000 M( 57885161 )C, 0x6746379dfc966410, n = 3200K, CUDALucas v2.04 Beta err = 0.1328 (0:54 real, 5.3881 ms/iter, ETA 86:34:10)
SIGINT caught, writing checkpoint. Estimated time spent so far: 3:44
>cl204b4020x64 -info -d 1 -c 10000 -f 3136k 57885161
<... snip values same as above test...>
clockRate (MHz) 1646
<... snip values same as above test...>
Starting M57885161 fft length = 3136K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration 100, average error = 0.15533, max error = 0.22656
Iteration 200, average error = 0.18332, max error = 0.23438
Iteration 300, average error = 0.19044, max error = 0.21875
Iteration 400, average error = 0.19509, max error = 0.22803
Iteration 500, average error = 0.19776, max error = 0.23438
Iteration 600, average error = 0.19979, max error = 0.23438
Iteration 700, average error = 0.20043, max error = 0.23438
Iteration 800, average error = 0.20119, max error = 0.22461
Iteration 900, average error = 0.20133, max error = 0.22656
Iteration 1000, average error = 0.20198 < 0.25 (max error = 0.21875), continuing test.
p
-polite 0
Iteration 10000 M( 57885161 )C, 0x76c27556683cd84d, n = 3136K, CUDALucas v2.04 Beta err = 0.2578 (0:55 real, 5.4927 ms/iter, ETA 88:17:43)
t
disabling -t <<<<grrr! I forgot this, I always keep it enabled... I don't have time now to repeat the tests, sorry!
Iteration 20000 M( 57885161 )C, 0xfd8e311d20ffe6ab, n = 3136K, CUDALucas v2.04 Beta err = 0.2539 (0:50 real, 5.0379 ms/iter, ETA 80:58:12)
Iteration 30000 M( 57885161 )C, 0xce0d85ab0065a232, n = 3136K, CUDALucas v2.04 Beta err = 0.2344 (0:50 real, 4.9700 ms/iter, ETA 79:51:55)
Iteration 40000 M( 57885161 )C, 0x6746379dfc966410, n = 3136K, CUDALucas v2.04 Beta err = 0.2178 (0:50 real, 4.9710 ms/iter, ETA 79:52:01)
SIGINT caught, writing checkpoint. Estimated time spent so far: 3:32
>
Last fiddled with by LaurV on 2013-06-01 at 05:56
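The log above suggests how CUDALucas v2.04 picks an FFT length on its own: during the 1000-iteration "careful round off test", any iteration with error >= 0.35 moves it to the next larger length, and an average error >= 0.25 at the end also forces a larger length. The sketch below reconstructs that selection logic only from the printed messages, not from CUDALucas's source; `pick_fft_length`, `fake_roundoff` and the candidate list are hypothetical. The log shows the automatic search stepping 2880K → 3072K → 3200K, so 3136K was never tried until LaurV forced it with `-f 3136k`.

```python
from typing import Callable, Sequence

MAX_ERR = 0.35   # per-iteration limit seen in the log ("err ... >= 0.35")
AVG_ERR = 0.25   # average-error limit stated in the log
TEST_ITERS = 1000


def pick_fft_length(lengths_k: Sequence[int],
                    roundoff: Callable[[int, int], float]) -> int:
    """Return the first FFT length (in K) that passes a careful round-off test.

    roundoff(length_k, iteration) is a HYPOTHETICAL stand-in for running one
    LL iteration at that FFT length and returning its round-off error.
    """
    for n in lengths_k:
        total = 0.0
        for i in range(1, TEST_ITERS + 1):
            err = roundoff(n, i)
            if err >= MAX_ERR:
                break            # too much error: move on to the next larger length
            total += err
        else:
            # Loop finished without a break: check the average error.
            if total / TEST_ITERS < AVG_ERR:
                return n
    raise ValueError("no FFT length passed the round-off test")


def fake_roundoff(n: int, i: int) -> float:
    """Made-up errors that mimic the M57885161 log above."""
    if n == 2880 and i == 32:
        return 0.5
    if n == 3072 and i == 80:
        return 0.35156
    return 0.10


print(pick_fft_length([2880, 3072, 3200], fake_roundoff))  # prints 3200
```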
#2298
May 2013
East. Always East.
11×157 Posts
Quote:
Does this mean that an exponent is trial factored to some bit level (I saw a table somewhere a while ago detailing how far to go for a range of exponents), then P-1'ed, and only THEN released for LL testing? Part of me feels like not enough P-1 gets done to keep up with all the LL testing. EDIT: Or does "What Makes Sense" take care of it if it becomes an issue? I don't think I've ever seen "What makes sense" in Prime95 generate anything other than LL tests.
Last fiddled with by TheMawn on 2013-06-14 at 01:47
#2299
If I May
"Chris Halsall"
Sep 2002
Barbados
2627₁₆ Posts
Quote:
Quote:
We're "riding the wave"....