#1893 | ||
|
"Svein Johansen"
May 2013
Norway
11001001₂ Posts |
Quote:
I'll make a Dropbox link to the latest build once I have 2.03 built with CUDA 5.0 and with compute and sm set to 35. I've mostly spent the week finding techniques to test different versions and benchmarking them. Tonight I will look into testing 2.03 with sm35 to see whether it's the library that slows down 2.05 or the code itself.
#1894 | |
|
Jun 2012
1116 Posts |
Quote:
I could give it a try for the Fermat numbers. I will have to ask you questions on the forum about the mod points. |
#1895 | |
|
"Svein Johansen"
May 2013
Norway
201₁₀ Posts |
Quote:
#1896 | |
|
"Svein Johansen"
May 2013
Norway
11001001₂ Posts |
Quote:
The main iteration is done in the int check() function. If you follow the check function, you will see the algorithm. |
#1897 |
|
"Oliver"
Mar 2005
Germany
11·101 Posts |
Hi Carl,
I have to annoy you again, sorry!
Code:
Position 213, Iteration 100000, Errors: 0, completed 91.06%
Position 214, Iteration 10000, Errors: 0, completed 91.11%
Position 214, Iteration 20000, Errors: 0, completed 91.15%
Position 214, Iteration 30000, Errors: 0, completed 91.19%
Position 214, Iteration 40000, Errors: 0, completed 91.23%
Position 214, Iteration 50000, Errors: 0, completed 91.28%
Position 214, Iteration 60000, Errors: 0, completed 91.32%
Position 214, Iteration 70000, Errors: 0, completed 91.36%
Position 214, Iteration 80000, Errors: 0, completed -91.36%
Position 214, Iteration 90000, Errors: 0, completed -91.32%
Position 214, Iteration 100000, Errors: 0, completed -91.28%
Code:
printf("Position %d, Iteration %d, Errors: %d, completed %2.2f%%\n", pos, k, total, ((double)pos*iter+k)*100 / (double) (s*iter));
Last fiddled with by TheJudger on 2013-05-23 at 20:38 |
#1898 |
|
"Carl Darby"
Oct 2012
Spring Mountains, Nevada
3²·5·7 Posts |
The numbers actually get that big? I'm often amazed at the things I can't imagine. Thanks.
Carl |
#1899 |
|
"Oliver"
Mar 2005
Germany
11×101 Posts |
Hi Carl,
you could move the *100 to the other side of the division (as *0.01). In that case it would take much longer to trigger the overflow. Currently the limit is (2^31-1) / 100 ≈ 21.5M iterations.
You could add some timing information (iterations per second, estimated remaining time) to your memtest if you have some spare time.
Oliver |
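To make the overflow concrete, here is a minimal C sketch, not the memtest's actual code. The names pos, k, iter and s follow the printf earlier in the thread. The values iter = 100000 and s = 235 are assumptions inferred from the log output, not stated by anyone here. The helper names percent_done, percent_done_wrapped and max_safe_iterations are hypothetical. The wrap-around is emulated with 64-bit arithmetic because real signed overflow is undefined behaviour in C.

```c
#include <assert.h>
#include <stdint.h>

/* Percent complete, computed in double precision throughout, so no
 * intermediate integer product can overflow. */
static double percent_done(int pos, int k, int iter, int s)
{
    assert(iter > 0 && s > 0);
    return ((double)pos * iter + k) / ((double)s * iter) * 100.0;
}

/* The value that results if (pos*iter + k)*100 is evaluated as a 32-bit
 * two's-complement int and wraps around. Emulated in 64 bits; assumes
 * non-negative pos and k. */
static double percent_done_wrapped(int pos, int k, int iter, int s)
{
    assert(pos >= 0 && k >= 0 && iter > 0 && s > 0);
    int64_t exact = ((int64_t)pos * iter + k) * 100;
    /* Fold the exact product into the range [-2^31, 2^31 - 1]. */
    int64_t wrapped = (exact + 2147483648LL) % 4294967296LL - 2147483648LL;
    return (double)wrapped / ((double)s * iter);
}

/* Largest total iteration count whose product with 100 fits in int32. */
static int32_t max_safe_iterations(void)
{
    return INT32_MAX / 100;
}
```

With the assumed values, percent_done_wrapped(214, 80000, 100000, 235) comes out near -91.36, matching the first negative line in the log, while percent_done gives about 91.40 for the same inputs.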
#1900 |
|
"Carl Darby"
Oct 2012
Spring Mountains, Nevada
3²·5·7 Posts |
Oliver
Thanks for the suggestions. Here's what I'm planning:
1. Include device and environment info at the beginning of the test.
2. Include timing, ETA, and temperature info at each report.
3. Give address ranges of the memory being tested rather than the uninformative "position 1" etc.
Don't know when I will get to it though.
Carl |
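The timing part of item 2 could be sketched roughly as follows. This is only an illustration: the function names are hypothetical, the elapsed time is assumed to be measured by the caller, and the ETA assumes the iteration rate stays constant for the rest of the run. The h:mm:ss layout imitates the ETA field in CUDALucas output.

```c
#include <assert.h>
#include <stdio.h>

/* Average iterations per second since the test started. */
static double iterations_per_second(long long done, double elapsed_s)
{
    assert(done >= 0 && elapsed_s > 0.0);
    return (double)done / elapsed_s;
}

/* Estimated seconds remaining, assuming the average rate so far holds. */
static double eta_seconds(long long done, long long total, double elapsed_s)
{
    assert(done > 0 && done <= total && elapsed_s > 0.0);
    return (double)(total - done) * elapsed_s / (double)done;
}

/* Writes a non-negative duration as h:mm:ss, rounded to whole seconds. */
static void format_hms(double seconds, char *buf, size_t len)
{
    assert(seconds >= 0.0 && buf != NULL && len > 0);
    long long t = (long long)(seconds + 0.5);
    snprintf(buf, len, "%lld:%02lld:%02lld", t / 3600, (t / 60) % 60, t % 60);
}
```

A report line would then call eta_seconds with the total iteration count (positions times iterations per position) and pass the result to format_hms before printing.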
#1901 |
|
"James Heinrich"
May 2004
ex-Northern Ontario
23·149 Posts |
I'm confused about benchmark timings vs production timings. For example, on my GTX 670 I get this:
Code:
Iteration 10000 M( 57885161 )C, 0x76c27556683cd84d, n = 3200K, CUDALucas v2.04 Beta err = 0.1076 (1:21 real, 8.0405 ms/iter, ETA 129:15:02)
However, when running a benchmark on 3200K I get this:
Code:
cudalucas -cufftbench 3276800 3276800 32768
CUFFT bench start = 3276800 end = 3276800 distance = 32768
CUFFT_Z2Z size= 3276800 time= 3.704131 msec
Last fiddled with by James Heinrich on 2013-05-25 at 13:13 |
#1902 |
|
P90 years forever!
Aug 2002
Yeehaw, FL
19×397 Posts |
An LL iteration consists of a forward FFT, a point-wise squaring, an inverse FFT, and a rounding-to-integer-and-propagating-carries-step.
The benchmark only times one of the FFTs. So your LL iteration did two 3.7 ms FFTs and spent 0.6 ms doing point-wise squaring and rounding/carry.
Last fiddled with by Prime95 on 2013-05-25 at 14:06 |
#1903 |
|
"Carl Darby"
Oct 2012
Spring Mountains, Nevada
3²×5×7 Posts |
cufftbench only times the FFTs. One iteration of an LL test consists of 2 FFTs, pointwise multiplication, normalization, and splicing. For a rough equivalence of the two timings, treat the iteration time as a multiple of the FFT time. A more accurate equivalence is iteration time = 2 * fft + k * n for some constant k and FFT length n.
Edit: Looks like Prime95 beat me to it.
Last fiddled with by owftheevil on 2013-05-25 at 14:11 |
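Plugging James's GTX 670 numbers into that model gives a feel for the sizes involved. The C sketch below uses hypothetical helper names and treats k as fitted from a single measurement, so it is an illustration of the formula rather than a calibrated constant.

```c
#include <assert.h>

/* Time per LL iteration spent outside the two FFTs, in ms. */
static double non_fft_ms(double iter_ms, double fft_ms)
{
    assert(iter_ms > 0.0 && fft_ms > 0.0);
    return iter_ms - 2.0 * fft_ms;
}

/* Constant k in: iteration time = 2 * fft + k * n (ms per FFT element). */
static double per_element_k(double iter_ms, double fft_ms, double n)
{
    assert(n > 0.0);
    return non_fft_ms(iter_ms, fft_ms) / n;
}

/* Iteration time predicted by the model for a given FFT time and length. */
static double predict_iter_ms(double fft_ms, double k, double n)
{
    return 2.0 * fft_ms + k * n;
}
```

For 8.0405 ms/iter and 3.704131 ms per FFT at n = 3276800, non_fft_ms returns about 0.632 ms, which is the 0.6 ms Prime95 mentions, and per_element_k comes out near 1.93e-7 ms per element.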