#1057 |
|
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
7221₁₀ Posts
Note there is a spelling error in "desable", should be "disable". This might confuse a few people.
@flash: When he says smoothness, he means smoothness of the multiplier determining the FFT length (the same notion of smoothness as with P-1 and its B1/B2 bounds). He is using multiples of the 32K length; 45 is a smooth number because it factors as 3*3*5, so it is 5-smooth. 44 is not (as) smooth because it's 11*4 (11-smooth), and thus an FFT length of 45*32K will usually be faster than 44*32K. That's why 2*32K, 16*32K, 32*32K, 64*32K etc. were the FFTs available before: those are the "smoothest" lengths (the multiplier is a power of 2). Note that Prime95 does not allow arbitrary multiples for FFT lengths; it offers only a few chosen multipliers (presumably the smoothest). I'll come back later with exact multiples.
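To make the smoothness argument concrete, here is a minimal Python sketch. The helper names (`largest_prime_factor`, `is_smooth`) and the choice of 7 as the smoothness bound are assumptions made for illustration; they are not taken from CUDALucas or Prime95.

```python
def largest_prime_factor(n: int) -> int:
    """Return the largest prime factor of n (for n >= 2) by trial division."""
    largest = 1
    p = 2
    while p * p <= n:
        while n % p == 0:
            largest = p
            n //= p
        p += 1
    if n > 1:  # whatever is left over is itself a prime factor
        largest = n
    return largest


def is_smooth(n: int, bound: int) -> bool:
    """True if every prime factor of n is <= bound (n is 'bound-smooth')."""
    return n == 1 or largest_prime_factor(n) <= bound


BASE = 32 * 1024  # the 32K base length discussed above

for m in (44, 45, 64):
    print(f"{m} * 32K = {m * BASE}: largest prime factor {largest_prime_factor(m)}")

# Multipliers between 32 and 64 whose prime factors are all <= 7
# (the bound of 7 is an assumption chosen for this example).
smooth_multipliers = [m for m in range(32, 65) if is_smooth(m, 7)]
print(smooth_multipliers)
```

With this definition 45 comes out 5-smooth and 44 comes out 11-smooth, which is the distinction drawn in the post.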
#1058 | |
|
Jul 2009
Tokyo
2·5·61 Posts |
Quote:
Ver 1.69
1) desable -> disable
2) change -t option (if roundoff error then write checkpoint file (correct data).)
Code:
$ ./CUDALucas 216091
start M216091 fft length = 12288
Iteration 10000 M( 216091 )C, 0x30247786758b8792, n = 12288, CUDALucas v1.69 err = 0.004395 (0:24 real, 2.3972 ms/iter, ETA 7:59)
Iteration 20000 M( 216091 )C, 0x13e968bf40fda4d7, n = 12288, CUDALucas v1.69 err = 0.004395 (0:22 real, 2.1895 ms/iter, ETA 6:55)
iteration = 20333 >= 1000 && err = 0.4 >= 0.35,fft length = 12288 not write checkpoint file and exit.(when disable -t option)
Code:
$ ./CUDALucas 216091 -t
start M216091 fft length = 12288
Iteration 10000 M( 216091 )C, 0x30247786758b8792, n = 12288, CUDALucas v1.69 err = 0.004684 (0:42 real, 4.2346 ms/iter, ETA 14:06)
Iteration 20000 M( 216091 )C, 0x13e968bf40fda4d7, n = 12288, CUDALucas v1.69 err = 0.004684 (0:43 real, 4.2341 ms/iter, ETA 13:24)
iteration = 20333 >= 1000 && err = 0.4 >= 0.35,fft length = 12288 write checkpoint file and exit.(when enable -t option)
$ ./CUDALucas 216091
continuing work from a partial result M216091 fft length = 12288 iteration = 20333
Iteration 30000 M( 216091 )C, 0x540772c2abb7833a, n = 12288, CUDALucas v1.69 err = 0.00415 (0:21 real, 2.1144 ms/iter, ETA 6:20)
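The behaviour shown in the two logs can be summarised as a small decision rule. The Python sketch below is only an illustration of what the log lines suggest, not the actual CUDALucas source: the function name `roundoff_action` is hypothetical, the two constants are read off the printed condition (`iteration >= 1000 && err >= 0.35`), and treating every other case as "continue" is an assumption.

```python
ERR_LIMIT = 0.35      # threshold printed in the log line
MIN_ITERATION = 1000  # the printed condition only applies from this iteration on


def roundoff_action(iteration: int, err: float, t_enabled: bool) -> str:
    """Return what the logs suggest happens after a roundoff check.

    Hypothetical helper: it mirrors the logged condition and nothing else.
    """
    if iteration >= MIN_ITERATION and err >= ERR_LIMIT:
        if t_enabled:
            # With -t, a checkpoint with correct data is written before exiting.
            return "write checkpoint and exit"
        return "exit without checkpoint"
    return "continue"  # assumption: no action is taken otherwise


print(roundoff_action(20333, 0.4, t_enabled=False))
print(roundoff_action(20333, 0.4, t_enabled=True))
```

Note that the logs also show the cost of the option: about 4.23 ms/iter with -t against about 2.19 ms/iter without it on this run.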
#1059 |
|
Romulan Interpreter
"name field"
Jun 2011
Thailand
41×251 Posts |
Thanks msft & flashjh. Trying v1.68 now, there seems to be a problem resuming from v1.67 (old jobs with about 10M iterations done; when I try to resume, all residues come out cleared, equal to 2). I will finish the current expos with 1.67 and test the newer version after.
edit: this is to confirm that the structure of the checkpoint files changed again with version 1.68; they are 4 bytes shorter and totally messed inside :D You cannot continue older assignments with the newer version. I just did 100k iterations with both 1.67 and 1.68: same residues, totally different files. Please finish all started assignments before switching. I will definitely switch tomorrow after my running assignment is finished.
Last fiddled with by LaurV on 2012-03-21 at 19:01
#1060 |
|
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
3·29·83 Posts |
He said he changed the checkpoint format in 1.68.
#1061 |
|
Romulan Interpreter
"name field"
Jun 2011
Thailand
41×251 Posts |
Sorry! Me being stupid, I did not learn about it! (Maybe I read superficially, or I forgot.) Now that you said it, I read again... and indeed he said so.
edit: I got quite worried when I saw all residues being 00000002, grrrr... lost half an hour or more. That is because I should be sleeping at 2:10 AM, not hunting primes...
Last fiddled with by LaurV on 2012-03-21 at 19:12
#1062 | |
|
"Jerry"
Nov 2011
Vancouver, WA
1,123 Posts |
Quote:
- CUDA 4.0 | SM 2.0
- CUDA 4.1 | SM 2.0
- CUDA 3.2 | SM 1.3
Skipping 4.1 | 2.1 unless someone requests it.
#1063 | |||
|
"Jerry"
Nov 2011
Vancouver, WA
1123₁₀ Posts
Quote:
Quote:
Quote:
Anyway, I spent some time working on my FFT sizes (sorted fastest 1st):
Code:
CUFFT_Z2Z size= 1048576 time= 0.494499 msec 32
CUFFT_Z2Z size= 1179648 time= 0.598818 msec 36
CUFFT_Z2Z size= 1146880 time= 0.658661 msec 35
CUFFT_Z2Z size= 1310720 time= 0.725707 msec 40
CUFFT_Z2Z size= 1474560 time= 0.809843 msec 45
CUFFT_Z2Z size= 1572864 time= 0.861832 msec 48
CUFFT_Z2Z size= 1376256 time= 0.868893 msec 42
CUFFT_Z2Z size= 1605632 time= 0.88437 msec 49
CUFFT_Z2Z size= 1638400 time= 0.956487 msec 50
CUFFT_Z2Z size= 1769472 time= 1.012213 msec 54
CUFFT_Z2Z size= 1835008 time= 1.029823 msec 56
CUFFT_Z2Z size= 2097152 time= 1.077876 msec 64
CUFFT_Z2Z size= 2064384 time= 1.158135 msec 63
CUFFT_Z2Z size= 2359296 time= 1.259588 msec 72
CUFFT_Z2Z size= 1966080 time= 1.267012 msec 60
CUFFT_Z2Z size= 2293760 time= 1.419909 msec 70
CUFFT_Z2Z size= 2621440 time= 1.442881 msec 80
CUFFT_Z2Z size= 2654208 time= 1.469601 msec 81
CUFFT_Z2Z size= 2457600 time= 1.585579 msec 75
CUFFT_Z2Z size= 2949120 time= 1.745705 msec 90
CUFFT_Z2Z size= 3145728 time= 1.760098 msec 96
CUFFT_Z2Z size= 2752512 time= 1.81603 msec 84
CUFFT_Z2Z size= 3211264 time= 1.96938 msec 98
CUFFT_Z2Z size= 3670016 time= 2.0914 msec 112
CUFFT_Z2Z size= 3538944 time= 2.149464 msec 108
CUFFT_Z2Z size= 3440640 time= 2.187218 msec 105
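One way to read a listing like this: a length is only worth using if no larger length is faster, since a larger FFT can always stand in for a smaller one. The Python sketch below applies that filter to the first eight timings from the listing; the function name `worthwhile_lengths` is hypothetical and the filter is just one possible selection rule, not something CUDALucas is known to do.

```python
# (fft_size, msec) pairs copied from the first eight rows of the listing.
timings = [
    (1048576, 0.494499),
    (1179648, 0.598818),
    (1146880, 0.658661),
    (1310720, 0.725707),
    (1474560, 0.809843),
    (1572864, 0.861832),
    (1376256, 0.868893),
    (1605632, 0.88437),
]


def worthwhile_lengths(timings):
    """Keep only lengths that are strictly faster than every larger length."""
    kept = []
    best = float("inf")  # fastest time seen among larger lengths so far
    for size, msec in sorted(timings, reverse=True):  # walk from largest down
        if msec < best:
            kept.append((size, msec))
            best = msec
    return sorted(kept)


for size, msec in worthwhile_lengths(timings):
    print(f"{size:8d} ({size // 32768:3d} * 32K)  {msec} msec")
```

On this subset the filter drops the 35*32K and 42*32K lengths, because 36*32K and 45*32K are both larger and faster.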
#1064 |
|
Romulan Interpreter
"name field"
Jun 2011
Thailand
10100000110011₂ Posts
Ok, finished testing with 1.67, with another 2 matching residues (sm13 compiled by flashjh); 4 DC tests in total, all matched.
Switching to v1.69, another 4 expos. A few observations:
- What is -m? (typo for -k?)
- How do we actually ENABLE -t? It seems that if I start it with -t already as a parameter, I can only disable it, but not enable it back.
- Enabling/disabling -s also does not really seem to work on my side.
All these are minor. The major one, enabling and disabling the aggressive mode, works perfectly and I am very happy about it. Now I can do my work without stopping CL, and I can let it burn to the max overnight when no one is touching the keyboard.
#1065 | |||
|
Jul 2009
Tokyo
1142₈ Posts
Quote:
Quote:
Quote:
#1066 |
|
Romulan Interpreter
"name field"
Jun 2011
Thailand
10100000110011₂ Posts
I understand that, and later I saw in the source file that it was intentionally disabled, as the "else" of the "if" is gone, and the help menu was modified from "toggle" (or "change") into "disable" only. So it is not a bug, but an intentional choice. Most probably you had an objective reason to do so, and I was interested in the motivation behind it. From a quick look into the source I did not see any trouble in having the "enable -t" option back, besides the g_x=g_y stuff, which could always be kept (even when -t is disabled).
About -s not working, please forget it. I was being stupid again. In fact, I was expecting it to work differently: for example, a checkpoint file should be written every time -s is enabled or disabled, and also a text line on the screen. Then the "s" key could be used to force writing a checkpoint file and/or to check the progress, especially when -c is very big (save disk space, gain speed) and iterations are slow (big expos). Sometimes we get bored waiting (if -c is 1 million) for some screen output and press "s" :D
Another improvement could be to have the checkpoint file names contain the residue too, i.e. the last residue written on the screen, for the former iteration. You have it in a string already, just change the name of the file: instead of "sEXPONENT.ITERATION" use "sEXPONENT.ITERATION.RESIDUE.txt", with the iteration zero-filled in front. That will be easier to sort by name, it will avoid some OSes having trouble displaying file extensions with more than 3 characters (and anyhow WinXP Explorer won't show extensions by default, so you can't see the iteration number with the current format unless you played with the WinXP settings), and more importantly, it will save me the time of copy/pasting the screen output into a text file in case I want to keep the residues for later use or a triple check. This is a pain in the back if I use it from a batch file: I cannot redirect the output because I want to see the screen too. You get my point. Having the residues in the file names of the checkpoint files would be great.
This program is slowly becoming a masterpiece, day by day! I love it! Thanks for your wonderful work.
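The proposed naming scheme can be sketched in a few lines of Python. This is only an illustration of the suggestion in this post: the function `checkpoint_name` is hypothetical, and padding the iteration to the number of digits of the exponent is an assumption (it works because a Lucas-Lehmer test never runs more iterations than the exponent).

```python
def checkpoint_name(exponent: int, iteration: int, residue: str) -> str:
    """Build a name of the form sEXPONENT.ITERATION.RESIDUE.txt.

    The iteration is zero-filled so that sorting by name matches sorting
    by iteration. Hypothetical helper, not part of CUDALucas.
    """
    width = len(str(exponent))  # assumption: iteration count <= exponent
    return f"s{exponent}.{iteration:0{width}d}.{residue}.txt"


# Residue taken from the iteration-20000 line of the v1.69 log earlier in the thread.
print(checkpoint_name(216091, 20000, "13e968bf40fda4d7"))

# Zero-filling keeps a plain name sort in iteration order.
names = [checkpoint_name(216091, it, "0" * 16) for it in (100000, 20000, 10000)]
print(sorted(names))
```

Without the zero-fill, "s216091.100000" would sort before "s216091.20000", which is the sorting problem the post describes.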
#1067 |
|
Jun 2011
131 Posts |