mersenneforum.org  

Old 2012-03-21, 17:32   #1057
Dubslow
Basketry That Evening!
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

7221₁₀ Posts

Note there is a spelling error in "desable", should be "disable". This might confuse a few people.

@flash: When he says smoothness, he means smoothness of the multiplier determining the FFT length (the same notion of smoothness as with P-1 and its B1/B2 bounds). He is using multiples of the 32K length; 45 is a smooth number because it factors as 3*3*5, and so is 5-smooth. 44 is not (as) smooth because it's 11*4 (11-smooth), and thus an FFT length of 45*32K will usually be faster than 44*32K. That's why 2*32K, 16*32K, 32*32K, 64*32K etc. were the FFT lengths available before: those are the "smoothest" lengths (the multiplier factors as a power of 2). Note that Prime95 does not allow arbitrary multiples for FFT lengths, but has only a few (presumably the smoothest) multipliers to choose from. I'll come back later with exact multiples.
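
A minimal sketch of the smoothness idea, for illustration only: a number is B-smooth when its largest prime factor is at most B. The helper name `largest_prime_factor` and the constant `BASE` are made up for this example and are not part of CUDALucas or Prime95. Smoothness is only a rule of thumb for FFT speed; actual timings depend on the FFT library and the hardware.

```python
def largest_prime_factor(n: int) -> int:
    """Return the largest prime factor of n (n >= 2) by trial division."""
    largest = 1
    p = 2
    while p * p <= n:
        while n % p == 0:
            largest = p
            n //= p
        p += 1
    if n > 1:
        # Whatever remains after trial division is itself a prime factor.
        largest = n
    return largest


BASE = 32 * 1024  # the "32K" unit the multipliers refer to

for multiplier in (32, 44, 45, 64):
    bound = largest_prime_factor(multiplier)
    print(f"{multiplier:3d} * 32K = {multiplier * BASE:8d}  ->  {bound}-smooth")
```

Here 44 comes out 11-smooth and 45 comes out 5-smooth, which is why 45*32K is expected to be the better choice of the two.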
Old 2012-03-21, 18:10   #1058
msft
 
Jul 2009
Tokyo

2·5·61 Posts

Quote:
Originally Posted by Dubslow View Post
Note there is a spelling error in "desable", should be "disable". This might confuse a few people.
I need a cool HELP message.

Ver 1.69
1) desable -> disable
2) changed the -t option (on a roundoff error, write a checkpoint file with correct data)
Code:
$ ./CUDALucas 216091

start M216091 fft length = 12288
Iteration 10000 M( 216091 )C, 0x30247786758b8792, n = 12288, CUDALucas v1.69 err = 0.004395 (0:24 real, 2.3972 ms/iter, ETA 7:59)
Iteration 20000 M( 216091 )C, 0x13e968bf40fda4d7, n = 12288, CUDALucas v1.69 err = 0.004395 (0:22 real, 2.1895 ms/iter, ETA 6:55)
iteration = 20333 >= 1000 && err = 0.4 >= 0.35,fft length = 12288 not write checkpoint file and exit.(when disable -t option)
Code:
$ ./CUDALucas 216091 -t

start M216091 fft length = 12288
Iteration 10000 M( 216091 )C, 0x30247786758b8792, n = 12288, CUDALucas v1.69 err = 0.004684 (0:42 real, 4.2346 ms/iter, ETA 14:06)
Iteration 20000 M( 216091 )C, 0x13e968bf40fda4d7, n = 12288, CUDALucas v1.69 err = 0.004684 (0:43 real, 4.2341 ms/iter, ETA 13:24)
iteration = 20333 >= 1000 && err = 0.4 >= 0.35,fft length = 12288 write checkpoint file and exit.(when enable -t option)

$ ./CUDALucas 216091 

continuing work from a partial result M216091 fft length = 12288 iteration = 20333
Iteration 30000 M( 216091 )C, 0x540772c2abb7833a, n = 12288, CUDALucas v1.69 err = 0.00415 (0:21 real, 2.1144 ms/iter, ETA 6:20)
Attached Files
File Type: bz2 CUDALucas.1.69.tar.bz2 (11.5 KB, 123 views)
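
For readers who want to compare the two runs above, here is a small sketch that parses the iteration lines and reports the slowdown with -t. The regular expression is only fitted to the v1.69 lines shown in this post and is an assumption about the format, not a documented interface; the function name `parse_iteration_line` is made up for this example.

```python
import re

# Pattern fitted to the v1.69 output lines quoted above (an assumption, not a spec).
LINE_RE = re.compile(
    r"Iteration (?P<iter>\d+) M\( (?P<exp>\d+) \)C, 0x(?P<res>[0-9a-f]{16}), "
    r"n = (?P<n>\d+), CUDALucas v(?P<ver>[\d.]+) err = (?P<err>[\d.]+) "
    r"\((?P<real>[\d:]+) real, (?P<ms>[\d.]+) ms/iter, ETA (?P<eta>[\d:]+)\)"
)


def parse_iteration_line(line: str) -> dict:
    """Extract iteration, residue and ms/iter from one progress line."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError("not a recognised iteration line")
    return {
        "iteration": int(m["iter"]),
        "residue": m["res"],
        "ms_per_iter": float(m["ms"]),
    }


without_t = parse_iteration_line(
    "Iteration 20000 M( 216091 )C, 0x13e968bf40fda4d7, n = 12288, "
    "CUDALucas v1.69 err = 0.004395 (0:22 real, 2.1895 ms/iter, ETA 6:55)"
)
with_t = parse_iteration_line(
    "Iteration 20000 M( 216091 )C, 0x13e968bf40fda4d7, n = 12288, "
    "CUDALucas v1.69 err = 0.004684 (0:43 real, 4.2341 ms/iter, ETA 13:24)"
)

slowdown = with_t["ms_per_iter"] / without_t["ms_per_iter"]
print("residues match:", without_t["residue"] == with_t["residue"])
print(f"slowdown with -t: {slowdown:.2f}x")
```

In this single example the residues agree and -t costs roughly a factor of 1.9 in time per iteration; other exponents and cards may behave differently.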
Old 2012-03-21, 18:43   #1059
LaurV
Romulan Interpreter
 
"name field"
Jun 2011
Thailand

41×251 Posts

Thanks msft & flashjh. Trying v1.68 now, there seems to be a problem resuming from v1.67 (old jobs with about 10M iterations done; when I try to resume, all residues come out cleared, equal to 2). I will finish the current expos with 1.67 and test the newer version afterwards.

edit: this is to confirm that the structure of the checkpoint files changed again with version 1.68: they are 4 bytes shorter and totally messed up inside :D You cannot continue older assignments with the newer version. I just did 100k iterations with both 1.67 and 1.68: same residues, totally different files. Please finish all started assignments before switching. I will definitely switch tomorrow after my running assignment is finished.

Last fiddled with by LaurV on 2012-03-21 at 19:01
Old 2012-03-21, 18:46   #1060
Dubslow
Basketry That Evening!
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

3·29·83 Posts

He said he changed the checkpoint format in 1.68.
Old 2012-03-21, 19:07   #1061
LaurV
Romulan Interpreter
 
"name field"
Jun 2011
Thailand

41×251 Posts

Quote:
Originally Posted by Dubslow View Post
He said he changed the checkpoint format in 1.68.
Sorry! Me being stupid, I did not learn about it! (Maybe I read superficially or I forgot.) Now that you said it, I read it again... and indeed he said so.
edit: I got quite worried when I saw all residues being 00000002, grrrr... lost half an hour or more, and that is because I should be sleeping at 2:10 AM, not hunting primes...

Last fiddled with by LaurV on 2012-03-21 at 19:12
Old 2012-03-22, 01:36   #1062
flashjh
 
"Jerry"
Nov 2011
Vancouver, WA

1,123 Posts

Quote:
Originally Posted by msft View Post
I need a cool HELP message.

Ver 1.69
1) desable -> disable
2) changed the -t option (on a roundoff error, write a checkpoint file with correct data)
Code:
$ ./CUDALucas 216091
 
start M216091 fft length = 12288
Iteration 10000 M( 216091 )C, 0x30247786758b8792, n = 12288, CUDALucas v1.69 err = 0.004395 (0:24 real, 2.3972 ms/iter, ETA 7:59)
Iteration 20000 M( 216091 )C, 0x13e968bf40fda4d7, n = 12288, CUDALucas v1.69 err = 0.004395 (0:22 real, 2.1895 ms/iter, ETA 6:55)
iteration = 20333 >= 1000 && err = 0.4 >= 0.35,fft length = 12288 not write checkpoint file and exit.(when disable -t option)
Code:
$ ./CUDALucas 216091 -t
 
start M216091 fft length = 12288
Iteration 10000 M( 216091 )C, 0x30247786758b8792, n = 12288, CUDALucas v1.69 err = 0.004684 (0:42 real, 4.2346 ms/iter, ETA 14:06)
Iteration 20000 M( 216091 )C, 0x13e968bf40fda4d7, n = 12288, CUDALucas v1.69 err = 0.004684 (0:43 real, 4.2341 ms/iter, ETA 13:24)
iteration = 20333 >= 1000 && err = 0.4 >= 0.35,fft length = 12288 write checkpoint file and exit.(when enable -t option)
 
$ ./CUDALucas 216091 
 
continuing work from a partial result M216091 fft length = 12288 iteration = 20333
Iteration 30000 M( 216091 )C, 0x540772c2abb7833a, n = 12288, CUDALucas v1.69 err = 0.00415 (0:21 real, 2.1144 ms/iter, ETA 6:20)
Attached CUDALucas 1.69 x64 binaries:

- CUDA 4.0 | SM 2.0
- CUDA 4.1 | SM 2.0
- CUDA 3.2 | SM 1.3

Skipping 4.1 | 2.1 unless someone requests it.
Attached Files
File Type: zip CUDALucas1.69.x64.zip (220.6 KB, 154 views)
Old 2012-03-22, 04:15   #1063
flashjh
 
"Jerry"
Nov 2011
Vancouver, WA

1123₁₀ Posts

Quote:
Originally Posted by LaurV View Post
Number 1 is clear. As a general rule, if the FFT length increases, then the time to compute it will increase, as more data is involved. This is common sense... To get the "optimum" time one has to "tune down" the FFT according to the exponent and the hardware one has. That is why there was such a big fuss about having the "-f" parameter. For example, I can spend half an hour playing with the exponent and FFT, but then get a 19-hour ETA instead of 24. <snip>
Quote:
Originally Posted by msft View Post
Too small an FFT length causes roundoff errors,
but too big an FFT length gives unstable results with this version (>1.58?).
Narrow launch window, it is stimulating!
Quote:
Originally Posted by Dubslow View Post
@flash: When he says smoothness, he means smoothness of the multiplier determining the FFT length (the same notion of smoothness as with P-1 and its B1/B2 bounds). He is using multiples of the 32K length; 45 is a smooth number because it factors as 3*3*5, and so is 5-smooth. 44 is not (as) smooth because it's 11*4 (11-smooth), and thus an FFT length of 45*32K will usually be faster than 44*32K. That's why 2*32K, 16*32K, 32*32K, 64*32K etc... <snip>
Thank you all for your input! The examples and other info were very helpful. I'm still getting a grip on the whole FFT process. (Any links to simple or difficult explanations?)

Anyway, I spent some time working on my FFT sizes (sorted fastest 1st):

Code:
CUFFT_Z2Z size= 1048576 time= 0.494499 msec 32
CUFFT_Z2Z size= 1179648 time= 0.598818 msec 36
CUFFT_Z2Z size= 1146880 time= 0.658661 msec 35
CUFFT_Z2Z size= 1310720 time= 0.725707 msec 40
CUFFT_Z2Z size= 1474560 time= 0.809843 msec 45
CUFFT_Z2Z size= 1572864 time= 0.861832 msec 48
CUFFT_Z2Z size= 1376256 time= 0.868893 msec 42
CUFFT_Z2Z size= 1605632 time= 0.88437  msec 49
CUFFT_Z2Z size= 1638400 time= 0.956487 msec 50
CUFFT_Z2Z size= 1769472 time= 1.012213 msec 54
CUFFT_Z2Z size= 1835008 time= 1.029823 msec 56
CUFFT_Z2Z size= 2097152 time= 1.077876 msec 64
CUFFT_Z2Z size= 2064384 time= 1.158135 msec 63
CUFFT_Z2Z size= 2359296 time= 1.259588 msec 72
CUFFT_Z2Z size= 1966080 time= 1.267012 msec 60
CUFFT_Z2Z size= 2293760 time= 1.419909 msec 70
CUFFT_Z2Z size= 2621440 time= 1.442881 msec 80
CUFFT_Z2Z size= 2654208 time= 1.469601 msec 81
CUFFT_Z2Z size= 2457600 time= 1.585579 msec 75
CUFFT_Z2Z size= 2949120 time= 1.745705 msec 90
CUFFT_Z2Z size= 3145728 time= 1.760098 msec 96
CUFFT_Z2Z size= 2752512 time= 1.81603  msec 84
CUFFT_Z2Z size= 3211264 time= 1.96938  msec 98
CUFFT_Z2Z size= 3670016 time= 2.0914   msec 112
CUFFT_Z2Z size= 3538944 time= 2.149464 msec 108
CUFFT_Z2Z size= 3440640 time= 2.187218 msec 105
The number on the right is the FFT length ÷ 32768. The top 4 gave an immediate roundoff error. I had to use 1474560 also. My current 1.67 run won't complete for about 2½ hours, so I have to wait to switch to 1.69 with the new FFT length.
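
The selection described above (take the fastest benchmarked length that does not hit a roundoff error) can be sketched as follows. The sizes and times are copied from the first rows of the table; the set of failing sizes is the "top 4" reported in this post. The names `BENCH`, `ROUNDOFF_FAILED` and `fastest_usable` are made up for this illustration and do not come from CUDALucas.

```python
# (FFT size, time in msec) copied from the first rows of the benchmark above.
BENCH = [
    (1048576, 0.494499),
    (1179648, 0.598818),
    (1146880, 0.658661),
    (1310720, 0.725707),
    (1474560, 0.809843),
    (1572864, 0.861832),
    (1376256, 0.868893),
]

# Sizes reported as giving an immediate roundoff error (the top 4 rows).
ROUNDOFF_FAILED = {1048576, 1179648, 1146880, 1310720}


def fastest_usable(bench, failed):
    """Return (size, time) of the fastest benchmarked size not known to fail."""
    usable = [(time, size) for size, time in bench if size not in failed]
    if not usable:
        raise ValueError("every benchmarked size failed")
    time, size = min(usable)
    return size, time


size, time = fastest_usable(BENCH, ROUNDOFF_FAILED)
print(f"use FFT length {size} ({size // 32768} * 32K), {time} msec")
```

With these numbers the sketch lands on 1474560, i.e. 45 * 32K, matching the choice made in the post.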
Old 2012-03-22, 13:05   #1064
LaurV
Romulan Interpreter
 
"name field"
Jun 2011
Thailand

10100000110011₂ Posts

OK, finished testing with 1.67, with another 2 matching residues (sm13 compiled by flashjh); 4 DC tests in total, all matched.

Switching to v1.69, with another 4 expos. A few observations:

- what is -m? (typo for -k?)
- how do we actually ENABLE -t? it seems that if I start it with -t already as parameter, I can only disable it, but not enable it back.
- enabling-disabling -s seems also not to really work on my side.

All these are minor. The major one, enabling and disabling the aggressive mode, works perfectly and I am very happy about it. Now I can do my work without stopping CL, and I can let it burn to the max overnight when no one is touching the keyboard.
Old 2012-03-22, 14:40   #1065
msft
 
Jul 2009
Tokyo

1142₈ Posts

Quote:
Originally Posted by LaurV View Post
Switching to v1.69, another 4 expos. Few observations:
- what is -m? (typo for -k?)
yes.
Quote:
- how do we actually ENABLE -t? it seems that if I start it with -t already as parameter, I can only disable it, but not enable it back.
The -t option is there to ensure a correct checkpoint file.
Quote:
- enabling-disabling -s seems also not to really work on my side.
Please include a log (command and output) with the bug report.
Old 2012-03-22, 15:22   #1066
LaurV
Romulan Interpreter
 
"name field"
Jun 2011
Thailand

10100000110011₂ Posts

Quote:
Originally Posted by msft View Post
The -t option is there to ensure a correct checkpoint file.
I understand that, and later I saw in the source file that it was intentionally disabled, as the "else" of the "if" is gone, and the help menu was modified from "toggle" (or "change") into "disable" only. So, it is not a bug, but an intentional choice. Most probably you had an objective reason to do so, and I was interested in the motivation behind it. From a cursory look at the source I did not see any trouble with having the "enable -t" option back, besides the g_x=g_y stuff, which could always be kept (even when -t is disabled).

About the -s not working, please forget it. I was being stupid again. In fact, I was expecting it to work differently: for example, a checkpoint file should be written every time -s is enabled or disabled, along with a text line on the screen. Then the "s" key could be used to force writing a checkpoint file and/or to check the progress, especially when -c is very big (to save disk space and gain speed) and iterations are slow (big expos). Sometimes we get bored of waiting (if -c is 1 million) for some screen output and press "s" :D

Another improvement would be to put the residue in the checkpoint file name too, i.e. the last residue written on the screen, for the former iteration. You already have it in a string, so just change the name of the file: instead of "sEXPONENT.ITERATION" use "sEXPONENT.ITERATION.RESIDUE.txt", with the iteration zero-filled in front. That will be easier to sort by name, and it will avoid trouble on OSes that have problems displaying file extensions longer than 3 characters (and anyhow WinXP Explorer won't show extensions by default, so you can't see the iteration number with the current format unless you have played with the WinXP settings). More importantly, it will save me the time of copy/pasting the screen output into a text file when I want to keep the residues for later use or a triple check. This is a pain in the back if I run it from a batch file: I cannot redirect the output because I want to see the screen too. You get my point. Having the residues in the file names of the checkpoint files would be great.
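
To make the proposed naming concrete, here is a sketch of how such a file name could be built. This illustrates the suggestion only and is not what CUDALucas does; the function name `checkpoint_name` and the zero-fill width of 8 digits are assumptions made for the example.

```python
def checkpoint_name(exponent: int, iteration: int, residue: int, width: int = 8) -> str:
    """Build "sEXPONENT.ITERATION.RESIDUE.txt" with a zero-filled iteration.

    The width of 8 digits is an arbitrary choice for this sketch.
    """
    return f"s{exponent}.{iteration:0{width}d}.{residue:016x}.txt"


# Residues taken from the M216091 output quoted earlier in the thread.
names = [
    checkpoint_name(216091, 30000, 0x540772C2ABB7833A),
    checkpoint_name(216091, 10000, 0x30247786758B8792),
    checkpoint_name(216091, 20000, 0x13E968BF40FDA4D7),
]

# Thanks to the zero-fill, a plain sort by name is also a sort by iteration.
for name in sorted(names):
    print(name)
```

Without the zero-fill, iteration 100000 would sort before 20000 in a by-name listing, which is the problem the padding avoids.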

This program is slowly becoming a masterpiece, day by day! I love it! Thanks for your wonderful work.
Old 2012-03-22, 20:54   #1067
apsen
 
Jun 2011

131 Posts

Quote:
Originally Posted by LaurV View Post
I can not redirect the output because I want to see the screen too.
Use cygwin
Code:
cmd | tee [-a] log.file
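
Where cygwin is not available, the same "screen plus log file" effect can be sketched in Python. This is an illustration, not a tested replacement for tee: the function name `tee_run` is made up, and it assumes the program prints its progress on stdout/stderr and flushes it regularly (a program that buffers its output when piped will appear to lag).

```python
import subprocess
import sys


def tee_run(cmd, log_path, append=True):
    """Run cmd, echoing each output line to the screen and to log_path."""
    mode = "a" if append else "w"
    with open(log_path, mode, encoding="utf-8") as log, subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    ) as proc:
        for line in proc.stdout:
            sys.stdout.write(line)  # still visible on the screen
            log.write(line)         # and kept for later use
    # Leaving the "with" block waits for the process to finish.
    return proc.returncode
```

Usage would look like `tee_run(["./CUDALucas", "216091"], "log.file")`, which roughly corresponds to `./CUDALucas 216091 | tee -a log.file`.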

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
