2020-03-27, 21:32 #1992
P90 years forever!
Aug 2002
Yeehaw, FL
3×2,281 Posts 
Sort of.
Short answer is "maximum exponent for the FFT length" minus "log2(3) = 1.585 fewer bits per word", which compensates for the mul-by-3 in P-1. Long answer is you can go a little higher than that, because the maximum exponent has some carry32 "head room" during PRP.

Here is the long answer: what you are worried about is the absolute value of the 32-bit carry exceeding 0x80000000. I studied 500K iterations of 24518003 in a 1.25M FFT (18.706 bits-per-word). The maximum carry32 was 0x32420000. Fine for PRP, not so for the mul-by-3 step in P-1.

Next (well, actually first) I tried to calculate a reasonable max exponent for 1.25M, 2.5M, 5M, 10M, 20M, 40M and 80M FFT lengths. We can store roughly 0.261 fewer bits per FFT word for each doubling of the FFT length. The formula for the expected max carry32 during the mul-by-3 P-1 step should be:

3 * 0x32420000 * 2^(BPW - 18.706) * 2^(log2(FFTLEN/1.25M) * .261)

If this max exceeds 0x70000000 I'd be worried. I'm thinking less than 0x67000000 should be very safe. It's all a matter of how much protection you want from an outlier value (much the same as protecting against outlier round-off errors).

Let's see if an example works. Going for a fairly safe max carry32 of 0x70000000 in a 5M FFT, where log2(5M/1.25M) = 2:

0x70000000 = 3 * 0x32420000 * 2^(BPW - 18.706) * 2^(2 * .261)
BPW = log2(0x70000000 / 3 / 0x32420000) + 18.706 - .522
BPW = 17.755
max exp for 5M FFT = 17.755 * 5242880 = 93.1M

Similarly, for a 5.5M FFT, max exp = 102.2M.
2020-03-27, 23:33 #1993
∂^{2}ω=0
Sep 2002
República de California
2·5,591 Posts 
Quote:
Code:
x *= wi_re;                   /* remove the DWT weight                */\
temp = DNINT(x);              /* round to nearest integer             */\
frac = fabs(x-temp);          /* roundoff error, for error checking   */\
temp = temp*prp_mult + cy;    /* apply prp_mult, add carry-in         */\
cy = DNINT(temp*baseinv[i]);  /* carry-out = round(temp / base)       */\
x = (temp-cy*base[i])*wt_re;  /* balanced-digit remainder, reweighted */\

Question: In the 4th of those 6 lines, both temp and cy can be of either sign, but temp, the rounded-but-not-yet-wordsize-normalized iFFT term, is nearly always going to be much larger in magnitude than the +cy carry-in. So we should be able to infer the expected sign of the next line's carry-out computation from it, to see whether, in your case, the integer result is overflowing into the sign bit, yes?
Quote:
Last fiddled with by ewmayer on 2020-03-27 at 23:36

2020-03-27, 23:39 #1994
P90 years forever!
Aug 2002
Yeehaw, FL
15273_{8} Posts 

2020-03-28, 00:16 #1995
∂^{2}ω=0
Sep 2002
República de California
25656_{8} Posts 
I don't see CARRY64 in the readme; is that an undocumented cmd-line flag?
One of my 2 runs just finished a PRP and started p-1 on the next expo. I killed it, deleted the savefiles and restarted the p-1 job @6144K. Assuming that finds no factor, will the ensuing PRP of the same expo automatically switch back to 5632K? Oh, small UI suggestion: -fft 6144 for the above gave an "FFT too small" error, i.e. the UI needs the raw FFT length, in this case 6291456. It was a little annoying to have the resulting run immediately echo something to the effect of "starting run with FFT length 6144K". Could the -fft option be fiddled to accept the FFT length in K?
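For reference, 6144K means 6144 * 1024 = 6291456 words. A parser that accepts either the raw length or a K/M suffix (K = 1024, M = 1048576) could look like the sketch below. The name parse_fft_len is hypothetical and this is not gpuowl's actual argument handling.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: "6291456", "6144K" and "6M" all name the same FFT
   length; fractional forms such as "5.5M" are allowed if they come out
   to a whole number of words. Returns 0 on malformed input. */
static unsigned long parse_fft_len(const char *s) {
    assert(s != NULL);
    char *end;
    double v = strtod(s, &end);
    if (end == s || !(v > 0)) return 0;         /* no number, <= 0, or NaN */
    double scale = 1.0;
    if (*end == 'K' || *end == 'k')      { scale = 1024.0;    end++; }
    else if (*end == 'M' || *end == 'm') { scale = 1048576.0; end++; }
    if (*end != '\0') return 0;                 /* trailing junk */
    double len = v * scale;
    if (len >= 4294967296.0) return 0;          /* out of range, incl. inf */
    unsigned long n = (unsigned long)len;
    return ((double)n == len) ? n : 0;          /* must be a whole number */
}
```

A plain "6144" still parses as 6144 words, so a caller would keep reporting "FFT too small" for it rather than silently guessing that K was meant.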
2020-03-28, 01:45 #1996
P90 years forever!
Aug 2002
Yeehaw, FL
1101010111011_{2} Posts 

2020-03-28, 01:48 #1997
P90 years forever!
Aug 2002
Yeehaw, FL
3×2,281 Posts 
Quote:
Quote:


2020-03-28, 11:48 #1998
"Mihai Preda"
Apr 2015
10000010100_{2} Posts 
I suppose you passed an -fft command-line argument. In that case it affects all tasks, so it affects the PRP as well (i.e. it will not switch back to the default FFT).
Last fiddled with by preda on 2020-03-28 at 11:49
2020-03-28, 16:49 #1999
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
7411_{8} Posts 
3 strikes you're out, game over until tomorrow
gpuowl could handle error cases more gracefully. Luckily I stumbled across this one while handling something else. Otherwise it could have cost nearly a day's throughput on that gpu.
Please consider commenting out a problematic worktodo line and continuing with the next one in such a case, instead of killing the run. Also, since config.txt optimization content is FFT-length dependent, what's optimal for one FFT length can be fatal for another. Please consider an FFT-length-specific enhancement to config.txt, as mentioned before.
Code:
2020-03-28 10:23:18 condorella/rx480 CC 94418041 / 94418041, 4d816a6edf6393__
2020-03-28 10:23:20 condorella/rx480 {"exponent":"94418041", "worktype":"PRP-3", "status":"C", "program":{"name":"gpuowl", "version":"v6.11-134-g1e0ce1d"}, "timestamp":"2020-03-28 15:23:20 UTC", "user":"kriesel", "computer":"condorella/rx480", "aid": "(redacted)", "fft-length":5242880, "res64":"4d816a6edf6393__", "residue-type":1, "errors":{"gerbicz":0 }}
2020-03-28 10:23:21 condorella/rx480 131500093 FFT 7168K: Width 256x4, Height 64x8, Middle 7; 17.92 bits/word
2020-03-28 10:23:22 condorella/rx480 OpenCL args "-DEXP=131500093u -DWIDTH=1024u -DSMALL_HEIGHT=512u -DMIDDLE=7u -DWEIGHT_STEP=0x8.7b964bd91a558p-3 -DIWEIGHT_STEP=0xf.16e489ea55fc8p-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -DAMDGPU=1 -DNO_ASM=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2020-03-28 10:23:25 condorella/rx480 OpenCL compilation in 3.68 s
2020-03-28 10:23:28 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003
2020-03-28 10:23:35 condorella/rx480 131500093 EE 800 0.00%; 5251 us/it; ETA 7d 23:49; 6781adfa7991c92a (check 2.29s)
2020-03-28 10:23:37 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003
2020-03-28 10:23:44 condorella/rx480 131500093 EE 800 0.00%; 5251 us/it; ETA 7d 23:48; 6781adfa7991c92a (check 2.29s) 1 errors
2020-03-28 10:23:46 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003
2020-03-28 10:23:53 condorella/rx480 131500093 EE 800 0.00%; 5255 us/it; ETA 7d 23:58; 6781adfa7991c92a (check 2.30s) 2 errors
2020-03-28 10:23:53 condorella/rx480 3 sequential errors, will stop.
2020-03-28 10:23:53 condorella/rx480 Exiting because "too many errors"
2020-03-28 10:23:53 condorella/rx480 Bye
C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>g611
C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>title gpuowl-v6.11-134-g1e0ce1d/rx480
C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>gpuowl-win
2020-03-28 11:27:31 gpuowl v6.11-134-g1e0ce1d
Last fiddled with by kriesel on 2020-03-28 at 16:49
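The skip-and-continue behavior requested in this post could look roughly like the sketch below. It assumes the worktodo entries are already held in memory and that a leading '#' marks a comment line; both are illustrative assumptions, and skip_failed_task is a hypothetical name, not a gpuowl function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_LINE 256

/* Assumption for this sketch: '#' at column 0 marks a comment. */
static int is_comment(const char *line) { return line[0] == '#'; }

/* On "too many errors": comment out the failing entry instead of exiting,
   then return the index of the next runnable entry, or -1 if none remain. */
static int skip_failed_task(char lines[][MAX_LINE], int n, int failed) {
    assert(failed >= 0 && failed < n);
    char tmp[MAX_LINE];
    snprintf(tmp, sizeof tmp, "#%s", lines[failed]);  /* may truncate */
    strcpy(lines[failed], tmp);
    for (int i = failed + 1; i < n; i++)
        if (!is_comment(lines[i])) return i;
    return -1;
}
```

Commenting the line out, rather than deleting it, keeps the failed assignment visible so it can be retried later, for example after adjusting the FFT-length-dependent settings in config.txt.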
2020-03-28, 20:49 #2000
∂^{2}ω=0
Sep 2002
República de California
10101110101110_{2} Posts 
@George: thanks, I missed the K and M suffix options in my perusal of the readme.
Quote:
All runs now restarted using -use CARRY64; thanks, George. Also, you'll be pleased to hear that after the latest BSOD-style crash of the Haswell system which hosts my Radeon VII, I finally got round to trying the disable-C-states trick you recommend in the BIOS Overclock submenu. It seems to work like a charm: the system has been rock-stable since, uptime 4 days and counting, which is really long for this system.

More details on what happens for me with -use CARRY64, in the context of 2 side-by-side PRP runs @5632K:
o Initially, each run is going at a steady 1386 us/iter at my sclk=4 setting;
o Stop run 0 & restart with -use CARRY64: after a few more 200K-iter intervals, both jobs are up to 1402 us/iter, which seems weird since only one is using the slower-but-safe carry option;
o Stop run 1 & restart with -use CARRY64: after a few more 200K-iter intervals, both jobs are up to 1420 us/iter, a 2.5% hit to throughput.

Since the bug only affects p-1 runs, would it be difficult to tweak things so that a user-invoked -use CARRY64 is only operative in p-1 testing? Or maybe allow separate specification by job type, e.g. -use CARRY64 means both worktypes, -pm1 CARRY64 means apply to p-1, -prp CARRY64 means apply to prp runs?
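A rough sketch of how the per-worktype proposal in this post might be resolved for each task. The -pm1 CARRY64 and -prp CARRY64 forms are the poster's proposal, not existing gpuowl options, and carry64_opts and want_carry64 are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { WORK_PRP, WORK_PM1 } work_type;

/* Hypothetical option state mirroring the proposal:
   use_all  ~ -use CARRY64 (every worktype)
   pm1_only ~ -pm1 CARRY64 (proposed, P-1 only)
   prp_only ~ -prp CARRY64 (proposed, PRP only) */
typedef struct {
    bool use_all;
    bool pm1_only;
    bool prp_only;
} carry64_opts;

/* Decide per task whether the slower-but-safe 64-bit carry is enabled. */
static bool want_carry64(carry64_opts o, work_type t) {
    assert(t == WORK_PRP || t == WORK_PM1);
    if (o.use_all) return true;
    return (t == WORK_PM1) ? o.pm1_only : o.prp_only;
}
```

With only the P-1 form set, PRP tasks would keep the faster 32-bit carry and avoid the roughly 2.5% slowdown reported in the measurements.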

2020-03-29, 09:04 #2001
"Mihai Preda"
Apr 2015
414_{16} Posts 
Quote:


2020-03-29, 10:10 #2002
"Mihai Preda"
Apr 2015
2^{2}×3^{2}×29 Posts 
But Ken, what is the appropriate action to take on error?
Let's say that during P-1 stage 1, a residue == 0 is detected. This is not the result of an inappropriate FFT size; it indicates a hardware error. But what to do in this situation, given there's no way to check P-1? Basically, what makes sense is for gpuowl to simply stop doing any P-1 (on that GPU). It can't reliably roll back to any trusted point. Assuming that a residue != 0 is a correct one, in a situation where the GPU sometimes produces res == 0, is not a good way to go. I would just discard the whole test as corrupted.
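A minimal sketch of that policy, using hypothetical names (gpu_state, check_pm1_stage1) and assuming the residue is available as an array of 32-bit words. An all-zero residue can never be a genuine stage-1 value: stage 1 computes a power of 3 modulo M_p = 2^p - 1, and for odd p, M_p is 1 mod 3, so 3 is invertible mod M_p and no power of 3 reduces to 0. As the post says, a nonzero residue proves nothing, so this only catches the zero case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { PM1_CONTINUE, PM1_ABORT } pm1_action;

typedef struct {
    bool pm1_disabled;  /* hypothetical per-GPU flag: no more P-1 here */
} gpu_state;

static bool residue_is_zero(const uint32_t *words, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (words[i] != 0) return false;
    return true;
}

/* Zero stage-1 residue => hardware error with no trusted rollback point:
   discard the whole test and stop doing P-1 on this GPU. */
static pm1_action check_pm1_stage1(gpu_state *g, const uint32_t *words, size_t n) {
    assert(g != NULL && words != NULL && n > 0);
    if (!residue_is_zero(words, n)) return PM1_CONTINUE;
    g->pm1_disabled = true;
    return PM1_ABORT;
}
```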