mersenneforum.org GpuOwl 7.x

2020-10-14, 10:48   #89
UBR47K

Aug 2015

40₁₆ Posts

another proof failure

https://www.mersenne.org/report_expo...8980089&full=1

This AMD Fury X card consistently fails to generate PRP proofs (but it seems to produce valid RES64). Self-verification fails:

Code:
$ ./gpuowl -verify proof/108980089-9.proof
2020-10-14 11:21:16 gpuowl v7.0-25-g1cbd87d-dirty
2020-10-14 11:21:16 config: -proof 9
2020-10-14 11:21:16 config: -maxAlloc 3584
2020-10-14 11:21:16 config: -verify proof/108980089-9.proof
2020-10-14 11:21:16 device 0, unique id ''
2020-10-14 11:21:16 gfx803-0 0 FFT: 6M 1K:12:256 (17.32 bpw)
2020-10-14 11:21:17 gfx803-0 0 OpenCL args "-DEXP=108980089u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=12u -DAMDGPU=1 -DCARRY64=1 -DCARRYM64=1 -DWEIGHT_STEP_MINUS_1=0x1.333492ce02374p-1 -DIWEIGHT_STEP_MINUS_1=-0x1.800112b07bd55p-2 -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2020-10-14 11:21:20 gfx803-0 0 /tmp/comgr-8d0411/input/CompileSource:50:9: warning: GpuOwl requires OpenCL 200, found 120
#pragma message "GpuOwl requires OpenCL 200, found " STR(__OPENCL_VERSION__)
^
1 warning generated.
2020-10-14 11:21:20 gfx803-0 0 OpenCL compilation in 2.78 s
2020-10-14 11:21:20 gfx803-0 0 proof: doing 136 iterations
2020-10-14 11:21:29 gfx803-0 0 proof verification: doing 212852 iterations
2020-10-14 11:22:18 gfx803-0 0 20000 / 212852, 2414 us/it
2020-10-14 11:23:06 gfx803-0 0 40000 / 212852, 2414 us/it
2020-10-14 11:23:54 gfx803-0 0 60000 / 212852, 2425 us/it
2020-10-14 11:24:43 gfx803-0 0 80000 / 212852, 2414 us/it
2020-10-14 11:25:31 gfx803-0 0 100000 / 212852, 2414 us/it
2020-10-14 11:26:19 gfx803-0 0 120000 / 212852, 2414 us/it
2020-10-14 11:27:08 gfx803-0 0 140000 / 212852, 2419 us/it
2020-10-14 11:27:56 gfx803-0 0 160000 / 212852, 2414 us/it
2020-10-14 11:28:44 gfx803-0 0 180000 / 212852, 2414 us/it
2020-10-14 11:29:33 gfx803-0 0 200000 / 212852, 2414 us/it
2020-10-14 11:30:04 gfx803-0 0 proof: invalid (364e0402bdbXXXXX expected ebe899f33efXXXXX)
2020-10-14 11:30:04 gfx803-0 0 proof 'proof/108980089-9.proof' failed
2020-10-14 11:30:04 gfx803-0 Bye

Last fiddled with by UBR47K on 2020-10-14 at 10:49
2020-10-14, 11:28   #90
preda

"Mihai Preda"
Apr 2015

2⁴·83 Posts

Quote:
 Originally Posted by UBR47K https://www.mersenne.org/report_expo...8980089&full=1 This AMD Fury X card consistently fails to generate PRP proofs (but it seems to produce valid RES64). Self verification fails: [..]
Thanks. I don't know the reason for this yet, and I can't seem to reproduce it.
Does that GPU produce any errors (EE) during the PRP? I.e., is the PRP 100% reliable, or are there sometimes errors and retries?

One approach I'm thinking of to tackle this:
- for power >= 10, do automatic local proof verification before upload (because at power 10 it becomes cheap enough not to matter) -- this would make sure the server no longer sees invalid proofs in this situation.
- if you have enough free disk space, enable power=10.

How often do you get such invalid proofs -- always, or only sometimes?
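For intuition on why local verification gets cheap at higher proof powers: certifying a power-p proof takes roughly exponent/2^p squarings, so each extra level halves the cost. A minimal sketch (the function name and exact ceiling rounding are assumptions, chosen to match the iteration count in the log above):

```python
def proof_verify_iters(exponent: int, power: int) -> int:
    # Roughly exponent / 2^power squarings (ceiling division);
    # the exact rounding is an assumption.
    return (exponent + (1 << power) - 1) >> power

# Matches "proof verification: doing 212852 iterations" in the log above:
print(proof_verify_iters(108980089, 9))   # → 212852
# At power 10 the verification work is roughly halved:
print(proof_verify_iters(108980089, 10))  # → 106426
```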

2020-10-14, 11:36   #91
preda

"Mihai Preda"
Apr 2015

2⁴×83 Posts

Worktype for PRP + P-1 in primenet.py: PRP_P1

The script gpuowl/tools/primenet.py has a new work-type for "PRP that needs P-1", which is what you want for the merged PRP + P-1; it's called "PRP_P1":

Code:
primenet.py -w PRP_P1

If you run it with only -w PRP (the old way) you may get assignments that have already had P-1 done. That's fine as long as you don't trigger another P-1 on them, which would be mostly a waste (because of the low probability of finding a factor in the "additional" P-1).

PS: I don't think this worktype is available on the manual assignment web page yet.

Last fiddled with by preda on 2020-10-14 at 11:37
2020-10-14, 11:46   #92
UBR47K

Aug 2015

2⁶ Posts

Quote:
 Originally Posted by preda Thanks. I don't know yet the reason for this. Also I can't seem to reproduce. Does that GPU produce any errors during the PRP? (EE) (i.e. is the PRP 100% reliable, or sometimes there are errors and retries?)
No errors or retries during PRP.
My Radeon VII seems to generate correct PRP proofs, only this particular card is strange.

Quote:
 Originally Posted by preda One approach I'm thinking of to tackle this is: - for power >= 10, do automatic local proof verification before upload (because at power 10 it becomes cheap enough to not matter) -- this would make sure that the server does not see invalid proofs anymore in this situation. - if you have enough free disk space, enable power=10. How often do you get such invalid proofs -- 100%? sometimes?
All proofs made with this Fury X are invalid.

2020-10-16, 04:48   #93
preda

"Mihai Preda"
Apr 2015

1328₁₀ Posts

Quote:
 Originally Posted by kriesel The % complete is inconsistent between a stop and a resume. The P2 iterations apparently don't all get saved at a stop and resume.
Some of these should be fixed now:
- consistent % progress across restarts.

Ctrl-C during P2 behaves as follows:
- the first Ctrl-C triggers a GCD, followed by a save and exit about 30s later;
- a second Ctrl-C exits abruptly, and you lose the progress since the last GCD.
(As you can see, a P2 save only takes place after a GCD. The savefile essentially records "GCD was done up to this point".)
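The two-stage Ctrl-C behavior described above can be sketched with a signal handler along these lines (a hypothetical Python illustration, not GpuOwl's actual C++ implementation; all names are made up):

```python
import signal
import sys

class P2InterruptHandler:
    """First Ctrl-C requests a GCD (then save and exit); second exits abruptly."""

    def __init__(self):
        self.interrupts = 0
        self.gcd_requested = False

    def on_sigint(self, signum=None, frame=None):
        self.interrupts += 1
        if self.interrupts == 1:
            # First Ctrl-C: flag a GCD; the P2 loop saves and exits after it.
            self.gcd_requested = True
        else:
            # Second Ctrl-C: abrupt exit -- progress since the last GCD is lost.
            sys.exit(1)

handler = P2InterruptHandler()
signal.signal(signal.SIGINT, handler.on_sigint)
```

The point of the design is that a save is only meaningful after a GCD, so the graceful path routes through one before writing the savefile.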

2020-10-16, 04:52   #94
preda

"Mihai Preda"
Apr 2015

2⁴×83 Posts

P2 changes

A new prime-pairing algorithm for P2 has been implemented; P2 is now significantly faster. This required a bump of the P2 savefile version, so don't switch versions in the middle of P2 -- wait until P2 finishes. Also, because of the changes, it's no longer possible to change the B2 bound during P2.

PS: The default ratio B2/B1 is now 30, because of the lower relative cost of P2. This may bite you when upgrading in the middle of a PRP test (after P2 is complete), as you'd see the B2 bound change because of this ratio. In that case, simply force the old B2 manually until the end of the PRP.

Last fiddled with by preda on 2020-10-16 at 04:59
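The only number carried over from the post above is the new default ratio of 30; the helper name below is hypothetical, just to make the upgrade pitfall concrete:

```python
def default_b2(b1: int, ratio: int = 30) -> int:
    """New default stage-2 bound: B2 = ratio * B1 (ratio 30 per the post above)."""
    return ratio * b1

# E.g. with B1 = 1,500,000 the default B2 becomes 45,000,000:
print(default_b2(1_500_000))  # → 45000000
```

If you upgrade mid-PRP and want the pre-upgrade behavior, pass the old B2 explicitly instead of relying on this default.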
2020-10-16, 05:58   #95
petrw1
1976 Toyota Corona years forever!

"Wayne"
Nov 2006

7³·13 Posts

Quote:
 Originally Posted by preda A new prime-pairing algorithm for P2 has been implemented, P2 is now (significantly) faster.
Probably a dumb question, but is there any way to get this enhancement into the previous stand-alone P-1 version?

Thanks

2020-10-16, 06:56   #96
preda

"Mihai Preda"
Apr 2015

2⁴·83 Posts

Quote:
 Originally Posted by petrw1 Probably a dumb question but is there any way to get this enhancement in the previous stand-alone P-1 version? Thanks
It should be possible to port it over, yes: take the output of stage 1 and plug it into the new stage 2. Of course things are a bit hairy, so I assume it would still require significant work.

It's not on my personal to-do list, though. I also don't see much reason for the old-style P-1, which is why I'm not particularly motivated in that direction.

So unless somebody steps up, I'm afraid the answer is "theoretically yes, practically no".

PS: I don't want this to sound like I'm against it -- I'm not. It's just that I already have a ton of things lined up on an imaginary to-do list, and I'm working through them.

Last fiddled with by preda on 2020-10-16 at 07:00

2020-10-16, 06:59   #97
preda

"Mihai Preda"
Apr 2015

1328₁₀ Posts

Quote:
 Originally Posted by kriesel Note, one of the awkward things about/during P2 is there is no ETA (for the more likely NF case, or the less likely F case).
P2 ETA added. It should be quite accurate, and the P2 progress % is very accurate too.
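An ETA of this kind is just the remaining iterations times the measured per-iteration cost. A sketch (the function name is made up; the numbers are taken from the verification log in post #89, whose last progress line indeed preceded completion by about 31 seconds):

```python
def eta_seconds(done: int, total: int, us_per_it: float) -> float:
    """Estimated time to completion from a measured per-iteration cost in us."""
    return (total - done) * us_per_it / 1_000_000

# From post #89's log: 200000 of 212852 iterations done at 2414 us/it:
print(round(eta_seconds(200000, 212852, 2414)))  # → 31
```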

2020-10-16, 12:42   #98
preda

"Mihai Preda"
Apr 2015

2460₈ Posts

P2 hardening

Let me start with a brief description of how P2 works:

There is a large set of precomputed buffers, as many as GPU memory allows (a 16GB GPU at the wavefront accommodates a bit over 300 buffers). These buffers are computed once, at the start of P2, and never change afterwards (but they are read regularly). There is also a set of 3 walking big-step buffers that are updated ("stepped") once per block (the range of primes to be covered is split into blocks of size "D", where D=330 usually). P2 repeatedly selects one of the precomputed buffers, subtracts it from the big-step buffer, and multiplies the result into an accumulator "Acc".

Errors and checks. Errors can happen in these places:
a) error in the initial computation of the precomputed buffers
b) precomputed buffers getting mutated in GPU memory (bit-flip)
c) error in the big-step buffer's initial computation or increment
d) error in the "Acc" multiplication

Checks:
a) The initial buffer computation is done twice and the results compared. This does not protect against programmer error, so one final value of the precomputed buffers is also computed in a different way -- if the values coincide (which would be an unlikely coincidence otherwise), that gives a degree of confidence that the algorithm is good.
b) Once we are confident in the values of the precomputed buffers, they are snapshotted with a very simple checksum (just a sum64 of each buffer). Before each GCD, we recompute and compare the checksums to verify that the buffers didn't mutate under our feet.
c) The big-step value can also be independently computed at any point with a simple exponentiation. We do this before the GCD and compare the values.
d) I don't have a check on the Acc MUL, but the accumulator has a self-healing behavior: an error in the accumulator multiply affects only the primes from the last GCD up to the error location, not afterwards. So the effect of an error in the Acc MUL is self-contained.

The checks for b) and c) run before each P2 GCD -- they are very fast, under 1s in total. Barring programmer error, this set of P2 error checks allows running large (huge) P2 with a lower risk of an early hardware error ruining all the rest ("doing useless work").

Last fiddled with by preda on 2020-10-16 at 12:43
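The buffer-snapshot check described in the post above can be illustrated with a toy sum64 (a sketch of the idea only; GpuOwl's actual checksum details may differ):

```python
MASK64 = (1 << 64) - 1

def sum64(words):
    """Wrapping 64-bit sum of a buffer's words -- a very cheap checksum."""
    s = 0
    for w in words:
        s = (s + w) & MASK64
    return s

# Snapshot the checksums once the precomputed buffers are trusted...
buffers = [[1, 2, 3], [0xFFFFFFFFFFFFFFFF, 1]]
snapshot = [sum64(b) for b in buffers]

# ...then recompute before each GCD; a mismatch means a buffer mutated:
buffers[1][0] ^= 1 << 17  # simulate a bit-flip in GPU memory
assert [sum64(b) for b in buffers] != snapshot
```

A plain sum misses some multi-bit error patterns, but it is essentially free to compute, which is the point of using it here.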
2020-10-18, 08:35   #99
frmky

Jul 2003
So Cal

2²·3³·19 Posts

Thought I'd try this on an NVIDIA Tesla K20. I compiled using clang++-9 on Ubuntu 18.04 LTS, since g++-8 failed. Using

Code:
./gpuowl -prp 6972649 -b1 1500000 -b2 10000000 -maxAlloc 4.5G

I got

Code:
2020-10-18 01:23:20 Tesla K20c-0 6972649 2150000 30.83% 3fa14f3f8a7af59c
2020-10-18 01:23:24 Tesla K20c-0 6972649 2160000 30.98% 352eddd00f6ce8b0
2020-10-18 01:23:29 Tesla K20c-0 6972649 P1(1.5M) releasing 3050 buffers
2020-10-18 01:23:29 Tesla K20c-0 6972649 Released memory lock 'memlock-0'
2020-10-18 01:23:29 Tesla K20c-0 6972649 OK 2164500 31.04% b7b43bcc9edb8fcf 385 us/it; ETA 00:31
2020-10-18 01:23:29 Tesla K20c-0 6972649 P1 2164500 starting Jacobi check
2020-10-18 01:23:31 Tesla K20c-0 6972649 P1 Jacobi check OK
2020-10-18 01:23:31 Tesla K20c-0 6972649 OK 2168500 31.10% a2dae35546023a65 396 us/it; ETA 00:32
2020-10-18 01:23:31 Tesla K20c-0 6972649 P2(1.5M,10M) D=330, nBuf=1522
2020-10-18 01:23:31 Tesla K20c-0 6972649 P2(1.5M,10M) Generating P2 plan, please wait..
gpuowl: Pm1Plan.cpp:212: void Pm1Plan::scan(const vector &, u32, vector &, Fun) [Fun = (lambda at Pm1Plan.cpp:269:40)]: Assertion `!blockBits[pos]' failed.
Aborted (core dumped)

