#89
Aug 2015
2²×17 Posts
https://www.mersenne.org/report_expo...8980089&full=1
This AMD Fury X card consistently fails to generate PRP proofs (though it seems to produce valid RES64 values). Self-verification fails:
Code:
$ ./gpuowl -verify proof/108980089-9.proof
2020-10-14 11:21:16 gpuowl v7.0-25-g1cbd87d-dirty
2020-10-14 11:21:16 config: -proof 9
2020-10-14 11:21:16 config: -maxAlloc 3584
2020-10-14 11:21:16 config: -verify proof/108980089-9.proof
2020-10-14 11:21:16 device 0, unique id ''
2020-10-14 11:21:16 gfx803-0 0 FFT: 6M 1K:12:256 (17.32 bpw)
2020-10-14 11:21:17 gfx803-0 0 OpenCL args "-DEXP=108980089u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=12u -DAMDGPU=1 -DCARRY64=1 -DCARRYM64=1 -DWEIGHT_STEP_MINUS_1=0x1.333492ce02374p-1 -DIWEIGHT_STEP_MINUS_1=-0x1.800112b07bd55p-2 -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2020-10-14 11:21:20 gfx803-0 0 /tmp/comgr-8d0411/input/CompileSource:50:9: warning: GpuOwl requires OpenCL 200, found 120
#pragma message "GpuOwl requires OpenCL 200, found " STR(__OPENCL_VERSION__)
^
1 warning generated.
2020-10-14 11:21:20 gfx803-0 0 OpenCL compilation in 2.78 s
2020-10-14 11:21:20 gfx803-0 0 proof: doing 136 iterations
2020-10-14 11:21:29 gfx803-0 0 proof verification: doing 212852 iterations
2020-10-14 11:22:18 gfx803-0 0 20000 / 212852, 2414 us/it
2020-10-14 11:23:06 gfx803-0 0 40000 / 212852, 2414 us/it
2020-10-14 11:23:54 gfx803-0 0 60000 / 212852, 2425 us/it
2020-10-14 11:24:43 gfx803-0 0 80000 / 212852, 2414 us/it
2020-10-14 11:25:31 gfx803-0 0 100000 / 212852, 2414 us/it
2020-10-14 11:26:19 gfx803-0 0 120000 / 212852, 2414 us/it
2020-10-14 11:27:08 gfx803-0 0 140000 / 212852, 2419 us/it
2020-10-14 11:27:56 gfx803-0 0 160000 / 212852, 2414 us/it
2020-10-14 11:28:44 gfx803-0 0 180000 / 212852, 2414 us/it
2020-10-14 11:29:33 gfx803-0 0 200000 / 212852, 2414 us/it
2020-10-14 11:30:04 gfx803-0 0 proof: invalid (364e0402bdbXXXXX expected ebe899f33efXXXXX)
2020-10-14 11:30:04 gfx803-0 0 proof 'proof/108980089-9.proof' failed
2020-10-14 11:30:04 gfx803-0 Bye
Last fiddled with by UBR47K on 2020-10-14 at 10:49
#90
"Mihai Preda"
Apr 2015
3×457 Posts
Quote:
Does that GPU produce any errors (EE) during the PRP? I.e., is the PRP 100% reliable, or are there sometimes errors and retries?

One approach I'm thinking of to tackle this:
- for power >= 10, do automatic local proof verification before upload (because at power 10 it becomes cheap enough not to matter) -- this would make sure the server no longer sees invalid proofs in this situation;
- if you have enough free disk space, enable power=10.

How often do you get such invalid proofs -- 100% of the time? Sometimes?
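For context on why higher power makes local verification cheap: a power-p PRP proof is verified with roughly exponent/2^p squarings, so each extra power level halves the verifier's work. A quick sanity check against the log in post #89 (this formula is my reading of the proof scheme, as an approximation -- it ignores the small per-level middle computations):

```python
from math import ceil

# Approximate verification cost of a power-p PRP proof: about
# exponent / 2^p squarings. Illustrative approximation only.

def verify_iters(exponent: int, power: int) -> int:
    return ceil(exponent / 2**power)

E = 108_980_089                 # exponent from the failing proof above
print(verify_iters(E, 9))       # 212852 -- matches "doing 212852 iterations" in the log
print(verify_iters(E, 10))      # about half the work at power 10
```

At ~2414 us/it that is roughly 8.5 minutes of verification at power 9, and half that at power 10.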
#91
"Mihai Preda"
Apr 2015
3×457 Posts
The script gpuowl/tools/primenet.py has a new work-type for "PRP that needs P-1", which is what you want for the merged PRP + P-1; it's called "PRP_P1":

primenet.py -w PRP_P1 etc.

If you run it with only -w PRP (the old way), you may get assignments that have already had P-1 done. That's fine as long as you don't trigger another P-1 on them, which would be mostly a waste (because of the low probability of finding a factor in the "additional" P-1).

PS: I don't think this work-type is available on the manual assignment web page yet.

Last fiddled with by preda on 2020-10-14 at 11:37
#92
Aug 2015
2²×17 Posts
Quote:
My Radeon VII seems to generate correct PRP proofs; only this particular card is strange.
#93
"Mihai Preda"
Apr 2015
3·457 Posts
Quote:
- consistent % progress across restart.

Ctrl-C during P2 behaves as follows:
- the first Ctrl-C triggers a GCD, which is followed by a save and exit about 30 s later;
- a second Ctrl-C exits abruptly, and you lose the progress since the last GCD.

(As you see, a P2 save only takes place after a GCD. The savefile basically records "GCD was done up to this point".)
#94
"Mihai Preda"
Apr 2015
3·457 Posts
A new prime-pairing algorithm for P2 has been implemented; P2 is now significantly faster.

This required a bump of the P2 savefile version, so don't switch versions in the middle of P2 -- wait until P2 finishes. Also, because of the changes, it's no longer possible to change the B2 bound during P2.

PS: the default ratio B2/B1 is now 30, because of the lower relative cost of P2. This may bite you when upgrading in the middle of a PRP test (after P2 is complete), as you'd see the B2 bound change because of this ratio. In that case, simply force the old B2 manually until the end of the PRP.

Last fiddled with by preda on 2020-10-16 at 04:59
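To make the new default concrete, here is the implied B2 for a few common B1 values. This is just the stated ratio applied directly; any rounding or clamping GpuOwl itself applies may differ:

```python
# Default B2 derived from the new B2/B1 ratio of 30 mentioned above.
# Illustrative only; GpuOwl's actual rounding of the bound may differ.

DEFAULT_B2_RATIO = 30

def default_b2(b1: int) -> int:
    return b1 * DEFAULT_B2_RATIO

for b1 in (500_000, 1_000_000, 1_500_000):
    print(f"B1={b1:>9} -> default B2={default_b2(b1):>10}")
```

So a test started with an explicit lower B2 (say B1=1.5M, B2=10M) would, after upgrading, suddenly default to a much larger B2 (45M here) unless the old bound is forced.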
#95
1976 Toyota Corona years forever!
"Wayne"
Nov 2006
Saskatchewan, Canada
2²·7·167 Posts
#96
"Mihai Preda"
Apr 2015
2533₈ Posts
Quote:
It's not on my personal to-do list, though. I also don't see a reason for the old-style P-1, which is why I'm not particularly motivated in that direction. So, unless somebody steps up, I guess the answer is unfortunately "theoretically yes, practically no".

PS: I don't want this to sound like I'm against it (I'm not); it's just that I already have a ton of things lined up on an imaginary to-do list, and I'm working through them.

Last fiddled with by preda on 2020-10-16 at 07:00
#97
"Mihai Preda"
Apr 2015
3·457 Posts
#98
"Mihai Preda"
Apr 2015
3·457 Posts
Let me start with a brief description of how P2 works:

There is a large set of precomputed buffers, as many as GPU memory allows (a 16 GB GPU at the wavefront accommodates a bit over 300 buffers). These buffers are computed once, at the start of P2, and never change afterwards (but they are read regularly). There is also a set of 3 walking big-step buffers, which are updated ("stepped") once per block (the range of primes to be covered is split into blocks of size "D", where D=330 usually). P2 repeatedly selects one of the precomputed buffers, subtracts it from the big-step buffer, and multiplies the result into an accumulator "Acc".

Errors and checks. Errors can happen in these places:
a) error in the initial computation of the precomputed buffers
b) precomputed buffers get mutated in GPU memory (bit-flip)
c) error in the big-step buffer's initial computation or increment
d) error in the "Acc" multiplication

Checks:
a) The initial buffer computation is done twice and the results compared. This does not protect against programmer error, so one final value of the precomputed buffers is also computed in a different way -- if the values coincide (which would be an unlikely coincidence otherwise), this gives a degree of confidence that the algorithm is good.
b) Once we are confident in the values of the precomputed buffers, they are snapshotted with a very simple checksum (just a sum64 of each buffer). Before each GCD, we recompute and compare the checksums to verify that the buffers didn't mutate under our feet.
c) The big-step value can also be independently computed at any point with a simple exponentiation. We do this before each GCD and compare the values.
d) I don't have a check on the Acc MUL, but the accumulator has a self-healing behavior: an error in the accumulator multiply affects only the primes from the last GCD up to the error location, not afterwards. So the effect of an error in the Acc MUL is self-contained.

The checks for b) and c) are run before each P2 GCD -- they are very fast, under 1 s in total. Barring programmer error, this set of P2 error checks allows running large (huge) P2 with a lower risk of "doing useless work" because an early hardware error ruins all the rest.

Last fiddled with by preda on 2020-10-16 at 12:43
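The snapshot-and-recheck idea in check b) can be sketched in a few lines. This is an illustration in Python, not GpuOwl's actual C++/OpenCL code; buffers here are just lists of 64-bit words, and the function names are mine:

```python
# Illustrative sketch of check (b): snapshot each precomputed buffer with a
# simple 64-bit sum, then re-verify before each GCD to detect bit-flips.
# NOT GpuOwl's actual code; buffers are modeled as lists of 64-bit words.

MASK64 = (1 << 64) - 1

def sum64(buf):
    """Very simple checksum: sum of all words, reduced mod 2^64."""
    return sum(buf) & MASK64

def snapshot(buffers):
    """Taken once, after the precomputed buffers are validated."""
    return [sum64(b) for b in buffers]

def verify(buffers, checksums):
    """Run before each GCD: detect any buffer that mutated in memory."""
    return all(sum64(b) == c for b, c in zip(buffers, checksums))

buffers = [[1, 2, 3], [10, 20, 30]]
saved = snapshot(buffers)
assert verify(buffers, saved)       # buffers intact

buffers[1][0] ^= 1 << 17            # simulate a bit-flip in GPU memory
assert not verify(buffers, saved)   # mutation detected before the next GCD
```

A plain mod-2^64 sum catches any single bit-flip (each bit position changes the sum), which is exactly the hardware-mutation scenario the check targets; it is not meant to defend against adversarial corruption.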
#99
Jul 2003
So Cal
83A₁₆ Posts
Thought I'd try this on an NVIDIA Tesla K20. I compiled using clang++-9 on Ubuntu 18.04 LTS since g++-8 failed.

Using ./gpuowl -prp 6972649 -b1 1500000 -b2 10000000 -maxAlloc 4.5G I got:
Code:
2020-10-18 01:23:20 Tesla K20c-0 6972649 2150000 30.83% 3fa14f3f8a7af59c
2020-10-18 01:23:24 Tesla K20c-0 6972649 2160000 30.98% 352eddd00f6ce8b0
2020-10-18 01:23:29 Tesla K20c-0 6972649 P1(1.5M) releasing 3050 buffers
2020-10-18 01:23:29 Tesla K20c-0 6972649 Released memory lock 'memlock-0'
2020-10-18 01:23:29 Tesla K20c-0 6972649 OK 2164500 31.04% b7b43bcc9edb8fcf 385 us/it; ETA 00:31
2020-10-18 01:23:29 Tesla K20c-0 6972649 P1 2164500 starting Jacobi check
2020-10-18 01:23:31 Tesla K20c-0 6972649 P1 Jacobi check OK
2020-10-18 01:23:31 Tesla K20c-0 6972649 OK 2168500 31.10% a2dae35546023a65 396 us/it; ETA 00:32
2020-10-18 01:23:31 Tesla K20c-0 6972649 P2(1.5M,10M) D=330, nBuf=1522
2020-10-18 01:23:31 Tesla K20c-0 6972649 P2(1.5M,10M) Generating P2 plan, please wait..
gpuowl: Pm1Plan.cpp:212: void Pm1Plan::scan(const vector<bool> &, u32, vector<Pm1Plan::BitBlock> &, Fun) [Fun = (lambda at Pm1Plan.cpp:269:40)]: Assertion `!blockBits[pos]' failed.
Aborted (core dumped)
Similar Threads
| Thread | Thread Starter | Forum | Replies | Last Post |
| GpuOwl PRP-Proof changes | preda | GpuOwl | 20 | 2020-10-17 06:51 |
| gpuowl: runtime error | SELROC | GpuOwl | 59 | 2020-10-02 03:56 |
| gpuOWL for Wagstaff | GP2 | GpuOwl | 22 | 2020-06-13 16:57 |
| gpuowl tuning | M344587487 | GpuOwl | 14 | 2018-12-29 08:11 |
| How to interface gpuOwl with PrimeNet | preda | PrimeNet | 2 | 2017-10-07 21:32 |