#1 |
∂²ω=0
Sep 2002
República de California
11755₁₀ Posts |
Mlucas v19 has gone live. Use this thread to report bugs, build issues, and for any other related discussion.
Last fiddled with by Uncwilly on 2020-11-28 at 20:51 |
#2 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
7381₁₀ Posts |
Haven't tried it yet, but congrats on getting it out.
#3 |
∂²ω=0
Sep 2002
República de California
10110111101011₂ Posts |
Thanks. Meanwhile I have discovered a bug in the new PRP-handling logic, of the kind I expected further testing to shake out. This one specifically affects exponents very close to an FFT-length breakover point: I discovered it when I fired up a first-time PRP test of M96365419, which is very close to the 5120K-FFT exponent limit. It turns out that the Gerbicz-check-related breaking of the usual checkpointing interval into multiple smaller subintervals (at the end of each of which we update the G-checkproduct) breaks the is-roundoff-error-reproducible-on-retry logic. It's a simple fix. I just uploaded updated versions of the release tarball and ARM prebuilt binaries, but folks who previously built and are running the Dec 1 code snapshot can use the simpler expedient of an incremental rebuild-and-relink of the single attached sourcefile.
Last fiddled with by ewmayer on 2019-12-03 at 22:00 |
#4 | |
∂²ω=0
Sep 2002
República de California
5×2,351 Posts |
An interesting subtheme re. the newly added PRP assignment-type support and the Gerbicz check: shortly after the initial v19 release, I got e-mail from George about the importance of adding redundancy to the G-checking mechanism:
Quote:
Well, let's review how my code does things, say, starting from a post-interrupt savefile-read:

1. Read PRP residue into array a[], accumulated G-checkproduct into b[]. Both of these residues are written to savefiles together with associated full-residue checksums - I use the Selfridge-Hurwitz residues (full-length residue mod (2^35-1) and mod (2^36-1)) for that - and the checksums compared with those recomputed during the read-from-file.
2. Do an iteration interval leading up to the next savefile update, 10k or 100k mod-squarings of a[]. Every 1000 squarings update b[] *= a[]. The initial b[] is in pure-integer form; on subsequent mul-by-a[] updates the result is left in the partially-fwd-FFTed form returned by the carry step, i.e. fwd-weighted and initial-fwd-FFT-pass done.
3. On final G-checkproduct update of the current iteration interval, 1000 iterations before the next savefile write, save a copy of the current G-checkproduct b[] in a third array c[], before doing the usual G-checkproduct update b[] *= a[].
4. At end of the current iteration interval, prior to writing savefiles, do 1000 mod-squarings of c[] and compare the result to b[]. If mismatch, no savefiles written, instead roll back to last 'good' G-checkproduct data, which in my current first implementation means the previous multiple of 1M iterations.

So during the above, the G-checkproduct accumulator b[] is vulnerable to a 1-bit error, of the kind which would not show up, say, via a roundoff error during the ensuing *= a[] FFT-mul update. So, what to do? Since the b[] data are kept in partially-fwd-FFTed form for most of the iteration interval, the Selfridge-Hurwitz (or similar CRC-style) checksums can't be easily computed from that. I think the easiest thing would be, every time I do an update b[] *= a[], do a memcpy to save a separate copy of the result, and compare that vs b[] prior to each update of the latter.
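As a rough illustration of steps 2-4 above, here is a toy C sketch that replaces the FFT-based multiword arithmetic with single-word arithmetic modulo the Mersenne prime 2^31 - 1. Everything named here (`TOY_MOD`, `mulmod`, `gerbicz_toy`, the `corrupt_iter` fault-injection parameter) is hypothetical and not taken from the Mlucas source. One assumption worth flagging: the sketch closes the check with a final multiply by the seed `x0`, following the usual statement of the Gerbicz identity, which step 4 above does not spell out.

```c
#include <stdint.h>

/* Toy modulus: the Mersenne prime 2^31 - 1, so products fit in 64 bits. */
#define TOY_MOD 2147483647ULL

static uint64_t mulmod(uint64_t x, uint64_t y) { return (x * y) % TOY_MOD; }

/* Run nblocks blocks of L mod-squarings each, starting from seed x0.
   If corrupt_iter >= 0, flip the low bit of the residue right after that
   (0-based) squaring, to mimic a hardware error in a[].
   Returns 1 if the Gerbicz check passes, 0 if it fails. */
int gerbicz_toy(uint64_t x0, int nblocks, int L, long corrupt_iter)
{
    uint64_t a = x0;   /* PRP residue */
    uint64_t b = x0;   /* G-checkproduct accumulator */
    uint64_t c = 0;    /* copy of b taken before the final update */
    long iter = 0;

    for (int blk = 1; blk <= nblocks; blk++) {
        for (int i = 0; i < L; i++) {
            a = mulmod(a, a);
            if (iter == corrupt_iter) a ^= 1;   /* injected bit flip */
            iter++;
        }
        if (blk == nblocks) c = b;   /* step 3: save copy before last update */
        b = mulmod(b, a);            /* G-checkproduct update b *= a */
    }

    /* Step 4: L squarings of the saved copy, then (assumed) multiply by the
       seed; an uncorrupted run must reproduce b exactly. */
    for (int i = 0; i < L; i++) c = mulmod(c, c);
    c = mulmod(c, x0);
    return c == b;
}
```

Calling `gerbicz_toy(3, 4, 10, -1)` runs 40 clean squarings, while a non-negative `corrupt_iter` below 40 injects a single bit flip into the residue, which the final comparison then catches.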
[followup e-mail a few hours later]

Additional thoughts: We are essentially trying to guard against a "false G-check failure", in the sense that the G-check might fail not because the PRP-residue array a[] had gotten corrupted but rather because the G-checkproduct accumulator b[] had. So every time we update b[] (or read it from a savefile) we also make a copy c[] = b[], and prior to each b[] *= a[] update we check that b == c.

OK, but if at some point we find b != c, how can we tell which of the 2 is the good one? Obvious answer is to compute some kind of whole-array checksum at every update. Since post-update b[] may be in some kind of partially-FFTed state (that is the case for my code) the checksum needs to not assume integer data - perhaps simply treat the floats in a[] as integer bitfields. Would something as simple as computing a mod-2^64 sum of the uint64-reinterpretation-casted elements of a[] suffice, do you think?

Further, any such checksum will be a much smaller bit-corruption target than b[], but to be safe one should probably make at least 2 further copies of *it*, call our 3 redundant checksums s1,s2,s3, then the attendant logic would look something like this:

Code:
    // Mod-2^64 sum of elements of double-float array a[], treated as uint64 bitfields:
    uint64 sum64(double a[], int n)
    {
        int i;
        uint64 sum = 0ull;
        for(i = 0; i < n; i++)
            sum += *(uint64*)(a+i);    // Type-punning cast of a[i]
        return sum;
    }

    // Simple majority-vote consensus:
    uint64 consensus_checksum(uint64 s1, uint64 s2, uint64 s3)
    {
        if(s1 == s2) return s1;
        if(s1 == s3) return s1;
        if(s2 == s3) return s2;
        return 0ull;
    }

    int n;                  // FFT length in doubles
    double a[], b[], c[];   // a[] is PRP residue; b,c are redundant copies of G-checkproduct array
    uint64 s1,s2,s3;        // Triply-redundant whole-array checksum on b,c-arrays
    ...
    [bunch of mod-squaring updates of a[]]
    // prior to each b[]-update, check integrity of array data:
    if(b[] != c[]) {        // Houston, we have a problem
        s1 = consensus_checksum(s1,s2,s3);
        if(s1 == sum64(b,n))        // b-data good
            /* no-op */;
        else if(s1 == sum64(c,n))   // c-data good, copy back into b
            b[] = c[];
        else                        // Catastrophic data corruption
            [roll back to last-good G-check savefile]
    }
    b[] *= a[];                 // G-checkproduct update
    s1 = s2 = s3 = sum64(b,n);  // Triply-redundant whole-array checksum update
    c[] = b[];                  // Make a copy
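The pseudocode above can be turned into something compilable along the following lines. This is only a sketch under stated assumptions, not the code that went into Mlucas: the names `sum64_bits`, `consensus3`, `verify_or_repair` and the `GCHK_*` status codes are invented here, `double` is assumed to be 64 bits wide, and the rollback itself is left to the caller.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mod-2^64 sum of the bit patterns of n doubles (assumes 64-bit doubles).
   memcpy is used for the type pun to stay clear of strict-aliasing issues. */
uint64_t sum64_bits(const double *x, size_t n)
{
    uint64_t sum = 0, w;
    for (size_t i = 0; i < n; i++) {
        memcpy(&w, &x[i], sizeof w);
        sum += w;
    }
    return sum;
}

/* Majority vote over three checksum copies. Returns 1 and stores the agreed
   value in *out, or returns 0 if all three differ. */
int consensus3(uint64_t s1, uint64_t s2, uint64_t s3, uint64_t *out)
{
    if (s1 == s2 || s1 == s3) { *out = s1; return 1; }
    if (s2 == s3)             { *out = s2; return 1; }
    return 0;
}

enum { GCHK_OK = 0, GCHK_REPAIRED = 1, GCHK_ROLLBACK = 2 };

/* Integrity check to run before each b[] *= a[] update. b and c are the two
   copies of the G-checkproduct; s1..s3 are the redundant checksums. */
int verify_or_repair(double *b, double *c, size_t n,
                     uint64_t s1, uint64_t s2, uint64_t s3)
{
    uint64_t s;
    if (memcmp(b, c, n * sizeof *b) == 0) return GCHK_OK;
    if (!consensus3(s1, s2, s3, &s)) return GCHK_ROLLBACK;
    if (s == sum64_bits(b, n)) {          /* b good: refresh the copy */
        memcpy(c, b, n * sizeof *b);
        return GCHK_REPAIRED;
    }
    if (s == sum64_bits(c, n)) {          /* c good: copy back into b */
        memcpy(b, c, n * sizeof *b);
        return GCHK_REPAIRED;
    }
    return GCHK_ROLLBACK;                 /* both copies corrupted */
}
```

One deliberate difference from the pseudocode: the vote reports "no consensus" through its return value rather than by returning 0, since 0 could in principle be a legitimate checksum.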
#5 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
1CD5₁₆ Posts |
After the dust settles, an update on the Mlucas save file format description to final v18, and to v19 PRP would be appreciated. For your convenience, https://www.mersenneforum.org/showpo...91&postcount=2
#6 | |
∂²ω=0
Sep 2002
República de California
5·2,351 Posts |
Quote:
#7 | |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
11²·61 Posts |
Quote:
#8 | |
∂²ω=0
Sep 2002
República de California
11755₁₀ Posts |
Quote:
#9 |
∂²ω=0
Sep 2002
República de California
2DEB₁₆ Posts |
*** Patch *** 03 Jan 2020: This patch adds one functionality-related item, namely redundancy in the PRP-test Gerbicz-check mechanism, to prevent data corruption in the G-check residue from causing a "false Gerbicz-check failure", i.e. a failure not due to a corrupted PRP-test residue itself. This more or less follows the schema laid out in post #4.

I have also patched another logic bug related to roundoff-error-retry. This one was occasionally causing the run to switch to the next-larger FFT length when encountering a reproducible roundoff error, rather than first retrying at the current FFT length but with a shorter carry-chain recurrence computation for DWT weights. Not fatal, just suboptimal in terms of CPU usage.

NOTE ALSO that I hit a Primenet-server-side bug on 31 Dec when I used the primenet.py script to submit my first batch of v19 LL-test results (my previous v19 submissions were all PRP-test ones). The server code was incorrectly expecting a Prime95-style checksum as part of such results lines. The really nasty part of this was that I almost missed it. Until now, the primenet.py script grepped the page resulting from each attempted result-line submission for "Error code"; if it found that, it emitted a user-visible echo of the error message, and the attempted submission line was not copied to the results_sent.txt file for archiving. In this case (I only saw this after retrying one of the submits via the manual test webform) there was "Error" on the returned page, but it was not followed by "code", so the script treated the submissions as successful. I only noticed the problem when I checked the exponent status page for one of the expos and saw no result had been registered.

James Heinrich has fixed the server-side issue, and to be safe I've tweaked the primenet.py script to grep for just "Error". But if you used the script to submit any v19 LL-test results (PRP tests were being correctly handled at both ends) prior to the current patch, please delete the corresponding lines from your results_sent.txt file and retry submitting using the patched primenet.py file. To be safe, check the exponent status at mersenne.org to make sure your results appear there.

I just uploaded updated versions of the release tarball and ARM prebuilt binaries, but folks who previously built and are running the Dec 3 code snapshot can use the simpler expedient of an incremental rebuild-and-relink of the attached Mlucas.c sourcefile. The also-attached tweaked primenet.py file, which matches the updated one in the release tarball, is not necessary now that James has made the above-described server-side bugfix, but better safe than sorry, I say.
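To make the primenet.py matching issue concrete, here is a small C sketch of just the string-matching logic. The real script is Python and its actual code is not shown in this thread, so the function names and the sample page text below are hypothetical; only the two search strings, "Error code" and "Error", come from the description above.

```c
#include <string.h>

/* Pre-patch behaviour as described: only the exact phrase "Error code"
   counts as a failed submission. */
int page_has_error_old(const char *page)
{
    return strstr(page, "Error code") != NULL;
}

/* Patched behaviour as described: any occurrence of "Error" counts. */
int page_has_error_new(const char *page)
{
    return strstr(page, "Error") != NULL;
}
```

A returned page reading something like "Error: result line rejected" (made-up text) slips past the old check but is caught by the new one, which is the failure mode reported for the LL-test submissions.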
#10 |
Jan 2020
18 Posts |
I'm getting the following error after getting to the 100% mark:
ERROR: at line 2313 of file ../src/Mlucas.c
Assertion failed: After short-div, R != 0 (mod B)

Nothing has been written to the results.txt file since I started the run a week ago. I can restart the process, and it resumes from just before the end, but still spits out the same error after a couple minutes.
#11 | |
∂²ω=0
Sep 2002
República de California
11755₁₀ Posts |
Quote:
In the meantime, if you've not already done so, I suggest you switch the top 2 entries in worktodo.ini and start on the next assignment. By the time that finishes you can grab a bug-patched version of the code, which should allow you to successfully complete your above run. Oh, your data should be fine; like I said, this appears to be strictly a postprocessing bug.
Last fiddled with by ewmayer on 2020-01-16 at 20:16 |