2019-03-03, 00:08 | #1 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
41·149 Posts |
Errors
This thread is intended as a reference thread for what errors may occur and what methods are used to detect, prevent, or correct them. For discussion, please use the reference discussion thread https://www.mersenneforum.org/showthread.php?t=23383.
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1 Last fiddled with by kriesel on 2019-11-19 at 06:38 |
2019-03-03, 00:18 | #2 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
41×149 Posts |
TF
TF errors are of multiple types. Their impact is slight.
1. Missed factor 2. False factor 3. Reporting error 4. Malicious reports A missed factor causes additional TF effort, unless it is very near the end of the last TF level to be done. A missed TF factor, if no other factor is found in the remaining TF, leads to spending additional resources performing P-1, and about a 97% chance of spending the effort of 2 or more Lucas-Lehmer primality tests or a PRP test and proof. A missed factor has an estimated 20% chance of being found in the P-1 pass. Smooth factors may be found, but others not. A false factor will be quickly determined to be false. The PrimeNet server checks each reported factor divides the Mersenne number for which it is reported, by performing the TF computation for that single factor reported. Confirming a factor is very many times faster than searching for a factor. The PrimeNet server also checks whether a submitted factor is itself prime or composite. False factors can be generated in mfaktc and probably other software from unreliable hardware, including memory errors on gpus, or other error sources. Here's what that may look like. Code:
[Mon May 28 01:18:34 2018] UID: Kriesel/dodo-gtx480-0, M329000033 has a factor: 38814612911305349835664385407 [TF:80:81:mfaktc 0.20 barrett87_mul32_gs] [Mon May 28 05:40:46 2018] UID: Kriesel/dodo-gtx480-0, M329000033 has a factor: 38814612911305349835664385407 [TF:80:81:mfaktc 0.20 barrett87_mul32_gs] [Mon May 28 05:44:16 2018] UID: Kriesel/dodo-gtx480-0, M329000033 has a factor: 38814612911305349835664385407 [TF:80:81:mfaktc 0.20 barrett87_mul32_gs] [Fri Jun 01 01:02:12 2018] UID: Kriesel/dodo-gtx480-0, no factor for M331000037 from 2^79 to 2^80 [mfaktc 0.20 barrett87_mul32_gs] [Sat Jun 02 06:57:35 2018] UID: Kriesel/dodo-gtx480-0, M331000037 has a factor: 38814612911305349835664385407 [TF:80:81:mfaktc 0.20 barrett87_mul32_gs] [Sat Jun 02 11:45:02 2018] UID: Kriesel/dodo-gtx480-0, M331000037 has a factor: 38814612911305349835664385407 [TF:80:81:mfaktc 0.20 barrett87_mul32_gs] [Sun Jun 03 00:31:23 2018] UID: Kriesel/dodo-gtx480-0, M331000037 has a factor: 38814612911305349835664385407 [TF:80:81:mfaktc 0.20 barrett87_mul32_gs] See the mfaktc bug and wish list for more details. Here's a short perl program to check a factor; alter to suit the exponent and reported factor. Code:
use Math::BigInt; #Math::BigInt is an extension package in perl supporting arbitrarily large integers $f = Math::BigInt->new('2056121949903925392617977'); # the factor being confirmed, as a decimal string $b=87357233; # the exponent in decimal of the mersenne number for which it's reported as a factor $m = Math::BigInt->new('2'); #$m=2**b mod f; iff m=1, f is a factor of 2**$b -1; $m->bmodpow($b,$f); # see https://perldoc.perl.org/5.14.1/Math/BigInt.html#bmodpow() if ($m == 1 ) { print "M$b has factor $f\n"; } else { $m--; print "M$b/$f leaves nonzero remainder $m\n"; } Some aspects of the result reporting essentially operate on the honor system, which has worked very well for the most part for thousands of participants over many years. It's rare, but occasionally someone chooses to try to be disruptive, or gain computing credit they have not earned by doing the work. There are many ways of dealing with such individuals. Some of them require server administrator effort. One of the aspects they try to exploit is there is currently no verification in the TF no-factor-found report that the work described was actually performed. Robert Gerbicz has discovered an approach which may allow straightforward inclusion in the TF computations, of a modest overhead proof of work performed, for the TF no-factor-found case. It is described at https://www.mersenneforum.org/showpo...7&postcount=30 and discussed at https://www.mersenneforum.org/showpo...7&postcount=34, https://www.mersenneforum.org/showpo...9&postcount=40, https://www.mersenneforum.org/showpo...2&postcount=41, https://www.mersenneforum.org/showpo...5&postcount=44, https://www.mersenneforum.org/showpo...8&postcount=49, https://www.mersenneforum.org/showpo...9&postcount=51, https://www.mersenneforum.org/showpo...1&postcount=52, https://www.mersenneforum.org/showpo...7&postcount=58, through post 64; https://www.mersenneforum.org/showpo...1&postcount=67, https://www.mersenneforum.org/showpo...1&postcount=71, https://www.mersenneforum.org/showpo...7&postcount=74, https://www.mersenneforum.org/showpo...&postcount=122, https://www.mersenneforum.org/showpo...&postcount=123 There's another discussion of TF verification of work done in https://mersenneforum.org/showthread.php?t=25493 AIDs help detect some transmission or transcription errors. Exponents or bit levels not matching what was assigned for that AID will be detected by the server and error messages issued. The observed TF error rates are low enough and their impact low enough, that it is better for total project throughput to accept them than to double check TF for them. Duplicate TF of a bit level / exponent combination is a waste. It is recorded in the PrimeNet database, but computing credit is not given, to encourage efficiency. There is already in place a TF double-checking of sorts, in the form of the user TJAOI activity, using a different algorithmic approach. Usual TF is an ascending factor candidate search on one exponent at a time. TJAOI's approach is quite different, identifying candidate factors in an ascending exhaustive search (by prime k in f=2kp+1) and then determining what exponent each may apply to, and checking whether it's a factor. See https://mersenneforum.org/showthread.php?t=19014. Indications are that search is now completed through 2^{66} and continuing. The rates of TF error are, as far as I know, not well known. I suppose it could be estimated by linear interpolation from primality test run times assuming errors occur at a rate over time. (The justification for that is weak, since they are very different computations.) Since TF run times are short compared to primality test run times, in order to represent potential total run time savings, such an error rate estimate would necessarily be low, around ~2% x 2.5% = ~0.05% for the full set of TF levels for a given exponent. That estimate depends on the quite different computations of TF and primality testing having similar error rates in time; unlikely to be the case. The absence of very-low bit level results in an exponent report such as https://www.mersenne.org/report_expo...exp_hi=&full=1 is probably not a problem or error. I forget where I read it, but it's my understanding that to manage database size, old TF no-factor-bitlevel(s) entries are removed from the database on occasion. So the absence of entries for bit levels 1-63 is not an error of any sort. Also as I recall, to set up the databases with exponents to continue, George or James sometimes run the very lowest practical TF levels before making assignments generally available in a newly expanded range. Some of the earlier factoring progress was brought in from earlier work, from before GIMPS, for which less than the full complement of data was available (who did it, what date, etc.) Well-intended users should normally NOT rerun and resubmit unlisted lower-than-listed TF levels. It's wasted compute time and database regrowth leading to more effort at maintenance. Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1 Last fiddled with by kriesel on 2021-02-06 at 17:19 Reason: updated for unlisted low TF levels; added TJAOI reference |
2019-03-03, 00:23 | #3 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
41·149 Posts |
P-1
P-1 errors are of multiple types. Their impact is slight.
1. Missed factor 2. False factor 3. Reporting error 4. Malicious reports A missed factor causes additional P-1 effort, if it occurs in stage one of a two stage run. A missed P-1 factor, if no other factor is found in the remaining P-1 run, also leads to spending the effort of 2 or more primality tests. A false factor will be quickly determined to be false. The PrimeNet server checks each reported factor divides the Mersenne number for which it is reported, by performing the TF computation for that single factor reported. Confirming a factor is very many times faster than searching for a factor. The primenet server also checks whether a submitted factor is itself prime or composite. CUDAPm1 has been observed to occasionally report a small false factor such as 3 or 5 or 7, not of the necessary form 2kp+1 and 1 or 7 mod 8.. This may be a bug in the gcd execution. It has been observed to occur in stage 1 or stage 2. Rerun with the same inputs on the same gpu and cpu did not duplicate the error. Multiple occurrences were observed on exponents M86.1M - M86.3M. Reporting errors may consist of duplicate reports of a given P-1 bounds set and exponent combination, failure to report, or some sort of transmission or transcription error. Duplicate reports do not harm or slow project progress. Failure to report a factor found is the same substantial impact as a missed factor, plus the P-1 factoring is likely to be reassigned. Some aspects of the result reporting essentially operate on the honor system, which has worked very well for the most part for thousands of participants over many years. It's rare, but occasionally someone chooses to try to be disruptive, or gain computing credit they have not earned by doing the work. There are many ways of dealing with such individuals. Some of them require server administrator effort. One of the aspects they try to exploit is there is currently no verification in the P-1 or TF no-factor-found report that the work described was actually performed. Robert Gerbicz has discovered an approach which may allow straightforward inclusion in the TF computations, of a modest overhead proof of work performed, for the TF no-factor-found case at https://www.mersenneforum.org/showpo...7&postcount=30; see also the TF post above for many links to subsequent discussion in regard to TF verification. This has not yet to my knowledge been implemented in any factoring code. There's no definite equivalent for P-1 known, although it may be possible to adapt the approach to P-1 or parts of it. See https://www.mersenneforum.org/showpo...&postcount=127 AIDs help detect some transmission or transcription errors. Exponent not matching what was assigned for that AID will be detected by the server and error messages issued. Anomalous P-1 bounds could also be detected. The observed P-1 error rates are low enough and their impact low enough, that it is better for total project throughput to accept them than to double check all P-1 for them. Duplicate P-1 work of a bounds-set / exponent combination is a waste. It may be recorded in the PrimeNet database, but I think computing credit is not given, to encourage efficiency. It may be worthwhile to rerun P-1 if an obviously wrong small factor is produced. The rates of P-1 error are, as far as I know, not well known. I suppose it could be estimated by linear interpolation from primality test run times assuming errors occur at a rate over time; ~2% x 2.5% = ~0.05%. If the rate is proportional to memory footprint, a sizable multiplier should be applied to that. I think the complexities of CUDAPm1 make its error rate higher; the bug list is nontrivial. Known implementations of P-1 factoring for GIMPS contain little in the way of error detection or correction. There are possibilities for adding error detection, at some increase in computation time. Gpuowl V7 has more P-1 error detection than most if not all other GIMPS applications. See https://mersenneforum.org/showpost.p...2&postcount=82, and https://mersenneforum.org/showpost.p...7&postcount=98. The standard method of checking a hardware and software combination for reliable P-1 factoring operation is to attempt to reproduce factors on exponents with known factors. It's best to use exponents with the same fft length as what will be run on unfactored exponents, or as close as possible if no test value is available. See the list of selected test exponent & factor combinations at https://www.mersenneforum.org/showpo...8&postcount=31 Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1 Last fiddled with by kriesel on 2021-02-02 at 19:35 Reason: minor edits |
2019-03-03, 00:52 | #4 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
17DD_{16} Posts |
primality testing
How reliable are the major Mersenne testing codes, and how is it achieved? What more might be done?
There are three types of errors in primality testing we would like to avoid, and one we generally accept
GIMPS uses two types of primality tests. The Lucas-Lehmer sequence is understood to be a conclusive primality test for Mersenne numbers when performed correctly with a properly selected seed value. There is a small set of seed values suitable for any prime exponent. GIMPS convention is to use the seed 4. The Fermat pseudoprime test (PRP) is not conclusive regarding primality, but a pseudoprime test that indicates whether a candidate is definitely composite, or extremely likely but not certainly prime. Usually the seed (base) 3 is used. Verification with multiple LL tests and probably also a pseudoprime double check with different users, hardware, and software would promptly follow a Fermat pseudoprime test indicating "probably prime". Lucas-Lehmer primality tests have produced, as of 18-20 Dec 2018, the following "interesting" bad residues that occurred near minimum or maximum value: LL seed 4 64-bit residues near zero regarded as bad: https://www.mersenneforum.org/showpo...&postcount=142 Code:
Residue Count 0000000000000000 311 (known issue with CUDALucas < v2.06 and other software) 0000000000000002 14 (known issue with CUDALucas < v2.06 and other software) 0000000000000003 1 attributed to V17 prime95 shift bug 0000000000000004 1 cause uncertain 000000000000006C 1 attributed to V17 prime95 shift bug 0000000000000269 1 attributed to V17 prime95 shift bug total 329 Code:
exponent Partial Residue Status 3370013 FFFFFFFFFFFFFFFE Bad 37614691 FFFFFFFFFFFFFFFD Bad 47359579 FFFFFFFFFFFFFFFF Bad 64847711 FFFFFFFF80000000 Bad 67077133 FFFFFFFFFFFFFFFD Bad 67151753 FFFFFFFFFFFFFFFD Bad 68834723 FFFFFFFFFFFFFFFD Bad 81857519 FFFFFFFF80000000 Bad 81857537 FFFFFFFF80000000 Bad 81860447 FFFFFFFF80000000 Bad 81860479 FFFFFFFF80000000 Bad 88283761 FFFFFFFF80000000 Bad 88283807 FFFFFFFF80000000 Bad Residue Count ffffffffffffffff 1 (residue is equivalent to 0) fffffffffffffffe 1 fffffffffffffffd 4 (residue is equivalent to -2; known issue with algebra, and CUDALucas < v2.06) ffffffff80000000 7 (known issue with CUDALucas; 6 of 7 were from the same user in 2018) total 13 The number of total PRP primality tests done to date is not nearly as large as of LL tests, and the highly effective Gerbicz error check prevents most errors from persisting, so statistics are thin on errors from that. From a PRP results export available 2019-05-13 at https://www.mersenneforum.org/showpo...1&postcount=11, a dash of perl after residue-sort in OpenOffice yielded: 1 header record 9945 exponent records 0 residue-64s shared among multiple exponents (no duplication of res64 values) A review of verified good and verified bad results above 77M from 2019-08-12 to 2021-08-12 yields an error rate of 3 PRP bad results/123522 verified (good and bad) total, or ~24. ppm PRP error rate, on a mix of PRP/GEC and PRP/GEC/proof runs. A check of 50M to 77M for the same period showed 751 verified results, 0 errors. Duplication would be very highly unlikely by random chance in such a small sample sprinkled over a 2^{64} res64 space, so if duplication occurred, it would be indicative of some sort of errors. Inspection of the low and high ends showed no special or suspicious residues (0, 3, ff...ffc, ff...ff), just as you'd expect if the Gerbicz error check was working well. State indicators found were as follows R Reliable (which means it's Gerbicz check passed, but not yet verified) S Suspect (had error codes attached) U Unverified (one result for the exponent; not R or S) V Verified (more than one result for the exponent; matched) Perl counted 7111 reliable, 3 suspect, 1878 unverified, 953 verified, total 9945. Two of those 3 suspect are by the same user and have 0 offset. A suspect rate of 3 / 953 = 0.0031 is markedly better than the typical LL track record of ~2.0% bad results. Some of the codes have ancestry in common. My crude diagram for the ancestry of the various codes is attached at https://www.mersenneforum.org/showpo...04&postcount=5 If there was an issue in common, present in a bit of "DNA in common", despite the best efforts of the authors and extreme care in using differing programs on differing hardware in multiple confirmations by multiple individuals, how could we know? The ancestry in common question occurs on multiple levels.
If they all pass, it does not necessarily mean the existing codes are without flaw. If one fails, it does not necessarily mean the existing codes are flawed, it could be a transient hardware issue. There's a sliver of a chance of a computation error occurring outside the loop of iterations that are guarded for accuracy by the Jacobi check or Gerbicz check. Bugs have been identified outside the checking loop. No disrespect to any of the authors (or users, or hardware designers for that matter; some tasks are just very hard). I've fixed code that failed nonreproducibly at low rates. Ppb or lower frequency bugs can be hard to determine exist, much less identify and resolve. Current assignments can be of order 10^{18} core clocks to complete. The output of programs, and the best understandings and conjectures of number theorists, are compared. In the case of prime95, which can process not only Mersenne number computations, but Proth numbers, Fermat, etc, or its underlying gwnum routines that are also used in other software, an error detected in those other numbers if it occurred might be revealed to lie in fft or other code common to the various computations. The more eyes on it and the more cases checked, the better for the code's reliability. Over time, and with a basis in number theory, some interim results (indicated by the low 64 bit values) have been shown to sometimes occur when they should not, and checks for specific incorrect early residues have been incorporated into most of the various software applications. Some of these problem residues are application-specific, not merely algorithm-specific. Most of the programs use
To my knowledge, only prime95/mprime and gpuOwL incorporated the Jacobi test or Gerbicz PRP check, as of early March 2019. Mlucas added PRP and GEC at some point. The Gerbicz check has quite low overhead, costing only about 0.2% on run time at a one million iteration interval (Gerbicz block size 1000), which is comparable to the checking overhead of Jacobi symbol on LL tests. https://www.mersenneforum.org/showpo...0&postcount=31 The big difference is in net error rate after checking; Jacobi symbol check is only 50% effective at detecting errors, while the GEC is essentially 100% reliable in detecting errors, resuming from known-good save files, and preventing final residues from being in error. A more recent development is the PRP proof and verification (certificate) process. Using MD5 hash and other safeguards, this provides proof of correct completion of a PRP test and protects against falsified results. Gpuowl and prime95 now include this feature. It is being added to Mlucas. There is an analogous proof proposed by R. Gerbicz for the LL test, which requires more interim residues and data storage, and has not yet been implemented by anyone in any application. Another layer of error checking, detection, and correction occurs at the project / PrimeNet server level. The GIMPS project records 64-bit residues returned for composite primality tests, At least two tests are done per unfactored Mersenne number (except for a PRP first test with successful proof, which is more reliable). Additional tests are done if the first two are of different types, or the first n are of the same type but not matching, until two of matching type and residue are obtained, produced by different users. Some programs maintain and report error counts. Certain errors substantially increase the probability of a wrong final residue. Those results are flagged for early checking. (See the strategic double check and triple check thread on mersenneforum.org.) Reliability of a given set of hardware declines over time. Users can help maintain the reliability of their hardware by avoiding overclocking and environmental extremes, by periodically checking logs, by periodically testing their hardware with memory test programs, by periodically running double checks, and periodically running self-tests incorporated in the software. Users can help maintain and improve the accuracy of their own results and the project as a whole by logging runs, reviewing the logs for anomalies, and reporting anomalies seen, to the software authors, to the GIMPS community, and/or to me for inclusion in the bug and wish lists I am maintaining for GPU software. If we assume the hardware error rate is about constant per unit time, averaged over the fleet, as the project moves to larger exponents over time we can expect the error rate per primality test to increase. Addition of recently identified error checks allowing restart from a known-good savefile will help counter this effect, as will switching from LL, to PRP with its excellent Gerbicz error check. It is possible that other error checks or periodic self test will be added in the future. Current error rate of final residue from LL testing (in the absence of substantial hardware issues or software misconfiguration or certain logged error types) is around 2% per LL primality test, 4% per double-LL-tested exponent. Error rate rises to around 40% for prime95 LL tests with illegal sumout errors logged. Error rate of final residue from PRP3 with Gerbicz check is quite small by comparison, but nonzero; hardware error or bugs affecting software operation "outside" the Gerbicz error check of blocks of iterations can occur and affect the results, and have occurred since introduction of the Gerbicz check. Software has been rewritten to reduce the occurrence of errors outside the GEC. An evaluation of PRP errors found and verified results over the period 2019-08-12 to 2021-08-12 for exponents above 77M or above 50M yielded an observed PRP/GEC and PRP/GEC/proof combined error rate of ~24. per million PRP tests (~24 ppm). That's about 800. times lower than the LL without Jacobi check error rate, and corresponds to 3 PRP errors total, at least one of which occurred before the prime95 modification to reduce errors occurring outside the GEC error check, such as handling the final residue value. There was a software bug in gpuowl V7.1 which produced 3 proof errors before it was identified. (Not residue errors, IIRC.) That has been addressed. There is discussion of this error type beginning at https://mersenneforum.org/showpost.p...&postcount=405 In prime95 / mprime, measures hardening proof generation against error are described briefly at https://mersenneforum.org/showpost.p...40&postcount=8 Error rate from a user can be seasonal or longer scale variation. https://www.mersenne.org/various/math.php indicates an error rate per LL primality test of 1.5% when no serious errors (excluding illegal sumout as a serious error) are reported; 50% when a serious error is reported. Historically, LL primality test error rate varied considerably, up to nearly 4% per test. https://www.mail-archive.com/mersenn.../msg07476.html See also https://www.mersenneforum.org/showpo...3&postcount=19 for more discussion of error rates. This post is already rather long. https://www.mersenneforum.org/showpo...&postcount=102 late 2016 error rate charts https://www.mersenneforum.org/showpo...&postcount=104 late 2017 error rate charts https://www.mersenneforum.org/showpo...&postcount=105 late 2018 error rate charts https://www.mersenneforum.org/showpo...&postcount=107 late 2019 error rate charts https://www.mersenneforum.org/showpo...&postcount=111 late 2020 error rate charts (Thanks Patrik for all those!) You can read a bit about the history circa 1999 of GIMPS bugs and QA at https://www.mersenneforum.org/showth...=23877&page=13 Additional error checks, after discovery, that save more than they cost, and are otherwise deemed worthwhile, may be implemented by the various software titles' authors or maintainers at their discretion. Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1 Last fiddled with by kriesel on 2021-08-13 at 15:03 Reason: added PRP error rate numbers |
2019-05-21, 20:51 | #5 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
41×149 Posts |
Hardware
For what a gpu's memory going bad looks like, and how to detect it, see the GPU RIP thread https://www.mersenneforum.org/showthread.php?t=23472
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1 Last fiddled with by kriesel on 2019-11-19 at 06:40 |
Thread Tools | |
Similar Threads | ||||
Thread | Thread Starter | Forum | Replies | Last Post |
ERRORS | Unregistered | Information & Answers | 2 | 2013-04-01 04:14 |
!@#$%^& Mail Errors | schickel | Aliquot Sequences | 5 | 2012-04-22 06:41 |
Rounding Errors in v25.11 | iBeta | Software | 10 | 2011-10-09 04:15 |
Core 2 Duo errors? | paulunderwood | Hardware | 3 | 2006-11-16 00:00 |
heat and errors | crash893 | Hardware | 37 | 2002-11-12 16:33 |