#397
"6800 descendent"
Feb 2005
Colorado
3²·83 Posts
Quote:
There is something I can't get my head around. I understand the Gerbicz error check has only a 50% chance of detecting an error, should one occur. But what causes it to report a false error?

#398
Sep 2003
2·5·7·37 Posts
Quote:
The Jacobi error check is for LL testing, and has only a 50% chance of detecting an error.

Suppose I look at a coin lying on a table and I see that heads is facing up. I call heads. If you happen to be looking at a different coin for some reason, there's a 50% chance that you will see tails and realize that something's wrong, but also a 50% chance that your coin will also be heads and therefore you won't notice any problem.

If the Jacobi check does report an error, it's certain that the current state of calculations is bad and has to be discarded. There is no false error. However, the program can go back to an earlier save file that passed the Jacobi check, and restart from there. Then you cross your fingers and hope that no 50-50 undetected error happened prior to that save file being saved.

Last fiddled with by GP2 on 2019-01-04 at 20:35

#399
"6800 descendent"
Feb 2005
Colorado
3²×83 Posts
Quote:
In this case, it caught an error and tried to go back to a good save file, but couldn't. So it reported the chance of a good test as "fair". So I am still confused as to how it can catch a definite error, not be able to revert to a backup that fixes it, but produce a good result anyway (the test was a double check).

Last fiddled with by PhilF on 2019-01-04 at 22:29

#400
Sep 2003
5036₈ Posts
Quote:
Any chance you could post a copy of the error or informational messages you got?

Is it possible that out of multiple save files, it warned that it couldn't use one of them but then silently resumed from another, older one that did pass the Jacobi check?

Is the test ongoing, and when would it be expected to complete?

Here's the original message that described and proposed the Jacobi check.

As I read it, the Jacobi symbol for every good interim or final residue is always −1, but every time an error occurs there's a coin flip and a 50-50 chance of getting either +1 or −1. Coin flips only happen when there's an error, so a +1 will not change back by itself unless there is a second error (or third, or higher).

So a −1 can indicate:
a) no errors
b) one error and an unlucky coin flip
c) two or more errors (and flips) with various results, but the final flip gave −1.

Whereas a +1 can indicate:
a) exactly one error
b) two or more errors (and flips) with various results, but the final flip gave +1.

So a +1 is absolutely an indication that you have a bad residue and you can't move forward from that point, only try to backtrack to some prior good save file.

So there are various possibilities. Maybe the error messages are misleading and the program really did end up finding an older save file that passed the Jacobi check. Maybe the program has faulty error handling for Jacobi checks and fails to abort even when there are no good save files to fall back to, and instead defaults to the same handling used for older forms of error checking (roundoff errors, sumout errors, etc.), where those kinds of errors merely indicate that a result is suspect and should have higher priority for a quick double-check, rather than a guaranteed bad result. Or maybe you misread or misinterpreted the error messages.

It's probably worth getting to the bottom of this.

Last fiddled with by GP2 on 2019-01-05 at 03:16
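
For anyone who wants to see the coin-flip behaviour in numbers, here is a rough Python sketch. It assumes the check is the Jacobi symbol of (s − 2) over M_p, where s is the current Lucas-Lehmer residue; the post above only says the good value is −1, so that exact form is my assumption. The names `jacobi`, `jacobi_trace` and `detection_rate` are made up for illustration, and an "error" is modelled crudely as replacing the residue with a random value at one iteration. This is not how Prime95 implements anything.

```python
import random

def jacobi(a, n):
    """Jacobi symbol (a | n) for odd positive n, by the standard binary algorithm."""
    a %= n
    result = 1
    while a:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0

def jacobi_trace(p, corrupt_at=None, rng=None):
    """Run the LL recurrence for M_p and record the symbol of (s - 2) after each step.

    If corrupt_at is given, the residue is overwritten with a random value
    at that iteration (a crude stand-in for a hardware error).
    """
    m = (1 << p) - 1
    s, trace = 4, []
    for i in range(1, p - 1):
        s = (s * s - 2) % m
        if i == corrupt_at:
            s = rng.randrange(m)
        trace.append(jacobi(s - 2, m))
    return trace

def detection_rate(p, trials, seed=0):
    """Fraction of single-error runs whose final symbol is not -1."""
    rng = random.Random(seed)
    caught = 0
    for _ in range(trials):
        k = rng.randrange(1, p - 1)          # iteration where the error strikes
        if jacobi_trace(p, corrupt_at=k, rng=rng)[-1] != -1:
            caught += 1
    return caught / trials

print(set(jacobi_trace(31)))                  # clean run: only -1 appears
print(detection_rate(31, trials=2000))        # a fraction near 0.5
```

In this model a clean run never shows anything but −1, and roughly half of the corrupted runs end on a symbol other than −1, which matches the "one error, one coin flip" reading above.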

#401
"Sam Laur"
Dec 2018
Turku, Finland
317 Posts
Quote:
In the default configuration, it is also possible that all intermediate files are bad, since Jacobi checks are only done every 12 hours, but files are saved every 30 minutes, and only three old files are kept. So, for example, if the error occurred after the last Jacobi check but before the oldest file was saved, it's all gone.

In my case, this was caused by over-optimistic memory overclocking that was stable elsewhere, but yet again, Prime95/mprime stresses the whole system like nothing else. Before this, I had maybe 99% confidence that the hardware was working and stable, but now I have 100%. (Okay, maybe 99.99%: cosmic rays and no ECC, and everything...)

The same machine has now produced four matching double-check LL residues with no further errors and is working on a further set of four; after that I'll be switching to first-time PRP tests.
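
To put a rough number on that window, here is a small Python sketch using the figures quoted above (12-hour Jacobi interval, 30-minute saves, three files kept). It assumes the error is equally likely at any moment between two Jacobi checks and that the retained files reach back at most files_kept × save interval from the failing check; both are simplifying assumptions of mine, and the function name is invented.

```python
def chance_backup_predates_error(check_interval_min, save_interval_min, files_kept):
    """Rough chance that at least one retained save file is older than the error.

    Assumes the error time is uniform over the interval between Jacobi checks
    and that the retained files cover the last files_kept * save_interval_min
    minutes before the failing check.
    """
    covered_min = files_kept * save_interval_min
    return min(1.0, covered_min / check_interval_min)

print(chance_backup_predates_error(12 * 60, 30, 3))  # 0.125
```

Under those assumptions only about one detected error in eight would leave an older, untouched save file to fall back to.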

#402
"6800 descendent"
Feb 2005
Colorado
1353₈ Posts
Quote:
Iteration: 30271839/50930029, ERROR: Jacobi error check failed!
Continuing from last save file.
Error reading intermediate file: p9P30029
Renaming p9P30029 to p9P30029.bad1
Trying backup intermediate file: p9P30029.bu
Error reading intermediate file: p9P30029.bu
Renaming p9P30029.bu to p9P30029.bad2
Trying backup intermediate file: p9P30029.bu2

It might be worth noting that this machine is running from a USB stick, and is set for 2 save files instead of 3. It is not overclocked.

After this, I corrected/tested the hardware, stress tested it for 50 hours, then let it complete the test. It kept reporting that the chances of a good result were "fair". The test turned out to be good, since it was a double check and the residues matched.
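
The fallback order visible in that log (main file, then .bu, then .bu2, with each unusable file renamed to .badN) can be sketched in Python as below. This only mirrors the messages shown; it is not Prime95's actual code. The function name and the `is_valid` callback are hypothetical, standing in for whatever read and Jacobi checks the program really performs.

```python
import os
from typing import Callable, Optional

def find_usable_save(base: str, is_valid: Callable[[str], bool]) -> Optional[str]:
    """Try base, base.bu, base.bu2 in order; rename each failure to base.badN.

    is_valid is a hypothetical callback deciding whether a save file is usable.
    Returns the first usable path, or None if every candidate fails.
    """
    candidates = [base, base + ".bu", base + ".bu2"]
    bad_count = 0
    for path in candidates:
        if not os.path.exists(path):
            continue
        if is_valid(path):
            return path
        bad_count += 1
        os.rename(path, f"{base}.bad{bad_count}")  # keep the bad file for inspection
    return None
```

If the function returns None, the caller has no save file left and has to decide between restarting and carrying on, which is the situation the thread is asking about.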

#403
Sep 2003
2·5·7·37 Posts
Quote:
Quote:
The chances of a good result were only "fair" because there could have been earlier errors, even if the Jacobi check passed.

#404
"6800 descendent"
Feb 2005
Colorado
1011101011₂ Posts
So if it could not find a good save file, would the program abort the test and start it over?

#405
P90 years forever!
Aug 2002
Yeehaw, FL
8279₁₀ Posts

#406
"6800 descendent"
Feb 2005
Colorado
2EB₁₆ Posts
Ok, thanks. Now I have a better understanding of how a test can report a Jacobi error yet still produce a good result. I also have a better understanding as to the importance of multiple save files. :)

#407
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
7,823 Posts
Quote:
Code:
You can control how many save files are kept that have passed the Jacobi error check.
This value is in addition to the value set by the NumBackupFiles setting. So if
NumBackupFiles=3 and JacobiBackupFiles=2 then 5 save files are kept - the first three
may or may not pass a Jacobi test, the last two save files have passed the Jacobi error
check. In prime.txt:
JacobiBackupFiles=N (default is 2)
Code:
You can have the program generate save files every n iterations. The files
will have a .XXX extension where XXX equals the current iteration divided
by n. In prime.txt enter:
InterimFiles=n
I do redundant backup. Cheap USB sticks with daily xcopy /s, in addition to network or separate HD automatic backup.
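
As a quick way to see how the two settings quoted above add up, here is a small Python sketch that reads prime.txt-style lines and totals the save files kept. The setting names come from the quoted text; the function itself, and the fallback of 3 for NumBackupFiles (taken from the "three old files" mentioned earlier in the thread), are my assumptions.

```python
def total_save_files(prime_txt: str, default_backup: int = 3, default_jacobi: int = 2) -> int:
    """Total save files kept: NumBackupFiles plus JacobiBackupFiles.

    default_backup=3 is an assumption based on the thread; default_jacobi=2
    is the default stated in the quoted documentation.
    """
    settings = {}
    for line in prime_txt.splitlines():
        if "=" in line:
            key, _, value = line.partition("=")
            settings[key.strip()] = value.strip()
    backups = int(settings.get("NumBackupFiles", default_backup))
    jacobi_kept = int(settings.get("JacobiBackupFiles", default_jacobi))
    return backups + jacobi_kept

print(total_save_files("NumBackupFiles=3\nJacobiBackupFiles=2"))  # 5
```

With NumBackupFiles=2, as on the USB-stick machine above, and the default JacobiBackupFiles, the same arithmetic gives 4 files.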