Prime95 roundoff errors
I just restarted an old primality test after completing another, and Prime95 is throwing up roundoff errors:
"Possible hardware errors have occurred during the test! 1 ROUNDOFF > 0.4" The other 3 cores of my i7 are running other tests without errors. Should I be concerned? Should I do anything about it? |
It is not "throwing up round off errors". It is telling you that you have had [B]one[/B] during the course of the LL test. This could be quite normal. Look in results.txt for the actual error message. If the roundoff error was barely over 0.4 (like 0.40625 or .4375, not 0.4997) and the error was reproducible then this is a non-issue.
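As a rough illustration of what that number measures: after each squaring, the FFT outputs should be almost exact integers, and the roundoff error is how far the worst output lands from its nearest integer. That is why 0.5 is the largest value it can take. The C sketch below is an illustration only, not Prime95's code. The names `max_roundoff` and `classify_roundoff` are hypothetical, and the 0.45 cut-off between "barely over" and "close to 0.5" is an assumption chosen just to separate the example values mentioned above.

```c
#include <stddef.h>

/* Largest distance between any value and its nearest integer.
   The result always lies in [0, 0.5]. Assumes every value fits
   in a long long, so no math library is needed. */
double max_roundoff(const double *x, size_t n)
{
    double worst = 0.0;
    for (size_t i = 0; i < n; i++) {
        /* Fractional part after truncating toward zero. */
        double frac = x[i] - (double)(long long)x[i];
        if (frac < 0.0)
            frac = -frac;
        /* Fold onto the distance to the nearest integer. */
        if (frac > 0.5)
            frac = 1.0 - frac;
        if (frac > worst)
            worst = frac;
    }
    return worst;
}

/* 0 = at or below the 0.4 warning threshold,
   1 = barely over it (e.g. 0.40625 or 0.4375),
   2 = close to 0.5 (e.g. 0.4997).
   The 0.45 boundary is an assumption made for this sketch. */
int classify_roundoff(double err)
{
    if (err <= 0.4)
        return 0;
    return (err < 0.45) ? 1 : 2;
}
```

With these definitions, `classify_roundoff(0.4375)` gives 1 and `classify_roundoff(0.4997)` gives 2, matching the distinction drawn above.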
|
[QUOTE=Prime95;258685]It is not "throwing up round off errors". It is telling you that you have had [B]one[/B] during the course of the LL test. This could be quite normal. Look in results.txt for the actual error message. If the roundoff error was barely over 0.4 (like 0.40625 or .4375, not 0.4997) and the error was reproducible then this is a non-issue.[/QUOTE]
A figure of speech. Actually it's generating the same error report for each successive set of 10,000 iterations. It's not [B]one[/B] isolated incident. results.txt only contains a single error report, "Iteration: 25227368/48995293, ERROR: ROUND OFF (0.5) > 0.40", but the screen for that worker reports a new one every 7-8 minutes or so.
[Apr 16 17:16] Waiting 10 seconds to stagger worker starts.
[Apr 16 17:16] Worker starting
[Apr 16 17:16] Setting affinity to run worker on logical CPUs 4,5
[Apr 16 17:16] Resuming primality test of M48995293 using Core2 type-3 FFT length 2560K, Pass1=640, Pass2=4K
[Apr 16 17:16] Iteration: 25336590 / 48995293 [51.71%].
[Apr 16 17:16] Possible hardware errors have occurred during the test! 1 ROUNDOFF > 0.4.
[Apr 16 17:16] Confidence in final result is fair.
[Apr 16 17:18] Iteration: 25340000 / 48995293 [51.71%]. Per iteration time: 0.039 sec.
[Apr 16 17:18] Possible hardware errors have occurred during the test! 1 ROUNDOFF > 0.4.
[Apr 16 17:18] Confidence in final result is fair.
[Apr 16 17:25] Iteration: 25350000 / 48995293 [51.73%]. Per iteration time: 0.040 sec.
[Apr 16 17:25] Possible hardware errors have occurred during the test! 1 ROUNDOFF > 0.4.
[Apr 16 17:25] Confidence in final result is fair. |
See also: [URL]http://www.mersenneforum.org/showpost.php?p=238103&postcount=71[/URL]
[QUOTE=Prime95;238103]This is normal -- a result of a new "feature" in v26. In v25, one would get a SUM(INPUTS) error or ROUNDOFF > 0.4 error and it would scroll off the screen unnoticed. You had to go to the effort of looking in results.txt to see that you had a problem. In v26, every time prime95 does its regular screen update it prints out a summary of the total number of errors that occurred during the test. See undoc.txt for options on controlling this new feature. So, one of your workers had a SUM(INPUTS) error sometime during the test. Since it only happened once there is a fair chance that your LL result will be OK.[/QUOTE]
See also: undoc.txt
[CODE]
You can control how the "count of errors during this test" message is
output with every screen update. These messages only appear if possible
hardware errors occur during a test. In prime.txt set:
    ErrorCountMessages=0, 1, 2, or 3
Value 0 means no messages, value 1 means a very short messages, value 2
means a longer message on a separate line, value 3 means a very long message
possibly on multiple lines. Default value is 3.
[/CODE] |
And what about this mysterious phenomenon? During a first-time LL test of M42818549 (with Mprime26.5-linux64) running on one worker (core), twelve similar roundoff errors occurred. The other three LL tests, running on the other three cores, show no errors.
[Tue Mar 8 07:31:28 2011] Iteration: 18872759/42818549, ERROR: ROUND OFF (0.5) > 0.40 Continuing from last save file.
[Tue Mar 8 07:55:14 2011] Iteration: 18903381/42818549, ERROR: ROUND OFF (0.5) > 0.40 Continuing from last save file.
[Tue Mar 8 08:09:58 2011] Iteration: 18916265/42818549, ERROR: ROUND OFF (0.5) > 0.40 Continuing from last save file.
[Tue Mar 8 08:58:00 2011] Iteration: 18989396/42818549, ERROR: ROUND OFF (0.5) > 0.40 Continuing from last save file.
Iteration: 18977799/42818549, ERROR: ROUND OFF (0.5) > 0.40 Continuing from last save file.
[Tue Mar 8 10:29:45 2011] Iteration: 19103776/42818549, ERROR: ROUND OFF (0.5) > 0.40 Continuing from last save file.
[Tue Mar 8 11:19:09 2011] Iteration: 19171522/42818549, ERROR: ROUND OFF (0.5) > 0.40 Continuing from last save file.
[Tue Mar 8 21:32:58 2011] Iteration: 19959306/42818549, ERROR: ROUND OFF (0.5) > 0.40 Continuing from last save file.
[Tue Mar 8 21:39:15 2011] Iteration: 19956922/42818549, ERROR: ROUND OFF (0.5) > 0.40 Continuing from last save file.
Iteration: 19953354/42818549, ERROR: ROUND OFF (0.5) > 0.40 Continuing from last save file.
[Tue Mar 8 21:50:22 2011] Iteration: 19963829/42818549, ERROR: ROUND OFF (0.5) > 0.40 Continuing from last save file.
[Tue Mar 8 23:32:26 2011] Iteration: 20067548/42818549, ERROR: ROUND OFF (0.5) > 0.40 Continuing from last save file.
..........led to a Suspect LL. |
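Since these reports all share one fixed shape, counting them in a results.txt file is easy to automate. The following is a minimal C sketch under a few assumptions: the struct and function names are hypothetical, each report is taken to sit on its own line of at most 511 characters, and the format string is inferred only from the lines quoted in this thread, not from any Prime95 specification.

```c
#include <stdio.h>
#include <string.h>

/* One parsed "ROUND OFF" report. */
struct roundoff_report {
    unsigned long iteration;  /* iteration at which the error was seen */
    unsigned long exponent;   /* exponent under test */
    double error;             /* value in parentheses, e.g. 0.5 */
    double threshold;         /* value after '>', e.g. 0.40 */
};

/* Parse one line. Returns 1 if it holds a roundoff report, else 0.
   An optional timestamp before "Iteration:" is skipped. */
int parse_roundoff_line(const char *line, struct roundoff_report *out)
{
    const char *p = strstr(line, "Iteration: ");
    if (p == NULL)
        return 0;
    return sscanf(p, "Iteration: %lu/%lu, ERROR: ROUND OFF (%lf) > %lf",
                  &out->iteration, &out->exponent,
                  &out->error, &out->threshold) == 4;
}

/* Count the roundoff reports in an already opened text stream. */
long count_roundoff_reports(FILE *fp)
{
    char line[512];
    struct roundoff_report r;
    long count = 0;
    while (fgets(line, sizeof line, fp) != NULL) {
        if (parse_roundoff_line(line, &r))
            count++;
    }
    return count;
}
```

Ordinary progress lines such as "Iteration: 25340000 / 48995293 [51.71%]." are rejected, because the spaces around the slash and the missing ERROR text make the pattern fail to match.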
That core looks pretty flaky to me. Any better luck on the next exponent?
|
So just to clarify: what has happened is that there was only one error (so far), but the prime95 worker keeps a cumulative count and will report the same error every time it logs a new batch of 10,000 iterations on the screen. If there are subsequent errors, they will show up as separate entries in results.txt, as in moebius's post, and the worker would report
"Possible hardware errors have occurred during the test! N ROUNDOFF > 0.4" if N errors occurred? |
You understand correctly.
|
Is it worth manually assigning either of these exponents to one of my machines, knocking out some ECM progress?
|
More stressful than Prime95
You might want to run the latest IntelBurn test. It's even tougher on the processor than Prime95, and identifies calculation errors in an hour or so. Crank the memory setting up close to maximum, and it will hit your processor harder than Prime95. I.e. if it passes this stress test, you won't have a hardware issue with running LL tests.
[URL]http://www.softpedia.com/get/System/Benchmarks/IntelBurnTest.shtml[/URL] |
Damnit I've been running this test for so long, and it figures this POS laptop locks up and now Prime spews the error message every line just to remind me about it.
It first occurred at 70%. My question is - If there was an error in the calculation, why doesn't prime have some sort of 'save point', and recalculate from the last known good numbers that it was at? What am I supposed to do about it? I figured Prime would have some sort of 'checkpoint' it could revert to. |
Prime95 does revert to the last save file. The problem is not with the reported roundoff error: the auto-restart from the last save file ensures that that particular hardware error will not affect your final result. The problem is that prime95 cannot detect every hardware error. If you happened to have one of these undetectable hardware errors, your final result will be corrupt.
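To make the save-file behaviour concrete, here is a toy C sketch of a Lucas-Lehmer loop that keeps a checkpoint and rolls back when an error is detected. It is an illustration under stated assumptions, not Prime95's implementation: the function names are hypothetical, the "save file" is just a pair of variables, the fault is simulated at a chosen iteration, and plain 64-bit arithmetic limits it to odd prime exponents from 3 to 31.

```c
#include <stdint.h>

/* One Lucas-Lehmer step modulo m = 2^p - 1: s -> s*s - 2 (mod m).
   Safe in 64 bits because s < m < 2^31. */
static uint64_t ll_step(uint64_t s, uint64_t m)
{
    return (s * s + m - 2) % m;
}

/* Returns 1 if 2^p - 1 is prime, 0 otherwise (odd prime p, 3 <= p <= 31).
   save_every: checkpoint interval in iterations (must be >= 1).
   fault_at:   iteration at which one detected error is simulated (0 = none).
   rollbacks:  receives the number of restarts from the checkpoint. */
int ll_test_with_rollback(unsigned p, unsigned save_every,
                          unsigned fault_at, unsigned *rollbacks)
{
    uint64_t m = (1ULL << p) - 1;
    uint64_t s = 4;
    uint64_t saved_s = 4;        /* contents of the "save file" */
    unsigned saved_iter = 0;
    unsigned iter = 0;
    int fault_pending = (fault_at != 0);

    *rollbacks = 0;
    while (iter < p - 2) {
        uint64_t next = ll_step(s, m);
        iter++;
        if (fault_pending && iter == fault_at) {
            /* Detected error: discard this result, reload the checkpoint. */
            fault_pending = 0;
            s = saved_s;
            iter = saved_iter;
            (*rollbacks)++;
            continue;
        }
        s = next;
        if (iter % save_every == 0) {
            saved_s = s;         /* write a new checkpoint */
            saved_iter = iter;
        }
    }
    return s == 0;
}
```

A detected fault costs only the iterations since the last checkpoint, and the final answer is unchanged. An undetected fault would instead be stored into `s` and carried into every later iteration and checkpoint, which is the case no rollback can repair.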
|
[QUOTE=SeeD419;266912]My question is - If there was an error in the calculation, why doesn't prime have some sort of 'save point', and recalculate from the last known good numbers that it was at? [/quote]
It did. Don't worry about it. [QUOTE=SeeD419;266912]What am I supposed to do about it? [/QUOTE] Nothing. See posts #3 & 4. |
Ohh okay thanks guys. I was a little confused by the screen output.
Okay...so worst case scenario is that I did have a few undetected hardware errors - then what? When I get to the end of the calculation will that be apparent then? Or will I never really know if the result is correct? |
You could ask me for (or even do yourself) an LL-D on the same exponent. Be ready to wait a month or two for the result. You could also have it TF'ed a bit further on a GPU on the off (about 1 in 10, at best) chance of proving it composite that way....
Up to you... |
[QUOTE=Rhyled;259250]You might want to run the latest IntelBurn test. It's even tougher on the processor than Prime95, and identifies calculation errors in an hour or so. [/QUOTE]
I'd like to share something that you may find interesting. One of the computers I built in Dec 2010 was starting to behave oddly in the March 2011 timeframe. I ran every stress test I could think of on it, every hardware diagnostic, and it passed them all, despite a 24x7 gauntlet being thrown at it for about a week. Then, sure enough, during "normal use," the problem returned: the system rebooted "for no good reason."
Finally, I decided to blame the RAM, but I did not have any of the same rated speed to swap out. So, I wrote the world's simplest RAM testing application in C. It called malloc() with large chunks (1 GB) until it failed, then in 512 MB chunks until it failed, then 256 MB, 128 MB, all the way down to the last available kilobyte. Basically, it used every available byte of RAM it could. For every byte that was allocated, I first looped through and set the byte = 0000 0001. Then I looped around and "read" each byte, making sure the result was == 1. I repeated this for 0000 0010 to 1111 1111.
Sure enough, there were a few "flakey bytes" on one IC somewhere that could not retain their values. While the RAM chip would pass "hardware tests," there was no escaping this "byte-level" test which drilled down to the IC level. It was just one faulty IC on one of the RAM chips.
I mention all of this because not every "stress test" can find "the exact problem." Sometimes RAM will behave fine on a large scale, but such a microscopic examination will uncover the problem. If the problem was with my CPU instead of the RAM, this test would be of no help (possibly) in determining the true culprit. Just something to think about. |
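The write-then-verify loop described in that post could look roughly like the C sketch below. It is a reconstruction from the description, not the poster's program: the function names are hypothetical, and for safety it tests one block of a caller-chosen size instead of grabbing all available RAM in shrinking chunks. Keep in mind that a user-space program only sees virtual addresses, and for small buffers the read-back may be served from CPU cache rather than from the memory chips.

```c
#include <stddef.h>
#include <stdlib.h>

/* Write `pattern` to every byte, then read every byte back.
   Returns the number of bytes that did not hold the pattern.
   `volatile` keeps the compiler from optimizing the accesses away. */
size_t test_pattern(volatile unsigned char *buf, size_t len,
                    unsigned char pattern)
{
    size_t bad = 0;
    for (size_t i = 0; i < len; i++)
        buf[i] = pattern;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != pattern)
            bad++;
    }
    return bad;
}

/* Allocate `len` bytes and run every pattern from 0000 0001 to 1111 1111.
   Returns the total number of mismatches, or (size_t)-1 if malloc fails. */
size_t test_ram_block(size_t len)
{
    unsigned char *buf = malloc(len);
    size_t bad = 0;

    if (buf == NULL)
        return (size_t)-1;
    for (unsigned pattern = 1; pattern <= 255; pattern++)
        bad += test_pattern(buf, len, (unsigned char)pattern);
    free(buf);
    return bad;
}
```

Calling `test_ram_block` with a size large compared to the CPU cache, and repeating it for several blocks, gets closer to what the post describes. A nonzero return value means at least one byte failed to hold a pattern.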
[QUOTE=pjaj;258698]"Iteration: 25227368/48995293, ERROR: ROUND OFF (0.5) > 0.40"[/QUOTE]
That exponent is very close to the upper limit permitted by a 2560-Kdouble FFT, so I'm not surprised to see an occasional ROE > 0.4 error there. OTOH, the exponent moebius notes for his errors is not near an FFT boundary, at least of the kind my code uses (each power-of-2 length interval evenly subdivided into 8 subintervals of form [8,9,10,11,12,13,14,15]*2^n). George, is p = 42818549 close to any of the length-breakovers used by your program? |
[QUOTE=ewmayer;266983] George, is p = 42818549 close to any of the length-breakovers used by your program?[/QUOTE]
Not really, 2240K can handle up to 43,060,000. If you get roundoffs due to being near the FFT size limit, then you usually see a roundoff error of 0.40625 or 0.4375, not 0.5. |
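One way to put numbers on "close to the limit" is the average count of bits of the Mersenne number packed into each FFT element, which is simply the exponent divided by the FFT length. The C sketch below does that arithmetic. The function names are hypothetical, and it assumes that "2240K" and "2560K" mean 2240 x 1024 and 2560 x 1024 doubles.

```c
/* Average number of bits stored per FFT element.
   fft_len_k is the FFT length in K, assuming 1K = 1024 doubles. */
double bits_per_word(unsigned long exponent, unsigned long fft_len_k)
{
    return (double)exponent / ((double)fft_len_k * 1024.0);
}

/* How many exponents of headroom remain below a stated limit
   (0 if the exponent is already at or above it). */
unsigned long exponent_headroom(unsigned long exponent,
                                unsigned long max_exponent)
{
    return (exponent < max_exponent) ? max_exponent - exponent : 0;
}
```

For the exponents in this thread, M42818549 at 2240K works out to about 18.67 bits per element, against about 18.77 at the quoted 43,060,000 limit, a headroom of 241,451. M48995293 at 2560K comes to about 18.69 bits per element; the 2560K limit itself is not given in the thread.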
[QUOTE=LiquidNitrogen;266982]I'd like to share something that you may find interesting.
One of the computers I built in Dec 2010 was starting to behave oddly in the March 2011 timeframe. I ran every stress test I could think of on it, every hardware diagnostic, and it passed them all, despite a 24x7 gauntlet being thrown at it for about a week. (snip) [/QUOTE] Did those hardware diagnostics include memtest86, which does much the same thing? |