mersenneforum.org

pjaj 2011-04-16 11:14

Prime95 roundoff errors
 
I just restarted an old primality test after completing another, and Prime95 is throwing up round off errors:

"Possible hardware errors have occurred during the test! 1 ROUNDOFF > 0.4"

The other 3 cores of my i7 are running other tests without errors.

Should I be concerned?
Should I do anything about it?

Prime95 2011-04-16 13:37

It is not "throwing up round off errors". It is telling you that you have had [B]one[/B] during the course of the LL test. This could be quite normal. Look in results.txt for the actual error message. If the roundoff error was barely over 0.4 (like 0.40625 or .4375, not 0.4997) and the error was reproducible then this is a non-issue.
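
For context, here is a minimal sketch of what a round-off figure of this kind measures, assuming the usual FFT-based squaring approach (the worker log later in the thread mentions an FFT length). After the inverse transform every coefficient should be a whole number, and the round-off error is how far the worst coefficient landed from the nearest integer; a value of 0.5 means the program can no longer tell which integer was intended. The function names `fft` and `square_digits` are illustrative only, and this is not Prime95's code, which is far more elaborate.

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out

def square_digits(digits, base):
    """Square a number given as little-endian digits.

    Returns (value, worst), where worst is the largest distance of any
    convolution coefficient from its nearest integer (the round-off error).
    """
    n = 1
    while n < 2 * len(digits):
        n *= 2
    padded = [complex(d) for d in digits] + [0j] * (n - len(digits))
    spectrum = fft(padded)
    conv = fft([x * x for x in spectrum], invert=True)
    worst = 0.0
    value = 0
    for i, c in enumerate(conv):
        x = c.real / n              # the unscaled inverse FFT needs a 1/n factor
        nearest = round(x)
        worst = max(worst, abs(x - nearest))
        value += nearest * base ** i
    return value, worst

# 123456789 written in base 1000, least significant digit first
value, worst = square_digits([789, 456, 123], 1000)
print(value == 123456789 ** 2, worst < 0.4)
```

With digits this small the worst error is tiny. Packing more bits into each FFT element pushes it toward 0.5, which is why a threshold such as 0.4 serves as a warning line.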

pjaj 2011-04-16 16:30

[QUOTE=Prime95;258685]It is not "throwing up round off errors". It is telling you that you have had [B]one[/B] during the course of the LL test. This could be quite normal. Look in results.txt for the actual error message. If the roundoff error was barely over 0.4 (like 0.40625 or .4375, not 0.4997) and the error was reproducible then this is a non-issue.[/QUOTE]
A figure of speech.
Actually it's generating the same error report for each successive set of 10,000 iterations. It's not [B]one[/B] isolated incident.
results.txt contains only a single error report:
"Iteration: 25227368/48995293, ERROR: ROUND OFF (0.5) > 0.40"
but the screen for that worker reports a new one every 7-8 minutes or so.


[Apr 16 17:16] Waiting 10 seconds to stagger worker starts.
[Apr 16 17:16] Worker starting
[Apr 16 17:16] Setting affinity to run worker on logical CPUs 4,5
[Apr 16 17:16] Resuming primality test of M48995293 using Core2 type-3 FFT length 2560K, Pass1=640, Pass2=4K
[Apr 16 17:16] Iteration: 25336590 / 48995293 [51.71%].
[Apr 16 17:16] Possible hardware errors have occurred during the test! 1 ROUNDOFF > 0.4.
[Apr 16 17:16] Confidence in final result is fair.
[Apr 16 17:18] Iteration: 25340000 / 48995293 [51.71%]. Per iteration time: 0.039 sec.
[Apr 16 17:18] Possible hardware errors have occurred during the test! 1 ROUNDOFF > 0.4.
[Apr 16 17:18] Confidence in final result is fair.
[Apr 16 17:25] Iteration: 25350000 / 48995293 [51.73%]. Per iteration time: 0.040 sec.
[Apr 16 17:25] Possible hardware errors have occurred during the test! 1 ROUNDOFF > 0.4.
[Apr 16 17:25] Confidence in final result is fair.

S34960zz 2011-04-16 17:46

See also: [URL]http://www.mersenneforum.org/showpost.php?p=238103&postcount=71[/URL]
[QUOTE=Prime95;238103]This is normal -- a result of a new "feature" in v26. In v25, one would get a SUM(INPUTS) error or ROUNDOFF > 0.4 error and it would scroll off the screen unnoticed. You had to go to the effort of looking in results.txt to see that you had a problem.

In v26, every time prime95 does its regular screen update it prints out a summary of the total number of errors that occurred during the test. See undoc.txt for options on controlling this new feature.

So, one of your workers had a SUM(INPUTS) error sometime during the test. Since it only happened once there is a fair chance that your LL result will be OK.[/QUOTE]

See also: undoc.txt
[CODE]
You can control how the "count of errors during this test" message
is output with every screen update. These messages only appear if
possible hardware errors occur during a test. In prime.txt set:
ErrorCountMessages=0, 1, 2, or 3
Value 0 means no messages, value 1 means a very short message, value 2
means a longer message on a separate line, value 3 means a very long message
possibly on multiple lines. Default value is 3.[/CODE]
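
Going by the undoc.txt excerpt above, a line like the following in prime.txt should select the very short form of the message. The value 1 is only an example; any of 0 through 3 is allowed per the excerpt.

```text
ErrorCountMessages=1
```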

moebius 2011-04-16 17:54

And what about this mysterious phenomenon? During a first-time LL test of M42818549 (with Mprime26.5-linux64) running on one worker (core), twelve similar ROUND OFF errors occurred. The three LL tests running on the other three cores are error-free.

[Tue Mar 8 07:31:28 2011]
Iteration: 18872759/42818549, ERROR: ROUND OFF (0.5) > 0.40
Continuing from last save file.
[Tue Mar 8 07:55:14 2011]
Iteration: 18903381/42818549, ERROR: ROUND OFF (0.5) > 0.40
Continuing from last save file.
[Tue Mar 8 08:09:58 2011]
Iteration: 18916265/42818549, ERROR: ROUND OFF (0.5) > 0.40
Continuing from last save file.
[Tue Mar 8 08:58:00 2011]
Iteration: 18989396/42818549, ERROR: ROUND OFF (0.5) > 0.40
Continuing from last save file.
Iteration: 18977799/42818549, ERROR: ROUND OFF (0.5) > 0.40
Continuing from last save file.
[Tue Mar 8 10:29:45 2011]
Iteration: 19103776/42818549, ERROR: ROUND OFF (0.5) > 0.40
Continuing from last save file.
[Tue Mar 8 11:19:09 2011]
Iteration: 19171522/42818549, ERROR: ROUND OFF (0.5) > 0.40
Continuing from last save file.
[Tue Mar 8 21:32:58 2011]
Iteration: 19959306/42818549, ERROR: ROUND OFF (0.5) > 0.40
Continuing from last save file.
[Tue Mar 8 21:39:15 2011]
Iteration: 19956922/42818549, ERROR: ROUND OFF (0.5) > 0.40
Continuing from last save file.
Iteration: 19953354/42818549, ERROR: ROUND OFF (0.5) > 0.40
Continuing from last save file.
[Tue Mar 8 21:50:22 2011]
Iteration: 19963829/42818549, ERROR: ROUND OFF (0.5) > 0.40
Continuing from last save file.
[Tue Mar 8 23:32:26 2011]
Iteration: 20067548/42818549, ERROR: ROUND OFF (0.5) > 0.40
Continuing from last save file.

...and this led to a Suspect LL result.
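
A results.txt excerpt like the one above can be tallied with a short script. This is a rough sketch that assumes error lines always have the exact shape shown in this thread; the names `ERROR_LINE` and `summarize_roundoff_errors` are made up for illustration. It also counts how often an error was logged at a lower iteration than the one before it, which suggests the worker had already gone back to a save file and hit trouble again while redoing the work.

```python
import re

# Matches lines such as:
#   Iteration: 18989396/42818549, ERROR: ROUND OFF (0.5) > 0.40
ERROR_LINE = re.compile(
    r"Iteration: (\d+)/(\d+), ERROR: ROUND OFF \(([\d.]+)\) > ([\d.]+)"
)

def summarize_roundoff_errors(text):
    """Return (count, iterations, rollbacks) for the ROUND OFF lines in text."""
    iterations = [int(m.group(1)) for m in ERROR_LINE.finditer(text)]
    # A later error at a lower iteration number than its predecessor means
    # the test was re-running iterations after resuming from a save file.
    rollbacks = sum(1 for a, b in zip(iterations, iterations[1:]) if b < a)
    return len(iterations), iterations, rollbacks

sample = """\
[Tue Mar 8 08:58:00 2011]
Iteration: 18989396/42818549, ERROR: ROUND OFF (0.5) > 0.40
Continuing from last save file.
Iteration: 18977799/42818549, ERROR: ROUND OFF (0.5) > 0.40
Continuing from last save file.
"""
count, iterations, rollbacks = summarize_roundoff_errors(sample)
print(count, rollbacks)  # prints: 2 1
```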

Prime95 2011-04-16 21:01

That core looks pretty flaky to me. Any better luck on the next exponent?

pjaj 2011-04-16 23:05

So just to clarify: there has been only one error (so far), but the Prime95 worker keeps a cumulative count and reports that same error every time it logs a new batch of 10,000 iterations on the screen. If there are subsequent errors, they will show up as separate entries in results.txt, as in moebius's post, and the worker would report

"Possible hardware errors have occurred during the test! N ROUNDOFF > 0.4"

if N errors occurred?

Prime95 2011-04-17 02:02

You understand correctly.
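
The behaviour confirmed here can be modelled in a few lines. This is a hypothetical sketch, not Prime95's code: the class `WorkerErrorCount` and its methods are invented for illustration, and only the message wording is taken from the log quoted earlier in the thread. The point is that the tally is cumulative for the whole test, so a single old error is repeated on every screen update.

```python
class WorkerErrorCount:
    """Hypothetical model of a per-test error tally; not Prime95's code."""

    def __init__(self):
        self.roundoff = 0  # cumulative count for the whole test

    def record_roundoff(self):
        """Call once each time a round-off error is detected."""
        self.roundoff += 1

    def screen_update(self, iteration, total):
        """Return the lines printed at a regular screen update."""
        lines = [f"Iteration: {iteration} / {total}"]
        if self.roundoff:
            lines.append(
                "Possible hardware errors have occurred during the test! "
                f"{self.roundoff} ROUNDOFF > 0.4."
            )
        return lines

worker = WorkerErrorCount()
worker.record_roundoff()  # one error, early in the test
for it in (25340000, 25350000):
    # The same single error is reported again at every update.
    print("\n".join(worker.screen_update(it, 48995293)))
```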

Christenson 2011-04-17 16:56

Is it worth manually assigning either of these exponents to one of my machines, knocking out some ECM progress?

Rhyled 2011-04-22 01:12

More stressful than Prime95
 
You might want to run the latest IntelBurnTest. It's even tougher on the processor than Prime95, and identifies calculation errors in an hour or so. Crank the memory setting up close to maximum, and it will hit your processor harder than Prime95. I.e., if it passes this stress test, you won't have a hardware issue with running LL tests.

[URL]http://www.softpedia.com/get/System/Benchmarks/IntelBurnTest.shtml[/URL]

SeeD419 2011-07-19 11:52

Damnit I've been running this test for so long, and it figures this POS laptop locks up and now Prime spews the error message every line just to remind me about it.

It first occurred at 70%.

My question is - If there was an error in the calculation, why doesn't prime have some sort of 'save point', and recalculate from the last known good numbers that it was at? What am I supposed to do about it? I figured Prime would have some sort of 'checkpoint' it could revert to.
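
The results.txt excerpts quoted earlier in the thread show "Continuing from last save file." after each detected error, which suggests a rollback of roughly this kind. Below is a minimal sketch of the idea using the textbook Lucas-Lehmer recurrence, assuming a known-good state is saved every few iterations. The parameters `checkpoint_every` and `glitch_at` are hypothetical, and the glitch is a simulated stand-in for a fault caught by a round-off check; how Prime95 actually handles its save files is not shown in this thread.

```python
def lucas_lehmer(p, checkpoint_every=4, glitch_at=None):
    """Lucas-Lehmer test of 2**p - 1 for an odd prime p, with rollback.

    glitch_at simulates one detected error just before that iteration runs.
    Returns (is_prime, rollbacks).
    """
    m = (1 << p) - 1
    s = 4
    saved_i, saved_s = 0, s          # last known-good state
    rollbacks = 0
    i = 0
    while i < p - 2:
        if i == glitch_at:
            glitch_at = None         # transient fault: fires only once
            i, s = saved_i, saved_s  # resume from the last save point
            rollbacks += 1
            continue
        s = (s * s - 2) % m
        i += 1
        if i % checkpoint_every == 0:
            saved_i, saved_s = i, s
    return s == 0, rollbacks

print(lucas_lehmer(13))               # prints: (True, 0)
print(lucas_lehmer(13, glitch_at=6))  # prints: (True, 1)
```

A rollback only helps with errors that are detected. An error that slips past the checks is carried forward into the final residue, which may be why a test with many flagged errors is treated as suspect.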

