mersenneforum.org

Fred 2016-02-29 03:37

Hardware Error?
 
I've been seeing the following on one worker for the past 12 hours or so.

[Work thread Feb 28 22:23] Iteration: 59150000 / 78096433 [75.73%], roundoff: 0.358, ms/iter: 21.516, ETA: 4d 17:14
[Work thread Feb 28 22:23] Possible hardware errors have occurred during the test!
[Work thread Feb 28 22:23] 2 ROUNDOFF > 0.4 of which 1 were repeatable (not hardware errors).
[Work thread Feb 28 22:23] Confidence in final result is fair.

I've seen a similar message once before on a different computer, but I believe that time it said 1 roundoff > 0.4 of which 1 was repeatable, so I wasn't too concerned. Does this one indicate that 1 of the 2 errors [U]was[/U] a hardware error? With confidence in the final result rated "fair", should I re-run the exponent from the start on a different machine? The machine experiencing this issue recently completed a DC (double-check) successfully, has not been overclocked, and the exponents being tested by the 3 other workers seem OK.
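
[Editor's sketch] The numbers in the log lines above can be checked by hand. The following Python is only an illustration and is not part of Prime95: the regular expressions and the function names `estimate_eta` and `non_repeatable_errors` are hypothetical, written against the exact lines posted here, and truncating the ETA to whole minutes is an assumption. It recomputes the ETA from the remaining iterations and ms/iter, and subtracts the repeatable roundoff errors from the total to get the count that may be hardware related.

```python
import re

# Patterns written against the two log lines quoted in this thread (hypothetical,
# not taken from Prime95's source).
ITERATION_RE = re.compile(
    r"Iteration: (\d+) / (\d+) \[[^\]]*\], roundoff: ([\d.]+), ms/iter: ([\d.]+)"
)
SUMMARY_RE = re.compile(r"(\d+) ROUNDOFF > 0\.4 of which (\d+) were repeatable")

STATUS_LINE = ("[Work thread Feb 28 22:23] Iteration: 59150000 / 78096433 [75.73%], "
               "roundoff: 0.358, ms/iter: 21.516, ETA: 4d 17:14")
SUMMARY_LINE = ("[Work thread Feb 28 22:23] 2 ROUNDOFF > 0.4 of which 1 were "
                "repeatable (not hardware errors).")


def estimate_eta(line):
    """Recompute the ETA as (remaining iterations) * (ms per iteration)."""
    m = ITERATION_RE.search(line)
    if m is None:
        raise ValueError("not an iteration status line")
    done, total = int(m.group(1)), int(m.group(2))
    ms_per_iter = float(m.group(4))
    seconds = (total - done) * ms_per_iter / 1000.0
    # Assumption: partial minutes are dropped rather than rounded.
    days, rest = divmod(int(seconds), 86400)
    hours, rest = divmod(rest, 3600)
    minutes = rest // 60
    return f"{days}d {hours:02d}:{minutes:02d}"


def non_repeatable_errors(line):
    """Roundoff errors that did not repeat on retry (possible hardware errors)."""
    m = SUMMARY_RE.search(line)
    if m is None:
        raise ValueError("not a roundoff summary line")
    total, repeatable = int(m.group(1)), int(m.group(2))
    return total - repeatable


print(estimate_eta(STATUS_LINE))            # 4d 17:14
print(non_repeatable_errors(SUMMARY_LINE))  # 1
```

On the posted lines this reproduces the logged ETA and gives 1 non-repeatable error out of 2, which is consistent with the "Possible hardware errors" warning.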

Prime95 2016-02-29 03:43

Post the relevant lines from results.txt.

In general, if the roundoff error was below 0.45 then your hardware is OK (that is a normal occurrence when prime95 runs an exponent at the very upper limit of what can be safely done for that FFT size). If the roundoff was 0.5, keep an eye on the machine.

I got a 0.5 error a few days ago on one of my machines when the power flickered on and off several times.
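
[Editor's sketch] For readers new to the term: prime95 multiplies with floating-point FFTs, so every output value should land very close to a whole number, and the roundoff error is how far the worst value strays from one. At 0.5 the value sits exactly between two integers and the correct one cannot be recovered. The Python below is a minimal illustration of that idea and of the rule of thumb in this post. The function names and the "unclear" label for values between 0.45 and 0.5 are assumptions, since the post does not say how to treat that range.

```python
def max_roundoff(values):
    """Largest distance between any value and its nearest integer."""
    return max(abs(v - round(v)) for v in values)


def classify_roundoff(err):
    """Apply the rule of thumb from this post to one roundoff value."""
    if err < 0.45:
        return "ok"       # normal near the upper limit of an FFT size
    if err >= 0.5:
        return "watch"    # keep an eye on the machine
    return "unclear"      # assumption: the post gives no guidance for this range


# Hypothetical FFT outputs that should each be whole numbers.
sample = [3.02, 11.97, 7.5]
print(max_roundoff(sample))
print(classify_roundoff(0.358))
print(classify_roundoff(0.5))
```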

Fred 2016-02-29 13:30

[Sun Feb 28 13:20:36 2016]
Iteration: 57644825/78096433, Possible error: round off (0.5) > 0.40625
Continuing from last save file.

Dang. 0.5.

Should I just let it keep going, or restart on another machine to see whether it hits the same issue?

LaurV 2016-02-29 13:43

Let it keep going; P95 is clever enough to recover from it. Watch it for a while to see what happens: if the error is reproducible, there is no problem. If not, you just had a hardware glitch, which happens sometimes. If it starts happening too often, then you have to worry.

Fred 2016-02-29 14:09

[QUOTE=LaurV;427765]Let it keep going; P95 is clever enough to recover from it. Watch it for a while to see what happens: if the error is reproducible, there is no problem. If not, you just had a hardware glitch, which happens sometimes. If it starts happening too often, then you have to worry.[/QUOTE]

Ok, thanks. Being newer to the project, and seeing the error, I was feeling a bit like your avatar (and mine too for that matter).

henryzz 2016-02-29 19:00

Minor bug: 1 were repeatable

Madpoo 2016-03-01 23:49

[QUOTE=LaurV;427765]Let it keep going; P95 is clever enough to recover from it. Watch it for a while to see what happens: if the error is reproducible, there is no problem. If not, you just had a hardware glitch, which happens sometimes. If it starts happening too often, then you have to worry.[/QUOTE]

If it's non-repeatable, it may mark the result as "suspect".

If this is a double-check and it matches the first result, no worries. If it doesn't match the first check, you would have to wonder...

If this is a first time check and it's "suspect", that exponent gets reassigned as a first-time check again.

Statistically, a suspect result has a 50/50 shot at being correct, more or less. Whereas a "clean" result has a 95-96% chance of being right. That's just averaged across all machines... some machines have a terrible record, near or at 100% wrong. :smile:
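
[Editor's sketch] To put those percentages in context, here is a rough back-of-the-envelope calculation in Python. It assumes the two runs fail independently and that two wrong results essentially never agree with each other, so a match happens only when both runs are correct. The function name is hypothetical, and 0.955 is simply the midpoint of the 95-96% figure above.

```python
def match_probability(p_first, p_second):
    """Chance that two independent runs agree, assuming wrong results never coincide."""
    return p_first * p_second


P_SUSPECT = 0.5   # "50/50 shot" for a suspect result
P_CLEAN = 0.955   # midpoint of the quoted 95-96% for a clean result

print(match_probability(P_SUSPECT, P_CLEAN))  # suspect first test, clean double-check
print(match_probability(P_CLEAN, P_CLEAN))    # both runs clean
```

Under these assumptions a suspect first test is matched by a clean double-check a little under half the time, while two clean runs match roughly 91% of the time.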

obalouafi 2016-03-05 15:55

I hope you get your answer, because I'm getting the same error.

Madpoo 2016-03-06 01:40

[QUOTE=Fred;427737]I've been seeing the following on one worker for the past 12 hours or so...[/QUOTE]

By the way, I'm running a check of this as well. Either you match and I'd do an independent triple check like I do for them all anyway, or you mismatch your first result and my run will be a double-check.

Mine may finish first... if so, do you want me to go ahead and check mine in?

If I match your first result, that would give you the option of cancelling the rest of your run if you wanted to move to another exponent.

Madpoo 2016-03-08 00:03

[QUOTE=Madpoo;428200]By the way, I'm running a check of this as well. Either you match and I'd do an independent triple check like I do for them all anyway, or you mismatch your first result and my run will be a double-check.

Mine may finish first... if so, do you want me to go ahead and check mine in?

If I match your first result, that would give you the option of cancelling the rest of your run if you wanted to move to another exponent.[/QUOTE]

My result did match your first run, so I went ahead and checked it in. You can quit your own second check, it won't be needed.

Fred 2016-03-08 01:40

[QUOTE=Madpoo;428340]My result did match your first run, so I went ahead and checked it in. You can quit your own second check, it won't be needed.[/QUOTE]

Cool. Out of curiosity, did your double check start throwing a possible hardware error? Was the possible hardware error I was seeing something anyone would see if they let that exponent run on their computer?


All times are UTC.