mersenneforum.org  

2016-02-29, 03:37   #1
Fred
"Ron"
Jan 2016
Fitchburg, MA
1100001₂ Posts

Hardware Error?

I've been seeing the following on one worker for the past 12 hours or so.

[Work thread Feb 28 22:23] Iteration: 59150000 / 78096433 [75.73%], roundoff: 0.358, ms/iter: 21.516, ETA: 4d 17:14
[Work thread Feb 28 22:23] Possible hardware errors have occurred during the test!
[Work thread Feb 28 22:23] 2 ROUNDOFF > 0.4 of which 1 were repeatable (not hardware errors).
[Work thread Feb 28 22:23] Confidence in final result is fair.

I've seen a similar error once before on a different computer, but I believe that time it said 1 roundoff > 0.4 of which 1 was repeatable, so I wasn't too concerned. This one seems to indicate that 1 of the 2 errors was a hardware error? With confidence in the final result rated "fair", should I re-run the exponent from the start on a different machine? The machine experiencing this issue recently completed a DC successfully, has not been overclocked, and the exponents being tested in the 3 other workers seem OK.

Last fiddled with by Fred on 2016-02-29 at 03:38

2016-02-29, 03:43   #2
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
2³×3×7×43 Posts

Post the relevant lines from results.txt.

In general, if the roundoff error was below 0.45, your hardware is OK: that is a normal occurrence when prime95 runs an exponent at the very upper limit of what can be safely done for that FFT size. If the roundoff was 0.5, keep an eye on the machine.

I got a 0.5 error a few days ago on one of my machines when the power flickered on and off several times.
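
For anyone who wants the rule of thumb above in a form they can run against their own logs, here is a minimal Python sketch. It only restates the two thresholds mentioned in this post; it is not Prime95's internal logic. The function name `classify_roundoff` is made up for illustration, and treating everything from 0.45 up to 0.5 as "watch" is an assumption, since the post does not say what to do in that range.

```python
def classify_roundoff(error: float) -> str:
    """Apply the forum rule of thumb to one reported roundoff value.

    A roundoff error is the distance from a computed value to the nearest
    integer, so it can never be larger than 0.5.
    """
    if not 0.0 <= error <= 0.5:
        raise ValueError("roundoff error must lie in [0, 0.5]")
    if error < 0.45:
        # Below 0.45: expected when an exponent sits near the FFT size limit.
        return "ok"
    # Assumption: anything from 0.45 up to 0.5 is treated as worth watching.
    return "watch"


if __name__ == "__main__":
    for value in (0.358, 0.5):
        print(value, classify_roundoff(value))
```

The two sample values are the ones that appear in this thread: 0.358 from the progress line and 0.5 from results.txt.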

2016-02-29, 13:30   #3
Fred
"Ron"
Jan 2016
Fitchburg, MA
97 Posts

[Sun Feb 28 13:20:36 2016]
Iteration: 57644825/78096433, Possible error: round off (0.5) > 0.40625
Continuing from last save file.

Dang. 0.5.

Should I just let it keep going? Restart on another machine to see if it experiences the same issue?
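
If you want to pull these entries out of results.txt automatically, something like the following Python sketch could work. The regular expression is written against the single line quoted in this post, so it is an assumption that other Prime95 versions use exactly the same wording; `parse_roundoff_line` and the dictionary keys are hypothetical names, not anything Prime95 provides.

```python
import re

# Pattern modelled on the one results.txt line quoted in this thread.
ROUNDOFF_RE = re.compile(
    r"Iteration: (\d+)\s*/\s*(\d+), "
    r"Possible error: round off \(([\d.]+)\) > ([\d.]+)"
)


def parse_roundoff_line(line: str):
    """Return the fields of a roundoff warning, or None if the line differs."""
    match = ROUNDOFF_RE.search(line)
    if match is None:
        return None
    iteration = int(match.group(1))
    total = int(match.group(2))
    return {
        "iteration": iteration,
        "total": total,
        "percent_done": 100.0 * iteration / total,
        "roundoff": float(match.group(3)),
        "limit": float(match.group(4)),
    }


if __name__ == "__main__":
    sample = ("Iteration: 57644825/78096433, "
              "Possible error: round off (0.5) > 0.40625")
    print(parse_roundoff_line(sample))
```

Lines that do not match return `None`, so the function can be run over the whole file without special-casing the timestamp or "Continuing from last save file." lines.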

2016-02-29, 13:43   #4
LaurV
Romulan Interpreter
Jun 2011
Thailand
2²×2,239 Posts

Let it keep going; P95 is clever enough to recover. Watch it for a while to see what happens: if the error is reproducible, then no problem. If not, then you just had a hardware glitch, which happens sometimes. If it happens too often, then you have to worry.

2016-02-29, 14:09   #5
Fred
"Ron"
Jan 2016
Fitchburg, MA
97₁₀ Posts

Quote:
Originally Posted by LaurV View Post
Let it keep going; P95 is clever enough to recover. Watch it for a while to see what happens: if the error is reproducible, then no problem. If not, then you just had a hardware glitch, which happens sometimes. If it happens too often, then you have to worry.
Ok, thanks. Being newer to the project, and seeing the error, I was feeling a bit like your avatar (and mine too for that matter).

Last fiddled with by Fred on 2016-02-29 at 14:10

2016-02-29, 19:00   #6
henryzz
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
11×523 Posts

Minor bug: 1 were repeatable
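
The wording issue is that the message uses "were" even when the count is 1. A minimal Python sketch of one way to pick the verb from the count is below. It is only an illustration of the fix, not Prime95's source code, and `roundoff_summary` is a made-up name.

```python
def roundoff_summary(total: int, repeatable: int) -> str:
    """Build the summary line with singular/plural agreement."""
    if repeatable == 1:
        tail = "1 was repeatable (not a hardware error)"
    else:
        tail = f"{repeatable} were repeatable (not hardware errors)"
    return f"{total} ROUNDOFF > 0.4 of which {tail}."


if __name__ == "__main__":
    print(roundoff_summary(2, 1))
    print(roundoff_summary(3, 2))
```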

2016-03-01, 23:49   #7
Madpoo
Serpentine Vermin Jar
Jul 2014
29×113 Posts

Quote:
Originally Posted by LaurV View Post
Let it keep going; P95 is clever enough to recover. Watch it for a while to see what happens: if the error is reproducible, then no problem. If not, then you just had a hardware glitch, which happens sometimes. If it happens too often, then you have to worry.
If it's non-repeatable, it may mark the result as "suspect".

If this is a double-check and it matches the first result, no worries. If it doesn't match the first check, you would have to wonder...

If this is a first time check and it's "suspect", that exponent gets reassigned as a first-time check again.

Statistically, a suspect result has a 50/50 shot at being correct, more or less, whereas a "clean" result has a 95-96% chance of being right. That's just averaged across all machines... some machines have a terrible record, near or at 100% wrong.
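
The cases described in this post can be written out as a small decision function. This is a sketch of the reasoning only, not the actual PrimeNet server logic. The name `next_step` and the returned strings are hypothetical, and two branches are assumptions: that a mismatched double-check leads to a further test, and that a clean first-time result simply waits for its double-check.

```python
from typing import Optional


def next_step(is_double_check: bool, suspect: bool,
              matches_first: Optional[bool] = None) -> str:
    """Say what happens to a finished test under the rules in the post."""
    if is_double_check:
        if matches_first is None:
            raise ValueError("a double-check needs a first result to compare")
        # A match settles it, even if this run logged errors along the way.
        if matches_first:
            return "verified"
        # Assumption: a mismatch means another run is needed to decide.
        return "needs another test"
    if suspect:
        # A suspect first-time result is not trusted on its own.
        return "reassign as first-time check"
    # Assumption: a clean first-time result waits for its double-check.
    return "wait for double-check"


if __name__ == "__main__":
    print(next_step(is_double_check=True, suspect=True, matches_first=True))
    print(next_step(is_double_check=False, suspect=True))
```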

2016-03-05, 15:55   #8
obalouafi
"Joseph James"
Mar 2016
Algeria
1 Posts

Hope you get your answer, because I'm getting the same error.

2016-03-06, 01:40   #9
Madpoo
Serpentine Vermin Jar
Jul 2014
29×113 Posts

Quote:
Originally Posted by Fred View Post
I've been seeing the following on one worker for the past 12 hours or so...
By the way, I'm running a check of this as well. Either you match and I'd do an independent triple check like I do for them all anyway, or you mismatch your first result and my run will be a double-check.

Mine may finish first... if so, do you want me to go ahead and check mine in?

If I match your first result, that would give you the option of cancelling the rest of your run if you wanted to move to another exponent.

2016-03-08, 00:03   #10
Madpoo
Serpentine Vermin Jar
Jul 2014
29×113 Posts

Quote:
Originally Posted by Madpoo View Post
By the way, I'm running a check of this as well. Either you match and I'd do an independent triple check like I do for them all anyway, or you mismatch your first result and my run will be a double-check.

Mine may finish first... if so, do you want me to go ahead and check mine in?

If I match your first result, that would give you the option of cancelling the rest of your run if you wanted to move to another exponent.
My result did match your first run, so I went ahead and checked it in. You can quit your own second check, it won't be needed.

2016-03-08, 01:40   #11
Fred
"Ron"
Jan 2016
Fitchburg, MA
61₁₆ Posts

Quote:
Originally Posted by Madpoo View Post
My result did match your first run, so I went ahead and checked it in. You can quit your own second check, it won't be needed.
Cool. Out of curiosity, did your double check start throwing a possible hardware error? Was the possible hardware error I was seeing something anyone would see if they let that exponent run on their computer?
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.