mersenneforum.org

Prime95 roundoff errors (https://www.mersenneforum.org/showthread.php?t=15528)

Prime95 2011-07-19 12:37

Prime95 does revert to the last save file. The problem is not with the reported roundoff error - the auto-restart from the last save file ensures that that particular hardware error will not affect your final result. The problem is that Prime95 cannot detect every hardware error. If you happen to have one of these undetectable hardware errors, your final result will be corrupt.

axn 2011-07-19 15:22

[QUOTE=SeeD419;266912]My question is - If there was an error in the calculation, why doesn't prime have some sort of 'save point', and recalculate from the last known good numbers that it was at? [/quote]
It did. Don't worry about it.

[QUOTE=SeeD419;266912]What am I supposed to do about it? [/QUOTE]

Nothing. See posts #3 & 4.

SeeD419 2011-07-19 17:48

Ohh okay thanks guys. I was a little confused by the screen output.

Okay...so worst case scenario is that I did have a few undetected hardware errors - then what? When I get to the end of the calculation will that be apparent then? Or will I never really know if the result is correct?

Christenson 2011-07-19 23:39

You could ask me for (or even do yourself) an LL double-check (LL-D) on the same exponent. Be ready to wait a month or two for the result. You could also have it trial-factored (TF'ed) a bit further on a GPU on the off chance (about 1 in 10, at best) of proving it composite that way....
Up to you...

LiquidNitrogen 2011-07-20 00:53

[QUOTE=Rhyled;259250]You might want to run the latest IntelBurn test. It's even tougher on the processor than Prime95, and identifies calculation errors in an hour or so. [/QUOTE]

I'd like to share something that you may find interesting.

One of the computers I built in Dec 2010 was starting to behave oddly in the March 2011 timeframe. I ran every stress test I could think of on it, every hardware diagnostic, and it passed them all, despite a 24x7 gauntlet being thrown at it for about a week.

Then, sure enough, during "normal use," the problem returned, the system rebooted "for no good reason."

Finally, I decided to blame the RAM, but I did not have any of the same rated speed to swap out. So, I wrote the world's simplest RAM testing application in C.

It called malloc() with large chunks (1 GB) until it failed, then in 512 MB chunks until it failed, then 256 MB, 128 MB, all the way down to the last available kilobyte.

Basically, it used every available byte of RAM it could.

And, for every byte that was allocated, I first looped through and set the byte = 0000 0001. Then, I looped around and "read" each byte, making sure the result was == 1. I repeated this for 0000 0010 to 1111 1111.
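
The test described above can be sketched roughly as follows. This is not LiquidNitrogen's original program, which is not shown in the thread; the names `check_pattern` and `ram_test`, the `budget` cap, and the fixed `MAX_CHUNKS` table are assumptions added for illustration. One caveat: on systems that overcommit memory (Linux by default), `malloc()` may keep succeeding well past the amount of physical RAM, so the sketch takes an explicit byte budget instead of relying on allocation failure alone.

```c
#include <stddef.h>
#include <stdlib.h>

/* Fill the block with `pattern`, then read every byte back.
   Returns how many bytes failed to hold the value.
   `volatile` stops the compiler from optimizing the read-back away. */
size_t check_pattern(volatile unsigned char *buf, size_t len,
                     unsigned char pattern)
{
    size_t bad = 0;
    for (size_t i = 0; i < len; i++)
        buf[i] = pattern;
    for (size_t i = 0; i < len; i++)
        if (buf[i] != pattern)
            bad++;
    return bad;
}

/* Hypothetical driver: grab memory in halving chunk sizes (largest first),
   up to `budget` bytes in total, test every chunk with the byte values
   1..255, then free everything. Returns the number of failed byte reads. */
size_t ram_test(size_t largest, size_t smallest, size_t budget)
{
    enum { MAX_CHUNKS = 4096 };          /* assumption: enough for a sketch */
    static unsigned char *chunks[MAX_CHUNKS];
    static size_t lens[MAX_CHUNKS];
    size_t n = 0, total = 0, bad = 0;

    for (size_t sz = largest; sz >= smallest && sz > 0; sz /= 2) {
        while (n < MAX_CHUNKS && total + sz <= budget) {
            unsigned char *p = malloc(sz);
            if (p == NULL)
                break;                   /* this size no longer fits; halve it */
            chunks[n] = p;
            lens[n] = sz;
            n++;
            total += sz;
        }
    }

    for (unsigned pattern = 1; pattern <= 255; pattern++)
        for (size_t i = 0; i < n; i++)
            bad += check_pattern(chunks[i], lens[i], (unsigned char)pattern);

    for (size_t i = 0; i < n; i++)
        free(chunks[i]);
    return bad;
}
```

For example, `ram_test((size_t)1 << 30, (size_t)1 << 10, budget)` starts with 1 GB chunks and halves down to 1 KB, as in the post; a nonzero return value means at least one byte failed to hold a pattern. Because the program only sees virtual addresses, it cannot by itself say which physical IC a bad byte lives on.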

Sure enough, there were a few "flaky bytes" on one IC somewhere that could not retain their values. While the RAM module would pass "hardware tests," there was no escaping this "byte-level" test which drilled down to the IC level.

It was just one faulty IC on one of the RAM modules.

I mention all of this because not every "stress test" can find "the exact problem." Sometimes RAM will behave fine on a large scale, but such a microscopic examination will uncover the problem.

If the problem had been with my CPU instead of the RAM, this test would (possibly) have been of no help in determining the true culprit.

Just something to think about.

ewmayer 2011-07-20 01:02

[QUOTE=pjaj;258698]"Iteration: 25227368/48995293, ERROR: ROUND OFF (0.5) > 0.40"[/QUOTE]

That exponent is very close to the upper limit permitted by a 2560-Kdouble FFT, so I'm not surprised to see an occasional ROE > 0.4 error there.

OTOH, the exponent moebius notes for his errors is not near an FFT boundary, at least of the kind my code uses (each power-of-2 length interval evenly subdivided into 8 subintervals of the form [8,9,10,11,12,13,14,15]*2^n). George, is p = 42818549 close to any of the length-breakovers used by your program?
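
To make the subdivision concrete, here is a small sketch of the [8..15]*2^n scheme ewmayer describes. It is not taken from his code or from Prime95; the function names are hypothetical, and it assumes "K" means 1024 elements. Under that assumption 2560K is 10*2^18 and fits the scheme, while 2240K is 35*2^16 and does not, so the two programs may well have their breakovers in different places.

```c
/* Returns 1 if len can be written as k * 2^n with k in 8..15 and n >= 0,
   otherwise 0. */
int is_scheme_length(unsigned long long len)
{
    /* Strip factors of two until the value is odd or small enough. */
    while (len > 15 && len % 2 == 0)
        len /= 2;
    return len >= 8 && len <= 15;
}

/* Smallest length of the form k * 2^n (k in 8..15) that is >= min_len.
   Returns 0 if min_len is beyond the range this sketch covers. */
unsigned long long next_scheme_length(unsigned long long min_len)
{
    for (unsigned n = 0; n < 60; n++)
        for (unsigned long long k = 8; k <= 15; k++)
            if ((k << n) >= min_len)
                return k << n;
    return 0;
}
```

With K = 1024, `is_scheme_length(2560ULL * 1024)` is true and `is_scheme_length(2240ULL * 1024)` is false; the nearest scheme lengths around 2240K are 2048K (8*2^18) and 2304K (9*2^18).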

Prime95 2011-07-20 02:36

[QUOTE=ewmayer;266983] George, is p = 42818549 close to any of the length-breakovers used by your program?[/QUOTE]

Not really, 2240K can handle up to 43,060,000.

If you get roundoffs due to being near the FFT size limit, then you usually see a roundoff error of 0.40625 or 0.4375, not 0.5.
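
For readers unfamiliar with the error message quoted earlier in the thread, the roundoff check can be illustrated with a short sketch. It assumes the usual definition (each value coming out of the floating-point multiplication should be an integer, and the error is its distance from the nearest one) and uses the 0.40 limit from the quoted error line; the function names are hypothetical and this is not Prime95's code. It shows that 0.5 is the largest value the error can take, and that 0.40625 and 0.4375 are simply 13/32 and 14/32.

```c
#include <stddef.h>

/* Distance from x to the nearest integer, always in [0, 0.5].
   Assumption: |x| is small enough to fit in a long long. */
double roundoff_error(double x)
{
    double frac = x - (double)(long long)x;   /* fractional part, sign of x */
    if (frac < 0.0)
        frac = -frac;
    return frac > 0.5 ? 1.0 - frac : frac;
}

/* Largest roundoff error over a whole array of outputs. */
double max_roundoff(const double *v, size_t n)
{
    double worst = 0.0;
    for (size_t i = 0; i < n; i++) {
        double e = roundoff_error(v[i]);
        if (e > worst)
            worst = e;
    }
    return worst;
}

/* Nonzero when the error exceeds the 0.40 limit from the quoted message. */
int roundoff_too_large(double err)
{
    return err > 0.40;
}
```

A value such as 3.40625 gives an error of 0.40625 and trips the 0.40 limit; a value exactly halfway between two integers gives 0.5, the maximum possible.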

Christenson 2011-07-20 03:04

[QUOTE=LiquidNitrogen;266982]I'd like to share something that you may find interesting.

One of the computers I built in Dec 2010 was starting to behave oddly in the March 2011 timeframe. I ran every stress test I could think of on it, every hardware diagnostic, and it passed them all, despite a 24x7 gauntlet being thrown at it for about a week.

(snip)
[/QUOTE]

Did those hardware diagnostics include memtest86, which does much the same thing?

