2014-08-14, 02:51  #12
P90 years forever!
Aug 2002
Yeehaw, FL
2^{3}·863 Posts 
No. We assume the iteration is bad, we backtrack to the last save file and when we reach the problematic iteration use a different method to square the number. Where "different" could be "use a larger FFT size" or "split the number into high/low halves and do three multiplies to do the squaring".
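The "split into high/low halves and do three multiplies" idea can be sketched with plain Python integers. This is only an illustration of the arithmetic, not Prime95's actual code: the names `square_split` and `lucas_lehmer_step` are hypothetical, and the real program works on FFT data rather than big integers. It assumes the three multiplies are hi·hi, lo·lo and hi·lo.

```python
def square_split(x: int, bits: int) -> int:
    """Square x by splitting it into high and low halves (hypothetical helper).

    With x = hi * 2**half + lo, we have
        x**2 = hi*hi * 2**(2*half) + 2*hi*lo * 2**half + lo*lo
    so three smaller multiplies replace one full-size squaring.
    """
    half = bits // 2
    lo = x & ((1 << half) - 1)   # low `half` bits
    hi = x >> half               # everything above them
    hh = hi * hi                 # multiply 1
    ll = lo * lo                 # multiply 2
    hl = hi * lo                 # multiply 3
    # The factor 2 on the cross term is folded into the shift (half + 1).
    return (hh << (2 * half)) + (hl << (half + 1)) + ll


def lucas_lehmer_step(s: int, p: int) -> int:
    """One Lucas-Lehmer iteration modulo 2**p - 1, using the split squaring."""
    m = (1 << p) - 1
    return (square_split(s, p) - 2) % m


if __name__ == "__main__":
    # Run the Lucas-Lehmer iterations for p = 7 (2**7 - 1 = 127).
    s = 4
    for _ in range(7 - 2):
        s = lucas_lehmer_step(s, 7)
    print(s)
```

Because each of the three products involves operands half as long, the roundoff behaviour differs from that of the single full-length squaring, which is the point of using it as the "different" method on a suspect iteration.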

2014-08-14, 06:00  #13
May 2013
East. Always East.
11010111111_{2} Posts 
The test appears to be progressing normally. I am highly doubtful that the "4 Roundoff Errors of which 3 are repeatable" is triggering a backtrack to the save file, unless the file is refreshed after the problematic iteration is dealt with. I have been getting the four-roundoff-errors message exactly once every 10,000 iterations (read: every two minutes) since I started this thread, and probably before as well, yet the worker is progressing at a normal-looking pace.
I'm about to head to bed, so it'll have all night to do its thing and I'll be able to compare the ETAs from a few hours apart to the actual elapsed time. I did that last night, too, and it seemed okay. Check the screenshot I've attached. This isn't cherry-picked. It has looked like this for 24 hours. Yet in results.txt there is only one reference to a roundoff error larger than 0.4 (0.4375, to be exact). So what about the other 4 × 770 = roughly 3,000 roundoff errors that the client says it is encountering?
2014-08-14, 06:16  #14
Aug 2002
North San Diego County
2·5·67 Posts 
Have it update the screen every 100 iterations and see how many errors you get then.

2014-08-14, 07:44  #15
Jun 2003
2×5×463 Posts 
That error message is cumulative for the entire test, not just since the last error.
EDIT: I see 3 ROE events in your very own post (once at iteration 6557188, and twice at iteration 7679973). I'd bet that there is one more. Of those, one (the first at 7679973) could not be confirmed as reproducible, because you stopped and restarted P95 in between. Hence the "3 out of 4 is reproducible" thing.
Last fiddled with by axn on 2014-08-14 at 08:01
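To make the "cumulative" point concrete, here is a minimal sketch of a running tally that is re-printed with every status line. The class `RoundoffTally` and the function `status_lines` are hypothetical names for illustration, not anything taken from Prime95, and the message wording only approximates what the client prints.

```python
from dataclasses import dataclass


@dataclass
class RoundoffTally:
    """Running totals for the whole test (hypothetical structure)."""
    total: int = 0
    repeatable: int = 0

    def record(self, reproduced: bool) -> None:
        # Every roundoff event counts; only confirmed ones count as repeatable.
        self.total += 1
        if reproduced:
            self.repeatable += 1

    def summary(self) -> str:
        return f"{self.total} roundoff errors, of which {self.repeatable} were repeatable"


def status_lines(tally: RoundoffTally, start: int, stop: int, every: int) -> list[str]:
    """Build one status line per `every` iterations, each repeating the same totals."""
    return [f"Iteration {i}: {tally.summary()}" for i in range(start, stop + 1, every)]


if __name__ == "__main__":
    tally = RoundoffTally()
    # Four events in total; one could not be confirmed because of a restart.
    for reproduced in (True, False, True, True):
        tally.record(reproduced)
    for line in status_lines(tally, 8_000_000, 8_030_000, 10_000):
        print(line)
```

Four events produce the same "4 ... 3 repeatable" text on every status line until a new event occurs, which is why the message shows up every 10,000 iterations without any new errors being logged.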
2014-08-14, 16:28  #16
May 2013
East. Always East.
11·157 Posts 
Oh! Alright, that settles everything. I didn't know the program kept reminding me of previous errors.
Thanks a bunch! 
2014-08-14, 19:25  #17
P90 years forever!
Aug 2002
Yeehaw, FL
2^{3}·863 Posts 

2014-08-15, 00:46  #18
∂^{2}ω=0
Sep 2002
República de California
2·3·1,879 Posts 
Quote:
1. Retry-with-same-N from the last savefile generally fails again with a fatal ROE, but not always reproducibly (i.e. different iteration and/or ROE value).
2. Retry-with-larger-N from the last savefile may succeed if the data corruption is restricted to data local to the smaller FFT length, but even if so, one is taking an unneeded runtime hit, because what is really needed is an auxiliary data re-init.
In the near future I will be adding internal checksums to all auxiliary data tables in an effort to better deal with this sort of thing. (I expect you've done so long ago with your code, since you have much more exposure to marginal-quality hardware.)
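A minimal sketch of the checksum idea, assuming the auxiliary data is a table of FFT twiddle factors and that a CRC-32 over its raw bytes is a good enough integrity check. The names `build_twiddles`, `table_checksum` and `CheckedTable` are hypothetical, and none of this reflects how either program actually stores its tables.

```python
import math
import struct
import zlib


def build_twiddles(n: int) -> list[tuple[float, float]]:
    """Recompute the (cos, sin) pairs for an n-point transform from scratch."""
    return [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
            for k in range(n)]


def table_checksum(table: list[tuple[float, float]]) -> int:
    """CRC-32 over the table's raw little-endian double-precision bytes."""
    raw = b"".join(struct.pack("<dd", re, im) for re, im in table)
    return zlib.crc32(raw)


class CheckedTable:
    """Auxiliary table that remembers the checksum it had when it was built."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.data = build_twiddles(n)
        self.crc = table_checksum(self.data)

    def is_intact(self) -> bool:
        return table_checksum(self.data) == self.crc

    def reinit_if_corrupt(self) -> bool:
        """Rebuild the table only if it no longer matches its checksum."""
        if self.is_intact():
            return False
        self.data = build_twiddles(self.n)
        return True


if __name__ == "__main__":
    table = CheckedTable(8)
    table.data[3] = (0.0, 0.0)        # simulate a corrupted entry
    print(table.reinit_if_corrupt())  # table is rebuilt at the same length
    print(table.is_intact())
```

On a fatal ROE, checking the tables before retrying would let the retry keep the same FFT length after a re-init, instead of paying the runtime cost of a larger one.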

2014-08-16, 03:54  #19
May 2013
East. Always East.
11·157 Posts 
The test ended with 4/5 roundoff > 0.4 checking out, and it recommended a hardware check, but the test "successfully verified the DC".
EDIT: And my reliability dropped to 0.92.
Last fiddled with by TheMawn on 2014-08-16 at 03:58