mersenneforum.org > Great Internet Mersenne Prime Search > Software

2006-05-03, 21:50   #1
Unregistered

Curiosity: Some errors more common than others

I noticed the following in my results.txt file:

Code:
Iteration: 17844238/34XXXXXX, ERROR: ROUND OFF (0.40625) > 0.40
Continuing from last save file.
Disregard last error.  Result is reproducible and thus not a hardware problem.
For added safety, redoing iteration using a slower, more reliable method.
I noticed that the specific round off error 0.40625 has occurred three times in my results.txt file. Is there a reason why that specific number occurs more often than others?

Also, is there something I can do to avoid these errors? My worry is, does Prime95 catch most of these errors when they are made?

2006-05-04, 00:16   #2
dsouza123

These errors occur when the calculations are near the size boundaries for the DWT (discrete weighted transform); the affected iterations are redone in a slightly different way.

2006-05-04, 01:00   #3
ewmayer

Assuming the code has been carefully written and the FFT-size breakpoints appropriately chosen, a typical plot of roundoff errors (not just the ones > 0.4 that get reported) will show something like a Poisson distribution, with most RO errors clustered between (say) 0 and 0.1 and the frequency decreasing rapidly (IIRC quasi-exponentially) for larger errors. Of course, errors near 0.5 tend to be of the form a/2^k, with k some fairly small integer, i.e. you'll see a very coarse distribution there (think of discrete errors as being like English socket-wrench sizes). Those two effects combined explain why, near an FFT threshold, you tend to see some 13/32 = 0.40625 errors (often quite a few of these), hopefully relatively few 7/16 = 0.4375 errors, and yes, an occasional 15/32 = 0.46875 or the deadly 1/2 = 0.5.
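The coarse "socket wrench" grid is easy to see by enumeration. Here is a short illustrative sketch (mine, not Prime95 code) that lists every dyadic fraction a/2^k with small k lying between the 0.40 reporting threshold and the fatal 0.5:

```python
# Enumerate the coarse "socket wrench sizes" a near-threshold roundoff
# error can take: dyadic fractions a/2^k for small k, lying between the
# 0.40 reporting threshold and the fatal 0.5.

def dyadic_errors(max_k=5, lo=0.40, hi=0.5):
    found = set()
    for k in range(1, max_k + 1):
        for a in range(1, 2 ** k + 1):
            frac = a / 2 ** k          # exact in binary floating point
            if lo < frac <= hi:
                found.add(frac)
    return sorted(found)

print(dyadic_errors())  # [0.40625, 0.4375, 0.46875, 0.5]
```

Note that 13/32 = 0.40625 is the smallest grid value above the 0.40 threshold, which is why it is the reported value one sees most often.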

2006-05-05, 01:34   #4
Unregistered

It still sounds like these errors, whether they indicate a hardware problem or not, are bad. How do I know that all (or at least 99.99% of) errors are caught? Is it possible to avoid these errors?

2006-05-05, 02:51   #5
Uncwilly

Quote:
Originally Posted by Unregistered
It still sounds like these errors, whether they indicate a hardware problem or not, are bad. How do I know that all (or at least 99.99% of) errors are caught? Is it possible to avoid these errors?
These errors are not bad, per se.
If you can run a test 7% faster but occasionally need to rerun part of it (costing less than 2% of the total time; these numbers are pure hand-waving), you come out ahead on time.
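The tradeoff arithmetic can be made concrete. Both figures below are the post's own hand-waving guesses, not measurements:

```python
# Back-of-envelope check of the FFT-length tradeoff: the shorter FFT
# length runs ~7% faster, and error-triggered reruns redo ~2% of the
# work. (Illustrative figures only, taken from the post above.)

fast_run = 0.93        # relative time at the shorter FFT length
rerun_penalty = 0.02   # extra fraction of the test redone after errors

total = fast_run * (1 + rerun_penalty)
print(f"relative time with reruns: {total:.4f}")  # 0.9486, still < 1.0
```

As long as the rerun penalty stays small, the shorter FFT length wins even with the occasional retried iteration.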

This is what happens near the breakpoints: if you can run the shorter FFT length, the test is faster, but these errors can pop up (and they can be dealt with). There are other threads that deal with this issue. Look down in this thread: http://www.mersenneforum.org/showthr...r+reproducible

What you may want to do is shorten the time between save file writes.

2006-05-05, 02:59   #6
PhilF

Actually, those errors are not bad. If you were to look at the file containing the results returned for completed exponents near the one you are testing, you would see from the error codes that many tests on exponents near the upper limit of an FFT size produce these errors.

Here is an excerpt from the file I referred to:

Code:
34623493,jwdepen,pennsy12,Wc1,03000300
34623521,maekke,meiermb,WZ1,00000000
34623731,pfrakes,DP280336,Wc2,05000500
34623739,bobhinkle,Margo,WZ1,00000000
34623751,edwardsm,C84262AF2,WZ1,00000000
34623839,S62207,C052FA406,Wc2,04000400
34623857,pfrakes,DC240356,Wc2,01000100
The 8-digit number at the end of each line is the error code. The first two digits give how many round off > 0.4 errors occurred, and the 5th and 6th digits give how many of those errors were reproducible. As you can see, 4 out of 7 tests had round off errors > 0.4, and all were reproducible. Quite normal.
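To make the field layout concrete, here is a small sketch that pulls those two counts out of the error code. The field positions are as described above; the helper name and the parsing code are mine, not part of PrimeNet:

```python
# Extract the two counts from the 8-digit error code on each result
# line: digits 1-2 are the number of roundoff > 0.4 errors, digits 5-6
# are how many of those were reproducible. (Illustrative sketch only;
# field layout as described in the post above.)

def parse_error_code(code):
    roundoff = int(code[0:2])      # digits 1-2: roundoff > 0.4 errors
    reproducible = int(code[4:6])  # digits 5-6: reproducible errors
    return roundoff, reproducible

for line in ("34623493,jwdepen,pennsy12,Wc1,03000300",
             "34623839,S62207,C052FA406,Wc2,04000400"):
    exponent, *_, code = line.split(",")
    ro, rep = parse_error_code(code)
    print(f"{exponent}: {ro} error(s), {rep} reproducible")
```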

In a way, they are not errors at all. As long as the round off is less than 0.5, then proper rounding will always occur. But George wrote the program so that round off errors should always be less than 0.4, just to be on the safe side. Sometimes, as a result of an "unlucky bit pattern" while testing an exponent that is bumping up against the FFT size limit, the round off might be 0.40625 or even slightly more. That's OK, because it is still below 0.5 and will get rounded properly. But since the program expects all round offs to be less than 0.4, it calls it an error and tries again, starting from the last save file. As long as the second try results in exactly the same round off error, we know we are OK.

You could get rid of the errors by forcing the program to use the next larger FFT size, but that would make the test run slower without accomplishing anything. You are better off considering these "errors" as "informational messages".

2006-05-05, 17:23   #7
ewmayer

Quote:
Originally Posted by PhilF
As long as the round off is less than 0.5, then proper rounding will always occur.
Not so - assuming one defines the fractional error in a floating number x as

frac = abs(x - nint(x))

then the result is always in [0, 0.5]. The danger of frac values near 0.5 is that the closer frac is to 0.5, the greater the chance of an incorrect rounding, i.e. what you think is a fractional error of frac is really an error of (1 - frac) aliased to frac by the above formula, e.g. a true fractional error of 0.6 aliased to 0.4. In my experience, if you only see a few errors of the 0.40625 variety there is little chance of this having occurred, but once one starts seeing errors like 0.4375 (especially more than one of these), one is on dangerous ground.
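A quick sketch of the aliasing, using Python's round() to stand in for nint():

```python
# The fractional-error formula from the post, frac = abs(x - nint(x)),
# always yields a value in [0, 0.5]. That is exactly the danger: a true
# error of 0.6 and a true error of 0.4 report the same frac.
# (Illustrative sketch; round() models nint().)

def frac_error(x):
    """Distance from x to the nearest integer, always in [0, 0.5]."""
    return abs(x - round(x))

# Suppose the exact convolution output should be 1000, but the computed
# value drifted. A +0.4 drift and a +0.6 drift are indistinguishable:
print(f"{frac_error(1000.4):.3f}")  # 0.400 -- genuine 0.4 error
print(f"{frac_error(1000.6):.3f}")  # 0.400 -- true error 0.6, aliased
```

In the second case the value rounds to 1001 instead of the correct 1000, yet the reported error looks like an ordinary 0.4.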

The point of setting an appropriate fractional-error threshold is to reduce the odds of this kind of incorrect rounding to acceptably low levels. If you think about it, it's really quite astonishing that we can routinely get away with setting it as high as 0.4; that only works due to a combination of very carefully written code and the quasirandom nature of LL-test intermediate residues.


2006-05-05, 22:11   #8
PhilF

Quote:
Originally Posted by ewmayer
Not so - assuming one defines the fractional error in a floating number x as

frac = abs(x - nint(x))
I stand corrected.
Quote:
Originally Posted by ewmayer
works due to a combination of very carefully written code...
Hear, hear!

2006-05-08, 23:52   #9
Unregistered

I'm the same poster as above.

I looked at my results.txt file, and I have had one 0.5 error which wasn't reproducible and one 0.4375 error which was. Would you suggest, then, that I manually change my FFT length to something safer?

2006-05-09, 00:45   #10
ewmayer

Quote:
Originally Posted by Unregistered
I'm the same poster as above.

I looked at my results.txt file, and I have had one 0.5 error which wasn't reproducible and one 0.4375 error which was. Would you suggest, then, that I manually change my FFT length to something safer?
No - from the sound of it, the 0.5 error was probably a hardware glitch (and was caught, i.e. it didn't corrupt the computation), and a small number of reproducible 0.4375 errors is usually not fatal.

2006-05-09, 00:48   #11
PhilF

Quote:
Originally Posted by Unregistered
I looked at my results.txt file, and I have had one 0.5 error which wasn't reproducible and one 0.4375 error which was. Would you suggest, then, that I manually change my FFT length to something safer?
No. The 0.5 error was not caused by the FFT size; it indicates some sort of hardware problem, especially if you get another one. In general, one-half of tests that have one unreproducible round off error give incorrect results.

The reproducible errors are fine; they are not causing any harm.

If I were you, I would start a Prime95 Torture Test on the machine, and let it run for at least 24 hours.