mersenneforum.org  

2003-09-15, 17:12   #1
GP2

Error rate for LL tests

We estimate the error rate as follows:

- Every line in BAD is a separate verified-bad result.
- Every line in LUCAS_V.TXT is a separate verified-good result.
- Lines in HRF3.TXT are handled as described below.

The file HRF3.TXT contains unverified results (only one LL test, or more than one but with non-matching double-checks). How do we estimate the error rate for these results?

Any exponent that occurs only once must be ignored: we have no idea whether it is a good or a bad result. However, when an exponent occurs N times (in N separate lines of HRF3.TXT), we know for sure that N distinct non-matching residues were returned (otherwise there would have been a match, and the matching results would have been removed from HRF3.TXT and moved to BAD and LUCAS_V.TXT). Therefore at least N-1 of them must be bad, and the remaining one could be good or bad.

The odds are that the remaining result is good (we just don't yet know which of the N it is). After all, the error rate is relatively low, so N-1 bad plus 1 good is much more likely than all N bad.

In the most common case of 2 separate lines in HRF3.TXT for the same exponent, usually one will be good and one bad, and a triple-check will sort out which is which.

So, to summarize:

- If an exponent occurs in only one line in HRF3.TXT, ignore it.
- If an exponent occurs in N separate lines in HRF3.TXT, assume one good result and N-1 bad results (see the counting sketch below).
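
Here is a minimal counting sketch of that rule, in Python. The function and argument names are made up for illustration, and parsing of the actual database files is not shown. Reading the tables below, each fraction appears to be (verified-bad + unverified-results-for-repeated-exponents - number-of-repeated-exponents) divided by (verified-bad + unverified-results-for-repeated-exponents + verified-good), which is what the sketch computes:

Code:
from collections import Counter

def error_rate(bad_exponents, good_exponents, hrf3_exponents, lo, hi):
    """Estimate the LL error rate for exponents in [lo, hi).

    bad_exponents  -- one entry per line of BAD         (verified-bad results)
    good_exponents -- one entry per line of LUCAS_V.TXT (verified-good results)
    hrf3_exponents -- one entry per line of HRF3.TXT    (unverified results),
                      so a repeated exponent means repeated non-matching tests
    """
    in_range = lambda p: lo <= p < hi

    bad  = sum(1 for p in bad_exponents  if in_range(p))
    good = sum(1 for p in good_exponents if in_range(p))

    # HRF3 exponents that occur more than once; singletons are ignored.
    counts = Counter(p for p in hrf3_exponents if in_range(p))
    multi_results   = sum(n for n in counts.values() if n > 1)  # all their results
    multi_exponents = sum(1 for n in counts.values() if n > 1)  # one assumed good each

    # numerator: verified-bad results plus assumed-bad unverified results
    # denominator: every result we were able to classify
    numerator   = bad + multi_results - multi_exponents
    denominator = bad + multi_results + good
    return numerator / denominator if denominator else 0.0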

Error rates for the various exponent ranges are:
[Sept 9 2003 data]

Code:
          0 -     999,999       (163+0-0)/(163+0+70581) = .002
  1,000,000 -   1,999,999       (718+0-0)/(718+0+58971) = .012
  2,000,000 -   2,999,999       (1203+0-0)/(1203+0+54591) = .021
  3,000,000 -   3,999,999       (1465+0-0)/(1465+0+52939) = .026
  4,000,000 -   4,999,999       (1837+0-0)/(1837+0+51026) = .034
  5,000,000 -   5,999,999       (1905+0-0)/(1905+0+49346) = .037
  6,000,000 -   6,999,999       (1804+0-0)/(1804+0+49253) = .035
  7,000,000 -   7,999,999       (1956+27-12)/(1956+27+47579) = .039
  8,000,000 -   8,999,999       (1612+500-235)/(1612+500+45865) = .039
  9,000,000 -   9,999,999       (625+1312-639)/(625+1312+33724) = .036
 10,000,000 -  10,999,999       (53+1369-672)/(53+1369+2978) = .170
 11,000,000 -  11,999,999       (50+1384-679)/(50+1384+1993) = .220
 12,000,000 -  12,999,999       (31+1819-895)/(31+1819+1415) = .292
 13,000,000 -  13,999,999       (33+1611-798)/(33+1611+1392) = .278
 14,000,000 -  14,999,999       (4+1541-764)/(4+1541+1172) = .287
 15,000,000 -  15,999,999       (2+1091-541)/(2+1091+796) = .292
 16,000,000 -  16,999,999       (0+757-375)/(0+757+598) = .281
 17,000,000 -  17,999,999       (0+134-67)/(0+134+233) = .182
 18,000,000 -  18,999,999       (0+86-43)/(0+86+174) = .165
 19,000,000 -  19,999,999       (2+32-16)/(2+32+40) = .243
 20,000,000 -  20,999,999       (1+2-1)/(1+2+14) = .117
We note:
- The results for the low exponents have very low error rates. Maybe this is because the run time is very short, or maybe for such old results the bad results were purged or not recorded.
- The error rates for the higher exponents are artificially high. This is because when the server gets a result returned with a nonzero error code, it automatically reassigns that exponent for another first-time LL test without waiting a couple of years for regular double-checking to catch up to the current first-time range. Thus, a significant fraction of bad results are caught much sooner, but good results are not verified until perhaps years later.

Note: it is possible for a nonzero error code to still yield a good result and it is possible for a zero error code to yield a bad result. See the Most popular error codes thread.

Since the current leading edge of double checking is around 10.1M, all error rates above this are artificially high for the time being.

We also note:
So far, there is no evidence that error rates are increasing for larger exponents. The error rate remains steady around 3.5% - 4.0% over a broad range of exponents. Larger exponents have longer run times and thus we might expect more errors, but on the other hand newer machines run Windows XP and other modern operating systems with much better memory protection. So perhaps these effects cancel each other.

Note that this error rate of 3.5% - 4.0% is an average over all users and computers. Some computers have a 0% error rate, others have a high double-digit error rate. This depends on hardware issues, memory quality, CPU temperature, etc.

Finally, we might ask: what do we get if we consider only results returned by programs Wxx (George Woltman's Prime95/mprime) and ignore results returned by other programs? The answer: almost exactly the same.
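
As an aside on how that restriction can be applied: it is simply a pre-filter over the same result records before the counting above. A minimal sketch, assuming each parsed result is a dict carrying a program-version string (the field names are hypothetical; the real file layout differs):

Code:
def wxx_only(results):
    """Keep only results reported by Prime95/mprime (program codes 'Wxx').

    `results` is assumed to be an iterable of dicts with at least an
    'exponent' and a 'program' field; adapt the field access to the
    actual file layout.
    """
    return [r for r in results if r.get("program", "").startswith("W")]

# The filtered exponent lists can then be fed to error_rate() as before, e.g.
#   error_rate([r["exponent"] for r in wxx_only(bad_results)], ..., lo, hi)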

Error rates for the various exponent ranges, taking into account only results returned by programs Wxx (George Woltman), are:
[Sept 9 2003 data]

Code:
          0 -     999,999       (85+0-0)/(85+0+30303) = .002
  1,000,000 -   1,999,999       (552+0-0)/(552+0+45733) = .011
  2,000,000 -   2,999,999       (1171+0-0)/(1171+0+52421) = .021
  3,000,000 -   3,999,999       (1445+0-0)/(1445+0+52267) = .026
  4,000,000 -   4,999,999       (1779+0-0)/(1779+0+49024) = .035
  5,000,000 -   5,999,999       (1891+0-0)/(1891+0+48521) = .037
  6,000,000 -   6,999,999       (1793+0-0)/(1793+0+47982) = .036
  7,000,000 -   7,999,999       (1945+27-12)/(1945+27+46770) = .040
  8,000,000 -   8,999,999       (1602+500-235)/(1602+500+45616) = .039
  9,000,000 -   9,999,999       (622+1312-639)/(622+1312+33397) = .036
 10,000,000 -  10,999,999       (53+1369-672)/(53+1369+2964) = .170
 11,000,000 -  11,999,999       (49+1384-679)/(49+1384+1984) = .220
 12,000,000 -  12,999,999       (30+1819-895)/(30+1819+1405) = .293
 13,000,000 -  13,999,999       (33+1611-798)/(33+1611+1330) = .284
 14,000,000 -  14,999,999       (4+1541-764)/(4+1541+1147) = .290
 15,000,000 -  15,999,999       (1+1091-541)/(1+1091+781) = .294
 16,000,000 -  16,999,999       (0+757-375)/(0+757+581) = .285
 17,000,000 -  17,999,999       (0+134-67)/(0+134+225) = .186
 18,000,000 -  18,999,999       (0+86-43)/(0+86+168) = .169
 19,000,000 -  19,999,999       (2+32-16)/(2+32+40) = .243
 20,000,000 -  20,999,999       (1+2-1)/(1+2+12) = .133
2003-09-15, 18:17   #2
NickGlover

Re: Error rate for LL tests

Quote:
Originally posted by GP2
We also note:
So far, there is no evidence that error rates are increasing for larger exponents. The error rate remains steady around 3.5% - 4.0% over a broad range of exponents. Larger exponents have longer run times and thus we might expect more errors, but on the other hand newer machines run Windows XP and other modern operating systems with much better memory protection. So perhaps these effects cancel each other.
I don't think we can assume the error rate is not increasing based on the data. We should only consider ranges where all exponents have been double-checked. For ranges where this is not the case (7M to 10M), the error rates could be either artificially low or artificially high, so I think it is difficult to make a conclusion about them. Also, I believe George significantly improved the error checking with one version of Prime95/mprime, so we would expect an improvement in the error rate for ranges that were checked more with the newer version. This would not stop the error rate from continuing to go up for later ranges.
2003-09-15, 22:14   #3
GP2

Re: Re: Error rate for LL tests

Quote:
Originally posted by NickGlover
I don't think we can assume the error rate is not increasing based on the data. We should only consider ranges where all exponents have been double-checked. For ranges where this is not the case (7M to 10M), the error rates could be either artificially low or artificially high, so I think it is difficult to make a conclusion about them.
Well, my calculations take into account only exponents that have had at least two LL tests done. As outlined in the first post in this thread, I believe we can draw fairly accurate conclusions about error rates for such exponents whether or not a matching residue was found.

In the range 7M-8M there are only 60 exponents that have never had at least two LL tests done. In the range 8M-9M, there are only 525 such exponents, and in the range 9M-10M, there are 5791 such exponents. So arguably, only the 9M-10M error rate could be expected to change much over time.


For higher exponents, the rates are artificially high because results returned with a nonzero error code get double-checked several years sooner than results returned with a zero error code. That is because the server immediately reassigns such nonzero-error-code results for another "first-time" LL test.

However, as soon as the leading edge of double-checking (currently around 10.1M) arrives, all those lagging double-checks of zero-error-code results finally end up getting done and the ratio gets back into proper balance.

For this reason, I'd argue that for anything more than about 0.5M below the leading edge of double-checking, we already have a fairly accurate estimate of the error rate.


Quote:

Also, I believe George significantly improved the error checking with one version of Prime95/mprime, so we would expect an improvement in the error rate for ranges that were checked more with the newer version. This would not stop the error rate from continuing to go up for later ranges.
That's a valid point. Also, Windows NT/2000/XP machines have much better protection against different processes overwriting each other's memory than older machines running Windows 3.1/95/98, which is another thing that affects error rates.


It's unfortunate that the server behavior which is optimized for detecting bad results as quickly as possible also makes it very difficult to estimate error rates for the leading edge of first-time LL tests.
2003-09-15, 22:18   #4
GP2

To summarize, the algorithm I use is:

If an exponent has had 1 LL test done:
- We can't draw any conclusions.

If an exponent has had (N > 1) LL tests done, with a match:
- We know exactly how many of the N tests are good and how many are bad.

If an exponent has had (N > 1) LL tests done, with no match:
- We know that at least N-1 of the tests are bad.
- Assume N-1 bad and 1 good, because that's much more likely than all N bad.
2003-09-15, 23:26   #5
Prime95

The error checking has not improved much since way back. There have been some changes around the edges: more conservative FFT lengths, round-off checking every iteration if near an FFT limit, tolerating roundoff errors up to 0.6, etc.

Also, I think the error rate is likely to remain fairly constant because computers are getting faster at roughly the same rate as the difficulty of running an LL test. That is, a 10 million first time test 3 years ago probably took as much elapsed time as a 20 million first time test today.
2003-09-15, 23:34   #6
NickGlover

I understand the algorithm you are using and I agree that it is fairly accurate, but I'm not convinced it is accurate enough to say that the error rate is not still increasing with exponent size. I'd be willing to concede that the error rates for the 7M and 8M ranges are probably not going to change very much, but I don't see how we can conclude that the error rate for the 9M range is definitely not going to end up greater than 4%.

I just don't trust this type of prediction when there may be a bias one way or the other with the exponents that have had enough tests run on them to be used in your data.

I do think it is likely that the error rates will level off or drop over time, simply because:
(1) George has improved Prime95/mprime error checking over time.
(2) I think error rates are mostly a function of runtime for an exponent, and I think average runtimes are levelling off if not dropping over time (which was not the case early in the project's history).

However, it is possible that these factors may be countered (at least in the 7M to 20M ranges) by the fact that processors in the last few years have been running hotter than they did in the past due to greater competition among the CPU makers.