
GP2 2003-09-15 17:12

Error rate for LL tests
 
We estimate the error rate as follows:

- Every line in BAD is a separate verified-bad result.
- Every line in LUCAS_V.TXT is a separate verified-good result.
- Lines in HRF3.TXT are handled as described below:

The file HRF3.TXT contains unverified results (only one LL test, or more than one but with non-matching double-checks). How do we estimate the error rate for these results?

Any exponent that occurs only once must be ignored: we have no idea whether it is a good or a bad result. However, when an exponent occurs N times (in N separate lines of HRF3.TXT), we know for sure that N distinct non-matching residues were returned (otherwise there would have been a match, and the results would have been removed from HRF3.TXT and moved to the files BAD and LUCAS_V.TXT). Therefore at least N-1 of them must be bad, and the remaining one could be good or bad.

The odds are that the remaining result is good (we just don't yet know which of the N it is). After all, the error rate is relatively low, so the odds of N-1 bad plus 1 good are much greater than the odds of all N bad.

In the most common case of 2 separate lines in HRF3.TXT for the same exponent, usually one will be good and one will be bad, and a triple-check will sort out which is which.
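
For illustration, assume a per-test error rate of about 4% and treat the two tests as independent. The probability of one good and one bad result is 2 * 0.96 * 0.04 = 0.0768, while the probability of both being bad is only 0.04^2 = 0.0016. So, given that the two residues do not match, the "exactly one good" case accounts for about 0.0768 / (0.0768 + 0.0016) = 98% of such pairs.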

So, to summarize:

- If an exponent occurs in only one line in HRF3.TXT, ignore it.
- If an exponent occurs in N separate lines in HRF3.TXT, assume one good result and N-1 bad results.
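
For reference, here is a minimal Python sketch of this tally. It is only a sketch, not the script actually used for the figures below, and it assumes (purely for illustration) that each file has one result per line with the exponent as the first whitespace-separated field:

[code]
from collections import Counter

def count_lines(path):
    # Every non-blank line is one result.
    with open(path) as f:
        return sum(1 for line in f if line.strip())

def estimate_error_rate(bad_path, good_path, unverified_path):
    verified_bad = count_lines(bad_path)     # lines in BAD: verified-bad results
    verified_good = count_lines(good_path)   # lines in LUCAS_V.TXT: verified-good results

    # Count how many unverified results each exponent has in HRF3.TXT.
    occurrences = Counter()
    with open(unverified_path) as f:
        for line in f:
            fields = line.split()
            if fields:
                occurrences[fields[0]] += 1

    assumed_bad = 0
    considered = 0
    for exponent, n in occurrences.items():
        if n < 2:
            continue             # a lone unverified result tells us nothing
        assumed_bad += n - 1     # at least N-1 of the N mismatching results are bad
        considered += n          # all N results count toward the total (one assumed good)

    bad = verified_bad + assumed_bad
    total = verified_bad + verified_good + considered
    return bad / total

print(round(estimate_error_rate("BAD", "LUCAS_V.TXT", "HRF3.TXT"), 3))
[/code]

The tables below break the same kind of tally down by 1M exponent ranges.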

Error rates for the various exponent ranges are:
[Sept 9 2003 data]

[code]
0 - 999,999 (163+0-0)/(163+0+70581) = .002
1,000,000 - 1,999,999 (718+0-0)/(718+0+58971) = .012
2,000,000 - 2,999,999 (1203+0-0)/(1203+0+54591) = .021
3,000,000 - 3,999,999 (1465+0-0)/(1465+0+52939) = .026
4,000,000 - 4,999,999 (1837+0-0)/(1837+0+51026) = .034
5,000,000 - 5,999,999 (1905+0-0)/(1905+0+49346) = .037
6,000,000 - 6,999,999 (1804+0-0)/(1804+0+49253) = .035
7,000,000 - 7,999,999 (1956+27-12)/(1956+27+47579) = .039
8,000,000 - 8,999,999 (1612+500-235)/(1612+500+45865) = .039
9,000,000 - 9,999,999 (625+1312-639)/(625+1312+33724) = .036
10,000,000 - 10,999,999 (53+1369-672)/(53+1369+2978) = .170
11,000,000 - 11,999,999 (50+1384-679)/(50+1384+1993) = .220
12,000,000 - 12,999,999 (31+1819-895)/(31+1819+1415) = .292
13,000,000 - 13,999,999 (33+1611-798)/(33+1611+1392) = .278
14,000,000 - 14,999,999 (4+1541-764)/(4+1541+1172) = .287
15,000,000 - 15,999,999 (2+1091-541)/(2+1091+796) = .292
16,000,000 - 16,999,999 (0+757-375)/(0+757+598) = .281
17,000,000 - 17,999,999 (0+134-67)/(0+134+233) = .182
18,000,000 - 18,999,999 (0+86-43)/(0+86+174) = .165
19,000,000 - 19,999,999 (2+32-16)/(2+32+40) = .243
20,000,000 - 20,999,999 (1+2-1)/(1+2+14) = .117
[/code]

We note:
- The results for the low exponents have very low error rates. Maybe this is because the run time is very short, or maybe for such old results the bad results were purged or not recorded.
- The results for the higher exponents are artificially high. This is because when the server gets a result returned with a nonzero error code, it automatically reassigns that exponent for another first-time LL test without waiting a couple of years for regular double-checking to catch up to the current first-time range. Thus, a significant fraction of bad results are caught much sooner, but good results are not verified until perhaps years later.

Note: it is possible for a nonzero error code to still yield a good result and it is possible for a zero error code to yield a bad result. See the [url=http://www.mersenneforum.org/showthread.php?s=&threadid=1111]Most popular error codes[/url] thread.

Since the current leading edge of double checking is around 10.1M, all error rates above this are artificially high for the time being.

We also note:
So far, there is no evidence that error rates are increasing for larger exponents. The error rate remains steady around 3.5% - 4.0% over a broad range of exponents. Larger exponents have longer run times and thus we might expect more errors, but on the other hand newer machines run Windows XP and other modern operating systems with much better memory protection. So perhaps these effects cancel each other.

Note that this error rate of 3.5% - 4.0% is an average over all users and computers. Some computers have a 0% error rate, others have a high double-digit error rate. This depends on hardware issues, memory quality, CPU temperature, etc.

Finally, we might ask, what do we get if we only consider results returned by programs Wxx (George Woltman's Prime95/mprime) and ignore results returned by other programs? The answer is: almost exactly the same.

Error rates for the various exponent ranges, taking into account only results returned by programs Wxx (George Woltman), are:
[Sept 9 2003 data]

[code]
0 - 999,999 (85+0-0)/(85+0+30303) = .002
1,000,000 - 1,999,999 (552+0-0)/(552+0+45733) = .011
2,000,000 - 2,999,999 (1171+0-0)/(1171+0+52421) = .021
3,000,000 - 3,999,999 (1445+0-0)/(1445+0+52267) = .026
4,000,000 - 4,999,999 (1779+0-0)/(1779+0+49024) = .035
5,000,000 - 5,999,999 (1891+0-0)/(1891+0+48521) = .037
6,000,000 - 6,999,999 (1793+0-0)/(1793+0+47982) = .036
7,000,000 - 7,999,999 (1945+27-12)/(1945+27+46770) = .040
8,000,000 - 8,999,999 (1602+500-235)/(1602+500+45616) = .039
9,000,000 - 9,999,999 (622+1312-639)/(622+1312+33397) = .036
10,000,000 - 10,999,999 (53+1369-672)/(53+1369+2964) = .170
11,000,000 - 11,999,999 (49+1384-679)/(49+1384+1984) = .220
12,000,000 - 12,999,999 (30+1819-895)/(30+1819+1405) = .293
13,000,000 - 13,999,999 (33+1611-798)/(33+1611+1330) = .284
14,000,000 - 14,999,999 (4+1541-764)/(4+1541+1147) = .290
15,000,000 - 15,999,999 (1+1091-541)/(1+1091+781) = .294
16,000,000 - 16,999,999 (0+757-375)/(0+757+581) = .285
17,000,000 - 17,999,999 (0+134-67)/(0+134+225) = .186
18,000,000 - 18,999,999 (0+86-43)/(0+86+168) = .169
19,000,000 - 19,999,999 (2+32-16)/(2+32+40) = .243
20,000,000 - 20,999,999 (1+2-1)/(1+2+12) = .133
[/code]

NickGlover 2003-09-15 18:17

Re: Error rate for LL tests
 
[QUOTE][i]Originally posted by GP2 [/i]
We also note:
So far, there is no evidence that error rates are increasing for larger exponents. The error rate remains steady around 3.5% - 4.0% over a broad range of exponents. Larger exponents have longer run times and thus we might expect more errors, but on the other hand newer machines run Windows XP and other modern operating systems with much better memory protection. So perhaps these effects cancel each other.[/QUOTE]

I don't think we can assume the error rate is not increasing based on the data. We should only consider ranges where all exponents have been double-checked. For ranges where this is not the case (7M to 10M), the error rates could be either artificially low or artificially high, so I think it is difficult to draw a conclusion about them. Also I believe George significantly improved the error checking with one version of Prime95/mprime, so we would expect an improvement in the error rate for ranges that were checked more with the newer version. This would not stop the error rate from continuing to go up for later ranges.

GP2 2003-09-15 22:14

Re: Re: Error rate for LL tests
 
[QUOTE][i]Originally posted by NickGlover [/i]
[B]I don't think we can assume the error rate is not increasing based on the data. We should only consider ranges where all exponents have been double-checked. For ranges where this is not the case (7M to 10M), the error rates could be either artificially low or artificially high, so I think it is difficult to make a conclusion about them.[/B][/QUOTE]

Well, my calculations take into account only exponents that have had at least two LL tests done. As outlined in the first post in this thread, I believe we can draw fairly accurate conclusions about error rates for such exponents [i]whether or not a matching residue was found[/i].

In the range 7M-8M there are only 60 exponents that have never had at least two LL tests done. In the range 8M-9M, there are only 525 such exponents, and in the range 9M-10M, there are 5791 such exponents. So arguably, only the 9M-10M error rate could be expected to change much over time.
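
As a rough illustration with the 7M-8M figures from the table above: the current estimate is (1956+27-12)/(1956+27+47579) = 1971/49562, about .0398. Even if all 60 outstanding exponents eventually turn out to have a bad first test (adding roughly 60 bad and 60 good results once they are double-checked), the rate only rises to about 2031/49682 = .0409; if they all turn out good, it drops only to about 1971/49682 = .0397. The 5791 outstanding exponents in 9M-10M leave considerably more room for movement.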


For higher exponents, the rates are artificially high because results returned with a nonzero error code get double-checked several years sooner than results returned with a zero error code. That is because the server immediately reassigns such nonzero-error-code results for another "first-time" LL test.

However, as soon as the leading edge of double-checking (currently around 10.1M) arrives, all those lagging double-checks of zero-error-code results finally end up getting done and the ratio gets back into proper balance.

For this reason, I'd argue that for anything more than about 0.5M below the leading edge of double-checking, we already have a fairly accurate estimate of the error rate.


[QUOTE][B]
Also I believe George significantly improved the error checking with one version of Prime95/mprime, so we would expect an improvement in the error rate for ranges that were checked more with the newer version. This would not stop the error rate from continuing to go up for later ranges. [/B][/QUOTE]

That's a valid point. And also Windows NT/2000/XP machines have much better protection from different processes overwriting each other's memory than older machines using Windows 3.1/95/98, which is another thing that affects error rates.


It's unfortunate that the server behavior which is optimized for detecting bad results as quickly as possible also makes it very difficult to estimate error rates for the leading edge of first-time LL tests.

GP2 2003-09-15 22:18

To summarize, the algorithm I use is:

If an exponent has had 1 LL test done:
- We can't draw any conclusions.

If an exponent has had (N > 1) LL tests done, with a match:
- We know exactly how many of the N tests are good and how many are bad.

If an exponent has had (N > 1) LL tests done, with no match:
- We know that at least N-1 of the tests are bad.
- Assume N-1 bad and 1 good, because that's much more likely than all N bad.
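
For example, an exponent with 3 mutually non-matching results in HRF3.TXT would be counted as 2 bad results and 1 good result.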

Prime95 2003-09-15 23:26

The error checking has not improved much since way back. There have been some changes around the edges: more conservative FFT lengths, round-off checking every iteration if near an FFT limit, tolerating roundoff errors up to 0.6, etc.

Also, I think the error rate is likely to remain fairly constant because computers are getting faster at roughly the same rate as the difficulty of running an LL test. That is, a 10 million first time test 3 years ago probably took as much elapsed time as a 20 million first time test today.

NickGlover 2003-09-15 23:34

I understand the algorithm you are using and I agree that it is fairly accurate, but I'm not convinced it is accurate enough to say that the error rate is not still increasing with exponent size. I'd be willing to concede that the error rates for the 7M and 8M ranges are probably not going to change very much, but I don't see how we can conclude that the error rate for the 9M range is definitely not going to end up greater than 4%.

I just don't trust this type of prediction when there may be a bias one way or the other with the exponents that have had enough tests run on them to be used in your data.

I do think it is likely that the error rates will level off or drop over time, simply because:
(1) George has improved Prime95/mprime error checking over time.
(2) I think error rates are mostly a function of runtime for an exponent, and I think average runtimes are levelling off if not dropping over time (which was not the case early in the project's history).

However, it is possible that these factors may be countered (at least in the 7M to 20M ranges) by the fact that processors in the last few years have been running hotter than they did in the past due to greater competition among the CPU makers.

