mersenneforum.org


ewmayer 2003-06-13 17:31

[quote="jeff8765"]So have the tests already been finished?[/quote]

Yes, George's test (using Prime95 on his ~2.5GHz P4) and Guillermo's (using Glucas in multithreaded mode on a dual-CPU 1.1GHz Itanium, which was thus running at about the same speed as George's Prime95 run) have both completed, with matching nonzero Res64s. I'll leave any announcement of the precise exponent to George, but will say that it was slightly less than 17M.

ewmayer 2003-06-13 17:37

Without being too cute (OK, well perhaps somewhat cute), the candidate was very nearly the same size as the 24th Fermat number, which latter number I have some personal experience with.

It's up to George to decide whether he wants to divulge the actual exponent.

trif 2003-06-13 18:35

I don't think the exact exponent should be announced, as that might encourage fame seekers to falsify reports, and the poor sod who thought he had a prime might not want to be identified. I know if I had what I thought was a prime find that turned out to be composite, I would want it swept under the rug as quickly as possible. :D

Prime95 2003-06-13 19:13

It is official - not prime. Both Guillermo's Glucas run and my Prime95 run returned a matching non-zero residue.

We will never know what caused Prime95 to generate a false positive, but I am studying the code for ways in which a memory corruption could cause this. Something good will come from this sorry ending.

I am confident this incident will have little negative impact on GIMPS. While the episode was probably an unhappy roller-coaster ride for one individual, the false positive problem is far less damaging to GIMPS than the version 17 shift bug disaster.

This incident illustrates why most other distributed projects keep any client finds secret (even from the discoverer) until verified. If we had a similar policy this could have been swept under the rug and no one would ever have known. I kind of like our policy though. It lets everyone in on the ups and downs of the project.

Thanks to Guillermo and Ernst for dedicating time to the verification run.

Prime95 2003-06-13 19:19

M40, what went wrong?
 
Warning, technical talk follows:

It was once thought that a false positive report "couldn't happen". So what went wrong?

So far I've come up with two possibilities.

1) The FFT data is zeroed AND the test does not go into the -2,2,2,2 loop (the sequence a working LL test falls into once the residue hits zero). I don't know how the data gets zeroed, but I've seen it happen. It has happened a lot less often since code was added to reject any save file with all-zero data. Not going into the -2,2,2,2 loop can happen with the corruption of a single local variable. I've just added code that makes sure this local variable is always in the range 0 <= variable < exponent.

2) This case results from the way my C compiler treats floating point NaN. NaN stands for "not a number". If NaN is converted to an integer, the integer is zero. So if the FFT data is all NaNs, Prime95 will report a prime. Prime95 checks for NaNs every iteration, but if every FFT data value becomes NaN after the inverse FFT of the last LL iteration, then we get a false positive. Furthermore, corrupting a single value (the initial carry input to the rounding and carry propagation code) could set every FFT data value to NaN. I've fixed the code to make sure there are no NaNs in the final is-it-a-prime check.
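To make case 2 concrete, here is a minimal Python sketch. It is not the Prime95 code. The function names are made up for illustration, and `naive_to_int` simply models the "NaN converts to integer zero" behaviour described above (Python itself would raise an error on that conversion). It shows how an all-NaN buffer passes a naive is-it-zero check and how a finiteness guard in the final check rejects it.

```python
import math


def naive_to_int(x):
    """Hypothetical model of the conversion described in case 2:
    a NaN silently becomes integer zero."""
    if math.isnan(x):
        return 0
    return int(round(x))


def naive_is_zero(words):
    """Final is-it-a-prime check with no NaN guard (the failure mode)."""
    return all(naive_to_int(w) == 0 for w in words)


def guarded_is_zero(words):
    """Same check, but refuse to answer if any word is NaN or infinite."""
    if not all(math.isfinite(w) for w in words):
        raise ValueError("non-finite FFT data; result cannot be trusted")
    return all(round(w) == 0 for w in words)


nan_buffer = [float("nan")] * 8       # every FFT word corrupted to NaN
print(naive_is_zero(nan_buffer))      # the naive check reports a "prime"
try:
    guarded_is_zero(nan_buffer)
except ValueError as err:
    print("rejected:", err)
```

The guard costs one extra pass over the data, and only at the very end of the test.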

Which of the above is more likely? I don't know. It seems the first requires two or more pieces of memory corrupted, but it leads to a steady state. So the errors can occur at any point during the LL test. The second case requires only one failure, but at a very specific point in time.

asdf 2003-06-13 19:31

If any number is a prime, the LL will end with a 0. Because this is a false positive, therefore the last number would have been a 0. If it zeroed, then it would have been a 2 at the very end, and not a 0. How could zeroing cause a false positive? (unless it was really really bad luck)

philmoore 2003-06-13 19:37

In case of a zero residue in the low-order 64 bits, does the program check the other n million bits as well? Granted, the chance of a non-zero residue having the 64 low-order bits all zero is 1 in 2^64, about 1 in 1.84x10^19, so we wouldn't expect it to happen for a long, long time, but it isn't impossible!

Prime95 2003-06-13 19:37

[quote="asdf"]If any number is a prime, the LL will end with a 0. Because this is a false positive, therefore the last number would have been a 0. If it zeroed, then it would have been a 2 at the very end, and not a 0. How could zeroing cause a false positive? (unless it was really really bad luck)[/quote]

Case three causes the subtract two to not take place. Thus, if the fft data was zeroed (and that key variable was trashed) then you would get: 0^2 = 0, 0^2 = 0, 0^2 = 0, ...

Case four assumes the fft data is zeroed after the last LL iteration (including the subtract two), but before the is-this-a-prime-result code is executed.
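For anyone who wants to see the difference between the -2,2,2,2 loop and the stuck 0,0,0 state, here is a toy Lucas-Lehmer test in Python using plain integers instead of FFTs. It is only a sketch: `zero_at` and `skip_subtract` are hypothetical fault-injection knobs invented for this example and have nothing to do with how Prime95 is actually written.

```python
def lucas_lehmer(p, zero_at=None, skip_subtract=False):
    """Toy LL test for M_p = 2**p - 1 (p an odd prime).

    zero_at:       iteration index at which the data is zeroed (simulated fault)
    skip_subtract: if True, the "subtract two" step is also skipped from
                   that iteration onward (simulated trashed variable)
    Returns (reported_prime, list of residues after each iteration).
    """
    m = (1 << p) - 1
    s = 4
    history = []
    for i in range(p - 2):
        faulted = zero_at is not None and i >= zero_at
        if zero_at is not None and i == zero_at:
            s = 0                      # data gets zeroed
        s = s * s
        if not (faulted and skip_subtract):
            s -= 2                     # the normal "subtract two"
        s %= m
        history.append(s)
    return s == 0, history


# M11 = 2047 = 23 * 89 is composite.
print(lucas_lehmer(11)[0])                                   # False
# Zeroed data, subtract still works: -2 (mod M11), then 2, 2, 2, ...
print(lucas_lehmer(11, zero_at=3)[1][3:])                    # [2045, 2, 2, 2, 2, 2]
# Zeroed data AND no subtract: stuck at zero, false positive.
print(lucas_lehmer(11, zero_at=3, skip_subtract=True)[0])    # True
```

Zeroing alone ends with a residue of 2, as asdf says. It takes the second failure (no subtract) to hold the residue at zero through the end of the test.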

Prime95 2003-06-13 19:38

[quote="philmoore"]In case of a zero residue in the low-order 64 bits, does the program check the other n million bits as well? Granted, the chance of a non-zero residue having the 64 low-order bits all zero is 1 in 2^64, about 1 in 1.84x10^19, so we wouldn't expect it to happen for a long, long time, but it isn't impossible![/quote]

Yes, every FFT word is checked.

Prime95 2003-06-13 19:42

[quote="ewmayer"]It's up to George to decide whether he wants to divulge the actual exponent.[/quote]

Maybe we need a new poll???? Or maybe we could announce it if everyone promises not to go back to an old cleared results report to see who submitted the result. I don't really want to add to his frustrating experience!

jeff8765 2003-06-13 19:44

Would it be possible for the result to be checked every million iterations or so, and if the result is zero, throw out the test? If the result of any iteration is zero and the client is working, wouldn't the following results be -2 and then 2 forever anyway? That way, if the client zeroed and was no longer subtracting 2, it would be caught within a million iterations. However, I do not know how much time would be wasted checking for a zero result.
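Roughly what I have in mind, as a toy Python sketch with plain integers rather than the real FFT code. The names `check_every` and `stuck_from` are invented for this example, and the fault is simulated by forcing the residue to zero with no subtraction from iteration `stuck_from` onward.

```python
def ll_with_periodic_check(p, check_every=None, stuck_from=None):
    """Toy LL test for M_p = 2**p - 1 with an optional periodic zero check.

    check_every: test the residue for zero every this many iterations
                 (None disables the check)
    stuck_from:  hypothetical fault: from this iteration on, the data is
                 zero and the "subtract two" no longer happens
    Returns True if the final residue is zero; raises RuntimeError if a
    zero residue is seen before the final iteration.
    """
    m = (1 << p) - 1
    s = 4
    last = p - 3                       # index of the final iteration
    for i in range(p - 2):
        if stuck_from is not None and i >= stuck_from:
            s = 0                      # simulated stuck-at-zero fault
        else:
            s = (s * s - 2) % m
        if check_every and i != last and (i + 1) % check_every == 0 and s == 0:
            raise RuntimeError(f"zero residue at iteration {i + 1}; discard test")
    return s == 0


print(ll_with_periodic_check(13, check_every=4))   # True: M13 = 8191 is prime
print(ll_with_periodic_check(11, stuck_from=2))    # True: false positive, no check
try:
    ll_with_periodic_check(11, check_every=4, stuck_from=2)
except RuntimeError as err:
    print("caught:", err)
```

A zero that first shows up after the last periodic check would still slip through, so this only narrows the window rather than closing it.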

