[QUOTE=frmky;411445]Another strategy would be to triple check a few numbers that Madpoo has done a DC without a match. If you don't match his, then you likely have a problem.
[url]http://www.mersenne.org/report_ll/?exp_lo=34000000&exp_hi=36000000&exp_date=&user_only=1&user_id=Madpoo&exdchk=1&exbad=1&exfactor=1&B1=[/url][/QUOTE]
That got me to looking at my history so far... I have 261 results now where my test didn't match any others and it's still unverified. 219 of them are unassigned, so if anyone really felt like triple-checking any of those for fun, go for it.

But just realize that for the purposes of this little experiment, I'm assuming my results are always right. :smile: I would be terribly embarrassed if, later on, it turns out some of my results were actually wrong, once they do actually get triple-checked.

So far though, my total stats are:
Good (verified) = 26531
Unknown = 555
Factored later = 405
Bad = 3 (and I can explain that, really)
Suspect = 0

The 3 bad results are, well... weird. When I was doing triple-checks of exponents below 1M, for whatever strange reason 3 of them gave me mismatched residues. I re-ran all 3 *on the same machine* and each time they then came up with the correct residue.

I have no real good answer for that actually... I attribute it to some funkiness with really small FFT sizes and the fact that I was doing super small exponents on many-threaded workers, and it was actually the code to blame, not the machine (sorry George, I just threw you under the bus there...LOL). The residues only seemed to match the last 33 bits or so of the actual verified residue... the other bits were 0 for whatever reason. I'd call that a program bug?

If you're curious, the 3 exponents are:
[URL="http://www.mersenne.org/M8291"]M8291[/URL]
[URL="http://www.mersenne.org/M12281"]M12281[/URL]
[URL="http://www.mersenne.org/M801883"]M801883[/URL] |
[QUOTE=Madpoo;411517]
I have no real good answer for that actually... I attribute it to some funkiness with really small FFT sizes and the fact that I was doing super small exponents on many-threaded workers, and it was actually the code to blame, not the machine (sorry George, I just threw you under the bus there...LOL). The residues only seemed to match the last 33 bits or so of the actual verified residue... the other bits were 0 for whatever reason. I'd call that a program bug?[/QUOTE]
That is weird. First, I'd say the FFTs were done correctly and the problem occurred in generating the residue. Second, I'd point out that the first two exponents are so small that you were not running multi-threaded FFTs (only two-pass FFTs are multi-threaded).

Have you tried running LL on M8291 a hundred times to see if it happens again? These were all on the same machine at the same time? Did you exit and restart prime95 before the correct reruns?

I'll look at the code to see if I can imagine any way the residue creation code could exit prematurely. |
Nevermind the questions. You have found a real bug!!
If the final shift count is more than (exponent - 64), then the top (64 - (exponent - shiftcount)) bits are zeroed. I'll code up a fix.

The chance this is affecting existing LL tests is small. For exponents around 64M, 1 in 1,000,000 LL tests will be affected. I'll query the database to get us a list of affected LL tests. |
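Editorial sketch (not part of the original post): the rule quoted above can be written out in a few lines of Python. The function names are hypothetical, and the fraction assumes the final shift count is uniformly distributed over 0..exponent-1, which the thread does not state. Under that assumption, 63 shift values out of every `exponent` lose at least one bit, which lands close to the "1 in 1,000,000" figure for exponents around 64M.

```python
def zeroed_top_bits(exponent: int, shift_count: int) -> int:
    """High Res64 bits lost for a given final shift count, per the quoted rule."""
    kept_bits = exponent - shift_count
    # Nothing is lost while at least 64 bits remain below the shift point.
    return max(0, 64 - kept_bits)


def affected_fraction(exponent: int) -> float:
    """Fraction of final shift counts that zero at least one bit.

    ASSUMPTION: the final shift count is uniform over 0..exponent-1.
    """
    candidates = range(max(0, exponent - 64), exponent)
    affected = sum(1 for s in candidates if zeroed_top_bits(exponent, s) > 0)
    return affected / exponent


if __name__ == "__main__":
    print(zeroed_top_bits(8291, 8230))        # shift count 61 bits below the exponent
    print(1 / affected_fraction(64_000_000))  # roughly one test in a million
```

For a tiny exponent such as 8291 the same estimate gives 63/8291, a little under 1% of runs.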
[QUOTE=Prime95;411519]That is weird. First, I'd say the FFTs were done correctly and the problem occurred in generating the residue. Second, I'd point out that the first two exponents are so small that you were not running multi-threaded FFTs (only two-pass FFTs are multi-threaded).
Have you tried running LL on M8291 a hundred times to see if it happens again? These were all on the same machine at the same time? Did you exit and restart prime95 before the correct reruns? I'll look at the code to see if I can imagine any way the residue creation code could exit prematurely.[/QUOTE]
Yeah, I'm a little puzzled by it. Actually, upon reflection, I think on these sub-1M exponents, I may have set them all up to run on a single thread for each worker. M8291 probably only took a few seconds to run even then.

Still, out of the ~20K tests I did of exponents below 2M, with only 3 of them being weird like this, that's not too bad? :smile:

I only ran it one more time and got the correct result, after which I moved on. I could try it again, repeatedly in some way (add the same worktodo entry many times over) and see if any of the runs get another funky residue. Maybe I'll get that set up in a little bit, let it chunk through a couple hundred times and see what happens.

By the way, apparently I lied. All 3 were originally run on the same machine (madpoo6), and then tested again on a different machine the second time around (the same different machine each time, madpoo8). The machine that gave the truncated residues was a dual 6-core server, and the one that gave the correct residues is a dual 10-core server.

I still have that old 6-core box...it got moved around but I still have access to it for running a sanity test as mentioned. |
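Editorial sketch (not part of the original post): "add the same worktodo entry many times over" is easy to script. The helper name and the output path are hypothetical; the entry string is the `DoubleCheck=8291,62,1` line Madpoo quotes in a later post. Writing to a scratch file and copying it into place by hand keeps an existing work queue safe.

```python
from pathlib import Path


def write_repeated_worktodo(path: str, entry: str, count: int) -> int:
    """Write `entry` on `count` consecutive lines of `path`; return lines written."""
    lines = [entry] * count
    Path(path).write_text("\n".join(lines) + "\n")
    return len(lines)


# Example (hypothetical scratch file name, chosen for illustration only):
# write_repeated_worktodo("worktodo_repeat.txt", "DoubleCheck=8291,62,1", 1000)
```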
Update: The bug has not always existed. There are a dozen or two pre-2008 tests between 2M and 31M that should have been affected but are not.
Only one LL test was affected: [url]http://www.mersenne.org/report_exponent/?exp_lo=37830997&exp_hi=&full=1[/url] |
[QUOTE=Madpoo;411521]I still have that old 6-core box...it got moved around but I still have access to it for running a sanity test as mentioned.[/QUOTE]
Okay, on that same server, I ran this 1000 times. Prime95 set up to use a single worker with just one thread:
DoubleCheck=8291,62,1

It does not take long at all, if anyone wants to try this at home. Takes < 1 minute.

Out of 1000 results, 995 have the correct residue of:
75B8C8A553773232

The other 5 weird residues are (I'm including the full line in case the shift-count has any part):
[CODE]
58th attempt = UID: madpoo/madpoo1c, M8291 is not prime. Res64: 15B8C8A553773232. We4: 29F62AB6,8230,00000000
452nd attempt = UID: madpoo/madpoo1c, M8291 is not prime. Res64: 0000000000173232. We4: 40D162CB,8270,00000000
657th attempt = UID: madpoo/madpoo1c, M8291 is not prime. Res64: 0038C8A553773232. We4: 13762B4F,8237,00000000
732nd attempt = UID: madpoo/madpoo1c, M8291 is not prime. Res64: 0000000000013232. We4: 40C762CF,8274,00000000
824th attempt = UID: madpoo/madpoo1c, M8291 is not prime. Res64: 0000000000073232. We4: 40C162C9,8272,00000000
[/CODE]

For fun, I did the same experiment on the other system (the dual 10-core box). Same setup, just 1 worker with one thread, running this same check 1000 times in a row. Again, it's super quick so anyone could do this. It also missed 5 out of the 1000:
[CODE]
103rd attempt = UID: madpoo/madpoo8, M8291 is not prime. Res64: 0000000003773232. We4: 43B16231,8264,00000000
206th attempt = UID: madpoo/madpoo8, M8291 is not prime. Res64: 0000000000000002. We4: 40C61029,8288,00000000
480th attempt = UID: madpoo/madpoo8, M8291 is not prime. Res64: 0000C8A553773232. We4: 13BE2B43,8241,00000000
514th attempt = UID: madpoo/madpoo8, M8291 is not prime. Res64: 0000000153773232. We4: 13B1623E,8258,00000000
851st attempt = UID: madpoo/madpoo8, M8291 is not prime. Res64: 0000002553773232. We4: 13B162D8,8252,00000000
[/CODE]

That 206th attempt... residue of 0x2 ... hmm... either it dropped a bunch of the actual residue or it was stuck in that 0x2 loop. |
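Editorial sketch (not part of the original post): the ten odd residues can be checked against the rule George describes earlier in the thread, that bits above (exponent - shiftcount) are zeroed when the shift count exceeds (exponent - 64). This assumes the middle field of each `We4:` triple is the final shift count, which the post hints at but does not confirm. The function name is hypothetical.

```python
CORRECT_RES64 = 0x75B8C8A553773232
EXPONENT = 8291

# (assumed shift count from the We4 field, reported Res64), copied from the two listings
OBSERVED = [
    (8230, 0x15B8C8A553773232),
    (8270, 0x0000000000173232),
    (8237, 0x0038C8A553773232),
    (8274, 0x0000000000013232),
    (8272, 0x0000000000073232),
    (8264, 0x0000000003773232),
    (8288, 0x0000000000000002),
    (8241, 0x0000C8A553773232),
    (8258, 0x0000000153773232),
    (8252, 0x0000002553773232),
]


def predicted_res64(correct: int, exponent: int, shift_count: int) -> int:
    """Keep only the low (exponent - shift_count) bits when fewer than 64 remain."""
    kept_bits = exponent - shift_count
    if kept_bits >= 64:
        return correct
    return correct & ((1 << kept_bits) - 1)


mismatches = [
    (shift, reported)
    for shift, reported in OBSERVED
    if predicted_res64(CORRECT_RES64, EXPONENT, shift) != reported
]
print("rows that do not fit the rule:", mismatches)
```

Under that reading, the 0x2 residue from the 206th attempt is just the low 3 bits of the correct residue (shift count 8288, 3 bits kept), not a test stuck in a loop.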
[QUOTE=Prime95;411520]Nevermind the questions. You have found a real bug!![/QUOTE]
Hooray! What do I win? :smile: Can I remove the "bad" status on those 3? Just kidding... I'll leave them there to keep me humble. |
[QUOTE=Prime95;411522]Update: The bug has not always existed. There are a dozen or two pre-2008 tests between 2M and 31M that should have been affected but are not.
Only one LL test was affected: [url]http://www.mersenne.org/report_exponent/?exp_lo=37830997&exp_hi=&full=1[/url][/QUOTE] Nice catch to find that bug. Why is it "Andrew Daniels" is visible in the LL section but is ANONYMOUS in the history section? It was explained once before, but I forgot the reason. |
[QUOTE=ATH;411525]Nice catch to find that bug.
Why is it "Andrew Daniels" is visible in the LL section but is ANONYMOUS in the history section? It was explained once before, but I forgot the reason.[/QUOTE] The name in the "LL" section and the name in the "History" section come from different tables in the data. I think it has to do with the LL results table including the old v4 name that was used at the time it got checked in. All of the LL stuff moved over fairly well during the v4 to v5 update. The history section, on the other hand, looks at the actual log of messages that came in from the client. Until recently it didn't include any of the v4 messages at all, and now that it does, you'll see "Anonymous" for most entries where the v4 user never created a v5 account and linked them up. Something like that. I'm trying to remember if we actually do pull the v5 username even if they did link to a v4 account, but you get the idea. |
[QUOTE=Madpoo;411524]Can I remove the "bad" status on those 3?[/QUOTE]
Yeah, you do that [U]after[/U] you remove my bad results from mid of january to end of march 2012 - which is almost all my list of bad results; at that time cudaLucas was switching from "powers of 2 FFT only" to "non powers of 2 too", and I was the main tester, all those "bad" results were software bugs :razz: |
[QUOTE=LaurV;411545]Yeah, you do that [U]after[/U] you remove my bad results from mid of january to end of march 2012 - which is almost all my list of bad results; at that time cudaLucas was switching from "powers of 2 FFT only" to "non powers of 2 too", and I was the main tester, all those "bad" results were software bugs :razz:[/QUOTE]
Wear those bad results as a badge of honor! You earned them through your selfless dedication to testing new software. :smile: I'm keeping mine... after all, technically they are bad residues, whether it was a software or hardware issue. LOL And then we have a ready excuse whenever someone points out that we have some bad stuff in our past. |