mersenneforum.org


T.Rex 2010-01-05 13:56

Bug with LLR !!
 
Hi,

I've found a bug with [URL="http://pagesperso-orange.fr/jean.penne/index2.html"]LLR[/URL] (Lucas-Lehmer-Riesel) version 3.7.2 . :help:

Given a specific Wagstaff exponent, I get a different residue each time I test it on Intel HW (Core2 and Xeon)
Double-check on Intel HW over about 1200 other exponents is OK.
Verification of this exponent on Opteron is OK (2 identical residues for 2 checks).

So it seems that some range of exponents may lead to wrong, random residues.
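
A minimal sketch of how this kind of double-check could be automated, assuming each test run is recorded as an (exponent, residue) pair. The record layout and the name `find_unstable_exponents` are hypothetical; this is not an LLR output format.

```python
from collections import defaultdict


def find_unstable_exponents(runs):
    """Return exponents whose repeated tests produced more than one residue.

    `runs` is an iterable of (exponent, residue) pairs. The layout is an
    assumption made for this sketch, not something LLR writes itself.
    """
    residues = defaultdict(set)
    for exponent, residue in runs:
        # Normalise case and whitespace so "ab12" and "AB12 " count as equal.
        residues[exponent].add(residue.strip().upper())
    # An exponent is suspicious as soon as two runs disagree.
    return sorted(p for p, seen in residues.items() if len(seen) > 1)
```

Fed with the results of two passes over the same exponents, it returns only those exponents that need a closer look, for example on different hardware.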

Jean Penné has not answered emails for a while (I think he is 75 and may have stopped reading emails; I'll try to get news from him).

Since LLR makes use of prime95 core code, there is also the possibility that the bug is common with prime95 (at least with an old version of Prime95).

I need help from LLR and Prime95 experts.

Who can help ?

Thanks,

Tony

ldesnogu 2010-01-05 15:43

Sorry if this is a stupid idea: what about taking the sources, replacing gwnum with its latest version, recompiling and testing again? That would at least make a gwnum bug less likely.

rogue 2010-01-05 15:50

[QUOTE=T.Rex;200937]Hi,

I've found a bug with [URL="http://pagesperso-orange.fr/jean.penne/index2.html"]LLR[/URL] (Lucas-Lehmer-Riesel) version 3.7.2 . :help:

Given a specific Wagstaff exponent, I get a different residue each time I test it on Intel HW (Core2 and Xeon)
Double-check on Intel HW over about 1200 other exponents is OK.
Verification of this exponent on Opteron is OK (2 identical residues for 2 checks).

So it seems that some range of exponents may lead to wrong, random residues.

Jean Penné has not answered emails for a while (I think he is 75 and may have stopped reading emails; I'll try to get news from him).

Since LLR makes use of prime95 core code, there is also the possibility that the bug is common with prime95 (at least with an old version of Prime95).

I need help from LLR and Prime95 experts.

Who can help ?

Thanks,

Tony[/QUOTE]

Run the test with PFGW. If you get consistent residues, then the problem is with LLR. If not, then it is with gwnum.

diep 2010-01-05 15:57

[QUOTE=rogue;200943]Run the test with PFGW. If you get consistent residues, then the problem is with LLR. If not, then it is with gwnum.[/QUOTE]

It could also be a CPU bug that pops up. Something like 250 bugs have since been found in Core 2 CPUs, and the BIOS cannot fix them all.

ldesnogu 2010-01-05 16:26

[quote=diep;200944]It could also be a CPU bug that pops up. Something like 250 bugs have since been found in Core 2 CPUs, and the BIOS cannot fix them all.[/quote]
That's not impossible, but the probability is extremely low: after all, has Prime95 ever hit a CPU computation bug? And LLR uses the same code base.

Also, aren't most of the 250 bugs (BTW, I only found about 100 in the Intel documentation) very system-specific? I mean, I'm not aware of any bug like the infamous DIV bug found a few years ago.

henryzz 2010-01-05 16:37

Is there a way to forcefully make the FFT length higher to hopefully get a correct result?

kar_bon 2010-01-05 16:38

Maybe another reason:

The EXE for V3.7.2 is dated 2008-09-13 and the one for V3.7.1c is dated 2009-05-17!

In December 2008 I sent Yves a report about residues with V3.7.1c for small n-values;
the example was a twin: 7945335*2^5426+/-1

With the 'old' version I got this, for example:
7945335*2^5426+1 is not prime. Proth RES64: D634636A24EF9A52

After his corrections everything is OK:
7945335*2^5426-1 is prime!
7945335*2^5426+1 is prime!

So there's a difference between the LLR versions: I think V3.7.2 does not include the fix for that issue!

T.Rex 2010-01-05 16:46

[QUOTE=ldesnogu;200942]Sorry if this is a stupid idea: what about taking the sources, replacing gwnum with its latest version, recompiling and testing again? That would at least make a gwnum bug less likely.[/QUOTE]Not stupid at all. However, I know nothing about how LLR is built. There is gwnum in source259.zip of Prime95. But there are so many files... Aren't there people who already know how to upgrade LLR with a fresher version of gwnum ?
Tony

T.Rex 2010-01-05 16:49

[QUOTE=rogue;200943]Run the test with PFGW. If you get consistent residues, then the problem is with LLR. If not, then it is with gwnum.[/QUOTE]How do you ask PFGW to output a residue ?
Tony

rogue 2010-01-05 16:56

[QUOTE=T.Rex;200952]How do you ask PFGW to output a residue ?
Tony[/QUOTE]

If you are not using the -t switch, it will always output the residue, but the base 2 residues with PFGW will not match those from LLR. Other bases will match. My point was that if you can run base 2 tests with PFGW and get consistent residues, then the issue is with LLR and not gwnum.

[QUOTE=henryzz;200949]Is there a way to forcefully make the FFT length higher to hopefully get a correct result?[/QUOTE]

Use -a1 or -a2 with PFGW

T.Rex 2010-01-05 16:57

[QUOTE=diep;200944]It could also be a CPU bug that pops up. Something like 250 bugs have since been found in Core 2 CPUs, and the BIOS cannot fix them all.[/QUOTE]The bug also appears on Intel Xeon. I will test ASAP on Nehalem.
Tony

T.Rex 2010-01-05 17:12

[QUOTE=henryzz;200949]Is there a way to forcefully make the FFT length higher to hopefully get a correct result?[/QUOTE]Hummm. Reading the (very short) documentation given with LLR, we cannot. Maybe LLR can understand the .ini options of Prime95 ? I really don't know.
Who knows about LLR ?
Tony

T.Rex 2010-01-05 17:14

[QUOTE=kar_bon;200950]Maybe another reason:
...
So there's a difference between the LLR versions: I think V3.7.2 does not include the fix for that issue![/QUOTE]So, who is Yves ? And who is maintaining LLR now ?
Tony

kar_bon 2010-01-05 17:15

[QUOTE=T.Rex;200957]So, who is Yves ? and who is maintaining LLR now ?
Tony[/QUOTE]

oh, sorry, i meant Jean!

T.Rex 2010-01-05 17:24

[QUOTE=rogue;200954]If you are not ....[/QUOTE]Hummm. The help in the latest version of PFGW does not work (file pfgw.txt is missing).
How do you run pfgw -t -q"(2^42737+1)/3" without -t ? (I don't even know what -t means !!)
Isn't there a website dedicated to pfgw where there are explanations ?

So pfgw and LLR share the same gwnum library ?

T.

kar_bon 2010-01-05 17:26

just new version [url=http://openpfgw.svn.sourceforge.net/viewvc/openpfgw/pfgw_win_3.2.6_20100105.zip?view=log]here[/url]

docs included in the ZIP-file.

T.Rex 2010-01-05 17:42

[QUOTE=kar_bon;200961]just new version [url=http://openpfgw.svn.sourceforge.net/viewvc/openpfgw/pfgw_win_3.2.6_20100105.zip?view=log]here[/url]
docs included in the ZIP-file.[/QUOTE]I'm on Linux... However, the pfgw tool searches for a pfgw.txt file and there is a pfgwdoc.txt file. Is it the same ?
T.

rogue 2010-01-05 17:48

[QUOTE=T.Rex;200960]Hummm. The help in the latest version of PFGW does not work (file pfgw.txt is missing).
How do you run pfgw -t -q"(2^42737+1)/3" without -t ? (I don't even know what -t means !!)
Isn't there a website dedicated to pfgw where there are explanations ?

So pfgw and LLR share the same gwnum library ?

T.[/QUOTE]

If you downloaded the official release, it has a pfgwdoc.txt file included in the distribution. Remove -t to do a PRP test; with -t it will try to do a primality test.

PFGW and LLR both use gwnum, but different versions of it.
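
The switches mentioned so far can be collected in a small helper. This is only a sketch: it uses just the options named in this thread (`-q`, `-t`, `-a1`/`-a2`), the helper name `pfgw_command` and the default binary name are hypothetical, and the exact behaviour of each switch should be checked against pfgwdoc.txt.

```python
def pfgw_command(exponent, primality=False, fft_bump=0, binary="pfgw"):
    """Build an argument list for testing the Wagstaff number (2^p+1)/3."""
    if fft_bump not in (0, 1, 2):
        raise ValueError("only -a1 and -a2 are mentioned in this thread")
    args = [binary]
    if fft_bump:
        # Suggested above as the way to force a larger FFT length.
        args.append(f"-a{fft_bump}")
    if primality:
        # -t asks for a primality test; leave it out for a PRP test.
        args.append("-t")
    # The shell form -q"(2^p+1)/3" reaches the program as one argument.
    args.append(f"-q(2^{exponent}+1)/3")
    return args
```

The returned list can be handed to `subprocess.run` as is, which avoids shell quoting problems with the parentheses.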

T.Rex 2010-01-06 12:30

PFGW 3.2.5 (gwnum 25.13) gives the same residue twice for the faulty exponent.

Jean is back !!

ATH 2010-01-06 14:23

[QUOTE=T.Rex;201042]PFGW 3.2.5 (gwnum 25.13) gives the same residue twice for the faulty exponent.

Jean is back !![/QUOTE]

Was it just one exponent or a range of exponents? You should check one exponent in Prime95, just to make sure there is no bug there, in worktodo.txt use:

PRP=1,2,<exponent>,1,0,0,"3"
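
If several exponents need checking, the line can be generated rather than typed. A small sketch that only fills the exponent into the template given above; the function name is hypothetical, and the field meanings in the comment are my reading of the template, not taken from the Prime95 documentation.

```python
def wagstaff_prp_line(exponent):
    """Format a worktodo.txt PRP entry for (2^exponent+1)/3."""
    if exponent < 3 or exponent % 2 == 0:
        raise ValueError("expected an odd exponent >= 3")
    # Template from the post above: k=1, b=2, n=exponent, c=1 describe
    # 2^n+1; the two zeros are copied as given; "3" is the known factor.
    return f'PRP=1,2,{exponent},1,0,0,"3"'


def worktodo_text(exponents):
    """Join one PRP line per exponent into the text of a worktodo.txt."""
    return "\n".join(wagstaff_prp_line(p) for p in exponents) + "\n"
```

For the exponent mentioned earlier in the thread, `wagstaff_prp_line(42737)` gives `PRP=1,2,42737,1,0,0,"3"`.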

Joe O 2010-01-11 21:19

[QUOTE=T.Rex;201042]
Jean is back !![/QUOTE]

Does this mean that you are in contact with him?

T.Rex 2010-01-11 21:31

[QUOTE=Joe O;201524]Does this mean that you are in contact with him?[/QUOTE]Yes. But Jean is not very talkative ! I think he is working hard on a new version of LLR. Maybe the problems we had led him to investigate and improve LLR.

BTW, now I've found 5 exponents that generate random Residues through LLR. More are to come, I guess.

One of these days, I'll provide exponent examples. For now, my colleagues and myself keep that confidential.

Tony

ATH 2010-01-11 21:47

[QUOTE=T.Rex;201526]Yes. But Jean is not very talkative ! I think he is working hard on a new version of LLR. Maybe the problems we had led him to investigate and improve LLR.

BTW, now I've found 5 exponents that generate random Residues through LLR. More are to come, I guess.

One of these days, I'll provide exponent examples. For now, my colleagues and myself keep that confidential.

Tony[/QUOTE]

Did you test any of the 5 in Prime95?

Kosmaj 2010-01-11 21:49

It would be nice if you provided the k/n pairs of the tests producing wrong residues, so that we can test various versions of LLR on the respective hardware. Why the secrecy? We have been uncovering bugs in LLR since 2005, always providing concrete k/n numerical values.

Thanks.

T.Rex 2010-01-11 22:12

[QUOTE=ATH;201527]Did you test any of the 5 in Prime95?[/QUOTE]No. Jean is using the Vrba-Reix PRP test for Wagstaff numbers. How could I do a test on such a number with Prime95 ?
Tony

ATH 2010-01-11 23:10

[QUOTE=T.Rex;201533]No. Jean is using the Vrba-Reix PRP test for Wagstaff numbers. How could I do a test on such a number with Prime95 ?
Tony[/QUOTE]

The Vrba-Reix PRP test also uses Prime95's gwnum code, right? So it *could* be a bug in gwnum. You can test one of the 5 numbers with the Prime95 PRP test. In worktodo.txt:
PRP=1,2,<exponent>,1,0,0,"3"

Then test again on another machine and see if the residues match.
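
To see what "the residues match" means on a small scale, here is a sketch in plain Python. It runs an ordinary base-3 Fermat test on (2^p+1)/3 and keeps the low 64 bits of the result. This is not the Vrba-Reix test and it does not use gwnum, so its residues are not expected to match LLR, PFGW or Prime95 output; both function names are hypothetical.

```python
def wagstaff_fermat_residue(p, base=3):
    """Low 64 bits of base^(N-1) mod N for N = (2^p + 1) / 3, p odd."""
    if p < 5 or p % 2 == 0:
        raise ValueError("p must be an odd number >= 5")
    n = (2**p + 1) // 3          # exact: 3 divides 2^p + 1 when p is odd
    return pow(base, n - 1, n) & (2**64 - 1)


def residues_agree(p, runs=2):
    """Repeat the test; exact integer arithmetic must give one residue."""
    return len({wagstaff_fermat_residue(p) for _ in range(runs)}) == 1
```

With exact integer arithmetic the repeated runs always agree, which is the behaviour a correct floating-point FFT implementation has to reproduce. For p = 43, a known Wagstaff prime exponent, the function returns 1.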

jrk 2010-01-12 01:28

[QUOTE=T.Rex;201526]BTW, now I've found 5 exponents that generate random Residues through LLR. More are to come, I guess.[/QUOTE]
I'd guess that it is some sort of memory access problem (overflow, out-of-bounds array, etc) causing the random numbers to appear.

[QUOTE=T.Rex;201526]One of these days, I'll provide exponent examples. For now, my colleagues and myself keep that confidential.[/QUOTE]
So if you want to do all the experimenting for yourself, you should try running LLR with Valgrind to see if there is a memory access problem. Or I'm sure others here would like to, if you don't. But please give your results here if you find something interesting.

T.Rex 2010-01-12 14:13

[QUOTE=ATH;201547]The Vrba-Reix PRP test also uses Prime95's gwnum code, right? So it *could* be a bug in gwnum. You can test one of the 5 numbers with the Prime95 PRP test. In worktodo.txt:
PRP=1,2,<exponent>,1,0,0,"3"

Then test again on another machine and see if the residues match.[/QUOTE]We do not know yet whether it is a HW or SW problem. As of today there are 11 exponents with random residues and possibly at least 500 uncertain exponents.
We are comparing against AMD HW.
Then we'll see whether it is SW.
T.

T.Rex 2010-01-12 14:15

[QUOTE=jrk;201565]I'd guess that it is some sort of memory access problem (overflow, out-of-bounds array, etc) causing the random numbers to appear.[/QUOTE]Yes...
[QUOTE]
So if you want to do all the experimenting for yourself, you should try running LLR with Valgrind to see if there is a memory access problem. Or I'm sure others here would like to, if you don't. But please give your results here if you find something interesting.[/QUOTE]I'll first get some fresh news from Jean, in order to know if he is making progress on his side. I've used Valgrind in the past, 3 or 4 years ago !!
T.

Cruelty 2010-01-12 18:16

The LLR page you are referring to is different to what I use as reference: [URL="http://jpenne.free.fr/"]http://jpenne.free.fr/[/URL]. Also, as I don't see version 3.7.2, can you tell me which version exactly were you using?

ATH 2010-01-12 20:05

[QUOTE=Cruelty;201623]The LLR page you are referring to is different to what I use as reference: [URL="http://jpenne.free.fr/"]http://jpenne.free.fr/[/URL]. Also, as I don't see version 3.7.2, can you tell me which version exactly were you using?[/QUOTE]

[URL="http://jpenne.free.fr/Development/"]http://jpenne.free.fr/Development/[/URL]

T.Rex 2010-01-16 22:50

New LLR version
 
Jean (Penné) will soon deliver a new version of LLR that fixes the bug we found. The bug was in the old version of gwnum used by LLR. Jean will use a fresh version of gwnum. He now also uses the random-switch technique for the Vrba-Reix test !
It seems that the problem is fixed.
And it seems that only 4348 Wagstaff exponents produced a bad residue with LLR 3.7.2. To be retested...
Tony

Cruelty 2010-01-16 23:02

On the development page I can see version 3.80 dated the 14th of January. Has anyone tried it already?

T.Rex 2010-01-16 23:05

[QUOTE=Cruelty;202130]On the development page I can see version 3.80 dated the 14th of January. Has anyone tried it already?[/QUOTE]I tried it: seems good ! However, Jean said that he will officially deliver this version next week. He has some more testing to do and some more documentation to write.
Wait till he says: GO !
Tony

jrk 2010-01-17 04:45

Did this bug affect only Wagstaff numbers, or is it more general?

Is there a way to determine which results may be bad and need to be re-tested?

T.Rex 2010-01-17 07:49

[QUOTE=jrk;202136]Did this bug affect only Wagstaff numbers, or is it more general?

Is there a way to determine which results may be bad and need to be re-tested?[/QUOTE]The problem was in gwnum. So it depends on whether the FFT computation for a given number gets into the RED zone...
To detect it, it's enough to add ErrorCheck=1 to the llr.ini file (and ... yes ! we did not use it).
It seems that LLR does not have an error check of the result every N steps, like Prime95 has.
Tony
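
For anyone who wants to switch the option on across many working directories, a minimal sketch that edits the text of an llr.ini file. Only the `ErrorCheck=1` line comes from the post above; the helper name is hypothetical, and treating the key as case-insensitive is an assumption.

```python
def enable_error_check(ini_text):
    """Return ini_text with ErrorCheck=1 set, replacing any existing value."""
    kept = [
        line
        for line in ini_text.splitlines()
        # Drop any earlier ErrorCheck setting (case-insensitive: assumption).
        if line.split("=", 1)[0].strip().lower() != "errorcheck"
    ]
    kept.append("ErrorCheck=1")
    return "\n".join(kept) + "\n"
```

Read the file, pass its contents through the function, and write the result back; all other lines are kept unchanged and in order.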

philmoore 2010-01-19 06:37

If the problem was in gwnum, it probably also affected prime95 and pfgw as well. Prime95 catches some of these errors through intermittent error checking, but others probably got through. It would be helpful to know the ranges (and FFT sizes) where this happened.
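
For readers unfamiliar with this kind of check: FFT-based multiplication works in floating point, so each output coefficient should land very close to an integer, and the distance to the nearest integer is a warning sign when the FFT is too small for the operands. The sketch below illustrates that idea with a textbook radix-2 FFT in plain Python. It is not gwnum's code or its actual check, and all names are made up for the example.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out


def square_with_roundoff(digits, base=10):
    """Square a number given as little-endian digits.

    Returns (value, max_roundoff), where max_roundoff is the largest
    distance between an FFT output and the nearest integer.
    """
    size = 1
    while size < 2 * len(digits):
        size *= 2
    padded = [complex(d) for d in digits] + [0j] * (size - len(digits))
    spectrum = fft(padded)
    conv = fft([x * x for x in spectrum], invert=True)
    value = 0
    max_roundoff = 0.0
    for i, c in enumerate(conv):
        exact = c.real / size        # the inverse transform needs 1/size
        nearest = round(exact)
        max_roundoff = max(max_roundoff, abs(exact - nearest))
        value += nearest * base**i
    return value, max_roundoff
```

If max_roundoff creeps toward 0.5, rounding can pick the wrong integer and the result is silently wrong, which is the kind of failure an error check is meant to flag.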

I was confused earlier where you said LLR was producing random residues. I took this to mean that the residue was different each time you ran the test, but I think you meant simply that LLR was producing incorrect residues, no?

mdettweiler 2010-01-19 06:46

[quote=philmoore;202380]If the problem was in gwnum, it probably also affected prime95 and pfgw as well. Prime95 catches some of these errors through intermittent error checking, but others probably got through. It would be helpful to know the ranges (and FFT sizes) where this happened.[/quote]
BTW, I think PFGW does similar intermittent error checking.

T.Rex 2010-01-19 09:00

[QUOTE=philmoore;202380] It would be helpful to know the ranges (and FFT sizes) where this happened.[/QUOTE]From (2^3251251+1)/3 to (2^3473747+1)/3.
[QUOTE]the residue was different each time you ran the test ?[/QUOTE]YES !

T.Rex 2010-01-19 09:01

[QUOTE=mdettweiler;202383]BTW, I think PFGW does similar intermittent error checking.[/QUOTE]LLR did not. It will now, said Jean.

philmoore 2010-01-19 12:31

Good work, Tony, and good luck with your retesting!

Cruelty 2010-02-01 00:04

Any idea when the new LLR version will become... official?

T.Rex 2010-02-01 18:43

The latest news is that Jean has delivered a stable 3.8.0 version.
He is now working on the documentation. A lot of work, he said. I offered to read it before he publishes it, and he accepted.
Wait a bit longer.
T.

