![]() |
llr, pfgw and prime95 choose different FFT for same number, why?
For our repunit search project I compared llr, pfgw and prime95 for their runtimes. I noticed quite large deviations on the runtime allthough they all use gwnum.
Below you find the the chosen FFT type and size on different plattforms for the prp test of (10^2460043-1)/9. I noticed on all machines which I tested that prime95 is always the fastest (64 Bit also much faster than 32 Bit), which you can see from the FFT size. Especially on the Core i7 the Core2 type is faster, but llr and pfgw (and also prime95 32 bit) choose the Pentium4 type. My question: shouldn't the FFT type and size be the same for all three programs? Or does this depend only on the corresponding implementation of the Fermat test? Or could this be a bug/glitch? Tested programs: pfgw 3.6.0 (32/64 Bit), llr 3.8.6 (only 32 Bit), prime95/mprime 26.6 (32/64 Bit) [B]Windows 7 64 Bit on Core i7 920 [/B]pfgw (32/64 Bit): Pentium4 type-3 FFT length 896K, Pass1=896, Pass2=1K llr : Pentium4 type-3 FFT length 480K, Pass1=640, Pass2=768 prime95 32 Bit: Pentium4 type-3 FFT length 480K, Pass1=640, Pass2=768 prime95 64 Bit: Core2 type-3 FFT length 480K, Pass1=320, Pass2=1536 [B]Linux 64 Bit on Core 2 Duo U9600 [/B]pfgw (32/64 Bit): Core2 type-3 FFT length 896K, Pass1=896, Pass2=1K llr: Core2 type-3 FFT length 480K, Pass1=640, Pass2=768 mprime (32/64 Bit): Core2 type-3 FFT length 480K, Pass1=640, Pass2=768 [B]Linux 64 Bit on Phenom II X4 810 [/B]pfgw (32/64 Bit): AMD K10 type-3 FFT length 864K, Pass1=384, Pass2=2304 llr: AMD K10 type-3 FFT length 480K, Pass1=320, Pass2=1536 mprime32: AMD K10 type-3 FFT length 480K, Pass1=320, Pass2=1536 mprime64: AMD K10 type-3 FFT length 480K, Pass1=384, Pass2=1280 |
LLR is 32-bit only and is choosing the same FFT as 32-bit prime95/mprime.
Prime95 64-bit chooses slightly different implementations than 32-bit prime95. This may save a percent or two. The big gain comes from using the extra 8 registers available in 64-bit mode. PFGW is not recognizing that it can use a fast IBDWT. It is using general purpose multiplication rather than taking advantage of the special form of your number. I'd post a bug report to the author (or a query as to whether a different command line will help PFGW recognize the special form). |
Thanks for the quick answer! That explains it well.
|
pfgw64 3.6.2 on MacIntel gives this:
Special modular reduction using Core2 type-3 FFT length 480K, Pass1=640, Pass2=768 on 10^2460043-1 Windows executes the exact same code. What does -V tell you? Does it indicate special or generic modular reduction? |
[QUOTE=rogue;288342]pfgw64 3.6.2 on MacIntel gives this:
Special modular reduction using Core2 type-3 FFT length 480K, Pass1=640, Pass2=768 on 10^2460043-1 Windows executes the exact same code. What does -V tell you? Does it indicate special or generic modular reduction?[/QUOTE] Where is the /9 ? Is it just missing in your post or in your command to pfgw as well? |
[QUOTE=rogue;288342]pfgw64 3.6.2 on MacIntel gives this:
Special modular reduction using Core2 type-3 FFT length 480K, Pass1=640, Pass2=768 on 10^2460043-1 Windows executes the exact same code. What does -V tell you? Does it indicate special or generic modular reduction?[/QUOTE] I just tested different command lines on Linux and Windows: on Linux: -V -q"R(2460043)": Generic modular reduction using generic reduction Core2 type-3 FFT length 896K, Pass1=896, Pass2=1K on A 8172086-bit number -V -q"(10^2460043-1)/9" Special modular reduction using Core2 type-3 FFT length 480K, Pass1=640, Pass2=768 on 10^2460043-1 Resuming at bit 2005 PRP: (10^2460043-1)/9 2006/8172082 -V -q"((10^2460043-1)/9)" Generic modular reduction using generic reduction Core2 type-3 FFT length 896K, Pass1=896, Pass2=1K on A 8172086-bit number PRP: ((10^2460043-1)/9) 1/8172082 On Windows: -V -qR(2460043) Generic modular reduction using generic reduction Pentium4 type-3 FFT length 896K, Pass1=896, Pass2=1K on A 8172086-bit number -V -q(10^2460043-1)/9 Special modular reduction using Core2 type-3 FFT length 480K, Pass1=320, Pass2=1536 on 10^2460043-1 PRP: (10^2460043-1)/9 1/8172082 -V -q((10^2460043-1)/9) Generic modular reduction using generic reduction Pentium4 type-3 FFT length 896K, Pass1=896, Pass2=1K on A 8172086-bit number PRP: ((10^2460043-1)/9) 1/8172082 Before that I always tested with R(2460043). It seems that the special form is not recognized always. |
Clearly the output is incorrect. I'll fix that in a future release.
Nevertheless, pfgw has its limits WRT how it determine if you specified a special form. It can parse "(k*b^n-1)/d" as a special form, but "((k*b^n-1)/d)", ""(k*b^n-1)/(a*x)","((s+t)*b^n-1)/d", etc. will not be. I believe that the documentation states as such. |
It is okay if at least one special input form gives the same FFT type and size as prime95.
But still what puzzles me: I benchmarked (10^2460043-1)/9 using the input which gives the same FFT for pfgw and prime95 (both 64 bit) -V -q"(10^2460043-1)/9" vs PRP=1,10,2460043,-1,"9" Even with the special modular reduction pfgw is slower (~10%). Is this still related to the (unused) IBDWT? |
[QUOTE=MrRepunit;288375]Even with the special modular reduction pfgw is slower (~10%). Is this still related to the (unused) IBDWT?[/QUOTE]
No. Possible causes: 1) The PFGW you are using is based on a different version of gwnum. 2) PFGW may not be using the start-next-fft feature of gwnum. 3) PFGW might be doing extra error checking that mprime is not doing. 4) Something else that I haven't thought of. |
[QUOTE=rogue;288372]Clearly the output is incorrect. I'll fix that in a future release.
Nevertheless, pfgw has its limits WRT how it determine if you specified a special form. It can parse "(k*b^n-1)/d" as a special form, but "((k*b^n-1)/d)", ""(k*b^n-1)/(a*x)","((s+t)*b^n-1)/d", etc. will not be. I believe that the documentation states as such.[/QUOTE] Hello rogue, There is a function in the GMP-ECM source that you might find useful to identify special forms. It is in the file Fgw.c and the function name is kbnc_str. This is designed to find k,b,n,c for numbers of the form k*b^n+c. It should be able to identify the first three examples you give: "(k*b^n-1)/d", "((k*b^n-1)/d)", "(k*b^n-1)/(a*x)" but not the fourth one: "((s+t)*b^n-1)/d" But, you may be able to adapt it to suit your needs. I wrote the function so if you have any problems or questions on it, feel free to ask. |
[QUOTE=WraithX;288404]There is a function in the GMP-ECM source that you might find useful to identify special forms. It is in the file Fgw.c and the function name is kbnc_str. This is designed to find k,b,n,c for numbers of the form k*b^n+c. It should be able to identify the first three examples you give:
"(k*b^n-1)/d", "((k*b^n-1)/d)", "(k*b^n-1)/(a*x)" but not the fourth one: "((s+t)*b^n-1)/d" But, you may be able to adapt it to suit your needs. I wrote the function so if you have any problems or questions on it, feel free to ask.[/QUOTE] Thanks, but I'll pass on that right now. IMO, this can get really complex really quickly and unless the code can handle all cases, it probably isn't worth the effort. This is the first time that this has been an issue and as the workaround is simple, I'm not going to worry about it. |
| All times are UTC. The time now is 09:43. |
Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.