mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Software (https://www.mersenneforum.org/forumdisplay.php?f=10)
-   -   llr, pfgw and prime95 choose different FFT for same number, why? (https://www.mersenneforum.org/showthread.php?t=16503)

MrRepunit 2012-02-05 01:09

llr, pfgw and prime95 choose different FFT for same number, why?
 
For our repunit search project I compared llr, pfgw and prime95 for their runtimes. I noticed quite large deviations on the runtime allthough they all use gwnum.
Below you find the the chosen FFT type and size on different plattforms for the prp test of (10^2460043-1)/9. I noticed on all machines which I tested that prime95 is always the fastest (64 Bit also much faster than 32 Bit), which you can see from the FFT size. Especially on the Core i7 the Core2 type is faster, but llr and pfgw (and also prime95 32 bit) choose the Pentium4 type.

My question: shouldn't the FFT type and size be the same for all three programs? Or does this depend only on the corresponding implementation of the Fermat test? Or could this be a bug/glitch?


Tested programs: pfgw 3.6.0 (32/64 Bit), llr 3.8.6 (only 32 Bit), prime95/mprime 26.6 (32/64 Bit)

[B]Windows 7 64 Bit on Core i7 920
[/B]pfgw (32/64 Bit): Pentium4 type-3 FFT length 896K, Pass1=896, Pass2=1K
llr : Pentium4 type-3 FFT length 480K, Pass1=640, Pass2=768
prime95 32 Bit: Pentium4 type-3 FFT length 480K, Pass1=640, Pass2=768
prime95 64 Bit: Core2 type-3 FFT length 480K, Pass1=320, Pass2=1536

[B]Linux 64 Bit on Core 2 Duo U9600
[/B]pfgw (32/64 Bit): Core2 type-3 FFT length 896K, Pass1=896, Pass2=1K
llr: Core2 type-3 FFT length 480K, Pass1=640, Pass2=768
mprime (32/64 Bit): Core2 type-3 FFT length 480K, Pass1=640, Pass2=768

[B]Linux 64 Bit on Phenom II X4 810
[/B]pfgw (32/64 Bit): AMD K10 type-3 FFT length 864K, Pass1=384, Pass2=2304
llr: AMD K10 type-3 FFT length 480K, Pass1=320, Pass2=1536
mprime32: AMD K10 type-3 FFT length 480K, Pass1=320, Pass2=1536
mprime64: AMD K10 type-3 FFT length 480K, Pass1=384, Pass2=1280

Prime95 2012-02-05 01:22

LLR is 32-bit only and is choosing the same FFT as 32-bit prime95/mprime.

Prime95 64-bit chooses slightly different implementations than 32-bit prime95. This may save a percent or two. The big gain comes from using the extra 8 registers available in 64-bit mode.

PFGW is not recognizing that it can use a fast IBDWT. It is using general purpose multiplication rather than taking advantage of the special form of your number. I'd post a bug report to the author (or a query as to whether a different command line will help PFGW recognize the special form).

MrRepunit 2012-02-05 01:34

Thanks for the quick answer! That explains it well.

rogue 2012-02-05 03:56

pfgw64 3.6.2 on MacIntel gives this:

Special modular reduction using Core2 type-3 FFT length 480K, Pass1=640, Pass2=768 on 10^2460043-1

Windows executes the exact same code. What does -V tell you? Does it indicate special or generic modular reduction?

henryzz 2012-02-05 08:03

[QUOTE=rogue;288342]pfgw64 3.6.2 on MacIntel gives this:

Special modular reduction using Core2 type-3 FFT length 480K, Pass1=640, Pass2=768 on 10^2460043-1

Windows executes the exact same code. What does -V tell you? Does it indicate special or generic modular reduction?[/QUOTE]
Where is the /9 ? Is it just missing in your post or in your command to pfgw as well?

MrRepunit 2012-02-05 10:16

[QUOTE=rogue;288342]pfgw64 3.6.2 on MacIntel gives this:

Special modular reduction using Core2 type-3 FFT length 480K, Pass1=640, Pass2=768 on 10^2460043-1

Windows executes the exact same code. What does -V tell you? Does it indicate special or generic modular reduction?[/QUOTE]

I just tested different command lines on Linux and Windows:

on Linux:
-V -q"R(2460043)":
Generic modular reduction using generic reduction Core2 type-3 FFT length 896K, Pass1=896, Pass2=1K on A 8172086-bit number

-V -q"(10^2460043-1)/9"
Special modular reduction using Core2 type-3 FFT length 480K, Pass1=640, Pass2=768 on 10^2460043-1
Resuming at bit 2005
PRP: (10^2460043-1)/9 2006/8172082

-V -q"((10^2460043-1)/9)"
Generic modular reduction using generic reduction Core2 type-3 FFT length 896K, Pass1=896, Pass2=1K on A 8172086-bit number
PRP: ((10^2460043-1)/9) 1/8172082

On Windows:
-V -qR(2460043)
Generic modular reduction using generic reduction Pentium4 type-3 FFT length 896K, Pass1=896, Pass2=1K on A 8172086-bit number

-V -q(10^2460043-1)/9
Special modular reduction using Core2 type-3 FFT length 480K, Pass1=320, Pass2=1536 on 10^2460043-1
PRP: (10^2460043-1)/9 1/8172082

-V -q((10^2460043-1)/9)
Generic modular reduction using generic reduction Pentium4 type-3 FFT length 896K, Pass1=896, Pass2=1K on A 8172086-bit number
PRP: ((10^2460043-1)/9) 1/8172082

Before that I always tested with R(2460043).

It seems that the special form is not recognized always.

rogue 2012-02-05 14:30

Clearly the output is incorrect. I'll fix that in a future release.

Nevertheless, pfgw has its limits WRT how it determine if you specified a special form. It can parse "(k*b^n-1)/d" as a special form, but "((k*b^n-1)/d)", ""(k*b^n-1)/(a*x)","((s+t)*b^n-1)/d", etc. will not be. I believe that the documentation states as such.

MrRepunit 2012-02-05 14:56

It is okay if at least one special input form gives the same FFT type and size as prime95.
But still what puzzles me: I benchmarked (10^2460043-1)/9 using the input which gives the same FFT for pfgw and prime95 (both 64 bit)
-V -q"(10^2460043-1)/9" vs PRP=1,10,2460043,-1,"9"
Even with the special modular reduction pfgw is slower (~10%). Is this still related to the (unused) IBDWT?

Prime95 2012-02-05 18:56

[QUOTE=MrRepunit;288375]Even with the special modular reduction pfgw is slower (~10%). Is this still related to the (unused) IBDWT?[/QUOTE]

No.

Possible causes:

1) The PFGW you are using is based on a different version of gwnum.
2) PFGW may not be using the start-next-fft feature of gwnum.
3) PFGW might be doing extra error checking that mprime is not doing.
4) Something else that I haven't thought of.

WraithX 2012-02-05 21:18

[QUOTE=rogue;288372]Clearly the output is incorrect. I'll fix that in a future release.

Nevertheless, pfgw has its limits WRT how it determine if you specified a special form. It can parse "(k*b^n-1)/d" as a special form, but "((k*b^n-1)/d)", ""(k*b^n-1)/(a*x)","((s+t)*b^n-1)/d", etc. will not be. I believe that the documentation states as such.[/QUOTE]

Hello rogue,

There is a function in the GMP-ECM source that you might find useful to identify special forms. It is in the file Fgw.c and the function name is kbnc_str. This is designed to find k,b,n,c for numbers of the form k*b^n+c. It should be able to identify the first three examples you give:
"(k*b^n-1)/d", "((k*b^n-1)/d)", "(k*b^n-1)/(a*x)"
but not the fourth one:
"((s+t)*b^n-1)/d"
But, you may be able to adapt it to suit your needs. I wrote the function so if you have any problems or questions on it, feel free to ask.

rogue 2012-02-05 22:31

[QUOTE=WraithX;288404]There is a function in the GMP-ECM source that you might find useful to identify special forms. It is in the file Fgw.c and the function name is kbnc_str. This is designed to find k,b,n,c for numbers of the form k*b^n+c. It should be able to identify the first three examples you give:
"(k*b^n-1)/d", "((k*b^n-1)/d)", "(k*b^n-1)/(a*x)"
but not the fourth one:
"((s+t)*b^n-1)/d"
But, you may be able to adapt it to suit your needs. I wrote the function so if you have any problems or questions on it, feel free to ask.[/QUOTE]

Thanks, but I'll pass on that right now. IMO, this can get really complex really quickly and unless the code can handle all cases, it probably isn't worth the effort. This is the first time that this has been an issue and as the workaround is simple, I'm not going to worry about it.


All times are UTC. The time now is 09:43.

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.