LLR vs PFGW speed
What do you have this set to? usellroverpfgw=
Most of my computers started using pfgw. Lennart |
[QUOTE=Lennart;375518]What do you have this set to? usellroverpfgw=
Most of my computers started using pfgw. Lennart[/QUOTE] It is set to nothing, i.e. simply usellroverpfgw= . So the clients will use PFGW over LLR if it is available. All CRUS/NPLB servers are set this way by default. |
[QUOTE=gd_barnes;375527]Excellent! I'll go ahead and add all of Riesel base 6 for n=1.5M-2M shortly.
It is set to nothing, i.e. simply usellroverpfgw= . So the clients will use PFGW over LLR if it is available. All CRUS/NPLB servers are set this way by default.[/QUOTE] Change that, or it will take much more time to test the numbers. Lennart |
[QUOTE=Lennart;375529]Change that, or it will take much more time to test the numbers.
Lennart[/QUOTE] Hummmm...I thought they used the same program/GWNUM libraries. Maybe this is machine and/or size specific. I'll go ahead and change it and see if there are comments from others about it. |
[QUOTE=gd_barnes;375531]Hummmm...I thought they used the same program/GWNUM libraries. Maybe this is machine and/or size specific. I'll go ahead and change it and see if there are comments from others about it.[/QUOTE]
I can't say why, but on my faster computer pfgw takes about 50%-100% more time. Could be some memory thing also. Lennart |
There are various reasons pfgw is slower. I haven't taken the time to investigate them. I suspect it has something to do with data conversions in the code between calls to the gwnum library. The source code for pfgw is an absolute beast. The original developers tried to make it as pure C++ as possible, so a lot of things are done in very abstract ways, which really hurts performance in some areas. I would love to rewrite it, but I have so little time and it would be an immense project requiring a lot of redesign.
|
[QUOTE=Lennart;375533]I can't say why, but on my faster computer pfgw takes about 50%-100% more time. Could be some memory thing also[/QUOTE]
[QUOTE=rogue;375541]There are various reasons pfgw is slower. I haven't taken the time to investigate them. I suspect it has something to do with data conversions in the code between calls to the gwnum library. The source code for pfgw is an absolute beast. The original developers tried to make it as pure C++ as possible, so a lot of things are done in very abstract ways, which really hurts performance in some areas. I would love to rewrite it, but I have so little time and it would be an immense project requiring a lot of redesign.[/QUOTE] Wow, I didn't know about this...I haven't really been paying close attention to my various computers' timings lately, but that might help explain why my Core2Duo (running LLR, because it was having trouble with PFGW earlier...not sure why, it's a wonky system anyway) has been faster than my Phenom II laptop (running PFGW), which is a few years newer. I had just assumed it was due to having more L2 cache, but maybe there's more to it than that. Is this just a base 6 thing, or has it been observed on other bases too? Or is it only a big factor on newer CPUs/newer gwnums? (If this has been around for a while, I'm surprised it isn't better known...again, I haven't been paying too close attention over the last year, so maybe this is old news and I just didn't get the memo. :smile:) |
Can someone please direct me to the latest Windows and Linux versions of LLR? I want to test the difference between PFGW and LLR myself on some of my old(ish) machines.
|
LLR downloads are here: [url]http://jpenne.free.fr/index2.html[/url]
That reminds me, I need to upgrade... :ermm: |
The difference between PFGW and LLR could be that LLR uses AVX, the new and improved instruction set from Intel. I've used it for about 2 years now.
It only works on Sandy Bridge and Ivy Bridge CPUs. |
I'm getting from 10% to 50% speedup on my various machines by using the latest version of LLR vs. PFGW; both Windows and Linux. It's mostly in the 10-20% range on my Intel machines. A 40-50% speedup was noticed on a couple of my oldest AMD machines on large tests for a non-power-of-2 base. The speedup appears both machine and size dependent. There is less speedup on smaller tests.
I've now changed all of my private PRPnet servers to use LLR instead of PFGW. Port 1400 is the only public PRPnet server for NPLB/CRUS doing non-power-of-2 bases and it was changed earlier. Because there is little speedup on small tests, I will still use PFGW for conjecture searches at low n-ranges, since I like the Windows GUI and the scripting that is available. Thank you for the suggestion, Lennart! :smile: |
Well, a long time ago I contacted Mark, when I discovered that for the same bases and n values (basically the same tests), LLR was about 15% faster than PFGW running under the same conditions, i.e. 1, 2, 3 or 4 cores. I have only investigated this on my Sandy Bridge, so I'm not sure if other systems are affected. But since Mark concurred that PFGW on his system was also 15% slower than LLR for the same tests, I generally assumed that it has something to do overall with the way PFGW works, rather than with the system itself :smile:
|
[QUOTE=MyDogBuster;375575]The difference between PFGW and LLR could be that LLR uses AVX, the new and improved instruction set from Intel. I've used it for about 2 years now.
It only works on Sandy Bridge and Ivy Bridge CPUs.[/QUOTE] pfgw uses AVX as well; both LLR and pfgw use versions of gwnum that support it. The FFT selected for each test is determined by gwnum. |
[QUOTE=gd_barnes;375573]Wow. Port 1400 is already handing out R6 work. Although we still have a little over 100 tests left to finish up on S6, that was very fast to get to this point! :smile:
On another note, can someone please direct me to the latest Windows and Linux versions of LLR? I want to test the difference between PFGW and LLR myself on some of my old(ish) machines.[/QUOTE] [url]http://jpenne.free.fr/index2.html[/url] [url]http://sourceforge.net/projects/openpfgw/[/url]
I ran some timings on 87800*6^992733+1 with various versions on my i5-4670K (Haswell):
cllr 3.8.13 [GWNUM 28.5] - 1.02 ms/iter
cllr 3.8.9 [GWNUM 27.7] - 1.26 ms/iter
pfgw 3.7.7 [GWNUM 27.11] - 1.5 ms/iter
pfgw 3.5.7 [GWNUM 26.6] - 2.9 ms/iter
I wish I had been using the right program/version for this earlier! And for 36772*6^1509139-1:
cllr 3.8.13 [GWNUM 28.5] - 1.66 ms/iter
pfgw 3.7.7 [GWNUM 27.11] - 13.2 ms/iter |
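[Editor's note: to put the per-iteration timings above in perspective, here is a rough back-of-the-envelope sketch of what they mean per test. It assumes a PRP test of a number near k*6^n takes roughly n*log2(6) modular squarings, which is a simplification -- the exact iteration count depends on the test type and program.]

```python
import math

def est_test_hours(n, base, ms_per_iter):
    """Rough PRP-test wall time: ~n*log2(base) modular squarings,
    each taking ms_per_iter milliseconds (a simplification; the
    exact iteration count depends on the test type and program)."""
    iters = n * math.log2(base)
    return iters * ms_per_iter / 1000 / 3600

# The 87800*6^992733+1 timings quoted in the post above:
for name, ms in [("cllr 3.8.13", 1.02), ("cllr 3.8.9", 1.26),
                 ("pfgw 3.7.7", 1.50), ("pfgw 3.5.7", 2.90)]:
    print(f"{name}: ~{est_test_hours(992733, 6, ms):.2f} h per test")
```

Since the estimate is linear in ms/iter, the relative speedups carry over directly: the 1.02 vs 1.5 ms/iter gap is about a 32% reduction in test time.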
Interesting. So it seems it is dependent both on the range being tested and on the CPU used; and it's also much more pronounced for newer gwnums. That'd explain why I never noticed this before (because I could have sworn I'd done such a head-to-head comparison in the past).
In any event, I'm in the process of upgrading all my boxes to the latest LLR. Looks like LLR is the thing to use for this kind of PRPnet work at the moment. :smile: ----------- Edit: Say, wait a minute - could this just be because the latest PFGW (3.7.7) is still on gwnum 27.11, while the latest LLR (3.8.13) is on gwnum 28.5? Given gwnum incremented by a whole major version number, I'd imagine there's got to be some significant improvements in there...even for older CPUs. (My best computer is a Sandy Bridge, so I have AVX but nothing fancier than that.) |
[QUOTE=mdettweiler;375591]Edit: Say, wait a minute - could this just be because the latest PFGW (3.7.7) is still on gwnum 27.11, while the latest LLR (3.8.13) is on gwnum 28.5? Given gwnum incremented by a whole major version number, I'd imagine there's got to be some significant improvements in there...even for older CPUs. (My best computer is a Sandy Bridge, so I have AVX but nothing fancier than that.)[/QUOTE]
Partly, yes. I'd guess that's responsible for the difference between cllr 3.8.9 (gwnum 27) and cllr 3.8.13 (gwnum 28), for example. But pfgw 3.7.7 and cllr 3.8.9 both have gwnum 27 (pfgw even has a newer version of it), and cllr is still faster there. IIRC, Sandy/Ivy Bridge have small improvements (2-3%) with gwnum 28 (compared to the ~20% improvement I measured). |
Does LLR use gwstartnextfft where pfgw does not?
|
[QUOTE=Prime95;375681]Does llr use gwstartnextfft where pfgw does not?[/QUOTE]
pfgw does not use that function. What does it do and how would I use it? |
gwstartnextfft affects the next gwmul, gwsquare, or gwfftmul. These operations will compute their result and perform part of the next forward FFT. Why is this better? It allows a long string of squarings or multiplies to use just 2 passes over main memory instead of 3. The 3-memory-pass case you are using now is: pass 1 forward FFT; pass 2 forward FFT, pointwise multiply, and pass 2 inverse FFT; pass 1 inverse FFT and carry propagation. The 2-memory-pass case is: pass 2 forward FFT, pointwise multiply, and pass 2 inverse FFT; pass 1 inverse FFT, carry propagation, and the next operation's pass 1 forward FFT.
Obviously, you can't use this option on your last squaring/mul or before writing a save file. To see if this may be the cause of the speed difference, time PFGW against a PRP test in prime95 using the same gwnum library. |
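[Editor's note: the "long string of squarings" being discussed is the core loop of a Fermat PRP test. A minimal plain-integer sketch follows -- purely illustrative, using Python big integers rather than gwnum's FFT arithmetic, with invented names -- to show where the fusion opportunity lives.]

```python
def fermat_prp(N, a=3):
    """Fermat PRP test: N is a base-a probable prime if
    a^(N-1) == 1 (mod N).  The core is one long chain of modular
    squarings -- exactly the pattern gwstartnextfft speeds up,
    since every squaring except the last (or one right before a
    save file is written) can begin the next forward FFT."""
    r = 1
    for bit in bin(N - 1)[2:]:   # left-to-right binary exponentiation
        r = r * r % N            # the squaring gwnum would fuse
        if bit == '1':
            r = r * a % N        # occasional small multiply
    return r == 1
```

Note that a Fermat PRP test can report "probably prime" for some composites (e.g. 341 passes base 2), which is why these are PRP tests rather than proofs.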
[QUOTE=Prime95;375715]gwstartnextfft affects the next gwmul, gwsquare, or gwfftmul. These operations will compute their result and perform part of the next forward FFT. Why is this better? It allows a long string of squarings or multiplies to use just 2 passes over main memory instead of 3. The 3-memory-pass case you are using now is: pass 1 forward FFT; pass 2 forward FFT, pointwise multiply, and pass 2 inverse FFT; pass 1 inverse FFT and carry propagation. The 2-memory-pass case is: pass 2 forward FFT, pointwise multiply, and pass 2 inverse FFT; pass 1 inverse FFT, carry propagation, and the next operation's pass 1 forward FFT.
Obviously, you can't use this option on your last squaring/mul or before writing a save file. To see if this may be the cause of the speed difference, time PFGW against a PRP test in prime95 using the same gwnum library.[/QUOTE] This would be very helpful for PRP tests, but probably not so much for primality tests. Would you mind taking a look at the PRP code for pfgw and telling me what to do? |