Well, a long time ago I contacted Mark when I discovered that, for the same bases and n values (basically the same tests), LLR was 15% faster than PFGW running under the same conditions, i.e. on 1, 2, 3 or 4 cores. I have only investigated this on my Sandy Bridge, so I'm not sure whether other systems are affected. But since Mark concurred that PFGW on his system was also 15% slower than LLR for the same tests, I assumed that it has to do with the way PFGW works in general, rather than with the system itself :smile:
|
[QUOTE=MyDogBuster;375575]The differences between PFGW and LLR could be that LLR uses AVX, the new and improved instruction set from Intel. I've used it for about 2 years now.
It only works on Sandy Bridge and Ivy Bridge CPU's.[/QUOTE]
pfgw uses AVX as well, as LLR and pfgw both use versions of gwnum that support it. The FFT selected for each test is determined by gwnum. |
[QUOTE=gd_barnes;375573]Wow. Port 1400 is already handing out R6 work. Although we still have a little over 100 tests left to finish up on S6, that was very fast to get to this point! :smile:
On another note, can someone please direct me to the latest Windows and Linux versions of LLR? I want to test the difference between PFGW and LLR myself on some of my old(ish) machines.[/QUOTE]
[url]http://jpenne.free.fr/index2.html[/url]
[url]http://sourceforge.net/projects/openpfgw/[/url]

I ran some timings on 87800*6^992733+1 with various versions on my i5-4670K (Haswell):
cllr 3.8.13 [GWNUM 28.5] - 1.02 ms/iter
cllr 3.8.9 [GWNUM 27.7] - 1.26 ms/iter
pfgw 3.7.7 [GWNUM 27.11] - 1.5 ms/iter
pfgw 3.5.7 [GWNUM 26.6] - 2.9 ms/iter
I wish I had been using the right program/version for this earlier!

And for 36772*6^1509139-1:
cllr 3.8.13 [GWNUM 28.5] - 1.66 ms/iter
PFGW 3.7.7 [GWNUM 27.11] - 13.2 ms/iter |
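To put the timings above side by side, here is a small sketch that expresses each entry as a multiple of the fastest one for the same candidate. The numbers are copied from the post. The variable and function names are made up for illustration and are not part of LLR or PFGW.

```python
# Per-iteration timings in ms/iter, copied from the post above.
# Dictionary and function names are illustrative only.
timings_992733 = {
    "cllr 3.8.13 [GWNUM 28.5]": 1.02,
    "cllr 3.8.9 [GWNUM 27.7]": 1.26,
    "pfgw 3.7.7 [GWNUM 27.11]": 1.5,
    "pfgw 3.5.7 [GWNUM 26.6]": 2.9,
}
timings_1509139 = {
    "cllr 3.8.13 [GWNUM 28.5]": 1.66,
    "PFGW 3.7.7 [GWNUM 27.11]": 13.2,
}


def slowdown_vs_fastest(timings):
    """Return each entry's time as a multiple of the fastest entry."""
    fastest = min(timings.values())
    return {name: ms / fastest for name, ms in timings.items()}


if __name__ == "__main__":
    tables = (
        ("87800*6^992733+1", timings_992733),
        ("36772*6^1509139-1", timings_1509139),
    )
    for label, table in tables:
        print(label)
        for name, ratio in slowdown_vs_fastest(table).items():
            print(f"  {name}: {ratio:.2f}x the fastest time")
```

On these figures, pfgw 3.7.7 takes roughly 1.5 times as long per iteration as cllr 3.8.13 on the first candidate and roughly 8 times as long on the second.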
Interesting. So it seems to depend both on the range being tested and on the CPU used, and it's also much more pronounced for newer gwnum versions. That'd explain why I never noticed this before (because I could have sworn I'd done such a head-to-head comparison in the past).
In any event, I'm in the process of upgrading all my boxes to the latest LLR. Looks like LLR is the thing to use for this kind of PRPnet work at the moment. :smile:
-----------
Edit: Say, wait a minute - could this just be because the latest PFGW (3.7.7) is still on gwnum 27.11, while the latest LLR (3.8.13) is on gwnum 28.5? Given gwnum incremented by a whole major version number, I'd imagine there's got to be some significant improvements in there...even for older CPUs. (My best computer is a Sandy Bridge, so I have AVX but nothing fancier than that.) |
[QUOTE=mdettweiler;375591]Edit: Say, wait a minute - could this just be because the latest PFGW (3.7.7) is still on gwnum 27.11, while the latest LLR (3.8.13) is on gwnum 28.5? Given gwnum incremented by a whole major version number, I'd imagine there's got to be some significant improvements in there...even for older CPUs. (My best computer is a Sandy Bridge, so I have AVX but nothing fancier than that.)[/QUOTE]
Partly, yes. I'd guess that's responsible for the difference between cllr 3.8.9 (gwnum 27) and cllr 3.8.13 (gwnum 28), for example. But pfgw 3.7.7 and cllr 3.8.9 both have gwnum 27 (pfgw even has a newer version of it), and cllr is still faster there. IIRC, Sandy/Ivy Bridge see only small improvements (2-3%) with gwnum 28 (compared to the ~20% improvement I measured). |
Does llr use gwstartnextfft where pfgw does not?
|
[QUOTE=Prime95;375681]Does llr use gwstartnextfft where pfgw does not?[/QUOTE]
pfgw does not use that function. What does it do and how would I use it? |
gwstartnextfft affects the next gwmul, gwsquare, or gwfftmul. These operations will compute their result and perform part of the next forward FFT. Why is this better? It allows a long string of squarings or multiplies to use just 2 passes over main memory instead of 3. The 3 memory pass case you are using now is pass1-forward-FFT, pass2-forward-FFT-and-pointwise-mul-and-pass2-inverse-FFT, pass1-inverse-FFT-carry-propagation. The 2 memory pass case is pass2-forward-FFT-and-pointwise-mul-and-pass2-inverse-FFT, pass1-inverse-FFT-carry-propagation-and-next-pass1-forward-FFT.
Obviously, you can't use this option on your last squaring/mul or before writing a save file. To see if this may be the cause of the speed difference, time PFGW against a PRP test in prime95 using the same gwnum library. |
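The pass counting described above can be sketched as a toy model. This is not gwnum code and calls nothing from the library; it only counts memory passes under the rules stated in the post. The function name, its parameters, and the reading of "before writing a save file" as "the squaring whose result gets saved" are assumptions made for illustration.

```python
def count_memory_passes(num_squarings, use_startnextfft, save_every=None):
    """Count passes over main memory for a chain of squarings (toy model).

    Assumed rules, taken from the description above:
      * a squaring costs 3 passes when it must do its own pass-1 forward FFT;
      * it costs 2 passes when the previous squaring already did that work;
      * the option is switched off for the last squaring and for any
        squaring whose result is about to be written to a save file.
    """
    passes = 0
    next_fft_started = False  # nothing has pre-computed the first forward FFT
    for i in range(1, num_squarings + 1):
        passes += 2 if next_fft_started else 3
        is_last = i == num_squarings
        before_save = save_every is not None and i % save_every == 0
        # Decide whether this squaring also starts the next forward FFT.
        next_fft_started = use_startnextfft and not (is_last or before_save)
    return passes


if __name__ == "__main__":
    n = 1000
    print("without option:       ", count_memory_passes(n, False))
    print("with option:          ", count_memory_passes(n, True))
    print("with option, save/100:", count_memory_passes(n, True, save_every=100))
```

In this model only the first squaring, and each squaring that follows a save point, pays for the third pass. Occasional save files therefore barely reduce the benefit.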
[QUOTE=Prime95;375715]gwstartnextfft affects the next gwmul, gwsquare, or gwfftmul. These operations will compute their result and perform part of the next forward FFT. Why is this better? It allows a long string of squarings or multiplies to use just 2 passes over main memory instead of 3. The 3 memory pass case you are using now is pass1-forward-FFT, pass2-forward-FFT-and-pointwise-mul-and-pass2-inverse-FFT, pass1-inverse-FFT-carry-propagation. The 2 memory pass case is pass2-forward-FFT-and-pointwise-mul-and-pass2-inverse-FFT, pass1-inverse-FFT-carry-propagation-and-next-pass1-forward-FFT.
Obviously, you can't use this option on your last squaring/mul or before writing a save file. To see if this may be the cause of the speed difference, time PFGW against a PRP test in prime95 using the same gwnum library.[/QUOTE]
This would be very helpful for PRP tests, but probably not so much for primality tests. Would you mind taking a look at the PRP code for pfgw and telling me what to do? |