mersenneforum.org  

Old 2014-06-11, 08:25   #12
KEP

Well, a long time ago I contacted Mark when I discovered that, for the same bases and n values (basically the same tests), LLR was 15% faster than PFGW running under the same conditions, i.e. on 1, 2, 3 or 4 cores. I have only investigated this on my Sandy Bridge, so I'm not sure whether other systems are affected. But since Mark concurred that on his system PFGW was also 15% slower than LLR for the same tests, I generally assumed that it has something to do with the way PFGW works overall, rather than with the system itself.
Old 2014-06-11, 11:52   #13
rogue
 
"Mark"

Quote:
Originally Posted by MyDogBuster
"The differences between PFGW and LLR could be that LLR uses AVX, the new and improved instruction set from Intel. I've used it for about 2 years now.

It only works on Sandy Bridge and Ivy Bridge CPUs."

pfgw uses AVX as well; both LLR and pfgw use versions of gwnum that support it. The FFT selected for each test is determined by gwnum.
Old 2014-06-11, 12:22   #14
Mini-Geek
"Tim Sorbera"

Quote:
Originally Posted by gd_barnes
"Wow. Port 1400 is already handing out R6 work. Although we still have a little over 100 tests left to finish up on S6, that was very fast to get to this point!

On another note, can someone please direct me to the latest Windows and Linux versions of LLR? I want to test the difference between PFGW and LLR myself on some of my old(ish) machines."

LLR: http://jpenne.free.fr/index2.html
PFGW: http://sourceforge.net/projects/openpfgw/

I ran some timings on 87800*6^992733+1 with various versions on my i5-4670K (Haswell).

cllr 3.8.13 [GWNUM 28.5] - 1.02 ms/iter
cllr 3.8.9 [GWNUM 27.7] - 1.26 ms/iter
pfgw 3.7.7 [GWNUM 27.11] - 1.5 ms/iter
pfgw 3.5.7 [GWNUM 26.6] - 2.9 ms/iter

I wish I had been using the right program/version for this earlier! And:

36772*6^1509139-1
cllr 3.8.13 [GWNUM 28.5] - 1.66 ms/iter
PFGW 3.7.7 [GWNUM 27.11] - 13.2 ms/iter
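
To put these per-iteration timings in perspective, the sketch below converts ms/iter into a rough wall-clock time per test. It assumes (the thread does not state this) that a test needs about one modular squaring per bit of the candidate, and it ignores startup, error checking and save-file overhead. The helper names are made up for illustration.

```python
import math

def approx_bits(k, b, n):
    """Approximate bit length of k*b^n +/- 1; the +/-1 is negligible at this size."""
    return math.log2(k) + n * math.log2(b)

def estimated_hours(k, b, n, ms_per_iter):
    """Rough run time, ASSUMING about one squaring per bit of the candidate."""
    return approx_bits(k, b, n) * ms_per_iter / 1000.0 / 3600.0

def slowdown(ms_per_iter, fastest_ms_per_iter):
    """How many times slower a program is than the fastest one measured."""
    return ms_per_iter / fastest_ms_per_iter

# Timings copied from the post above (i5-4670K, Haswell).
runs = [
    ((87800, 6, 992733), "cllr 3.8.13 [GWNUM 28.5]", 1.02),
    ((87800, 6, 992733), "cllr 3.8.9 [GWNUM 27.7]", 1.26),
    ((87800, 6, 992733), "pfgw 3.7.7 [GWNUM 27.11]", 1.5),
    ((87800, 6, 992733), "pfgw 3.5.7 [GWNUM 26.6]", 2.9),
    ((36772, 6, 1509139), "cllr 3.8.13 [GWNUM 28.5]", 1.66),
    ((36772, 6, 1509139), "pfgw 3.7.7 [GWNUM 27.11]", 13.2),
]

for (k, b, n), program, ms in runs:
    # Fastest timing recorded for the same candidate.
    fastest = min(m for kbn, _, m in runs if kbn == (k, b, n))
    print(f"{k}*{b}^{n}: {program:26s} "
          f"~{estimated_hours(k, b, n, ms):5.2f} h, "
          f"{slowdown(ms, fastest):4.2f}x the fastest")
```

On these numbers, pfgw 3.7.7 takes roughly 1.5x as long as cllr 3.8.13 on the first candidate and roughly 8x as long on the second.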
Old 2014-06-11, 13:27   #15
mdettweiler

Interesting. So it seems it depends both on the range being tested and on the CPU used, and it's also much more pronounced for newer gwnum versions. That'd explain why I never noticed this before (because I could have sworn I'd done such a head-to-head comparison in the past).

In any event, I'm in the process of upgrading all my boxes to the latest LLR. Looks like LLR is the thing to use for this kind of PRPnet work at the moment.

-----------
Edit: Say, wait a minute - could this just be because the latest PFGW (3.7.7) is still on gwnum 27.11, while the latest LLR (3.8.13) is on gwnum 28.5? Given gwnum incremented by a whole major version number, I'd imagine there's got to be some significant improvements in there...even for older CPUs. (My best computer is a Sandy Bridge, so I have AVX but nothing fancier than that.)

Last fiddled with by mdettweiler on 2014-06-11 at 13:31
Old 2014-06-11, 14:24   #16
Mini-Geek
"Tim Sorbera"

Quote:
Originally Posted by mdettweiler
"Edit: Say, wait a minute - could this just be because the latest PFGW (3.7.7) is still on gwnum 27.11, while the latest LLR (3.8.13) is on gwnum 28.5? Given gwnum incremented by a whole major version number, I'd imagine there's got to be some significant improvements in there...even for older CPUs. (My best computer is a Sandy Bridge, so I have AVX but nothing fancier than that.)"

Partly, yes. I'd guess that's responsible for the difference between cllr 3.8.9 (gwnum 27) and cllr 3.8.13 (gwnum 28), for example. But pfgw 3.7.7 and cllr 3.8.9 both have gwnum 27 (pfgw even has a newer version of it), and cllr is still faster there.

IIRC, Sandy/Ivy Bridge see only small improvements (2-3%) with gwnum 28 (compare that to the ~20% improvement I measured on Haswell).
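
The ~20% figure can be checked against the Haswell timings posted earlier in the thread (cllr 3.8.9 at 1.26 ms/iter vs. cllr 3.8.13 at 1.02 ms/iter). A small sketch, with helper names invented here, showing that the answer depends slightly on whether one measures time saved or throughput gained:

```python
def time_saved_pct(old_ms, new_ms):
    """Percentage reduction in time per iteration."""
    return (old_ms - new_ms) / old_ms * 100.0

def throughput_gain_pct(old_ms, new_ms):
    """Percentage increase in iterations per second."""
    return (old_ms / new_ms - 1.0) * 100.0

# cllr 3.8.9 (gwnum 27.7) vs. cllr 3.8.13 (gwnum 28.5), timings from the thread.
old_ms, new_ms = 1.26, 1.02
print(f"time saved:      {time_saved_pct(old_ms, new_ms):.1f}%")
print(f"throughput gain: {throughput_gain_pct(old_ms, new_ms):.1f}%")
```

That works out to about 19% less time per iteration, or about 24% more iterations per second, so "~20%" is a fair summary either way.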
Old 2014-06-13, 00:58   #17
Prime95

Does llr use gwstartnextfft where pfgw does not?
Old 2014-06-13, 11:41   #18
rogue
 
"Mark"

Quote:
Originally Posted by Prime95
"Does llr use gwstartnextfft where pfgw does not?"

pfgw does not use that function. What does it do and how would I use it?
Old 2014-06-13, 12:22   #19
Prime95

gwstartnextfft affects the next gwmul, gwsquare, or gwfftmul. These operations will compute their result and perform part of the next forward FFT. Why is this better? It allows a long string of squarings or multiplies to use just 2 passes over main memory instead of 3.

The 3 memory pass case you are using now is:
1. pass1-forward-FFT
2. pass2-forward-FFT-and-pointwise-mul-and-pass2-inverse-FFT
3. pass1-inverse-FFT-carry-propagation

The 2 memory pass case is:
1. pass2-forward-FFT-and-pointwise-mul-and-pass2-inverse-FFT
2. pass1-inverse-FFT-carry-propagation-and-next-pass1-forward-FFT

Obviously, you can't use this option on your last squaring/mul or before writing a save file.

To see if this may be the cause of the speed difference, time PFGW against a PRP test in prime95 using the same gwnum library.
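
The pass counting described in this post can be illustrated with a toy model. This is not gwnum code: `plan_flags` and `memory_passes` are hypothetical helpers that only count passes over main memory, on the assumption that a squaring costs 2 passes when the previous operation already started its forward FFT and 3 passes otherwise. The flags follow the rule stated above: do not request the next FFT on the last squaring or right before a save file is written.

```python
def plan_flags(total_squarings, save_every):
    """For each squaring, decide whether to request 'start next FFT'.

    save_every = 0 means no save files are written.
    """
    flags = []
    for i in range(1, total_squarings + 1):
        is_last = (i == total_squarings)
        before_save = (save_every > 0 and i % save_every == 0)
        # The result must be left in normal form if it is the final
        # answer or is about to be written to a save file.
        flags.append(not (is_last or before_save))
    return flags

def memory_passes(flags):
    """Toy count of passes over main memory for a chain of squarings."""
    passes = 0
    input_prestarted = False  # did the previous op begin this op's forward FFT?
    for start_next in flags:
        passes += 2 if input_prestarted else 3
        input_prestarted = start_next
    return passes

total = 1000
baseline = memory_passes([False] * total)         # flag never set
no_saves = memory_passes(plan_flags(total, 0))    # flag set wherever allowed
with_saves = memory_passes(plan_flags(total, 100))
print(f"never set: {baseline} passes")
print(f"set, no save files: {no_saves} passes")
print(f"set, save every 100 squarings: {with_saves} passes")
```

In real code the flag would be toggled through gwstartnextfft before each gwsquare or gwmul; the exact call signature is not given in this thread, so it is worth checking the gwnum headers. The model also says nothing about how much wall-clock time a saved pass is worth, which depends on FFT size and memory bandwidth.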

Last fiddled with by Prime95 on 2014-06-13 at 12:23
Old 2014-06-13, 13:35   #20
rogue
 
"Mark"

This would be very helpful for PRP tests, but probably not so for primality tests. Would you mind taking a look at the PRP code for pfgw and telling me what to do?

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.