mersenneforum.org  

2019-04-10, 15:44   #1
Jean Penné
May 2004
FRANCE
1060₈ Posts
LLR Version 3.8.23 released

Hi All,

Today I uploaded version 3.8.23 of the LLR program.
You can find it on my personal site:

http://jpenne.free.fr/

The 32-bit Windows and Linux compressed binaries are available as usual.
The 64-bit Linux binaries are also released here, as are the 32-bit and 64-bit Mac OS binaries, which I can now build on a Mac mini running OS X 10.9.1.
I also uploaded the complete source as a compressed file; it can be used to build the 64-bit Windows binaries.
To build the Linux binaries, I had to add -DSQLITE_OMIT_LOAD_EXTENSION to the CFLAGS in gwnum/Makefile and gwnum/make64.
I also had to remove irrelevant source and binary library references from makemac and makemac64. The makefiles with the .orig suffix are the unmodified ones.

This LLR version has no new features, but it is linked with version 29.8 of George Woltman's gwnum library, which includes AVX-512 code.
A bug that affected tests using AVX-512 on Windows64 platforms has been fixed in this version.
I therefore suggest no longer using the LLR 3.8.22 binaries on Windows64 machines.

As usual, I need help building the 64-bit Windows binaries.
Please let me know if you encounter any problems while using this new version.
Best Regards,
Jean
2019-04-10, 19:08   #2
rebirther
Sep 2011
Germany
2,423 Posts

as always download (console and GUI)


@Jean: sent the files

Last fiddled with by rebirther on 2019-04-10 at 19:08
2019-04-10, 20:34   #3
Jean Penné
May 2004
FRANCE
2⁴×5×7 Posts
Win64 binaries

Thanks to Rebirther, the Win64 binaries are now available.

Regards,
Jean
2019-04-18, 05:58   #4
ATH
Einyen
Dec 2003
Denmark
2×5×17² Posts

When running a list of twin prime tests from a NewPGen file, LLR runs the k*2^n-1 test first, and when it finds a prime it runs the k*2^n+1 test to check whether it is a twin.

Is there a way to reverse this, running k*2^n+1 first?
2019-04-18, 06:14   #5
axn
Jun 2003
4669₁₀ Posts

Quote:
Originally Posted by ATH View Post
When running a list of twin prime tests from a NewPGen file, LLR runs the k*2^n-1 test first, and when it finds a prime it runs the k*2^n+1 test to check whether it is a twin.

Is there a way to reverse this, running k*2^n+1 first?
You can edit the header of the newpgen file to test the P form. But then, you'll have to manually check the minus side if you find a prime (basically take the output file which lists all the found primes, change the header to the M form and run it once more through LLR).
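The header swap described above is easy to script. Here is a minimal sketch, assuming only that the header is the first line of the file and that every following line is a candidate to keep unchanged. The function name `swap_header` and the file names in the comment are hypothetical, and the header text itself is not spelled out here: copy it from a file that NewPGen wrote for the form you want.

```python
from pathlib import Path


def swap_header(src, dst, new_header):
    """Copy a sieve or result file, replacing only its first line."""
    lines = Path(src).read_text().splitlines()
    if not lines:
        raise ValueError(f"{src} is empty, so there is no header to replace")
    # Line 0 is the header; all remaining lines are kept exactly as they were.
    Path(dst).write_text("\n".join([new_header] + lines[1:]) + "\n")


# Hypothetical usage: m_header is the first line of a NewPGen file
# that was sieved for the minus form.
# swap_header("primes_plus.txt", "recheck_minus.txt", m_header)
```

The output file can then be run through LLR again to check the other side of each prime found.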
2019-04-18, 13:28   #6
ATH
Einyen
Dec 2003
Denmark
5512₈ Posts

Yeah thanks, I tested that already. I was hoping there was an option in LLR to do the reverse automagically.
2019-05-22, 20:54   #7
AG5BPilot
Dec 2011
New York, U.S.A.
97 Posts

Can anyone explain this? This is from one of PrimeGrid's users via Discord:

Quote:
So testing 3.8.21 v. 3.8.23 on my AVX-512 capable system is showing AVX-512 slower, at least on a manual SoB test.
CPU: Xeon Silver 4110
Code:
C:\ProgramData\BOINC\projects\www.primegrid.com>cllr64.3.8.21 -d -t8 -q"19249*2^13018586+1"
Starting Proth prime test of 19249*2^13018586+1
Using all-complex FMA3 FFT length 1152K, Pass1=384, Pass2=3K, 8 threads, a = 3
19249*2^13018586+1, bit: 90000 / 13018600 [0.69%].  Time per bit: 1.189 ms.
 Caught signal.  Terminating.
Code:
C:\ProgramData\BOINC\projects\www.primegrid.com>cllr64.3.8.23 -d -t8 -q"19249*2^13018586+1"
Starting Proth prime test of 19249*2^13018586+1
Using all-complex AVX-512 FFT length 1152K, Pass1=1152, Pass2=1K, clm=2, 8 threads, a = 3
19249*2^13018586+1, bit: 80000 / 13018600 [0.61%].  Time per bit: 1.491 ms.
 Caught signal.  Terminating.
This is an 8c/16t Xeon. We also tried it with a -t4 test. Same results.

How can the AVX-512 transform be that much slower than the FMA3 transform?

Normally we see a substantial increase in speed using AVX-512.

Here's the 4 thread test:

Code:
C:\ProgramData\BOINC\projects\www.primegrid.com>del llr.ini

C:\ProgramData\BOINC\projects\www.primegrid.com>del z0336833

C:\ProgramData\BOINC\projects\www.primegrid.com>cllr64.3.8.23 -d -t4 -q"19249*2^13018586+1"
Starting Proth prime test of 19249*2^13018586+1
Using all-complex AVX-512 FFT length 1152K, Pass1=1152, Pass2=1K, clm=2, 4 threads, a = 3
19249*2^13018586+1, bit: 50000 / 13018600 [0.38%].  Time per bit: 2.427 ms.
 Caught signal.  Terminating.

C:\ProgramData\BOINC\projects\www.primegrid.com>del llr.ini

C:\ProgramData\BOINC\projects\www.primegrid.com>del z0336833

C:\ProgramData\BOINC\projects\www.primegrid.com>cllr64.3.8.21 -d -t4 -q"19249*2^13018586+1"
Starting Proth prime test of 19249*2^13018586+1
Using all-complex FMA3 FFT length 1152K, Pass1=384, Pass2=3K, 4 threads, a = 3
19249*2^13018586+1, bit: 50000 / 13018600 [0.38%].  Time per bit: 2.149 ms.
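
To put numbers on the gap, here is a small sketch that parses the "Time per bit" lines from the logs above and computes the slowdown ratio and a projected full-test time. The projection assumes the time per bit stays constant for the whole test, which is only an approximation; the names `parse`, `slowdown` and `projected_hours` are made up for this example.

```python
import re

# Progress lines copied from the four runs above, keyed by (LLR version, threads).
LOGS = {
    ("3.8.21", 8): "19249*2^13018586+1, bit: 90000 / 13018600 [0.69%].  Time per bit: 1.189 ms.",
    ("3.8.23", 8): "19249*2^13018586+1, bit: 80000 / 13018600 [0.61%].  Time per bit: 1.491 ms.",
    ("3.8.23", 4): "19249*2^13018586+1, bit: 50000 / 13018600 [0.38%].  Time per bit: 2.427 ms.",
    ("3.8.21", 4): "19249*2^13018586+1, bit: 50000 / 13018600 [0.38%].  Time per bit: 2.149 ms.",
}


def parse(line):
    """Return (total_bits, ms_per_bit) from one LLR progress line."""
    m = re.search(r"bit: \d+ / (\d+) .*Time per bit: ([\d.]+) ms", line)
    if m is None:
        raise ValueError("not an LLR progress line")
    return int(m.group(1)), float(m.group(2))


def slowdown(threads):
    """Ratio of 3.8.23 (AVX-512) to 3.8.21 (FMA3) time per bit."""
    _, old = parse(LOGS[("3.8.21", threads)])
    _, new = parse(LOGS[("3.8.23", threads)])
    return new / old


def projected_hours(version, threads):
    """Full-test time if the measured time per bit held throughout."""
    bits, ms = parse(LOGS[(version, threads)])
    return bits * ms / 1000 / 3600


for t in (8, 4):
    print(f"{t} threads: 3.8.23 takes {slowdown(t):.3f}x the time per bit of 3.8.21")
```

On these figures the AVX-512 build is roughly 25% slower at 8 threads and roughly 13% slower at 4 threads.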
2019-05-22, 21:01   #8
Darkclown
Jun 2007
Seattle, WA
1012 Posts

I'm the user referenced above, in case there are other questions or additional testing needed for this scenario.
2019-05-22, 21:36   #9
mackerel
Feb 2016
UK
2×191 Posts

I think I know why the Xeon Silver is not seeing a speed increase. According to the following link, it has only one AVX-512 unit per core. Skylake-X and higher-series Xeon CPUs have two AVX-512 units.
https://ark.intel.com/content/www/us...-2-10-ghz.html

To my understanding, a single unit AVX-512 in this kind of application is no better than AVX2/FMA. It might have other advantages not used here. You need the 2nd unit to have a performance increase.
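
A back-of-the-envelope sketch of that argument, counting peak double-precision operations per cycle per core. It assumes the AVX2/FMA path has two 256-bit FMA units per core, which is my assumption for this CPU generation and not something stated in this thread, and it ignores clock speed, memory bandwidth and FFT layout, all of which affect real throughput. The function name is made up for the example.

```python
def peak_dp_flops_per_cycle(fma_units, vector_bits):
    """Theoretical peak double-precision flops per cycle for one core."""
    lanes = vector_bits // 64     # 64-bit doubles per vector register
    return fma_units * lanes * 2  # one FMA counts as a multiply plus an add


avx2_fma = peak_dp_flops_per_cycle(fma_units=2, vector_bits=256)       # assumed: two 256-bit units
avx512_single = peak_dp_flops_per_cycle(fma_units=1, vector_bits=512)  # one AVX-512 unit
avx512_dual = peak_dp_flops_per_cycle(fma_units=2, vector_bits=512)    # two AVX-512 units

print(avx2_fma, avx512_single, avx512_dual)
```

Under these assumptions a single AVX-512 unit has the same theoretical peak as the AVX2/FMA path, and only the dual-unit parts double it.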

That a reduction in performance was seen might be down to the code being optimised for two units, and not working well when not present. If this is repeatable across different single unit AVX-512 CPUs, then a change in FFT selection might be a good idea.

I don't know what the command would be, but if you force 3.8.23 not to use AVX-512, do you then get the same performance as 3.8.21?
2019-05-22, 21:40   #10
AG5BPilot
Dec 2011
New York, U.S.A.
97 Posts

Quote:
Originally Posted by mackerel View Post
I don't know what the command would be, but if you force 3.8.23 not to use AVX-512, do you then get the same performance as 3.8.21?
The P95 flag CpuSupportsAVX512F=0 doesn't seem to exist in the LLR source. I don't think LLR has the ability to turn off AVX-512.
2019-05-22, 22:26   #11
ATH
Einyen
Dec 2003
Denmark
B4A₁₆ Posts

You can probably add:
CpuSupportsAVX512F=0

to LLR.ini, or use -oCpuSupportsAVX512F=0 on the command line.

I cannot test this since I do not have an AVX512 cpu, but for me using -oCpuSupportsFMA3=0 turns off AVX2 and uses AVX instead.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.