#1 |
May 2004
FRANCE
590₁₀ Posts |
Hi All,
Today I uploaded version 3.8.23 of the LLR program. You can find it now on my personal site: http://jpenne.free.fr/

The 32bit Windows and Linux compressed binaries are available as usual. The Linux 64bit binaries are released here, and also the Mac OS 32bit and 64bit binaries, which I can now build using a Mac mini running OS X 10.9.1.

I also uploaded the complete source in a compressed file; it may be used to build the 64bit Windows binaries.

In order to build the Linux binaries, I had to add -DSQLITE_OMIT_LOAD_EXTENSION to the CFLAGS in gwnum/Makefile and gwnum/make64. I also had to remove irrelevant source and binary library references in makemac and makemac64. The makefiles with .orig suffixes are the unmodified ones.

There is no new feature in this LLR version, but it is linked with version 29.8 of George Woltman's gwnum library, which includes AVX-512 code. A bug that affected tests using AVX-512 on Windows64 platforms has been fixed in this version, so I suggest avoiding the LLR 3.8.22 binaries on Windows64 machines.

As usual, I need help to build the 64bit Windows binaries.

Please inform me if you encounter any problem while using this new version.

Best Regards, Jean |
#3 |
May 2004
FRANCE
2·5·59 Posts |
Thanks to Rebirther, the Win64 binaries are now available.
Regards, Jean |
#4 |
Einyen
Dec 2003
Denmark
2·1,669 Posts |
When running a list of twin prime tests from a newpgen file, LLR runs the k*2^n-1 test first, and when it finds a prime it runs the k*2^n+1 test to check whether it is a twin.
Is there a way to reverse this and run the k*2^n+1 test first? |
#5 |
Jun 2003
19×283 Posts |
You can edit the header of the newpgen file to test the P form. But then, you'll have to manually check the minus side if you find a prime (basically take the output file which lists all the found primes, change the header to the M form and run it once more through LLR).
|
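The two-pass workflow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: it does not read a newpgen file or call LLR, the helper names `is_prime` and `two_pass_twin_search` are hypothetical, and trial division stands in for the real primality tests, so it only works for tiny values.

```python
def is_prime(n):
    """Trial division; a stand-in for LLR's tests, usable only on tiny values."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def two_pass_twin_search(candidates):
    """candidates: list of (k, n) pairs, as a sieve file would list them."""
    # Pass 1: test only the plus form k*2^n+1 for every candidate.
    plus_primes = [(k, n) for k, n in candidates if is_prime(k * 2**n + 1)]
    # Pass 2: re-test only the survivors on the minus form k*2^n-1.
    twins = [(k, n) for k, n in plus_primes if is_prime(k * 2**n - 1)]
    return plus_primes, twins


# Made-up toy candidates, chosen only so the arithmetic is easy to check by hand.
candidates = [(3, 1), (3, 2), (5, 1), (7, 1), (9, 1), (15, 2)]
plus_primes, twins = two_pass_twin_search(candidates)
print("plus-side primes:", plus_primes)
print("twins:", twins)
```

The second pass only touches candidates that survived the first, which is why the manual rerun is cheap compared with the initial run.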
#6 |
Einyen
Dec 2003
Denmark
2·1,669 Posts |
Yeah thanks, I tested that already. I was hoping there was an option in LLR to do the reverse automagically.
|
#7 | |
Dec 2011
New York, U.S.A.
97₁₀ Posts |
Can anyone explain this? This is from one of PrimeGrid's users via Discord:
Quote:
Code:
C:\ProgramData\BOINC\projects\www.primegrid.com>cllr64.3.8.21 -d -t8 -q"19249*2^13018586+1"
Starting Proth prime test of 19249*2^13018586+1
Using all-complex FMA3 FFT length 1152K, Pass1=384, Pass2=3K, 8 threads, a = 3
19249*2^13018586+1, bit: 90000 / 13018600 [0.69%]. Time per bit: 1.189 ms.
Caught signal. Terminating.
Code:
C:\ProgramData\BOINC\projects\www.primegrid.com>cllr64.3.8.23 -d -t8 -q"19249*2^13018586+1"
Starting Proth prime test of 19249*2^13018586+1
Using all-complex AVX-512 FFT length 1152K, Pass1=1152, Pass2=1K, clm=2, 8 threads, a = 3
19249*2^13018586+1, bit: 80000 / 13018600 [0.61%]. Time per bit: 1.491 ms.
Caught signal. Terminating.
How can the AVX-512 transform be that much slower than the FMA3 transform? Normally we see a substantial increase in speed using AVX-512.
Here's the 4 thread test:
Code:
C:\ProgramData\BOINC\projects\www.primegrid.com>del llr.ini
C:\ProgramData\BOINC\projects\www.primegrid.com>del z0336833
C:\ProgramData\BOINC\projects\www.primegrid.com>cllr64.3.8.23 -d -t4 -q"19249*2^13018586+1"
Starting Proth prime test of 19249*2^13018586+1
Using all-complex AVX-512 FFT length 1152K, Pass1=1152, Pass2=1K, clm=2, 4 threads, a = 3
19249*2^13018586+1, bit: 50000 / 13018600 [0.38%]. Time per bit: 2.427 ms.
Caught signal. Terminating.
C:\ProgramData\BOINC\projects\www.primegrid.com>del llr.ini
C:\ProgramData\BOINC\projects\www.primegrid.com>del z0336833
C:\ProgramData\BOINC\projects\www.primegrid.com>cllr64.3.8.21 -d -t4 -q"19249*2^13018586+1"
Starting Proth prime test of 19249*2^13018586+1
Using all-complex FMA3 FFT length 1152K, Pass1=384, Pass2=3K, 4 threads, a = 3
19249*2^13018586+1, bit: 50000 / 13018600 [0.38%]. Time per bit: 2.149 ms. |
|
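To put the quoted timings side by side, here is a small sketch that computes the relative slowdown and a projected full-test duration from the "Time per bit" figures above. The numbers are copied from the quoted runs; the function names are made up for illustration. The projection is only a rough estimate, since each run was stopped before 1% of the test had completed.

```python
TOTAL_BITS = 13018600  # from the "bit: ... / 13018600" progress lines

# Time per bit in milliseconds, keyed by (LLR version, thread count).
ms_per_bit = {
    ("3.8.21", 8): 1.189,
    ("3.8.23", 8): 1.491,
    ("3.8.21", 4): 2.149,
    ("3.8.23", 4): 2.427,
}


def slowdown_percent(threads):
    """How much slower 3.8.23 (AVX-512) is than 3.8.21 (FMA3), in percent."""
    old = ms_per_bit[("3.8.21", threads)]
    new = ms_per_bit[("3.8.23", threads)]
    return (new / old - 1.0) * 100.0


def projected_hours(version, threads):
    """Naive projection: total bits times the sampled time per bit."""
    return TOTAL_BITS * ms_per_bit[(version, threads)] / 1000.0 / 3600.0


for threads in (8, 4):
    print(
        f"{threads} threads: 3.8.23 is {slowdown_percent(threads):.1f}% slower "
        f"({projected_hours('3.8.21', threads):.2f} h vs "
        f"{projected_hours('3.8.23', threads):.2f} h projected)"
    )
```

On these samples the gap is about 25% with 8 threads and about 13% with 4 threads.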
#8 |
Jun 2007
Seattle, WA
5 Posts |
I'm the user referenced above, in case there are other questions or additional testing needed for this scenario.
|
#9 |
Feb 2016
UK
110111000₂ Posts |
I think I know why the Xeon Silver is not seeing a speed increase. According to the following link, it only has one AVX-512 unit per core. Skylake-X and higher series Xeon CPUs have two AVX-512 units.
https://ark.intel.com/content/www/us...-2-10-ghz.html

To my understanding, a single AVX-512 unit in this kind of application is no better than AVX2/FMA. It might have other advantages not used here. You need the second unit to get a performance increase. That a reduction in performance was seen might be down to the code being optimised for two units, and not working well when the second one is not present. If this is repeatable across different single-unit AVX-512 CPUs, then a change in FFT selection might be a good idea.

I don't know what the command would be, but if you force 3.8.23 not to use AVX-512, do you then get the same performance as 3.8.21? |
#10 |
Dec 2011
New York, U.S.A.
97 Posts |
#11 |
Einyen
Dec 2003
Denmark
2×1,669 Posts |
You can probably add:
CpuSupportsAVX512F=0
to LLR.ini, or use -oCpuSupportsAVX512F=0 on the command line.
I cannot test this since I do not have an AVX-512 CPU, but for me using -oCpuSupportsFMA3=0 turns off AVX2 and uses AVX instead. |
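A minimal sketch of how the two runs could be set up for comparison, assuming the suggested option behaves as described (the poster could not test it). The binary name and the -d, -t and -q flags are taken from the commands quoted earlier in the thread; `build_llr_command` is a hypothetical helper, and the script only prints the command lines rather than running them.

```python
def build_llr_command(binary, expression, threads, disable_avx512=False):
    """Return the argument list for one LLR run (nothing is executed here)."""
    args = [binary, "-d", f"-t{threads}", f"-q{expression}"]
    if disable_avx512:
        # Option name as suggested in the post above; untested there.
        args.append("-oCpuSupportsAVX512F=0")
    return args


EXPRESSION = "19249*2^13018586+1"
baseline = build_llr_command("cllr64.3.8.23", EXPRESSION, threads=8)
no_avx512 = build_llr_command(
    "cllr64.3.8.23", EXPRESSION, threads=8, disable_avx512=True
)

print(" ".join(baseline))
print(" ".join(no_avx512))
```

If the option works as hoped, the "Using ... FFT length" line of the second run should report an FMA3 transform instead of AVX-512, and the time per bit can then be compared with the 3.8.21 figures.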
Thread | Thread Starter | Forum | Replies | Last Post |
LLR Version 3.8.21 Released | Jean Penné | Software | 26 | 2019-07-08 16:54 |
LLR Version 3.8.22 released | Jean Penné | Software | 51 | 2019-04-10 06:04 |
LLR Version 3.8.20 released | Jean Penné | Software | 30 | 2018-08-13 20:00 |
LLR Version 3.8.19 released | Jean Penné | Software | 11 | 2017-02-23 08:52 |
llr 3.8.2 released as dev-version | opyrt | Prime Sierpinski Project | 11 | 2010-11-18 18:24 |