#1 |
May 2004
FRANCE
615₁₀ Posts |
Hi All,
I uploaded version 4.0.0 of the LLR program today. You can now find it on my personal site: http://jpenne.free.fr/

The 32bit Windows and Linux compressed binaries are available as usual. The Linux 64bit binaries are released here, and also the Mac OS 64bit binaries. The Mac OS 32bit binary is not released here because I do not have the 32bit hwloc library it needs, and I could not build it on my Mac mini... I also uploaded the complete source in a compressed file; it may be used to build the 64bit Windows binaries.

What is new in this version:

It is linked with the latest version, 30.6, of George Woltman's gwnum library. There are no really new features since 3.8.24, but some improvements related to reliability and speed.

I now avoid the giants functions invg() and gcdg(), which are slow and seem not to be very reliable. To do that, I am using the gtompz() and mpztog() conversion functions.

Also, I replaced everywhere the gwnum squaring and multiplication functions gwsquare() and gwmul() with their new forms: gwsquare2(), gwmul3(), gwmul3_carefully(), etc.

As usual, I need help to build the 64bit Windows binaries. I also uploaded the GNU gmp6.1.0 compressed source I used with 32bit VC6.0; I hope it can be used to build this library on Windows 64bit and link it with LLR...

Please inform me if you encounter any problem while using this new version.

Best Regards,
Jean |
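For readers unfamiliar with the two routines being avoided: going by their names, invg() computes a modular inverse and gcdg() a greatest common divisor. The post does not say which GMP calls replace them after the gtompz() conversion, so the following is only a sketch of the two operations themselves, with Python's built-in integers standing in for GMP's mpz type. The helper names `mod_inverse` and `shares_factor` are hypothetical and are not part of LLR, giants or GMP.

```python
import math

def mod_inverse(x: int, n: int) -> int:
    """Return y such that (x * y) % n == 1.

    Raises ValueError when x and n are not coprime (no inverse exists).
    Uses the three-argument pow() with exponent -1 (Python 3.8+).
    """
    return pow(x, -1, n)

def shares_factor(x: int, n: int) -> bool:
    """True when gcd(x, n) > 1, i.e. x reveals a nontrivial common factor with n."""
    return math.gcd(x, n) != 1

if __name__ == "__main__":
    n = 2**61 - 1          # a small Mersenne prime, used only as an illustrative modulus
    x = 12345
    y = mod_inverse(x, n)
    print((x * y) % n)     # the defining property of the inverse: this is 1
    print(shares_factor(6, 9))
```

With arbitrary-precision integers the same two calls work unchanged on much larger operands, which is the role the mpz type plays on the C side.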
#2 |
"Alexander"
Nov 2008
The Alamo City
3×307 Posts |
Thanks for the long-needed major version bump!
Can someone actually post a 64-bit Windows build this time? I've been running the 32-bit cllr 3.8.24 on my old laptop since it's not well-suited to run Visual Studio. |
#3 | |
Mar 2019
3×109 Posts |
Thanks for the new version!
Quote:
In some limited testing, I have not observed a speedup. |
#4 |
Sep 2006
The Netherlands
2×13×31 Posts |
Can someone verify something for me by running LLR on a Xeon v4 system, or on others? (edit: a system with AVX)

I have built a new dual Xeon E5-2699 v4 system, though with ES (engineering sample) chips. It has 44 cores in total and I'm very happy with it. It runs on all 44 cores without hyperthreading and clocks itself to 2.0GHz under this full load (watercooled; all cores are around 50C under full load, highest 51C).

My old system has 8 Xeon L5420 cores at 2.5GHz. No turboboost, obviously.

In theory the E5 v4 Xeon delivers 32 flops fp64 per clock per core. In theory the Core 2 based Xeon L5420 delivers 4 flops fp64 per clock. So a factor 8 difference per clock.

Yet at small bitsizes (1mbit) I measure only a factor 3.13 here, and at larger bitsizes I measure that going up to a factor 3.6 per clock in favour of the Xeon E5. That should be closer to a factor 8, however.

Now I do not know what causes this. It is as if AVX doesn't work, or maybe something else.

Can someone try to test this on a box, single core, with sllr64 and see which timings he gets? Much appreciated. The timings do not need to be very accurate; a few seconds more or less is no big deal. The question is whether I am missing a factor 2 in performance somewhere.

Here is the file 'try.txt':

314000000000000000:M:0:2:258
9473 1024968
9473 1025002
9473 1025220
9473 1025338
9473 1025602
9473 1025724

llr.ini:

WorkDone=0
Work=0
PgenInputFile=try.txt
PgenOutputFile=resbench4
PgenLine=1
HeaderLine=0
Pid=10817
OldCpuSpeed=2500
NewCpuSpeedCount=0
NewCpuSpeed=0
PRPGerbiczCompareIntervalAdj=1

On the Xeon E5 the timing is 511.x or 512.x seconds for each of the above exponents, tested single core (other 43 cores busy), at 2.0GHz, no hyperthreading. On the old Xeon L5420s, where this is running now, the first timing is 1280 seconds.

Yet whether it's a factor 3.13 (compensated for clock speed) or 3.6 (larger bitsizes), I miss a factor 2 there.

Many thanks,
Vincent

Last fiddled with by diep on 2021-10-15 at 23:30 |
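The "factor 3.13, compensated for clock speed" figure can be reproduced from the numbers in the post. The sketch below assumes that run time multiplied by clock frequency is a fair proxy for cycles spent, and takes the 32 and 4 flops per clock figures exactly as stated in the post rather than as verified hardware specifications. The function name `per_clock_speedup` is made up for this illustration.

```python
def per_clock_speedup(old_seconds, old_ghz, new_seconds, new_ghz):
    """Ratio of clock cycles consumed: (old time * old clock) / (new time * new clock)."""
    return (old_seconds * old_ghz) / (new_seconds * new_ghz)

# Figures quoted in the post: 1280 s at 2.5 GHz (L5420), about 512 s at 2.0 GHz (E5 v4).
measured = per_clock_speedup(1280, 2.5, 512, 2.0)

# Theoretical ratio as claimed in the post: 32 vs 4 fp64 flops per clock.
theoretical = 32 / 4

print(f"measured per-clock speedup: {measured:.3f}")            # 3.125
print(f"theoretical per-clock ratio: {theoretical:.0f}")         # 8
print(f"shortfall factor: {theoretical / measured:.2f}")         # 2.56
```

The shortfall of roughly 2.5 is the "missing factor 2" the post asks about.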
#5 |
Jun 2003
153E₁₆ Posts |
#6 | |
Sep 2006
The Netherlands
2·13·31 Posts |
Quote:

(edit: even if, for some CPUs, hardware architects have proven that the instruction stream cannot decode enough instructions per clock to reach the manufacturer's figure, the manufacturers in question still kept defining things by the above model. So the definition doesn't mean it is theoretically possible for 16 fp64 instructions to get executed, which in reality, as we know, is 4 AVX instructions per clock when we look at it from the perspective of a single double.)

Haswell and Broadwell cores are therefore given as 32 flops per clock by definition.

For Core 2 based Xeon cores with off-chip memory, I can find 4 flops fp64 per clock given. That doesn't necessarily mean this is the case with the L5420s I have been running here for the past 10 years. It would mean a core can execute up to 2 instructions per clock (edit: with an SSE2 instruction already counted as 2 here, as it operates on 2 doubles, so we see it from the perspective of instructions on a double type).

With multiplication the situation is complicated, though, and this may never be achieved, as the throughput latency of fp64 multiplication SSE2/SSSE instructions is most definitely more than 1 clock. The initial Nehalem i7 core also needs more than 1 clock of throughput latency for multiplication, whereas later cores can do that in 1 clock. At least, this is the situation as I understand it.

Now, what CPUs achieve in practice is a totally different reality, of course. Here the question is what timing you get, and whether that's significantly faster than what I get here.

These engineering sample CPUs I got might, for example, not identify themselves correctly to the software that runs on them, just to mention something.

Last fiddled with by diep on 2021-10-16 at 06:38 |
#7 |
Sep 2006
The Netherlands
1446₈ Posts |
Timings I have on a single core of the L5420s are:

9473*2^1024968-1 is not prime. RES64: 20E2F460750CC947. Time : 1280.557 sec.
9473*2^1025002-1 is not prime. RES64: B76C5E01AC969C27. Time : 1233.439 sec.
9473*2^1025220-1 is not prime. RES64: 4207BF5915F3B797. Time : 1191.275 sec.
9473*2^1025338-1 is not prime. RES64: 75EB97511F4070E6. Time : 1241.910 sec.
9473*2^1025602-1 is not prime. RES64: C8255E7C578A5C36. Time : 1183.743 sec.
9473*2^1025724-1 is not prime. RES64: 2AD235954071BD7F. Time : 1209.558 sec.

That's at 2.5GHz. The wildly varying timings are something I have seen for 10 years already on those Xeons; this box is connected to the internet. Boxes not connected to the internet show less variation.

Now the Xeon E5 v4 ES chips here, which run under full load at 2.0GHz: 511-512 seconds, very consistently (not a single timing different).

Are my Xeon E5s achieving the same speed as others get here with Broadwell/Haswell core type CPUs, or do I lose a factor 2 somewhere? |
#8 |
Apr 2013
Durham, UK
67 Posts |
For comparison I ran your tests on a slightly older E5-2630 v3 using LLR 4.0.0:

9473*2^1024968-1 is not prime. RES64: 20E2F460750CC947. Time : 415.186 sec.
9473*2^1025002-1 is not prime. RES64: B76C5E01AC969C27. Time : 381.239 sec.
9473*2^1025220-1 is not prime. RES64: 4207BF5915F3B797. Time : 382.592 sec.
9473*2^1025338-1 is not prime. RES64: 75EB97511F4070E6. Time : 382.597 sec.
9473*2^1025602-1 is not prime. RES64: C8255E7C578A5C36. Time : 383.175 sec.
9473*2^1025724-1 is not prime. RES64: 2AD235954071BD7F. Time : 383.757 sec.

The machine was busy with something else during the first job. |
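When comparing runs like these across machines, it helps to pull the candidate, residue and time out of each result line mechanically. The sketch below is one way to do that; the regular expression is written only against the "is not prime" line format visible in this thread and is an assumption, not a description of every line LLR can emit. The names `parse_results` and `mean_excluding_first` are invented for the example.

```python
import re
from statistics import mean

# Matches lines such as:
# 9473*2^1024968-1 is not prime. RES64: 20E2F460750CC947. Time : 415.186 sec.
LINE = re.compile(
    r"^(?P<cand>\S+) is not prime\.\s+"
    r"RES64: (?P<res>[0-9A-F]{16})\.\s+"
    r"Time : (?P<sec>\d+(?:\.\d+)?) sec\.$"
)

def parse_results(text):
    """Return a list of (candidate, residue, seconds) for every matching line."""
    rows = []
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            rows.append((m["cand"], m["res"], float(m["sec"])))
    return rows

def mean_excluding_first(rows):
    """Average time, skipping the first job (reported as disturbed by other load)."""
    return mean(sec for _, _, sec in rows[1:])

E5_2630_V3 = """\
9473*2^1024968-1 is not prime. RES64: 20E2F460750CC947. Time : 415.186 sec.
9473*2^1025002-1 is not prime. RES64: B76C5E01AC969C27. Time : 381.239 sec.
9473*2^1025220-1 is not prime. RES64: 4207BF5915F3B797. Time : 382.592 sec.
9473*2^1025338-1 is not prime. RES64: 75EB97511F4070E6. Time : 382.597 sec.
9473*2^1025602-1 is not prime. RES64: C8255E7C578A5C36. Time : 383.175 sec.
9473*2^1025724-1 is not prime. RES64: 2AD235954071BD7F. Time : 383.757 sec.
"""

if __name__ == "__main__":
    rows = parse_results(E5_2630_V3)
    print(len(rows), "results parsed")
    print(f"mean time without the first job: {mean_excluding_first(rows):.3f} s")
```

Parsing the residues as well makes it easy to confirm that two machines agree on RES64 for the same candidate before comparing their speeds.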
#9 | |
Jun 2003
1010100111110₂ Posts |
Quote:

EDIT: Probably part of the explanation is that you did not have all 8 cores fully occupied. Can you run 8 tests in parallel and report the timing?

Last fiddled with by axn on 2021-10-16 at 13:26 |
#10 | |
Sep 2006
The Netherlands
1100100110₂ Posts |
Quote:

I assume it turboboosted to 3.2GHz running this 1 core load. If I extrapolate the time: 383s * 3.2GHz / 2GHz = 612.8 seconds, and I had 612 seconds as a timing. So that's exactly the same! |
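The extrapolation in this reply is a plain inverse-proportional scaling of run time with core clock. A minimal sketch, assuming run time scales exactly with clock (which ignores memory bandwidth and cache effects) and taking the 3.2GHz turbo figure as the poster's own assumption; `scale_time_to_clock` is a hypothetical helper name:

```python
def scale_time_to_clock(seconds, from_ghz, to_ghz):
    """Estimate run time at to_ghz from a measurement at from_ghz,
    assuming time is inversely proportional to clock frequency."""
    return seconds * from_ghz / to_ghz

# 383 s measured at an assumed 3.2 GHz turbo clock, rescaled to 2.0 GHz.
estimate = scale_time_to_clock(383, 3.2, 2.0)
print(f"estimated time at 2.0 GHz: {estimate:.1f} s")  # 612.8
```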
#11 | |
Sep 2006
The Netherlands
2×13×31 Posts |
Quote:

383 * 3.2 / 2 = 612 seconds

Time to try LLR2. |