mersenneforum.org  

2021-10-02, 21:30   #1
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
3×13×197 Posts
Prime95 30.7

Prime95 version 30.7 build 9 is available.

P-1/P+1/ECM users should consider upgrading to help with testing. Intel Alder Lake users definitely need to upgrade to iron out any issues. Win11 users should also consider upgrading to test for affinity issues. First-time PRP users can consider upgrading for the P-1 stage 2 speed boost.

WARNING: If you upgrade in the middle of P-1/P+1/ECM stage 2, then all your stage 2 work will be lost -- stage 2 starts from scratch.

From whatsnew.txt:

Code:
1) Better prime pairing in stage 2 of ECM/P-1/P+1.  This usually results in slightly better
   stage 2 timings or less memory used.  Save file formats changed - upgrading to 30.7 while
   ECM/P-1/P+1 work is in stage 2 will result in stage 2 being restarted from scratch.
2) P-1 converted to use P+1 style stage 2.  From the user's perspective there is no difference.
   Internally a modular inverse is required at stage 2 init, but there is one multiplication
   saved for every D-block processed.  For all common P-1 cases, this is a little faster.
3) ECM/P-1/P+1 no longer use a bit map for prime pairs.  Instead a compressed pairing map is
   created to save memory.  For large B2 values this also results in fewer calls to generate
   pairing maps.  It also makes stage 2 save files smaller.
4) Some minor changes in AVX-512 FFT crossovers.  ECM/P-1/P+1 all changed to roll back to the
   last save file and switch to a larger FFT size should an excessive roundoff error be
   encountered.
5) Support for asymmetric processor architectures such as Intel's Alder Lake.
6) Torture test dialog now asks for number of cores to test along with a "Use hyperthreading"
   checkbox.  Previously, the dialog box asked for total number of torture threads to execute.
7) Versions 30.4/30.5/30.6 were underestimating the cost of P-1 stage 2 relative to P-1 stage 1.
   Expect this version to use lower stage 2 bounds in P-1.
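For readers new to the "prime pairing" mentioned in items 1 and 3, here is a minimal Python sketch of the textbook idea only. It assumes the classic scheme in which one stage 2 operation at a multiple m*D with offset r covers both m*D - r and m*D + r, so two primes that share the same m and r cost one operation instead of two. The function names, the bounds, and the definition of "pair %" (share of stage 2 primes that end up in a pair) are assumptions for illustration. Prime95's actual pairing algorithm is more elaborate and is not described in this thread.

```python
def primes_in_range(lo, hi):
    """Return the primes p with lo < p <= hi using a simple sieve."""
    sieve = bytearray([1]) * (hi + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(hi ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, hi + 1, i)))
    return [p for p in range(lo + 1, hi + 1) if sieve[p]]


def naive_pairing(b1, b2, d):
    """Pair stage 2 primes p = m*d + r with their mirror image m*d - r.

    Assumes d is even and b1 >= d, so no prime in (b1, b2] is a multiple
    of d or sits exactly halfway between two multiples of d.
    Returns (total_primes, pairs, singles, pair_percent).
    """
    primes = primes_in_range(b1, b2)
    prime_set = set(primes)
    pairs = 0
    singles = 0
    for p in primes:
        m = (p + d // 2) // d      # index of the nearest multiple of d
        r = p - m * d              # signed offset from that multiple
        partner = m * d - r        # mirror image on the other side
        if partner in prime_set:
            if p < partner:        # count each pair once
                pairs += 1
        else:
            singles += 1
    pair_percent = 100.0 * 2 * pairs / len(primes)
    return len(primes), pairs, singles, pair_percent


# Hypothetical small bounds, chosen only to keep the run fast.
total, pairs, singles, pct = naive_pairing(1000, 100000, 210)
print(f"{total} primes, {pairs} pairs, {singles} singles, pair% = {pct:.2f}")
print(f"operations needed: {pairs + singles} instead of {total}")
```

This naive pairing leaves many primes unpaired. The changes in 30.7 are about doing better than this, using less memory for the pairing map, or both.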
Download links:
Windows 64-bit: https://mersenne.org/ftp_root/gimps/p95v307b9.win64.zip
Linux 64-bit: https://mersenne.org/ftp_root/gimps/...linux64.tar.gz
FreeBSD 64-bit: https://mersenne.org/ftp_root/gimps/...SD11-64.tar.gz
Windows 32-bit: https://mersenne.org/ftp_root/gimps/p95v307b9.win32.zip
Linux 32-bit: https://mersenne.org/ftp_root/gimps/...linux32.tar.gz
Windows 64-bit Service: https://mersenne.org/ftp_root/gimps/...64.service.zip
Windows 32-bit Service: https://mersenne.org/ftp_root/gimps/...32.service.zip
Source: https://mersenne.org/ftp_root/gimps/...7b9.source.zip
Please report any bugs you may find by email or posting in this thread.

Last fiddled with by Prime95 on 2021-11-15 at 06:12

2021-10-02, 21:30   #2
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
3·13·197 Posts

1) Benchmarking broken. Fixed in build 2.
2) Most non-Mersenne FFTs broken. Fixed in build 3.
3) Hyperthreaded torture tests not setting affinity properly for small FFTs. Fixed in build 4.
4) Hyperthreaded in-place torture tests crash for small FFTs. Fixed in build 4.
5) Semi-obscure ECM crash. If an ECM curve needed modular inverses in stage 2 and a subsequent curve needed none (more memory available), then a crash occurred. Fixed in build 4.
6) Assume CERTs will complete before all other work types in computing estimated completion dates. Fixed in build 5.
7) A low-memory situation during stage 2 init of ECM could lead to a crash writing a save file. Fixed in build 5.
8) During stage 2 init, checking for a restart due to a reduction in available memory was infrequent. Fixed in build 5 - might reduce chance of an out-of-memory event.
9) Options/Benchmark tries to run a hyperthreaded benchmark on non-hyperthreaded CPUs. Fixed in build 6.
10) Another possible crash bug in stage 2 init when memory settings change. Fixed in build 6.
11) ECM sometimes generated excessive roundoff error, usually at startup, which then forced using a larger FFT size than necessary. Fixed in build 6.
12) On stage 2 restart due to more memory now available, stage 2 % complete was erroneously reported to be 100%. Fixed in build 8.
13) On stage 2 restart due to less memory being available, stage 2 might restart from scratch. Fixed in build 8.
14) Rare radix conversion excessive roundoff error affecting PRP of non-base-2 numbers. Fixed in build 8.
15) Trial factoring crashes. Fixed in build 9.

Last fiddled with by Prime95 on 2021-11-13 at 04:25

2021-10-02, 21:30   #3
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
3·13·197 Posts

How you can help:

1) Help fine-tune the P-1 stage 1 vs stage 2 cost function. In preferences, set output iterations low -- like 10000. Report the typical P-1 stage 1 timings vs. typical stage 2 timings as well as minimal architectural info. Example for one of my machines:
Code:
Skylake CPU. FMA FFT, 106M exponent:  stage 1 = 83.9 sec, stage 2 = 129 sec.
The optimal P-1 bounds depend on the stage 2 to stage 1 timing ratio. I'm seeing stage 2 anywhere from 30-50% slower.
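
If you want to report the ratio along with the raw timings, a minimal Python sketch of the arithmetic follows. The function name is made up for illustration. The two inputs are the "Time:" values Prime95 prints for stage 1 and stage 2 at the same output interval, here taken from the Skylake example above.

```python
def stage2_to_stage1_ratio(stage1_sec, stage2_sec):
    """Return stage 2 time divided by stage 1 time for the same iteration count."""
    if stage1_sec <= 0:
        raise ValueError("stage 1 time must be positive")
    return stage2_sec / stage1_sec


# Timings from the Skylake / 106M exponent example.
ratio = stage2_to_stage1_ratio(83.9, 129.0)
print(f"stage 2 / stage 1 = {ratio:.2f}")
print(f"stage 2 is {(ratio - 1) * 100:.0f}% slower than stage 1")
```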

2) Alder Lake and Win11 -- verify CPU affinities make sense and are working as expected. Add to prime.txt:
Code:
AffinityVerbosity=2
AffinityVerbosityTorture=2
AffinityVerbosityTime=2
AffinityVerbosityBench=2
Run regular work, torture test, benchmarks, and even advanced/time. Prime95 should prefer assigning work to the performance cores.

Make sure the CPU affinities output to each worker window make sense. Bring up Task Manager to verify that the work is being done on the cores Prime95 assigned to each worker.

Try running on a subset of cores. For example, 1 worker running on 2 hyperthreaded cores -- do the 4 threads in fact run on only 2 performance cores according to task manager?
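
For mprime on Linux there is no Task Manager, so here is a rough Python sketch of one way to inspect thread affinities there. It assumes a Linux system with /proc mounted and uses `os.sched_getaffinity`, which is Linux-only. The helper names are hypothetical, and which logical CPU numbers correspond to performance cores is machine-specific, so the "allowed" set is something you have to work out for your own box (for example from `lscpu`).

```python
import os
from typing import Dict, Set


def thread_affinities(pid: int) -> Dict[int, Set[int]]:
    """Map each thread ID of process `pid` to the logical CPUs it may run on (Linux only)."""
    tids = [int(name) for name in os.listdir(f"/proc/{pid}/task")]
    return {tid: set(os.sched_getaffinity(tid)) for tid in tids}


def threads_outside(affinities: Dict[int, Set[int]], allowed: Set[int]) -> Dict[int, Set[int]]:
    """Return the threads whose affinity mask includes any CPU outside `allowed`."""
    return {tid: cpus for tid, cpus in affinities.items() if not cpus <= allowed}


# Demo on this Python process itself; substitute the PID of mprime to check a real run.
if hasattr(os, "sched_getaffinity") and os.path.isdir("/proc/self/task"):
    own = thread_affinities(os.getpid())
    for tid, cpus in sorted(own.items()):
        print(f"thread {tid}: CPUs {sorted(cpus)}")
```

Note that this shows where threads are allowed to run, not where they are running at this instant. Threads that mprime has not pinned will simply report every CPU.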

Last fiddled with by Prime95 on 2021-10-02 at 21:52

2021-10-02, 22:59   #4
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2^2×1,481 Posts

I invite any adventurous Alder Lake owner to try it on both Win11 and WSL Ubuntu. And native Linux too if you've got dual-boot in place.

2021-10-02, 23:51   #5
chalsall
If I May
"Chris Halsall"
Sep 2002
Barbados
2×5,023 Posts

Quote:
Originally Posted by kriesel
And native Linux too if you've got dual-boot in place.

Are those who have Linux as primary boot welcome as well?

2021-10-03, 00:08   #6
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2^2·1,481 Posts

Quote:
Originally Posted by chalsall
Are those who have Linux as primary boot welcome as well?

Sure. There's a little advantage to Win, WSL, and Lin on identical hardware for a 3-way comparison on performance and proper core handling, but I don't think there's a capacity limit at this party. IIRC Windows requires primary partition, Linux doesn't.

Last fiddled with by kriesel on 2021-10-03 at 00:10

2021-10-03, 00:29   #7
chalsall
If I May
"Chris Halsall"
Sep 2002
Barbados
10011100111110_2 Posts

Quote:
Originally Posted by kriesel
Windows requires primary partition, Linux doesn't.

You support my argument, sir...

Micro$oft doesn't "play well with others". Some have learnt to stop playing the game with MicroCrap, and have gone "all in" with Linux as the primary OS.

Particularly, being tricked into thinking running virtual environments under WinBlows 10 (now being forced to WinCrows 11) simulating Linux through some kind of virtual shell is somehow doing the same thing as running a "full-up Linux stack" has been empirically shown to be little more than "Snake Oil".

Sincerely... No issues (between the two of us).

2021-10-03, 03:59   #8
Zhangrc
"University student"
May 2021
Beijing, China
126_10 Posts

Excuse me, but where's the source code?

Last fiddled with by Zhangrc on 2021-10-03 at 04:11

2021-10-03, 12:25   #9
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5924_10 Posts

Quote:
Originally Posted by chalsall
You support my argument, sir...

Micro$oft doesn't "play well with others".

Particularly, being tricked into thinking running virtual environments...

Hmm, you seem a bit zealous. This thread is about a new release of prime95 / mprime. It isn't the place for refighting the favorite-OS wars. Or whether single-boot, multi-boot, or VM is the one true way, or any other techno-religious-fervor conflict. They're all just tools. Don't blame the hammer for a lack of screwdriver-ness. Or do, but in the proper threads.

Quote:
Originally Posted by chalsall
I'm giving Ubuntu one more chance.

Heck, run Fedora VMs of various versions on Fedora host OS if you like, and let us know how V30.7 behaves and performs on VM vs host. Or find issues with V30.7 on Fedora host OS.

WSL or VM are tools for having multiple environments available on the same hardware at the same time.

2021-10-03, 13:53   #10
nordi
Dec 2016
7×13 Posts

I noticed that for ECM on small exponents, stage 2 init now takes a lot longer than before:

Code:
version 30.6b4

[Worker #4 Oct 3 11:59] ECM on M20393: curve #264 with s=652720576976964, B1=3000000, B2=TBD
[Worker #4 Oct 3 12:02] Stage 1 complete. 77076114 transforms, 1 modular inverses. Time: 191.655 sec.
[Worker #4 Oct 3 12:02] Available memory is 11000MB.
[Worker #4 Oct 3 12:02] Optimal B2 is 176*B1 = 528000000.
[Worker #4 Oct 3 12:03] D: 6930, relative primes: 21344, stage 2 primes: 27534330, pair%=96.81
[Worker #4 Oct 3 12:03] Stage 2 uses 929MB of memory, 2 FFTs per prime pair, 3-mult modinv pooling, pool size 35165.
[Worker #4 Oct 3 12:03] Stage 2 init complete. 560562 transforms, 1 modular inverses. Time: 8.126 sec.
[Worker #4 Oct 3 12:04] Stage 2 complete. 29840810 transforms, 2 modular inverses. Time: 99.321 sec.
[Worker #4 Oct 3 12:04] Stage 2 GCD complete. Time: 0.001 sec.

version 30.7b1
[Worker #4 Oct 3 15:17] ECM on M20393: curve #301 with s=7945291737592001, B1=3000000, B2=TBD
[Worker #4 Oct 3 15:20] Stage 1 complete. 77076114 transforms, 1 modular inverses. Total time: 179.730 sec.
[Worker #4 Oct 3 15:20] Available memory is 11000MB.
[Worker #4 Oct 3 15:20] Optimal B2 is 100*B1 = 300000000.
[Worker #4 Oct 3 15:21] D: 2772, relative primes: 2664, stage 2 primes: 16035509, pair%=86.48
[Worker #4 Oct 3 15:21] Stage 2 uses 75MB of memory, 2 FFTs per prime pair, 3-mult modinv pooling, pool size 2706.
[Worker #4 Oct 3 15:21] Stage 2 init complete. 109141 transforms, 2 modular inverses. Time: 31.491 sec.
[Worker #4 Oct 3 15:22] Stage 2 complete. 19634829 transforms, 31 modular inverses. Total time: 52.660 sec.
[Worker #4 Oct 3 15:22] Stage 2 GCD complete. Time: 0.001 sec.
Code:
version 30.6b4
[Worker #3 Oct 3 11:57] ECM on M307409: curve #139 with s=96291502140021, B1=250000, B2=TBD
[Worker #3 Oct 3 12:02] Stage 1 complete. 6387044 transforms, 1 modular inverses. Time: 316.008 sec.
[Worker #3 Oct 3 12:02] Available memory is 11000MB.
[Worker #3 Oct 3 12:02] Optimal B2 is 154*B1 = 38500000.
[Worker #3 Oct 3 12:02] D: 4620, relative primes: 6955, stage 2 primes: 2325683, pair%=92.69
[Worker #3 Oct 3 12:02] Stage 2 uses 2651MB of memory, 2 FFTs per prime pair, 3-mult modinv pooling, pool size 7693.
[Worker #3 Oct 3 12:02] Stage 2 init complete. 182767 transforms, 1 modular inverses. Time: 10.380 sec.
[Worker #3 Oct 3 12:05] Stage 2 complete. 2656544 transforms, 1 modular inverses. Time: 137.281 sec.
[Worker #3 Oct 3 12:05] Stage 2 GCD complete. Time: 0.030 sec.


 version 30.7b1
[Worker #3 Oct 3 15:14] ECM on M307409: curve #161 with s=8109473831276158, B1=250000, B2=TBD
[Worker #3 Oct 3 15:20] Stage 1 complete. 6387044 transforms, 1 modular inverses. Total time: 326.664 sec.
[Worker #3 Oct 3 15:20] Available memory is 11000MB.
[Worker #3 Oct 3 15:20] Optimal B2 is 147*B1 = 36750000.
[Worker #3 Oct 3 15:20] D: 2772, relative primes: 3600, stage 2 primes: 2225256, pair%=97.96
[Worker #3 Oct 3 15:20] Stage 2 uses 1056MB of memory, 2 FFTs per prime pair, 3-mult modinv pooling, pool size 2652.
[Worker #3 Oct 3 15:20] Stage 2 init complete. 125837 transforms, 2 modular inverses. Time: 15.606 sec.
[Worker #3 Oct 3 15:22] Stage 2 complete. 2412103 transforms, 3 modular inverses. Total time: 132.420 sec.
[Worker #3 Oct 3 15:22] Stage 2 GCD complete. Time: 0.032 sec.
The reduced memory usage is really impressive, though! It saves >90% on M20,393 and 60% on M307,409.

This is a Zen 2 Ryzen 3950X at 2.8GHz running Linux, with one worker per CPU thread.
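
For anyone who wants to pull the same comparison out of their own logs, here is a small Python sketch. It assumes the "Stage 2 uses NMB of memory" wording shown in the ECM output above. The function names are made up for illustration. Keep in mind that the two versions picked different B2 and D values in these runs, so the comparison is not strictly like for like.

```python
import re

# Matches the memory figure in lines such as
# "[Worker #4 Oct 3 12:03] Stage 2 uses 929MB of memory, ..."
MEM_RE = re.compile(r"Stage 2 uses (\d+)MB of memory")


def stage2_memory_mb(log_text):
    """Return the first stage 2 memory figure (in MB) found in the log text."""
    match = MEM_RE.search(log_text)
    if match is None:
        raise ValueError("no 'Stage 2 uses ...MB of memory' line found")
    return int(match.group(1))


def percent_saved(old_mb, new_mb):
    """Return the memory reduction from old_mb to new_mb as a percentage."""
    return (old_mb - new_mb) / old_mb * 100.0


# Lines copied from the M20393 runs above.
old_line = "[Worker #4 Oct 3 12:03] Stage 2 uses 929MB of memory, 2 FFTs per prime pair, 3-mult modinv pooling, pool size 35165."
new_line = "[Worker #4 Oct 3 15:21] Stage 2 uses 75MB of memory, 2 FFTs per prime pair, 3-mult modinv pooling, pool size 2706."

saved = percent_saved(stage2_memory_mb(old_line), stage2_memory_mb(new_line))
print(f"M20393: {saved:.1f}% less stage 2 memory")
print(f"M307409: {percent_saved(2651, 1056):.1f}% less stage 2 memory")
```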

2021-10-03, 15:00   #11
axn
Jun 2003
2·3·5·173 Posts

Observation: Stage 2 progress % splits start out bigger (relative to 30.6) and progressively become smaller towards the end. Makes ETA calculations tricky.

Code:
[Work thread Oct 3 05:34] Conversion of stage 1 result complete. 5 transforms, 1 modular inverse. Time: 1.704 sec.
[Work thread Oct 3 05:34] D: 1848, relative primes: 4800, stage 2 primes: 20796549, pair%=99.71
[Work thread Oct 3 05:34] Using 10995MB of memory.
[Work thread Oct 3 05:35] Stage 2 init complete. 9481 transforms. Time: 55.307 sec.
[Work thread Oct 3 05:48] M5266619 stage 2 is 5.43% complete. Time: 838.973 sec.
[Work thread Oct 3 06:03] M5266619 stage 2 is 10.95% complete. Time: 840.818 sec.
[Work thread Oct 3 06:17] M5266619 stage 2 is 16.51% complete. Time: 845.246 sec.
[Work thread Oct 3 06:31] M5266619 stage 2 is 22.10% complete. Time: 841.122 sec.
[Work thread Oct 3 06:45] M5266619 stage 2 is 26.96% complete. Time: 841.132 sec.
[Work thread Oct 3 06:59] M5266619 stage 2 is 31.68% complete. Time: 841.467 sec.
[Work thread Oct 3 07:13] M5266619 stage 2 is 36.42% complete. Time: 842.283 sec.
[Work thread Oct 3 07:27] M5266619 stage 2 is 41.18% complete. Time: 842.197 sec.
[Work thread Oct 3 07:41] M5266619 stage 2 is 45.96% complete. Time: 842.407 sec.
[Work thread Oct 3 07:55] M5266619 stage 2 is 50.75% complete. Time: 841.396 sec.
[Work thread Oct 3 08:09] M5266619 stage 2 is 55.56% complete. Time: 843.942 sec.
[Work thread Oct 3 08:23] M5266619 stage 2 is 60.38% complete. Time: 843.597 sec.
[Work thread Oct 3 08:37] M5266619 stage 2 is 65.21% complete. Time: 842.223 sec.
[Work thread Oct 3 08:51] M5266619 stage 2 is 70.06% complete. Time: 842.269 sec.
[Work thread Oct 3 09:05] M5266619 stage 2 is 74.67% complete. Time: 841.707 sec.
[Work thread Oct 3 09:19] M5266619 stage 2 is 79.22% complete. Time: 842.902 sec.
[Work thread Oct 3 09:33] M5266619 stage 2 is 83.56% complete. Time: 842.358 sec.
[Work thread Oct 3 09:47] M5266619 stage 2 is 87.78% complete. Time: 842.067 sec.
[Work thread Oct 3 10:01] M5266619 stage 2 is 91.74% complete. Time: 842.497 sec.
[Work thread Oct 3 10:15] M5266619 stage 2 is 95.51% complete. Time: 843.865 sec.
[Work thread Oct 3 10:29] M5266619 stage 2 is 99.25% complete. Time: 844.053 sec.
[Work thread Oct 3 10:32] M5266619 stage 2 complete. 21204894 transforms. Total time: 17866.727 sec.
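
To put numbers on the shrinking splits, here is a small Python sketch that pulls the percentages out of log lines like the ones above and prints the gain between consecutive reports. It assumes the "stage 2 is N% complete" wording shown in this log. The function name is made up for illustration.

```python
import re
from typing import Iterable, List

# Matches the percentage in lines such as
# "[Work thread Oct 3 05:48] M5266619 stage 2 is 5.43% complete. Time: 838.973 sec."
PROGRESS_RE = re.compile(r"stage 2 is (\d+(?:\.\d+)?)% complete")


def progress_gains(log_lines: Iterable[str]) -> List[float]:
    """Return the percentage gained between each pair of consecutive progress reports."""
    percents = []
    for line in log_lines:
        match = PROGRESS_RE.search(line)
        if match:
            percents.append(float(match.group(1)))
    return [round(later - earlier, 2) for earlier, later in zip(percents, percents[1:])]


# First three and last three progress lines from the log above.
early = [
    "[Work thread Oct 3 05:48] M5266619 stage 2 is 5.43% complete. Time: 838.973 sec.",
    "[Work thread Oct 3 06:03] M5266619 stage 2 is 10.95% complete. Time: 840.818 sec.",
    "[Work thread Oct 3 06:17] M5266619 stage 2 is 16.51% complete. Time: 845.246 sec.",
]
late = [
    "[Work thread Oct 3 10:01] M5266619 stage 2 is 91.74% complete. Time: 842.497 sec.",
    "[Work thread Oct 3 10:15] M5266619 stage 2 is 95.51% complete. Time: 843.865 sec.",
    "[Work thread Oct 3 10:29] M5266619 stage 2 is 99.25% complete. Time: 844.053 sec.",
]

print("early gains per report:", progress_gains(early))
print("late gains per report: ", progress_gains(late))
```

Each report here covers roughly the same wall-clock time (about 840 sec), so an ETA extrapolated linearly from the early reports comes out too optimistic.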

Last fiddled with by axn on 2021-10-03 at 15:01

All times are UTC.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.