I also benchmarked four Zen 2 cores (one core complex) working on [M]11977759[/M] (stage 2 FFT length 768K) with B2=50,000,000 (which mprime adjusted slightly) at different RAM settings. The timings are stage 2 init plus stage 2 itself, and the total.
[code]
RAM      init + stage 2 = total                B2
8.5 GB   10.8 + 315.6 = [B]326.4 seconds[/B]   B2=51,228,870
17 GB    24.6 + 165.7 = [B]190.3 seconds[/B]   B2=51,278,370
34 GB    43.4 + 138.3 = [B]181.7 seconds[/B]   B2=[B]72[/B],162,090
[/code]
Doubling RAM from 8.5 to 17 GB gave 72% more throughput. Doubling RAM again to 34 GB gave 5% more throughput at a much higher B2. Even with 96 GB available, mprime still used "only" 34 GB, so there are no further benchmark results. Still, this version wants LOTS of RAM and puts it to excellent use.
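The percentages above are just the ratio of total times minus one. A small Python sketch that reproduces them from the posted timings; "throughput" is assumed here to mean assignments completed per unit time, ignoring that the 34 GB run also covered a larger B2:

```python
# Timings copied from the benchmark above: (RAM label, stage 2 init s, stage 2 s, B2 used)
runs = [
    ("8.5 GB", 10.8, 315.6, 51_228_870),
    ("17 GB", 24.6, 165.7, 51_278_370),
    ("34 GB", 43.4, 138.3, 72_162_090),
]

def total_seconds(run):
    """Stage 2 init time plus stage 2 time for one run."""
    _, init, stage2, _ = run
    return init + stage2

def throughput_gain_pct(slower, faster):
    """Percent more work per unit time when switching from the slower run to the faster one."""
    return (total_seconds(slower) / total_seconds(faster) - 1.0) * 100.0

for prev, cur in zip(runs, runs[1:]):
    gain = throughput_gain_pct(prev, cur)
    print(f"{prev[0]} -> {cur[0]}: {gain:.1f}% more throughput, B2 {prev[3]:,} -> {cur[3]:,}")
```

Rounded, this gives the 72% and 5% quoted above.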
[QUOTE=nordi;594108]The automatically chosen B2 was too aggressive![/QUOTE]
That will be a problem for a while. Optimal B2 is chosen by a cost function that I have not worked on much, and there's little point working on it while the stage 2 code is still being optimized. I noticed the same thing here on exponents around 80K: a B1 of 300 million (2 hours) is getting a B2 of 12 trillion (4 hours).
B2=90M for wavefront P-1 (108M)
[code]
[Nov 29 12:46] Setting affinity to run worker on CPU core #2
[Nov 29 12:46] Optimal P-1 factoring of M108390077 using up to 11571MB of memory.
[Nov 29 12:46] Assuming no factors below 2^77 and 2 primality tests saved if a factor is found.
[Nov 29 12:46] Optimal bounds are B1=956000, [B]B2=89586000[/B]
[Nov 29 12:46] Chance of finding a factor is an estimated 4.7%
[Nov 29 12:46]
[Nov 29 12:46] Using FMA3 FFT length 5760K, Pass1=768, Pass2=7680, clm=4, 4 threads
[/code]
Impressive.
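To compare the bounds mprime picks across machines or memory settings, the B2/B1 ratio can be read straight off the log. A minimal Python sketch; the regular expression is written against the single "Optimal bounds" line quoted above (with the forum's bold tags removed) and is an assumption, not a documented log format:

```python
import re

# One line copied from the mprime log above, with the BBCode bold tags stripped.
LOG_LINE = "[Nov 29 12:46] Optimal bounds are B1=956000, B2=89586000"

# Pattern written for this particular message only (assumed format).
BOUNDS_RE = re.compile(r"Optimal bounds are B1=(\d+), B2=(\d+)")

def parse_bounds(line):
    """Return (B1, B2) as ints, or None if the line is not a bounds line."""
    m = BOUNDS_RE.search(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))

b1, b2 = parse_bounds(LOG_LINE)
print(f"B1={b1:,} B2={b2:,} B2/B1={b2 / b1:.1f}")
```

For this wavefront exponent, B2 comes out at roughly 94 times B1.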
Prime95 30.8 (pre-beta) (FOR P-1 USERS ONLY; SMALL EXPONENTS ONLY)
Looks like 30.8 builds are now available. I just downloaded build 2. This should be made a sticky as soon as possible.
30.8 is pre-beta. It should not be stickied yet.
See here for the current issues: [url]https://www.mersenneforum.org/showpost.php?p=594097&postcount=988[/url]
30.8 is [B]not ready for prime-time[/B]!
I made this version available much earlier than normal because it has significant improvements for P-1 stage 2 on "smaller" exponents. This version is only for P-1 users.
Understood. I won’t start using it yet. Hopefully later builds will fix things.
I couldn’t download the stable version of 30.7, only build 9, which I’m currently using.
[QUOTE=Glenn;594225]I couldn’t download the stable version of 30.7, only build 9, which I’m currently using.[/QUOTE]
That is the stable version. James Heinrich said in the 30.7 thread that the problem with the mersenne.org download should already have been fixed, unless you were experiencing a different one.
Perhaps the title of this thread could be edited to make clear at first glance that this version is not ready for all users, at least until it comes out of pre-beta (something like "Prime95 30.8 (ONLY FOR P-1 USERS)").
I think it would be nice to move discussion and bug reports from the sub two k thread to this one in the Software category, since a lot of the posts there are mostly about this release/pre-beta version. Just a suggestion, of course, and thanks for all the continued hard work on this software!!
Build 2 is bad with multithreading:
[CODE]
P-1 on M5401951 with B1=8000000, B2=8000000000
Setting affinity to run helper thread 1 on CPU core #2
Setting affinity to run helper thread 3 on CPU core #4
Setting affinity to run helper thread 4 on CPU core #5
Setting affinity to run helper thread 2 on CPU core #3
Using FMA3 FFT length 280K, Pass1=896, Pass2=320, clm=2, 6 threads
Setting affinity to run helper thread 5 on CPU core #6
Conversion of stage 1 result complete. 5 transforms, 1 modular inverse. Time: 1.024 sec.
Setting affinity to run helper thread 1 on CPU core #2
Setting affinity to run helper thread 3 on CPU core #4
Switching to FMA3 FFT length 336K, Pass1=448, Pass2=768, clm=1, 6 threads
Setting affinity to run helper thread 4 on CPU core #5
Setting affinity to run helper thread 2 on CPU core #3
Setting affinity to run helper thread 5 on CPU core #6
Using 56770MB of memory.
D: 43890, 4320x16961 polynomial multiplication.
Round off: 0, poly_size: 2, EB: 1.67728, SM: 3.33496
Round off: 0, poly_size: 4
Round off: 0, poly_size: 8
Round off: 0, poly_size: 16
Round off: 0, poly_size: 32
Round off: 0, poly_size: 64
Round off: 0, poly_size: 128
Round off: 0, poly_size: 256
Round off: 0, poly_size: 512
Round off: 0, poly_size: 1024
Round off: 0, poly_size: 2048
Round off: 0, poly_size: 4096
Round off: 0, poly_size: 8192
Stage 2 init complete. 148272 transforms. Time: 158.998 sec.
Round off: 0
M5401951 stage 2 is 0.00% complete.
M5401951 stage 2 complete. 2128051 transforms. Total time: 2374.162 sec.
Stage 2 GCD complete. Time: 0.652 sec.
M5401951 completed P-1, B1=8000000, B2=8285685870
[/CODE]
Compare to build 1:
[CODE]
P-1 on M5401993 with B1=8000000, B2=8000000000
Using FMA3 FFT length 280K, Pass1=896, Pass2=320, clm=2, 6 threads
Setting affinity to run helper thread 3 on CPU core #4
Setting affinity to run helper thread 2 on CPU core #3
Setting affinity to run helper thread 1 on CPU core #2
Setting affinity to run helper thread 5 on CPU core #6
Setting affinity to run helper thread 4 on CPU core #5
Conversion of stage 1 result complete. 5 transforms, 1 modular inverse. Time: 1.021 sec.
Setting affinity to run helper thread 1 on CPU core #2
Switching to FMA3 FFT length 336K, Pass1=448, Pass2=768, clm=1, 6 threads
Setting affinity to run helper thread 3 on CPU core #4
Setting affinity to run helper thread 2 on CPU core #3
Setting affinity to run helper thread 4 on CPU core #5
Setting affinity to run helper thread 5 on CPU core #6
Using 56770MB of memory.
D: 43890, 4320x16961 polynomial multiplication.
Setting affinity to run polymult helper thread on CPU core #2
Setting affinity to run polymult helper thread on CPU core #3
Setting affinity to run polymult helper thread on CPU core #4
Setting affinity to run polymult helper thread on CPU core #5
Setting affinity to run polymult helper thread on CPU core #6
Stage 2 init complete. 148272 transforms. Time: 112.924 sec.
M5401993 stage 2 is 0.00% complete.
M5401993 stage 2 complete. 2128051 transforms. Total time: 942.714 sec.
Stage 2 GCD complete. Time: 0.663 sec.
M5401993 completed P-1, B1=8000000, B2=8285685870
[/CODE]
2374 s vs. 942 s. top shows build 2 using 200% (with occasional spikes to 500+%), whereas build 1 is consistently pegged at ~600%.
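The two logs can also be compared mechanically. This Python sketch, using abbreviated excerpts of the logs above, extracts the stage 2 init time and the stage 2 total time and counts the "polymult helper thread" affinity lines. In the posted logs build 1 prints five of those lines and build 2 prints none; the sketch only reports that difference and does not establish that it causes the slowdown. The regular expressions are written for these exact messages and are an assumption:

```python
import re

# Abbreviated excerpts of the two logs above (only the lines this sketch looks at).
BUILD2_LOG = """\
D: 43890, 4320x16961 polynomial multiplication.
Stage 2 init complete. 148272 transforms. Time: 158.998 sec.
M5401951 stage 2 complete. 2128051 transforms. Total time: 2374.162 sec.
"""

BUILD1_LOG = """\
D: 43890, 4320x16961 polynomial multiplication.
Setting affinity to run polymult helper thread on CPU core #2
Setting affinity to run polymult helper thread on CPU core #3
Setting affinity to run polymult helper thread on CPU core #4
Setting affinity to run polymult helper thread on CPU core #5
Setting affinity to run polymult helper thread on CPU core #6
Stage 2 init complete. 148272 transforms. Time: 112.924 sec.
M5401993 stage 2 complete. 2128051 transforms. Total time: 942.714 sec.
"""

# Patterns match within a single line (no DOTALL), so ".*" cannot cross lines.
INIT_RE = re.compile(r"Stage 2 init complete\..*Time: ([\d.]+) sec")
TOTAL_RE = re.compile(r"stage 2 complete\..*Total time: ([\d.]+) sec")

def summarize(log):
    """Pull stage 2 init time, stage 2 total time and the number of polymult helper lines."""
    init = float(INIT_RE.search(log).group(1))
    total = float(TOTAL_RE.search(log).group(1))
    helpers = log.count("polymult helper thread")
    return {"init": init, "total": total, "polymult_helpers": helpers}

build2 = summarize(BUILD2_LOG)
build1 = summarize(BUILD1_LOG)
print(f"init slowdown:    {build2['init'] / build1['init']:.2f}x")
print(f"stage 2 slowdown: {build2['total'] / build1['total']:.2f}x")
print(f"polymult helper lines: build 2 = {build2['polymult_helpers']}, build 1 = {build1['polymult_helpers']}")
```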
[QUOTE=kruoli;594103]My test case was two workers. The first had a known factor. The second had some other work:
[CODE][Worker #1]
Pminus1=N/A,1,2,22463209,-1,1000000,324000000,75

[Worker #2]
Pminus1=N/A,1,2,21362113,-1,1000000,32400000,75
Pminus1=N/A,1,2,21362903,-1,1000000,32400000,75[/CODE]
It started normally, but did not state which B2 it wanted to use. I had a stage 1 file, which it used successfully. While stage 2 in worker #1 was running (using 110-115% of the memory I had allowed it), stage 1 of the first assignment in worker #2 completed and the second assignment was started. After the factor was found, the worktodo entry in worker #1 was removed. It then crashed with error code 0xc0000005 at 0x000000000208b09a. I tried to start the program again. When worker #2 started (it now tried to begin stage 2 of its first assignment), it reported a B2 value this time, but crashed again. So I ran it in the debugger and got an error at 0x00007FF7093CB09A in prime95.exe: 0xC0000005: access violation exception reading 0xFFFFFFFFFFFFFFE4.[/QUOTE]
George, do you need a save file for that? I tested some more (stage 1 done by 30.8b2) and got this again with another exponent, but some exponents are fine. I left out the system details: this was on a 1950X.
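When reproducing this, it can help to split the worktodo lines into fields. This Python sketch assumes the field order AID,k,b,n,c,B1,B2,trial-factoring bits, which is how the lines in the quote are laid out (k*b^n+c = 2^n-1 for Mersenne numbers). It only accepts that exact 8-field layout and is not a full parser for prime95's worktodo syntax; the class and function names are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Pminus1Task:
    aid: str       # assignment ID ("N/A" when there is none)
    k: int
    b: int
    n: int
    c: int
    b1: int
    b2: int
    tf_bits: int   # assumed: how far the number has been trial factored, in bits

    def number(self):
        """Readable form of k*b^n+c, e.g. M22463209 when k=1, b=2, c=-1."""
        if (self.k, self.b, self.c) == (1, 2, -1):
            return f"M{self.n}"
        return f"{self.k}*{self.b}^{self.n}{self.c:+d}"

def parse_pminus1(line):
    """Parse a line of the form Pminus1=AID,k,b,n,c,B1,B2,bits (the layout used in the post)."""
    key, _, value = line.partition("=")
    if key.strip() != "Pminus1":
        raise ValueError(f"not a Pminus1 line: {line!r}")
    fields = value.strip().split(",")
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}")
    aid, *nums = fields
    k, b, n, c, b1, b2, bits = (int(x) for x in nums)
    return Pminus1Task(aid, k, b, n, c, b1, b2, bits)

worker1 = parse_pminus1("Pminus1=N/A,1,2,22463209,-1,1000000,324000000,75")
print(worker1.number(), worker1.b1, worker1.b2)  # M22463209 1000000 324000000
```

Parsing the quoted lines this way shows that the worker #1 entry asks for B2=324,000,000, ten times the 32,400,000 used by both worker #2 entries.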