mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Software (https://www.mersenneforum.org/forumdisplay.php?f=10)
-   -   Prime95 30.8 (big P-1 changes, see post #551) (https://www.mersenneforum.org/showthread.php?t=27366)

VBCurtis 2021-11-24 21:21

[QUOTE=Luminescence;593788]Are there any diminishing returns? I can run 2 workers with ~50GB each or one with 100-110GB[/QUOTE]

It depends on how big B2 is and how big the input is. Once it's available, experiment. For inputs from this project, two workers with 50GB each may be better, but for larger inputs a single worker would be. If memory use is like GMP-ECM's, it scales linearly with input size and with the square root of B2.
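
To make that scaling rule concrete, here is a minimal Python sketch. It assumes, as the post does only conditionally, that memory behaves like GMP-ECM's: linear in input size and proportional to the square root of B2. The function name `relative_memory` and its ratio arguments are hypothetical. It compares two configurations and says nothing about absolute memory figures.

```python
import math


def relative_memory(input_size_ratio: float, b2_ratio: float) -> float:
    """Return the memory multiplier when input size and B2 are both scaled.

    Assumption (not confirmed for Prime95): memory ~ input_size * sqrt(B2).
    """
    if input_size_ratio <= 0 or b2_ratio <= 0:
        raise ValueError("ratios must be positive")
    return input_size_ratio * math.sqrt(b2_ratio)


# Same input, 4x the B2: about 2x the memory under this model.
same_input_bigger_b2 = relative_memory(1.0, 4.0)

# Twice the input size, same B2: about 2x the memory under this model.
bigger_input_same_b2 = relative_memory(2.0, 1.0)

print(same_input_bigger_b2, bigger_input_same_b2)
```

Read the other way, doubling per-worker memory on the same input would support roughly four times the B2 under this model.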

Prime95 2021-11-25 04:16

Prime95 30.8 (pre-beta) (FOR P-1 USERS ONLY; SMALL EXPONENTS ONLY)
 
For giggles, I tried P-1 on M80071 with B1=200M. It appears that the code that caps B2 at 999*B1 needs to change.
B2 = 76 billion in under 2 minutes!

[CODE]
[Work thread Nov 24 22:56] M80071 stage 1 complete. 798217228 transforms. Total time: 3795.041 sec.
[Work thread Nov 24 22:56] Conversion of stage 1 result complete. 5 transforms, 1 modular inverse. Time: 0.004 sec.
[Work thread Nov 24 22:56] Switching to FMA3 FFT length 5K using large pages
[Work thread Nov 24 22:56] With trial factoring done to 2^85, optimal B2 is 293*B1 = 58600000000.
[Work thread Nov 24 22:56] Using 6791MB of memory. D: 270270, 25920x142152 polynomial multiplication.
[Work thread Nov 24 22:56] Stage 2 init complete. 998106 transforms. Time: 31.144 sec.
[Work thread Nov 24 22:58] M80071 stage 2 complete. 2815495 transforms. Total time: 101.937 sec.
[Work thread Nov 24 22:58] Stage 2 GCD complete. Time: 0.003 sec.
[Work thread Nov 24 22:58] M80071 completed P-1, B1=200000000, B2=76673707110, Wi8: E437AD7F
[/CODE]

I'm going to try a few more and see if I can find a new factor.

Luminescence 2021-11-25 05:47

[QUOTE=Prime95;593832]For giggles, I tried P-1 on M80071 with B1=200M. It appears that the code that caps B2 at 999*B1 needs to change.
B2 = 76 billion in under 2 minutes!

I'm going to try a few more and see if I can find a new factor.[/QUOTE]

Holy smokes, that’s a massive boost to P-1. You guys are some truly brilliant minds.

:bow wave:

petrw1 2021-11-25 06:14

[QUOTE=Prime95;593832]For giggles, I tried P-1 on M80071 with B1=200M. It appears that the code that caps B2 at 999*B1 needs to change.
B2 = 76 billion in under 2 minutes!

[CODE]
...
[Work thread Nov 24 22:56] With trial factoring done to 2^85, optimal B2 is 293*B1 = 58600000000.
...
[/CODE]

Why would it say 2^85?
Does this have something to do with how much ECM has been done?

And with stage 2 being so much faster and supporting larger values for B2, might there be a chance to use it to find more factors of the smallest unfactored exponents? Maybe those under 20,000?

Prime95 2021-11-25 06:31

[QUOTE=petrw1;593840]Why would it say 2^85? Does this have something to do with how much ECM has been done?[/QUOTE]

Yes. That was just my complete shot-in-the-dark guess at the trial-factoring depth equivalent to the ECM already done.

I upped B1 to 250M and fixed the 999x cap. B2 = 4.45 trillion in an hour and a half.

[CODE]
[Work thread Nov 24 23:42] Conversion of stage 1 result complete. 5 transforms, 1 modular inverse. Time: 0.004 sec.
[Work thread Nov 24 23:42] Switching to FMA3 FFT length 5K using large pages
[Work thread Nov 24 23:42] With trial factoring done to 2^90, optimal B2 is 17811*B1 = 4452750000000.
[Work thread Nov 24 23:42] If no prior P-1, chance of a new factor is 6.43%
[Work thread Nov 24 23:42] Using 6791MB of memory. D: 330330, 31680x136392 polynomial multiplication.
[Work thread Nov 24 23:42] Stage 2 init complete. 1225472 transforms. Time: 37.495 sec.
[Work thread Nov 25 01:17] M80071 stage 2 complete. 145791133 transforms. Total time: 5680.476 sec.
[Work thread Nov 25 01:17] Round off: 0.048828125
[Work thread Nov 25 01:17] Stage 2 GCD complete. Time: 0.003 sec.
[Work thread Nov 25 01:17] M80071 completed P-1, B1=250000000, B2=4459674999780, Wi8: 6A0ECD7D
[/CODE]

[QUOTE]And with stage 2 being so much faster and supporting larger values for B2, might there be a chance to use it to find more factors of the smallest unfactored exponents? Maybe those under 20,000?[/QUOTE]

Expos under approx 40000 already benefited from GMP-ECM's generous stage 2. I guess there's a better chance for new factors on expos from 50K to 1M. We'll see.

Zhangrc 2021-11-25 11:36

[QUOTE=Prime95;593842]
With trial factoring done to 2^90, optimal B2 is 17811*B1 = 4452750000000.
M80071 completed P-1, B1=250000000, B2=4459674999780
[/QUOTE]
With the T-level at 30.598, you can assume there is no factor below roughly 2^102 (30.598/0.301).
Why is the B2 value below inconsistent with the value above?
Also, can Prime95 itself guess the estimated T-level when it's offline?

More questions:
How much can wavefront (107-116M) P-1 benefit from v30.8? What bounds does it use?
Does the larger FFT used in stage 2 hurt throughput? Is it larger than necessary?
Can the new algorithm be implemented in ECM and PP1 too?
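
The T-level arithmetic at the top of this post (30.598/0.301) is a digits-to-bits conversion: a factor of d decimal digits is about d / log10(2) bits long. A small sketch of that conversion follows. The function name `digits_to_bits` is hypothetical, and treating a T-level of 30.598 as "no factors below 30.598 digits" is the post's own simplification, not a guarantee.

```python
import math


def digits_to_bits(decimal_digits: float) -> float:
    """Convert a size in decimal digits to the equivalent size in bits."""
    return decimal_digits / math.log10(2)


t_level = 30.598  # T-level quoted in the post
bits = digits_to_bits(t_level)

# The post rounds this up to 2^102.
print(f"t{t_level} corresponds to about 2^{bits:.1f}")
```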

Prime95 2021-11-25 15:37

[QUOTE=Zhangrc;593850]Why is the B2 value below inconsistent with the value above?[/quote]

The new stage 2 selects a D value (330330 in this case) and then processes a whole batch of D values with a single polynomial multiplication. The new code always completes a full batch, so the final B2 ends up somewhat larger than the target B2.
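
The two logs posted earlier in the thread can be checked against this explanation. The sketch below uses only numbers from those logs (B1, D, the "optimal B2" target, and the reported final B2) and verifies that each final B2 is a whole multiple of D that overshoots the target. How many D values make up one batch is not stated in the thread, so the code does not try to model it. The dictionary keys are hypothetical names.

```python
# Values copied from the two M80071 logs in this thread.
runs = [
    {"B1": 200_000_000, "D": 270270,
     "target_B2": 293 * 200_000_000, "final_B2": 76_673_707_110},
    {"B1": 250_000_000, "D": 330330,
     "target_B2": 17811 * 250_000_000, "final_B2": 4_459_674_999_780},
]

results = []
for run in runs:
    # How many whole D-sized steps the reported B2 contains.
    multiples_of_d, remainder = divmod(run["final_B2"], run["D"])
    # How far past the target the completed batch carried B2.
    overshoot = run["final_B2"] - run["target_B2"]
    results.append((multiples_of_d, remainder, overshoot))
    print(f"D={run['D']}: final B2 = {multiples_of_d} * D "
          f"(remainder {remainder}), overshoot = {overshoot}")
```

In both runs the remainder is zero, which is consistent with stage 2 advancing in whole steps of D and stopping only at the end of a batch.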

[quote]Also, can Prime95 itself guess the estimated T-level when it's offline?[/quote]

No.

[QUOTE]How much can wavefront (107-116M) P-1 benefit from v30.8? What bounds does it use?
Does the larger FFT used in stage 2 hurt throughput? Is it larger than necessary?
Can the new algorithm be implemented in ECM and PP1 too?[/QUOTE]

Sadly, wavefront P-1 will not benefit much. There are only 200 or so temporaries available if given 16GB RAM.
The larger FFT will hurt stage 2 throughput. More study is required to see if prime95 is switching to a larger FFT sooner than necessary. The new algorithm can be implemented for P+1 and ECM with some difficulty. Reading papers by Montgomery / Silverman / Kruppa / Zimmermann is no easy matter!
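
A rough back-of-the-envelope sketch shows why the temporaries count differs so much between tiny and wavefront exponents. It assumes each temporary is one FFT-length array of 8-byte doubles with no padding or overhead, and that the "MB" and "GB" figures in the thread are binary units. Both are assumptions, as is the helper name `estimated_temporaries`. The 5K FFT length, the 6791MB figure, and the 25920x142152 polynomial sizes come from the M80071 logs above.

```python
BYTES_PER_DOUBLE = 8
MIB = 1024 ** 2
GIB = 1024 ** 3


def estimated_temporaries(memory_bytes: int, fft_length: int) -> int:
    """Estimate how many FFT-sized temporaries fit in memory.

    Assumption: one temporary = fft_length doubles, no overhead.
    """
    return memory_bytes // (fft_length * BYTES_PER_DOUBLE)


# M80071: 5K FFT with 6791MB of stage 2 memory (from the logs).
small_exponent_temps = estimated_temporaries(6791 * MIB, 5 * 1024)

# Total coefficients in the logged 25920x142152 polynomial multiplication.
poly_total = 25920 + 142152

# Wavefront: "200 or so temporaries" in 16GB implies this size per temporary.
implied_mib_per_temp = 16 * GIB / 200 / MIB

print(small_exponent_temps, poly_total, round(implied_mib_per_temp, 2))
```

Under these assumptions a tiny exponent gets on the order of a hundred thousand temporaries, close to the logged polynomial sizes, while a wavefront exponent at roughly 80MB per temporary gets only a couple of hundred.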

techn1ciaN 2021-11-25 16:10

[QUOTE=Prime95;593861]
Sadly, wavefront P-1 will not benefit much. There are only 200 or so temporaries available if given 16GB RAM.[/QUOTE]

Does this mean that more impressive improvements, like you're seeing with tiny exponents, might be possible even at the P-1 wavefront if someone has massive RAM (say, 128 or 192 GB) and allocates enough of it?

axn 2021-11-25 16:52

[QUOTE=techn1ciaN;593864]Does this mean that more impressive improvements, like you're seeing with tiny exponents, might be possible even at the P-1 wavefront if someone has massive RAM (say, 128 or 192 GB) and allocates enough of it?[/QUOTE]

Not to the same extent as with tiny ones, but the more memory you throw at it, the better the gains. So yes, those kinds of very large RAM allocations will be useful.

Prime95 2021-11-25 22:31

I found a bug in P-1 stage 2 init that may or may not have affected my previous runs. I'm rerunning all my v30.8 stage 2 work. [B]When using 30.8, I recommend saving your completed P-1 save files until we are confident the new code is working.[/B]

Should you wish to try 30.8, links are below.
[LIST]
[*]Use this version only for P-1 work on Mersenne numbers. This really is pre-beta!
[*]Please rerun your last 3 or 4 successful P-1 runs to QA that the new P-1 stage 2 code finds those factors.
[*]Use much more aggressive B2 bounds. While the optimal B2 calculations may not be perfect, I recommend using them anyway.
[*]Turn on roundoff error checking.
[*]Give stage 2 as much memory as you can. Only run one worker with high memory. (The default value for MaxHighMemWorkers will be changing.)
[*]Save files during P-1 stage 2 cannot be created.
[*]There is no progress reporting during P-1 stage 2.
[*]P-1 stage 2 is untested on 100M+ exponents. I am not sure the code can accurately gauge when the new code is faster than the old code.
[*]AVX-512 is untested -- likely to fail (perhaps silently). Pre-AVX is untested but might work. Recommend using only AVX and FMA FFTs.
[*]MaxStage0Prime in undoc.txt has changed.
[/LIST]
Windows 64-bit: [URL]https://mersenne.org/ftp_root/gimps/p95v308b1.win64.zip[/URL]
Linux 64-bit: [URL]https://mersenne.org/ftp_root/gimps/p95v308b1.linux64.tar.gz[/URL]

lisanderke 2021-11-25 22:58

I'll be using 30.8 to redo P-1 in ranges where only poor P-1 was done previously (the 8.4M range, for example).
I'm currently running the first four of Kriesel's recommended P-1 'selftest' exponents/bounds, though as I understand it they are intended for self-testing GPU P-1 software. See: [URL]https://www.mersenneforum.org/showpost.php?p=533168&postcount=31[/URL]

All four exponents seem to have returned the correct factors!


(Before editing it out I pointed out in this post that reporting for stage 2 was not working. I now realize reporting wasn't supposed to work, apologies!)


All times are UTC.
