mersenneforum.org Prime95 30.8 (big P-1 changes, see post #551)

2021-11-24, 21:21   #12
VBCurtis

"Curtis"
Feb 2005
Riverside, CA

150416 Posts

Quote:
 Originally Posted by Luminescence Are there any diminishing returns? I can run 2 workers with ~50GB each or one with 100-110GB
Depends on how big B2 is, and how big the input is. Once available, experiment. For inputs from this project, two workers with 50GB each may be better, but for larger inputs a single worker would be. If memory use is like GMP-ECM, it scales linearly with input size and also with the square root of B2.
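To put rough numbers on that scaling guess, here is a minimal sketch. It assumes the GMP-ECM-like model stated above (memory linear in input size, proportional to the square root of B2); the function name `relative_memory` is made up for illustration, and nothing here is measured from Prime95.

```python
import math


def relative_memory(input_ratio: float, b2_ratio: float) -> float:
    """Memory multiplier under the ASSUMED model: linear in input size,
    square root in B2. Ratios are relative to some baseline run."""
    return input_ratio * math.sqrt(b2_ratio)


# Same input, 4x larger B2: memory doubles under this model.
same_input_bigger_b2 = relative_memory(1.0, 4.0)
# Input twice as large, same B2: memory also doubles.
bigger_input_same_b2 = relative_memory(2.0, 1.0)
print(same_input_bigger_b2, bigger_input_same_b2)
```

Under this model, quadrupling B2 costs as much memory as doubling the input size, which is why the one-worker versus two-worker answer depends on both.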

2021-11-25, 04:16   #13
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

3×11×241 Posts

Prime95 30.8 (pre-beta) (FOR P-1 USERS ONLY; SMALL EXPONENTS ONLY)

For giggles, I tried P-1 on M80071, B1=200M. It appears that the code that caps B2 at 999*B1 needs to change. B2 = 76 billion in under 2 minutes!

Code:
[Work thread Nov 24 22:56] M80071 stage 1 complete. 798217228 transforms. Total time: 3795.041 sec.
[Work thread Nov 24 22:56] Conversion of stage 1 result complete. 5 transforms, 1 modular inverse. Time: 0.004 sec.
[Work thread Nov 24 22:56] Switching to FMA3 FFT length 5K using large pages
[Work thread Nov 24 22:56] With trial factoring done to 2^85, optimal B2 is 293*B1 = 58600000000.
[Work thread Nov 24 22:56] Using 6791MB of memory. D: 270270, 25920x142152 polynomial multiplication.
[Work thread Nov 24 22:56] Stage 2 init complete. 998106 transforms. Time: 31.144 sec.
[Work thread Nov 24 22:58] M80071 stage 2 complete. 2815495 transforms. Total time: 101.937 sec.
[Work thread Nov 24 22:58] Stage 2 GCD complete. Time: 0.003 sec.
[Work thread Nov 24 22:58] M80071 completed P-1, B1=200000000, B2=76673707110, Wi8: E437AD7F
I'm going to try a few more and see if I can find a new factor.

2021-11-25, 05:47   #14
Luminescence

Oct 2021
Germany

7×17 Posts

Quote:
 Originally Posted by Prime95 For giggles, I tried P-1 on M80071, B1=200M It appears that the code that caps B2 at 999*B1 needs to change. B2 = 76 billion in under 2 minutes! Code: ... I'm going to try a few more and see if I can find a new factor.
Holy smokes, that’s a massive boost to P-1. You guys are some truly brilliant minds.

2021-11-25, 06:14   #15
petrw1
1976 Toyota Corona years forever!

"Wayne"
Nov 2006
Saskatchewan, Canada

3×37×47 Posts

Quote:
 Originally Posted by Prime95 For giggles, I tried P-1 on M80071, B1=200M It appears that the code that caps B2 at 999*B1 needs to change. B2 = 76 billion in under 2 minutes! Code: ... [Work thread Nov 24 22:56] With trial factoring done to 2^85, optimal B2 is 293*B1 = 58600000000. ...
Why would it say 2^85? Does this have something to do with how much ECM has been done?

And... with Stage 2 being so much faster and supporting larger values for B2... might there be a chance to use it to find more factors of the smallest unfactored? Maybe those under 20,000?

Last fiddled with by petrw1 on 2021-11-25 at 06:21 Reason: And...

2021-11-25, 06:31   #16
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

17421₈ Posts

Quote:
 Originally Posted by petrw1 Why would it say 2^85. Does this have something to do with how much ECM has been done?
Yes. That was just my complete-shot-in-the-dark guess as to ECM's equivalent TF.

I upped B1 to 250M, fixed the 999x cap. B2 = 4.45 trillion in an hour and a half.

Code:
[Work thread Nov 24 23:42] Conversion of stage 1 result complete. 5 transforms, 1 modular inverse. Time: 0.004 sec.
[Work thread Nov 24 23:42] Switching to FMA3 FFT length 5K using large pages
[Work thread Nov 24 23:42] With trial factoring done to 2^90, optimal B2 is 17811*B1 = 4452750000000.
[Work thread Nov 24 23:42] If no prior P-1, chance of a new factor is 6.43%
[Work thread Nov 24 23:42] Using 6791MB of memory.  D: 330330, 31680x136392 polynomial multiplication.
[Work thread Nov 24 23:42] Stage 2 init complete. 1225472 transforms. Time: 37.495 sec.
[Work thread Nov 25 01:17] M80071 stage 2 complete. 145791133 transforms. Total time: 5680.476 sec.
[Work thread Nov 25 01:17] Round off: 0.048828125
[Work thread Nov 25 01:17] Stage 2 GCD complete. Time: 0.003 sec.
[Work thread Nov 25 01:17] M80071 completed P-1, B1=250000000, B2=4459674999780, Wi8: 6A0ECD7D
Quote:
 And...with Stage 2 being so much faster and supported larger values for B2 ... might there be a chance to use it to find more factors of the smallest unfactored? Maybe those under 20,000?
Expos under approx 40000 already benefited from GMP-ECM's generous stage 2. I guess there's a better chance for new factors on expos from 50K to 1M. We'll see.

2021-11-25, 11:36   #17
Zhangrc

"University student"
May 2021
Beijing, China

2·7·19 Posts

Quote:
 Originally Posted by Prime95 With trial factoring done to 2^90, optimal B2 is 17811*B1 = 4452750000000. M80071 completed P-1, B1=250000000, B2=4459674999780
With T-level being 30.598, you can assume no factor below 2^102 (30.598/0.301).
Why is the B2 value below inconsistent with the value above?
Also, can Prime95 itself guess the estimated T-level when it's offline?
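The division by 0.301 above is a digits-to-bits conversion, since log10(2) ≈ 0.30103. A minimal sketch of that arithmetic follows; the helper name `digits_to_bits` is invented for illustration, and treating a T-level as "no factor below that many digits" is the poster's assumption, not a guarantee.

```python
import math


def digits_to_bits(decimal_digits: float) -> float:
    """Convert a size in decimal digits to the equivalent size in bits."""
    return decimal_digits / math.log10(2)


t_level = 30.598  # T-level in decimal digits, as quoted in the post
bits = digits_to_bits(t_level)
print(f"T-level {t_level} digits is about 2^{bits:.1f}")
```

The result lands between 2^101 and 2^102, so the 2^102 figure is the value rounded to the nearest bit.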

More problems:
How much can wavefront (107-116M) P-1 benefit from v30.8? what bounds does it use?
Does the larger FFT used in stage 2 hurt throughput? Is it larger than necessary?
Can the new algorithm be implemented in ECM and PP1 too?

Last fiddled with by Zhangrc on 2021-11-25 at 11:49

2021-11-25, 15:37   #18
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

3×11×241 Posts

Quote:
 Originally Posted by Zhangrc Why is the B2 value below inconsistent with the value above?
The new stage 2 selects a D value (330330 in this case) and then does batches of D values with a single polynomial multiplication. The new code completes the full batch that is larger than the target B2.
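A rough sketch of the "finish the whole batch" behaviour described above. The batch size is not stated anywhere in this thread, so `blocks_per_batch` is a hypothetical parameter and `stage2_end` is an illustrative helper, not Prime95's code. The D and B2 figures at the bottom are copied from the two logs posted earlier.

```python
def stage2_end(start: int, target_b2: int, d: int, blocks_per_batch: int) -> int:
    """Return the end of the last full batch needed to reach target_b2.

    Hypothetical model: stage 2 advances from `start` in batches that each
    cover `blocks_per_batch` blocks of width `d`, and never stops mid-batch.
    """
    span = d * blocks_per_batch
    batches = max(1, -(-(target_b2 - start) // span))  # ceiling division
    return start + batches * span


# Toy numbers only: D=30, 4 blocks per batch, target 1000.
toy_end = stage2_end(start=0, target_b2=1000, d=30, blocks_per_batch=4)
print(toy_end)

# Values copied from the logs in this thread: (D, target B2, reported B2).
logged = [
    (270270, 293 * 200_000_000, 76_673_707_110),
    (330330, 17811 * 250_000_000, 4_459_674_999_780),
]
for d, target, reported in logged:
    # Each reported B2 is at least the target and an exact multiple of D.
    print(d, reported >= target, reported % d == 0)
```

Both reported B2 values are exact multiples of their D and exceed the "optimal" target, which fits the explanation that the run stops only at a batch boundary.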

Quote:
 Also, can Prime95 itself guess the estimated T-level when it's offline?
No.

Quote:
 How much can wavefront (107-116M) P-1 benefit from v30.8? what bounds does it use? Does the larger FFT used in stage 2 hurt throughput? Is it larger than necessary? Can the new algorithm be implemented in ECM and PP1 too?
Sadly, wavefront P-1 will not benefit much. There are only 200 or so temporaries available if given 16GB RAM.
The larger FFT will hurt stage 2 throughput. More study is required to see if prime95 is switching to a larger FFT sooner than necessary. The new algorithm can be implemented for P+1 and ECM with some difficulty. Reading papers by Montgomery / Silverman / Kruppa / Zimmermann is no easy matter!
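For a feel of why the number of temporaries collapses at the wavefront, here is a back-of-the-envelope estimate. It assumes each temporary is one FFT-length array of 8-byte doubles with no padding or overhead, which is a simplification and not something stated in this thread; `estimate_temporaries` and the 10M wavefront FFT length are hypothetical.

```python
def estimate_temporaries(memory_mb: int, fft_length: int, bytes_per_word: int = 8) -> int:
    """Rough count of FFT-sized buffers that fit in memory_mb.

    ASSUMPTION: one temporary = fft_length words of bytes_per_word bytes,
    ignoring padding and any other per-buffer overhead.
    """
    return (memory_mb * 1024 * 1024) // (fft_length * bytes_per_word)


# M80071 run from the log: 6791MB of memory, FFT length 5K (5 * 1024).
small_exponent = estimate_temporaries(6791, 5 * 1024)

# Hypothetical wavefront case: 16GB and an assumed 10M-word FFT.
wavefront = estimate_temporaries(16 * 1024, 10 * 1024 * 1024)

print(small_exponent, wavefront)
```

Under these assumptions the small exponent gets on the order of 170,000 buffers, in the same ballpark as the 31680 + 136392 polynomial sizes in the log, while the assumed wavefront case gets only about 200.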

Last fiddled with by Prime95 on 2021-11-25 at 15:38

2021-11-25, 16:10   #19
techn1ciaN

Oct 2021
U. S. / Maine

2×73 Posts

Quote:
 Originally Posted by Prime95 Sadly, wavefront P-1 will not benefit much. There are only 200 or so temporaries available if given 16GB RAM.
Does this mean that more impressive improvements, like you're seeing with tiny exponents, might be possible even at the P-1 wavefront if someone has massive RAM (say, 128 or 192 GB) and allocates enough of it?

2021-11-25, 16:52   #20
axn

Jun 2003

5,387 Posts

Quote:
 Originally Posted by techn1ciaN Does this mean that more impressive improvements, like you're seeing with tiny exponents, might be possible even at the P-1 wavefront if someone has massive RAM (say, 128 or 192 GB) and allocates enough of it?
Not to the same extent as tiny ones, but the more memory you throw at it, the better the gains. So, yes, those kinds of very large RAM allocations will be useful.

2021-11-25, 22:31   #21
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

3×11×241 Posts

I found a bug in P-1 stage 2 init that may or may not have affected my previous runs. I'm rerunning all my v30.8 stage 2 work. When using 30.8, I recommend saving your completed P-1 save files until we are confident the new code is working.

Should you wish to try 30.8, links are below.

Use this version only for P-1 work on Mersenne numbers. This really is pre-beta!
Please rerun your last 3 or 4 successful P-1 runs to QA that the new P-1 stage 2 code finds those factors.
Use much more aggressive B2 bounds. While the optimal B2 calculations may not be perfect, I recommend using them anyway.
Turn on roundoff error checking.
Give stage 2 as much memory as you can. Only run one worker with high memory. (The default value for MaxHighMemWorkers will be changing.)
Save files during P-1 stage 2 cannot be created.
There is no progress reporting during P-1 stage 2.
P-1 stage 2 is untested on 100M+ exponents. I am not sure the code can accurately gauge when the new code is faster than the old code.
AVX-512 is untested -- likely to fail (perhaps silently). Pre-AVX is untested but might work. Recommend using only AVX and FMA FFTs.
MaxStage0Prime in undoc.txt has changed.

Windows 64-bit: https://mersenne.org/ftp_root/gimps/p95v308b1.win64.zip
Linux 64-bit: https://mersenne.org/ftp_root/gimps/...linux64.tar.gz

Last fiddled with by Prime95 on 2021-11-26 at 01:05

2021-11-25, 22:58   #22
lisanderke

"Lisander Viaene"
Oct 2020
Belgium

109 Posts

I'll be using 30.8 for re-doing P-1 in ranges where poor P-1 was previously done (in range 8.4M, for example).

Currently running the first four of Kriesel's recommended P-1 'selftest' exponents/bounds. (Though it is intended for selftesting GPU P-1 software, as I understand it. See: https://www.mersenneforum.org/showpo...8&postcount=31 ) All four exponents seem to have returned the correct factors!

(Before editing it out, I pointed out in this post that reporting for stage 2 was not working. I now realize reporting wasn't supposed to work, apologies!)

Last fiddled with by lisanderke on 2021-11-25 at 23:22
