mersenneforum.org > Data > Let's Optimize P-1 for low exponents. TL;DR in post #1. More in posts 60 and 61.

2022-01-04, 03:12   #1
petrw1
1976 Toyota Corona years forever!

"Wayne"
Nov 2006
Saskatchewan, Canada

144F₁₆ Posts

Let's Optimize P-1 for low exponents. TL;DR in post #1. More in posts 60 and 61.

This project is made possible (or dare I say necessary?) by the speed of P-1 factoring under Prime95 v30.8. I cannot take credit for the idea; it has been suggested by others, who shall remain nameless ... mostly because I'm sure that if I try to list them I will miss someone.

In short, we want to use v30.8+ to factor lower exponents (<20M?) to the more optimal bounds this version affords us. We encourage members who are interested and who have PCs with a decent amount of RAM to participate. It has been suggested that 16GB+ is preferred, but in my experience even 8GB is adequate.

==============================================================
TL;DR

0. Be sure you have upgraded mprime or Prime95 to the latest build of v30.8. See here.

1. Choose a range not already listed below and inform me so I can add it. Preferably at least 0.1M or 1.0M in size.

2. Calculate the recommended minimum B1 (feel free to round this value):
Code:
2.2^LOG(20,000,000 / exponent, 2) × 1,000,000

3. The above assumes 16GB of RAM. Adjust this B1 for your allocated RAM (preferably at least 6GB):
Code:
sqrt(16 / your GB of RAM) × proposed B1

4. Use this report, sorted by Bound #1 and/or Bound #2, to find the exponents with the least P-1 done so far. Substitute your exponent range in the query.

5. Generate Pminus1 statements for the exponents you deem worth redoing to the recommended bounds. Popular opinion is that we want the new B1 to be at least 10x the current B1 unless the current B2 is VERY large. If the current P-1 was done prior to v30.8, B2 will be about 20x B1, whereas with v30.8 it will be hundreds or thousands of times larger. If any of you know how to use Pfactor to get similar bounds, let me know.
Code:
Pminus1=N/A,1,2,<exponent>,-1,<B1>,<B2>
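Steps 2, 3 and 5 above can be sketched as a small script. The 2.2-per-doubling rule, the sqrt(16/RAM) adjustment, and the Pminus1 worktodo format are from the post; the function names and defaults are my own illustration, not project-official tooling:

```python
import math

def recommended_b1(exponent, ram_gb=16.0):
    """Steps 2-3 of the TL;DR: B1 starts at 1,000,000 for a 20M
    exponent and grows by a factor of 2.2 each time the exponent
    halves, then is scaled by sqrt(16 / RAM) for the machine's
    usable memory."""
    base_b1 = 1_000_000 * 2.2 ** math.log2(20_000_000 / exponent)
    return base_b1 * math.sqrt(16.0 / ram_gb)

def pminus1_line(exponent, b1, b2):
    """Step 5: emit a worktodo.txt Pminus1 entry (k=1, b=2, c=-1,
    i.e. the Mersenne number 2^exponent - 1)."""
    return f"Pminus1=N/A,1,2,{exponent},-1,{b1},{b2}"

# e.g. a 10M exponent on an 8GB machine gets roughly a 3.1M B1
b1 = round(recommended_b1(10_000_000, ram_gb=8.0))
```

With these assumptions, a 20M exponent on a 16GB machine gets exactly the 1M baseline, and smaller exponents or smaller RAM allocations scale from there.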
6. Due to the huge benefit v30.8 gets from LOTS of RAM, it is recognized that the best configuration is 1 worker using all threads. If you are ambitious, you can run individual threads for Stage 1, then adjust the Worker Windows to use all workers and as much RAM as you can spare for Stage 2.

======================================================
Code:
Known assignments (if no activity in the last month I assume the range is no longer active):
.07 George
.08 George
.09 George
0.2 petrw1 - COMPLETE (35)
0.25 Anton
0.300-0.325 Kruoli
0.7 Lycorn
0.8 Lycorn
1.0 axn & Lord Julius - COMPLETE
1.1 mikr
1.2 petrw1 - COMPLETE (67)
2.8 mikr
3.8 DrobinsonPE
3.9 king
4.0 DrobinsonPE
4.2 DrobinsonPE
4.96 Denial140
5.0 petrw1 --- COMPLETE (100)
5.1 petrw1
5.2 petrw1
5.3 petrw1
5.4 petrw1
6.0 takahashi
7.0 harlee
7.1 harlee
7.9 harlee
8.0 harlee
8.1 Flauktorist --- COMPLETE
8.8 linament
9.0 Tha
9.4 Tha
9.9 Tha - COMPLETE
10.2 mikr
13.3 nordi
13.4 nordi
13.5 nordi
13.7 James
13.8 James
13.9 Jamesa
16.3 Alpertron
17.7 Alpertron
17.8 Alpertron

Last fiddled with by petrw1 on 2022-07-05 at 19:54 Reason: Updating reserved ranges
2022-01-05, 12:54   #2
tha

Dec 2002

2·5²·17 Posts

Now that we are close to the target set originally, to get under 20,000 unfactored candidates per one-million range (or 2,000 per one-hundred-thousand range), and now that 30.8 has changed the landscape, I think we should set a new target. Rather than setting the same standard for each range, I was thinking of each unfactored exponent having had a reasonable amount of P-1 work done on it.

I currently work in the 9.1M range with B1 set to 2,000,000, using a machine with 64 GiB of RAM, which sets B2 at 3,500,000,000. That amounts to about 90 GHz-days each.

Two things now would be worthwhile:
A. How do we get P-1 done on machines with heavy RAM resources?
B. How do we set a meaningful standard as the next target?

Last fiddled with by petrw1 on 2022-01-10 at 15:05 Reason: Fixed ...
2022-01-05, 14:53   #3
petrw1
1976 Toyota Corona years forever!

"Wayne"
Nov 2006

3·1,733 Posts

Quote:
 Originally Posted by tha Now that we are close to the target set originally, to get under 20,000 unfactored candidates per one million range (or 2,000 per one hundred thousand) and now that 30.8 has changed the landscape I think we should set a new target. Rather than setting the same standard for each range I was thinking of each unfactored exponent having done an amount of P-1 work on it that is reasonable. I currently work in the 9.1M range with B1 set to 2,000,000 and using a machine with 64 GiB of ram which sets B2 at 3,500,000,000. That amounts to about 90 GHz-days each. Two things now would be worthwhile: A. How to get P-1 being done on machines with heavy ram resources? B. How to set a meaningful standard as the next target?
I have been thinking about what to do next ... we are probably about 6 months from completing this project.
I intend to continue with some kind of P-1.
What you suggest is worth consideration.

Thanks

2022-01-07, 22:49   #4
petrw1
1976 Toyota Corona years forever!

"Wayne"
Nov 2006

3×1,733 Posts
So giving this more thought....

Quote:
 Originally Posted by tha Two things now would be worthwhile: A. How to get P-1 being done on machines with heavy ram resources? B. How to set a meaningful standard as the next target?
Thoughts anyone?
A. Is there a minimum/preferred RAM target? Or do lower-RAM PCs just need a manually specified B2 and take longer, in order to achieve B?
B. I'll start: how about a minimum of B1=1M / B2=200M for exponents under 20M?
--- Or should the minimum B1 be a multiple of the exponent?

Last fiddled with by petrw1 on 2022-01-07 at 22:51 Reason: More ...

2022-01-07, 23:45   #5
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

41·193 Posts

Quote:
 Originally Posted by petrw1 Thoughts anyone? A. Is there a minimum/preferred RAM target? Or do lower RAM PC's just need manually specified Bound2 and take longer in order to achieve B. B. I'll start: How about minimum B1=1M / B2=200M for exponents under 20M --- Or should the min B1 be a product of the exponent?
I assume you are still targeting Mersennes with no known factors.
Maybe you can target the same elapsed-time effort for each exponent, attacking exponents with bounds significantly below your new recommended bounds. The goal bounds should be chosen such that the project is difficult but can be completed in a reasonable timeframe (I think that's one reason the 20M project caught fire).

B1 should be a multiple (rounded to a nice even number) of either exponent or FFT length.
B2 should be something a 16GB(?) machine can produce comfortably.

A data point to help you come up with your formula: a 6- or 7-year-old quad core, 8GB memory, ~3GHz, exponent = 80K, FFTlen = 4K, stage 2 FFTlen = 5K:
B1 = 1B takes 16,500 sec. across the 4 cores, or 4,125 seconds each.
B2 = 20.5T takes 5,460 sec. using 6.7GB.

Once you decide on your recommended bounds, if people have lots of RAM they should exceed your minimum B2 bounds.

2022-01-08, 04:38   #6
petrw1
1976 Toyota Corona years forever!

"Wayne"
Nov 2006

5199₁₀ Posts

Quote:
 Originally Posted by Prime95 I assume you are still targeting Mersennes with no known factors. Maybe you can target the same elapsed time effort for each exponent, attacking exponents with bounds significantly below your new recommended bounds. The goal bounds should be chosen such that the project is difficult but can be completed in a reasonable timeframe (I think that's one reason why 20M project caught fire). B1 should be a multiple (rounded to a nice even number) of either exponent or FFT length. B2 should be something a 16GB(?) machine can produce comfortably. My data point for you in helping you come up with your formula. 6 or 7 year old quad core, 8GB mem, ~3GHz, exponent = 80K, FFTlen=4K, stage 2 FFTlen=5K: B1 = 1B takes 16500 sec. for 4 or 4125 seconds each. B2 = 20.5T takes 5460 sec. using 6.7GB. Once you decide on your recommended bounds, if people have lots of RAM they should exceed your minimum B2 bounds.
Leaning towards not-yet-factored exponents, but that's not hard and fast.

If I use your "data point" as a reference:
B1=1B/80K=12,500xexponent
B2=20.5T=20,500xB1

If I extend that exponent/B1 ratio of 12,500 then for exponent 20M I'd need B1=250B.
If I let Prime95 choose B2 I know it wouldn't be 20,500x maybe closer to 300x.
This is not workable.

It is not obvious to me what FFT to expect so I would not be able to choose a B1 as a multiple of FFT.
So possibly a diminishing multiple of the exponent?
Or as you suggested striving for a standard time duration per exponent?
We could aim for a duration of 1 or 2 hours on a "typical" current PC.
What I've noticed across my 5 PCs, with available RAM varying from 5GB to 24.5GB, is that less RAM decreases B2, so that the two stages' run times end up close.

I can suggest a minimum RAM allocation to do this project justice but I won't stop anyone who wants to contribute.
I could further suggest that if someone's available RAM is less than x GB, they instead specify B2 as a certain multiple of B1, accepting that the Stage 2 run time will be relatively longer than Stage 1's.

My 7820X with 24.5GB takes about 12 minutes for a B1=1M Stage 1 in this 20M exponent range.
It chooses B2 = 328x B1 for exponents in this range, and Stage 2 takes about 10 minutes.

I tried 80177 on my 7820x with 24.5GB RAM.
I used B1=100M. Prime95 chose B2=122,881xB1
Stage 1: FFT=4.5K/19 Minutes
Stage 2: FFT=5K/17 Minutes

2022-01-08, 07:08   #7
tha

Dec 2002

2×5²×17 Posts

Quote:
 Originally Posted by Prime95 B2 should be something a 16GB(?) machine can produce comfortably. ..... Once you decide on your recommended bounds, if people have lots of RAM they should exceed your minimum B2 bounds.
My machine is about 3 years old. The money I spent on 16 GB of RAM at the time now buys 64 GB. That, I believe, is money well spent: it will pay for itself in a short time through a lower electricity bill for the same amount of work.

So maybe set B2 accordingly and have machines with lesser RAM do more cycles.

2022-01-08, 15:06   #8
petrw1
1976 Toyota Corona years forever!

"Wayne"
Nov 2006

3·1,733 Posts

Quote:
 Originally Posted by tha So maybe set B2 accordingly and have machines with lesser RAM do more cycles.
Up until now my strategy for every range has been to choose the minimum bounds required to ensure the range gets cleared.

I'm thinking that once I'm below 20M I'll set B1 at a minimum of 1M and choose a decent v30.8 B2.

Thinking about exponents between 10M and 19M
Do you think a minimum of 300xB1 is reasonable? Or 500x or 1000x?
Maybe a bigger ratio for 10M to 14M?

Under 10M even my machine with 3.5GB available will choose about 300x or more.

Last fiddled with by petrw1 on 2022-01-08 at 15:12 Reason: First 2 sentences

2022-01-08, 16:14   #9
axn

Jun 2003

19×283 Posts

Quote:
 Originally Posted by petrw1 Do you think a minimum of 300xB1 is reasonable? Or 500x or 1000x? Maybe a bigger ratio for 10M to 14M?
Why not let the program calculate the optimal B2, given B1, TF level, and RAM? Then the problem reduces to selecting B1.

There are three possible shapes for B1 as a function of the exponent:
1. B1 increases with the exponent (along the lines of how wavefront P-1 works).
2. B1 constant.
3. B1 decreases with the exponent (along the lines of how the <20M project typically works).

#3 is probably the best option. #1 would be increasingly costly at higher ranges. #2 might undershoot lower ranges or overshoot higher ranges (or both).

We can then select B1 as C1 / exponent (for some constant C1), or C2 / FFT (for another constant C2), where FFT is the stage 1 FFT size used by P95.

EDIT:- Finally, we'll finesse B1 up or down based on RAM. If you have low RAM, you'll do higher than normal B1 to compensate.
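As a sketch, option #3 with the RAM finesse might look like the following. The constant C1 = 2e13 is an assumption (chosen so that B1 = 1M at exponent 20M), and the square-root RAM adjustment is borrowed from post #1; neither is something axn specified:

```python
import math

def b1_choice(exponent, c1=2.0e13, ram_gb=16.0):
    """Option #3: B1 = C1 / exponent, decreasing as the exponent
    grows.  C1 = 2e13 is an assumed constant giving B1 = 1M at 20M.
    The low-RAM 'finesse' raises B1 to compensate for the smaller B2
    such machines can reach (sqrt scaling assumed)."""
    return (c1 / exponent) * math.sqrt(16.0 / ram_gb)
```

Under these assumptions, a 5M exponent on a 4GB machine would get B1 = (2e13 / 5e6) × 2 = 8M.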

Last fiddled with by axn on 2022-01-08 at 16:15

2022-01-08, 20:49   #10
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

41·193 Posts

Quote:
 Originally Posted by petrw1 If I extend that exponent/B1 ratio of 12,500 then for exponent 20M I'd need B1=250B. If I let Prime95 choose B2 I know it wouldn't be 20,500x maybe closer to 300x. This is not workable.
Understood; my initial suggestion was poorly presented. I'm suggesting a project along the lines of: all unfactored exponents below 20M are given x hours of P-1 effort, where x is based on some reference setup (a 3-year-old quad core with 16GB memory?).

My data point gives you an idea of the bounds you might expect in the 80K range.

Quote:
 It is not obvious to me what FFT to expect so I would not be able to choose a B1 as a multiple of FFT. So possibly a diminishing multiple of the exponent?
FFT size is a little more than linearly correlated with the exponent (doubling the exponent a little more than doubles the FFT size).
FFT timings also go up a little more than linearly (double the FFT size and the time for one multiplication goes up by about a factor of 2.1).
Stage 1 runtime is linear in B1 (yeah! doubling B1 doubles the runtime).

One can probably put all that together into a formula that generates a rough suggested B1. Something along the lines of: every time you double the exponent, the suggested B1 drops by a factor of 2.2?
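A quick numeric check of that scaling argument, under the stated assumptions (per-squaring time grows by roughly 2.1x per exponent doubling, stage 1 time linear in B1); the reference point of B1 = 1M at a 20M exponent is my own choice for illustration:

```python
import math

def relative_stage1_time(exponent, b1, ref_exponent=20_000_000):
    """Stage 1 cost in arbitrary units: linear in B1, and growing by
    ~2.1x per exponent doubling (FFT size, and per-multiplication
    time, are each a bit more than linear in the exponent)."""
    doublings = math.log2(exponent / ref_exponent)
    return b1 * 2.1 ** doublings

t_20m = relative_stage1_time(20_000_000, 1_000_000)
t_10m = relative_stage1_time(10_000_000, 2_200_000)  # B1 raised 2.2x
# halving the exponent while raising B1 by 2.2x changes the stage 1
# time by only 2.2/2.1, i.e. about 5%
```

So a per-doubling B1 factor of about 2.2 does keep stage 1 wall time roughly flat across the range, which is the formula used in post #1.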

Quote:
 Or as you suggested striving for a standard time duration per exponent? We could aim for a duration of 1 or 2 hours on a "typical" current PC. What I've noticed across my 5 PC's with RAM available varying from 5G to 24.5G is that less RAM decreases B2 so that both Stages run times are close.
Yes, this is easy to do for B1.

For B2, the project could recommend minimum B2 values; or let the user choose the optimal B2 value for their situation; or "require" low-RAM machines to run longer stage 2 times to reach the recommended B2 value; or encourage low-RAM machines to do stage 1 only and partner up with a high-RAM machine that does nothing but stage 2.

Tough call. If you double the available RAM, how much of a B2 boost do you get? I wouldn't be surprised if it was 5x or more.

Last fiddled with by Prime95 on 2022-01-08 at 20:50

2022-01-08, 22:12   #11
masser

Jul 2003
Behind BB

774₁₆ Posts

Wayne, have you looked at James' P-1 effort page recently? Screenshot below. I don't have a nice, pithy way to describe this, but a good effort might be to start at the left-most non-zero column (0.06 GhzD) and move each exponent in that column up two orders of magnitude, to say the 8 GhzD column. When that's complete, move to the next column (0.12) and move those exponents up to the 16 GhzD column, and so on, ad nauseam.

Like time-spent-per-exponent, focusing on GhzD effort would have the effect that larger exponents naturally get lower bounds and smaller exponents get much higher bounds.

We should also consider again where the optimal point for switching to ECM/P+1 might be, given the radically larger B2 now available for P-1.

[Attached thumbnail: screenshot of James' P-1 effort table]

Last fiddled with by masser on 2022-01-08 at 22:16
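masser's ladder can be sketched as a tiny helper. Multiplying the current effort by 100 and landing on the next column is from the post; the power-of-two column layout on James' page is my assumption:

```python
def target_ghzd(current_ghzd):
    """Push an exponent's current P-1 effort up two orders of
    magnitude, rounded up to the next GhzD column (assumed
    power-of-two column layout: 1, 2, 4, 8, 16, ...)."""
    raised = current_ghzd * 100.0  # two orders of magnitude
    column = 1.0
    while column < raised:
        column *= 2.0
    return column
```

With that assumption, the 0.06 column moves to the 8 GhzD column and the 0.12 column to 16 GhzD, matching the post.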
