Old 2021-12-14, 15:33   #1
Zhangrc
30.8 & optimal P-1 for wavefront exponents

I tried several different bounds here:
https://www.mersenne.ca/prob.php?exp...00&b2=22000000
My result:
Assuming that v30.8 is 2.5x faster for wavefront exponents, and that 1.1 tests are saved if a factor is found, the "PrimeNet" bounds on mersenne.ca are almost optimal.
However, we could reduce these bounds a bit, since not everyone has enough memory (>30 GB) to do P-1 at peak speed.
The B1=450000, B2=22000000 in the link above should not be far from optimal.
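
To spell out what "optimal" means here: the client picks bounds to maximize the expected GHz-days saved, i.e. P(factor) × tests_saved × (PRP cost), minus the cost of the P-1 itself. Below is a rough Python sketch of that trade-off; the cost and probability functions are invented placeholders (tuned so the optimum lands near the bounds above), not Prime95's actual formulas.

Code:
import math

# Toy model of P-1 bound selection (illustrative only; NOT Prime95's code).
# Pick (B1, B2) maximizing:
#   P(factor | B1, B2) * tests_saved * PRP_COST - pm1_cost(B1, B2)
# Costs are in squarings; a PRP test of exponent p costs ~p squarings.
PRP_COST = 108_000_000      # wavefront exponent near 108M

def pm1_cost(b1, b2, stage2_speedup):
    # Stage 1 is ~1.44*B1 squarings; stage 2 is a crude (B2-B1)/10 guess,
    # divided by a stand-in for the v30.8 speedup (RAM-dependent in reality).
    return 1.44 * b1 + (b2 - b1) / 10.0 / stage2_speedup

def pm1_prob(b1, b2):
    # Invented success probability that grows slowly in both bounds; the
    # real calculation integrates Dickman's rho over the smoothness of p-1.
    return 0.0125 * math.log10(b1) + 0.0446 * math.log10(b2 / 1e6 + 1)

def saving(b1, b2, tests_saved, speedup):
    return pm1_prob(b1, b2) * tests_saved * PRP_COST - pm1_cost(b1, b2, speedup)

for speedup, label in [(1.0, "old stage 2"), (2.5, "2.5x stage 2")]:
    best = max(((saving(b1, b2, 1.1, speedup), b1, b2)
                for b1 in range(100_000, 1_000_001, 50_000)
                for b2 in range(2_000_000, 80_000_001, 2_000_000)),
               key=lambda t: t[0])
    print(f"{label}: B1={best[1]:,}  B2={best[2]:,}")

The takeaway is directional rather than numeric: making stage 2 cheaper pushes the optimal B2 up a lot while B1 barely moves.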

Last fiddled with by Zhangrc on 2021-12-14 at 15:37
Old 2021-12-14, 22:59   #2
techn1ciaN

Quote:
Originally Posted by Zhangrc View Post
Assuming that v30.8 is 2.5x faster for wavefront exponents...
That sounds like a big assumption considering Mr. Woltman's previous comments that wavefront P-1 "will not benefit much" (see post #18 in https://www.mersenneforum.org/showth...861#post593861). Is 2.5x purely a spitball / an extrapolation or did you get that number through empirical testing? If the latter, how much RAM do you have allocated?

Quote:
Originally Posted by Zhangrc View Post
However, we could reduce these bounds a bit, since not everyone has enough memory (>30 GB) to do P-1 at peak speed.
The P-1 cost optimizer in recent stable versions is already set up to take RAM allocation into account. From my limited experience, this shows up as B1 decreasing when more RAM is allocated: a bigger B2 can be run at the same speed, giving the same chance of finding a factor in less total (stage 1 + stage 2) run time. I imagine something more aggressive along the same lines will be worked into shipping versions of 30.8 once Mr. Woltman gets the "important" code buttoned up and can focus on the new cost optimizer.

Incidentally, in the extreme low-end case (not enough RAM allocated for stage 2 to start), B1 seems to be selected such that stage 1 takes almost as long to run as both stages would take together for a user with enough RAM allocated to run them. I've seen e.g. curtisc turn in pre-PRP P-1 results of B2 = B1 = 1.2 M. Unfortunately, this still tends to produce factor chances of < 2%.

Last fiddled with by axn on 2021-12-15 at 12:41 Reason: Reference to Post #18 in original thread
Old 2021-12-15, 02:04   #3
Prime95

Quote:
Originally Posted by techn1ciaN View Post
That sounds like a big assumption considering Mr. Woltman's previous comments that wavefront P-1 "will not benefit much". How much RAM do you have allocated?
Zhangrc acknowledged he is using a more-than-customary 30GB.


Quote:
Originally Posted by Zhangrc View Post
Assuming that v30.8 is 2.5x faster for wavefront exponents, and that 1.1 tests are saved if a factor is found, the "PrimeNet" bounds on mersenne.ca are almost optimal.
My takeaway here is that prime95 should reduce its default setting to 1.0 or 1.1 tests saved. Yes, this will find fewer factors now, but in the future when your 1TB RAM machine is commonplace we will want to redo the P-1 to take advantage of all that RAM.

In other words, whether we set tests_saved to 1.0 or 2.0, we will be doing more P-1 in the future either way. Why double the amount of P-1 effort today when it will almost certainly be redone?

My second takeaway is that once 30.8 is fully ready, GIMPS would benefit greatly from owners of machines with lots of RAM switching to P-1.
Old 2021-12-15, 02:54   #4
techn1ciaN

Quote:
Originally Posted by Prime95 View Post
My takeaway here is that prime95 should reduce its default setting to 1.0 or 1.1 tests saved.
Would this also require lowering the effort threshold at which PrimeNet retires the P-1 task? I don't think so, because I've seen some exponents get pretty horrendous standalone P-1 (sometimes even with B2 = B1) and still be released directly to primality testing. But the last thing you would want is to change assignment lines to tests_saved=1 and inadvertently start generating a bunch of useless results from people whose work was just barely over the threshold with tests_saved=2.

My vote between 1.0 and 1.1 is the former, perhaps just because it's a whole number and might cause less confusion for people who run Prime95 casually and don't always have a firm grasp on what the work window is printing (this was me for my first few years of GIMPS membership).

If Kriesel's analysis is correct (I have no reason to believe it isn't), the empirically optimal number assuming more granularity than tenths would be ~1.0477. At that point, it's pretty much a coin flip whether to round up or down. I'll contend that some of the factors pushing the raw number up from 1.000 are to some degree transitory, so 1.0 should be a better choice for the long term. (The last few people bootlegging FTC LL or doing unproofed PRP will eventually either upgrade or stop testing, for one example. Increasing storage drive sizes should eventually bring up the average proof power, for another.)
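
For illustration, here is the kind of weighted average that calculation amounts to (the population fractions below are invented for the example; they are not Kriesel's figures):

Code:
# Back-of-the-envelope tests_saved (illustrative fractions, NOT Kriesel's).
# Each first-time test type saves a different number of tests when a P-1
# factor is found first:
#   PRP with proof : 1 test  (cert is cheap, no DC needed)
#   unproofed PRP  : 2 tests (PRP plus a full double check)
#   LL             : ~2+ tests (LL + DC, plus occasional reruns)
population = [
    # (fraction of first-time tests, tests saved per factor)
    (0.93, 1.0),    # PRP with proof
    (0.04, 2.0),    # PRP without proof
    (0.03, 2.04),   # LL, with a small bump for bad-residue reruns
]
tests_saved = sum(frac * saved for frac, saved in population)
print(f"weighted tests_saved ~ {tests_saved:.4f}")   # ~1.07 with these numbers

With these made-up fractions the result lands a little above 1.0, in the same ballpark as the ~1.0477 quoted above.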
Old 2021-12-15, 03:39   #5
petrw1

Quote:
Originally Posted by Prime95 View Post
My takeaway here is that prime95 should reduce its default setting to 1.0 or 1.1 tests saved. Yes, this will find fewer factors now, but in the future when your 1TB RAM machine is commonplace we will want to redo the P-1 to take advantage of all that RAM.

In other words, whether we set tests_saved to 1.0 or 2.0, we will be doing more P-1 in the future either way. Why double the amount of P-1 effort today when it will almost certainly be redone?
I don't have the expertise to say you are right or wrong. I'm just trying to understand the reasoning.

When GPUs started TFing many times faster than PCs, the consensus was: let's TF a few bits deeper and save many more expensive LL/DC tests.
Granted, 1 PRP now replaces the 2 tests of an LL plus DC.
Why aren't we using the same reasoning here? A P-1 that used to take 5 hours now takes 1 (or so). So after the full rollout of 30.8, even if the number of P-1'ers doesn't change, they'll be doing 5 times as many P-1 runs in the same time.
Wouldn't they get way ahead of the PRP wavefront?
And if so, aren't we better off staying just ahead, doing deeper P-1, and saving more PRPs?
Granted, deeper P-1 in the future with 1TB machines will find more factors, but aren't factors more beneficial before the PRP is done?

OK, now that I've spent 10 minutes one-finger typing on my mobile, it just occurred to me that the average PC today won't have enough RAM to do P-1 much faster at the PRP wavefront even with 30.8. Oh well, someone can slap me now.
Old 2021-12-15, 03:52   #6
Zhangrc

Quote:
Originally Posted by techn1ciaN View Post
Is 2.5x purely a spitball / an extrapolation or did you get that number through empirical testing?
Purely an assumption; the 2.5 comes from undoc.txt.
I allocate 12GB of memory; I can't use more because I have only 16GB. Usually that's enough for wavefront exponents, but with 30.8 it's always beneficial to allocate more RAM.

Last fiddled with by Zhangrc on 2021-12-15 at 03:53
Old 2021-12-15, 04:52   #7
techn1ciaN

Quote:
Originally Posted by petrw1 View Post
When GPUs started TFing many times faster than PCs, the consensus was: let's TF a few bits deeper and save many more expensive LL/DC tests.
Granted, 1 PRP now replaces the 2 tests of an LL plus DC.
Why aren't we using the same reasoning here?
You seem to be operating under the assumption that new Prime95 versions keep the same P-1 cost calculator even when P-1 gets faster. The reason we have a tests_saved parameter in the first place is so that individual Prime95 instances can dynamically calculate their own optimal B1 and B2 values: the calculation combines the completed TF depth and the primality-test effort that a P-1 factor would save with the speed of the P-1 implementation available (which includes how much RAM has been allocated).

Consider some work line Pfactor=N/A,1,2,[exponent],-1,[TF depth],1 (parsed in the sketch below). Loading this into Prime95 30.7 might produce bounds that take five hours to run. Suppose 30.8 could run the same bounds twice as fast on the same machine (for the sake of the example; it probably can't for wavefront exponents in actuality). Then a 30.8 installation wouldn't calculate those bounds for that work line at all; it would calculate something appropriately larger, without anything in the line itself needing to change. In simpler terms, larger P-1 bounds are automatically built into any boost to P-1 throughput (assuming Mr. Woltman doesn't make a serious mistake when revising the cost calculator, which there's no reason to believe he would). Even granting your initial assumption that 30.8's P-1 is drastically faster at the PRP wavefront, setting tests_saved=5 (for example) because of it still wouldn't accomplish anything besides wasting a load of cycles.
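
To make the format concrete, a quick Python sketch of the fields in that line (my reading of the worktodo format as quoted; the helper function is hypothetical, not GIMPS code):

Code:
# Fields of the Pfactor= work line quoted above. Note: no B1/B2 anywhere.
# The client derives bounds at run time from TF depth, tests_saved, and
# its own speed and RAM allocation.
def parse_pfactor(line: str) -> dict:
    assert line.startswith("Pfactor=")
    aid, k, b, n, c, tf_depth, tests_saved = line[len("Pfactor="):].split(",")
    return {
        "aid": aid,                        # assignment ID, or N/A
        "k": int(k), "b": int(b),          # candidate form k*b^n+c,
        "n": int(n), "c": int(c),          #   i.e. 1*2^p-1 for Mersennes
        "tf_depth": int(tf_depth),         # bits of TF already completed
        "tests_saved": float(tests_saved), # tests a found factor would save
    }

print(parse_pfactor("Pfactor=N/A,1,2,108000001,-1,76,1"))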

Quote:
Originally Posted by Zhangrc View Post
...the 2.5 comes from undoc.txt.
Could you pinpoint exactly where? I downloaded the latest 30.8 tarball and Ctrl+F'ed "2.5" in its undoc.txt with no hits.

Do you happen to be talking about the default Pm1CostFudge value? If so, that's just (approximately) the factor by which the new stage 2 cost calculator tends to undershoot; it doesn't indicate anything about the speed of the new P-1 in the abstract.
Old 2021-12-15, 06:10   #8
axn

Quote:
Originally Posted by techn1ciaN View Post
it doesn't indicate anything about the speed of the new P-1 in the abstract.
It couldn't. No single constant can express the speedup; the speedup depends very much on the amount of RAM.
Old 2021-12-15, 06:39   #9
petrw1

Quote:
Originally Posted by techn1ciaN View Post
You seem to be operating under the assumption that new Prime95 versions keep the same P-1 cost calculator even when P-1 gets faster. The reason we have a tests_saved parameter in the first place is so that individual Prime95 instances can dynamically calculate their own optimal B1 and B2 values: the calculation combines the completed TF depth and the primality-test effort that a P-1 factor would save with the speed of the P-1 implementation available (which includes how much RAM has been allocated).
That's not my intent.
I understand the cost calculator needs to keep up with P-1 improvements.

At the risk of oversimplifying, let me try with actual numbers.
prob.php tells me the suggested P-1 bounds take about 15 GHz-days in the 108M range.
A PRP test in that same range takes about 450 GHz-days. That is 30 to 1.
Interestingly (with a little rounding), the success rate is about 1/30.
So 450 GHz-days of P-1 should cover 30 attempts and save, on average, 1 PRP test.
--- I hope I didn't mess this up. I guess it assumes 450 GHz-days of each take approximately the same clock time. It may not. ---
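
A quick sanity check of that arithmetic (numbers copied from above):

Code:
pm1_cost = 15.0     # GHz-days per wavefront P-1 attempt (per prob.php)
prp_cost = 450.0    # GHz-days per PRP test in the 108M range
p_factor = 1 / 30   # rough success rate at the suggested bounds

attempts = prp_cost / pm1_cost    # 30 P-1 attempts per PRP-sized budget
factors  = attempts * p_factor    # ~1.0 expected factor
print(f"{attempts:.0f} P-1 attempts per 450 GHz-days, "
      f"~{factors:.1f} PRP test saved on average")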

So if, at a point in time, the available P-1 GHz-days at the leading edge is 1/30 of the PRP GHz-days, then P-1 should just keep up with PRP.
However, if either through personal choice or through the increased speed of 30.8 we find P-1 getting too far ahead of PRP, would it make sense for P-1 to choose bigger B1/B2 and save more PRP tests instead?
Conversely, if P-1 falls behind, it would choose lower B1/B2.

Or is this simply what you mean by:
Quote:
dynamically calculate their own optimal B1 and B2 values
Old 2021-12-15, 07:45   #10
axn

Quote:
Originally Posted by petrw1 View Post
However, if either through personal choice or through the increased speed of 30.8 we find P-1 getting too far ahead of PRP, would it make sense for P-1 to choose bigger B1/B2 and save more PRP tests instead?
Conversely, if P-1 falls behind, it would choose lower B1/B2.
We shouldn't optimize based on available compute power for each work type.

First things first. When P-1 stage 2 becomes faster, the software's calculation of optimal P-1 bounds changes, and it changes in a way that increases the bounds. So the amount of time the software spends on P-1 wouldn't necessarily drop drastically. In fact, paradoxically, it might increase (whether it actually does is a different matter, but in principle it could). So we need more data to understand the impact of 30.8 on wavefront P-1.

Second. The optimal crossover point for "TF vs PRP" or "P-1 vs PRP" is based on the relative time it takes to run both types of computation on the _same_ processor. We do 3-4 bits of extra TF on GPUs not because GPUs are faster than CPUs, but because GPUs do better at TF relative to PRP. A GPU might be 100x faster than a CPU at TF but only 10x faster at PRP, so the GPU's TF-vs-PRP crossover point will be a few bits higher than a CPU's. If a GPU were 100x faster at TF but also 100x faster at PRP, then we wouldn't do extra TF bits on GPUs (no matter how much GPU power we have). A toy model below makes this concrete.
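
Here's a minimal sketch of that crossover logic (Python; every constant is invented for illustration). TF of each successive bit level costs twice the previous one, and a factor lies in bit b with probability roughly 1/b, so we keep TFing while the expected primality-test time saved exceeds the TF time spent:

Code:
# Toy TF-vs-primality-test crossover model (all constants invented).
# Keep TFing bit level `bit` while:
#   tf_time(bit) < P(factor in bit) * tests_saved * prp_time
def optimal_tf_depth(tf_unit_time, prp_time, tests_saved=2.0, start_bit=65):
    bit = start_bit
    while tf_unit_time * 2 ** (bit - start_bit) < tests_saved * prp_time / bit:
        bit += 1
    return bit - 1    # deepest bit level still worth doing

cpu      = optimal_tf_depth(tf_unit_time=1.0,  prp_time=5000)  # baseline CPU
gpu      = optimal_tf_depth(tf_unit_time=0.01, prp_time=500)   # 100x TF, 10x PRP
gpu_same = optimal_tf_depth(tf_unit_time=0.01, prp_time=50)    # 100x TF, 100x PRP
print(cpu, gpu, gpu_same)   # -> 72 75 72

The lopsidedly-fast GPU stops 3 bits deeper than the CPU; the uniformly-faster one stops at exactly the same depth, which is the point.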

Similarly, the optimal P-1 bounds are (or should be) independent of how many dedicated P-1 crunchers there are. We assume that, if there were no P-1 work available, they would switch over to PRP (not a 100% accurate assumption, but the only feasible way to model this). If we get a surplus of dedicated P-1 crunchers who refuse to do anything else, c'est la vie. They always have the option to manually change tests_saved and do whatever they wish, but the project shouldn't waste resources by using sub-optimal parameters. After all, the original point of P-1 was to speed up the clearing of exponents.
Old 2021-12-15, 07:51   #11
techn1ciaN

Quote:
Originally Posted by petrw1 View Post
However, if either through personal choice or through the increased speed of 30.8 we find P-1 getting too far ahead of PRP, would it make sense for P-1 to choose bigger B1/B2 and save more PRP tests instead?
Conversely, if P-1 falls behind, it would choose lower B1/B2.
I don't see how tying P-1 bounds to how much P-1 is being done is supposed to improve GIMPS throughput overall.

For similarly-sized exponents, a given PC can complete X PRP tests in some amount of time, or it can find Y P-1 factors in the same amount of time. Y is obviously dependent upon the P-1 bounds used. If you let it do its thing, Prime95 optimizes to have A) Y > AX (where A is the tests_saved value passed), then B) the highest Y value possible. You seem to suggest that ignoring this optimization and accepting a lower value of Y (or even accepting Y < X) will become a good idea if P-1 gets far ahead of the PRP wavefront, but in that case more benefit would be had from some P-1 users simply switching to primality testing. Since large B1 and B2 values quickly run into diminishing returns with respect to the cycles needed (yes, even with 30.8; "large" is just higher for B2), P-1 past Prime95's optimized bounds will not "save more PRP tests" than just, well, running the full PRPs.

You brought up GPU TF earlier, so we can analogously apply your logic there. GPUs are very efficient for TF, but they can run primality tests as well, so there is still an optimization puzzle: GIMPS/GPU72 must select a TF threshold such that, in the time it would take a given GPU to complete one primality test, the same GPU will find more than one factor (on average). For most consumer GPUs, this seems to be ((Prime95 TF threshold) + 4). With that threshold, GPU72 is currently very far ahead of even high-category PRP (I believe they're currently pushing around 120 M or even higher). Does it then make sense that GPU72 should go to ((Prime95 threshold) + 5) at the PRP wavefront even though that wouldn't be optimal*, just because the threshold that is optimal is easily being handled? No; anyone doing GPU TF who wants the PRP wavefront to advance more quickly should simply switch to GPU PRP.

* Some recent Nvidia models have such crippled FP64 throughput that this extra level actually can be optimal. I own one. However, I don't believe enough TFers own these to recommend the extra level universally.
