2021-12-14, 15:33  #1 
"University student"
May 2021
Beijing, China
269 Posts 
30.8 & optimal P-1 for wavefront exponents
I tried several different bounds here:
https://www.mersenne.ca/prob.php?exp...00&b2=22000000 My result: Assuming that v30.8 is 2.5x faster for wavefront exponents, and 1.1 tests are saved if a factor is found, then the "PrimeNet" bounds in mersenne.ca is almost optimal. However, we can reduce these bounds a bit, since not everyone has large enough memory (>30G) to do P1 at peak speed. The b1=450000&b2=22000000 in the link above should not be far from optimal. Last fiddled with by Zhangrc on 20211214 at 15:37 
2021-12-14, 22:59  #2  
Oct 2021
U. S. / New York, NY
2·3·5^{2} Posts 
That sounds like a big assumption considering Mr. Woltman's previous comments that wavefront P-1 "will not benefit much" (see post #18 in https://www.mersenneforum.org/showth...861#post593861). Is 2.5x purely a spitball / an extrapolation, or did you get that number through empirical testing? If the latter, how much RAM do you have allocated?
Quote:
Incidentally, in the extreme low-end case (not enough RAM allocated for stage 2 to start), B1 seems to be selected such that stage 1 takes almost as long to run as both stages would take together for a user with enough RAM allocated to run them. I've seen e.g. curtisc turn in pre-PRP P-1 results of B2 = B1 = 1.2M. Unfortunately, this still tends to produce factor chances of < 2%. Last fiddled with by axn on 2021-12-15 at 12:41 Reason: Reference to Post #18 in original thread 

2021-12-15, 02:04  #3  
P90 years forever!
Aug 2002
Yeehaw, FL
1111111100101_{2} Posts 
Quote:
Quote:
In other words, if we set tests_saved to either 1.0 or 2.0 we will be doing more P-1 in the future. Why double the amount of P-1 effort today that will almost certainly be redone in the future? My second takeaway is that once 30.8 is fully ready, GIMPS would benefit greatly from owners of machines with lots of RAM switching to P-1. 

2021-12-15, 02:54  #4  
Oct 2021
U. S. / New York, NY
2×3×5^{2} Posts 
Quote:
My vote between 1.0 and 1.1 is the former, perhaps just because it's a whole number and might cause less confusion for people who run Prime95 casually and don't always have a firm grasp on what the work window is printing (this was me for my first few years of GIMPS membership). If Kriesel's analysis is correct (I have no reason to believe it isn't), the empirically optimal number assuming more granularity than tenths would be ~1.0477. At that point, it's pretty much a coin flip whether to round up or down. I'll contend that some of the factors pushing the raw number up from 1.000 are to some degree transitory, so 1.0 should be a better choice for the long term. (The last few people bootlegging FTC LL or doing unproofed PRP will eventually either upgrade or stop testing, for one example. Increasing storage drive sizes should eventually bring up the average proof power, for another.) 

2021-12-15, 03:39  #5  
1976 Toyota Corona years forever!
"Wayne"
Nov 2006
Saskatchewan, Canada
2×7×13×29 Posts 
Quote:
When GPUs started TFing many times faster than PCs, the consensus was: let's TF a few bits deeper and save many more expensive LL/DC tests. Granted, 1 PRP replaces 2 of LL & DC. Why aren't we using the same reasoning here? A P-1 that used to take 5 hours now takes 1 (or so). So after the full rollout of 30.8, even if the number of P-1'ers doesn't change, they'll be doing 5 times as many P-1s in the same time. Wouldn't they get way ahead of the PRP wavefront? And if so, aren't we better off to stay just ahead and P-1 deeper and save more PRPs? Granted, deeper P-1 in the future with 1 TB machines will get more factors, but aren't they more beneficial before the PRP is done? OK, now that I've spent 10 minutes one-finger typing on my mobile, it just occurred to me that the average PC today won't have enough RAM to do P-1 much faster at the PRP wavefront even with 30.8. Oh well, someone can slap me now. 

2021-12-15, 03:52  #6  
"University student"
May 2021
Beijing, China
269 Posts 
Quote:
I allocate 12 GB of memory; I can't use more because I have only 16 GB. Usually that's enough for wavefront exponents, but with 30.8 it's always beneficial to allocate more RAM. Last fiddled with by Zhangrc on 2021-12-15 at 03:53 

2021-12-15, 04:52  #7  
Oct 2021
U. S. / New York, NY
2·3·5^{2} Posts 
Quote:
We can assume some work line Pfactor=N/A,1,2,[exponent],-1,[TF depth],1. Loading this into Prime95 30.7 might produce bounds that take five hours to run. We suppose 30.8 could run the same bounds twice as fast on the same machine (for the sake of the example, because it probably can't for wavefront exponents in actuality). Then a 30.8 installation wouldn't calculate those bounds for that work line at all; it would calculate something appropriately larger, independent of anything in the line itself needing to be changed. In simpler terms, larger P-1 bounds are always built into any boost to P-1 throughput (assuming Mr. Woltman doesn't make a serious mistake when revising the cost calculator, which there's no reason to believe he would). We could go with your initial assumption that 30.8's P-1 is drastically faster even at the PRP wavefront, and setting tests_saved=5 (for example) because of it still wouldn't accomplish anything besides wasting a load of cycles. Could you pinpoint exactly where? I downloaded the latest 30.8 tarball and Ctrl+F'ed "2.5" in its undoc.txt with no hits. Do you happen to be talking about the default Pm1CostFudge value? If so, that's just (approximately) the factor by which the new stage 2 cost calculator tends to undershoot; it doesn't indicate anything about the speed of the new P-1 in the abstract. 

2021-12-15, 06:10  #8 
Jun 2003
1010100111110_{2} Posts 

2021-12-15, 06:39  #9  
1976 Toyota Corona years forever!
"Wayne"
Nov 2006
Saskatchewan, Canada
12236_{8} Posts 
Quote:
I understand the cost calculator needs to keep up with P-1 improvements. At the risk of oversimplifying, let me try with actual numbers. prob.php tells me the suggested P-1 bounds take about 15 GHz-days in the 108M range. A PRP test in that same range takes about 450 GHz-days. That is 30 to 1. Interestingly (with a little rounding), the success rate is about 1/30. So 450 GHz-days of P-1 should do 30 tests and save on average 1 PRP test. I hope I didn't mess this up. I guess it assumes 450 GHz-days of each take approximately the same clock time. It may not. So if, at a point in time at the leading edge, the available P-1 GHz-days is 1/30 of the PRP GHz-days, then P-1 should just keep up with PRP. However, if either due to personal choice or due to the increased speed of 30.8 we find P-1 getting too far ahead of PRP, would it then make sense for P-1 to choose bigger B1/B2 and save more PRP tests instead? On the contrary, if P-1 falls behind, it would choose lower B1/B2. Or is this simply what you mean by: Quote:


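The 30-to-1 arithmetic in post #9 can be sanity-checked with a few lines of Python. This is only a sketch: the GHz-day figures are the rough numbers quoted in the post, not authoritative values.

```python
# Rough figures quoted above for exponents in the ~108M range.
P1_COST = 15.0        # GHz-days for one P-1 attempt at suggested bounds
PRP_COST = 450.0      # GHz-days for one PRP test
FACTOR_PROB = 1 / 30  # approximate chance one P-1 attempt finds a factor

# How many P-1 attempts fit into one PRP test's worth of effort?
attempts_per_prp = PRP_COST / P1_COST

# Expected number of PRP tests saved by spending 450 GHz-days on P-1:
expected_saved = attempts_per_prp * FACTOR_PROB

print(f"{attempts_per_prp:.0f} P-1 attempts per 450 GHz-days of effort")
print(f"~{expected_saved:.2f} PRP test saved on average")
```

At these figures, P-1 at the suggested bounds exactly breaks even with tests_saved = 1; any speedup in P-1 (or extra probability from larger bounds) tips the balance in its favor.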
2021-12-15, 07:45  #10  
Jun 2003
153E_{16} Posts 
Quote:
First things first. When P-1 stage 2 becomes faster, the software's calculation of optimal P-1 bounds changes. It changes in a way that increases the bounds. So the amount of time the software spends on P-1 wouldn't necessarily drop drastically. In fact, paradoxically, it might increase (whether it does or not is a different thing, but in principle this could happen). So we need more data to understand the impact of 30.8 on wavefront P-1. Second: the optimal crossover point for "TF vs PRP" or "P-1 vs PRP" is based on the relative time it takes to run both types of computation on the _same_ processor. We do 3-4 bits of extra TF on GPUs not because GPUs are faster than CPUs, but because GPUs do better at TF relative to PRP. A GPU might be 100x faster than a CPU at TF but only 10x faster at PRP, so the GPU's crossover point for TF vs PRP will be a few bits higher than a CPU's. If a GPU were 100x faster at TF but also 100x faster at PRP, then we wouldn't do extra TF bits on the GPU (no matter how much GPU power we have). Similarly, the optimal P-1 bound is / should be independent of how many dedicated P-1 crunchers there are. We assume that, if there were no P-1 work available, they would switch over to PRP (not a 100% accurate assumption, but the only feasible way to model this). If we get a surplus of dedicated P-1 crunchers who refuse to do anything else, c'est la vie. I guess they have the option to manually change tests_saved and do whatever they wish, but the project shouldn't waste resources by using suboptimal parameters. After all, the original point of P-1 was to speed up the clearing of exponents. 
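The crossover argument above can be illustrated numerically. The sketch below uses made-up cost units and two crude approximations (each extra TF bit level roughly doubles in cost, and bit level b yields a factor with probability roughly 1/b); none of these figures come from the thread.

```python
def deepest_worthwhile_bit(tf_unit_cost, prp_cost, start_bit=70):
    """Deepest TF bit level where the expected PRP saving still exceeds
    the cost of that level, all measured on one given device.

    Crude model: level b costs tf_unit_cost * 2**(b - start_bit) and
    finds a factor with probability ~1/b.
    """
    b = start_bit
    while tf_unit_cost * 2 ** (b - start_bit) < prp_cost / b:
        b += 1
    return b - 1

# Arbitrary cost units on a CPU:
CPU_TF, CPU_PRP = 1.0, 500.0
# A GPU that is 100x faster at TF but only 10x faster at PRP:
GPU_TF, GPU_PRP = CPU_TF / 100, CPU_PRP / 10

cpu_depth = deepest_worthwhile_bit(CPU_TF, CPU_PRP)
gpu_depth = deepest_worthwhile_bit(GPU_TF, GPU_PRP)
print(cpu_depth, gpu_depth)  # the GPU's crossover sits a few bits deeper

# A GPU that is 100x faster at *both* TF and PRP has the same TF-to-PRP
# cost ratio as the CPU, hence exactly the same crossover depth:
assert deepest_worthwhile_bit(CPU_TF / 100, CPU_PRP / 100) == cpu_depth
```

The key point the model makes concrete: only the ratio of TF speed to PRP speed on the same device moves the crossover, not the device's absolute speed.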

2021-12-15, 07:51  #11  
Oct 2021
U. S. / New York, NY
2×3×5^{2} Posts 
Quote:
For similarly-sized exponents, a given PC can complete X PRP tests in some amount of time, or it can find Y P-1 factors in the same amount of time. Y is obviously dependent upon the P-1 bounds used. If you let it do its thing, Prime95 optimizes to have A) Y > AX (where A is the tests_saved value passed), then B) the highest Y value possible. You seem to suggest that ignoring this optimization and accepting a lower value of Y (or even accepting Y < X) will become a good idea if P-1 gets far ahead of the PRP wavefront, but in that case more benefit would be had from some P-1 users simply switching to primality testing. Since large B1 and B2 values quickly run into diminishing returns with respect to the cycles needed (yes, even with 30.8; "large" is just higher for B2), P-1 past Prime95's optimized bounds will not "save more PRP tests" than just, well, running the full PRPs. You brought up GPU TF earlier, so we can analogously apply your logic there. GPUs are very efficient for TF, but they can run primality tests as well, so there is still an optimization puzzle: GIMPS/GPU72 must select a TF threshold such that, in the time it would take a given GPU to complete one primality test, the same GPU will find more than one factor (on average). For most consumer GPUs, this seems to be ((Prime95 TF threshold) + 4). With that threshold, GPU72 is currently very far ahead of even high-category PRP (I believe they're currently pushing around 120M or even higher). Does it then make sense that GPU72 should go to ((Prime95 threshold) + 5) at the PRP wavefront even though that wouldn't be optimal*, just because the threshold that is optimal is easily being handled? No; anyone doing GPU TF who wants the PRP wavefront to advance more quickly should simply switch to GPU PRP. * Some recent Nvidia models have such crippled FP64 throughput that this extra level actually can be optimal. I have such a one. However, I don't believe enough TFers own these to recommend the extra level universally. 
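A toy version of the bound optimization described above, showing why "P-1 deeper" past the optimized bounds loses to simply running the PRPs. The cost/probability triples are invented for illustration; Prime95's real calculator derives them from B1/B2 and the exponent.

```python
# (label, P-1 cost in GHz-days, probability of finding a factor).
# All numbers are made up to illustrate diminishing returns.
candidates = [
    ("small bounds",  8.0, 0.025),
    ("optimal-ish",  15.0, 0.045),
    ("oversized",    60.0, 0.055),  # +45 GHz-days buys only +1% probability
]

PRP_COST = 450.0   # GHz-days for one PRP test at this exponent size
TESTS_SAVED = 1.0  # the A in "Y > A*X"

def net_benefit(cost, prob):
    """Expected GHz-days of PRP work saved, minus the P-1 effort spent."""
    return prob * TESTS_SAVED * PRP_COST - cost

for label, cost, prob in candidates:
    print(f"{label:12s} net benefit: {net_benefit(cost, prob):+8.2f} GHz-days")

best = max(candidates, key=lambda c: net_benefit(c[1], c[2]))
print("best choice:", best[0])
```

With these numbers the oversized bounds find the most factors yet have a strongly negative net benefit: the extra cycles exceed the expected PRP savings, which is the diminishing-returns point made above.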

Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
Intel: i7-11700 vs. i9-10900 for wavefront P-1 or PRP  techn1ciaN  Hardware  2  2021-11-16 08:06 
COVID vaccination wavefront  Batalov  Science & Technology  274  2021-10-21 15:26 
Production (wavefront) P-1  kriesel  Marin's Mersenne-aries  23  2021-07-03 15:17 
Received P-1 assignment ahead of wavefront?  ixfd64  PrimeNet  1  2019-03-06 22:31 
P-1 & LL wavefront slowed down?  otutusaus  PrimeNet  159  2013-12-17 09:13 