2020-09-28, 01:45  #12
P90 years forever!
Aug 2002
Yeehaw, FL
2×3×1,193 Posts 
P-1 bounds are a balancing act. As first-stage processing of B1 becomes more efficient, the optimal B2/B1 ratio decreases. If B1 processing were to drop to zero cost, no stage 2 would make sense. Even with a 10x improvement in stage 1 speed, it still makes sense to run some stage 2. We're likely looking at changing from a B2/B1 ratio in the 20-40 area for gpuowl 6 to a B2/B1 ratio in the 6-12 area for gpuowl 7.
(I ran some estimates with prime95's costing formulas to come up with the above.) A GPU will have different costs. Prime95 v30.4 will be different too, with improved stage 2 timings. I'm not rushing to implement this in prime95, as prime95's default install allows (I think) a mere 250MB max memory, which limits the stage 1 savings.

Last fiddled with by Prime95 on 2020-09-28 at 01:46
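As a toy illustration of the ratio shift described above, here is a minimal sketch that grid-searches for the saving-maximizing B2/B1 ratio under two stage-1 cost factors. The probability and cost models are invented placeholders, not prime95's actual costing formulas; only the qualitative direction of the result is meaningful.

```python
import math

def toy_prob(B1, B2):
    # Made-up stand-in for the P-1 success probability; the real
    # estimate is based on Dickman's rho (as in prime95's costing code).
    return 0.01 * math.log(B1) + 0.005 * math.log(B2 / B1)

def cost(B1, B2, s1_factor):
    # Stage 1: ~1.442*B1 squarings, scaled by s1_factor (<1 models
    # stage 1 overlapped with the PRP, hence cheaper).
    # Stage 2: ~0.85 muls per prime in (B1, B2].
    return s1_factor * 1.442 * B1 + 0.85 * (B2 - B1) / (math.log(B2) - 1)

def best_ratio(p, s1_factor):
    # Grid-search B1 and the B2/B1 ratio for the largest expected
    # saving: probability * (test cost avoided) - factoring cost.
    best_saving, best = -math.inf, None
    for B1 in range(100_000, 20_000_001, 100_000):
        for ratio in range(2, 61):
            B2 = B1 * ratio
            saving = toy_prob(B1, B2) * p - cost(B1, B2, s1_factor)
            if saving > best_saving:
                best_saving, best = saving, ratio
    return best

p = 100_000_000  # test cost saved by a factor, in squarings
print(best_ratio(p, 1.0))  # expensive stage 1: large optimal ratio
print(best_ratio(p, 0.1))  # ~10x cheaper stage 1: much smaller ratio
```

With these made-up constants the best ratio drops sharply once stage 1 becomes about 10x cheaper, matching the direction of the shift described in the post; the exact numbers depend entirely on the toy probability model.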
2020-09-28, 02:44  #13
"Bill Staffen"
Jan 2013
Pittsburgh, PA, USA
2×5×41 Posts 
I'd happily run mprime with 30GB of memory if that made a serious difference. Heck, I could go get another 32GB kit and run it with 60GB for not much money compared to a video card. If mprime were running a working set larger than the L3 cache, though, I'm not sure it would help at all.

2020-09-28, 03:39  #14
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
7×673 Posts 
Quote:
FYI, no P-1 I run has access to less than 1.5GB per prime95 worker, and that's on old 32-bit laptops. Core2 Duos have 4GB; other models have 6, 8, 16, 32, and 48GB respectively. (GPUs range over 1, 2, 3, 4, 8, 11, and 16GB.)

2020-09-28, 04:49  #15
P90 years forever!
Aug 2002
Yeehaw, FL
1BF6_{16} Posts 
Quote:


2020-09-28, 05:44  #16
Jun 2003
2×2,389 Posts 
Do we have some hard numbers comparing these two schemes (i.e. traditional P-1 before PRP, and P-1 during PRP), the expected savings, and the optimal bounds for a current wavefront PRP? Especially with the assumption of 1 PRP saved?
I don't have a good feel for the situation, and I'm not even sure that the new option is, in fact, superior, so just wondering.
2020-09-28, 10:06  #17
"Robert Gerbicz"
Oct 2005
Hungary
13·109 Posts 
If you want a simple formula, counting everything in squaremods:

total cost = Time(p,B1,B2) + (1 - Prob(p,B1,B2)) * (p - B1)

and you should minimize this, where Time(p,B1,B2) is the cost of doing this new algorithm: doing B1 squarings while doing the P-1 first stage in parallel using an array, and then the classical second stage up to the B2 bound [note that you could do the second stage later, but that is suboptimal, because by delaying it you are wasting time if you would find a factor in the second stage]. Prob(p,B1,B2) is the probability that you'll find a factor using the B1,B2 bounds. Of course Time(p,B1,B2) >= B1, and I assumed above that there were no detected-and-corrected errors in the run (but that doesn't change the optimal cutoffs much).

Interestingly, if you have enough memory, then in the optimal setup you could hide the P-1 first stage using O(primepi(B1)) additional squarings, so basically using only O(1) mulmods per prime, matching the classical second stage's running time. Hence we are not very far from a point where the second stage would just disappear/collapse entirely.
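A minimal numeric sketch of this objective (with an invented Prob model; real tools use Dickman-rho based estimates) shows why paying for a second stage can still lower the expected total cost:

```python
import math

def toy_prob(p, B1, B2):
    # Invented stand-in for Prob(p, B1, B2); only its rough shape
    # (logarithmic growth in B1 and in B2/B1) matters for the sketch.
    return min(0.02 * math.log(B1) + 0.015 * math.log(max(B2 / B1, 1)), 0.9)

def time_cost(p, B1, B2):
    # Time(p, B1, B2) in squaremods: stage 1 mostly hidden behind the
    # PRP (~1 extra mul per 10 iterations), stage 2 ~0.85 muls per
    # prime in (B1, B2] thanks to pairing.
    stage1 = 1.442 * B1 / 10
    stage2 = 0.85 * (B2 - B1) / (math.log(B2) - 1) if B2 > B1 else 0.0
    return stage1 + stage2

def total_cost(p, B1, B2):
    # The formula above: Time(p,B1,B2) + (1 - Prob(p,B1,B2)) * (p - B1)
    return time_cost(p, B1, B2) + (1 - toy_prob(p, B1, B2)) * (p - B1)

p = 100_000_000
print(total_cost(p, 8_000_000, 8_000_000))   # stage 1 only
print(total_cost(p, 8_000_000, 40_000_000))  # with a second stage
```

Under these assumed constants, adding a second stage lowers the expected total cost; minimizing total_cost over (B1, B2) is exactly the optimization described above.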
2020-09-28, 14:48  #18
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
1267_{16} Posts 
A small fraction of users may be so, but perhaps not as small a fraction of systems/cores/workers. Colab and other cloud computing, for example, typically have ample available RAM, as do top producers that automatically run the occasional P-1 still needed for their assigned primality tests. (Your call, as always.)
Last fiddled with by kriesel on 2020-09-28 at 14:49
2020-09-28, 21:53  #19
"Mihai Preda"
Apr 2015
2^{4}·83 Posts 
Quote:
The "old" P-1 first stage, in the most efficient implementation, was doing a number of squarings equal to the number of bits of the stage-1 exponent:

P-1 FS(B1) "old" cost = 1.442 * B1.

In the new setup, the PRP proceeds over this number of iterations (1.442 * B1) as normal, but we don't count these iterations towards the P-1 cost because they are useful for the PRP. In addition to that PRP cost, every log2(N)+2 iterations we do one more multiplication needed only for the "new" P-1 FS ("N" being the number of buffers used by P-1 FS). Let's simplify by assuming that log2(N)+2 == 10, so the cost becomes:

P-1 FS(B1) "new" cost = 1.442 * B1 / 10 == "old" cost / 10.

Second stage: P-1 SS(B1, B2) proceeds like this: consider all the primes between B1 and B2, and let "n" be the number of primes to be covered by the second stage. Some of these primes can be paired ("two with one rock"), which reduces the effective number to a fraction of n, empirically about 0.85*n. Afterwards, the second stage proceeds by doing one mul for each of these primes, plus two additional muls periodically, depending on the "D" value used by the second stage:

cost(P-1 SS) = n*0.85 + 2 * (B2 - B1) * (1/D).

Let's assume D == 2310, so we approximate cost = n*0.85 + (B2 - B1)/1000. The number of primes up to x ("primepi") can be approximated as primepi(x) ~= x / (ln(x) - 1), and "n" is: primepi(B2) - primepi(B1) ~= (B2 - B1)/(ln(B2) - 1).

Let's look now at the boundary between first stage and second stage. This boundary is given by B1. The prime gap (i.e. the distance between two successive primes) around B1 is about ln(B1). The FS "marginal cost" of adding one more prime (around B1) to FS increases the FS cost by ln(B1) * 1.442 / 10. The SS "marginal cost" at B1 (moving the lower bound B1 one prime down) increases the SS cost by 0.85. These two relations show that SS is more efficient even compared to the "fast" FS, which itself is 10x more efficient than the "old" FS.
So no matter what, it's very much worth continuing to do the second stage; it's not going away! For example: old bounds: (1M, 30M); new bounds: (8M, 40M). (Anyway, double-check with a P-1 calculator to estimate the probabilities of success.)

PS: the incremental effort of FS is 1.4 * log2(B1), while the incremental effort of SS (at B1) is 0.85 * primegap(B1). And at the B1 values we work with, the prime gap is lower than the log2; that's why the second stage is more efficient.

Last fiddled with by preda on 2020-09-28 at 22:44
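The marginal-cost comparison above can be written out directly. This is a sketch using the constants from the post (the ~10x overlap factor and the 0.85 pairing factor are the post's estimates, not measured values):

```python
import math

def primepi(x):
    # Approximation used in the post: primepi(x) ~= x / (ln(x) - 1).
    return x / (math.log(x) - 1)

def fs_marginal_cost(B1, overlap=10):
    # Extending first stage by one more prime near B1 costs about
    # ln(B1) * 1.442 squarings, discounted by the ~10x overlap with
    # the PRP iterations.
    return 1.442 * math.log(B1) / overlap

def ss_marginal_cost(pairing=0.85):
    # Covering one more prime in second stage costs ~0.85 muls.
    return pairing

B1, B2 = 8_000_000, 40_000_000
n = primepi(B2) - primepi(B1)   # primes to cover in stage 2
print(round(n))                 # close to 2 million primes
print(fs_marginal_cost(B1))     # ~2.3 squarings per extra FS prime
print(ss_marginal_cost())       # 0.85 muls per extra SS prime
```

At these bounds the SS marginal cost (0.85) is well below the FS marginal cost (~2.3), which is the post's argument for keeping the second stage.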

2020-09-28, 22:39  #20
"Mihai Preda"
Apr 2015
2^{4}·83 Posts 
Savefiles
The savefiles are now named like this:
86059247-000014500.prp
86059247-10000-000014500.p1
86059247-10000.p2

which is:

<exponent>-<iteration>.prp
<exp>-<B1>-<iteration>.p1
<exp>-<B1>.p2

A number of savefiles is kept: by default 10, but this can be set with -save <N>. This allows restarting from points far away in the past.

The overall format of the files is similar: a one-line text header at the beginning, followed optionally by binary data. The PRP and P1 savefiles store one residue that is protected with a CRC stored in the header. The P2 savefile is tiny, and simply records up to what point of stage two a GCD was done (stage 2 can be continued from that point onward).

Last fiddled with by preda on 2020-09-28 at 22:40
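As an illustration, here is a hypothetical parser for this naming scheme (not gpuowl code; it assumes '-' separates the fields, a separator the forum scrape appears to have stripped from the example names):

```python
import re

def parse_savefile(name):
    """Split a savefile name into its fields, per the scheme above:
    <exp>-<iter>.prp, <exp>-<B1>-<iter>.p1, <exp>-<B1>.p2."""
    if m := re.fullmatch(r"(\d+)-(\d+)\.prp", name):
        return {"exp": int(m[1]), "iteration": int(m[2])}
    if m := re.fullmatch(r"(\d+)-(\d+)-(\d+)\.p1", name):
        return {"exp": int(m[1]), "B1": int(m[2]), "iteration": int(m[3])}
    if m := re.fullmatch(r"(\d+)-(\d+)\.p2", name):
        return {"exp": int(m[1]), "B1": int(m[2])}
    return None  # not a recognized savefile name

print(parse_savefile("86059247-10000-000014500.p1"))
```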
2020-09-28, 22:59  #21
"Mihai Preda"
Apr 2015
1328_{10} Posts 
Cleanup
There is a new special folder, "trashbin" (which is located in the "master directory" when using -pool, i.e. it is global across instances).
When an exponent is finished, the trashbin folder is emptied, and after that the finished exponent's savefiles (including proof residues) are moved to the trashbin. This cleanup is automatic. There will be a way to turn off the cleanup action. This allows a small time window to "revive" an exponent from the trashbin.
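A rough sketch of that cleanup flow (hypothetical code, not gpuowl's implementation; the folder name and the `<exponent>-*` file pattern are taken from the descriptions in this thread):

```python
import shutil
from pathlib import Path

def cleanup(exponent, work_dir, trash):
    """Empty the trashbin, then move the finished exponent's
    savefiles into it, leaving a window to revive them later."""
    trash.mkdir(exist_ok=True)
    for old in trash.iterdir():        # empty the previous contents
        if old.is_file():
            old.unlink()
        else:
            shutil.rmtree(old)
    for f in work_dir.glob(f"{exponent}-*"):   # this exponent's files
        shutil.move(str(f), str(trash / f.name))
```

The two-step design (empty first, then move) is what provides the "small time window" mentioned above: the most recently finished exponent's files survive until the next exponent finishes.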
2020-09-28, 23:10  #22
Random Account
Aug 2009
U.S.A.
2·3·281 Posts 
Someone wrote a long time ago, "Many factors can be found below B1, but only one above." This was an ECM reference. Assuming this also applies to P-1, perhaps B2 bounds would not need to be nearly as high if B1 were stretched to cover part of the area B2 would normally cover.
There are many thousands of LL double-checks needing to be run. Is it possible to run these as PRPs? I think I will keep my 6.x until I see how this all shakes out. Thanks for all the info.