#12 |
|
P90 years forever!
Aug 2002
Yeehaw, FL
2·53·71 Posts |
P-1 bounds are a balancing act. As first-stage processing of B1 becomes more efficient, the optimal B2/B1 ratio decreases. If B1 processing were to drop to zero cost, no stage 2 would make sense at all. Even with a 10x improvement in stage 1 speed, it still makes sense to run some stage 2. We're likely looking at a change from a B2/B1 ratio in the 20-40 area for gpuowl 6 to a B2/B1 ratio in the 6-12 area for gpuowl 7.
(I ran some estimates with prime95's costing formulas to come up with the above.) A GPU will have different costs. Prime95 v30.4 will be different too, with improved stage 2 timings. I'm not rushing to implement this in prime95, as prime95's default install allows (I think) a mere 250MB max memory, which limits the stage 1 savings. Last fiddled with by Prime95 on 2020-09-28 at 01:46 |
|
|
|
|
|
#13 |
|
"Bill Staffen"
Jan 2013
Pittsburgh, PA, USA
1A8₁₆ Posts |
I'd happily run mprime with 30GB of memory if that made a serious difference. Heck, I could go get another 32GB kit and run it with 60GB for not much compared to a video card. If mprime's working set were larger than the L3 cache, though, I'm not sure it would help at all.
|
|
|
|
|
|
#14 | |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
31·173 Posts |
Quote:
FYI, no P-1 I run has access to less than 1.5GB per prime95 worker, and that's on old 32-bit laptops. Core2 Duos have 4GB; other models have 6, 8, 16, 32, and 48GB respectively. (GPUs range over 1, 2, 3, 4, 8, 11, and 16GB.) |
|
|
|
|
|
|
#15 | |
|
P90 years forever!
Aug 2002
Yeehaw, FL
2·53·71 Posts |
Quote:
|
|
|
|
|
|
|
#16 |
|
Jun 2003
11674₈ Posts |
Do we have some hard numbers comparing these two schemes (i.e. traditional P-1-before-PRP vs. P-1-during-PRP): the expected savings and the optimal bounds for a current wavefront PRP? Especially under the assumption of saving 1 PRP test?
I don't have a good feel for the situation, and I'm not even sure that the new option is in fact superior, so I'm just wondering. |
|
|
|
|
|
#17 |
|
"Robert Gerbicz"
Oct 2005
Hungary
10111001100₂ Posts |
If you want a simple formula, counting everything in squaremods:
total cost = Time(p,B1,B2) + (1 - Prob(p,B1,B2)) * (p - B1), and you should minimize this, where Time(p,B1,B2) is the cost of doing this new algorithm: doing B1 squarings while doing P-1's 1st stage in parallel using an array, and then the classical 2nd stage using the B2 bound [note that you could do the 2nd stage later, but that is suboptimal, because by delaying it you waste time if you find a factor in the 2nd stage]. Prob(p,B1,B2) is the probability that you'll find a factor using the B1,B2 bounds. Of course Time(p,B1,B2) >= B1, and the above assumes there were no detected-and-corrected errors in the run (but that doesn't change the optimal cutoffs much). Interestingly, if you have enough memory, then in the optimal setup you could hide the P-1 1st stage using O(primepi(B1)) additional squarings, so basically using only O(1) mulmods per prime, matching the classical 2nd stage's running time. Hence we are not very far from a point where the 2nd stage would just disappear/collapse entirely. |
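The objective above can be sketched numerically. This is a toy model only: `time_cost`, `prob`, and the hide factor of 10 are illustrative stand-ins (a real `prob` would use Dickman-rho style smoothness estimates, and real costs depend on the implementation), not prime95's or gpuowl's actual costing.

```python
import math

def time_cost(p, b1, b2, hide_factor=10, pair_frac=0.85):
    # Toy cost in units of mulmods: stage 1 is "hidden" inside the PRP,
    # costing ~1.442*B1/hide_factor extra muls; stage 2 costs ~pair_frac
    # muls per prime in (B1, B2], with primepi(x) ~= x / (ln(x) - 1).
    s1 = 1.442 * b1 / hide_factor
    n_primes = b2 / (math.log(b2) - 1) - b1 / (math.log(b1) - 1)
    return s1 + pair_frac * n_primes

def prob(p, b1, b2):
    # Placeholder success probability -- a crude stand-in for a real
    # Dickman-rho based estimate; do not use for actual bound selection.
    return min(0.2, 0.01 * math.log(b1) + 0.005 * math.log(b2 / b1))

def total_cost(p, b1, b2):
    # Gerbicz's objective: P-1 cost plus the expected (p - B1) remaining
    # PRP squarings that still run when no factor is found.
    return time_cost(p, b1, b2) + (1 - prob(p, b1, b2)) * (p - b1)

# Grid-search the objective over a few candidate (B1, B2/B1) choices.
p = 100_000_000
best = min((total_cost(p, b1, r * b1), b1, r)
           for b1 in (1_000_000, 2_000_000, 4_000_000, 8_000_000)
           for r in (5, 10, 20, 40))
print(best)
```

With a realistic probability model, the same grid search is how one would locate the optimal cutoffs the post describes.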
|
|
|
|
|
#18 |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
1010011110011₂ Posts |
A small fraction of users may be so, but perhaps not as small a fraction of systems/cores/workers. Colab and other cloud computing, for example, typically have ample available RAM, and so do the top producers that automatically run the occasional P-1 still needed for their assigned primality tests. (Your call, as always.)
Last fiddled with by kriesel on 2020-09-28 at 14:49 |
|
|
|
|
|
#19 | |
|
"Mihai Preda"
Apr 2015
3·457 Posts |
Quote:
The "old" P-1 first stage, in the most efficient implementation, does a number of squarings equal to the number of bits of the stage-1 exponent: P-1 FS(B1) "old" cost = 1.442 * B1.

In the new setup, the PRP proceeds over this number of iterations (1.442 * B1) as normal, but we don't count these iterations towards the P-1 cost because they are useful for the PRP. In addition to that PRP cost, every log2(N)+2 iterations we do one more multiplication needed only for the "new" P-1 FS ("N" being the number of buffers used by P-1 FS). Let's simplify by assuming log2(N)+2 == 10, so the cost becomes: P-1 FS(B1) "new" cost = 1.442 * B1 / 10 == "old" cost / 10.

Second stage: the P-1 SS(B1, B2) proceeds like this: consider all the primes between B1 and B2, and let's call "n" the number of primes to be covered by the second stage. Some of these primes can be paired ("two with one rock"), which reduces the effective number to a fraction of n, empirically about 0.85*n. Afterwards, the second stage does one mul for each of these primes, plus two additional muls periodically depending on the "D" value used by the second stage: cost(P-1 SS) = n*0.85 + 2 * (B2 - B1) * (1/D). Let's assume D == 2310, so we approximate: cost = n*0.85 + (B2 - B1)/1000. The number of primes up to x ("primepi") can be approximated as primepi(x) ~= x / (ln(x) - 1), so "n" is: primepi(B2) - primepi(B1) ~= (B2 - B1)/(ln(B2) - 1).

Let's look now at the boundary between first stage and second stage, which is given by B1. The prime gap (i.e. the distance between two successive primes) around B1 is about ln(B1). The FS "marginal cost" of adding one more prime (around B1) to FS increases the FS cost by ln(B1) * 1.442 / 10. The SS "marginal cost" at B1 (moving the lower bound B1 one prime down) increases the SS cost by 0.85. These two relations show that SS is more efficient even compared to the "fast" FS, which itself is 10x more efficient than the "old" FS.

So no matter what, it's very much worth continuing to do the second stage; it's not going away! For example: old bounds: (1M, 30M); new bounds: (8M, 40M). (Anyway, double-check with a P-1 calculator to estimate the probabilities of success.)

PS: the incremental effort of FS is 1.4 * log2(B1), while the incremental effort of SS (at B1) is 0.85 * prime-gap(B1). And at the B1 values we work with, the prime gap is lower than the log2; that's why the second stage is more efficient. Last fiddled with by preda on 2020-09-28 at 22:44 |
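The marginal-cost comparison in this post is easy to check numerically. A small sketch using the post's own numbers (1.442/10 muls per unit of B1 for the "new" first stage, 0.85 muls per paired prime for the second stage); the hide factor of 10 is the post's simplifying assumption, not a fixed constant:

```python
import math

def fs_marginal(b1, hide_factor=10):
    # Cost (in muls) of pushing first stage past one more prime near B1:
    # the prime gap near B1 is ~ln(B1), and the "new" FS costs
    # 1.442/hide_factor muls per unit of B1 (assuming log2(N)+2 == 10).
    return math.log(b1) * 1.442 / hide_factor

SS_MARGINAL = 0.85  # muls per paired prime in second stage (empirical, per the post)

for b1 in (1_000_000, 8_000_000, 64_000_000):
    print(b1, round(fs_marginal(b1), 2), SS_MARGINAL)
```

At wavefront-sized B1 values the FS marginal cost comes out around 2.0-2.6 muls per prime, well above 0.85, which is the post's point: even the 10x-cheaper first stage does not make the second stage obsolete.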
|
|
|
|
|
|
#20 |
|
"Mihai Preda"
Apr 2015
3·457 Posts |
The savefiles are now named like this:
86059247-000014500.prp
86059247-10000-000014500.p1
86059247-10000.p2
which is:
<exponent>-<iteration>.prp
<exponent>-<B1>-<iteration>.p1
<exponent>-<B1>.p2
A number of savefiles is kept -- 10 by default, but this can be set with -save <N>. This allows restarting from points far back in the past. The overall format of the files is similar: a one-line text header at the beginning, followed optionally by binary data. The PRP and P1 savefiles store one residue that is protected by a CRC stored in the header. The P2 savefile is tiny, and simply records up to what point of stage 2 a GCD was done (stage 2 can be continued from that point onward). Last fiddled with by preda on 2020-09-28 at 22:40 |
|
|
|
|
|
#21 |
|
"Mihai Preda"
Apr 2015
55B₁₆ Posts |
There is a new special folder, "trashbin" (which is located in the "master directory" when using -pool, i.e. it is global across instances).
When an exponent is finished, the trashbin folder is emptied, and after that the finished exponent's savefiles (including proof residues) are moved to trashbin. This cleanup is automatic. There will be a way to turn off the cleanup action. This allows a small time window in which to "revive" an exponent from trashbin. |
|
|
|
|
|
#22 |
|
Random Account
Aug 2009
3²×7×31 Posts |
Someone wrote a long time ago, "Many factors can be found below B1, but only one above." That was an ECM reference. Assuming this also applies to P-1, perhaps the B2 bound would not need to be nearly as high if B1 were stretched to cover part of the range B2 would normally handle.
There are many thousands of LL double-checks needing to be run. Is it possible to run these as PRPs? I think I will keep my 6.x until I see how this all shakes out. Thanks for all the info.
|
|
|
|