2018-09-01, 18:56 #1 preda "Mihai Preda" Apr 2015 53×11 Posts

PRP-1

I would like to present a primality-test method that combines PRP with P-1 (described here as applied to Mersenne numbers).

In Pollard's P-1, in the first stage, we raise 3 to a power P which is very highly composite, similar to a primorial. The idea is to concentrate as many useful prime factors as possible into as few bits of P as possible. Given the bound B1 (a parameter of P-1), an "optimal" P for that B1 is computed. The effort then goes into the modular exponentiation 3^P, which can be done in different ways but with similar effort. A similar operation, raising 3 to a power, is done in the PRP(3) test, so I have been thinking about how to combine the two.

There is an important difference between the two, P-1 and PRP: in P-1 the power is an "irregular" number (the product of small primes, based on B1), while in PRP the power is regular, basically 2^p for M(p). So it seems, at first sight, that the two tests cannot be combined because they require different exponents.

Yet I found a value that is computed as part of the normal PRP and which can be useful in a modified P-1. It turns out (see http://www.mersenneforum.org/showthread.php?t=23627 ) that 2^k - 1 is also extremely composite for some values of k, and this characteristic extends to very large k, producing numbers 2^k - 1 with a very large number of prime factors.

During the normal PRP we compute, iteratively, 3^(2^k) (with k from 1 to p for M(p)). At any step k of our choice, we can compute 3^(2^k - 1) by simply doing a division by 3, and use this number, which is 3 raised to the product of a large number of primes, in the manner of Pollard's P-1. So, while the PRP test progresses through the values of k, we can apply a large number of these "variant P-1" trials at a subset of the k values, because each "variant P-1" step is now basically free.
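As a toy illustration of the trick above (a hypothetical Python sketch, not code from any GIMPS program): on M(11) = 2047 = 23 * 89 the order of 3 mod 23 is 11, which divides 2^10 - 1, so the side value 3^(2^10 - 1) reveals the factor 23 during an otherwise normal squaring chain:

```python
from math import gcd

p = 11
Mp = (1 << p) - 1                    # M(11) = 2047 = 23 * 89
inv3 = pow(3, -1, Mp)                # modular inverse of 3 (Python 3.8+)

res = 3
for k in range(1, p):
    res = res * res % Mp             # normal PRP squaring: res = 3^(2^k)
    x = res * inv3 % Mp              # free side value: x = 3^(2^k - 1)
    g = gcd(x - 1, Mp)               # "variant P-1" trial on exponent 2^k - 1
    if 1 < g < Mp:
        print(f"k={k}: factor {g}")  # fires at k=10, because 11 | 2^10 - 1
        break
```

The factor 89 is never found this way: the order of 3 mod 89 is 88, which is even and so cannot divide the odd number 2^k - 1.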
One problem with the factors of 2^k - 1 is that, numerous as they are, they do contain (many) holes, and this may reduce the chances of the P-1 succeeding. Let's see how we can fix that, in two ways. One way is: once we have x = 3^(2^k - 1), we can condition it by "salting" it with additional "mandatory" primes with a distribution of our choice. Simplifying, let's say we add all the primes p

2018-09-01, 21:23 #2 kriesel "TF79LL86GIMPS96gpu17" Mar 2017 US midwest 592410 Posts

Very interesting, and almost obvious in retrospect, to reuse some computed large powers of three somehow. And since values soon become limited in size by the mod Mp, the size of the base used hardly matters for speed. The memory demands of P-1 stage 1 and of PRP are modest enough (compared to 4-12 GB of gpu memory) that copies of interim products could be saved in gpu memory for one computation type for the exponent, while the other proceeds, for current and near-future exponent ranges.

How do you handle stage 2?

Is 3 raised to the product of powers of primes below B1 (LaurV called it B1-powersmooth) guaranteed to be a usable base for the PRP test? (That product is itself a number of very considerable size.)

In the first attachment, the B1, B2, and e values shown for a GTX 1070 are selected by CUDAPm1, with nrp determined for stage 2. These bounds are not all as high as primenet/gputo72 would like to have. There's plenty of memory at 8 GB for up to 100-megadigit Mersennes or beyond; similarly there would be on an 8 GB RX 480. The second attachment is a 4 GB GTX 1050 Ti (same memory size as my RX 550s).
Last fiddled with by kriesel on 2018-09-01 at 21:35

2018-09-01, 23:22 #3 kriesel "TF79LL86GIMPS96gpu17" Mar 2017 US midwest 172416 Posts

Significantly changing the cost trade-off of P-1 factoring in this way should enable larger bounds, which will have memory impact, but allow a higher probability of finding factors, partly offset by saving less of the computing cost of completing the PRP test. In the attachments in my preceding post, the times do not include the gcd times, which can be considerable in CUDAPm1, since the gcd is done on a single cpu core; it amounts to maybe 1% of run time. During that gcd the gpu mostly sits idle, with memory occupied. If there were a way to avoid gpu compute stalls, performance would increase. A couple of possibilities are speculative continuation of the PRP test, and additional trial factoring.

Last fiddled with by kriesel on 2018-09-01 at 23:39

2018-09-01, 23:56 #4
preda

"Mihai Preda"
Apr 2015

53×11 Posts Quote:
 Originally Posted by kriesel How do you handle stage 2?
I think it removes the need for a second-stage P-1. What it achieves is like an intermediary between the first and second stages, in that some factors in B1-powersmooth are guaranteed to be tested in any combination (as in the first stage), while some other factors are tested in some combinations (brought in "randomly" by the 2^k - 1), whereas the classic second stage covers only the second set of factors (B1-to-B2), one at a time, in combination with the B1-powersmooth set.

OTOH there is no guarantee that absolutely all factors from B1 to B2 have been checked, but that's probably fine if they are covered with a high enough probability. If we get this working, it would also be interesting to see the empirical factor-finding rates it produces, and how they compare with the classical second stage.

Quote:
 Is 3 raised to the product of powers of primes below B1 (Laurv called it B1-powersmooth) guaranteed to be a usable base for the PRP test?
I would let the more math-inclined answer that. But 3^P and 2^m being relatively prime is probably a good thing.

Also, the Gerbicz error check still works (for PRP), with pretty much the same cost. *AND* this very check provides a degree of coverage to the P-1 too, in the sense that the incoming powers 3^(2^k) are checked.

2018-09-02, 00:02 #5
preda

"Mihai Preda"
Apr 2015

53·11 Posts Quote:
 Originally Posted by preda OTOH there is no guarantee that absolutely all factors from B1 to B2 have been checked, but that's probably fine if they are covered with a high enough probability. If we get this working, it would also be interesting to see the empirical factor-finding rates it produces, and how they compare with the classical second stage.
Some mathematical analysis might be done which, given the length of the PRP (i.e. the exponent p of M(p)) and the rule for selecting the k values for 2^k - 1, would compute a sort of "equivalent B2" for the factor coverage.

2018-09-02, 02:36 #6 preda "Mihai Preda" Apr 2015 53×11 Posts

A bit of history about the development of the "PRP-1" technique: having implemented PRP with the Gerbicz error check in the past, I fully appreciate the power and usefulness of that check. When I started to look at P-1, one of the first questions I had was whether there is a similar error check for P-1. I suspected that the PRP error-check technique, using the "ladder", does not directly apply to P-1, so I thought to recover it to some degree by doing the P-1 exponentiation this way: do P-1 with "right-to-left binary exponentiation", where powers 3^(2^k) are multiplied together, with increasing k corresponding to the bits that are set ("pops") in the exponent P of 3^P. The components 3^(2^k) can be covered by the normal Gerbicz error check. Thus, while the P-1 is not fully covered by the check, it is protected to some degree.

Considering the cost of the right-to-left exponentiation in P-1: if I already have the powers 3^(2^k) from the normal PRP, the additional cost of first-stage P-1 is only the multiplications, one for each bit set in P. To reduce this cost, I looked at ways of computing multiples of P that have a low pop count. But this idea (obtaining a multiple with a low pop count) led nowhere, because it's extremely hard to lower the pop count of multiples.

At this point I started to look for other bits of data that are generated anyway in PRP and could be used in P-1. P-1 needs 3^P, where P is extremely smooth (a product of many small factors). My first idea for such interesting data was the check value of the PRP test, which represents 3^(sum(k=0..n, 2^(k*step))), where "step" is the step of the error check.
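The "right-to-left binary exponentiation" mentioned above can be sketched as follows (a generic Python illustration, not code from any GIMPS program); the running square walks through the powers 3^(2^k), and one multiplication is spent per set bit ("pop") of P:

```python
def pow3_rtl(P, N):
    """Compute 3^P mod N by right-to-left binary exponentiation:
    sq runs through 3^(2^k); it is multiplied in for each set bit k of P."""
    result, sq = 1, 3
    while P:
        if P & 1:
            result = result * sq % N
        sq = sq * sq % N
        P >>= 1
    return result
```

The point is that the sequence of squarings sq = 3^(2^k) is exactly what a PRP test produces anyway, so only the `result` multiplications are extra cost.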
The question was: how smooth is (or how good is the factor coverage of) sum(2^(k*step))? Unfortunately it seems that the number of factors of that sum grows like log(k), which is not good enough for very large k. Finally, a better answer was hiding in plain sight: the simple 2^k - 1, it turns out, has better factor coverage than the exponent of the check value, and can be obtained trivially from the 2^k of the PRP test. (It seems that the number of factors of 2^k - 1, for large "good" k, grows approximately like a power of k, say k^0.238, which is much better than log(k) for large k.) So 2^k - 1 is not so much worse than the perfectly powersmooth exponent used by the first stage of classic P-1. But maybe it does have many holes, i.e. smallish factors that aren't included, and that may pose a problem for P-1. And next, the question of how to force-include a set of mandatory factors led me to the idea of doing the PRP in a different base 3^P. The "division by base" (where base != 3) found a solution too. And then it seemed that all this could incredibly, actually work, and I was so excited to share the news!

Last fiddled with by preda on 2018-09-02 at 02:40 Reason: spelling

2018-09-02, 06:39 #7 preda "Mihai Preda" Apr 2015 53·11 Posts

An example of a "good k" for smooth 2^k - 1 might be the multiples of 360. The factors of 2^360 - 1 are: [3, 3, 3, 5, 5, 7, 11, 13, 17, 19, 31, 37, 41, 61, 73, 109, 151, 181, 241, 331, 433, 631, 1321, 23311, 38737, 54001, 61681, 18837001, 29247661, 4562284561, 168692292721, 469775495062434961]

2018-09-02, 14:38 #8 preda "Mihai Preda" Apr 2015 53×11 Posts

So the "PRP-1" test proceeds like this:

1. First, compute the "base" of the PRP test. The base has the form 3^P, where P can be chosen, as a convention, in the same way as for the first stage of P-1 with some bound B1, i.e. powersmooth to B1.

1.1. Compute 3^P in some way. This proceeds in exactly the same way as the first stage of a normal P-1.
Once we have Base = 3^P, save Base to a file for later use, complete this initial P-1 (by doing the GCD), and log the result of this P-1 first stage. The problem here is that the computation of Base is not error-checked. A res64 of Base can be saved for double-checking the correctness of 3^P. Cost up to this point: on the order of 1M iterations, depending on the chosen B1.

2. Assuming no factor found yet, start PRP(Base), with Gerbicz error-checking and periodic P-1. Choose a P-1 "step", e.g. step = 360, and every "step" iterations (so, at each iteration k that is a multiple of step): from Res(k) = Base^(2^k), which we have at iteration k, compute the difference Res(k) - Base, and feed that into the P-1 GCD [if the GCD is fast enough -- otherwise postpone the GCD by multiplying the differences together]. If at any point the GCD discovers a factor, stop.

3. If we reach the end of the PRP (so, without any factor found in the periodic GCDs), check the PRP prime result like this: assuming we're testing M(exp) = 2^exp - 1, we now have Res(exp - 1) = Base^(2^(exp - 1)), where Base == 3^P and P == 2 * exp * powersmooth(B1), so P is even. Because of that, M(exp) is prime IFF Res(exp - 1) == Base. So compare Res(exp - 1) with Base and report prime/not-prime accordingly.

Also send to the server in the result, in addition to prime-or-not-prime:
- information allowing the server to reconstruct P identically, i.e. B1 if the "powersmooth" convention is understood;
- res64 of Base = 3^P (allowing the computation of Base to be double-checked);
- res64 of Res(exp - 1) (allowing the PRP to be double-checked).

(Note: instead of steps 2 & 3 above, a second-stage P-1 can be done using the computed Base.)

2018-09-02, 15:27 #9
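The steps above can be tried end-to-end on toy numbers (a hypothetical cpu-only Python sketch, as a proof of principle; the `powersmooth` convention and the tiny bounds B1 and step are illustrative choices, not realistic ones). On M(37) = 223 * 616318177, the order of 3 mod 223 is 222 = 2 * 3 * 37; with B1 = 2 we get P = 148 = 4 * 37, so the stage-1 GCD misses the factor (222 does not divide 148), but the periodic trial at k = 2 catches it, because 222 divides 148 * (2^2 - 1):

```python
from math import gcd

def powersmooth(B1):
    """Product of the maximal prime powers q^e <= B1 (one common convention)."""
    P = 1
    for q in range(2, B1 + 1):
        if all(q % d for d in range(2, int(q**0.5) + 1)):  # q is prime
            qe = q
            while qe * q <= B1:
                qe *= q
            P *= qe
    return P

def prp_minus_1(exp, B1, step):
    Mp = (1 << exp) - 1
    P = 2 * exp * powersmooth(B1)     # step 1: P is even by construction
    base = pow(3, P, Mp)              # stage-1 P-1 result doubles as PRP base
    g = gcd(base - 1, Mp)             # initial stage-1 GCD
    if 1 < g < Mp:
        return ("factor", g)
    res = base
    for k in range(1, exp):           # step 2: PRP squarings with periodic P-1
        res = res * res % Mp          # res = Base^(2^k)
        if k % step == 0:
            g = gcd(res - base, Mp)   # free "variant P-1" trial at this k
            if 1 < g < Mp:
                return ("factor", g)
    # step 3: since P is even, M(exp) is prime iff Res(exp - 1) == Base
    return ("probable prime", None) if res == base else ("composite", None)
```

With these toy parameters, prp_minus_1(13, 3, 2) reports the prime M(13) = 8191 as a probable prime, and prp_minus_1(37, 2, 2) returns the factor 223 of M(37) from a periodic GCD.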
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

134448 Posts Quote:
 Originally Posted by preda ... I would let the more math-inclined answer that. But 3^P and 2^m being relatively prime is probably a good thing. Also, the Gerbicz error check still works (for PRP), with pretty much the same cost. *AND* this very check provides a degree of coverage to the P-1 too, in the sense that the incoming powers 3^(2^k) are checked.
There's probably a cleaner way to show it, but I now think 3^m is a guaranteed safe base to use for the P-1.
https://en.wikipedia.org/wiki/Pollar...92_1_algorithm
Quote:
 randomly pick a coprime to n
where n is the Mersenne number under consideration: n = 2^p - 1, where p is a prime integer. Prime factors of n (if any exist) are known to be of the form 2kp + 1, where k is some positive integer. Since p > 1, all prime factors of n are greater than 3, so a = 3^m, where m is a positive integer, will be coprime to n.

Also, for the Fermat primality test, https://en.wikipedia.org/wiki/Fermat_primality_test, selecting a = 3^m << n = 2^p - 1 ensures that n is too large to be a factor of a, and that n is not 3.

Something I'm not seeing in your proposal is the inclusion of 2 and p in the small-factor-rich number. Perhaps that goes in your "salting", or I just missed it. Oh, there it is in post 8. But at that point it is too late for the factoring to provide PRP computing-time savings.

The factors of 2^(360b) - 1 seem to give rather thin coverage of small primes compared to a usual stage 1. For b = 1, odd primes start being skipped above 19. Is the cost of each such "porous" P-1 trial, including the gcd (probably on the cpu), low enough that many can be done before an appreciable fraction of the p squarings of the PRP occur, so that finding a factor occasionally is a net savings of computing effort?
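That coverage, and its holes, can be enumerated directly (a quick Python sketch, not from any of the programs discussed): an odd prime q divides 2^360 - 1 exactly when 2^360 mod q == 1, so one modular exponentiation per candidate lists the hits and the misses:

```python
def is_prime(n):
    return n > 1 and all(n % d for d in range(2, int(n**0.5) + 1))

k = 360
hits  = [q for q in range(3, 100, 2) if is_prime(q) and pow(2, k, q) == 1]
holes = [q for q in range(3, 100, 2) if is_prime(q) and pow(2, k, q) != 1]
print(hits)   # [3, 5, 7, 11, 13, 17, 19, 31, 37, 41, 61, 73]
print(holes)  # first holes: 23 and 29 (the order of 2 mod 23 is 11, which does not divide 360)
```

So below 100 the holes are 23, 29, and everything from 43 up except 61 and 73; a hole q appears whenever the multiplicative order of 2 mod q does not divide 360.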

There are at least a couple of ways available for checking the computation of the base: a Jacobi-symbol check (or multiple Jacobi checks with different denominators, to raise the error-detection probability); or computing the base twice, on the gpu and the cpu, or by different transform methods on the gpu (DP and M61?), and comparing.

What's your sense of the relative cost of this modification of the PRP test, versus a traditional P-1 or an unmodified PRP test?

I look forward to a trained mathematician commenting on the PRP-1 approach. They seem to be enjoying the holiday weekend at the moment.

Meanwhile, a worked small example, or a cpu-only proof-of-principle code example, perhaps using gmp heavily, could be very enlightening.

2018-09-02, 16:24 #10 science_man_88 "Forget I exist" Jul 2009 Dumbassville 203008 Posts

As the numbers 2^j - 1 and 2^k - 1 are coprime whenever j and k are, this fails to find a factor at prime values of k.

2018-09-02, 18:08 #11
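The coprimality fact used here follows from the standard identity gcd(2^j - 1, 2^k - 1) = 2^gcd(j, k) - 1, which is easy to confirm numerically (a quick sketch):

```python
from math import gcd

# gcd(2^j - 1, 2^k - 1) = 2^gcd(j, k) - 1, so two exponents that are
# coprime give numbers whose only common divisor is 2^1 - 1 = 1
for j in range(1, 50):
    for k in range(1, 50):
        assert gcd(2**j - 1, 2**k - 1) == 2**gcd(j, k) - 1
print("identity verified for 1 <= j, k < 50")
```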
preda

"Mihai Preda"
Apr 2015

53·11 Posts Quote:
 Originally Posted by kriesel What's your sense of the relative cost of this modification of the PRP test, versus a traditional P-1 or an unmodified PRP test?
The cost of the GCDs can be replaced by the cost of a normal modular multiplication plus an arbitrarily small fraction of the GCD cost, with the drawback of delayed detection of a factor.

At one extreme, we could do the GCD every 360 iterations. But if less GCD overhead is desired, it can be done once every 360K or every 36M iterations, and then the GCD becomes negligible; instead there is one multiplication every 360 iterations. The overhead of that mul is similar to the overhead of the error checking used now (which does one mul every 400 iterations); let's estimate it at 0.5%. This can be reduced by increasing the step (from 360), with the drawback of doing a smaller number of P-1 "trials".
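That batching can be sketched like this (illustrative Python on the toy number M(37) = 223 * 616318177, with an assumed tiny P = 148 and step = 2): instead of a GCD at every checkpoint, the differences Res(k) - Base are multiplied together mod Mp and a single GCD is taken at the end:

```python
from math import gcd

exp = 37
Mp = (1 << exp) - 1                      # M(37) = 223 * 616318177
base = pow(3, 148, Mp)                   # toy stage-1 base 3^P, with P = 148

acc, res = 1, base
for k in range(1, exp):
    res = res * res % Mp                 # PRP squaring: res = base^(2^k)
    if k % 2 == 0:
        acc = acc * (res - base) % Mp    # accumulate instead of an immediate GCD
g = gcd(acc, Mp)                         # one delayed GCD covers all checkpoints
print(g)                                 # the factor 223 divides g
```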

The error check is affected by replacing the "mul-by-3" with "mul-by-Base" in the checking step (which, by default, is now done once every 160K iterations). So, one additional mul every L^2 iterations for the check -- this is so small that it can be ignored.
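For reference, the check itself can be sketched like this (a simplified, hypothetical Python illustration with an arbitrary base; real implementations verify far less often, and roll back to a checkpoint on mismatch). The running product d of every L-th residue must satisfy d_new == d_old^(2^L) * base, which is recomputed independently:

```python
def prp_gerbicz(base, Mp, iters, L):
    """PRP squaring loop with a Gerbicz-style consistency check every L steps."""
    b = base % Mp
    res, d = b, b                    # d = product of checkpoint residues
    for k in range(1, iters + 1):
        res = res * res % Mp         # PRP squaring: res = base^(2^k)
        if k % L == 0:
            d_prev, d = d, d * res % Mp                # cheap update: one mul
            if d != pow(d_prev, 1 << L, Mp) * b % Mp:  # independent recomputation
                raise RuntimeError("error detected in the squaring chain")
    return res
```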

The main cost is in the initial computation of Base=3^P, and this cost is log2(P) normal squaring iterations (with the "mul-by-3" being free, sunk in the squaring). Let's say this costs about 1M or 2M iterations.

But, this initial cost is exactly the same as the cost of first stage P-1 to the same bound B1. The setup of "base" is indeed a P-1 first stage. So it can and should replace a separate P-1 first-stage being done before starting the PRP.

So overall, simplifying, I would say the cost vs. an unmodified PRP is about 1% (but it depends on the B1 and the exponent). This becomes lower when compared vs. (P-1 first stage + PRP).

What we gain, I estimate, is the equivalent of a second-stage P-1 done for free; and I hope even more than that (i.e. "more" meaning the equivalent of a P-1 done to higher B1 and B2).
