![]() |
[QUOTE=axn;552604][CODE]136110417:717547
173564051:717463[/CODE][/QUOTE] [url]https://www.youtube.com/watch?v=EY27lgnPKWI&feature=youtu.be&t=9[/url] I'm at set bits=644353 for an equivalent defined problem AND still counting. |
[QUOTE=R. Gerbicz;552648][url]https://www.youtube.com/watch?v=EY27lgnPKWI&feature=youtu.be&t=9[/url]
I'm at set bits=644353 for an equivalent defined problem AND still counting.[/QUOTE] LOL. More like too dumb, too slow :smile: BTW, Interested in the your problem and solution approach. |
[QUOTE=axn;552651]BTW, Interested in the your problem and solution approach.[/QUOTE]
OK, here it is: Using an improved method Houston, reached 111.181 set bits (!), and with lower limits of the solution this could work easily on GPU also. [note that we started from 720737 ones of N !!!]. See the history here: [url]https://mersenneforum.org/showpost.php?p=552398&postcount=21[/url] why we need this for P-1 test (from another topic) The problem is to compute x^N mod M given x^(2^e) mod M for all (small values). It is similar to addition chain problem but here the 2^e is given free. Use the sliding window method, you could ask why a left to right method works here, the short reason is that because the "bases" x^(2^e) is given. So let N=sum(i=0,t,2^e[i]*u[i]), where for a fixed E>0: 0<u[i]<2^(E+1) and u[i] is odd, and in each possible choice we choose the maximal u[] that still fits in (0,2^(E+1)) [this choice is unique]. You need to compute x^N mod M what is x^sum(i=0,t,2^e[i]*u[i]) mod M= prod(i=1,2^(E+1),2,(x^(2^e[i]))^u[i]) = = prod(i=1,2^(E+1)-1,2,(prod(x^(2^e[j]) : u[j]=i))^i) mod M = = prod(i=1,2^(E+1)-1,2,y[i]^i) mod M, Here for each i we defined y[i]=prod(x^(2^e[j]) : u[j]=i) And can still compute that final product with a chain method: Say for E=2: you need R=y[1]*y[3]^3*y[5]^5*y[7]^7 a0=y[7] res=y[7] a1=y[5]*a0 (=y[5]*y[7]) res=res*a1 (=y[5]*y[7]^2) a2=y[3]*a1 (=y[3]*y[5]*y[7]) res=res*a2 (=y[3]*y[5]^2*y[7]^3) a4=y[1]*a3 (=y[1]*y[3]*y[5]*y[7]) res=res^2 res=res*a4 (=y[1]*y[3]^3*y[5]^5*y[7]^7) then R=res. So basically you'd need only 2^(E+1) mulmods in this last part. Returning to our original problem, the optimal solution is using E=13, with O(2^E*length(M)) memory, not store the x^(2^e) numbers in memory just the y[] array. Use such number that is still optimal/fits in GPU memory. |
[QUOTE=R. Gerbicz;552676]OK, here it is:
Using an improved method Houston, reached 111.181 set bits (!), and with lower limits of the solution this could work easily on GPU also. [note that we started from 720737 ones of N !!!].[/QUOTE] Hmm what do you mean by bit count here? Are you talking about the same thing (number of 1 bits in N*m for some integer m)? The algorithm you describe seems to be about computing x^N mod M more cheaply, I don't see how it'd apply to finding an m such that N*m has fewer set bits... |
[QUOTE=uau;552695]Hmm what do you mean by bit count here? Are you talking about the same thing (number of 1 bits in N*m for some integer m)? The algorithm you describe seems to be about computing x^N mod M more cheaply, I don't see how it'd apply to finding an m such that N*m has fewer set bits...[/QUOTE]
See: the original task for P-1 method is to get x^N mod M, but with multiplying N with a "small" number is still good for the method, so when you compute r(k)=x^(k*N) mod M [and then gcd(r(k)-1,M)]. In our case x^(2^e) is given free, because you're computing the prp residue for M=2^p-1. The idea from Preda was to reuse these residues to compute the required residue for the P-1 test cheaper than the standard way, and it is really possible you need just hamming(N) mulmods if log2(N)=b this is roughly b/2 for random N, and here you can still use a small multiple of N to lower this, what this topic started. In my method I'm using much smaller number of multiplications to compute this x^N mod M, if all x^(2^e) mod M is given. Note that this is a non-constant improvement, in general we can reach O(b/log(b)) mulmods, what is much better than the average b/2. You could still use the k*N idea here but as in above that gives very small gain. |
[QUOTE=R. Gerbicz;552699]In my method I'm using much smaller number of multiplications to compute this x^N mod M, if all x^(2^e) mod M is given.[/QUOTE]
Yes I got that part, what I meant to ask was what the "reached 111.181 set bits" was measuring - in context it first seemed to be about the N*m thing, but then your algorithm was about something else. So you mean something like your algorithm computing the value using 111181 multiply operations? |
[QUOTE=uau;552703]Yes I got that part, what I meant to ask was what the "reached 111.181 set bits" was measuring - in context it first seemed to be about the N*m thing, but then your algorithm was about something else. So you mean something like your algorithm computing the value using 111181 multiply operations?[/QUOTE]
Yes, so used the "number of set bits"="number of multiplications" to describe the solution's goodness. Written a small Pari code to get that number for E=13 for the given N. |
[QUOTE=R. Gerbicz;552676]OK, here it is:
[/QUOTE] Nice Robert, thank you! I look forward to implementing this, I have a feeling P-1 first-stage just got a speed boost :) (disclaimer: when used in conjuction with PRP) |
If it is possible/cheap to compute 3^-1 mod M, then it is possible to do even a little better by first recoding the exponent in signed representation {-1, 0, 1}.
[CODE] powersmooth(1000000) has 1442080 bits exp has 720737 non-zeros recoded exp has 480780 non-zeros sliding windowed recoded exp needs [B][U]100622[/U][/B] muls [/CODE] Edit: Also, all of the signed exponent words are odd. Storage of the precomputed window terms doubles because you need to precompute and store 2^-E through 2^E, but is then halved because you only need to precompute/store half of them (the odd ones), so you still need storage of O(2^E*length(M)). |
[QUOTE=bsquared;552785]If it is possible/cheap to compute 3^-1 mod M, then it is possible to do even a little better by first recoding the exponent in signed representation {-1, 0, 1}.
[CODE] powersmooth(1000000) has 1442080 bits exp has 720737 non-zeros recoded exp has 480780 non-zeros sliding windowed recoded exp needs [B][U]100622[/U][/B] muls [/CODE] [/QUOTE] Really don't understand, could you explain it? But I've just revisited my pari code and forget to subtract one, so my updated code gives only 104341 multiplications for E=13 (and that is still the optimal). In general for random N, and if we fill up the whole y[] array, then we can expect log2(N)/(E+2)+2^E multiplications (in our case this formula gives 104330 as approx. so quite close to the real number). |
[QUOTE=R. Gerbicz;552786]Really don't understand, could you explain it?
But I've just revisited my pari code and forget to subtract one, so my updated code gives only 104341 multiplications for E=13 (and that is still the optimal). In general for random N, and if we fill up the whole y[] array, then we can expect log2(N)/(E+2)+2^E multiplications (in our case this formula gives 104330 as approx. so quite close to the real number).[/QUOTE] I'm just looking for ways to minimize the number of multiplications in a modular exponentiation x^N mod M. In the [URL="http://cacr.uwaterloo.ca/hac/about/chap14.pdf"]handbook of applied cryptography[/URL], chapter 14 algorithm 14.85 we can just use left-to-right sliding windows to get the number of multiplications down to 102989 plus precomputing 2^12 odd multipliers x^b, 1 <= b < 2^k (b odd), using the (optimal) max window size k=13. Algorithm 14.121 describes how to recode the exponent in signed digits. For example e = 2^9 + 2^8 + 2^6 + 2^5 + 2^4 + 2^2 + 2 + 1 = 2^10 - 2^7 - 2^3 - 1. Doing the recoding first on our 1442080-bit exponent reduces the number of non-zero bits from to 720737 (one bits) to 480780 (one or minus-one bits). Then doing the sliding window algorithm on the recoded exponent results in 100622 muls (plus precomputation). Those multiplications will range over elements x^b, -2^13 < b < 2^13, b odd. So, now that I've written all this down, I see that recoding probably doesn't save much over just doing sliding window. Especially since you need x^-1 mod M with recoding. |
All times are UTC. The time now is 08:39. |
Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2022, Jelsoft Enterprises Ltd.