mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Puzzles (https://www.mersenneforum.org/forumdisplay.php?f=18)
-   -   Is this a puzzle? (https://www.mersenneforum.org/showthread.php?t=25799)

 R. Gerbicz 2020-08-05 16:26

[QUOTE=axn;552604][CODE]136110417:717547
173564051:717463[/CODE][/QUOTE]

I'm at set bits=644353 for an equivalent defined problem AND still counting.

 axn 2020-08-05 16:47

I'm at set bits=644353 for an equivalent defined problem AND still counting.[/QUOTE]

LOL. More like too dumb, too slow :smile:

BTW, Interested in the your problem and solution approach.

 R. Gerbicz 2020-08-05 19:12

[QUOTE=axn;552651]BTW, Interested in the your problem and solution approach.[/QUOTE]

OK, here it is:
Using an improved method Houston, reached 111.181 set bits (!), and with lower limits of the solution this could work easily on GPU also. [note that we started from 720737 ones of N !!!].

See the history here: [url]https://mersenneforum.org/showpost.php?p=552398&postcount=21[/url] why we need this for P-1 test
(from another topic)
The problem is to compute x^N mod M given x^(2^e) mod M for all (small values).
It is similar to addition chain problem but here the 2^e is given free.

Use the sliding window method, you could ask why a left to right method works here,
the short reason is that because the "bases" x^(2^e) is given.

So let N=sum(i=0,t,2^e[i]*u[i]), where for a fixed E>0:
0<u[i]<2^(E+1) and u[i] is odd, and in each possible choice we choose the maximal u[] that still fits
in (0,2^(E+1)) [this choice is unique].

You need to compute x^N mod M what is x^sum(i=0,t,2^e[i]*u[i]) mod M=

prod(i=1,2^(E+1),2,(x^(2^e[i]))^u[i]) =

= prod(i=1,2^(E+1)-1,2,(prod(x^(2^e[j]) : u[j]=i))^i) mod M =

= prod(i=1,2^(E+1)-1,2,y[i]^i) mod M,

Here for each i we defined y[i]=prod(x^(2^e[j]) : u[j]=i)
And can still compute that final product with a chain method:

Say for E=2:

you need R=y[1]*y[3]^3*y[5]^5*y[7]^7

a0=y[7]
res=y[7]

a1=y[5]*a0 (=y[5]*y[7])
res=res*a1 (=y[5]*y[7]^2)

a2=y[3]*a1 (=y[3]*y[5]*y[7])
res=res*a2 (=y[3]*y[5]^2*y[7]^3)

a4=y[1]*a3 (=y[1]*y[3]*y[5]*y[7])
res=res^2
res=res*a4 (=y[1]*y[3]^3*y[5]^5*y[7]^7)

then R=res.

So basically you'd need only 2^(E+1) mulmods in this last part.

Returning to our original problem, the optimal solution is using E=13, with O(2^E*length(M)) memory, not store the x^(2^e) numbers in memory just the y[] array.
Use such number that is still optimal/fits in GPU memory.

 uau 2020-08-05 21:53

[QUOTE=R. Gerbicz;552676]OK, here it is:
Using an improved method Houston, reached 111.181 set bits (!), and with lower limits of the solution this could work easily on GPU also. [note that we started from 720737 ones of N !!!].[/QUOTE]

Hmm what do you mean by bit count here? Are you talking about the same thing (number of 1 bits in N*m for some integer m)? The algorithm you describe seems to be about computing x^N mod M more cheaply, I don't see how it'd apply to finding an m such that N*m has fewer set bits...

 R. Gerbicz 2020-08-05 22:20

[QUOTE=uau;552695]Hmm what do you mean by bit count here? Are you talking about the same thing (number of 1 bits in N*m for some integer m)? The algorithm you describe seems to be about computing x^N mod M more cheaply, I don't see how it'd apply to finding an m such that N*m has fewer set bits...[/QUOTE]

See: the original task for P-1 method is to get x^N mod M, but with multiplying N with a "small" number is still good for the method, so when you compute r(k)=x^(k*N) mod M [and then gcd(r(k)-1,M)]. In our case x^(2^e) is given free, because you're computing the prp residue for M=2^p-1. The idea from Preda was to reuse these residues to compute the required residue for the P-1 test cheaper than the standard way, and it is really possible you need just hamming(N) mulmods if log2(N)=b this is roughly b/2 for random N, and here you can still use a small multiple of N to lower this, what this topic started.

In my method I'm using much smaller number of multiplications to compute this x^N mod M, if all x^(2^e) mod M is given. Note that this is a non-constant improvement, in general we can reach O(b/log(b)) mulmods, what is much better than the average b/2. You could still use the k*N idea here but as in above that gives very small gain.

 uau 2020-08-05 22:38

[QUOTE=R. Gerbicz;552699]In my method I'm using much smaller number of multiplications to compute this x^N mod M, if all x^(2^e) mod M is given.[/QUOTE]
Yes I got that part, what I meant to ask was what the "reached 111.181 set bits" was measuring - in context it first seemed to be about the N*m thing, but then your algorithm was about something else. So you mean something like your algorithm computing the value using 111181 multiply operations?

 R. Gerbicz 2020-08-05 22:43

[QUOTE=uau;552703]Yes I got that part, what I meant to ask was what the "reached 111.181 set bits" was measuring - in context it first seemed to be about the N*m thing, but then your algorithm was about something else. So you mean something like your algorithm computing the value using 111181 multiply operations?[/QUOTE]

Yes, so used the "number of set bits"="number of multiplications" to describe the solution's goodness.
Written a small Pari code to get that number for E=13 for the given N.

 preda 2020-08-06 06:14

[QUOTE=R. Gerbicz;552676]OK, here it is:
[/QUOTE]

Nice Robert, thank you! I look forward to implementing this, I have a feeling P-1 first-stage just got a speed boost :)

(disclaimer: when used in conjuction with PRP)

 bsquared 2020-08-06 17:11

If it is possible/cheap to compute 3^-1 mod M, then it is possible to do even a little better by first recoding the exponent in signed representation {-1, 0, 1}.

[CODE]
powersmooth(1000000) has 1442080 bits
exp has 720737 non-zeros
recoded exp has 480780 non-zeros
sliding windowed recoded exp needs [B][U]100622[/U][/B] muls
[/CODE]

Edit:
Also, all of the signed exponent words are odd. Storage of the precomputed window terms doubles because you need to precompute and store 2^-E through 2^E, but is then halved because you only need to precompute/store half of them (the odd ones), so you still need storage of O(2^E*length(M)).

 R. Gerbicz 2020-08-06 17:54

[QUOTE=bsquared;552785]If it is possible/cheap to compute 3^-1 mod M, then it is possible to do even a little better by first recoding the exponent in signed representation {-1, 0, 1}.

[CODE]
powersmooth(1000000) has 1442080 bits
exp has 720737 non-zeros
recoded exp has 480780 non-zeros
sliding windowed recoded exp needs [B][U]100622[/U][/B] muls
[/CODE]
[/QUOTE]

Really don't understand, could you explain it?
But I've just revisited my pari code and forget to subtract one, so my updated code gives only 104341 multiplications for E=13 (and that is still the optimal).
In general for random N, and if we fill up the whole y[] array, then we can expect log2(N)/(E+2)+2^E multiplications (in our case this formula gives 104330 as approx. so quite close to the real number).

 bsquared 2020-08-06 18:44

[QUOTE=R. Gerbicz;552786]Really don't understand, could you explain it?
But I've just revisited my pari code and forget to subtract one, so my updated code gives only 104341 multiplications for E=13 (and that is still the optimal).
In general for random N, and if we fill up the whole y[] array, then we can expect log2(N)/(E+2)+2^E multiplications (in our case this formula gives 104330 as approx. so quite close to the real number).[/QUOTE]

I'm just looking for ways to minimize the number of multiplications in a modular exponentiation x^N mod M.
In the [URL="http://cacr.uwaterloo.ca/hac/about/chap14.pdf"]handbook of applied cryptography[/URL], chapter 14 algorithm 14.85 we can just use left-to-right sliding windows to get the number of multiplications down to 102989 plus precomputing 2^12 odd multipliers x^b, 1 <= b < 2^k (b odd), using the (optimal) max window size k=13.

Algorithm 14.121 describes how to recode the exponent in signed digits.
For example e = 2^9 + 2^8 + 2^6 + 2^5 + 2^4 + 2^2 + 2 + 1 = 2^10 - 2^7 - 2^3 - 1.

Doing the recoding first on our 1442080-bit exponent reduces the number of non-zero bits from to 720737 (one bits) to 480780 (one or minus-one bits).
Then doing the sliding window algorithm on the recoded exponent results in 100622 muls (plus precomputation). Those multiplications will range over elements x^b, -2^13 < b < 2^13, b odd.

So, now that I've written all this down, I see that recoding probably doesn't save much over just doing sliding window. Especially since you need x^-1 mod M with recoding.

All times are UTC. The time now is 23:20.