mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Puzzles (https://www.mersenneforum.org/forumdisplay.php?f=18)
-   -   Is this a puzzle? (https://www.mersenneforum.org/showthread.php?t=25799)

 preda 2020-08-03 10:26

Is this a puzzle?

Consider the number PowerSmooth(1000000) with the 2 factors removed; e.g. in pari-gp:
[CODE]
PowerSmooth(B1)= result=1; forprime(p=3, B1, result *= p^floor(log(B1)/log(p))); return(result)
PowerSmooth(1000000)
[/CODE]

This number, in binary, has 720737 bits set to 1 ("binary hamming weight", hammingweight() in pari-gp).

The puzzle is: find a multiple of this number that has a Hamming Weight <= 710000. (or, How low a HW multiple can you find?)

Thanks! :)

 axn 2020-08-03 13:24

Iterating thru all odd numbers, starting from 3,
[CODE]3:720673
9:720566
21:720169
59:720143
89:719926
127:719857
237:719477
261:719250
817:719025
6923:718588
229095:718410
276755:718351
322621:718138[/CODE]

Using the multiplicative inverse of lower order n bits:
[CODE]10:720468
12:719859
242:719603
607:719455
1945:719364
1954:718901
5343:718576[/CODE]
Here 5343 means the multiplier is Mod( N, 2^5343)^-1

Using random odd numbers < 2^64
[CODE]10719928016004921607:720163
9421399378650073937:719987
9838800997419312683:719646
9031419907504264227:719565
12801182192819992405:719465
11276055256514179587:719415
7687617259492261083:719397
3594335502178041641:719255
6046992651811463807:718858
12624414163832553385:718786
4750919058227643243:718688
18130704136173611049:718654
5224850625010740453:718528
1147537420811012579:718472
7233411429689023907:718469
1993933662070155813:718441
6862363197769235605:718113[/CODE]

 preda 2020-08-04 01:00

I broke through 718K:
multiplying with 1999099*3684553 produces HW=717965

still a long way to go to 710K.. it looks like the "brute force search" is not working very well.

 kruoli 2020-08-04 14:11

Brute force probability analysis.

Hitting < 718K should have a probability around 1 in 2e7, so still achievable by brute force, hitting < 710K is much less likely with 1 in 8e76. Formula used (please correct me if my approach is not applicable):

[$$]p = \frac{\sum_{k=0}^{x}{\binom{\lceil log2(n) \rceil}{k}}}{n}[/$$]
[LIST][*][$]p[/$] is the probability.[*][$]n[/$] in the number we want to "depopulize".[*][$]x[/$] is the target popcount. If our target popcount would be greater than [$]\lceil log2(n) \rceil[/$], take the sum from [$]x[/$] to [$]\lceil log2(n) \rceil[/$].[/LIST]
Since we are multiplying [$]n[/$] with another value to "depopulize" it, the final result number is getting larger. This is not displayed by this formula, but should not make a huge difference, since our value for multiplication is really small in comparison.

 xilman 2020-08-04 14:15

[QUOTE=kruoli;552531]Hitting < 718K should have a probability around 1 in 2e7, so still achievable by brute force, hitting < 710K is much less likely with 1 in 8e76. Formula used (please correct me if my approach is not applicable):

[$$]p = \frac{\sum_{k=0}^{x}{\binom{\lceil log2(n) \rceil}{k}}}{n}[/$$]
[LIST][*][$]p[/$] is the probability.[*][$]n[/$] in the number we want to "depopulize".[*][$]x[/$] is the target popcount. If our target popcount would be greater than [$]\lceil log2(n) \rceil[/$], take the sum from [$]x[/$] to [$]\lceil log2(n) \rceil[/$].[/LIST]
Since we are multiplying [$]n[/$] with another value to "depopulize" it, the final result number is getting larger. This is not displayed by this formula, but should not make a huge difference, since our value for multiplication is really small in comparison.[/QUOTE]There appears to be a typo. Should the exceptional sum be taken from 0, not x?

 kruoli 2020-08-04 14:19

Do you mean my third bullet point? In that case the sum really should still go from [$]x[/$] to [$]\lceil log2(n) \rceil[/$]. But maybe I'm missing something here. :confused2:

 xilman 2020-08-04 14:43

Another possibly interesting question: the quantity hammingweight(PowerSmooth(N))/N appears to converge to a value close to 0.72 ---can one prove that it converges and can one find a relatively simple expression for its value?

 uau 2020-08-04 15:07

[QUOTE=xilman;552537]Another possibly interesting question: the quantity hammingweight(PowerSmooth(N))/N appears to converge to a value close to 0.72 ---can one prove that it converges and can one find a relatively simple expression for its value?[/QUOTE]
I assume that half the bits are ones, so that's about equivalent to length of the number x being 1.44N bits, or log(x) = 1.44N*log(2) = N. The prime number theorem says the sum of log(p) for primes p less than L is about L. The PowerSmooth function multiplies the largest power of the prime less than L instead of just the prime itself, but that shouldn't make too much difference.

So the 0.72 value is probably 1/log(2)/2, where the 1/log(2) comes from the size of the product as predicted by the prime number theorem, and an additional /2 for only half the bits being ones.

 Citrix 2020-08-05 03:57

[QUOTE=xilman;552537]Another possibly interesting question: the quantity hammingweight(PowerSmooth(N))/N appears to converge to a value close to 0.72 ---can one prove that it converges and can one find a relatively simple expression for its value?[/QUOTE]

This should follow a binomial distribution-I suspect. Though no proof as bits are not independent.

 axn 2020-08-05 08:14

[QUOTE=preda;552502]I broke through 718K:
multiplying with 1999099*3684553 produces HW=717965

still a long way to go to 710K.. it looks like the "brute force search" is not working very well.[/QUOTE]

Minor improvements from brute force:
[CODE]27275863:717813
93899447:717739[/CODE]

 axn 2020-08-05 09:39

[CODE]136110417:717547
173564051:717463[/CODE]

 R. Gerbicz 2020-08-05 16:26

[QUOTE=axn;552604][CODE]136110417:717547
173564051:717463[/CODE][/QUOTE]

I'm at set bits=644353 for an equivalent defined problem AND still counting.

 axn 2020-08-05 16:47

I'm at set bits=644353 for an equivalent defined problem AND still counting.[/QUOTE]

LOL. More like too dumb, too slow :smile:

BTW, Interested in the your problem and solution approach.

 R. Gerbicz 2020-08-05 19:12

[QUOTE=axn;552651]BTW, Interested in the your problem and solution approach.[/QUOTE]

OK, here it is:
Using an improved method Houston, reached 111.181 set bits (!), and with lower limits of the solution this could work easily on GPU also. [note that we started from 720737 ones of N !!!].

See the history here: [url]https://mersenneforum.org/showpost.php?p=552398&postcount=21[/url] why we need this for P-1 test
(from another topic)
The problem is to compute x^N mod M given x^(2^e) mod M for all (small values).
It is similar to addition chain problem but here the 2^e is given free.

Use the sliding window method, you could ask why a left to right method works here,
the short reason is that because the "bases" x^(2^e) is given.

So let N=sum(i=0,t,2^e[i]*u[i]), where for a fixed E>0:
0<u[i]<2^(E+1) and u[i] is odd, and in each possible choice we choose the maximal u[] that still fits
in (0,2^(E+1)) [this choice is unique].

You need to compute x^N mod M what is x^sum(i=0,t,2^e[i]*u[i]) mod M=

prod(i=1,2^(E+1),2,(x^(2^e[i]))^u[i]) =

= prod(i=1,2^(E+1)-1,2,(prod(x^(2^e[j]) : u[j]=i))^i) mod M =

= prod(i=1,2^(E+1)-1,2,y[i]^i) mod M,

Here for each i we defined y[i]=prod(x^(2^e[j]) : u[j]=i)
And can still compute that final product with a chain method:

Say for E=2:

you need R=y[1]*y[3]^3*y[5]^5*y[7]^7

a0=y[7]
res=y[7]

a1=y[5]*a0 (=y[5]*y[7])
res=res*a1 (=y[5]*y[7]^2)

a2=y[3]*a1 (=y[3]*y[5]*y[7])
res=res*a2 (=y[3]*y[5]^2*y[7]^3)

a4=y[1]*a3 (=y[1]*y[3]*y[5]*y[7])
res=res^2
res=res*a4 (=y[1]*y[3]^3*y[5]^5*y[7]^7)

then R=res.

So basically you'd need only 2^(E+1) mulmods in this last part.

Returning to our original problem, the optimal solution is using E=13, with O(2^E*length(M)) memory, not store the x^(2^e) numbers in memory just the y[] array.
Use such number that is still optimal/fits in GPU memory.

 uau 2020-08-05 21:53

[QUOTE=R. Gerbicz;552676]OK, here it is:
Using an improved method Houston, reached 111.181 set bits (!), and with lower limits of the solution this could work easily on GPU also. [note that we started from 720737 ones of N !!!].[/QUOTE]

Hmm what do you mean by bit count here? Are you talking about the same thing (number of 1 bits in N*m for some integer m)? The algorithm you describe seems to be about computing x^N mod M more cheaply, I don't see how it'd apply to finding an m such that N*m has fewer set bits...

 R. Gerbicz 2020-08-05 22:20

[QUOTE=uau;552695]Hmm what do you mean by bit count here? Are you talking about the same thing (number of 1 bits in N*m for some integer m)? The algorithm you describe seems to be about computing x^N mod M more cheaply, I don't see how it'd apply to finding an m such that N*m has fewer set bits...[/QUOTE]

See: the original task for P-1 method is to get x^N mod M, but with multiplying N with a "small" number is still good for the method, so when you compute r(k)=x^(k*N) mod M [and then gcd(r(k)-1,M)]. In our case x^(2^e) is given free, because you're computing the prp residue for M=2^p-1. The idea from Preda was to reuse these residues to compute the required residue for the P-1 test cheaper than the standard way, and it is really possible you need just hamming(N) mulmods if log2(N)=b this is roughly b/2 for random N, and here you can still use a small multiple of N to lower this, what this topic started.

In my method I'm using much smaller number of multiplications to compute this x^N mod M, if all x^(2^e) mod M is given. Note that this is a non-constant improvement, in general we can reach O(b/log(b)) mulmods, what is much better than the average b/2. You could still use the k*N idea here but as in above that gives very small gain.

 uau 2020-08-05 22:38

[QUOTE=R. Gerbicz;552699]In my method I'm using much smaller number of multiplications to compute this x^N mod M, if all x^(2^e) mod M is given.[/QUOTE]
Yes I got that part, what I meant to ask was what the "reached 111.181 set bits" was measuring - in context it first seemed to be about the N*m thing, but then your algorithm was about something else. So you mean something like your algorithm computing the value using 111181 multiply operations?

 R. Gerbicz 2020-08-05 22:43

[QUOTE=uau;552703]Yes I got that part, what I meant to ask was what the "reached 111.181 set bits" was measuring - in context it first seemed to be about the N*m thing, but then your algorithm was about something else. So you mean something like your algorithm computing the value using 111181 multiply operations?[/QUOTE]

Yes, so used the "number of set bits"="number of multiplications" to describe the solution's goodness.
Written a small Pari code to get that number for E=13 for the given N.

 preda 2020-08-06 06:14

[QUOTE=R. Gerbicz;552676]OK, here it is:
[/QUOTE]

Nice Robert, thank you! I look forward to implementing this, I have a feeling P-1 first-stage just got a speed boost :)

(disclaimer: when used in conjuction with PRP)

 bsquared 2020-08-06 17:11

If it is possible/cheap to compute 3^-1 mod M, then it is possible to do even a little better by first recoding the exponent in signed representation {-1, 0, 1}.

[CODE]
powersmooth(1000000) has 1442080 bits
exp has 720737 non-zeros
recoded exp has 480780 non-zeros
sliding windowed recoded exp needs [B][U]100622[/U][/B] muls
[/CODE]

Edit:
Also, all of the signed exponent words are odd. Storage of the precomputed window terms doubles because you need to precompute and store 2^-E through 2^E, but is then halved because you only need to precompute/store half of them (the odd ones), so you still need storage of O(2^E*length(M)).

 R. Gerbicz 2020-08-06 17:54

[QUOTE=bsquared;552785]If it is possible/cheap to compute 3^-1 mod M, then it is possible to do even a little better by first recoding the exponent in signed representation {-1, 0, 1}.

[CODE]
powersmooth(1000000) has 1442080 bits
exp has 720737 non-zeros
recoded exp has 480780 non-zeros
sliding windowed recoded exp needs [B][U]100622[/U][/B] muls
[/CODE]
[/QUOTE]

Really don't understand, could you explain it?
But I've just revisited my pari code and forget to subtract one, so my updated code gives only 104341 multiplications for E=13 (and that is still the optimal).
In general for random N, and if we fill up the whole y[] array, then we can expect log2(N)/(E+2)+2^E multiplications (in our case this formula gives 104330 as approx. so quite close to the real number).

 bsquared 2020-08-06 18:44

[QUOTE=R. Gerbicz;552786]Really don't understand, could you explain it?
But I've just revisited my pari code and forget to subtract one, so my updated code gives only 104341 multiplications for E=13 (and that is still the optimal).
In general for random N, and if we fill up the whole y[] array, then we can expect log2(N)/(E+2)+2^E multiplications (in our case this formula gives 104330 as approx. so quite close to the real number).[/QUOTE]

I'm just looking for ways to minimize the number of multiplications in a modular exponentiation x^N mod M.
In the [URL="http://cacr.uwaterloo.ca/hac/about/chap14.pdf"]handbook of applied cryptography[/URL], chapter 14 algorithm 14.85 we can just use left-to-right sliding windows to get the number of multiplications down to 102989 plus precomputing 2^12 odd multipliers x^b, 1 <= b < 2^k (b odd), using the (optimal) max window size k=13.

Algorithm 14.121 describes how to recode the exponent in signed digits.
For example e = 2^9 + 2^8 + 2^6 + 2^5 + 2^4 + 2^2 + 2 + 1 = 2^10 - 2^7 - 2^3 - 1.

Doing the recoding first on our 1442080-bit exponent reduces the number of non-zero bits from to 720737 (one bits) to 480780 (one or minus-one bits).
Then doing the sliding window algorithm on the recoded exponent results in 100622 muls (plus precomputation). Those multiplications will range over elements x^b, -2^13 < b < 2^13, b odd.

So, now that I've written all this down, I see that recoding probably doesn't save much over just doing sliding window. Especially since you need x^-1 mod M with recoding.

 bsquared 2020-08-06 18:59

Maybe also relevant to mention that the problem of finding the shortest addition chain for a given binary string is known to be NP-hard (section 14.102 in HOAC). But you'd only have to do it once for a fixed B1.

 R. Gerbicz 2020-08-06 19:42

[QUOTE=bsquared;552789]
Doing the recoding first on our 1442080-bit exponent reduces the number of non-zero bits from to 720737 (one bits) to 480780 (one or minus-one bits).
Then doing the sliding window algorithm on the recoded exponent results in 100622 muls (plus precomputation). Those multiplications will range over elements x^b, -2^13 < b < 2^13, b odd.

So, now that I've written all this down, I see that recoding probably doesn't save much over just doing sliding window. Especially since you need x^-1 mod M with recoding.[/QUOTE]

Searched today exactly that crypto paper, thanks.
I'll check it.

 R. Gerbicz 2020-08-06 20:18

[QUOTE=bsquared;552789]we can just use left-to-right sliding windows to get the number of multiplications down to 102989 plus precomputing 2^12 odd multipliers x^b, 1 <= b < 2^k (b odd), using the (optimal) max window size k=13.
[/QUOTE]

Yes, I've used the popular sliding window name, but here there is no such precomputation here. Yeah it is hard to believe there is such an algorithm, because in that sliding window you are actually doing many squaring, so you could not reach less than log2(N)=b multiplications, that is a complexity barrier here. But as I've written with the given x^(2^e) you can reach ~b/2 multiplications if you are using only the bits=1, what is now lower than the barrier. But there is another type of "precomputation" when in y[k] you're accumulating the products you needed to raise to the k-th power.

To get an inside where I've gotten the idea, when you want to multiple by x^63 with the traditional method you need 6 multiplications but with:

1. multiple x and x^4 to get x^5
2. multiple x^5 and x^16 to get x^21
3. square x^21 to get x^42
4. multiple x^21 and x^42 to get x^63

You need only 5, because you need to multiple the partial product by x^63. But now it is already a gain over the classical method. Found this in my head, without computer, and actually this is the smallest counter-example.
You can do even better multiple by x^64 and divide by x. Using only 3 mults total. Similar to your method.
And here x is NOT fixed, but if you extract those x bases using the given N exponent/residues and see what you need to raise to the k=63 or any power in the window you can save lots of work, what we reached with the optimal E=13.

 bsquared 2020-08-06 20:42

I see that you are finding a short addition chain using the x^(2^k) elements that will be computed anyway during the test. That's a very nice approach. As mentioned, the short addition chain problem is hard, so I was looking at how other known methods compared. I had thought that exponent recoding gave a larger benefit but after looking closer it is not much better than a standard sliding window. Both of those known methods are not any better than what you have found (but close).

 R. Gerbicz 2020-08-06 20:58

[QUOTE=bsquared;552795]That's a very nice approach. As mentioned, the short addition chain problem is hard.[/QUOTE]

What addition chains here is now obsolete in the method. See one of my post, the required final task for E=2 to get:
R=y[1]*y[3]^3*y[5]^5*y[7]^7 mod M.

 R. Gerbicz 2020-08-06 23:05

[QUOTE=bsquared;552789]
Doing the recoding first on our 1442080-bit exponent reduces the number of non-zero bits from to 720737 (one bits) to 480780 (one or minus-one bits).
Then doing the sliding window algorithm on the recoded exponent results in 100622 muls (plus precomputation). Those multiplications will range over elements x^b, -2^13 < b < 2^13, b odd.
[/QUOTE]

Checked those 480780, 100622 numbers. Turned out that it is using more multiplications, I counted every mults, and to get the final product from the y[] array you also need mults. I mean this is not that a big surprise, for the old/known powmod problem your signed representation could be a winner, but this is a slightly different problem.
Counted 104979 mults for your method but using E=13, and 106085 for E=12.
[I have 104341 for my method with E=13].

Actually when you first hit a given entry in the y[] array then you can do only a (fast) set and not a mult, so that is smaller than 100622 mults (in my method you can also do this trick). And for the exponents (for E=12) yes it is true that - 2^(E+1)< exponent <2^(E+1) but since in the signed representation you don't see two consectuive non-zero term the sharper inequality holds:
abs(exponent)<=2^E*4/3 (just sum 2^E,2^(E-2),2^(E-4) etc.)
even with these tiny improvements you lost (at least for this given fixed N), but likely in the general case also.

 R. Gerbicz 2020-08-19 01:44

I've found another algorithm to compute g^exponent once you are given all g^(2^k) [mod N]. Turned out it is using more mulmods, but this still depends how efficiently you partition the bits=1 in the exponent.
The problem with the fixed window size is that you are killing those ones not that efficiently (because only half of the bits is one), what about using not fixed size windows? Fix an offset vector: offset=[0,...,], and we're searching shifts t, for that for all i index at offset[i]+t there is a one in exponent, and to get many hits say size(offset)<=14 for our N. We can use an abnormally large window, say offset[i]<2000. Accumulate the product of these x(k)=prod(g^(2^t)), at the end you need to calculate prod(x(k)^e(k)), do now the standard way, collect the product for each 2^h in the exponent e(k), note that e(k)=prod(2^offset[]), so e(k) contains very few bits=1.The complexity of each offset is:
count-1+size(offset) mulmods, where you have count different t shits for the offset, hence its rate:
rate=(count-1+size(offset))/(size(offset)*count)
[you need to lower the rate].

 All times are UTC. The time now is 09:29.