 mersenneforum.org Java modPow slower than pow following by mod
 Register FAQ Search Today's Posts Mark Forums Read 2022-11-04, 16:18 #1 rdmclure   Nov 2022 2 Posts Java modPow slower than pow following by mod testing how long it would take to raise 3 to a power x mod 2^p - 1. I tried 2^p - 1 where p = the highest known mersenne prime 2^82589933 because of Java the exponent must be int so less than 2^32 - 1 I chose 2^26 so I try something like Int a = 26; int n = 82589933 int j =0; BigInteger b = new BigInteger(“2”); BigInteger c = new BigInteger (“3”); BigInteger d= new BigInteger (“0”); BigInteger e= new BigInteger (“0”); BigInteger p= new BigInteger (“0”); BigInteger large = new BigInteger (“0”); BigInteger test= new BigInteger(“0”); d=b.pow(a); j = d.intVal(); // exponent must be int e=b.pow(n); // p=e.subtract(BigInteger.ONE); //prime large = c.pow(j); // 3^(2^26); test = large.mod(p); … this takes about a minute and a half to run then I try test = c.modPow(d,p) and I give up after an hour I would feel better if they took roughly the same time.   2022-11-04, 18:20 #2 rogue   "Mark" Apr 2003 Between here and the 156178 Posts 2^p - 1 for your p is a very large number at over 2 million digits. Of course it will take a long time to compute.   2022-11-04, 19:15 #3 rdmclure   Nov 2022 2 Posts but. … it didn’t take a long time . this code is 2 ^ 82539833 a min and forty seconds e=b.pow(n); // p=e.subtract(BigInteger.ONE); //prime It’s the modPow that is hours+   2022-11-04, 19:26   #4
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

165548 Posts Quote:
 Originally Posted by rdmclure testing how long it would take to raise 3 to a power x mod 2^p - 1. I tried 2^p - 1 where p = the highest known mersenne prime 2^82589933 because of Java the exponent must be int so less than 2^32 - 1 I chose 2^26 so I try something like Int a = 26; int n = 82589933 int j =0; BigInteger b = new BigInteger(“2”); BigInteger c = new BigInteger (“3”); BigInteger d= new BigInteger (“0”); BigInteger e= new BigInteger (“0”); BigInteger p= new BigInteger (“0”); BigInteger large = new BigInteger (“0”); BigInteger test= new BigInteger(“0”); d=b.pow(a); //2^26 j = d.intVal(); // exponent must be int e=b.pow(n); //2^82589933 p=e.subtract(BigInteger.ONE); //prime large = c.pow(j); // 3^(2^26); test = large.mod(p); … this takes about a minute and a half to run then I try test = c.modPow(d,p) and I give up after an hour I would feel better if they took roughly the same time.
Testing on how many cores & java streams of what processor?
The highest known mersenne prime is not "2^82589933" nor is its exponent.
226 = 67108864. Reached in 26 squarings.
2^82589933 is hugely bigger. Reached in 82,589,933 squarings.
Try working your way up gradually, rather than in one giant leap, & include accurate timing in the program, to determine run time scaling in acceptable testing time. For example p=
13
127
1279
11213
110503
graph resulting timings log/log & regression fit a power function to the timing data for which run time is a function of exponent. (Very small values may have initialization & other overhead greater than computation time, and should be excluded from the power fit.)

Well written, tuned and optimized compiled GPU code on a fast GPU (gpuowl 6.11-380 on Radeon VII) takes ~10 hours for PRP3 computation on M82589933, and scales as good fft implementations do ~p2 log p log log p ~p2.1. Higher powers are bad. Long multiplication algorithm scales as ~p3.
Naive implementations have been posted, whose run time scalings indicate very long wavefront timings, in one case longer than the estimated age of the universe.

Last fiddled with by kriesel on 2022-11-04 at 19:43   2022-11-04, 19:31   #5
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

22×7×269 Posts Quote:
 Originally Posted by rdmclure but. … it didn’t take a long time . this code is 2 ^ 82539833
The string 82539833 does not appear in post 1.   2022-11-04, 19:31   #6
chalsall
If I May

"Chris Halsall"
Sep 2002

5×2,237 Posts Quote:
 Originally Posted by kriesel ...in one case longer than the estimated age of the universe.
ROFL... Thank you for that! Seriously.    2022-11-05, 11:22   #7
Till

"Tilman Neumann"
Jan 2016
Germany

13·41 Posts Quote:
 Originally Posted by rdmclure testing how long it would take to raise 3 to a power x mod 2^p - 1. I tried 2^p - 1 where p = the highest known mersenne prime 2^82589933 because of Java the exponent must be int so less than 2^32 - 1 I chose 2^26 so I try something like Int a = 26; int n = 82589933 int j =0; BigInteger b = new BigInteger(“2”); BigInteger c = new BigInteger (“3”); BigInteger d= new BigInteger (“0”); BigInteger e= new BigInteger (“0”); BigInteger p= new BigInteger (“0”); BigInteger large = new BigInteger (“0”); BigInteger test= new BigInteger(“0”); d=b.pow(a); j = d.intVal(); // exponent must be int e=b.pow(n); // p=e.subtract(BigInteger.ONE); //prime large = c.pow(j); // 3^(2^26); test = large.mod(p); … this takes about a minute and a half to run then I try test = c.modPow(d,p) and I give up after an hour I would feel better if they took roughly the same time.

The (appended) program computes c^d mod p in both ways, where
* a = 2,3,4,5,...
* c is a random number in [3, 1023]
* d is a random number about the size of 2^a,
* and p is a random number about the size of 2^(2^a)

The results confirm the issue for random numbers of this kind. (see below)

Looking at BigInteger.modPow() source code, it looks quite sophisticated, using Montgomery squaring and multiplication and even tearing the modulus into an "odd part" and power of two,
and finally combining thew results using the chinese remainder theorem. But something the developers must have gotten terribly wrong.

Probably this is not the main cause, but one thing that irritated me for a long time is the following source code comment on BigInteger.montgomeryMultiply():
// Montgomery multiplication. These are wrappers for
// implMontgomeryXX routines which are expected to be replaced by
// virtual machine intrinsics. We don't use the intrinsics for
// very large operands: MONTGOMERY_INTRINSIC_THRESHOLD should be
// larger than any reasonable crypto key.

MONTGOMERY_INTRINSIC_THRESHOLD is 512 and measured in ints; so the intrinsics (optimized assembler code) is not used for modulus > 16384 bit.
This suggests that the method was never supposed to be optimized for large inputs.
Possibly it was only tested for the input sizes required by Java's own crypto libraries.

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
test logs (getting interesting at a > 14)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

a = 2, c = 310, d = 7, computed in 0 ms ---------------------------
p (8 bit) computed in 1 ms
r = c^d (58 bit) computed in 0 ms
r % p (7 bit) computed in 0 ms
c.modPow(d,p) (7 bit) computed in 0 ms
results equal? true

a = 3, c = 36, d = 22, computed in 0 ms ---------------------------
p (24 bit) computed in 0 ms
r = c^d (114 bit) computed in 0 ms
r % p (24 bit) computed in 0 ms
c.modPow(d,p) (24 bit) computed in 0 ms
results equal? true

a = 4, c = 388, d = 24, computed in 0 ms ---------------------------
p (26 bit) computed in 0 ms
r = c^d (207 bit) computed in 0 ms
r % p (26 bit) computed in 0 ms
c.modPow(d,p) (26 bit) computed in 1 ms
results equal? true

a = 5, c = 379, d = 61, computed in 0 ms ---------------------------
p (62 bit) computed in 0 ms
r = c^d (523 bit) computed in 0 ms
r % p (60 bit) computed in 0 ms
c.modPow(d,p) (60 bit) computed in 0 ms
results equal? true

a = 6, c = 820, d = 183, computed in 0 ms ---------------------------
p (184 bit) computed in 0 ms
r = c^d (1772 bit) computed in 0 ms
r % p (183 bit) computed in 0 ms
c.modPow(d,p) (183 bit) computed in 0 ms
results equal? true

a = 7, c = 174, d = 201, computed in 0 ms ---------------------------
p (203 bit) computed in 0 ms
r = c^d (1497 bit) computed in 0 ms
r % p (203 bit) computed in 1 ms
c.modPow(d,p) (203 bit) computed in 0 ms
results equal? true

a = 8, c = 581, d = 345, computed in 0 ms ---------------------------
p (347 bit) computed in 0 ms
r = c^d (3168 bit) computed in 0 ms
r % p (347 bit) computed in 0 ms
c.modPow(d,p) (347 bit) computed in 0 ms
results equal? true

a = 9, c = 118, d = 1159, computed in 0 ms ---------------------------
p (1161 bit) computed in 0 ms
r = c^d (7977 bit) computed in 1 ms
r % p (1159 bit) computed in 0 ms
c.modPow(d,p) (1159 bit) computed in 0 ms
results equal? true

a = 10, c = 181, d = 2157, computed in 0 ms ---------------------------
p (2158 bit) computed in 1 ms
r = c^d (16178 bit) computed in 1 ms
r % p (2158 bit) computed in 0 ms
c.modPow(d,p) (2158 bit) computed in 1 ms
results equal? true

a = 11, c = 490, d = 3648, computed in 0 ms ---------------------------
p (3642 bit) computed in 0 ms
r = c^d (32601 bit) computed in 3 ms
r % p (3642 bit) computed in 2 ms
c.modPow(d,p) (3642 bit) computed in 0 ms
results equal? true

a = 12, c = 983, d = 11308, computed in 0 ms ---------------------------
p (11308 bit) computed in 1 ms
r = c^d (112414 bit) computed in 10 ms
r % p (11305 bit) computed in 8 ms
c.modPow(d,p) (11305 bit) computed in 3 ms
results equal? true

a = 13, c = 871, d = 24939, computed in 0 ms ---------------------------
p (24941 bit) computed in 0 ms
r = c^d (243568 bit) computed in 11 ms
r % p (24941 bit) computed in 9 ms
c.modPow(d,p) (24941 bit) computed in 21 ms
results equal? true

a = 14, c = 331, d = 24853, computed in 0 ms ---------------------------
p (24854 bit) computed in 0 ms
r = c^d (208037 bit) computed in 6 ms
r % p (24854 bit) computed in 6 ms
c.modPow(d,p) (24854 bit) computed in 18 ms
results equal? true

a = 15, c = 714, d = 76517, computed in 0 ms ---------------------------
p (76519 bit) computed in 1 ms
r = c^d (725365 bit) computed in 31 ms
r % p (76519 bit) computed in 34 ms
c.modPow(d,p) (76519 bit) computed in 154 ms
results equal? true

a = 16, c = 803, d = 143429, computed in 0 ms ---------------------------
p (143430 bit) computed in 1 ms
r = c^d (1383984 bit) computed in 104 ms
r % p (143429 bit) computed in 87 ms
c.modPow(d,p) (143429 bit) computed in 349 ms
results equal? true

a = 17, c = 520, d = 380768, computed in 0 ms ---------------------------
p (380770 bit) computed in 2 ms
r = c^d (3435429 bit) computed in 78 ms
r % p (380770 bit) computed in 146 ms
c.modPow(d,p) (380770 bit) computed in 2583 ms
results equal? true

a = 18, c = 792, d = 578458, computed in 0 ms ---------------------------
p (578460 bit) computed in 0 ms
r = c^d (5570179 bit) computed in 321 ms
r % p (578460 bit) computed in 293 ms
c.modPow(d,p) (578460 bit) computed in 6322 ms
results equal? true

a = 19, c = 370, d = 708558, computed in 0 ms ---------------------------
p (708559 bit) computed in 0 ms
r = c^d (6044979 bit) computed in 309 ms
r % p (708556 bit) computed in 330 ms
c.modPow(d,p) (708556 bit) computed in 9976 ms
results equal? true

a = 20, c = 888, d = 2737668, computed in 0 ms ---------------------------
p (2737669 bit) computed in 1 ms
r = c^d (26813859 bit) computed in 1849 ms
r % p (2737667 bit) computed in 2863 ms
c.modPow(d,p) (2737667 bit) computed in 148685 ms
results equal? true

a = 21, c = 126, d = 3460325, computed in 0 ms ---------------------------
p (3460327 bit) computed in 1 ms
r = c^d (24143657 bit) computed in 1902 ms
r % p (3460323 bit) computed in 2721 ms
c.modPow(d,p) (3460323 bit) computed in 260967 ms
results equal? true

a = 22, c = 621, d = 10025779, computed in 0 ms ---------------------------
p (10025777 bit) computed in 3 ms
r = c^d (93023684 bit) computed in 19541 ms
r % p (10025777 bit) computed in 19307 ms
c.modPow(d,p) (10025777 bit) computed in 2299139 ms
results equal? true
Attached Files LargeModPowTest.txt (5.3 KB, 39 views)

Last fiddled with by Till on 2022-11-05 at 11:25 Reason: interesting tests at a>14   2022-11-05, 17:24 #8 Till   "Tilman Neumann" Jan 2016 Germany 13·41 Posts After some more investigations, it seems to me that Montgomery multiplication is a bad choice for this kind of numbers (small base, big exponent, big modulus). Let B be the base, E the exponent, N the modulus. With Montgomery multiplication, modPow() does something like O(log E) multiplications/squares (and possibly modulus computations) of N-size numbers. If B^E is not much bigger than N, then a small fraction of those required Montgomery mults/squares are enough to match the runtime of the "silly" B^E computation. Doing "naive" repeated squaring (without Montgomery) might be even better because it starts with very small products/squares, whose size will surpass the modulus in only a few last operations. Last fiddled with by Till on 2022-11-05 at 17:27 Reason: clean up  Thread Tools Show Printable Version Email this Page Similar Threads Thread Thread Starter Forum Replies Last Post a1call Programming 19 2019-11-08 22:31 lucaf Computer Science & Computational Number Theory 8 2017-10-06 20:56 Ilya Gazman Factoring 3 2016-02-22 11:32 GordonBM Information & Answers 11 2012-04-24 08:27 ThiloHarich Factoring 41 2005-11-28 21:18

All times are UTC. The time now is 10:29.

Fri Mar 24 10:29:06 UTC 2023 up 218 days, 7:57, 0 users, load averages: 1.32, 1.20, 1.10