![]() |
[QUOTE=Nick;475879]The standard way to do this, in cases where you know the factorization of the exponent, is with the Chinese Remainder Theorem.[/QUOTE]
How? I know that when you need to do Mod(a,p*q)^b, you can do Mod(a,p)^b and Mod(a,q)^b and recover the final result using CRT. But how does one proceed in the case of Mod(a,b)^(p*q) using CRT? |
[QUOTE=a1call;475897]But by multiple threads calculating different small exponents and then multiplying the results and by doing this in multiple hierarchies, many computers will do simultaneously and different and small exponent calculations that should save time.[/QUOTE]
I'd like to see an example of how this would work. Suppose I want to compute 2^74207281 mod a big prime and I have 100 computers at my disposal which can communicate with each of the others instantaneously. Each can compute multiplication mod the prime in one step. How many steps does it take you to find the result? Of course this is an optimistic scenario because connectivity is usually just with near neighbors and it always introduces delay. |
[QUOTE=CRGreathouse;475954]I'd like to see an example of how this would work. Suppose I want to compute 2^74207281 mod a big prime and I have 100 computers at my disposal which can communicate with each of the others instantaneously. Each can compute multiplication mod the prime in one step. How many steps does it take you to find the result?
Of course this is an optimistic scenario because connectivity is usually just with near neighbors and it always introduces delay.[/QUOTE] Ignoring the initial small powers, that's about 26 sequential square-mods. So while you can't parallelize those, you can still parallelize within each square-mod. Assuming an arbitrary p with no special form that can be exploited Each square-mod would be: [LIST][*]Multiply-by-reciprocal of p.[*]Multiply-by-p and subtract.[/LIST]Assuming FFT multiply, the forward transforms for both p and the reciprocal of p can be precomputed. So both of the multiply steps above reduce to 1 forward and 1 inverse transform each. If you fuse the tail ends of those, you get around 6 compute passes with implied data transposes in between. So that would be approximately 26 x 6 = 156 sequential steps. Each step depends on the previous. But there is unlimited parallelism within each step. ----- In any case 2^74207281 is too small. Perhaps you meant 2^(2^74207281)? |
[QUOTE=CRGreathouse;475954]I'd like to see an example of how this would work. Suppose I want to compute 2^74207281 mod a big prime and I have 100 computers at my disposal which can communicate with each of the others instantaneously. Each can compute multiplication mod the prime in one step. How many steps does it take you to find the result?
Of course this is an optimistic scenario because connectivity is usually just with near neighbors and it always introduces delay.[/QUOTE] I will be back to work tomorrow so not likely I will have the time to give a detailed example If I had the time I world time some power mod to a given exponent. Break the exponent in two. Time the power mod for each which should be less, assume they were done simultaneously on two computers and multiply the results. That would be s proof of concept. Exponentiation and multiplication are essentially the same. The trick is to do simultaneous operations in one level of hierarchy multiply the results up the narrowing hierarchy reaching a single value. |
[QUOTE=Mysticial;475961]In any case 2^74207281 is too small.[/QUOTE]
I thought I'd start with a work-by-hand size example, and if there's progress we can move up. I get [SPOILER]31 steps[/SPOILER] with 1 machine and [SPOILER]27 steps[/SPOILER] with [SPOILER]2 or more machines[/SPOILER]. |
[QUOTE=axn;475931]How?
I know that when you need to do Mod(a,p*q)^b, you can do Mod(a,p)^b and Mod(a,q)^b and recover the final result using CRT. But how does one proceed in the case of Mod(a,b)^(p*q) using CRT?[/QUOTE] You're right (of course), I meant the modulus. The exponent is merely reduced modulo p-1 for each prime p dividing the modulus. |
[QUOTE=Mysticial;475961]
Assuming an arbitrary p with no special form that can be exploited Each square-mod would be: [LIST][*]Multiply-by-reciprocal of p.[*]Multiply-by-p and subtract.[/LIST][/QUOTE]Please explain what you mean by "reciprocal" in this instance. That's a serious question and is intended to prompt you into thinking more clearly about your algorithm at a somewhat greater level of detail. |
[QUOTE=xilman;476026]Please explain what you mean by "reciprocal" in this instance.
That's a serious question and is intended to prompt you into thinking more clearly about your algorithm at a somewhat greater level of detail.[/QUOTE] I think I forgot the square itself. So it's 3 forward/inverse FFT pairs. At each iteration: [LIST][*]Square the value: s = r0^2[*]Multiply square by 1/p to get quotient: q = Floor(s / p)[*]Multiply by p and subtract obtain new residual: r1 = s - q * p[/LIST]Precomputation steps would be: [LIST][*]Forward FFT of p with convolution bitlength 2N.[*]Forward FFT of 1/p with precision of N and convolution bitlength of 2N.[/LIST]Where N is slightly more than bitlength(p). |
[QUOTE=Mysticial;476033]I think I forgot the square itself. So it's 3 forward/inverse FFT pairs.
At each iteration: [LIST][*]Square the value: s = r0^2[*]Multiply square by 1/p to get quotient: q = Floor(s / p)[*]Multiply by p and subtract obtain new residual: r1 = s - q * p[/LIST]Precomputation steps would be: [LIST][*]Forward FFT of p with convolution bitlength 2N.[*]Forward FFT of 1/p with precision of N and convolution bitlength of 2N.[/LIST]Where N is slightly more than bitlength(p).[/QUOTE] So what you meant was modular multiplicative inverse ? |
[QUOTE=science_man_88;476035]So what you meant was modular multiplicative inverse ?[/QUOTE]Nice to see someone paying attention. :wink:
Please go on. (Both of you.) |
[QUOTE=science_man_88;476035]So what you meant was modular multiplicative inverse ?[/QUOTE]
Standard inverse - IOW, a floating-point number. ----- Actually, I think it's possible to do better than 3 forward/inverse FFT pairs of length 2N. That last operation (s - q * p) will have destructive cancellation of the upper-half of the result. Since you know what the upper-half will look like (the same as s = r0^2), you can halve the convolution length to allow the wrap-around. Then "undo" the wrap-around by using the upper-half of (s = r0^2). Not saying it'll be easy to implement. But unless I missed something, at least in theory, it should work out. Anyways, I digress since it's merely a (small) constant-factor improvement. |
| All times are UTC. The time now is 23:28. |
Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.