20180819, 15:16  #1 
Einyen
Dec 2003
Denmark
5475_{8} Posts 
128bit mulmod request
I was wondering if those experienced in assembly programming could easily and quickly rewrite the standard 64bit mulmod into a 128bit mulmod using AVX/AVX2 instructions or if it is a big task?
Code:
uint64_t mulmod(uint64_t a, uint64_t b, uint64_t c) { uint64_t d; // to hold the result of a*b mod c // calculates a*b mod c, stores result in d asm ("mov %1, %%rax;" // put a into rax "mul %2;" // mul a*b > rdx:rax "div %3;" // (a*b)/c > quot in rax remainder in rdx "mov %%rdx, %0;" // store result in d :"=r"(d) // output :"r"(a), "r"(b), "r"(c) // input :"%rax", "%rdx" // clobbered registers ); return d; } __uint128_t mulmod(__uint128_t a, __uint128_t b, __uint128_t c) or __uint128_t mulmod(uint64_t ah, uint64_t al, uint64_t bh, uint64_t bl, uint64_t ch, uint64_t cl) 
20180819, 16:21  #2 
"Ben"
Feb 2007
6307_{8} Posts 
Is this to be used in a modular exponentiation loop? If so a Montgomery multiplication is probably a better idea.

20180819, 18:34  #3 
Jan 2008
France
1020_{8} Posts 
Does AVX2 have support for 128bit integers? And for integer divides/remainders (no matter the size)?

20180819, 18:51  #4  
Einyen
Dec 2003
Denmark
5475_{8} Posts 
Quote:
I have never learned assembly programming so this would take me a very long time starting from scratch. I would actually like to learn assembly, but I'm not sure I'm up for it right now, so this thread is just to check if anyone already was so well versed in assembly they could do this in like 60 min. I will look into Montgomery multiplication. Last fiddled with by ATH on 20180819 at 18:55 

20180819, 18:56  #5 
Jan 2008
France
2^{4}·3·11 Posts 
If I remember correctly 64x64 > 128 mul is coming with one of the numerous AVX512 variants. I don't know a lot about x86 and I feel good that way

20180819, 20:37  #6 
Tribal Bullet
Oct 2004
DC7_{16} Posts 
No vector x86 variant has vectorized division, and only one corner of AVX512 can do vector 64bit multiplies (with 64bit lowhalf result).
gcc does implement the uint128_t type and has function calls for 128bit multiply and divide, but not division with 256bit numerator. If you are really after a modular multiply then Montgomery arithmetic is by far your best bet, especially if you are always dividing by the same number. We have many experts here on Montgomery arithmetic; sample code that constructs 64bit modular multiplies for a machine (Nvidia GPUs) without 64bit operations can be found here. For each 64bit n, you computed w = montmul_w(bottom word of n) and r = montmul_r(n). Then if a and b are in Montgomery form, you produce a*b mod n in Montgomery form by calling montmul64(a, b, n, w). To convert a number a into Montgomery form, compute montmul64(a, r, n, w) and to convert out compute montmul(a, 1, n, w). There are faster ways to do these things but this way leverages a few primitives only. 
20180819, 21:08  #7 
"Carlos Pinho"
Oct 2011
Milton Keynes, UK
2^{4}·17^{2} Posts 
A bit of background:
http://www.mersenneforum.org/showthr...197#post494197 Does anyone knows if Robert G had any though on this since he did the code for gap? 
20180820, 00:47  #8  
"Ben"
Feb 2007
CC7_{16} Posts 
Quote:
One of the recent additions to yafu's sieve of Eratosthenes was a vectorized Montgomery multiply (8 modmuls in parallel). But it is limited to 32bit reductions as of now. Last fiddled with by bsquared on 20180820 at 00:47 

20180820, 01:16  #9  
"Forget I exist"
Jul 2009
Dumbassville
20C0_{16} Posts 
Quote:
Code:
loop()=if(x<=1000,print(x);x++;eval(self()) Last fiddled with by science_man_88 on 20180820 at 01:16 

20180820, 05:34  #10 
Jan 2008
France
2^{4}·3·11 Posts 
Found this: http://www.acsellab.com/arithmetic/...a/1616a032.pdf
That's offtopic as it discusses AVX512 IFMA (integer FMA) implementation of 1024bit multiplication. IFMA adds 52x52 multiply + add with instructions to get low and high parts of the result. 
20180820, 06:13  #11  
"Robert Gerbicz"
Oct 2005
Hungary
2^{3}×3^{2}×19 Posts 
Quote:


Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
Bug request  Uncwilly  Software  5  20180604 16:52 
Request  pinhodecarlos  Lounge  3  20171026 18:58 
mulmod failing  ATH  Programming  4  20170608 22:22 
Bug/request  Dubslow  YAFU  4  20120331 03:07 
Odd request?  Xyzzy  Lounge  23  20110308 17:50 