mersenneforum.org  

Go Back   mersenneforum.org > Extra Stuff > Programming

Reply
 
Thread Tools
Old 2018-08-19, 15:16   #1
ATH
Einyen
 
ATH's Avatar
 
Dec 2003
Denmark

54758 Posts
Default 128bit mulmod request

I was wondering if those experienced in assembly programming could easily and quickly rewrite the standard 64bit mulmod into a 128bit mulmod using AVX/AVX2 instructions or if it is a big task?


Code:
uint64_t mulmod(uint64_t a, uint64_t b, uint64_t c) {
  uint64_t d; //  to hold the result of a*b mod c
  //  calculates a*b mod c, stores result in d
  asm ("mov %1, %%rax;"        // put a into rax
       "mul %2;"               // mul a*b -> rdx:rax
       "div %3;"               // (a*b)/c -> quot in rax remainder in rdx
       "mov %%rdx, %0;"        // store result in d
       :"=r"(d)                // output
       :"r"(a), "r"(b), "r"(c) // input
       :"%rax", "%rdx"         // clobbered registers
      );
  return d;
}
either using the __uint128_t variable or just using high and low 64 bit for each variable:

__uint128_t mulmod(__uint128_t a, __uint128_t b, __uint128_t c)

or

__uint128_t mulmod(uint64_t ah, uint64_t al, uint64_t bh, uint64_t bl, uint64_t ch, uint64_t cl)
ATH is online now   Reply With Quote
Old 2018-08-19, 16:21   #2
bsquared
 
bsquared's Avatar
 
"Ben"
Feb 2007

63078 Posts
Default

Quote:
Originally Posted by ATH View Post
I was wondering if those experienced in assembly programming could easily and quickly rewrite the standard 64bit mulmod into a 128bit mulmod using AVX/AVX2 instructions or if it is a big task?
Is this to be used in a modular exponentiation loop? If so a Montgomery multiplication is probably a better idea.
bsquared is offline   Reply With Quote
Old 2018-08-19, 18:34   #3
ldesnogu
 
ldesnogu's Avatar
 
Jan 2008
France

10208 Posts
Default

Quote:
Originally Posted by ATH View Post
I was wondering if those experienced in assembly programming could easily and quickly rewrite the standard 64bit mulmod into a 128bit mulmod using AVX/AVX2 instructions or if it is a big task?
Does AVX2 have support for 128-bit integers? And for integer divides/remainders (no matter the size)?
ldesnogu is online now   Reply With Quote
Old 2018-08-19, 18:51   #4
ATH
Einyen
 
ATH's Avatar
 
Dec 2003
Denmark

54758 Posts
Default

Quote:
Originally Posted by ldesnogu View Post
Does AVX2 have support for 128-bit integers? And for integer divides/remainders (no matter the size)?
I'm not sure, I found 128bit and 256bit vectors made of 4 or 8 32 bit integers, but I did not find 128 or 256 bit integers. I always assumed AVX2 had 128bit and 256bit for both integers and floating point (and AVX512 had 512bit) but maybe not.

I have never learned assembly programming so this would take me a very long time starting from scratch. I would actually like to learn assembly, but I'm not sure I'm up for it right now, so this thread is just to check if anyone already was so well versed in assembly they could do this in like 60 min.

I will look into Montgomery multiplication.

Last fiddled with by ATH on 2018-08-19 at 18:55
ATH is online now   Reply With Quote
Old 2018-08-19, 18:56   #5
ldesnogu
 
ldesnogu's Avatar
 
Jan 2008
France

24·3·11 Posts
Default

If I remember correctly 64x64 -> 128 mul is coming with one of the numerous AVX-512 variants. I don't know a lot about x86 and I feel good that way
ldesnogu is online now   Reply With Quote
Old 2018-08-19, 20:37   #6
jasonp
Tribal Bullet
 
jasonp's Avatar
 
Oct 2004

DC716 Posts
Default

No vector x86 variant has vectorized division, and only one corner of AVX-512 can do vector 64-bit multiplies (with 64-bit low-half result).

gcc does implement the uint128_t type and has function calls for 128-bit multiply and divide, but not division with 256-bit numerator. If you are really after a modular multiply then Montgomery arithmetic is by far your best bet, especially if you are always dividing by the same number.

We have many experts here on Montgomery arithmetic; sample code that constructs 64-bit modular multiplies for a machine (Nvidia GPUs) without 64-bit operations can be found here. For each 64-bit n, you computed w = montmul_w(bottom word of n) and r = montmul_r(n). Then if a and b are in Montgomery form, you produce a*b mod n in Montgomery form by calling montmul64(a, b, n, w). To convert a number a into Montgomery form, compute montmul64(a, r, n, w) and to convert out compute montmul(a, 1, n, w). There are faster ways to do these things but this way leverages a few primitives only.
jasonp is online now   Reply With Quote
Old 2018-08-19, 21:08   #7
pinhodecarlos
 
pinhodecarlos's Avatar
 
"Carlos Pinho"
Oct 2011
Milton Keynes, UK

24·172 Posts
Default

A bit of background:

http://www.mersenneforum.org/showthr...197#post494197

Does anyone knows if Robert G had any though on this since he did the code for gap?
pinhodecarlos is offline   Reply With Quote
Old 2018-08-20, 00:47   #8
bsquared
 
bsquared's Avatar
 
"Ben"
Feb 2007

CC716 Posts
Default

Quote:
Originally Posted by jasonp View Post
No vector x86 variant has vectorized division, and only one corner of AVX-512 can do vector 64-bit multiplies (with 64-bit low-half result).
Just as crucially, none of the x86 vector extensions implements add with carry, except the one-and-done Knight's Corner instruction set. Emulating those ends up being almost as bad as the lack of 64x64 multiplies.

One of the recent additions to yafu's sieve of Eratosthenes was a vectorized Montgomery multiply (8 modmuls in parallel). But it is limited to 32-bit reductions as of now.

Last fiddled with by bsquared on 2018-08-20 at 00:47
bsquared is offline   Reply With Quote
Old 2018-08-20, 01:16   #9
science_man_88
 
science_man_88's Avatar
 
"Forget I exist"
Jul 2009
Dumbassville

20C016 Posts
Default

Quote:
Originally Posted by ATH View Post

I have never learned assembly programming so this would take me a very long time starting from scratch. I would actually like to learn assembly, but I'm not sure I'm up for it right now, so this thread is just to check if anyone already was so well versed in assembly they could do this in like 60 min.
I haven't used it in a while, and was bad when I tried, but the basics of loops is that they are implementable as if statements. Similar to how in PARI/GP you can do:
Code:
loop()=if(x<=1000,print(x);x++;eval(self())
and then define a value of x, and call loop() and print ( or other things) in a loop you make by having the if statement call itself. yes you can also just call it by name similar to assembler. There of course many loops already now.

Last fiddled with by science_man_88 on 2018-08-20 at 01:16
science_man_88 is offline   Reply With Quote
Old 2018-08-20, 05:34   #10
ldesnogu
 
ldesnogu's Avatar
 
Jan 2008
France

24·3·11 Posts
Default

Found this: http://www.acsel-lab.com/arithmetic/...a/1616a032.pdf


That's off-topic as it discusses AVX-512 IFMA (integer FMA) implementation of 1024-bit multiplication. IFMA adds 52x52 multiply + add with instructions to get low and high parts of the result.
ldesnogu is online now   Reply With Quote
Old 2018-08-20, 06:13   #11
R. Gerbicz
 
R. Gerbicz's Avatar
 
"Robert Gerbicz"
Oct 2005
Hungary

23×32×19 Posts
Default

Quote:
Originally Posted by pinhodecarlos View Post
A bit of background:

http://www.mersenneforum.org/showthr...197#post494197

Does anyone knows if Robert G had any though on this since he did the code for gap?
Don't have time for this, but for segmented sieve you would need to extend it from 32 bits to 64 bits, so at least actually this part isn't that tricky.
R. Gerbicz is offline   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
Bug request Uncwilly Software 5 2018-06-04 16:52
Request pinhodecarlos Lounge 3 2017-10-26 18:58
mulmod failing ATH Programming 4 2017-06-08 22:22
Bug/request Dubslow YAFU 4 2012-03-31 03:07
Odd request? Xyzzy Lounge 23 2011-03-08 17:50

All times are UTC. The time now is 14:52.

Tue Aug 4 14:52:44 UTC 2020 up 18 days, 10:39, 0 users, load averages: 1.61, 1.38, 1.44

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2020, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.