 mersenneforum.org Lurid Obsession with Mod Inverse
 Register FAQ Search Today's Posts Mark Forums Read  2012-07-29, 20:32 #1 only_human   "Gang aft agley" Sep 2002 2×1,877 Posts Lurid Obsession with Mod Inverse E. Mayer is working on a paper "Efficient Long Division Via Montgomery Multiply." Among the things presented is an efficient Mod Inverse algorithm. I've been interested in this algorithm and wanted to think aloud about it without being a serious distraction so I am moving my lurid obsession into the Miscellaneous Math forum. Here I will indulge myself with muttering and pattern idolotry and welcome people pointing out mistakes I make or offering suggestions.   2012-07-29, 20:56 #2 only_human   "Gang aft agley" Sep 2002 375410 Posts For a given odd number q, what is the inverse within a power of two modulus? q * qinv ≡ 1 (mod 2b) for b = 1: q = 1, qinv = 1 for b = 2: q = 1, qinv = 1 ; the inverse 1 is always 1 q = 3, qinv = 3 ; 2b - 1 is ≡ - 1 mod (2b) 1 is its own inverse; -1 is its own inverse for b = 3: q = 1, qinv = 1 q = 3, qinv = 3 q = 5, qinv = 5 q = 7, qinv = 7 These are the smallest valid modular inverses and can also be trivially explained by noting that any odd number squared is ≡ 1 (mod 8) The mod operation is intended to take advantage an intrinsic computer data width as part of the computation. Any reduced residue system should work, so many different values of qinv might achieve the desired result using a particular bit width to truncate multiplication results.   2012-07-29, 21:11 #3 only_human   "Gang aft agley" Sep 2002 2×1,877 Posts for a bit width (b) of 4 bits, I calculated the q,qinv to be (1,2),(3,11),(5,13),(7,7),(9,9),(11,3),(13,5),(15,15) looking at q * qinv in binary, the results are: 0000 0001 0010 0001 0100 0001 0011 0001 0101 0001 0010 0001 0100 0001 1110 0001 Looking at the least significant bits we are preparing to begin discussion of an algorithm presented in E. Mayer's paper that over successive iterations converges the q * qinv value to 1 for a modulo a particular bit width.   2012-07-29, 21:36 #4 science_man_88   "Forget I exist" Jul 2009 Dumbassville 26×131 Posts I made a pari code for the brute force method to get these values: Code: for(b=4,4,forstep(q=1,2^b-1,2,print(((2^b+1)/q)%(2^b)","b","q)))   2012-07-29, 21:39 #5 only_human   "Gang aft agley" Sep 2002 2×1,877 Posts In the last post, 1 will always be the correct value of inverse for 1. The above 8 qinv values are valid inverses for varying maximum bit widths: ∞, 5, 6, 4, 4, 5, 6, 5. One point to note is that these widths are from the q * qinv result. E. Mayer's paper presents an algorithm that will always converge on a proper qinv value as long as the initial guess has at least the least significant bit correct. Since these are all odd numbers, that bit is always 1. Considering the valid bits to be the bits of the product q * qinv that end in 0...01, the algorithm is quadratic in the sense that each iteration doubles number of valid bits. Each qinv value is calculated in successive approximation qinvnew = q * (2-qinvold) He does an optimization on the last step but I skip talking about that for now. Last fiddled with by only_human on 2012-07-29 at 22:05 Reason: Deleted a simplification that I want to double check   2012-07-29, 21:41   #6
only_human

"Gang aft agley"
Sep 2002

2×1,877 Posts Quote:
 Originally Posted by science_man_88 I made a pari code for the brute force method to get these values: Code: for(b=4,4,forstep(q=1,2^b-1,2,print(((2^b+1)/q)%(2^b)","b","q)))
Ok, thanks. I will look at that and more after getting a bit more text into the thread.   2012-07-29, 21:57 #7 only_human   "Gang aft agley" Sep 2002 2·1,877 Posts Now, even if the number of converged bit doubles per iteration, the first few iterations seem particularly pointless because only a small number of bits will be calculated. Usually in these cases, a table lookup might do the trick. Picking a particular size table lookup is a bit problematic because doubling the table size only saves one iteration of the algorithm and the algorithm is already very fast) E. Mayer presents a good initial guess method that he received in a private communication from P. Montgomery. Using qinv= (3 * q) XOR 2, values will be valid for at least 5 bits. That saves a couple of iterations from the onset (if it were good for 8 bits it would save another iteration). I find the pattern of this interesting and these aspects are definitely stepping into the subforum of Miscellaneous Math's navel gazing but I intend to look a bit further and entertain myself without looking too stupid (if possible). That is the direction I am following at the moment. Last fiddled with by only_human on 2012-07-29 at 21:59   2012-07-29, 22:40 #8 only_human   "Gang aft agley" Sep 2002 EAA16 Posts Ok, I have just been working on paper and won't get much further that way but I thought I would lay down the hen scratching of what the initial guess looks like using that 3q xor 2 trick. Here are the initial values for 5 bit q values. (binary for q*qinv, (q, qinv), with of value of 1 at the end of product): Code: 0000 0000 0001 (1,1),∞ 0000 0010 0001 (3,11),5 0000 0100 0001 (5,13),6 0000 1010 0001 (7,23),5 0000 1110 0001 (9,25),5 0001 1000 0001 (11,35),7 0001 1110 0001 (13,37),5 0010 1100 0001 (15,47),6 0011 0100 0001 (17,49),6 0100 0110 0001 (19,59),5 0101 0000 0001 (21,61),8 0110 0110 0001 (23,71),5 0111 0010 0001 (25,73),5 1000 1100 0001 (27, 83),6 1001 1010 0001 (29,85),5 1011 1000 0001 (31,95),7 Looking at the binary, q * qinv always ends with at least five bits of a value of 1. Of the 16 lines, 8 of them are have a 5 bit width, 4 have a width of 6 bits , 2 have a width of 7 bits , 1 has width of 8 bits and lastly the product of 1 * 1 has the maximum width of the representation used. Looking at the top half of the table to compare qinv calculations to those I calculated in earlier messages for a q value limited to 4 bits, it can be compared to the inverse calculations I did in posts 3 and 5 of this thread. The width of the 1 value from the product of q * qinv for the first 8 values of this table (using the 3q xor 2 trick) are: ∞, 5, 6, 5, 5, 7, 5, 6 versus the values I calculated earlier: ∞, 5, 6, 4, 4, 5, 6, 5. This is interesting because, using the trick, half of the values are 5 bits and each subsequent higher bit with has half as many occurrences as the previous lower bit width. That is not true for the for the q * qinv products I calculated in earlier messages for a four bit q value. Last fiddled with by only_human on 2012-07-29 at 23:13 Reason: deleted word converged. No iterations have be run from initial guess as yet   2012-07-29, 23:39   #9
only_human

"Gang aft agley"
Sep 2002

1110101010102 Posts Now I reach the the motivation for spawning this thread. From E. Mayer's thread about his paper, I have these things that I want to consider:
Quote:
 Originally Posted by ewmayer Possibly - I did some more data mining just now and found that the number of good bits in qinv_0 resulting from the (3q ^ 2) trick depends on q (mod 16): 1/3/5/7/9/b/d/f give >= 6/5/6/5/5/6/5/6 good bits in qinv_0, respectively. That is surely a consequence of some fairly simply bit-pattern stuff going on. More interestingly, the mod-16 values which give "at least 5 bits" (3,7,9,13) *never* seem to give more than 5, whereas the ones which give "at least 6" (1,5,11,15) all give 6,7,8,... good bits with probability halving with each additional "lucky" bit of precision. Feel free to go nuts on this ... maybe you can write a paper on it, or something. :) I need to wrap this part up for now so I can focus on what I'd really intended to do this week, namely the loop-folding optimizations of the remaindering and quotient algorithms for the 64-bit case. (But one should always welcome interesting distractions - I had one such happen to me back in May, and it turned into the nascent paper. :)
Quote:
 Originally Posted by Random Poster There seems to be an interesting pattern here: Code: q (mod 16) q^2 (mod 32) bits 0001 or 1111 00001 >=6 0011 or 1101 01001 5 0101 or 1011 11001 >=6 0111 or 1001 10001 5 So the number of good bits depends on whether the leftmost two bits of q^2 (mod 32) are the same or different. I guess that more than 6 bits corresponds with the bit being repeated more than twice in q^2 (3 times for 7 bits, 4 times for 8 bits and so on).
So my curiosity is piqued about the number of good bits.

Also it has been noticed that after iterating the algorithm a few times, a region of 1 bits accumulates immediately to the left of the converged 0..01 value. I think that this region of 1s grows 1 bit wider per iteration. I want to look at that too.

My thinking is that these 1s accumulate as a sort of overflow which shouldn't optimally happen. Now the single bit immediately to the left of the 0..01 value has to be 1 because otherwise it would be part of the 0...01 value. But, if this region of 1s is growing 1 bit wider per iteration, then from a crude hand wave, each bit in the q * qinv product corresponds to half a bit of q or qinv because log2(q *qinv) = log2q +log2qinv. Not really because q is fixed but the idea is that a n-bit * n-bit multiply produces a 2n-bit result; so if the 2n-bit result has a region of 1s that is 1 bit wider per iteration, something is at least half a bit wrong (or could be improved perhaps by rounding) in the qinv value per iteration.

Last fiddled with by only_human on 2012-07-30 at 00:09   2012-07-29, 23:49   #10
only_human

"Gang aft agley"
Sep 2002

2×1,877 Posts Quote:
 Originally Posted by science_man_88 I made a pari code for the brute force method to get these values: Code: for(b=4,4,forstep(q=1,2^b-1,2,print(((2^b+1)/q)%(2^b)","b","q)))
Is this the Pari implemention compiled for windows at http://pari.math.u-bordeaux.fr/ that I should try?
Quote:
 Basic GP binary for Windows This is a very basic GP binary, without online help, or any optional package (but they can be downloaded separately). Stable version: gp-2-5-1.exe, 4060 KBy, Feb 6 19:15:44 2012 md5sum: 1a3b59ac9e4ee2d0ec2dc3004a4e1fb0 gp-2-5-1.exe

Last fiddled with by only_human on 2012-07-30 at 00:05   2012-07-29, 23:56   #11
science_man_88

"Forget I exist"
Jul 2009
Dumbassville

100000110000002 Posts Quote:
 Originally Posted by only_human Is this the Pari implemention compiled for windows available http://pari.math.u-bordeaux.fr/ that I should try?
you don't have to try it but it ( my given code) should work to your first values almost no matter what version of pari you use since most of it is a basic for loop setup so the versions it works on should mainly depend on when the divisors function and % for mod came in.

one thing that crossed my mind is that q*qinv=1 mod 2^b can be rewritten as q*qinv = z*2^b+1 for known q and b is there a way to find something about the other values ?

edit 2: I put in another mod in my printing in a newer version since it the values get over a certain value they no longer look like mod values

Last fiddled with by science_man_88 on 2012-07-30 at 00:04   Thread Tools Show Printable Version Email this Page Similar Threads Thread Thread Starter Forum Replies Last Post paul0 Math 6 2017-07-25 16:41 wreck NFS@Home 1 2016-05-08 15:44 Raman Math 5 2011-04-13 23:29 flouran Math 1 2010-01-18 23:48 ewmayer Lounge 3 2005-08-21 18:09

All times are UTC. The time now is 06:53.

Fri Jun 5 06:53:09 UTC 2020 up 72 days, 4:26, 0 users, load averages: 0.88, 1.03, 1.03