![]() |
|
|
#12 | |
|
"Gang aft agley"
Sep 2002
2·1,877 Posts |
I deleted some code I had in this message. I spun it by hand and it looked wrong.
In message 5, I wrote in error Quote:
qinvj = q * (2 - (q *qinv j-1)) Last fiddled with by only_human on 2012-07-30 at 01:25 Reason: no delete option showed in my browser. Code is botched, need to review |
|
|
|
|
|
|
#13 |
|
"Gang aft agley"
Sep 2002
375410 Posts |
Ok, I can now show how the initial guess of qinv = 3q ⊕ 2 is good for 5 bits.
There are a couple of rough spots and it is not elegant yet but I will lay it out and then the patterns should be more apparent. First b0 is the least significant bit of n. n = 2t + b0 Next q must be an odd number. Even numbers are preprocessed outside this algorithm. q = 2n + 1 ⊕ means exclusive-or. The first guess for qinv: qinv = 3q ⊕ 2 q * qinv = xxxxx00001 (it has at least five good bits) So now I will try to show why qinv = 3q ⊕ 2 = 3(2n + 1) ⊕ 2 = (6n + 3) ⊕ 2 = (6n + 2 + 1) ⊕ 2 = (2(3n + 1) + 1) ⊕ 2 The point of separating it like this is so that I can move the exclusive-or. Normally I could not do something like this because the result of +,⊕ is different from ⊕,+ but in this case I can. In this specific case, the exclusive-or affects a single bit of a number (bit 1). Only the even term is affected so I can move the exclusive-or to the left. = 2(3n + 1) ⊕ 2 + 1 In this step the xor still acts on the same bit as before (note scaling on the left) = 2((3n + 1) ⊕ 1) + 1 The first thing to know here is that the least significant bit of 3n is the same as the least significant bit of n. The next thing to know is that result of (b0 + 1) ⊕ 1 = 2b0 These two add operations leave b0 unchanged. The first add will toggle b0 and cause a carry out if b0 was previously a 1. The second add (xor) operation can't do a carry; it merely restores b0 to its initial value. So the only result of the two different add operations is the carry out which equals 2b0 This was our previous step: = 2((3n + 1) ⊕ 1) + 1 Now getting rid of the xor: = 2(3n + 2b0) + 1 = 6n +4b0 + 1 Ok, we are done rearranging qinv. Now we want to multiply by q and see what result we get: q(6n +4b0 + 1) = (2n + 1)(6n +4b0 + 1) = 12n2 + 8n + 8nb0 + 4b0 + 1 The first case is if b0 = 0: = 12n2 + 8n + also because in this case b0 is 0, n is even so call it 2t. = 12(2t)2 + 8(2t) + 1 = 12(4t2) + 8(2t) + 1 = 48t2 + 16t + 1 = 16(3t2 +t) + 1 but (3t2 + t) is even so = 32 ((3t2 + t)/2) + 1 and this case generates at least 5 good bits. The next case only generates 5 good bits, never more. Case b0=1 = 12n2 + 8n + 8nb0 + 4b0 + 1 = 12n2 + 16n + 4 + 1 = 4(3n2 + 4n + 1) + 1 = 4((3n2 + 1) + 4n) + 1 now (3n2 + 1) is divisible by four. The reason for this is a little tricky and is the reason this path never gives more than 5 good bits. Since b0 is = 1, n is odd. Now an odd number squared ≡ 1 (mod 8). So the last three bits of it look like this 0012. And the last three bits of 3n2 when n is odd will be 0112; adding 1 to that results in 1002 (which is divisible by 4). = 4((3n2 + 1) + 4n) + 1 =16((3n2 + 1)/4 +1) + 1 But one more power of two can be factored out: 1002/4 = 1, adding that other 1 in the parenthesis results in 2. Factoring that out gives: 32(((3n2 + 1)/4 +1)/2) + 1 And that is 5 good bits, but only 5 bits. the next higher bit has to be a 1 in this path. Last fiddled with by only_human on 2012-07-31 at 12:27 |
|
|
|
|
|
#14 |
|
"Gang aft agley"
Sep 2002
375410 Posts |
One thing to add here is the math in the preceding message has an error in it. Actual computations show that cases where the q * qinv initially has 5 good bits do not match my math. So a mistake exists that I will now try to find.
Initially I say: First b0 is the least significant bit of n. n = 2t + b0 Next q must be an odd number. Even numbers are preprocessed outside this algorithm. q = 2n + 1 also q = 2(2t + b0) + 1 = 4t +2b0 + 1 = 22t + 21b0+ 20 Before looking at the data, I will mention that I combined two different algebras in my effort here but assured myself that it would be acceptable. A later thought was that when I ultimately (symbolically) split computation paths based on the b0 variable I used an intrinsically hidden boolean And function that (in this case) should be equivalent in both algebras. What might occur in other instances need not matter, because in this case I decide based on b0 multiplied by (exactly) some power of 2. The decision line: = 12n2 + 8n + (8b0)n + (4b0) + 1 = 12n2 + 8n + 8(1*b0)n + 4(1*b0) + 1 So I will need to look at that more closely to see if I made a bad assumption there. One thing I notice is that the least significant bit of n is the same as b0, so in this term: 8(1*b0)n, I need not include "n" within the boolean And. I don't actually know of any spot that is problematic and these what-ifs are beginning to get excessive, so I need to regroup and look at actual data. So in my next post I will lay out some bit patterns and try to see what went wrong. Last fiddled with by only_human on 2012-07-31 at 20:36 |
|
|
|
|
|
#15 | |||
|
"Gang aft agley"
Sep 2002
2·1,877 Posts |
Ok, I have identified a wrong assumption in my math:
Quote:
Quote:
Quote:
t is unrestricted (may be even or odd) So I need to consider that t = 2ua t2 = 22ua2 Anyway, that's what I think so now I will stare into these patterns and see what I see. Last fiddled with by only_human on 2012-07-31 at 22:39 |
|||
|
|
|
|
|
#16 |
|
"Gang aft agley"
Sep 2002
2·1,877 Posts |
So what does any of this mean?
Well I think the math in message 13 is either correct or close to correct. It shows how 5 bits are good from an initial qinv value produced by the 3q xor 2 trick. Where this all goes off the rails is in trying to look further into the pattern of bits. That portion runs noisy and fruitless so far. I'm going to take a break and look at it later. |
|
|
|
|
|
#17 |
|
(loop (#_fork))
Feb 2006
Cambridge, England
72×131 Posts |
Well, you have a working 2-adic algorithm here and are trying to find the right way to initialise it; I found that initialising with the correct bottom eight bits (IE a lookup in a 128-byte table) gets you the right 64-bit answer in no more than three iterations and that was probably enough.
Also I tried initialising with the correct bottom four bits, but I noticed that they were x^(((x>>1)^(x>>2)&1)<<3) which takes longer to calculate than the Montgomery five-bits-accurate 3n^2; that took no more than four iterations. |
|
|
|
|
|
#18 |
|
"Gang aft agley"
Sep 2002
EAA16 Posts |
I am going to do a reboot of message #13 and try not to wander off into the weeds this time.
Assume positive values. q is odd, so: q = 2n + 1 3q = 6n + 3 qinv = 3q ⊕ 2 qinv = (6n + 3) ⊕ 2 6n is even so adding 1 to it will not result in any carries between bits so do that step: qinv = ((6n + 1) + 2) ⊕ 2 This next step is an interesting and useful trick ( '& ' means the boolean 'and' operation): (a + b) ⊕ b = a + 2(a & b) The reason this is so is because '+' is an arithmetic add and can cause carries to the left (hence the 2), the '⊕' is another add operation but the bits are in F2 so there are no carries. The two different add operations in this order restore 'a' to its initial value except for the carries from the first add, which are still added to 'a'. so looking at qinv again: qinv = ((6n + 1) + 2) ⊕ 2 = ( 6n+1) + 2((6n + 1) & 2) In this case, the second term is a single bit value and I wish to simplify. n = 2t + b b ∈ {0,1} This breaks out the least significant bit value of n. By careful checking I can establish that 2((6n + 1) & 2) evaluates as 4b So the initial qinv value can be restated as qinv = 6n + 1 + 4b This is in agreement with my previous walk-though in message #13, but this time I am more confident and not confused about boolean versus regular algebra. Next I want to evaluate q * qinv so this is a good breaking point. Last fiddled with by only_human on 2012-08-01 at 06:48 |
|
|
|
|
|
#19 | |
|
"Gang aft agley"
Sep 2002
72528 Posts |
Quote:
What spurred my interest for this thread is I am surprised that the Montgomery trick is so good for an initial value. Also I want to manipulate this a little so that I can better understand why (and when) the first q * qinv value is sometimes better than 5 bits. Also, when doing iterations starting from the Montgomery value, q* qinv accumulates a string of contiguous 1's to the left of the converged 000..001 value. I hope that by manipulating the algebra I will be able to understand why that happens. |
|
|
|
|
|
|
#20 |
|
"Gang aft agley"
Sep 2002
2×1,877 Posts |
qinv = 6n + 4b + 1
So now that we have the initial qinv in algebraic form, multiply by q to evaluate the quality of the initial qinv value as an inverse to q q * qinv = q(6n +4b + 1) = (2n + 1)(6n +4b + 1) = 12n2 + 8n + 8nb + 4b + 1 This product is a value ending in 0...01. I'm going to try a different approach this time to simplify bookkeeping: qual = q * qinv - 1 This way I do not have to drag along the trailing one during the manipulation. qual = 12n2 + 8n + 8nb + 4b So I know in advance that 32 can be factored out of this quantity and I am investigating how that is so. The first idea is to replace all n's with 2t + b because maybe I can group terms. 12(2t+b)2 + 8(2t+b) + 8b(2t +b) + 4b 48t2 + 48tb + 12b2 + 16t + 8b +16tb +8b2 + 4b 48t2 + 16t + 20b2 + 12b + 64tb 32((3t2 + t)/2) +64tb + 20b2 + 12b One thing that I didn't notice right away is that since b ∈ {0,1}, b2 = b, so replace b2 by b and group those terms 32((3t2 + t)/2) +64tb + 32b Isn't that interesting? Here we can see the five good bits all in one simple spot without all the twisted logic of before. So now I can just look at this and see what behavior emerges and then after gaining understanding, I can move on to iterations of the converging algorithm. To be pedantic, lets put the scaling factor on the left: qual = 25(((3t2 + t)/2) + 2tb + b) To restate these variables, q = 2n + 1 n = 2t + b b ∈ {0,1} So q breaks down into three fields t:b:1 b,1 are single bits, t is the rest of the number Last fiddled with by only_human on 2012-08-01 at 10:00 |
|
|
|
|
|
#21 |
|
"Gang aft agley"
Sep 2002
2·1,877 Posts |
Ok, I worked out how to pick a q that will have a lot of 0s when multiplied by the initial qinv value of 3q xor 2.
To recap the names I am using: The least significant bits of q are b and 1 respectively. q looks like this t:b:1 q = 4t + 2b + 1 b ∈ {0,1} The initial qinv is 3q xor 2 q * (3q xor 2) = 25((((3t2 + t)/2) + 2tb + b) + 1 Ignoring the lower 5 bits, the 0s in the next bits are from an even coefficient: When b = 0, t(3t+1)/2 Either t or (3t + 1) will be even (never both). The even coefficient will determine how may 0s based on the power of its factor of 2. When b = 1, (3t + 2)(t + 1)/2 Either (3t + 2) or (t + 1) will be even (never both). The even coefficient will determine how may 0s based on the power of its factor of 2. Combining the two cases: (t + b)(3t + b + 1)/2 Again, only one of the two coefficients will be even (never both). The power of its factor of 2 of the even coefficient will determine the number of 0s Next I will look at an iteration or two of the algorithm to look at the 1s that accumulate to the left of the converged 00..001 value. Last fiddled with by only_human on 2012-08-01 at 22:23 |
|
|
|
|
|
#22 | |
|
"Gang aft agley"
Sep 2002
EAA16 Posts |
Quote:
I felt that the first three bits of the initial qinv value were the same as q because an odd number squared is 1 (mod 8). But for the fourth bit I was not sure. My four-bit example early in this thread was actually using the 3n^2 trick ('^' is xor in C code). So looking at this directly calculated fourth bit of qinv I've noticed (((x>>1)^(x>>2)&1)<<3) matches what is in the fourth bit after a an odd number is squared. Fivemack xor's this value with x. That is interesting the xor with x is in a sense pre-adding into the fourth bit the value that would be there after a squaring (the xor is an add without carry) -- so the q * qinv result will be that fourth bit value added to itself (thus carried out of that position) leaving a 0 behind. I think maybe that is so. |
|
|
|
|
![]() |
Similar Threads
|
||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| Inverse of Smoothness Probability | paul0 | Math | 6 | 2017-07-25 16:41 |
| mi64: inverse does not exist | wreck | NFS@Home | 1 | 2016-05-08 15:44 |
| Inverse of functions | Raman | Math | 5 | 2011-04-13 23:29 |
| Inverse Laplace Transform | flouran | Math | 1 | 2010-01-18 23:48 |
| Google.com's unhealthy obsession with famous mathematical constants | ewmayer | Lounge | 3 | 2005-08-21 18:09 |