20110619, 13:31  #1 
"Lucan"
Dec 2006
England
2·3·13·83 Posts 
TF algorithm(s)
I assume the "exponentiation" algorithm as described on the "Math" page
is as good as any. Say the TFs have 45 bits. The first five steps can be precalculated for a given exponent. For the rest of the bits in the exponent, one has to find a 90 bit number mod the TF. I accomplished this by shifting both right (or more likely selecting the appropriate storage bytes) and doing a 64/32 bit div (careful about the quotient overflowing), multiplying the quotient by the TF and then a tiny bit of "suck and see". Two questions: 1) Is this the best way of doing division on this size of number? 2) Is there someone (Euclid perhaps) who has a cleverer way of arriving at the remainder? David Last fiddled with by davieddy on 20110619 at 13:43 
20110619, 14:42  #2 
Romulan Interpreter
"name field"
Jun 2011
Thailand
23606_{8} Posts 
you can build a lookup table with 45 entries for every power of 2 bigger or equal to your TF, so your mod function is just maximum 46 additions mod TF. This is not the fastest method, but is the easiest (in fact, depending how fast you can add and search the bits on the 90bit result from 45 to 90, shift and check bit x, it can be the fastest too, long ago i tried to implement that on a virtex, but failed miserably due to insuficient knowlwdges of fpga design).
think like that, if you do LL, compute Sn=(Sn1)^22, then to to compute the Sn mod 2^n1 at every step, that is just adding the two parts of the result together (the lower n bits with the higher n bits) and check if you got 2^n1 (in this case the mod is 0, but when you repeat in the next step, it does not matter, if you compute the square, as in LL or exponentiation, then (x^2 mod n) = ((nx)^2 mod n) so I always select the one with less bits). same for general case: 11010110 (base2) mod 5 is (from right to left): 2+4+1+4+3=4 mod 5 Last fiddled with by LaurV on 20110619 at 14:46 
20110619, 15:42  #3  
"Lucan"
Dec 2006
England
2×3×13×83 Posts 
Quote:
Sounds "uncrankish" Will attempt to interpret it when sober (cf Winston Churchill). Meantime can I take this opportunity to repeat my plea for participants to divulge their age, and approx location*. "Experience" is optional, and quickly sussed. As I think Uncwilly said in another thread: Welcome to the House of Fun David *Default sex seems to be male more's the pity! Last fiddled with by davieddy on 20110619 at 15:45 

20110621, 08:56  #4  
"Lucan"
Dec 2006
England
2×3×13×83 Posts 
Quote:
I'm afraid I think you have got the wrong end of the stick there. We are not TFing a fixed 91 bit number. We are TFing 2^50xxxxxx  1, and 50M has 26 bits. The "exponentiation" algorithm (see the Math page) involves 26 steps of which the result of the first 5 can be precalculated if the TF bit range >47 bits, because the result (a power of 2) is less than the TFs. For the remaining 21 steps, the result has to be found mod TF. If the TF has 50 bits, the next step may produce 101 bits (squaring and doubling the unpredictable 50 bit result of the previous step). This is what needs to be "modded TF". No point in building a lookup table for just 21 steps is there? (Given a decent div instruction of course). David Last fiddled with by davieddy on 20110621 at 09:39 

20110621, 10:52  #5  
Romulan Interpreter
"name field"
Jun 2011
Thailand
2×5,059 Posts 
Quote:
You have to TF one m which is 2^p1. Say m=2^50xxxxxx, as you said, with 2030 bits on the exponent, but that does not matter. To do TF, you consider potential factors. Not my business how you compute that potential factors. I personally consider numbers of the form 8kp+sp+1 (with s=2 or 6, depending of p being 3 or 1 mod 4, respectively), or 8kp+1 which are 2SPRP as potential factors. It is easy to show that if a factor f is of the form 2kp+1 and 8x+1, then it is of the form 8kp+1, because then 4 divides k. It is also easy to show that if a factor is 2kp+1 and 8x1, that it has to be 8kp+2p+1 for all p that are 3 mod 4, and it has to be 8kp+6p+1 for all p that are 1 mod 4. So, the TF code could look more or less like that: 1. take next p prime (from the exponent's list). 2. s=2p (left shift 1 position), z=4s (left shift 2 positions) 3. if p=1 mod 4, then s=3s (just check last 2 bits of p, for multiplying by 3 is just a shift and add) 4. f=s+1 5. if f is not 2SPRP go to step 8 (much faster then primality check or sieving an 60 to 80 bit interval, and you can end up trying some factors which are not primes, but it really does not matter). 6. check if 2^p=1 mod f, by exponentiation 7. if so, factor found and it is f, stop. 8. s=zs 9. f=f+s 10. if not at the end of interval for factors that you want to check, go to step 5. This is the fastest method to generate potential factors, as you need to make only two additions and an PRP check to get to the next factor, without walking through all 2kp+1 numbers, assuming you do not have plenty of memory to store a sieved string of bits (as GIMPS is doing). As I said already, I tried to implement this stuff in "hardware", as my job is electronic design and MCU programming. A 4kp+1 or 12kp+1 or 20kp+1 could never be a factor of a mersenne number, for an odd k. So, for some p which is 1 mod 4 you need only to test 6p+1, 8p+1, 14p+1, 16p+1, 22p+1, 24p+1 and so on, and for some p which is 3 mod 4 you only need to test 2p+1, 8p+1, 10p+1, 16p+1, 18p+1, 24p+1 ... for SPRPness and eventually do an exponentiation. Now there could be a long discussion here, about the existence of the "+" factors (that is, factors of the form 8x+1) and "" factors (that is factors of the form 8x1). One composite m can not have only "+" factors, because their product would be 1 mod 8, and m is always 7 mod 8. So, if m is composite, it always has an odd number of "" factors (read: at least one) and it can possibly have no "+" factors. So, it could make sense that someone looks only for the factors of the form 8kp+sp+1, as above, therefore reaching higher "bit count" faster, but this could be a waste of resources if the only existent "" factor is huge. Nobody says the "" factor is the smallest, in fact is not, for example 2^291 has the factors 233, 1103, 2089, and the "minus" factor is 1103. Whatever... I assumed you were only asking here about step 6 in the algorithm. If not, then, yes, I got the wrong side of the stick. For your example, you can "precalculate" the first 5 steps (nothing to precalculate, just a 1 shifted 32 positions to the left). Then you square it (not my business how you do that!) and you have a result, call it w, which is to be computed "mod f" (same f from my step 6 above). Did I got it right up to now? Now, you start aligning your f to w, and doing a series of comparings and subtractions, in fact w is just a power of 2 in this step, and the enigma could be solved with a simple lookup in an array that stores "precalculated" powers of 2, modulo f. Say f is 70 bits (more realistic case, because you want to find factors in 2^60++ range for a 50M exponent). To generate the table you start with 2^70 (an 1 shifted 70 positions to the left), which is the first power of 2 bigger then your f, and you store 2^70f as the first entry in the table. This is right, because your f is 70 bits, so is higher then 2^69, therefore 2^70 can not contain 2f, it just contains 1f, so 2^70f is smaller then f. Then you do 70 times a shift left by 1 bit, and if the result is bigger then f, you subtract f and store it in the table. For that, you did 70 shifts and about (statistically) 55 subtractions. Now, the only problem you have with exponentiation is related to the squaring, because multiplying by 2 is just a shift left, a test (if w>f) and an eventual subtraction of f. So, for your 21 squares you will get numbers in range of 1 to 140 bits. That numbers, 21 of them, you have to compute mod f. For each, you have to do aligning f to the MSB and check if the result of the subtraction is not negative, in such case realigning f to the bit which is "next to MSB" in w (well, this you can do it in a more clever way, but just for the sake of the example), subtracting f, and if the result is still bigger then f, repeat. In the worst case, you make 70 alignchecksubtract cycles. Align is costlyest if not done clever. If you would have a table as built above, then you have to parse the bits in w from 70 to 140, and for each bit you find being 1 you have to add the value in the table to the modulus. If is bigger then f, you subtract f. In the worst case you make 70 additions and 70 subtractions, but you avoid the alignment process. This is "theoretical" worst case, because it would imply all bits being 1 from 70 to 140, and all additions being bigger then f, which will never happen. Statistically you have about half of the bits set, and about 65 percent of additions will require a subtract of f. Trust me, this is faster then the "classic", and only involves additions and subtractions and bit check. For our particular example, if you just make 7 operation (addition or subtraction of f) less on each of your 21 steps, then you already have 144 operations less comparing with the "clasic", and you spent 140 to generate the table, so you are already 4 operations faster. Think about it... Last fiddled with by LaurV on 20110621 at 11:07 

20110621, 12:58  #6  
"Lucan"
Dec 2006
England
2·3·13·83 Posts 
Getting on the same wavelength/hymnsheet
Any assistance from a third party in this direction would be appreciated.
(PM if necessary). Quote:
Quote:
I stressed out too:) Quote:
Quote:
Your way sounds crazy! Quote:
discussing how division is accomplished bit by bit in either software or hardware. Quote:
Quote:
top five bits of the exponent Quote:
Calculating 1/f might be worth it though. The whole question is hardware dependent anyway. David Last fiddled with by davieddy on 20110621 at 13:17 

20110621, 15:03  #7  
Romulan Interpreter
"name field"
Jun 2011
Thailand
2·5,059 Posts 
Quote:
Quote:
Last fiddled with by LaurV on 20110621 at 15:10 

20110621, 15:44  #8  
"Richard B. Woods"
Aug 2002
Wisconsin USA
2^{2}·3·641 Posts 
Help me understand here:
Quote:
If we're TFing 2^50xxxxxx1, isn't p a single fixed value (50xxxxxx) while we try a list of candidate factors 2kp+1? Or, are you proposing some shortcut for TFing a list of several exponents (50xxxxxx, 50xxxxyy, 50xxxxzz, ...)? Last fiddled with by cheesehead on 20110621 at 15:54 

20110621, 16:18  #9  
"Lucan"
Dec 2006
England
2·3·13·83 Posts 
Quote:
overemphasizing the difficulty (although not the usefulness) of doing it. You Really Got Me David PS Excuse my music links. They are a bit of a trade/birthmark, like the gerbils, snakes, fish and bamboozle trees 

20110621, 17:27  #10  
Romulan Interpreter
"name field"
Jun 2011
Thailand
23606_{8} Posts 
Quote:
(joke) of course in my mind was something more grandiose as "trial factoring all mersenne for all prime exponents" (hehe) but the algoritm went wrong, because in that case I should not say "stop" in step 7, but say "go to step 1" 

20110621, 18:35  #11  
"Lucan"
Dec 2006
England
2·3·13·83 Posts 
Quote:
That way it would make sense to do anything that facilitated doing "mod f" for fixed f. But how many "p"s satisfy f=2kp + 1 etc? David PS Do note that "choose p" was the outermost loop in LaurV's "program". Last fiddled with by davieddy on 20110621 at 19:01 

Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
new factorization algorithm  jasonp  Math  2  20120617 20:04 
Sieve algorithm  geoff  Twin Prime Search  5  20101107 17:02 
Is there an algorithm which solves this?  Unregistered  Homework Help  0  20070809 17:40 
Maybe new sieving algorithm  nuggetprime  Riesel Prime Search  5  20070420 04:19 
Prime95's FFT Algorithm  Angular  Math  6  20020926 00:13 