 mersenneforum.org Trial Factoring Elimination up to 2^64 Project
 Register FAQ Search Today's Posts Mark Forums Read  2014-04-10, 08:00   #12
tapion64

Apr 2014

5×17 Posts Quote:
 Originally Posted by retina I'd rather use a deterministic method since it should be just as fast on such a small number. Technically we could use msieve for the test to screen, and then check with GIMPS to see if any data for the exponent exists, since GIMPS knows all the primes less than 2^32.   2014-04-10, 08:20   #13
jnml

Feb 2012
Prague, Czech Republ

A416 Posts Quote:
 Originally Posted by tapion64 I'd rather use a deterministic method since it should be just as fast on such a small number. Technically we could use msieve for the test to screen, and then check with GIMPS to see if any data for the exponent exists, since GIMPS knows all the primes less than 2^32.
M-R test is deterministic in that range if you use proper bases: http://miller-rabin.appspot.com/   2014-04-10, 08:27 #14 LaurV Romulan Interpreter   Jun 2011 Thailand 34×109 Posts So, to have an idea about the involved effort. Divide this by thousands, considering that pari is quite a slow animal comparing with the gpu, but the proportions remain. As I said in the post from the parallel thread, you need an efficient algorithm to factor small numbers very fast, which you don't have. You will lose the most of the time in factoring q-1, for every q. After that, you don't need to effectively find the order, but it is much faster to check, for all prime factors p of q-1, which are in the range of p's you are interested (that is from 0 to 1 billion, GIMPS range, or to say 2^32), if q divides 2^p-1 by exponentiation. This is faster, but yet too slow comparing with the algorithms used by GIMPS, when q gets higher. To try, see this code: Code: `/* "random" trial factoring, specify a starting possible factor and the outfput file */ ratf(a,fis)= { if(fis==0, print("Specify an output file!"); return(0)); while(1, until(a%8==1 || a%8==7,a=nextprime(a+1)); c=factorint(a-1)[,1]~; for(i=1,#c, if(c[i]<1000000000 && Mod(2,a)^c[i]==1, print(a","c[i]); write(fis,a","c[i])) ) ); } /*same as above, but split the output to different files depending on n and k -if a=-1, it will read the last_k from the file, assuming the file has ONE LINE only -use the flags.txt file to gracefully stop it */ ratfpro(a,lastvalue,printstep)= { my(tnextpr,tmod,tfactint,telse,wa,sg); if(a==-1, a=read("fact_last_k.txt")); if(lastvalue<=0, error("There is a must you provide a maximum value of k.")); print("Starting/continuing from: "a); tnextpr=tfactint=tmod=telse=0; if(printstep==0,printstep=2^20); prst=a; gettime(); while(a<=lastvalue, telse+=gettime(); until(a%8==1 || a%8==7,a=nextprime(a+1)); tnextpr+=gettime(); if(a>prst, prst=a+printstep; printf("... %d, (%d digits, 2^%d++) --- Timers: %6.4f, %6.4f, %6.4f, %6.4f (seconds)\n", a,ceil(log(a)/ten),floor(log(a)/two), tnextpr/1000, tfactint/1000, tmod/1000, telse/1000); extern("del fact_last_k.bak"); extern("ren fact_last_k.txt fact_last_k.bak"); write("fact_last_k.txt",a-1); shouldstop=read("flags.txt"); if(shouldstop==1,break) ); if(a>lastvalue,print("Candidate list is finished now."); break); telse+=gettime(); c=factorint((a-1)>>1)[,1]~; tfactint+=gettime(); for(i=1,#c, if(c[i]<10000000000, telse+=gettime(); if(Mod(2,a)^c[i]==1, tmod+=gettime(); print(a","c[i]" "); if(c[i]<1000000000,fis="fact_0B_", if(c[i]<2000000000,fis="fact_1B_", if(c[i]<3000000000,fis="fact_2B_", if(c[i]<4000000000,fis="fact_3B_", if(c[i]<5000000000,fis="fact_4B_", if(c[i]<6000000000,fis="fact_5B_", if(c[i]<7000000000,fis="fact_6B_", if(c[i]<8000000000,fis="fact_7B_", if(c[i]<9000000000,fis="fact_8B_", fis="fact_9B_" ) ) ) ) ) ) ) ) ); expo=37; k=1<lastvalue,break); if(a>prst, prst=a+printstep; printf("... %d, (%d digits, 2^%d++) --- Timers: %10.4f, %10.4f, %10.4f, %10.4f seconds\n", a,ceil(log(a)/ten),floor(log(a)/two), tnextpr/1000, tznord/1000, tprchk/1000, telse/1000); write("fact_last_k",a)); telse+=gettime(); c=znorder(Mod(2,a)); tznord+=gettime(); if(isprime(c), tprchk+=gettime(); if(c<1000000000,fis="fact_0B_", if(c<2000000000,fis="fact_1B_", if(c<3000000000,fis="fact_2B_", if(c<4000000000,fis="fact_3B_", if(c<5000000000,fis="fact_4B_", if(c<6000000000,fis="fact_5B_", if(c<7000000000,fis="fact_6B_", if(c<8000000000,fis="fact_7B_", if(c<9000000000,fis="fact_8B_", if(c<10000000000,fis="fact_9B_", telse+=gettime(); next)))))))))); expo=40; k=1< 2014-04-10, 13:25 #15 tapion64   Apr 2014 5·17 Posts The Go code I referenced in the other thread used Bach Shallit's algorithm, which factors q-1 so that it can do exactly what you're saying in order to calculate the order. Otherwise there's no point in factoring the number. And we have a fast way to factor the numbers, msieve has CUDA implementations that factor 30+ digit numbers in no time (the timer literally says 00:00:00). Factoring isn't the bottle neck on this, it's the insanely large number of primes we need to test. I need to learn more about the logic behind Atkin's sieve before I attempt to modify it, but I did a traditional Eratosthenes sieve coupled with +/- 1 mod 8, and we can really only get up to 23. The factor we reduce by when adding a new prime to the sieve is (p-1)/p, which has diminishing returns. 23 gets you to around .085, which is just a bit less than 4 times the n/ln(n) approximation for the number of primes in the 2^64 to 2^65. One option would be to try a 2 step sieve, although I think modifying Atkin's sieve will probably yield better results. Even so, if we assume the factoring is fast enough for numbers in that range (msieve says it is), that just means it takes 4 times longer, which really isn't /that/ bad. Also, these are based on what a 560 Ti can do. A faster GPU with more cores can increase the factoring speed and the resulting calculation of the order. Even if the latest cards in the market can't do this quite yet, give it a few years and they will. Last fiddled with by tapion64 on 2014-04-10 at 13:33   2014-04-10, 18:46 #16 tapion64   Apr 2014 5516 Posts So I went ahead and read the algorithm and reasoning for why Atkin's works, and then I ignored all of that and just multiplied everything by 2 and sieved in respect to 120 instead of 60, taking out all numbers that weren't +/-1 mod 8. The result, surprisingly, appears to not only work, but eliminates the need for crossing off multiples of the squares in the final step. In other words, it appears to be a completely parallelizable algorithm that can start from any point in the set of integers instead of just starting with 1. Perhaps more interesting, this algorithm appears to be able to take a pre-sieved list with no problem, not just a contiguous section. Granted, I only ran the algorithm on 1 through 240 by hand, but since the cycle length is 120, you'd think that means the results are good throughout. I'll do some more extensive testing and see if I find discrepancies. It'd be really neat if this turned out to be a fast way to search for primes = +/- 1 mod 8. I think the need to solve for the solutions to the curves might impact larger numbers too much to be too feasible, though. If anyone's interested, the revised algorithm: 1. List all integers k to be sieved, marked as composite (well, composite or prime and not = +/-1 mod 8) 2. For all k:a. If k is congruent to {1, 17, 41, 49, 73, 89, 97, 113} mod 120, find the number of solutions d to 4x^2 + y^2 = k, where x,y > 0. If d is odd, mark k as prime = +/- 1 mod 8. b. If k is congruent to {7, 31, 79, 103} mod 120, find the number of solutions d to 3x^2 + y^2 = k, where x,y > 0. If d is odd, mark k as prime = +/- 1 mod 8. c. If k is congruent to {23 47 71 119} mod 120, find the number of solutions d to 3x^2 - y^2 = k, where x > y > 0. If d is odd, mark k as prime = +/- 1 mod 8. d. If k is congruent to any other residue mod 120, leave k as marked as it is (in other words, composite or prime and not = +/- 1 mod 8) 3. You appear to be done! But the original algorithm says to square each element marked prime and then mark multiples of the square as composite. I didn't have to do this, but that doesn't mean it's not necessary for other test sets. EDIT: Now that I think about it, this algorithm should certainly be extendable to cover 7, 11, 13, and so on, by multiplying by 120 and increasing the cycle range then sieving out the 'bad' elements in the congruency tests. It may not be necessary, but it might prevent needless curve solving, and it makes the problem set larger to be split up by cards with more cores. Last fiddled with by tapion64 on 2014-04-10 at 19:02   2014-04-10, 20:27 #17 tapion64   Apr 2014 5516 Posts Apparently after some amount of time you can't edit posts, so sorry for multiple posts. Some basic estimations of running time for the original Atkin's is O(N/log(log(N))). If we assume a clock rate of 1 GHz on the GPU and 384 cores and we're doing primes up to 2^65, that takes around 185 days (6 months). With the modified algorithm, we only look from 2^64 through 2^65, which halves the work, and we assume that atleast 1/2 of primes being filtered out as not +/-1 mod 8. Thus we're looking at about 1 and half months (possibly less, need to see how removing the square sieving affects running time) just to iterate through the primes. Since we previously assumed 1/2 the primes are filtered out, we're dealing with 208 quadrillion inputs. With the same clock rate and cores before, and exponential time using quadratic sieve for the 65 bit numbers, 1 core could factor roughly 2048 of them per second, gives us roughly 8306 years of computing time. Ok, yeah, that's a bottleneck. My earlier calculations were off by a factor of 100, which is enough. I should see if there's a better fit for calculating the multiplicative order that doesn't require factoring, possibly make a modified naive algorithm that takes into account the qualities of p which will work for numbers of this size. Last fiddled with by tapion64 on 2014-04-10 at 20:36   2014-04-10, 22:42 #18 tapion64   Apr 2014 5×17 Posts The solution was there all along. Since we only care if the multiplicative order is less than 2^32, naive multiply and subtract can be performed up to 2^31 and if we still haven't hit -1, we know that the order is larger than 2^32. Although probably use some squared exponentiation techniques and operate in [-(p+1)/2,(p+1)/2) to speed it up. No factoring, no ridiculous bottleneck, but this algorithm isn't as parallelizable... I'll keep looking for better options   2014-04-10, 23:53 #19 retina Undefined   "The unspeakable one" Jun 2006 My evil lair 22·1,447 Posts tapion64: Maybe you should write all this in a blog somewhere and when you've finally figured it all out come back and post here to get people to join you in your search.   2014-04-11, 01:30 #20 tapion64   Apr 2014 5×17 Posts Will do. I can't believe the site doesn't have perma-editable posts though.   2014-04-11, 02:45   #21
TheMawn

May 2013
East. Always East.

11×157 Posts Quote:
 Originally Posted by tapion64 Will do. I can't believe the site doesn't have perma-editable posts though.
It's different from what other places do, but I think it adds an element of permanence I think is more valuable. I've seen posts proven wrong changed weeks after being first written because the original poster wanted to save face.

EDIT: On the other hand, it appears there are no Crusades against double-posting here, unlike the same forums I am referring to.

Last fiddled with by TheMawn on 2014-04-11 at 02:47 Reason: permanenc3 is not a word   2014-04-11, 03:53 #22 LaurV Romulan Interpreter   Jun 2011 Thailand 34·109 Posts Now you got to more realistic times :D Related to your question, there are many ways to get that order, without factoring. Unfortunately none of them is faster. One very simple (but still exponential) method is extremely easy to implement in binary, as it only needs additions and shifts: Starting with 1: -- add q -- count the (binary) zeroes at the end, and eliminate them -- repeat the two steps above until you get 1. Example: say q=5, which in binary is 101, so 101+1=110, we have 11 and a zero. -- 11+101=1000, we have 1 (so we stop) and additional 3 zeroes, so we have a total of 4 zeroes. Therefore 5 divides 24-1, more exactly 5 divides all numbers of the form 24k-1, and none others (i.e. there is no other power x such as 2x-1 is divisible by 5) You can do this for all odd numbers (not necessarily prime) and there is a simple proof that you will always end in 1, after the right number of steps. say q=9, which in binary is 1001, so 1001+1=1010, we have 101 and a zero. -- 101+1001=1110, we have 111 and additional one zero, total 2 zeroes. -- 111+1001=10000, we have 1 (so we stop) and additional 4 zeroes, total 6 zeroes. Therefore, 9 divides 2^6-1, and all numbers of the form 26k-1 and none others. say q=11 (decimal), which in binary is 1011, so 1011+1=1100, we have 11 and 2 zeroes. -- 11+1011=1110, we have 111 and additional one zero, total 3 zeroes. -- 111+1011=10010, we have 1001 and additional 1 zero, total 4 zeroes. -- 1001+1011=10100, we have 101 and additional 2 zeroes, total 6 zeroes. -- 101+1011=10000, we have 1 (so we stop) and additional 4 zeroes, total 10 zeroes. Therefore, 11 divides 2^10-1, and all numbers of the form 210k-1 and none others. One which will interest us, where the result is prime: say q=23, which in binary is 10111, so 10111+1=11000, we have 11 and 3 zeroes. -- 11+10111=11010, we have 1101 and additional one zero, total 4 zeroes. -- 1101+10111=100100, we have 1001 and additional 2 zeroes, total 6 zeroes. -- 1001+10111=10000, we have 1 (so we stop) and additional 5 zeroes, total 11 (decimal) zeroes. Therefore, 23 divides 2^11-1, and all numbers of the form 211k-1 and none others. The same (last) calculus in decimal, involving "divide by 2 if it is even", would look like: q=23+1=24, even, we have q=q/2=12 and p=1 (one zero); q=12 even, we have q=6 and p=2; q=6 even we have q=3 and p=3; q=3 odd, etc... q=3+23=26, q=13, p=4; q=13+23=36; q=18, p=5; q=9, p=6; q=9+23=32; q=16, p=7; q=8, p=8; q=4, p=9; q=2, p=10; q=1, p=11; end. So, 23 divides 2^11-1. Next step, you have to do an ASIC to compute this (it won't be difficult, anyhow it is much simpler than the asics for sha256 for bitcoin mining, hehe, just few counters and shifters) but it will still be (much) slower than normal exponentiation, by squaring - but well... you do no factoring, neither multiplication nor division ). Last fiddled with by LaurV on 2014-04-11 at 04:01   Thread Tools Show Printable Version Email this Page Similar Threads Thread Thread Starter Forum Replies Last Post garo GPU Computing 100 2019-04-22 10:58 Stargate38 GPU Computing 9 2018-08-31 07:58 JFB Software 23 2004-08-22 05:37 michael Software 23 2004-01-06 08:54 gbvalor Math 4 2003-05-22 02:04

All times are UTC. The time now is 04:52.

Wed Oct 21 04:52:32 UTC 2020 up 41 days, 2:03, 0 users, load averages: 1.65, 1.68, 1.76