mersenneforum.org Trial Factoring Elimination up to 2^64 Project
 Register FAQ Search Today's Posts Mark Forums Read

2014-04-10, 08:00   #12
tapion64

Apr 2014

5·17 Posts

Quote:
 Originally Posted by retina
I'd rather use a deterministic method since it should be just as fast on such a small number. Technically we could use msieve for the test to screen, and then check with GIMPS to see if any data for the exponent exists, since GIMPS knows all the primes less than 2^32.

2014-04-10, 08:20   #13
jnml

Feb 2012
Prague, Czech Republ

2·3·31 Posts

Quote:
 Originally Posted by tapion64 I'd rather use a deterministic method since it should be just as fast on such a small number. Technically we could use msieve for the test to screen, and then check with GIMPS to see if any data for the exponent exists, since GIMPS knows all the primes less than 2^32.
M-R test is deterministic in that range if you use proper bases: http://miller-rabin.appspot.com/

 2014-04-10, 08:27 #14 LaurV Romulan Interpreter     "name field" Jun 2011 Thailand 2×31×163 Posts So, to have an idea about the involved effort. Divide this by thousands, considering that pari is quite a slow animal comparing with the gpu, but the proportions remain. As I said in the post from the parallel thread, you need an efficient algorithm to factor small numbers very fast, which you don't have. You will lose the most of the time in factoring q-1, for every q. After that, you don't need to effectively find the order, but it is much faster to check, for all prime factors p of q-1, which are in the range of p's you are interested (that is from 0 to 1 billion, GIMPS range, or to say 2^32), if q divides 2^p-1 by exponentiation. This is faster, but yet too slow comparing with the algorithms used by GIMPS, when q gets higher. To try, see this code: Code: `/* "random" trial factoring, specify a starting possible factor and the outfput file */ ratf(a,fis)= { if(fis==0, print("Specify an output file!"); return(0)); while(1, until(a%8==1 || a%8==7,a=nextprime(a+1)); c=factorint(a-1)[,1]~; for(i=1,#c, if(c[i]<1000000000 && Mod(2,a)^c[i]==1, print(a","c[i]); write(fis,a","c[i])) ) ); } /*same as above, but split the output to different files depending on n and k -if a=-1, it will read the last_k from the file, assuming the file has ONE LINE only -use the flags.txt file to gracefully stop it */ ratfpro(a,lastvalue,printstep)= { my(tnextpr,tmod,tfactint,telse,wa,sg); if(a==-1, a=read("fact_last_k.txt")); if(lastvalue<=0, error("There is a must you provide a maximum value of k.")); print("Starting/continuing from: "a); tnextpr=tfactint=tmod=telse=0; if(printstep==0,printstep=2^20); prst=a; gettime(); while(a<=lastvalue, telse+=gettime(); until(a%8==1 || a%8==7,a=nextprime(a+1)); tnextpr+=gettime(); if(a>prst, prst=a+printstep; printf("... %d, (%d digits, 2^%d++) --- Timers: %6.4f, %6.4f, %6.4f, %6.4f (seconds)\n", a,ceil(log(a)/ten),floor(log(a)/two), tnextpr/1000, tfactint/1000, tmod/1000, telse/1000); extern("del fact_last_k.bak"); extern("ren fact_last_k.txt fact_last_k.bak"); write("fact_last_k.txt",a-1); shouldstop=read("flags.txt"); if(shouldstop==1,break) ); if(a>lastvalue,print("Candidate list is finished now."); break); telse+=gettime(); c=factorint((a-1)>>1)[,1]~; tfactint+=gettime(); for(i=1,#c, if(c[i]<10000000000, telse+=gettime(); if(Mod(2,a)^c[i]==1, tmod+=gettime(); print(a","c[i]" "); if(c[i]<1000000000,fis="fact_0B_", if(c[i]<2000000000,fis="fact_1B_", if(c[i]<3000000000,fis="fact_2B_", if(c[i]<4000000000,fis="fact_3B_", if(c[i]<5000000000,fis="fact_4B_", if(c[i]<6000000000,fis="fact_5B_", if(c[i]<7000000000,fis="fact_6B_", if(c[i]<8000000000,fis="fact_7B_", if(c[i]<9000000000,fis="fact_8B_", fis="fact_9B_" ) ) ) ) ) ) ) ) ); expo=37; k=1<lastvalue,break); if(a>prst, prst=a+printstep; printf("... %d, (%d digits, 2^%d++) --- Timers: %10.4f, %10.4f, %10.4f, %10.4f seconds\n", a,ceil(log(a)/ten),floor(log(a)/two), tnextpr/1000, tznord/1000, tprchk/1000, telse/1000); write("fact_last_k",a)); telse+=gettime(); c=znorder(Mod(2,a)); tznord+=gettime(); if(isprime(c), tprchk+=gettime(); if(c<1000000000,fis="fact_0B_", if(c<2000000000,fis="fact_1B_", if(c<3000000000,fis="fact_2B_", if(c<4000000000,fis="fact_3B_", if(c<5000000000,fis="fact_4B_", if(c<6000000000,fis="fact_5B_", if(c<7000000000,fis="fact_6B_", if(c<8000000000,fis="fact_7B_", if(c<9000000000,fis="fact_8B_", if(c<10000000000,fis="fact_9B_", telse+=gettime(); next)))))))))); expo=40; k=1<
 2014-04-10, 13:25 #15 tapion64   Apr 2014 5×17 Posts The Go code I referenced in the other thread used Bach Shallit's algorithm, which factors q-1 so that it can do exactly what you're saying in order to calculate the order. Otherwise there's no point in factoring the number. And we have a fast way to factor the numbers, msieve has CUDA implementations that factor 30+ digit numbers in no time (the timer literally says 00:00:00). Factoring isn't the bottle neck on this, it's the insanely large number of primes we need to test. I need to learn more about the logic behind Atkin's sieve before I attempt to modify it, but I did a traditional Eratosthenes sieve coupled with +/- 1 mod 8, and we can really only get up to 23. The factor we reduce by when adding a new prime to the sieve is (p-1)/p, which has diminishing returns. 23 gets you to around .085, which is just a bit less than 4 times the n/ln(n) approximation for the number of primes in the 2^64 to 2^65. One option would be to try a 2 step sieve, although I think modifying Atkin's sieve will probably yield better results. Even so, if we assume the factoring is fast enough for numbers in that range (msieve says it is), that just means it takes 4 times longer, which really isn't /that/ bad. Also, these are based on what a 560 Ti can do. A faster GPU with more cores can increase the factoring speed and the resulting calculation of the order. Even if the latest cards in the market can't do this quite yet, give it a few years and they will. Last fiddled with by tapion64 on 2014-04-10 at 13:33
 2014-04-10, 18:46 #16 tapion64   Apr 2014 10101012 Posts So I went ahead and read the algorithm and reasoning for why Atkin's works, and then I ignored all of that and just multiplied everything by 2 and sieved in respect to 120 instead of 60, taking out all numbers that weren't +/-1 mod 8. The result, surprisingly, appears to not only work, but eliminates the need for crossing off multiples of the squares in the final step. In other words, it appears to be a completely parallelizable algorithm that can start from any point in the set of integers instead of just starting with 1. Perhaps more interesting, this algorithm appears to be able to take a pre-sieved list with no problem, not just a contiguous section. Granted, I only ran the algorithm on 1 through 240 by hand, but since the cycle length is 120, you'd think that means the results are good throughout. I'll do some more extensive testing and see if I find discrepancies. It'd be really neat if this turned out to be a fast way to search for primes = +/- 1 mod 8. I think the need to solve for the solutions to the curves might impact larger numbers too much to be too feasible, though. If anyone's interested, the revised algorithm: 1. List all integers k to be sieved, marked as composite (well, composite or prime and not = +/-1 mod 8) 2. For all k:a. If k is congruent to {1, 17, 41, 49, 73, 89, 97, 113} mod 120, find the number of solutions d to 4x^2 + y^2 = k, where x,y > 0. If d is odd, mark k as prime = +/- 1 mod 8. b. If k is congruent to {7, 31, 79, 103} mod 120, find the number of solutions d to 3x^2 + y^2 = k, where x,y > 0. If d is odd, mark k as prime = +/- 1 mod 8. c. If k is congruent to {23 47 71 119} mod 120, find the number of solutions d to 3x^2 - y^2 = k, where x > y > 0. If d is odd, mark k as prime = +/- 1 mod 8. d. If k is congruent to any other residue mod 120, leave k as marked as it is (in other words, composite or prime and not = +/- 1 mod 8) 3. You appear to be done! But the original algorithm says to square each element marked prime and then mark multiples of the square as composite. I didn't have to do this, but that doesn't mean it's not necessary for other test sets. EDIT: Now that I think about it, this algorithm should certainly be extendable to cover 7, 11, 13, and so on, by multiplying by 120 and increasing the cycle range then sieving out the 'bad' elements in the congruency tests. It may not be necessary, but it might prevent needless curve solving, and it makes the problem set larger to be split up by cards with more cores. Last fiddled with by tapion64 on 2014-04-10 at 19:02
 2014-04-10, 20:27 #17 tapion64   Apr 2014 5×17 Posts Apparently after some amount of time you can't edit posts, so sorry for multiple posts. Some basic estimations of running time for the original Atkin's is O(N/log(log(N))). If we assume a clock rate of 1 GHz on the GPU and 384 cores and we're doing primes up to 2^65, that takes around 185 days (6 months). With the modified algorithm, we only look from 2^64 through 2^65, which halves the work, and we assume that atleast 1/2 of primes being filtered out as not +/-1 mod 8. Thus we're looking at about 1 and half months (possibly less, need to see how removing the square sieving affects running time) just to iterate through the primes. Since we previously assumed 1/2 the primes are filtered out, we're dealing with 208 quadrillion inputs. With the same clock rate and cores before, and exponential time using quadratic sieve for the 65 bit numbers, 1 core could factor roughly 2048 of them per second, gives us roughly 8306 years of computing time. Ok, yeah, that's a bottleneck. My earlier calculations were off by a factor of 100, which is enough. I should see if there's a better fit for calculating the multiplicative order that doesn't require factoring, possibly make a modified naive algorithm that takes into account the qualities of p which will work for numbers of this size. Last fiddled with by tapion64 on 2014-04-10 at 20:36
 2014-04-10, 22:42 #18 tapion64   Apr 2014 5×17 Posts The solution was there all along. Since we only care if the multiplicative order is less than 2^32, naive multiply and subtract can be performed up to 2^31 and if we still haven't hit -1, we know that the order is larger than 2^32. Although probably use some squared exponentiation techniques and operate in [-(p+1)/2,(p+1)/2) to speed it up. No factoring, no ridiculous bottleneck, but this algorithm isn't as parallelizable... I'll keep looking for better options
 2014-04-10, 23:53 #19 retina Undefined     "The unspeakable one" Jun 2006 My evil lair 23·3·52·11 Posts tapion64: Maybe you should write all this in a blog somewhere and when you've finally figured it all out come back and post here to get people to join you in your search.
 2014-04-11, 01:30 #20 tapion64   Apr 2014 5×17 Posts Will do. I can't believe the site doesn't have perma-editable posts though.
2014-04-11, 02:45   #21
TheMawn

May 2013
East. Always East.

172710 Posts

Quote:
 Originally Posted by tapion64 Will do. I can't believe the site doesn't have perma-editable posts though.
It's different from what other places do, but I think it adds an element of permanence I think is more valuable. I've seen posts proven wrong changed weeks after being first written because the original poster wanted to save face.

EDIT: On the other hand, it appears there are no Crusades against double-posting here, unlike the same forums I am referring to.

Last fiddled with by TheMawn on 2014-04-11 at 02:47 Reason: permanenc3 is not a word

 Similar Threads Thread Thread Starter Forum Replies Last Post garo GPU Computing 100 2019-04-22 10:58 Stargate38 GPU Computing 9 2018-08-31 07:58 JFB Software 23 2004-08-22 05:37 michael Software 23 2004-01-06 08:54 gbvalor Math 4 2003-05-22 02:04

All times are UTC. The time now is 21:40.

Wed Sep 28 21:40:09 UTC 2022 up 41 days, 19:08, 0 users, load averages: 1.60, 1.35, 1.17