20140410, 08:00  #12  
Apr 2014
125_{8} Posts 
Quote:


20140410, 08:20  #13  
Feb 2012
Prague, Czech Republ
A4_{16} Posts 
Quote:


20140410, 08:27  #14 
Romulan Interpreter
Jun 2011
Thailand
2·3·31·47 Posts 
So, to have an idea about the involved effort. Divide this by thousands, considering that pari is quite a slow animal comparing with the gpu, but the proportions remain. As I said in the post from the parallel thread, you need an efficient algorithm to factor small numbers very fast, which you don't have. You will lose the most of the time in factoring q1, for every q. After that, you don't need to effectively find the order, but it is much faster to check, for all prime factors p of q1, which are in the range of p's you are interested (that is from 0 to 1 billion, GIMPS range, or to say 2^32), if q divides 2^p1 by exponentiation. This is faster, but yet too slow comparing with the algorithms used by GIMPS, when q gets higher.
To try, see this code: Code:
/* "random" trial factoring, specify a starting possible factor and the outfput file */ ratf(a,fis)= { if(fis==0, print("Specify an output file!"); return(0)); while(1, until(a%8==1  a%8==7,a=nextprime(a+1)); c=factorint(a1)[,1]~; for(i=1,#c, if(c[i]<1000000000 && Mod(2,a)^c[i]==1, print(a","c[i]); write(fis,a","c[i])) ) ); } /*same as above, but split the output to different files depending on n and k if a=1, it will read the last_k from the file, assuming the file has ONE LINE only use the flags.txt file to gracefully stop it */ ratfpro(a,lastvalue,printstep)= { my(tnextpr,tmod,tfactint,telse,wa,sg); if(a==1, a=read("fact_last_k.txt")); if(lastvalue<=0, error("There is a must you provide a maximum value of k.")); print("Starting/continuing from: "a); tnextpr=tfactint=tmod=telse=0; if(printstep==0,printstep=2^20); prst=a; gettime(); while(a<=lastvalue, telse+=gettime(); until(a%8==1  a%8==7,a=nextprime(a+1)); tnextpr+=gettime(); if(a>prst, prst=a+printstep; printf("... %d, (%d digits, 2^%d++)  Timers: %6.4f, %6.4f, %6.4f, %6.4f (seconds)\n", a,ceil(log(a)/ten),floor(log(a)/two), tnextpr/1000, tfactint/1000, tmod/1000, telse/1000); extern("del fact_last_k.bak"); extern("ren fact_last_k.txt fact_last_k.bak"); write("fact_last_k.txt",a1); shouldstop=read("flags.txt"); if(shouldstop==1,break) ); if(a>lastvalue,print("Candidate list is finished now."); break); telse+=gettime(); c=factorint((a1)>>1)[,1]~; tfactint+=gettime(); for(i=1,#c, if(c[i]<10000000000, telse+=gettime(); if(Mod(2,a)^c[i]==1, tmod+=gettime(); print(a","c[i]" "); if(c[i]<1000000000,fis="fact_0B_", if(c[i]<2000000000,fis="fact_1B_", if(c[i]<3000000000,fis="fact_2B_", if(c[i]<4000000000,fis="fact_3B_", if(c[i]<5000000000,fis="fact_4B_", if(c[i]<6000000000,fis="fact_5B_", if(c[i]<7000000000,fis="fact_6B_", if(c[i]<8000000000,fis="fact_7B_", if(c[i]<9000000000,fis="fact_8B_", fis="fact_9B_" ) ) ) ) ) ) ) ) ); expo=37; k=1<<expo; for(j=1,40, if(a<k,fis=concat(fis,expo); break, k<<=1; expo++)); if(a%8==1, wa=(a1)/(8*c[i])1; sg="+", /*else*/ sg=""; wa=(a12*(4c[i]%4)*c[i])/(8*c[i]) ); write(fis,wa,sg,c[i]), /*else*/ tmod+=gettime() ), /*else*/ break ) ) ); } /*same as above, but use modular order instead of factoring k1*/ /*for bigger stuff it may become faster then factoring k1*/ ratfzno(a,lastvalue,printstep)= { my(tnextpr,tprchk,tznord,telse); if(a==1, a=read("fact_last_k")); tnextpr=tznord=tprchk=telse=0; gettime(); if(printstep==0,printstep=2^20); prst=a; while(a<=lastvalue, until(a%8==1  a%8==7,a=nextprime(a+1)); tnextpr+=gettime(); if(a>lastvalue,break); if(a>prst, prst=a+printstep; printf("... %d, (%d digits, 2^%d++)  Timers: %10.4f, %10.4f, %10.4f, %10.4f seconds\n", a,ceil(log(a)/ten),floor(log(a)/two), tnextpr/1000, tznord/1000, tprchk/1000, telse/1000); write("fact_last_k",a)); telse+=gettime(); c=znorder(Mod(2,a)); tznord+=gettime(); if(isprime(c), tprchk+=gettime(); if(c<1000000000,fis="fact_0B_", if(c<2000000000,fis="fact_1B_", if(c<3000000000,fis="fact_2B_", if(c<4000000000,fis="fact_3B_", if(c<5000000000,fis="fact_4B_", if(c<6000000000,fis="fact_5B_", if(c<7000000000,fis="fact_6B_", if(c<8000000000,fis="fact_7B_", if(c<9000000000,fis="fact_8B_", if(c<10000000000,fis="fact_9B_", telse+=gettime(); next)))))))))); expo=40; k=1<<expo; for(j=1,40, if(a<k,fis=concat(fis,expo); break, k<<=1; expo++)); print(a","c" "); write(fis,a","c); telse+=gettime() ); tprchk+=gettime() ); } (the parameters are start, stop, how often to print) then stop it with CTRL/C, check which files were created. then try "ratfpro(2^45,2^64,10^6)" The timers show how much time was spent for finding the next prime q, how much to factor it, how much for exponentiation, etc. The lines which are printed without "timers" represent factors of mersenne numbers, on the form "q,p", i.e. "q divides 2^p1" Then try starting from 2^58, for example. See how many factors you find.... Not so efficient... This is nice for didactic purpose, only. The function creates a series of files where the factors are stored in a special form, I had some reason at that time to use that form. It also checks (reads/writes) a couple of other files which can be used to resume the work after a stop or crash, of to "gracefully" shut it down (i.e. let it finish the current calculus first, I could implement some pari hook for ctrlC, but it was beyond the scope). 
20140410, 13:25  #15 
Apr 2014
5×17 Posts 
The Go code I referenced in the other thread used Bach Shallit's algorithm, which factors q1 so that it can do exactly what you're saying in order to calculate the order. Otherwise there's no point in factoring the number. And we have a fast way to factor the numbers, msieve has CUDA implementations that factor 30+ digit numbers in no time (the timer literally says 00:00:00). Factoring isn't the bottle neck on this, it's the insanely large number of primes we need to test.
I need to learn more about the logic behind Atkin's sieve before I attempt to modify it, but I did a traditional Eratosthenes sieve coupled with +/ 1 mod 8, and we can really only get up to 23. The factor we reduce by when adding a new prime to the sieve is (p1)/p, which has diminishing returns. 23 gets you to around .085, which is just a bit less than 4 times the n/ln(n) approximation for the number of primes in the 2^64 to 2^65. One option would be to try a 2 step sieve, although I think modifying Atkin's sieve will probably yield better results. Even so, if we assume the factoring is fast enough for numbers in that range (msieve says it is), that just means it takes 4 times longer, which really isn't /that/ bad. Also, these are based on what a 560 Ti can do. A faster GPU with more cores can increase the factoring speed and the resulting calculation of the order. Even if the latest cards in the market can't do this quite yet, give it a few years and they will. Last fiddled with by tapion64 on 20140410 at 13:33 
20140410, 18:46  #16 
Apr 2014
5×17 Posts 
So I went ahead and read the algorithm and reasoning for why Atkin's works, and then I ignored all of that and just multiplied everything by 2 and sieved in respect to 120 instead of 60, taking out all numbers that weren't +/1 mod 8.
The result, surprisingly, appears to not only work, but eliminates the need for crossing off multiples of the squares in the final step. In other words, it appears to be a completely parallelizable algorithm that can start from any point in the set of integers instead of just starting with 1. Perhaps more interesting, this algorithm appears to be able to take a presieved list with no problem, not just a contiguous section. Granted, I only ran the algorithm on 1 through 240 by hand, but since the cycle length is 120, you'd think that means the results are good throughout. I'll do some more extensive testing and see if I find discrepancies. It'd be really neat if this turned out to be a fast way to search for primes = +/ 1 mod 8. I think the need to solve for the solutions to the curves might impact larger numbers too much to be too feasible, though. If anyone's interested, the revised algorithm: 1. List all integers k to be sieved, marked as composite (well, composite or prime and not = +/1 mod 8) 2. For all k: a. If k is congruent to {1, 17, 41, 49, 73, 89, 97, 113} mod 120, find the number of solutions d to 4x^2 + y^2 = k, where x,y > 0. If d is odd, mark k as prime = +/ 1 mod 8.3. You appear to be done! But the original algorithm says to square each element marked prime and then mark multiples of the square as composite. I didn't have to do this, but that doesn't mean it's not necessary for other test sets. EDIT: Now that I think about it, this algorithm should certainly be extendable to cover 7, 11, 13, and so on, by multiplying by 120 and increasing the cycle range then sieving out the 'bad' elements in the congruency tests. It may not be necessary, but it might prevent needless curve solving, and it makes the problem set larger to be split up by cards with more cores. Last fiddled with by tapion64 on 20140410 at 19:02 
20140410, 20:27  #17 
Apr 2014
85_{10} Posts 
Apparently after some amount of time you can't edit posts, so sorry for multiple posts. Some basic estimations of running time for the original Atkin's is O(N/log(log(N))). If we assume a clock rate of 1 GHz on the GPU and 384 cores and we're doing primes up to 2^65, that takes around 185 days (6 months).
With the modified algorithm, we only look from 2^64 through 2^65, which halves the work, and we assume that atleast 1/2 of primes being filtered out as not +/1 mod 8. Thus we're looking at about 1 and half months (possibly less, need to see how removing the square sieving affects running time) just to iterate through the primes. Since we previously assumed 1/2 the primes are filtered out, we're dealing with 208 quadrillion inputs. With the same clock rate and cores before, and exponential time using quadratic sieve for the 65 bit numbers, 1 core could factor roughly 2048 of them per second, gives us roughly 8306 years of computing time. Ok, yeah, that's a bottleneck. My earlier calculations were off by a factor of 100, which is enough. I should see if there's a better fit for calculating the multiplicative order that doesn't require factoring, possibly make a modified naive algorithm that takes into account the qualities of p which will work for numbers of this size. Last fiddled with by tapion64 on 20140410 at 20:36 
20140410, 22:42  #18 
Apr 2014
125_{8} Posts 
The solution was there all along. Since we only care if the multiplicative order is less than 2^32, naive multiply and subtract can be performed up to 2^31 and if we still haven't hit 1, we know that the order is larger than 2^32. Although probably use some squared exponentiation techniques and operate in [(p+1)/2,(p+1)/2) to speed it up. No factoring, no ridiculous bottleneck, but this algorithm isn't as parallelizable... I'll keep looking for better options

20140410, 23:53  #19 
Undefined
"The unspeakable one"
Jun 2006
My evil lair
2·3·5·191 Posts 
tapion64: Maybe you should write all this in a blog somewhere and when you've finally figured it all out come back and post here to get people to join you in your search.

20140411, 01:30  #20 
Apr 2014
5·17 Posts 
Will do. I can't believe the site doesn't have permaeditable posts though.

20140411, 02:45  #21  
May 2013
East. Always East.
11×157 Posts 
Quote:
EDIT: On the other hand, it appears there are no Crusades against doubleposting here, unlike the same forums I am referring to. Last fiddled with by TheMawn on 20140411 at 02:47 Reason: permanenc3 is not a word 

20140411, 03:53  #22 
Romulan Interpreter
Jun 2011
Thailand
2·3·31·47 Posts 
Now you got to more realistic times :D
Related to your question, there are many ways to get that order, without factoring. Unfortunately none of them is faster. One very simple (but still exponential) method is extremely easy to implement in binary, as it only needs additions and shifts: Starting with 1:  add q  count the (binary) zeroes at the end, and eliminate them  repeat the two steps above until you get 1. Example: say q=5, which in binary is 101, so 101+1=110, we have 11 and a zero.  11+101=1000, we have 1 (so we stop) and additional 3 zeroes, so we have a total of 4 zeroes. Therefore 5 divides 2^{4}1, more exactly 5 divides all numbers of the form 2^{4k}1, and none others (i.e. there is no other power x such as 2^{x}1 is divisible by 5) You can do this for all odd numbers (not necessarily prime) and there is a simple proof that you will always end in 1, after the right number of steps. say q=9, which in binary is 1001, so 1001+1=1010, we have 101 and a zero.  101+1001=1110, we have 111 and additional one zero, total 2 zeroes.  111+1001=10000, we have 1 (so we stop) and additional 4 zeroes, total 6 zeroes. Therefore, 9 divides 2^61, and all numbers of the form 2^{6k}1 and none others. say q=11 (decimal), which in binary is 1011, so 1011+1=1100, we have 11 and 2 zeroes.  11+1011=1110, we have 111 and additional one zero, total 3 zeroes.  111+1011=10010, we have 1001 and additional 1 zero, total 4 zeroes.  1001+1011=10100, we have 101 and additional 2 zeroes, total 6 zeroes.  101+1011=10000, we have 1 (so we stop) and additional 4 zeroes, total 10 zeroes. Therefore, 11 divides 2^101, and all numbers of the form 2^{10k}1 and none others. One which will interest us, where the result is prime: say q=23, which in binary is 10111, so 10111+1=11000, we have 11 and 3 zeroes.  11+10111=11010, we have 1101 and additional one zero, total 4 zeroes.  1101+10111=100100, we have 1001 and additional 2 zeroes, total 6 zeroes.  1001+10111=10000, we have 1 (so we stop) and additional 5 zeroes, total 11 (decimal) zeroes. Therefore, 23 divides 2^111, and all numbers of the form 2^{11k}1 and none others. The same (last) calculus in decimal, involving "divide by 2 if it is even", would look like: q=23+1=24, even, we have q=q/2=12 and p=1 (one zero); q=12 even, we have q=6 and p=2; q=6 even we have q=3 and p=3; q=3 odd, etc... q=3+23=26, q=13, p=4; q=13+23=36; q=18, p=5; q=9, p=6; q=9+23=32; q=16, p=7; q=8, p=8; q=4, p=9; q=2, p=10; q=1, p=11; end. So, 23 divides 2^111. Next step, you have to do an ASIC to compute this (it won't be difficult, anyhow it is much simpler than the asics for sha256 for bitcoin mining, hehe, just few counters and shifters) but it will still be (much) slower than normal exponentiation, by squaring  but well... you do no factoring, neither multiplication nor division ). Last fiddled with by LaurV on 20140411 at 04:01 
Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
GPU Trial Factoring FAQ  garo  GPU Computing  100  20190422 10:58 
Trial Factoring on AMD/ATI GPU's?  Stargate38  GPU Computing  9  20180831 07:58 
over trial factoring  JFB  Software  23  20040822 05:37 
How to only do Trial Factoring?  michael  Software  23  20040106 08:54 
About trial factoring  gbvalor  Math  4  20030522 02:04 