mersenneforum.org  

Go Back   mersenneforum.org > Great Internet Mersenne Prime Search > PrimeNet

Reply
 
Thread Tools
Old 2014-04-10, 08:00   #12
tapion64
 
Apr 2014

1258 Posts
Default

Quote:
Originally Posted by retina View Post
I'd rather use a deterministic method since it should be just as fast on such a small number. Technically we could use msieve for the test to screen, and then check with GIMPS to see if any data for the exponent exists, since GIMPS knows all the primes less than 2^32.
tapion64 is offline   Reply With Quote
Old 2014-04-10, 08:20   #13
jnml
 
Feb 2012
Prague, Czech Republ

A416 Posts
Default

Quote:
Originally Posted by tapion64 View Post
I'd rather use a deterministic method since it should be just as fast on such a small number. Technically we could use msieve for the test to screen, and then check with GIMPS to see if any data for the exponent exists, since GIMPS knows all the primes less than 2^32.
M-R test is deterministic in that range if you use proper bases: http://miller-rabin.appspot.com/
jnml is offline   Reply With Quote
Old 2014-04-10, 08:27   #14
LaurV
Romulan Interpreter
 
LaurV's Avatar
 
Jun 2011
Thailand

2·3·31·47 Posts
Default

So, to have an idea about the involved effort. Divide this by thousands, considering that pari is quite a slow animal comparing with the gpu, but the proportions remain. As I said in the post from the parallel thread, you need an efficient algorithm to factor small numbers very fast, which you don't have. You will lose the most of the time in factoring q-1, for every q. After that, you don't need to effectively find the order, but it is much faster to check, for all prime factors p of q-1, which are in the range of p's you are interested (that is from 0 to 1 billion, GIMPS range, or to say 2^32), if q divides 2^p-1 by exponentiation. This is faster, but yet too slow comparing with the algorithms used by GIMPS, when q gets higher.

To try, see this code:
Code:
/* "random" trial factoring, specify a starting possible factor and the outfput file */
ratf(a,fis)=
{
    if(fis==0, print("Specify an output file!"); return(0));
    while(1,
        until(a%8==1 || a%8==7,a=nextprime(a+1));
        c=factorint(a-1)[,1]~;
        for(i=1,#c,
            if(c[i]<1000000000 && Mod(2,a)^c[i]==1, print(a","c[i]); write(fis,a","c[i]))
        )
    );
}

/*same as above, but split the output to different files depending on n and k
  -if a=-1, it will read the last_k from the file, assuming the file has ONE LINE only
  -use the flags.txt file to gracefully stop it
*/
ratfpro(a,lastvalue,printstep)=
{
    my(tnextpr,tmod,tfactint,telse,wa,sg);
    if(a==-1, a=read("fact_last_k.txt"));
    if(lastvalue<=0, error("There is a must you provide a maximum value of k."));
    print("Starting/continuing from: "a);
    tnextpr=tfactint=tmod=telse=0;
    if(printstep==0,printstep=2^20);
    prst=a;
    gettime();
    while(a<=lastvalue,
        telse+=gettime();
        until(a%8==1 || a%8==7,a=nextprime(a+1));
        tnextpr+=gettime();
        if(a>prst, 
            prst=a+printstep;
            printf("... %d, (%d digits, 2^%d++) --- Timers:  %6.4f,  %6.4f,  %6.4f,  %6.4f  (seconds)\n",
                    a,ceil(log(a)/ten),floor(log(a)/two),
                    tnextpr/1000, tfactint/1000, tmod/1000, telse/1000);
            extern("del fact_last_k.bak");
            extern("ren fact_last_k.txt fact_last_k.bak");
            write("fact_last_k.txt",a-1);
            shouldstop=read("flags.txt");
            if(shouldstop==1,break)
        );
        if(a>lastvalue,print("Candidate list is finished now."); break);
        telse+=gettime();
        c=factorint((a-1)>>1)[,1]~;
        tfactint+=gettime();
        for(i=1,#c,
            if(c[i]<10000000000,
                telse+=gettime();
                if(Mod(2,a)^c[i]==1, 
                    tmod+=gettime();
                    print(a","c[i]"  ");
                    if(c[i]<1000000000,fis="fact_0B_",
                      if(c[i]<2000000000,fis="fact_1B_",
                        if(c[i]<3000000000,fis="fact_2B_",
                          if(c[i]<4000000000,fis="fact_3B_",
                            if(c[i]<5000000000,fis="fact_4B_",
                              if(c[i]<6000000000,fis="fact_5B_",
                                if(c[i]<7000000000,fis="fact_6B_",
                                  if(c[i]<8000000000,fis="fact_7B_",
                                    if(c[i]<9000000000,fis="fact_8B_",
                                      fis="fact_9B_"
                                    )
                                  )
                                )
                              )
                            )
                          )
                        )
                      )
                    );
                    expo=37;
                    k=1<<expo;
                    for(j=1,40,
                        if(a<k,fis=concat(fis,expo); break,
                            k<<=1; expo++));
                    if(a%8==1,
                        wa=(a-1)/(8*c[i])-1;
                        sg="+",
                    /*else*/
                        sg="-";
                        wa=(a-1-2*(4-c[i]%4)*c[i])/(8*c[i])
                    );
                    write(fis,wa,sg,c[i]),
                /*else*/
                    tmod+=gettime()
                ),
            /*else*/
                break
            )
        )
    );
}

/*same as above, but use modular order instead of factoring k-1*/
/*for bigger stuff it may become faster then factoring k-1*/
ratfzno(a,lastvalue,printstep)=
{
    my(tnextpr,tprchk,tznord,telse);
    if(a==-1, a=read("fact_last_k"));
    tnextpr=tznord=tprchk=telse=0;
    gettime();
    if(printstep==0,printstep=2^20);
    prst=a;
    while(a<=lastvalue,
        until(a%8==1 || a%8==7,a=nextprime(a+1));
        tnextpr+=gettime();
        if(a>lastvalue,break);
        if(a>prst, 
            prst=a+printstep;
            printf("... %d, (%d digits, 2^%d++) --- Timers: %10.4f, %10.4f, %10.4f, %10.4f seconds\n",
                    a,ceil(log(a)/ten),floor(log(a)/two),
                    tnextpr/1000, tznord/1000, tprchk/1000, telse/1000);
            write("fact_last_k",a));
        telse+=gettime();
        c=znorder(Mod(2,a));
        tznord+=gettime();
        if(isprime(c),
            tprchk+=gettime();
            if(c<1000000000,fis="fact_0B_",
              if(c<2000000000,fis="fact_1B_",
                if(c<3000000000,fis="fact_2B_",
                  if(c<4000000000,fis="fact_3B_",
                    if(c<5000000000,fis="fact_4B_",
                      if(c<6000000000,fis="fact_5B_",
                        if(c<7000000000,fis="fact_6B_",
                          if(c<8000000000,fis="fact_7B_",
                            if(c<9000000000,fis="fact_8B_",
                              if(c<10000000000,fis="fact_9B_",
                                telse+=gettime();
                                next))))))))));
            expo=40;
            k=1<<expo;
            for(j=1,40,
                if(a<k,fis=concat(fis,expo); break,
                    k<<=1; expo++));
            print(a","c"  ");
            write(fis,a","c);
            telse+=gettime()
        );
        tprchk+=gettime()
    );
}
save the code, load it in pari, try: "ratfpro(2^10,2^64,10^6)"
(the parameters are start, stop, how often to print)

then stop it with CTRL/C, check which files were created.

then try "ratfpro(2^45,2^64,10^6)"

The timers show how much time was spent for finding the next prime q, how much to factor it, how much for exponentiation, etc.

The lines which are printed without "timers" represent factors of mersenne numbers, on the form "q,p", i.e. "q divides 2^p-1"

Then try starting from 2^58, for example. See how many factors you find.... Not so efficient...

This is nice for didactic purpose, only. The function creates a series of files where the factors are stored in a special form, I had some reason at that time to use that form. It also checks (reads/writes) a couple of other files which can be used to resume the work after a stop or crash, of to "gracefully" shut it down (i.e. let it finish the current calculus first, I could implement some pari hook for ctrl-C, but it was beyond the scope).
LaurV is offline   Reply With Quote
Old 2014-04-10, 13:25   #15
tapion64
 
Apr 2014

5×17 Posts
Default

The Go code I referenced in the other thread used Bach Shallit's algorithm, which factors q-1 so that it can do exactly what you're saying in order to calculate the order. Otherwise there's no point in factoring the number. And we have a fast way to factor the numbers, msieve has CUDA implementations that factor 30+ digit numbers in no time (the timer literally says 00:00:00). Factoring isn't the bottle neck on this, it's the insanely large number of primes we need to test.

I need to learn more about the logic behind Atkin's sieve before I attempt to modify it, but I did a traditional Eratosthenes sieve coupled with +/- 1 mod 8, and we can really only get up to 23. The factor we reduce by when adding a new prime to the sieve is (p-1)/p, which has diminishing returns. 23 gets you to around .085, which is just a bit less than 4 times the n/ln(n) approximation for the number of primes in the 2^64 to 2^65. One option would be to try a 2 step sieve, although I think modifying Atkin's sieve will probably yield better results. Even so, if we assume the factoring is fast enough for numbers in that range (msieve says it is), that just means it takes 4 times longer, which really isn't /that/ bad.

Also, these are based on what a 560 Ti can do. A faster GPU with more cores can increase the factoring speed and the resulting calculation of the order. Even if the latest cards in the market can't do this quite yet, give it a few years and they will.

Last fiddled with by tapion64 on 2014-04-10 at 13:33
tapion64 is offline   Reply With Quote
Old 2014-04-10, 18:46   #16
tapion64
 
Apr 2014

5×17 Posts
Default

So I went ahead and read the algorithm and reasoning for why Atkin's works, and then I ignored all of that and just multiplied everything by 2 and sieved in respect to 120 instead of 60, taking out all numbers that weren't +/-1 mod 8.

The result, surprisingly, appears to not only work, but eliminates the need for crossing off multiples of the squares in the final step. In other words, it appears to be a completely parallelizable algorithm that can start from any point in the set of integers instead of just starting with 1. Perhaps more interesting, this algorithm appears to be able to take a pre-sieved list with no problem, not just a contiguous section.

Granted, I only ran the algorithm on 1 through 240 by hand, but since the cycle length is 120, you'd think that means the results are good throughout. I'll do some more extensive testing and see if I find discrepancies. It'd be really neat if this turned out to be a fast way to search for primes = +/- 1 mod 8. I think the need to solve for the solutions to the curves might impact larger numbers too much to be too feasible, though.

If anyone's interested, the revised algorithm:

1. List all integers k to be sieved, marked as composite (well, composite or prime and not = +/-1 mod 8)
2. For all k:
a. If k is congruent to {1, 17, 41, 49, 73, 89, 97, 113} mod 120, find the number of solutions d to 4x^2 + y^2 = k, where x,y > 0. If d is odd, mark k as prime = +/- 1 mod 8.

b. If k is congruent to {7, 31, 79, 103} mod 120, find the number of solutions d to 3x^2 + y^2 = k, where x,y > 0. If d is odd, mark k as prime = +/- 1 mod 8.

c. If k is congruent to {23 47 71 119} mod 120, find the number of solutions d to 3x^2 - y^2 = k, where x > y > 0. If d is odd, mark k as prime = +/- 1 mod 8.

d. If k is congruent to any other residue mod 120, leave k as marked as it is (in other words, composite or prime and not = +/- 1 mod 8)
3. You appear to be done! But the original algorithm says to square each element marked prime and then mark multiples of the square as composite. I didn't have to do this, but that doesn't mean it's not necessary for other test sets.

EDIT: Now that I think about it, this algorithm should certainly be extendable to cover 7, 11, 13, and so on, by multiplying by 120 and increasing the cycle range then sieving out the 'bad' elements in the congruency tests. It may not be necessary, but it might prevent needless curve solving, and it makes the problem set larger to be split up by cards with more cores.

Last fiddled with by tapion64 on 2014-04-10 at 19:02
tapion64 is offline   Reply With Quote
Old 2014-04-10, 20:27   #17
tapion64
 
Apr 2014

8510 Posts
Default

Apparently after some amount of time you can't edit posts, so sorry for multiple posts. Some basic estimations of running time for the original Atkin's is O(N/log(log(N))). If we assume a clock rate of 1 GHz on the GPU and 384 cores and we're doing primes up to 2^65, that takes around 185 days (6 months).

With the modified algorithm, we only look from 2^64 through 2^65, which halves the work, and we assume that atleast 1/2 of primes being filtered out as not +/-1 mod 8. Thus we're looking at about 1 and half months (possibly less, need to see how removing the square sieving affects running time) just to iterate through the primes.

Since we previously assumed 1/2 the primes are filtered out, we're dealing with 208 quadrillion inputs. With the same clock rate and cores before, and exponential time using quadratic sieve for the 65 bit numbers, 1 core could factor roughly 2048 of them per second, gives us roughly 8306 years of computing time. Ok, yeah, that's a bottleneck. My earlier calculations were off by a factor of 100, which is enough. I should see if there's a better fit for calculating the multiplicative order that doesn't require factoring, possibly make a modified naive algorithm that takes into account the qualities of p which will work for numbers of this size.

Last fiddled with by tapion64 on 2014-04-10 at 20:36
tapion64 is offline   Reply With Quote
Old 2014-04-10, 22:42   #18
tapion64
 
Apr 2014

1258 Posts
Default

The solution was there all along. Since we only care if the multiplicative order is less than 2^32, naive multiply and subtract can be performed up to 2^31 and if we still haven't hit -1, we know that the order is larger than 2^32. Although probably use some squared exponentiation techniques and operate in [-(p+1)/2,(p+1)/2) to speed it up. No factoring, no ridiculous bottleneck, but this algorithm isn't as parallelizable... I'll keep looking for better options
tapion64 is offline   Reply With Quote
Old 2014-04-10, 23:53   #19
retina
Undefined
 
retina's Avatar
 
"The unspeakable one"
Jun 2006
My evil lair

2·3·5·191 Posts
Default

tapion64: Maybe you should write all this in a blog somewhere and when you've finally figured it all out come back and post here to get people to join you in your search.
retina is online now   Reply With Quote
Old 2014-04-11, 01:30   #20
tapion64
 
Apr 2014

5·17 Posts
Default

Will do. I can't believe the site doesn't have perma-editable posts though.
tapion64 is offline   Reply With Quote
Old 2014-04-11, 02:45   #21
TheMawn
 
TheMawn's Avatar
 
May 2013
East. Always East.

11×157 Posts
Default

Quote:
Originally Posted by tapion64 View Post
Will do. I can't believe the site doesn't have perma-editable posts though.
It's different from what other places do, but I think it adds an element of permanence I think is more valuable. I've seen posts proven wrong changed weeks after being first written because the original poster wanted to save face.

EDIT: On the other hand, it appears there are no Crusades against double-posting here, unlike the same forums I am referring to.

Last fiddled with by TheMawn on 2014-04-11 at 02:47 Reason: permanenc3 is not a word
TheMawn is offline   Reply With Quote
Old 2014-04-11, 03:53   #22
LaurV
Romulan Interpreter
 
LaurV's Avatar
 
Jun 2011
Thailand

2·3·31·47 Posts
Default

Now you got to more realistic times :D
Related to your question, there are many ways to get that order, without factoring. Unfortunately none of them is faster.

One very simple (but still exponential) method is extremely easy to implement in binary, as it only needs additions and shifts:

Starting with 1:
-- add q
-- count the (binary) zeroes at the end, and eliminate them
-- repeat the two steps above until you get 1.

Example:
say q=5, which in binary is 101, so 101+1=110, we have 11 and a zero.
-- 11+101=1000, we have 1 (so we stop) and additional 3 zeroes, so we have a total of 4 zeroes. Therefore 5 divides 24-1, more exactly 5 divides all numbers of the form 24k-1, and none others (i.e. there is no other power x such as 2x-1 is divisible by 5)

You can do this for all odd numbers (not necessarily prime) and there is a simple proof that you will always end in 1, after the right number of steps.

say q=9, which in binary is 1001, so 1001+1=1010, we have 101 and a zero.
-- 101+1001=1110, we have 111 and additional one zero, total 2 zeroes.
-- 111+1001=10000, we have 1 (so we stop) and additional 4 zeroes, total 6 zeroes.
Therefore, 9 divides 2^6-1, and all numbers of the form 26k-1 and none others.

say q=11 (decimal), which in binary is 1011, so 1011+1=1100, we have 11 and 2 zeroes.
-- 11+1011=1110, we have 111 and additional one zero, total 3 zeroes.
-- 111+1011=10010, we have 1001 and additional 1 zero, total 4 zeroes.
-- 1001+1011=10100, we have 101 and additional 2 zeroes, total 6 zeroes.
-- 101+1011=10000, we have 1 (so we stop) and additional 4 zeroes, total 10 zeroes.
Therefore, 11 divides 2^10-1, and all numbers of the form 210k-1 and none others.

One which will interest us, where the result is prime:

say q=23, which in binary is 10111, so 10111+1=11000, we have 11 and 3 zeroes.
-- 11+10111=11010, we have 1101 and additional one zero, total 4 zeroes.
-- 1101+10111=100100, we have 1001 and additional 2 zeroes, total 6 zeroes.
-- 1001+10111=10000, we have 1 (so we stop) and additional 5 zeroes, total 11 (decimal) zeroes.
Therefore, 23 divides 2^11-1, and all numbers of the form 211k-1 and none others.

The same (last) calculus in decimal, involving "divide by 2 if it is even", would look like:
q=23+1=24, even, we have q=q/2=12 and p=1 (one zero);
q=12 even, we have q=6 and p=2;
q=6 even we have q=3 and p=3;
q=3 odd, etc...
q=3+23=26, q=13, p=4;
q=13+23=36; q=18, p=5; q=9, p=6;
q=9+23=32; q=16, p=7; q=8, p=8; q=4, p=9; q=2, p=10; q=1, p=11; end.
So, 23 divides 2^11-1.

Next step, you have to do an ASIC to compute this (it won't be difficult, anyhow it is much simpler than the asics for sha256 for bitcoin mining, hehe, just few counters and shifters) but it will still be (much) slower than normal exponentiation, by squaring - but well... you do no factoring, neither multiplication nor division ).

Last fiddled with by LaurV on 2014-04-11 at 04:01
LaurV is offline   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
GPU Trial Factoring FAQ garo GPU Computing 100 2019-04-22 10:58
Trial Factoring on AMD/ATI GPU's? Stargate38 GPU Computing 9 2018-08-31 07:58
over trial factoring JFB Software 23 2004-08-22 05:37
How to only do Trial Factoring? michael Software 23 2004-01-06 08:54
About trial factoring gbvalor Math 4 2003-05-22 02:04

All times are UTC. The time now is 04:11.

Fri Sep 25 04:11:41 UTC 2020 up 15 days, 1:22, 0 users, load averages: 2.52, 1.92, 1.62

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2020, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.