20140409, 18:45  #1 
Apr 2014
55_{16} Posts 
Trial Factoring Elimination up to 2^64 Project
(For those curious about the math behind this, the multiplicative order of n mod m is defined as the smallest k such that n^k = 1 mod m. As Mersenne numbers are of the form 2^p1, if a given prime q divides that Mersenne number, then its multiplicative order of 2 is equal to p, since 2^p mod q = 1, and 11 = 0. After that point, q will only divide numbers of the form 2^(k*p)1, which are eliminated from Mersenne numbers as the exponent is not prime. This is the reason that 2^n1 must have n be prime for the number to be prime, as the set of all multiplicative orders of 2 mod p contains the set of all primes. If this were not the case, then there would exist some number 2^p1 which had no factors greater than 1, which we know is impossible.)
The purpose of the project is to eventually get to 2^64 (or beyond) in contiguous primes, calculating their multiplicative order, and then eliminating them from trial factoring range. Calculating this for an individual prime is trivial even in the 2^80 range and above using the Bach Shallit algorithm, so processingwise this isn't very intensive, it's just a huge number of test cases (around 416 quadrillion primes). The original intent was to calculate and store this information for the primes, but this would easily take up several exabytes of information, so instead my server is going to track information about the range covered so far. This introduces new architectural problems I'm attempting to solve at the moment. Inevitably this will run across factors for exponents that are far outside the range that GIMPS currently stores, which I believe is for exponents less than or equal to 2^32. For the purpose of this project, it's possible to ignore those as 'outside of scope'. Thus if the calculated multiplicative order is greater than 2^32, we can still add it to the range of eliminated trial factors. If the multiplicative order is less than 2^32 and prime, we can immediately report it to GIMPS as a factor, again eliminating it from the trial factoring range. The final and most common outcome would be that it is less than 2^32 and not prime, so we can eliminate it (permanently) from trial factoring range. However, since we aren't storing any of this information to a database, it'll simply have to be recalculated next time the exponent range on GIMPS increases. The main challenge of distributing this work is actually having the primes on hand to test. But if we assume that all nonprimes are eliminated from the first step (testing for multiplicative order), and the average CPU can run the whole algorithm on 1 core for 2 million of these primes in around 1 second (I'll have to do benchmarking on this, it could be slower), total calculation time ends up being around 6592 years. If the average contributor has atleast a quad core, 1649 years or so. If the entire community of GIMPS that was working on trial factoring at this moment switched over to doing these calculations, around 2184, that would be 9 months of testing, and we'd never have to trial factor below 2^64 again (until GIMPS increases their exponent range), as we'll have eliminated every single Mersenne prime candidate with exponent less than 2^32 that has a factor less than 2^64. Obviously, this type of approach is also well suited for CUDA divide and conquer computing, but I have no idea how widespread CUDA GPUs are, and their performance can vary quite a bit. But ideally this would be a project that would support both types of contributors. I'd like any feedback/criticism/help the community can offer. I'll be doing some benchmarking to come back with hard estimates for the time this would actually take. EDIT: (I'm also aware that it's unreasonable to expect that many people to be running the program 24/7 for that long, and also that many of them may only have dual or even single cores working on it, so I will factor that in.) Last fiddled with by tapion64 on 20140409 at 18:53 
20140409, 19:32  #2 
ἀβουλία
"Mr. Meeseeks"
Jan 2012
California, USA
2^{3}·271 Posts 
Not sure where you are going with this, but programs exist for TF'ing on the GPU and people run them. See the project "GPU to 72" also. Also, no one runs TF on the CPU because the GPU can do it much, much faster.
http://mersenneforum.org/forumdisplay.php?f=92 OpenCL is faster at this moment compared to CUDA in this generation of GPU's, but that always changes Maxwell I believe has really good potential. Last fiddled with by kracker on 20140409 at 19:35 
20140409, 19:42  #3  
Apr 2014
5×17 Posts 
Quote:


20140409, 19:59  #4 
May 2013
East. Always East.
1727_{10} Posts 
I only vaguely caught the math behind this idea. If I understand correctly, it testing if 2KP+1 divides 2^{P}1, but tests every P for one K instead of every K for one P, The latter being our Trial Factoring method?
I can only assume I'm wrong, because this has actually been done. There is at least one user who is taking this approach currently, and has been finding factors < 2^{~45} (last I checked) all over the map. Also, in the interest of finding all factors less than 64 bits, this sounds pretty neat if it does work. However, you'll have a hard time getting a lot of traction because trial factoring to 64 bits has been completed up to 1000M except for very low exponents whose Prime / Composite status has long been proven. Your efforts will not clear any exponents in the list. It will only find factors for exponents we already know to be composite. I personally would rather direct efforts at exponents with unknown status, but I can't speak for everyone. 
20140409, 20:04  #5  
Apr 2014
125_{8} Posts 
Quote:
EDIT: In that case, I think this should definitely be a GPU only project. CPU will be too slow on the sieving of primes to make a difference. I need to learn how to do the sieving using only the GPU and convert my Go code for calculating the order into CUDA. That part could take some time, as I'm not familiar with CUDA in general. This could take far less time than I thought it might if the conversion from Go to CUDA is seemless. Last fiddled with by tapion64 on 20140409 at 20:25 

20140409, 21:16  #6 
May 2013
East. Always East.
6BF_{16} Posts 
Is this similar to finding factors by k instead of by P? Or is there some other improvement?

20140409, 21:38  #7  
Apr 2014
5·17 Posts 
Quote:
In order to make this feasible, we'll limit k to be less than or equal to the exponents that GIMPS currently tracks, which are those less than 2^32 (or in other words, Mersenne numbers less than 2^(2^32)1). So after we've calculated k, if it's greater than 2^32, throw the number away and move on to the next one. If not, check the primality of k (this should be simple, it's a 32 bit number). If it's composite, we don't care because 2^k1 isn't a Mersenne number and can't possibly be prime, it's already eliminated. If it's composite, go check again GIMPS to see if P is in the system for 2^k1 already. If it isn't, great success. Either way, move on to the next prime and keep checking. If GPU sieving is this fast, and we can write efficient CUDA code to calculate multiplicative order, one person alone with a 560 Ti + could wipe out those ranges in a week's time, I think. Possibly faster. I wish I could code CUDA as fast as I pseudocoded this algorithm, we'd already be done. Luckily the GPU sieving library in mfaktc should come in handy, I just have to figure out how to use it and use CUDA in general. I keep expecting to find some terrible flaw in my logic, but if I don't, it means we're about to find a lot of factors. Last fiddled with by tapion64 on 20140409 at 21:43 

20140410, 02:18  #8 
Romulan Interpreter
Jun 2011
Thailand
2^{2}×7×317 Posts 
What you want to do is called "searching by k" (as opposite to "searching by p", what p95 and mfaktX do  the "k" and "p" refers to a factor q of 2^p1 being of the form q=2kp+1) and it was discussed at least 20 times on the forum. I personally put a thousand pari code snippets, and versions of that code was used by others (James) to extend the factor data base (see his graphs) for small factors (below 2^45 or so). This is very fast for small factors, but it is getting exponentially slower as q gets higher, because of the "garbage" it produces, mostly.
This method, with all possible and impossible optimizations together (like testing only primes which are 1 or 7 mod 8, being the most obvious optimization, but other less obvious can be done too) is not feasible beyond 2^48 (or max. 2^52, if you have a supercomputer), due to the large number of primes you need to test, which won't produce a prime exponent (the most of the primes produce composite exponents). I have a data base of factors to about 2^56, which can be "easily" reproduced letting a single core of pari to crunch for half year or 8 months. James has a larger database. When you succeed going over this limit, tell us For records, PrimeNet has such a database to 2^64 at least (all exponents over some limit were TFed to 2^65 or so, by other  faster  methods). There are about 2.2*10^17 primes between 2^56 and 2^64 (superficial calculus, forgive me if it is not accurate, but the magnitude is right) from which a half are 1 or 7 mod 8. Even if you have ten thousand computers and succeed to test a hundred thousands of primes in each computer per second, you will still need about 3 to 5 years to reach 2^64. So, do you have 10 thousand computers? Edit: I had a look for old code but can't find it on the forum, anyhow, here is a sample of what you want to do, imagine the "getp" function implements the algorithm you talk about. Code:
gp > \r tf gp > forprime(q=7,1000, print(q" : "getp(q,10^9))) 7 : 3 11 : 10 13 : 12 17 : 8 19 : 18 23 : 11 29 : 28 31 : 5 37 : 36 41 : 20 43 : 14 47 : 23 53 : 52 59 : 58 61 : 60 67 : 66 71 : 35 73 : 9 79 : 39 83 : 82 89 : 11 97 : 48 101 : 100 103 : 51 107 : 106 109 : 36 113 : 28 127 : 7 131 : 130 137 : 68 139 : 138 149 : 148 151 : 15 157 : 52 163 : 162 167 : 83 173 : 172 179 : 178 181 : 180 191 : 95 193 : 96 197 : 196 199 : 99 211 : 210 223 : 37 227 : 226 229 : 76 233 : 29 239 : 119 241 : 24 251 : 50 257 : 16 263 : 131 269 : 268 271 : 135 277 : 92 281 : 70 283 : 94 293 : 292 307 : 102 311 : 155 313 : 156 317 : 316 331 : 30 337 : 21 347 : 346 349 : 348 353 : 88 359 : 179 367 : 183 373 : 372 379 : 378 383 : 191 389 : 388 397 : 44 401 : 200 409 : 204 419 : 418 421 : 420 431 : 43 433 : 72 439 : 73 443 : 442 449 : 224 457 : 76 461 : 460 463 : 231 467 : 466 479 : 239 487 : 243 491 : 490 499 : 166 503 : 251 509 : 508 521 : 260 523 : 522 541 : 540 547 : 546 557 : 556 563 : 562 569 : 284 571 : 114 577 : 144 587 : 586 593 : 148 599 : 299 601 : 25 607 : 303 613 : 612 617 : 154 619 : 618 631 : 45 641 : 64 643 : 214 647 : 323 653 : 652 659 : 658 661 : 660 673 : 48 677 : 676 683 : 22 691 : 230 701 : 700 709 : 708 719 : 359 727 : 121 733 : 244 739 : 246 743 : 371 751 : 375 757 : 756 761 : 380 769 : 384 773 : 772 787 : 786 797 : 796 809 : 404 811 : 270 821 : 820 823 : 411 827 : 826 829 : 828 839 : 419 853 : 852 857 : 428 859 : 858 863 : 431 877 : 876 881 : 55 883 : 882 887 : 443 907 : 906 911 : 91 919 : 153 929 : 464 937 : 117 941 : 940 947 : 946 953 : 68 967 : 483 971 : 194 977 : 488 983 : 491 991 : 495 997 : 332 gp > More clear: Code:
gp > forprime(q=7,10000, if(q%8==1q%8==7,p=getp(q,10^9); if(isprime(p), print(q" divides 2^"p"1")))) 7 divides 2^31 23 divides 2^111 31 divides 2^51 47 divides 2^231 89 divides 2^111 127 divides 2^71 167 divides 2^831 223 divides 2^371 233 divides 2^291 263 divides 2^1311 359 divides 2^1791 383 divides 2^1911 431 divides 2^431 439 divides 2^731 479 divides 2^2391 503 divides 2^2511 719 divides 2^3591 839 divides 2^4191 863 divides 2^4311 887 divides 2^4431 983 divides 2^4911 1103 divides 2^291 1319 divides 2^6591 1367 divides 2^6831 1399 divides 2^2331 1433 divides 2^1791 1439 divides 2^7191 1487 divides 2^7431 1823 divides 2^9111 1913 divides 2^2391 2039 divides 2^10191 2063 divides 2^10311 2089 divides 2^291 2207 divides 2^11031 2351 divides 2^471 2383 divides 2^3971 2447 divides 2^12231 2687 divides 2^791 2767 divides 2^4611 2879 divides 2^14391 2903 divides 2^14511 2999 divides 2^14991 3023 divides 2^15111 3119 divides 2^15591 3167 divides 2^15831 3343 divides 2^5571 3391 divides 2^1131 3449 divides 2^4311 3463 divides 2^5771 3607 divides 2^6011 3623 divides 2^18111 3863 divides 2^19311 4007 divides 2^20031 4079 divides 2^20391 4127 divides 2^20631 4513 divides 2^471 4567 divides 2^7611 4679 divides 2^23391 4703 divides 2^23511 4799 divides 2^23991 4871 divides 2^4871 4919 divides 2^24591 5087 divides 2^25431 5399 divides 2^26991 5471 divides 2^5471 5639 divides 2^28191 5711 divides 2^5711 5737 divides 2^2391 5807 divides 2^29031 5879 divides 2^29391 5927 divides 2^29631 6047 divides 2^30231 6079 divides 2^10131 6089 divides 2^7611 6353 divides 2^3971 6361 divides 2^531 6599 divides 2^32991 6719 divides 2^33591 6857 divides 2^8571 6983 divides 2^34911 7079 divides 2^35391 7247 divides 2^36231 7487 divides 2^1971 7529 divides 2^9411 7559 divides 2^37791 7607 divides 2^38031 7703 divides 2^38511 7727 divides 2^38631 7823 divides 2^39111 7927 divides 2^13211 8039 divides 2^40191 8167 divides 2^13611 8191 divides 2^131 8287 divides 2^13811 8423 divides 2^42111 8543 divides 2^42711 8719 divides 2^14531 8783 divides 2^43911 8831 divides 2^8831 9511 divides 2^3171 9623 divides 2^2831 9719 divides 2^431 9743 divides 2^48711 9839 divides 2^49191 9887 divides 2^49431 gp > Last fiddled with by LaurV on 20140410 at 03:03 
20140410, 03:32  #9 
Apr 2014
5·17 Posts 
Well, you're certainly right that it will take more time than I thought. I was mislead by how fast I was verifying the factors I found with ECM/trial factor, because p1 is naturally easier to factor. Still, I think with an Atkin sieve modified to only find +/1 mod 8 primes that runs on the GPU, and using msieve to factor p1, then Bach Shallit to calculate the order, then some primality test that's fast for 32 bit numbers on the order, we reach feasible for atleast 2^64 through 2^65 if it's well distributed. The key is that we don't have a CPU bottleneck, everything happens on the GPU and all of the difficult parts are highly parallelizable. Throw enough high compute power cards at it and we should get it done in a reasonable amount of time. But I'll have to throw it together and test on my 560 Ti.

20140410, 03:50  #10  
Undefined
"The unspeakable one"
Jun 2006
My evil lair
16BF_{16} Posts 
Quote:


20140410, 06:10  #11  
Jun 2003
4,723 Posts 
Quote:


Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
GPU Trial Factoring FAQ  garo  GPU Computing  100  20190422 10:58 
Trial Factoring on AMD/ATI GPU's?  Stargate38  GPU Computing  9  20180831 07:58 
over trial factoring  JFB  Software  23  20040822 05:37 
How to only do Trial Factoring?  michael  Software  23  20040106 08:54 
About trial factoring  gbvalor  Math  4  20030522 02:04 