mersenneforum.org Trial Factoring Elimination up to 2^64 Project
 Register FAQ Search Today's Posts Mark Forums Read

 2014-04-09, 18:45 #1 tapion64   Apr 2014 1258 Posts Trial Factoring Elimination up to 2^64 Project (For those curious about the math behind this, the multiplicative order of n mod m is defined as the smallest k such that n^k = 1 mod m. As Mersenne numbers are of the form 2^p-1, if a given prime q divides that Mersenne number, then its multiplicative order of 2 is equal to p, since 2^p mod q = 1, and 1-1 = 0. After that point, q will only divide numbers of the form 2^(k*p)-1, which are eliminated from Mersenne numbers as the exponent is not prime. This is the reason that 2^n-1 must have n be prime for the number to be prime, as the set of all multiplicative orders of 2 mod p contains the set of all primes. If this were not the case, then there would exist some number 2^p-1 which had no factors greater than 1, which we know is impossible.) The purpose of the project is to eventually get to 2^64 (or beyond) in contiguous primes, calculating their multiplicative order, and then eliminating them from trial factoring range. Calculating this for an individual prime is trivial even in the 2^80 range and above using the Bach Shallit algorithm, so processing-wise this isn't very intensive, it's just a huge number of test cases (around 416 quadrillion primes). The original intent was to calculate and store this information for the primes, but this would easily take up several exabytes of information, so instead my server is going to track information about the range covered so far. This introduces new architectural problems I'm attempting to solve at the moment. Inevitably this will run across factors for exponents that are far outside the range that GIMPS currently stores, which I believe is for exponents less than or equal to 2^32. For the purpose of this project, it's possible to ignore those as 'outside of scope'. Thus if the calculated multiplicative order is greater than 2^32, we can still add it to the range of eliminated trial factors. If the multiplicative order is less than 2^32 and prime, we can immediately report it to GIMPS as a factor, again eliminating it from the trial factoring range. The final and most common outcome would be that it is less than 2^32 and not prime, so we can eliminate it (permanently) from trial factoring range. However, since we aren't storing any of this information to a database, it'll simply have to be recalculated next time the exponent range on GIMPS increases. The main challenge of distributing this work is actually having the primes on hand to test. But if we assume that all non-primes are eliminated from the first step (testing for multiplicative order), and the average CPU can run the whole algorithm on 1 core for 2 million of these primes in around 1 second (I'll have to do benchmarking on this, it could be slower), total calculation time ends up being around 6592 years. If the average contributor has atleast a quad core, 1649 years or so. If the entire community of GIMPS that was working on trial factoring at this moment switched over to doing these calculations, around 2184, that would be 9 months of testing, and we'd never have to trial factor below 2^64 again (until GIMPS increases their exponent range), as we'll have eliminated every single Mersenne prime candidate with exponent less than 2^32 that has a factor less than 2^64. Obviously, this type of approach is also well suited for CUDA divide and conquer computing, but I have no idea how widespread CUDA GPUs are, and their performance can vary quite a bit. But ideally this would be a project that would support both types of contributors. I'd like any feedback/criticism/help the community can offer. I'll be doing some benchmarking to come back with hard estimates for the time this would actually take. EDIT: (I'm also aware that it's unreasonable to expect that many people to be running the program 24/7 for that long, and also that many of them may only have dual or even single cores working on it, so I will factor that in.) Last fiddled with by tapion64 on 2014-04-09 at 18:53
 2014-04-09, 19:32 #2 kracker ἀβουλία     "Mr. Meeseeks" Jan 2012 California, USA 23·271 Posts Not sure where you are going with this, but programs exist for TF'ing on the GPU and people run them. See the project "GPU to 72" also. Also, no one runs TF on the CPU because the GPU can do it much, much faster. http://mersenneforum.org/forumdisplay.php?f=92 OpenCL is faster at this moment compared to CUDA in this generation of GPU's, but that always changes Maxwell I believe has really good potential. Last fiddled with by kracker on 2014-04-09 at 19:35
2014-04-09, 19:42   #3
tapion64

Apr 2014

8510 Posts

Quote:
 Originally Posted by kracker Not sure where you are going with this, but programs exist for TF'ing on the GPU and people run them. See the project "GPU to 72" also. Also, no one runs TF on the CPU because the GPU can do it much, much faster. http://mersenneforum.org/forumdisplay.php?f=92 OpenCL is faster at this moment compared to CUDA in this generation of GPU's, but that always changes Maxwell I believe has really good potential.
The approach with trial factoring is to pick a specific exponent and test if any number in a range like 2^65 to 2^66 divides 2^exponent - 1. The difference with this project is that it picks a specific prime and discovers which Mersenne number, if one exists, it will divide. It's a faster computation, but there are magnitudes more test cases to go through to cover the 2^64 range.

 2014-04-09, 19:59 #4 TheMawn     May 2013 East. Always East. 11×157 Posts I only vaguely caught the math behind this idea. If I understand correctly, it testing if 2KP+1 divides 2P-1, but tests every P for one K instead of every K for one P, The latter being our Trial Factoring method? I can only assume I'm wrong, because this has actually been done. There is at least one user who is taking this approach currently, and has been finding factors < 2~45 (last I checked) all over the map. Also, in the interest of finding all factors less than 64 bits, this sounds pretty neat if it does work. However, you'll have a hard time getting a lot of traction because trial factoring to 64 bits has been completed up to 1000M except for very low exponents whose Prime / Composite status has long been proven. Your efforts will not clear any exponents in the list. It will only find factors for exponents we already know to be composite. I personally would rather direct efforts at exponents with unknown status, but I can't speak for everyone.
2014-04-09, 20:04   #5
tapion64

Apr 2014

10101012 Posts

Quote:
 Originally Posted by TheMawn I only vaguely caught the math behind this idea. If I understand correctly, it testing if 2KP+1 divides 2P-1, but tests every P for one K instead of every K for one P, The latter being our Trial Factoring method? I can only assume I'm wrong, because this has actually been done. There is at least one user who is taking this approach currently, and has been finding factors < 2~45 (last I checked) all over the map. Also, in the interest of finding all factors less than 64 bits, this sounds pretty neat if it does work. However, you'll have a hard time getting a lot of traction because trial factoring to 64 bits has been completed up to 1000M except for very low exponents whose Prime / Composite status has long been proven. Your efforts will not clear any exponents in the list. It will only find factors for exponents we already know to be composite. I personally would rather direct efforts at exponents with unknown status, but I can't speak for everyone.
I did not know they'd all been factored to 1000M that far up, I figured we were going sequentially. If that's the case, then we can consider the 2^64 part of this project already complete. Man, I feel accomplished. Of course, the idea behind the project will still work for 2^65 through 2^80. There's just more and more primes to test. Computationally it's an insignificant difference for numbers this 'small'.

EDIT: In that case, I think this should definitely be a GPU only project. CPU will be too slow on the sieving of primes to make a difference. I need to learn how to do the sieving using only the GPU and convert my Go code for calculating the order into CUDA. That part could take some time, as I'm not familiar with CUDA in general. This could take far less time than I thought it might if the conversion from Go to CUDA is seemless.

Last fiddled with by tapion64 on 2014-04-09 at 20:25

 2014-04-09, 21:16 #6 TheMawn     May 2013 East. Always East. 11×157 Posts Is this similar to finding factors by k instead of by P? Or is there some other improvement?
2014-04-09, 21:38   #7
tapion64

Apr 2014

5×17 Posts

Quote:
 Originally Posted by TheMawn Is this similar to finding factors by k instead of by P? Or is there some other improvement?
We're taking P and saying it's a factor of 2^k-1, which is basically calculating the multiplicative order of 2 mod P = k.

In order to make this feasible, we'll limit k to be less than or equal to the exponents that GIMPS currently tracks, which are those less than 2^32 (or in other words, Mersenne numbers less than 2^(2^32)-1). So after we've calculated k, if it's greater than 2^32, throw the number away and move on to the next one. If not, check the primality of k (this should be simple, it's a 32 bit number). If it's composite, we don't care because 2^k-1 isn't a Mersenne number and can't possibly be prime, it's already eliminated. If it's composite, go check again GIMPS to see if P is in the system for 2^k-1 already. If it isn't, great success. Either way, move on to the next prime and keep checking. If GPU sieving is this fast, and we can write efficient CUDA code to calculate multiplicative order, one person alone with a 560 Ti + could wipe out those ranges in a week's time, I think. Possibly faster.

I wish I could code CUDA as fast as I pseudo-coded this algorithm, we'd already be done. Luckily the GPU sieving library in mfaktc should come in handy, I just have to figure out how to use it and use CUDA in general. I keep expecting to find some terrible flaw in my logic, but if I don't, it means we're about to find a lot of factors.

Last fiddled with by tapion64 on 2014-04-09 at 21:43

 2014-04-10, 02:18 #8 LaurV Romulan Interpreter     Jun 2011 Thailand 22A416 Posts What you want to do is called "searching by k" (as opposite to "searching by p", what p95 and mfaktX do - the "k" and "p" refers to a factor q of 2^p-1 being of the form q=2kp+1) and it was discussed at least 20 times on the forum. I personally put a thousand pari code snippets, and versions of that code was used by others (James) to extend the factor data base (see his graphs) for small factors (below 2^45 or so). This is very fast for small factors, but it is getting exponentially slower as q gets higher, because of the "garbage" it produces, mostly. This method, with all possible and impossible optimizations together (like testing only primes which are 1 or 7 mod 8, being the most obvious optimization, but other less obvious can be done too) is not feasible beyond 2^48 (or max. 2^52, if you have a supercomputer), due to the large number of primes you need to test, which won't produce a prime exponent (the most of the primes produce composite exponents). I have a data base of factors to about 2^56, which can be "easily" reproduced letting a single core of pari to crunch for half year or 8 months. James has a larger database. When you succeed going over this limit, tell us For records, PrimeNet has such a database to 2^64 at least (all exponents over some limit were TF-ed to 2^65 or so, by other - faster - methods). There are about 2.2*10^17 primes between 2^56 and 2^64 (superficial calculus, forgive me if it is not accurate, but the magnitude is right) from which a half are 1 or 7 mod 8. Even if you have ten thousand computers and succeed to test a hundred thousands of primes in each computer per second, you will still need about 3 to 5 years to reach 2^64. So, do you have 10 thousand computers? Edit: I had a look for old code but can't find it on the forum, anyhow, here is a sample of what you want to do, imagine the "getp" function implements the algorithm you talk about. Code: gp > \r tf gp > forprime(q=7,1000, print(q" : "getp(q,10^9))) 7 : 3 11 : 10 13 : 12 17 : 8 19 : 18 23 : 11 29 : 28 31 : 5 37 : 36 41 : 20 43 : 14 47 : 23 53 : 52 59 : 58 61 : 60 67 : 66 71 : 35 73 : 9 79 : 39 83 : 82 89 : 11 97 : 48 101 : 100 103 : 51 107 : 106 109 : 36 113 : 28 127 : 7 131 : 130 137 : 68 139 : 138 149 : 148 151 : 15 157 : 52 163 : 162 167 : 83 173 : 172 179 : 178 181 : 180 191 : 95 193 : 96 197 : 196 199 : 99 211 : 210 223 : 37 227 : 226 229 : 76 233 : 29 239 : 119 241 : 24 251 : 50 257 : 16 263 : 131 269 : 268 271 : 135 277 : 92 281 : 70 283 : 94 293 : 292 307 : 102 311 : 155 313 : 156 317 : 316 331 : 30 337 : 21 347 : 346 349 : 348 353 : 88 359 : 179 367 : 183 373 : 372 379 : 378 383 : 191 389 : 388 397 : 44 401 : 200 409 : 204 419 : 418 421 : 420 431 : 43 433 : 72 439 : 73 443 : 442 449 : 224 457 : 76 461 : 460 463 : 231 467 : 466 479 : 239 487 : 243 491 : 490 499 : 166 503 : 251 509 : 508 521 : 260 523 : 522 541 : 540 547 : 546 557 : 556 563 : 562 569 : 284 571 : 114 577 : 144 587 : 586 593 : 148 599 : 299 601 : 25 607 : 303 613 : 612 617 : 154 619 : 618 631 : 45 641 : 64 643 : 214 647 : 323 653 : 652 659 : 658 661 : 660 673 : 48 677 : 676 683 : 22 691 : 230 701 : 700 709 : 708 719 : 359 727 : 121 733 : 244 739 : 246 743 : 371 751 : 375 757 : 756 761 : 380 769 : 384 773 : 772 787 : 786 797 : 796 809 : 404 811 : 270 821 : 820 823 : 411 827 : 826 829 : 828 839 : 419 853 : 852 857 : 428 859 : 858 863 : 431 877 : 876 881 : 55 883 : 882 887 : 443 907 : 906 911 : 91 919 : 153 929 : 464 937 : 117 941 : 940 947 : 946 953 : 68 967 : 483 971 : 194 977 : 488 983 : 491 991 : 495 997 : 332 gp > where each line is "q : p" and it can be read as "q divides 2^p-1". I marked red the one which interested us. As you go higher, they get exponentially rarefied. Even with optimizations, as I said, the simpler is to take only primes q which are 1 or 7 (mod 8), you will need not "one week", but years to finish. More clear: Code: gp > forprime(q=7,10000, if(q%8==1||q%8==7,p=getp(q,10^9); if(isprime(p), print(q" divides 2^"p"-1")))) 7 divides 2^3-1 23 divides 2^11-1 31 divides 2^5-1 47 divides 2^23-1 89 divides 2^11-1 127 divides 2^7-1 167 divides 2^83-1 223 divides 2^37-1 233 divides 2^29-1 263 divides 2^131-1 359 divides 2^179-1 383 divides 2^191-1 431 divides 2^43-1 439 divides 2^73-1 479 divides 2^239-1 503 divides 2^251-1 719 divides 2^359-1 839 divides 2^419-1 863 divides 2^431-1 887 divides 2^443-1 983 divides 2^491-1 1103 divides 2^29-1 1319 divides 2^659-1 1367 divides 2^683-1 1399 divides 2^233-1 1433 divides 2^179-1 1439 divides 2^719-1 1487 divides 2^743-1 1823 divides 2^911-1 1913 divides 2^239-1 2039 divides 2^1019-1 2063 divides 2^1031-1 2089 divides 2^29-1 2207 divides 2^1103-1 2351 divides 2^47-1 2383 divides 2^397-1 2447 divides 2^1223-1 2687 divides 2^79-1 2767 divides 2^461-1 2879 divides 2^1439-1 2903 divides 2^1451-1 2999 divides 2^1499-1 3023 divides 2^1511-1 3119 divides 2^1559-1 3167 divides 2^1583-1 3343 divides 2^557-1 3391 divides 2^113-1 3449 divides 2^431-1 3463 divides 2^577-1 3607 divides 2^601-1 3623 divides 2^1811-1 3863 divides 2^1931-1 4007 divides 2^2003-1 4079 divides 2^2039-1 4127 divides 2^2063-1 4513 divides 2^47-1 4567 divides 2^761-1 4679 divides 2^2339-1 4703 divides 2^2351-1 4799 divides 2^2399-1 4871 divides 2^487-1 4919 divides 2^2459-1 5087 divides 2^2543-1 5399 divides 2^2699-1 5471 divides 2^547-1 5639 divides 2^2819-1 5711 divides 2^571-1 5737 divides 2^239-1 5807 divides 2^2903-1 5879 divides 2^2939-1 5927 divides 2^2963-1 6047 divides 2^3023-1 6079 divides 2^1013-1 6089 divides 2^761-1 6353 divides 2^397-1 6361 divides 2^53-1 6599 divides 2^3299-1 6719 divides 2^3359-1 6857 divides 2^857-1 6983 divides 2^3491-1 7079 divides 2^3539-1 7247 divides 2^3623-1 7487 divides 2^197-1 7529 divides 2^941-1 7559 divides 2^3779-1 7607 divides 2^3803-1 7703 divides 2^3851-1 7727 divides 2^3863-1 7823 divides 2^3911-1 7927 divides 2^1321-1 8039 divides 2^4019-1 8167 divides 2^1361-1 8191 divides 2^13-1 8287 divides 2^1381-1 8423 divides 2^4211-1 8543 divides 2^4271-1 8719 divides 2^1453-1 8783 divides 2^4391-1 8831 divides 2^883-1 9511 divides 2^317-1 9623 divides 2^283-1 9719 divides 2^43-1 9743 divides 2^4871-1 9839 divides 2^4919-1 9887 divides 2^4943-1 gp > This is "quite straight" and produces "lots of factors" as long as q stays below 2^33, because - guess what - p is most of the time of the magnitude of q/2 (i.e. under 2^32, in our range of interest) but after that, the "productivity" is gone. Puff.. Last fiddled with by LaurV on 2014-04-10 at 03:03
 2014-04-10, 03:32 #9 tapion64   Apr 2014 5·17 Posts Well, you're certainly right that it will take more time than I thought. I was mislead by how fast I was verifying the factors I found with ECM/trial factor, because p-1 is naturally easier to factor. Still, I think with an Atkin sieve modified to only find +/-1 mod 8 primes that runs on the GPU, and using msieve to factor p-1, then Bach Shallit to calculate the order, then some primality test that's fast for 32 bit numbers on the order, we reach feasible for atleast 2^64 through 2^65 if it's well distributed. The key is that we don't have a CPU bottleneck, everything happens on the GPU and all of the difficult parts are highly parallelizable. Throw enough high compute power cards at it and we should get it done in a reasonable amount of time. But I'll have to throw it together and test on my 560 Ti.
2014-04-10, 03:50   #10
retina
Undefined

"The unspeakable one"
Jun 2006
My evil lair

10110101011112 Posts

Quote:
 Originally Posted by tapion64 ... then some primality test that's fast for 32 bit numbers ...
Maybe http://miller-rabin.appspot.com/

2014-04-10, 06:10   #11
axn

Jun 2003

10010011010002 Posts

Quote:
 Originally Posted by retina
A 256MB bitmap is sufficient to represent all odd numbers under 2^32. No that it matters, since this step is essentially noise. Almost no time will be spent on this step.

 Similar Threads Thread Thread Starter Forum Replies Last Post garo GPU Computing 100 2019-04-22 10:58 Stargate38 GPU Computing 9 2018-08-31 07:58 JFB Software 23 2004-08-22 05:37 michael Software 23 2004-01-06 08:54 gbvalor Math 4 2003-05-22 02:04

All times are UTC. The time now is 18:53.

Sat Oct 24 18:53:52 UTC 2020 up 44 days, 16:04, 0 users, load averages: 1.95, 1.96, 1.98