mersenneforum.org (https://www.mersenneforum.org/index.php)
-   PrimeNet (https://www.mersenneforum.org/forumdisplay.php?f=11)
-   -   Trial Factoring Elimination up to 2^64 Project (https://www.mersenneforum.org/showthread.php?t=19258)

 tapion64 2014-04-09 18:45

Trial Factoring Elimination up to 2^64 Project

(For those curious about the math behind this, the multiplicative order of n mod m is defined as the smallest k such that n^k = 1 mod m. As Mersenne numbers are of the form 2^p-1, if a given prime q divides that Mersenne number, then its multiplicative order of 2 is equal to p, since 2^p mod q = 1, and 1-1 = 0. After that point, q will only divide numbers of the form 2^(k*p)-1, which are eliminated from Mersenne numbers as the exponent is not prime. This is the reason that 2^n-1 must have n be prime for the number to be prime, as the set of all multiplicative orders of 2 mod p contains the set of all primes. If this were not the case, then there would exist some number 2^p-1 which had no factors greater than 1, which we know is impossible.)

The purpose of the project is to eventually get to 2^64 (or beyond) in contiguous primes, calculating their multiplicative order, and then eliminating them from trial factoring range. Calculating this for an individual prime is trivial even in the 2^80 range and above using the Bach Shallit algorithm, so processing-wise this isn't very intensive, it's just a huge number of test cases (around 416 quadrillion primes).

The original intent was to calculate and store this information for the primes, but this would easily take up several exabytes of information, so instead my server is going to track information about the range covered so far. This introduces new architectural problems I'm attempting to solve at the moment.

Inevitably this will run across factors for exponents that are far outside the range that GIMPS currently stores, which I believe is for exponents less than or equal to 2^32. For the purpose of this project, it's possible to ignore those as 'outside of scope'. Thus if the calculated multiplicative order is greater than 2^32, we can still add it to the range of eliminated trial factors. If the multiplicative order is less than 2^32 and prime, we can immediately report it to GIMPS as a factor, again eliminating it from the trial factoring range. The final and most common outcome would be that it is less than 2^32 and not prime, so we can eliminate it (permanently) from trial factoring range. However, since we aren't storing any of this information to a database, it'll simply have to be recalculated next time the exponent range on GIMPS increases.

The main challenge of distributing this work is actually having the primes on hand to test. But if we assume that all non-primes are eliminated from the first step (testing for multiplicative order), and the average CPU can run the whole algorithm on 1 core for 2 million of these primes in around 1 second (I'll have to do benchmarking on this, it could be slower), total calculation time ends up being around 6592 years. If the average contributor has atleast a quad core, 1649 years or so. If the entire community of GIMPS that was working on trial factoring at this moment switched over to doing these calculations, around 2184, that would be 9 months of testing, and we'd never have to trial factor below 2^64 again (until GIMPS increases their exponent range), as we'll have eliminated every single Mersenne prime candidate with exponent less than 2^32 that has a factor less than 2^64.

Obviously, this type of approach is also well suited for CUDA divide and conquer computing, but I have no idea how widespread CUDA GPUs are, and their performance can vary quite a bit. But ideally this would be a project that would support both types of contributors. I'd like any feedback/criticism/help the community can offer. I'll be doing some benchmarking to come back with hard estimates for the time this would actually take.

EDIT: (I'm also aware that it's unreasonable to expect that many people to be running the program 24/7 for that long, and also that many of them may only have dual or even single cores working on it, so I will factor that in.)

 kracker 2014-04-09 19:32

Not sure where you are going with this, :smile: but programs exist for TF'ing on the GPU and people run them. See the project "GPU to 72" also. Also, no one runs TF on the CPU because the GPU can do it much, [I]much [/I]faster.
[url]http://mersenneforum.org/forumdisplay.php?f=92[/url]

OpenCL is faster at this moment compared to CUDA in this generation of GPU's, but that always changes :razz: Maxwell I believe has really good potential.

 tapion64 2014-04-09 19:42

[QUOTE=kracker;370679]Not sure where you are going with this, :smile: but programs exist for TF'ing on the GPU and people run them. See the project "GPU to 72" also. Also, no one runs TF on the CPU because the GPU can do it much, [I]much [/I]faster.
[URL]http://mersenneforum.org/forumdisplay.php?f=92[/URL]

OpenCL is faster at this moment compared to CUDA in this generation of GPU's, but that always changes :razz: Maxwell I believe has really good potential.[/QUOTE]

The approach with trial factoring is to pick a specific exponent and test if any number in a range like 2^65 to 2^66 divides 2^exponent - 1. The difference with this project is that it picks a specific prime and discovers which Mersenne number, if one exists, it will divide. It's a faster computation, but there are magnitudes more test cases to go through to cover the 2^64 range.

 TheMawn 2014-04-09 19:59

I only vaguely caught the math behind this idea. If I understand correctly, it testing if 2KP+1 divides 2[SUP]P[/SUP]-1, but tests every P for one K instead of every K for one P, The latter being our Trial Factoring method?

I can only assume I'm wrong, because this has actually been done. There is at least one user who is taking this approach currently, and has been finding factors < 2[SUP]~45[/SUP] (last I checked) all over the map.

Also, in the interest of finding all factors less than 64 bits, this sounds pretty neat if it does work. However, you'll have a hard time getting a lot of traction because trial factoring to 64 bits has been completed up to 1000M except for very low exponents whose Prime / Composite status has long been proven.

Your efforts will not clear any exponents in the list. It will only find factors for exponents we already know to be composite. I personally would rather direct efforts at exponents with unknown status, but I can't speak for everyone.

 tapion64 2014-04-09 20:04

[QUOTE=TheMawn;370684]I only vaguely caught the math behind this idea. If I understand correctly, it testing if 2KP+1 divides 2[SUP]P[/SUP]-1, but tests every P for one K instead of every K for one P, The latter being our Trial Factoring method?

I can only assume I'm wrong, because this has actually been done. There is at least one user who is taking this approach currently, and has been finding factors < 2[SUP]~45[/SUP] (last I checked) all over the map.

Also, in the interest of finding all factors less than 64 bits, this sounds pretty neat if it does work. However, you'll have a hard time getting a lot of traction because trial factoring to 64 bits has been completed up to 1000M except for very low exponents whose Prime / Composite status has long been proven.

Your efforts will not clear any exponents in the list. It will only find factors for exponents we already know to be composite. I personally would rather direct efforts at exponents with unknown status, but I can't speak for everyone.[/QUOTE]
I did not know they'd all been factored to 1000M that far up, I figured we were going sequentially. If that's the case, then we can consider the 2^64 part of this project already complete. Man, I feel accomplished. Of course, the idea behind the project will still work for 2^65 through 2^80. There's just more and more primes to test. Computationally it's an insignificant difference for numbers this 'small'.

EDIT: In that case, I think this should definitely be a GPU only project. CPU will be too slow on the sieving of primes to make a difference. I need to learn how to do the sieving using only the GPU and convert my Go code for calculating the order into CUDA. That part could take some time, as I'm not familiar with CUDA in general. This could take far less time than I thought it might if the conversion from Go to CUDA is seemless.

 TheMawn 2014-04-09 21:16

Is this similar to finding factors by k instead of by P? Or is there some other improvement?

 tapion64 2014-04-09 21:38

[QUOTE=TheMawn;370694]Is this similar to finding factors by k instead of by P? Or is there some other improvement?[/QUOTE]
We're taking P and saying it's a factor of 2^k-1, which is basically calculating the multiplicative order of 2 mod P = k.

In order to make this feasible, we'll limit k to be less than or equal to the exponents that GIMPS currently tracks, which are those less than 2^32 (or in other words, Mersenne numbers less than 2^(2^32)-1). So after we've calculated k, if it's greater than 2^32, throw the number away and move on to the next one. If not, check the primality of k (this should be simple, it's a 32 bit number). If it's composite, we don't care because 2^k-1 isn't a Mersenne number and can't possibly be prime, it's already eliminated. If it's composite, go check again GIMPS to see if P is in the system for 2^k-1 already. If it isn't, great success. Either way, move on to the next prime and keep checking. If GPU sieving is this fast, and we can write efficient CUDA code to calculate multiplicative order, one person alone with a 560 Ti + could wipe out those ranges in a week's time, I think. Possibly faster.

I wish I could code CUDA as fast as I pseudo-coded this algorithm, we'd already be done. Luckily the GPU sieving library in mfaktc should come in handy, I just have to figure out how to use it and use CUDA in general. I keep expecting to find some terrible flaw in my logic, but if I don't, it means we're about to find a lot of factors.

 LaurV 2014-04-10 02:18

What you want to do is called "searching by k" (as opposite to "searching by p", what p95 and mfaktX do - the "k" and "p" refers to a factor q of 2^p-1 being of the form q=2kp+1) and it was discussed at least 20 times on the forum. I personally put a thousand pari code snippets, and versions of that code was used by others (James) to extend the factor data base (see[URL="http://www.mersenne.ca/graphs/factor_bits/factor_bits_20140409.png"] his graphs[/URL]) for small factors (below 2^45 or so). This is very fast for small factors, but it is getting exponentially slower as q gets higher, because of the "garbage" it produces, mostly.

This method, with all possible and impossible optimizations together (like testing only primes which are 1 or 7 mod 8, being the most obvious optimization, but other less obvious can be done too) [U]is not feasible[/U] beyond 2^48 (or max. 2^52, if you have a supercomputer), due to the large number of primes you need to test, which won't produce a prime exponent (the most of the primes produce composite exponents). I have a data base of factors to about 2^56, which can be "easily" reproduced letting a single core of pari to crunch for half year or 8 months. James has a larger database. When you succeed going over this limit, tell us :razz:

For records, PrimeNet has such a database to 2^64 at least (all exponents over some limit were TF-ed to 2^65 or so, by other - faster - methods).

There are about 2.2*10^17 primes between 2^56 and 2^64 (superficial calculus, forgive me if it is not accurate, but the magnitude is right) from which a half are 1 or 7 mod 8. Even if you have ten thousand computers and succeed to test a hundred thousands of primes in each computer per second, you will still need about 3 to 5 years to reach 2^64. So, do you have 10 thousand computers?

Edit: I had a look for old code but can't find it on the forum, anyhow, here is a sample of what you want to do, imagine the "getp" function implements the algorithm you talk about.

[code]
gp > \r tf
gp > forprime(q=7,1000, print(q" : "getp(q,10^9)))
[COLOR=Red]7 : 3[/COLOR]
11 : 10
13 : 12
17 : 8
19 : 18
[COLOR=Red]23 : 11[/COLOR]
29 : 28
[COLOR=Red]31 : 5[/COLOR]
37 : 36
41 : 20
43 : 14
[COLOR=Red]47 : 23[/COLOR]
53 : 52
59 : 58
61 : 60
67 : 66
71 : 35
73 : 9
79 : 39
83 : 82
[COLOR=Red]89 : 11[/COLOR]
97 : 48
101 : 100
103 : 51
107 : 106
109 : 36
113 : 28
[COLOR=Red]127 : 7[/COLOR]
131 : 130
137 : 68
139 : 138
149 : 148
151 : 15
157 : 52
163 : 162
[COLOR=Red]167 : 83[/COLOR]
173 : 172
179 : 178
181 : 180
191 : 95
193 : 96
197 : 196
199 : 99
211 : 210
[COLOR=Red]223 : 37[/COLOR]
227 : 226
229 : 76
[COLOR=Red]233 : 29[/COLOR]
239 : 119
241 : 24
251 : 50
257 : 16
[COLOR=Red]263 : 131[/COLOR]
269 : 268
271 : 135
277 : 92
281 : 70
283 : 94
293 : 292
307 : 102
311 : 155
313 : 156
317 : 316
331 : 30
337 : 21
347 : 346
349 : 348
353 : 88
[COLOR=Red]359 : 179[/COLOR]
367 : 183
373 : 372
379 : 378
[COLOR=Red]383 : 191[/COLOR]
389 : 388
397 : 44
401 : 200
409 : 204
419 : 418
421 : 420
[COLOR=Red]431 : 43[/COLOR]
433 : 72
439 : 73
443 : 442
449 : 224
457 : 76
461 : 460
463 : 231
467 : 466
[COLOR=Red]479 : 239[/COLOR]
487 : 243
491 : 490
499 : 166
[COLOR=Red]503 : 251[/COLOR]
509 : 508
521 : 260
523 : 522
541 : 540
547 : 546
557 : 556
563 : 562
569 : 284
571 : 114
577 : 144
587 : 586
593 : 148
599 : 299
601 : 25
607 : 303
613 : 612
617 : 154
619 : 618
631 : 45
641 : 64
643 : 214
647 : 323
653 : 652
659 : 658
661 : 660
673 : 48
677 : 676
683 : 22
691 : 230
701 : 700
709 : 708
[COLOR=Red]719 : 359[/COLOR]
727 : 121
733 : 244
739 : 246
743 : 371
751 : 375
757 : 756
761 : 380
769 : 384
773 : 772
787 : 786
797 : 796
809 : 404
811 : 270
821 : 820
823 : 411
827 : 826
829 : 828
[COLOR=Red]839 : 419[/COLOR]
853 : 852
857 : 428
859 : 858
[COLOR=Red]863 : 431[/COLOR]
877 : 876
881 : 55
883 : 882
[COLOR=Red]887 : 443[/COLOR]
907 : 906
911 : 91
919 : 153
929 : 464
937 : 117
941 : 940
947 : 946
953 : 68
967 : 483
971 : 194
977 : 488
[COLOR=Red]983 : 491[/COLOR]
991 : 495
997 : 332
gp >
[/code]where each line is "q : p" and it can be read as "q divides 2^p-1". I marked red the one which interested us. As you go higher, they get exponentially rarefied. Even with optimizations, as I said, the simpler is to take only primes q which are 1 or 7 (mod 8), you will need not "one week", but years to finish.

More clear:
[code]
gp > forprime(q=7,10000, if(q%8==1||q%8==7,p=getp(q,10^9); if(isprime(p), print(q" divides 2^"p"-1"))))
7 divides 2^3-1
23 divides 2^11-1
31 divides 2^5-1
47 divides 2^23-1
89 divides 2^11-1
127 divides 2^7-1
167 divides 2^83-1
223 divides 2^37-1
233 divides 2^29-1
263 divides 2^131-1
359 divides 2^179-1
383 divides 2^191-1
431 divides 2^43-1
439 divides 2^73-1
479 divides 2^239-1
503 divides 2^251-1
719 divides 2^359-1
839 divides 2^419-1
863 divides 2^431-1
887 divides 2^443-1
983 divides 2^491-1
1103 divides 2^29-1
1319 divides 2^659-1
1367 divides 2^683-1
1399 divides 2^233-1
1433 divides 2^179-1
1439 divides 2^719-1
1487 divides 2^743-1
1823 divides 2^911-1
1913 divides 2^239-1
2039 divides 2^1019-1
2063 divides 2^1031-1
2089 divides 2^29-1
2207 divides 2^1103-1
2351 divides 2^47-1
2383 divides 2^397-1
2447 divides 2^1223-1
2687 divides 2^79-1
2767 divides 2^461-1
2879 divides 2^1439-1
2903 divides 2^1451-1
2999 divides 2^1499-1
3023 divides 2^1511-1
3119 divides 2^1559-1
3167 divides 2^1583-1
3343 divides 2^557-1
3391 divides 2^113-1
3449 divides 2^431-1
3463 divides 2^577-1
3607 divides 2^601-1
3623 divides 2^1811-1
3863 divides 2^1931-1
4007 divides 2^2003-1
4079 divides 2^2039-1
4127 divides 2^2063-1
4513 divides 2^47-1
4567 divides 2^761-1
4679 divides 2^2339-1
4703 divides 2^2351-1
4799 divides 2^2399-1
4871 divides 2^487-1
4919 divides 2^2459-1
5087 divides 2^2543-1
5399 divides 2^2699-1
5471 divides 2^547-1
5639 divides 2^2819-1
5711 divides 2^571-1
5737 divides 2^239-1
5807 divides 2^2903-1
5879 divides 2^2939-1
5927 divides 2^2963-1
6047 divides 2^3023-1
6079 divides 2^1013-1
6089 divides 2^761-1
6353 divides 2^397-1
6361 divides 2^53-1
6599 divides 2^3299-1
6719 divides 2^3359-1
6857 divides 2^857-1
6983 divides 2^3491-1
7079 divides 2^3539-1
7247 divides 2^3623-1
7487 divides 2^197-1
7529 divides 2^941-1
7559 divides 2^3779-1
7607 divides 2^3803-1
7703 divides 2^3851-1
7727 divides 2^3863-1
7823 divides 2^3911-1
7927 divides 2^1321-1
8039 divides 2^4019-1
8167 divides 2^1361-1
8191 divides 2^13-1
8287 divides 2^1381-1
8423 divides 2^4211-1
8543 divides 2^4271-1
8719 divides 2^1453-1
8783 divides 2^4391-1
8831 divides 2^883-1
9511 divides 2^317-1
9623 divides 2^283-1
9719 divides 2^43-1
9743 divides 2^4871-1
9839 divides 2^4919-1
9887 divides 2^4943-1
gp >
[/code]This is "quite straight" and produces "lots of factors" as long as q stays below 2^33, because - guess what - p is most of the time of the magnitude of q/2 (i.e. under 2^32, in our range of interest) but after that, the "productivity" is gone. Puff..

 tapion64 2014-04-10 03:32

Well, you're certainly right that it will take more time than I thought. I was mislead by how fast I was verifying the factors I found with ECM/trial factor, because p-1 is naturally easier to factor. Still, I think with an Atkin sieve modified to only find +/-1 mod 8 primes that runs on the GPU, and using msieve to factor p-1, then Bach Shallit to calculate the order, then some primality test that's fast for 32 bit numbers on the order, we reach feasible for atleast 2^64 through 2^65 if it's well distributed. The key is that we don't have a CPU bottleneck, everything happens on the GPU and all of the difficult parts are highly parallelizable. Throw enough high compute power cards at it and we should get it done in a reasonable amount of time. But I'll have to throw it together and test on my 560 Ti.

 retina 2014-04-10 03:50

[QUOTE=tapion64;370731]... then some primality test that's fast for 32 bit numbers ...[/QUOTE]Maybe [url]http://miller-rabin.appspot.com/[/url]

 axn 2014-04-10 06:10

[QUOTE=retina;370732]Maybe [url]http://miller-rabin.appspot.com/[/url][/QUOTE]

A 256MB bitmap is sufficient to represent all odd numbers under 2^32. No that it matters, since this step is essentially noise. Almost no time will be spent on this step.

All times are UTC. The time now is 06:24.