mersenneforum.org  

Go Back   mersenneforum.org > Great Internet Mersenne Prime Search > PrimeNet

Reply
 
Thread Tools
Old 2014-04-11, 03:57   #23
kladner
 
kladner's Avatar
 
"Kieren"
Jul 2011
In My Own Galaxy!

2·3·1,667 Posts
Default

The one hour limit on editing posts gives time to polish things up, add things, and remove typos. The issue of further editing has been the subject of heated debate for at least one user here.

I think freezing things at some point makes exchanges more like vocal conversation. You can always reconsider and correct earlier statements in later comments, but the process by which you arrive at later positions is retained. I find this to be a valuable feature.
kladner is offline   Reply With Quote
Old 2014-04-11, 04:32   #24
axn
 
axn's Avatar
 
Jun 2003

111638 Posts
Default

Quote:
Originally Posted by tapion64 View Post
msieve has CUDA implementations that factor 30+ digit numbers in no time (the timer literally says 00:00:00).
Really? News to me. Where did you get the idea that msieve has CUDA implementation for factoring 30 digit numbers?
Quote:
Originally Posted by tapion64 View Post
Factoring isn't the bottle neck on this, it's the insanely large number of primes we need to test.
The bottleneck is the factoring / group order calculation for the insanely large number of primes, not the generation of the primes themselves.

Last fiddled with by axn on 2014-04-11 at 04:33
axn is offline   Reply With Quote
Old 2014-04-11, 08:06   #25
tapion64
 
Apr 2014

5·17 Posts
Default

Quote:
Originally Posted by axn View Post
Really? News to me. Where did you get the idea that msieve has CUDA implementation for factoring 30 digit numbers?

The bottleneck is the factoring / group order calculation for the insanely large number of primes, not the generation of the primes themselves.
Maybe it's recent? If you go on SourceForge for msieve, they have a version for CUDA. I tried it out on a few 30 digit numbers and it handled them really quickly.

Yeah, I'm working on getting that down. Unfortunately my field theory isn't quite up to par, I'm reading through a Jager/Lenstra paper and I kind of understand why p = -1 mod 8 yields odd orders of 2 mod p. I'm trying to extend the logic to determine how the Dirichlet character values for the remaining case p = 1 mod 8 affect even/odd in the order, but I'm just not grounded enough in GF theory to do so at the moment. I'll be working on it tomorrow. If I can filter out the p = 1 mod 8 cases that will yield even orders from the modified Atkin sieve, Lenstra says this will remove 17/24 primes asymptotically. That's almost 1/4 instead of my assumed 1/2, which is a good start. Also, if we can actually skip the last step of the Atkin sieve without a problem, that should bring the complexity of the sieve itself down by the same factor and another log(log(N)) term, so we're looking at a potential 1/96 increase in performance which would get us closer to the 88 years of compute time I had originally predicted, which is certainly feasible when distributed out across 100 GPUs or so. Even more so if we consider faster cards than the 560 Ti.

Last fiddled with by tapion64 on 2014-04-11 at 08:07
tapion64 is offline   Reply With Quote
Old 2014-04-11, 08:35   #26
tapion64
 
Apr 2014

5×17 Posts
Default

@ LaurV

Interesting. That appears to be the reverse of the way I normally calculate it, starting from (p+1)/2 and going backwards to 1. Same complexity as starting at 2 and shifting left until greater than p, then subtracting p, and repeat until you get to -1, number of shifts * 2 is the order. I usually do it on the [-(p-1)/2,(p-1)/2] interval, since if your shift in r gets you close to p, r-p is a smaller number in magnitude and the shifts can be calculated in the negative direction until you get back below (p-1)/2.

I.e., for order 2 mod 13, 2 -> 4 -> 8 = -5 -> -10 = 3 -> 6 -> 12 = -1, 6 shifts so order is 12. The actual complexity of this method is less than O(order/2), since you can do a quick calculation to see how many shifts you need to get to the next boundary, which speeds it up for larger numbers by reducing the number of indivual operations. Since we're ignoring cases where order > 2^32, we actually will never exceed O(2^31) even if it's the naive method, so it's really O(sqrt(N)) complexity with a hard cap of 2^31. Still not as good as O(log(n)) but it can be greatly facilitated by combining those trivial shifts into one instruction. But I'd rather not do it this way because it's a sequential algorithm that has no advantage running on the GPU. I'd rather come up with a parallelizable algorithm that will be faster for the cases of primes we're dealing with.

Last fiddled with by tapion64 on 2014-04-11 at 08:50
tapion64 is offline   Reply With Quote
Old 2014-04-11, 08:54   #27
axn
 
axn's Avatar
 
Jun 2003

127316 Posts
Default

Quote:
Originally Posted by tapion64 View Post
Maybe it's recent? If you go on SourceForge for msieve, they have a version for CUDA. I tried it out on a few 30 digit numbers and it handled them really quickly.
There is a version with CUDA. The CUDA support is for GNFS ploy select acceleration, not general factorization. What you're seeing is the CPU code in action, which is quite capable of handling 30-digit factorizations that fast.

Quote:
Originally Posted by tapion64 View Post
Yeah, I'm working on getting that down.
A little bit here and there won't cut it. Here is what you face: there are about 4.1e17 primes b/w 2^64 & 2^65, of which we need to consider half (1 or 7 mod 8). Assuming you can compute the order for a prime in 1 microsecond (million p/s), you will need ~200 Gsecs on 1 core (~6000 Core years).

Straightaway, I am giving you a generous 1 us/prime. Are you anywhere close to that? Even with all potential optimizations, if you can't reach the microsecond territory, then it is hopeless.

Last fiddled with by axn on 2014-04-11 at 08:54
axn is offline   Reply With Quote
Old 2014-04-11, 09:04   #28
retina
Undefined
 
retina's Avatar
 
"The unspeakable one"
Jun 2006
My evil lair

16BF16 Posts
Default

Quote:
Originally Posted by axn View Post
... then it is hopeless.
No. Just make it distributed. Convince everyone here and there that it is important and useful and that they should give cycles to this. Should be easy peasy.

Or buy a super computer. It doesn't even need to be one in the top 10.
retina is online now   Reply With Quote
Old 2014-04-11, 09:15   #29
tapion64
 
Apr 2014

5·17 Posts
Default

Quote:
Originally Posted by axn View Post
There is a version with CUDA. The CUDA support is for GNFS ploy select acceleration, not general factorization. What you're seeing is the CPU code in action, which is quite capable of handling 30-digit factorizations that fast.



A little bit here and there won't cut it. Here is what you face: there are about 4.1e17 primes b/w 2^64 & 2^65, of which we need to consider half (1 or 7 mod 8). Assuming you can compute the order for a prime in 1 microsecond (million p/s), you will need ~200 Gsecs on 1 core (~6000 Core years).

Straightaway, I am giving you a generous 1 us/prime. Are you anywhere close to that? Even with all potential optimizations, if you can't reach the microsecond territory, then it is hopeless.
As Retina said, if it doesn't drop into 1 year compute time territory, I'd just make it distributed, assign people blocks of 120 numbers to sieve and test. Also, yeah, I see now that the CUDA part doesn't affect quadratic sieve. I'm trying to get away from factoring anyways, so I guess it's a good thing.
tapion64 is offline   Reply With Quote
Old 2014-04-11, 09:27   #30
LaurV
Romulan Interpreter
 
LaurV's Avatar
 
Jun 2011
Thailand

100010101011002 Posts
Default

[edit: crosspost with the last 3-4 posts]

Quote:
Originally Posted by tapion64 View Post
I'd rather come up with a parallelizable algorithm that will be faster for the cases of primes we're dealing with.
You can "parallelize" it how much you want, by handling one billion primes in the same time. It seems that you didn't get the right issue. This method (same as the other method given by me) will do a hundred million steps for one "hundred million exponent". And this for most of the primes which are over 2^33 (almost all of them have orders higher than 2^32 in this case). So, you will end up with (how much it was? 2^17 primes? I already forgot) about 2^32 steps for 2^17 exponents...

These elementary things are interesting to talk about, because they make a newcomer understand what is going on, but their practical value is zero. You will never be able to finish this type of calculus in your life, even with all the computing power in the world available to your fingers. In this case, even the factoring of q-1 (it is where we started from, remember?) would be much faster.

Try taking 10 random primes in the 2^58 range, and do this "add and shift" method for all of them. When you finish, come back here... (now we got rid of you for a week or two, and there were only 10 primes!).

Don't get me wrong, I keep this conversation with you alive only because I walked exactly this path years ago when I started with hunting for prime numbers. I also invested/wasted a lot of time in "genuine" paths like this, that went nowhere... This is how you learn.

Please see this topic fore a more interesting (yet didactic only) discussion about factoring.

Last fiddled with by LaurV on 2014-04-11 at 09:29
LaurV is offline   Reply With Quote
Old 2014-04-11, 13:43   #31
tapion64
 
Apr 2014

10101012 Posts
Default

Quote:
Originally Posted by LaurV View Post
[edit: crosspost with the last 3-4 posts]
You can "parallelize" it how much you want, by handling one billion primes in the same time. It seems that you didn't get the right issue. This method (same as the other method given by me) will do a hundred million steps for one "hundred million exponent". And this for most of the primes which are over 2^33 (almost all of them have orders higher than 2^32 in this case). So, you will end up with (how much it was? 2^17 primes? I already forgot) about 2^32 steps for 2^17 exponents...

Try taking 10 random primes in the 2^58 range, and do this "add and shift" method for all of them. When you finish, come back here... (now we got rid of you for a week or two, and there were only 10 primes!).
There are around 416 quadrillion total primes less than 2^65 by x/ln(x) estimation. We can eliminate half of those by screening for +/-1 mod 8, and 17/24 of them by screening for primes = 1 mod 8 that will yield an even multiplicative order (which I almost have sorted out).

That leaves us with 2.946~ * 10^17 primes to test. Also, since we can stop once we've hit 2^k = -1 mod p since order will be 2*k, the complexity is 2^31, which is roughly 3.728 * 10^26 instructions (actually far less than this, because we're compressing continuous right/left shifts into one instruction, but let's go with it). With only optimizing in this way, we'd need to drop the worst case complexity from 2^31 to 2^5 to do this in 87 compute years, which is indeed too much.

We could address this better if we could solve the problem in parallel, but ultimately the goal is to sieve out unnecessary primes to test for in the first place. While finding 10 random primes in that range and using this method on them won't take as long as you think it would (roughly 3 primes could be handled per second on one of my 3.4 GHz cores without trying for shift optimization, yielding 12 in 1 second), it will take too long over the range we're testing. That's why I'm trying to find an algorithm with complexity near log(n) that can be parallelized, it brings it back into reasonable range. Factoring also gives a better expected running time but is prone to catching on numbers with large prime factors which I believe would in practice slow the effort down more than naive methods.

There are two ways to attack this problem, by either further increasing the efficiency of the sieve (which will have the highest effect), or decreasing the complexity of order calculation to O(log(n)). I think there's only so much that can be done on the order calculation side, but if I can get a generalized process to sieve out numbers with even and then odd composite orders before calculation, by exploiting the facts we know (and finding facts we don't know), we can break the problem down to something more feasible, even if we have the cpu factoring bottleneck or the high O(2^31) complexity for naive method.

I'll be working to better understand GF theory and Dirichlet characters to improve the efficiency of the sieve today. I'm also looking into why for numbers p = 1 mod 8 which have even orders, the order will never be = p-1, it always goes to (p-1)/2 or less. I believe it's related to the Kroenecker/Jacobi symbol properties of p = +/-1 mod 8 numbers, but more testing is needed.

Last fiddled with by tapion64 on 2014-04-11 at 13:46
tapion64 is offline   Reply With Quote
Old 2014-04-12, 02:42   #32
tapion64
 
Apr 2014

5·17 Posts
Default

Today was a productive day. I learned a bit more about GF theory and Dirichlet characters. As I suspected, the key to sieving out p = 1 mod 8 which have even orders is the Jacobi symbol (well, the generalized power residue symbol). Every prime = +/-1 mod 8 necessarily has J(2/p) = 1, but unfortunately since (p-1)/2 is always even, this only tells us that the order of p is at most (p-1)/2 (which was a problem I wanted to solve anyways).

In order to correctly determine if the order is odd or even, we have to do a chain of power residue symbol checks for 2^(2^n) = 2 mod p. So J(2/p) we know is always 1. Then Pr(2/p)4, which functions like Jacobi except the sub replaces the 2, so x^4 = 2 mod p. If that's -1, we know immediately that the order is even and we can stop. If it's 1, then we test Pr(2/p)8. If this is -1, again we know that the order is even and we can stop. After this point, we may already be done. (p-1)/8 is a guaranteed integer, but (p-1)/8 might still be even, and (p-1)/16, and so forth. Essentially we need Pr(2/p)2^n to be equal to 1 for all n = 2 ... k, where k is the number of times 2 divides p-1, and k will always be atleast 3. That's a somewhat complicated calculation just to sieve it out, but we could sieve out to atleast k = 3 and probably have sufficient results, it gives diminishing returns as k increases.

EDIT: It might be worth noting that this might be the precise requirement for calculating totient(p)/order = (p-1)/order. Find max value of n for each pi prime factor of p such that Pr(pi/p)2^n = 1, then divide p-1 by pi^max n for each pi = order of 2 mod p. And maybe (maybe maybe maybe), if you replace the 2 in 2^n in the power symbol with another number z, then you find the order of z mod p. I'll look into if anyone's conjectured this (or proven it) before.

So forget that stuff, it was just for my own edification. I found a better way which just involves calculating 2^e = a mod p, where e = (p-1)/2^k (i.e. divide p-1 by 2 until it's odd). If a = 1, the order is odd and divides e (and heuristically speaking, is usually either e or e is a small multiple of the order, working out exact rules on this a.t.m). If e is a large multiple of the order, we found a large factor of a smaller Mersenne number or a Mersenne prime, but since we know that none of the numbers we're testing are Mersenne primes, it means they're factors of smaller Mersenne numbers we haven't found yet. If a is anything else, by definition we'd have to square it between 1 and k times to arrive at the actual order, at which point the order must be even, so we can ignore it. Much more efficient than doing all those power residue symbol calculations.

But much more importantly, this made me realize a useful feature of Mersenne factors that I touched on earlier but didn't focus on until today. If p = -1 mod 8, then the order is naturally going to be atmost (p-1)/2. By using a modified version of the modified version of the Atkin sieve I came up with earlier, it's possible to sieve out every Sophie Germain prime = 3,-1 mod 8 less than 4, then 4*3, then 4*3*5, then 4*3*5*7, and so forth. It's a complicated 2 step process to sieve out p = -1 mod 8, then repeat the sieve on (p-1)/2. It should be possible to combine the two sieve steps into one step, it was just logically easier to put it into two steps.

I want to say that because the sieve always finds all Sophie Germain primes less than 2*Primorial(n) = 3,-1 mod 8, and it would appear that every iteration of the sieve must yield new primes, this implies that there are infinitely many Sophie Germain primes... But I know that's not a proof. I think there is a way to use this to prove the conjecture if you can combine this with a sieve that finds the other set of Sophie Germain primes (those = 1,5 mod 8) and then prove that if either sieve stopped finding primes, there would be some contradiction based on the Mersenne factor properties. But I'm too burnt out to work on that and I'd rather focus on bringing eliminating Mersenne factors to reasonable compute time.

EDIT: Ok there might be too much math in this post I need to go to sleep <.<

Last fiddled with by tapion64 on 2014-04-12 at 03:10
tapion64 is offline   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
GPU Trial Factoring FAQ garo GPU Computing 100 2019-04-22 10:58
Trial Factoring on AMD/ATI GPU's? Stargate38 GPU Computing 9 2018-08-31 07:58
over trial factoring JFB Software 23 2004-08-22 05:37
How to only do Trial Factoring? michael Software 23 2004-01-06 08:54
About trial factoring gbvalor Math 4 2003-05-22 02:04

All times are UTC. The time now is 10:52.

Sat Oct 31 10:52:19 UTC 2020 up 51 days, 8:03, 2 users, load averages: 1.87, 2.16, 2.07

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2020, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.