mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Msieve (https://www.mersenneforum.org/forumdisplay.php?f=83)
-   -   Cuda and a cluster (https://www.mersenneforum.org/showthread.php?t=20726)

efiGeek 2015-12-05 14:13

Cuda and a cluster
 
Hello all,
Long time reader of the forum, first time poster.

Is it possible to use Msieve with Cuda and a cluster? I.E. only the head node will have Cuda and the workers not have Cuda? It is to my understanding that Cuda is only used in one phase, therefor it would not be needed in the worker nodes.

Also, does anyone have any first hand experience with Msieve and MPI? I understand how to set it up (I think), just curious of the speedups that have been seen with large factors. How large of a cluster would one need to factor big values (1024 and above)?

Thanks in advance for all replies and moreover thanks for the tools to make factoring happen!
-Tim

EdH 2015-12-05 15:21

[QUOTE=efiGeek;418344]Hello all,
Long time reader of the forum, first time poster.

Is it possible to use Msieve with Cuda and a cluster? I.E. only the head node will have Cuda and the workers not have Cuda? It is to my understanding that Cuda is only used in one phase, therefor it would not be needed in the worker nodes.

Also, does anyone have any first hand experience with Msieve and MPI? I understand how to set it up (I think), just curious of the speedups that have been seen with large factors. How large of a cluster would one need to factor big values (1024 and above)?

Thanks in advance for all replies and moreover thanks for the tools to make factoring happen!
-Tim[/QUOTE]
I have a thread here: [URL="http://www.mersenneforum.org/showthread.php?t=18684&highlight=Openmpi"]http://www.mersenneforum.org/showthread.php?t=18684&highlight=Openmpi[/URL] which describes my novice attempt to run msieve with openMpi. I never really got anywhere, probably due to lack of bandwidth, but possibly due to my short attention span and moving to other interests.

efiGeek 2015-12-05 15:43

EdH, Thanks for the link. I did learn some things from your test that you posted. My cluster is a bit larger than yours, so I hope to see some decent results. I can run 768 threads and the avg. cpu speed is 4.3gHz. I just hope I don't spend a lot of time to see that the time to factor is years as that would break me just in a power bill.

I hope JasonP or others will chime in.

-Tim

VBCurtis 2015-12-05 17:01

If you mean 1024-bit general numbers (GNFS), you'll need roughly 10,000 of your 768-thread machines for a period of 5 (give or take an order of magnitude) years. A year of your machine might be enough to do the sieving step of a record or near-record GNFS factorization, but the matrix might be a problem.

If you mean ~1024-bit special numbers (say, 2^991+1), using SNFS, your machine will do nicely. The sieving step is trivially parallel- you can use a simple script to create n jobs for n different ranges, and then just collect the resulting data files in one location.

The matrix step has substantial exchange of data among threads; this is where MPI is needed/sensitive to setup. fivemack uses a 48-core box with MPI to solve matrices as large as kilobit SNFS, and the CADO folks used a 12x12 core cluster to solve the matrix for RSA768 (a GNFS factorization). The number of cores that can be effective for the matrix-solving step is limited by how fast the interconnect is, and the size of problem you can do the matrix for is limited by the amount of memory each socket has.

CUDA is used only for GNFS polynomial selection; if you run SNFS candidates, there is no poly select step and thus no need for CUDA. The CUDA step uses just one core of CPU per GPU, and is customarily run for 3-5% of the time the whole project is expected to take. There isn't any point in engaging your cluster for this step; every machine with a GPU would run its own copy of msieve to independently look for a poly, with the best overall result used for the sieving step.

VBCurtis 2015-12-05 17:05

Have a look at the work distribution and discussion of matrix for a recent kilobit SNFS solved by this forum at [url]http://mersenneforum.org/showthread.php?t=19821[/url]

And a discussion of how big a job kilobit GNFS is at [url]http://mersenneforum.org/showthread.php?t=10382[/url]

efiGeek 2015-12-05 17:53

[QUOTE=VBCurtis;418362]Have a look at the work distribution and discussion of matrix for a recent kilobit SNFS solved by this forum at [url]http://mersenneforum.org/showthread.php?t=19821[/url]

And a discussion of how big a job kilobit GNFS is at [url]http://mersenneforum.org/showthread.php?t=10382[/url][/QUOTE]

Thanks for the links and the previous post. I am after RSA1024 factoring, but it seems that even with 10,000 times the computing power that I already have a factor is still years away. Guess I will have to explore some more uncommon methods to attempt to factor it.

On another note, I can calculate the upper 511 bits of a private key from the public key. Is this of any use? Probably not but worth asking. Trying all combinations of the lower 513 bits is 1000's of years. But then again, maybe there is a relation between the two that I am not seeing and it could be of use.

-Tim

Dubslow 2015-12-05 18:09

[QUOTE=efiGeek;418373]Thanks for the links and the previous post. I am after RSA1024 factoring, but it seems that even with 10,000 times the computing power that I already have a factor is still years away. Guess I will have to explore some more uncommon methods to attempt to factor it.

On another note, I can calculate the upper 511 bits of a private key from the public key. Is this of any use? Probably not but worth asking. Trying all combinations of the lower 513 bits is 1000's of years. But then again, maybe there is a relation between the two that I am not seeing and it could be of use.

-Tim[/QUOTE]

If anything could be done to attack otherwise secure 1024 bit RSA, it would have already been done in the ~40 years that RSA has been around. Unless the private key was generated in an insecure manner (try P-1 factoring, or taking gcds with a whole bunch of other private keys), there's no way to factor it, for doing so would be a major mathematical accomplishment worthy of mainstream press recognition. Research mathematicians have been considering integer factorization for hundreds of years -- anyone here would be highly sceptical of the odds of success if you tried some novel way to do it yourself. Hell, even doing it with GNFS would be extremely impressive and press-worthy.

efiGeek 2015-12-05 18:20

[QUOTE=Dubslow;418374]If anything could be done to attack otherwise secure 1024 bit RSA, it would have already been done in the ~40 years that RSA has been around. Unless the private key was generated in an insecure manner (try P-1 factoring, or taking gcds with a whole bunch of other private keys), there's no way to factor it, for doing so would be a major mathematical accomplishment worthy of mainstream press recognition.[/QUOTE]

For two years Bleichenbacher's e=3 RSA Attack worked for my needs. Then they started checking the padding bytes so I had to resort to some other methods. Yes, any RSA digital signature that fails to comply to the extent that Bleichenbacher's attack worked was not a well implemented method but I was able to use that exploit. I am not trying to buck the system, but how do others come up with faster and "out of the box" methods? By never stop trying.

The P-1 was an idea some time back but I never explored it much. And I only have one private key that I am after at the moment. I have been thinking for a few years about partial factorization. I.E. a way to factor the upper half of the two primes using the public key (part or all of it) and the upper bits of the private key, but I have not came up with much other than brute forcing it (not a viable option).

-Tim

Dubslow 2015-12-05 18:43

[QUOTE=efiGeek;418375]For two years Bleichenbacher's e=3 RSA Attack worked for my needs. Then they started checking the padding bytes so I had to resort to some other methods. Yes, any RSA digital signature that fails to comply to the extent that Bleichenbacher's attack worked was not a well implemented method but I was able to use that exploit. I am not trying to buck the system, but how do others come up with faster and "out of the box" methods? By never stop trying.

The P-1 was an idea some time back but I never explored it much. And I only have one private key that I am after at the moment. I have been thinking for a few years about partial factorization. I.E. a way to factor the upper half of the two primes using the public key (part or all of it) and the upper bits of the private key, but I have not came up with much other than brute forcing it (not a viable option).

-Tim[/QUOTE]

I am quite intrigued. What are "your needs"? At any rate, the e=3 attack as you surely understand wasn't an attack on integer factorization. Again, if P-1 works, then the key was poorly produced (it's easy to check a prime's weakness to P-1/P+1, so any decent implementation will). As for the gcd, yes you may be targeting just one key but it's possible that the key shares one of its primes with other keys, which is especially likely with bad implementations (again). If target key's source is known to not be top quality, it may be worth acquiring other private keys from the same source to run gcds against your target.

Now about this "upper half" business, you'll need to me more specific. You mean you want the most significant bits of the private key or its primes? Even if this was possible, why would you ever want this?

efiGeek 2015-12-05 18:53

[QUOTE=Dubslow;418379]I am quite intrigued. What are "your needs"?[/QUOTE]
To be able to digitally sign the hash of data files.
[QUOTE=Dubslow;418379]Again, if P-1 works, then the key was poorly produced (it's easy to check a prime's weakness to P-1/P+1, so any decent implementation will)[/QUOTE]
I shall read up on this more, not sure I have found the correct source for quality information on it in the past.
[QUOTE=Dubslow;418379]As for the gcd, yes you may be targeting just one key but it's possible that the key shares one of its primes with other keys, which is especially likely with bad implementations (again). If target key's source is known to not be top quality, it may be worth acquiring other private keys from the same source to run gcds against your target.[/QUOTE]
I can't even get 1 private key, no less than getting many. All I have to date is public keys.
[QUOTE=Dubslow;418379]Now about this "upper half" business, you'll need to me more specific. You mean you want the most significant bits of the private key or its primes? Even if this was possible, why would you ever want this?[/QUOTE]
I can with a few simple calculations produce the upper half of a private RSA key that uses an e=3,7,11. I am sure that I am not the only one that has found this relation between the public and private key with a low e. In the embedded world a low e is chosen because the micro's do not have a lot of computing power so that they can decrypt and verify a digital signature in a timely fashion.

-Tim

VBCurtis 2015-12-05 20:48

You are correct in your belief that calculating the upper-half of a private key is utterly useless in determining the lower half. It does reduce the search space for a brute-force "guess the number" attack, but not enough to make brute force even remotely worthwhile.

The general concept of P-1 factoring: If the private-key prime P is such that the number P-1 is factorable into small numbers, then the P-1 factoring method will discover the prime P when run against the 1024-bit input. "small" is defined by the bounds used for the P-1 run: P-1 must break down into all-but-one factors less than B1, with the but-one less than B2. B1 and B2 are user-selectable bounds set when P-1 is run, but the time for the run scales (~linearly) with B1 choice while memory and time scale (sublinearly) with B2 choice. P-1 method is part of the GMP-ECM package, invoked with the -pm1 flag.

Dubslow's observation that it's easy to check a created key against P-1 means that good RSA implementations will check the factorization of P-1 to make sure it's not smooth enough for the key to be cracked this way. However, if a key-creating package doesn't have that check implemented, it's possible to crack the 1024-bit input with P-1. It shouldn't work, but it could if the key was poorly created.


All times are UTC. The time now is 00:57.

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.