![]() |
|
|
#1 |
|
"Daniel Jackson"
May 2011
14285714285714285714
769 Posts |
Why hasen't NFS been implemented into a GPU program yet?
|
|
|
|
|
|
#2 |
|
Bamboozled!
"๐บ๐๐ท๐ท๐ญ"
May 2003
Down not across
2·17·347 Posts |
|
|
|
|
|
|
#3 |
|
Tribal Bullet
Oct 2004
5·23·31 Posts |
'NFS' needs a large collection of sub-algorithms, the majority of which don't benefit at all from running on a graphics card. What you likely are asking about is NFS sieving running on a graphics card, which nobody that I know of has attempted because the sieving process involves lots of read-modify-write operations to random locations in a very large memory array, which has big problems running in parallel.
Sieving is inherently sensitive to memory latency, not memory bandwidth. GPUs have tons of the latter but are terrible with the former. |
|
|
|
|
|
#4 |
|
"Nancy"
Aug 2002
Alexandria
2,467 Posts |
Bernstein and colleagues have a few papers on ECM on graphics cards for small numbers such as appear in the cofactorization of relation candidates in NFS sieving, see for example http://eecm.cr.yp.to/index.html
Last fiddled with by akruppa on 2012-04-02 at 15:32 |
|
|
|
|
|
#5 |
|
Just call me Henry
"David"
Sep 2007
Liverpool (GMT/BST)
137758 Posts |
How does EECM-MPFQ compare with GMP-ECM speed wise?
|
|
|
|
|
|
#6 |
|
"Nancy"
Aug 2002
Alexandria
2,467 Posts |
Opps, overlooked that. Bernstein claims that EECM-MPFQ is faster, unsurprisingly, but I never tried it myself. Also, PZ and colleagues recently added the batch-ECM variant to GMP-ECM (of which I know nothing myself) which is faster than the old ECM. I don't know how that compares to EECM-MPFQ.
|
|
|
|
|
|
#7 | |
|
"Bob Silverman"
Nov 2003
North of Boston
5×17×89 Posts |
Quote:
Two of the compute intensive tasks (as opposed to memory intensive) are: Computing the initial start points for this sieve. This means computing an expression of the form (ax + b)/(cx + d) mod p for each prime p in the factor base. The modular inverses are compute (but not memory!) intensive. Reconstructing the actual factorization of an identified smooth value after the sieving is complete. This is trial division! So, While the primary CPU is busy with sieving, the GPU can do these much less memory intensive and much more compute intensive tasks. Coordinate the primary CPU with the GPU; use them together. Each can do what it does best. |
|
|
|
|
|
|
#8 |
|
Sep 2009
25·7·11 Posts |
I'm not sure which thread this belongs in, apologies is this is the wrong one.
The latest Cryptogram from Bruce Schneier includes a link to http://2012.sharcs.org/record.pdf which includes an article on GPU assembly language programming. I've not had time to do more than skim it yet, but it looks very interesting. Some of the other papers also relate to CUDA programming as well. Chris |
|
|
|
|
|
#9 | |
|
Tribal Bullet
Oct 2004
5·23·31 Posts |
Quote:
It may be a win to combine calculating the start points and also sieving by vectors on a GPU, using a bucket or radix sort instead of updating a huge array of bytes randomly. But even in that case you'd need space on the card for every factor base entry and also for the intermediate results, and I think that would be a tight fit for large jobs even with 2GB of memory on the card. |
|
|
|
|
|
|
#10 |
|
Just call me Henry
"David"
Sep 2007
Liverpool (GMT/BST)
3×23×89 Posts |
An interesting paper on qs on gpus. http://www.cs.bath.ac.uk/~mdv/course...on-2009-10.pdf
Possible and pehaps fast if optimised. However, NFS uses much more memory and that was a bit of a challenge with just qs considering the high latency. |
|
|
|