mersenneforum.org  

Go Back   mersenneforum.org > Great Internet Mersenne Prime Search > Hardware > GPU Computing

Reply
 
Thread Tools
Old 2012-04-02, 14:08   #1
Stargate38
 
Stargate38's Avatar
 
"Daniel Jackson"
May 2011
14285714285714285714

769 Posts
Question GPU for NFS factoring?

Why hasen't NFS been implemented into a GPU program yet?
Stargate38 is offline   Reply With Quote
Old 2012-04-02, 14:42   #2
xilman
Bamboozled!
 
xilman's Avatar
 
"๐’‰บ๐’ŒŒ๐’‡ท๐’†ท๐’€ญ"
May 2003
Down not across

2·17·347 Posts
Default

Quote:
Originally Posted by Stargate38 View Post
Why hasen't NFS been implemented into a GPU program yet?
Because no-one has thought it worth doing. Why don't you do it?
xilman is offline   Reply With Quote
Old 2012-04-02, 14:46   #3
jasonp
Tribal Bullet
 
jasonp's Avatar
 
Oct 2004

5·23·31 Posts
Default

'NFS' needs a large collection of sub-algorithms, the majority of which don't benefit at all from running on a graphics card. What you likely are asking about is NFS sieving running on a graphics card, which nobody that I know of has attempted because the sieving process involves lots of read-modify-write operations to random locations in a very large memory array, which has big problems running in parallel.

Sieving is inherently sensitive to memory latency, not memory bandwidth. GPUs have tons of the latter but are terrible with the former.
jasonp is offline   Reply With Quote
Old 2012-04-02, 15:32   #4
akruppa
 
akruppa's Avatar
 
"Nancy"
Aug 2002
Alexandria

2,467 Posts
Default

Bernstein and colleagues have a few papers on ECM on graphics cards for small numbers such as appear in the cofactorization of relation candidates in NFS sieving, see for example http://eecm.cr.yp.to/index.html

Last fiddled with by akruppa on 2012-04-02 at 15:32
akruppa is offline   Reply With Quote
Old 2012-04-07, 16:02   #5
henryzz
Just call me Henry
 
henryzz's Avatar
 
"David"
Sep 2007
Liverpool (GMT/BST)

137758 Posts
Default

How does EECM-MPFQ compare with GMP-ECM speed wise?
henryzz is offline   Reply With Quote
Old 2012-04-20, 12:14   #6
akruppa
 
akruppa's Avatar
 
"Nancy"
Aug 2002
Alexandria

2,467 Posts
Default

Opps, overlooked that. Bernstein claims that EECM-MPFQ is faster, unsurprisingly, but I never tried it myself. Also, PZ and colleagues recently added the batch-ECM variant to GMP-ECM (of which I know nothing myself) which is faster than the old ECM. I don't know how that compares to EECM-MPFQ.
akruppa is offline   Reply With Quote
Old 2012-04-20, 14:56   #7
R.D. Silverman
 
R.D. Silverman's Avatar
 
"Bob Silverman"
Nov 2003
North of Boston

5×17×89 Posts
Default

Quote:
Originally Posted by jasonp View Post
'NFS' needs a large collection of sub-algorithms, the majority of which don't benefit at all from running on a graphics card. What you likely are asking about is NFS sieving running on a graphics card, which nobody that I know of has attempted because the sieving process involves lots of read-modify-write operations to random locations in a very large memory array, which has big problems running in parallel.

Sieving is inherently sensitive to memory latency, not memory bandwidth. GPUs have tons of the latter but are terrible with the former.
Allow me to point out that a GPU can be used effectively in combination with the primary CPU.

Two of the compute intensive tasks (as opposed to memory intensive)
are:

Computing the initial start points for this sieve.
This means computing an expression of the form (ax + b)/(cx + d) mod p
for each prime p in the factor base. The modular inverses are compute
(but not memory!) intensive.

Reconstructing the actual factorization of an identified smooth value
after the sieving is complete. This is trial division!

So, While the primary CPU is busy with sieving, the GPU can do these
much less memory intensive and much more compute intensive tasks.

Coordinate the primary CPU with the GPU; use them together. Each can
do what it does best.
R.D. Silverman is offline   Reply With Quote
Old 2012-04-20, 16:25   #8
chris2be8
 
chris2be8's Avatar
 
Sep 2009

25·7·11 Posts
Default CUDA assembly language.

I'm not sure which thread this belongs in, apologies is this is the wrong one.

The latest Cryptogram from Bruce Schneier includes a link to http://2012.sharcs.org/record.pdf which includes an article on GPU assembly language programming. I've not had time to do more than skim it yet, but it looks very interesting.

Some of the other papers also relate to CUDA programming as well.

Chris
chris2be8 is offline   Reply With Quote
Old 2012-04-21, 19:12   #9
jasonp
Tribal Bullet
 
jasonp's Avatar
 
Oct 2004

5·23·31 Posts
Default

Quote:
Originally Posted by R.D. Silverman View Post
Allow me to point out that a GPU can be used effectively in combination with the primary CPU.
Note that unless the data starts and ends in the GPU's memory, the latency of data transfer needs to be added to the latency of performing auxiliary sieving computations. Running algorithms like P+-1 and ECM on big batches of small numbers is a natural fit to a GPU, but IMO trial factoring doesn't fit so well because resieving is a viable alternative and computing start points doesn't have enough arithmetic to justify the data transfer on and off the card.

It may be a win to combine calculating the start points and also sieving by vectors on a GPU, using a bucket or radix sort instead of updating a huge array of bytes randomly. But even in that case you'd need space on the card for every factor base entry and also for the intermediate results, and I think that would be a tight fit for large jobs even with 2GB of memory on the card.
jasonp is offline   Reply With Quote
Old 2012-04-22, 12:14   #10
henryzz
Just call me Henry
 
henryzz's Avatar
 
"David"
Sep 2007
Liverpool (GMT/BST)

3×23×89 Posts
Default

An interesting paper on qs on gpus. http://www.cs.bath.ac.uk/~mdv/course...on-2009-10.pdf
Possible and pehaps fast if optimised. However, NFS uses much more memory and that was a bit of a challenge with just qs considering the high latency.
henryzz is offline   Reply With Quote
Reply



All times are UTC. The time now is 15:11.


Fri Jul 7 15:11:17 UTC 2023 up 323 days, 12:39, 0 users, load averages: 1.03, 1.07, 1.11

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2023, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.

โ‰  ยฑ โˆ“ รท ร— ยท โˆ’ โˆš โ€ฐ โŠ— โŠ• โŠ– โŠ˜ โŠ™ โ‰ค โ‰ฅ โ‰ฆ โ‰ง โ‰จ โ‰ฉ โ‰บ โ‰ป โ‰ผ โ‰ฝ โŠ โА โŠ‘ โŠ’ ยฒ ยณ ยฐ
โˆ  โˆŸ ยฐ โ‰… ~ โ€– โŸ‚ โซ›
โ‰ก โ‰œ โ‰ˆ โˆ โˆž โ‰ช โ‰ซ โŒŠโŒ‹ โŒˆโŒ‰ โˆ˜ โˆ โˆ โˆ‘ โˆง โˆจ โˆฉ โˆช โจ€ โŠ• โŠ— ๐–• ๐–– ๐–— โŠฒ โŠณ
โˆ… โˆ– โˆ โ†ฆ โ†ฃ โˆฉ โˆช โІ โŠ‚ โŠ„ โŠŠ โЇ โŠƒ โŠ… โŠ‹ โŠ– โˆˆ โˆ‰ โˆ‹ โˆŒ โ„• โ„ค โ„š โ„ โ„‚ โ„ต โ„ถ โ„ท โ„ธ ๐“Ÿ
ยฌ โˆจ โˆง โŠ• โ†’ โ† โ‡’ โ‡ โ‡” โˆ€ โˆƒ โˆ„ โˆด โˆต โŠค โŠฅ โŠข โŠจ โซค โŠฃ โ€ฆ โ‹ฏ โ‹ฎ โ‹ฐ โ‹ฑ
โˆซ โˆฌ โˆญ โˆฎ โˆฏ โˆฐ โˆ‡ โˆ† ฮด โˆ‚ โ„ฑ โ„’ โ„“
๐›ข๐›ผ ๐›ฃ๐›ฝ ๐›ค๐›พ ๐›ฅ๐›ฟ ๐›ฆ๐œ€๐œ– ๐›ง๐œ ๐›จ๐œ‚ ๐›ฉ๐œƒ๐œ— ๐›ช๐œ„ ๐›ซ๐œ… ๐›ฌ๐œ† ๐›ญ๐œ‡ ๐›ฎ๐œˆ ๐›ฏ๐œ‰ ๐›ฐ๐œŠ ๐›ฑ๐œ‹ ๐›ฒ๐œŒ ๐›ด๐œŽ๐œ ๐›ต๐œ ๐›ถ๐œ ๐›ท๐œ™๐œ‘ ๐›ธ๐œ’ ๐›น๐œ“ ๐›บ๐œ”