mersenneforum.org  

Go Back   mersenneforum.org > Great Internet Mersenne Prime Search > Hardware > GPU Computing

Reply
 
Thread Tools
Old 2011-07-24, 19:38   #1
henryzz
Just call me Henry
 
henryzz's Avatar
 
"David"
Sep 2007
Liverpool (GMT/BST)

3×23×89 Posts
Default QS/NFS Sieving

Is there any reason why no one has tried implementing QS or NFS sieving on a GPU? It can be divided up into small chunks using litle memory(L1 cache on a cpu) and wouldn't require a huge amount of data to be transfered between it and the cpu(just found relations).
henryzz is offline   Reply With Quote
Old 2011-07-24, 21:18   #2
R.D. Silverman
 
R.D. Silverman's Avatar
 
"Bob Silverman"
Nov 2003
North of Boston

5×17×89 Posts
Default

Quote:
Originally Posted by henryzz View Post
Is there any reason why no one has tried implementing QS or NFS sieving on a GPU? It can be divided up into small chunks using litle memory(L1 cache on a cpu)

NO. It can not be divided into little chunks.

The reason why it has not been implemented is clear: lack of memory
R.D. Silverman is offline   Reply With Quote
Old 2011-07-24, 22:17   #3
Christenson
 
Christenson's Avatar
 
Dec 2010
Monticello

5·359 Posts
Default

Bob, Henry:
With 1-1.5Gig on typical medium to high-end GPUs these days, I think you are still bound in the memory limits of not long ago. I think we could do MPQS/NFS effectively on a GPU card, as long as it is restricted to the sieving half of the problem. I don't know squat about poly selection, so have no information there. Memory would be a serious problem for the matrix reduction stage, with estimates of the required amounts for problems like M1061 running in the 10s of Gigs.

But I have to finish some things in mfaktc before I can even think about doing it, including Wblipp's request for it to factor some bases besides 2, and these are taking me far too long.
Christenson is offline   Reply With Quote
Old 2011-07-25, 07:23   #4
xilman
Bamboozled!
 
xilman's Avatar
 
"๐’‰บ๐’ŒŒ๐’‡ท๐’†ท๐’€ญ"
May 2003
Down not across

2·17·347 Posts
Default

Quote:
Originally Posted by R.D. Silverman View Post
NO. It can not be divided into little chunks.

The reason why it has not been implemented is clear: lack of memory
Actually, sieving has been divided up into "little chunks" for many (20?) years now. The little chunks are the L2 cache found on typical processors. The main sieve array is held in system RAM.

The situation wrt GPUs is similar, at least in the CUDA model with which I'm familiar. The analogue of L2 cache is shared memory --- relatively small but with very low access latency --- and the system RAM analogue is the graphics RAM. With at least several hundreds of megabytes RAM available on a typical graphics card (there's a gigabyte on my Windows laptop, for instance), it seems to me that a GPU could indeed be a very good sieving engine. Perhaps I'll think about writing a proof of concept implementation.

Incidentally, the machines which ran the MPQS sievers to factor RSA-129 were 486 and P90 class and typically had 8M of RAM, an amount which is comparable with the amount of L2 / L3 cache on many modern x86 processors.

Paul
xilman is offline   Reply With Quote
Old 2011-07-25, 12:00   #5
R.D. Silverman
 
R.D. Silverman's Avatar
 
"Bob Silverman"
Nov 2003
North of Boston

5·17·89 Posts
Default

Quote:
Originally Posted by xilman View Post
Actually, sieving has been divided up into "little chunks" for many (20?) years now. The little chunks are the L2 cache found on typical processors. The main sieve array is held in system RAM.
Perhaps this is a quibble over the meaning of "little".

And also whether one is using a line or lattice siever.
R.D. Silverman is offline   Reply With Quote
Old 2011-07-25, 13:47   #6
fivemack
(loop (#_fork))
 
fivemack's Avatar
 
Feb 2006
Cambridge, England

193616 Posts
Default

The gnfs-lasieve lattice siever, with the interesting trick to do with choice of lattice basis, works one line at a time, with the line held in L1 cache; the bucketting procedure uses quite a lot of memory to store information about what to do in the next line, but at no point is there a big two-dimensional array involved.

For a GPU implementation you'd want to do a dozen lines at a time (since the chip has a dozen multiprocessors), with the lines held in the local memory of the multiprocessors on the chip, and the bucketting information in the main card memory. A question is whether you can do this without having to store a whole set of bucket information per multiprocessor, for which indeed there wouldn't be enough memory.
fivemack is offline   Reply With Quote
Reply



Similar Threads
Thread Thread Starter Forum Replies Last Post
NFS sieving? Dubslow Factoring 8 2012-09-28 06:47
Line sieving vs. lattice sieving JHansen NFSNET Discussion 9 2010-06-09 19:25
10^420 + 1 sieving juno1369 Factoring 20 2010-04-28 01:11
Sieving OmbooHankvald Prime Sierpinski Project 4 2005-06-30 07:51
Sieving robert44444uk Sierpinski/Riesel Base 5 8 2005-04-02 22:30

All times are UTC. The time now is 15:09.


Fri Jul 7 15:09:00 UTC 2023 up 323 days, 12:37, 0 users, load averages: 1.20, 1.17, 1.15

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2023, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.

โ‰  ยฑ โˆ“ รท ร— ยท โˆ’ โˆš โ€ฐ โŠ— โŠ• โŠ– โŠ˜ โŠ™ โ‰ค โ‰ฅ โ‰ฆ โ‰ง โ‰จ โ‰ฉ โ‰บ โ‰ป โ‰ผ โ‰ฝ โŠ โА โŠ‘ โŠ’ ยฒ ยณ ยฐ
โˆ  โˆŸ ยฐ โ‰… ~ โ€– โŸ‚ โซ›
โ‰ก โ‰œ โ‰ˆ โˆ โˆž โ‰ช โ‰ซ โŒŠโŒ‹ โŒˆโŒ‰ โˆ˜ โˆ โˆ โˆ‘ โˆง โˆจ โˆฉ โˆช โจ€ โŠ• โŠ— ๐–• ๐–– ๐–— โŠฒ โŠณ
โˆ… โˆ– โˆ โ†ฆ โ†ฃ โˆฉ โˆช โІ โŠ‚ โŠ„ โŠŠ โЇ โŠƒ โŠ… โŠ‹ โŠ– โˆˆ โˆ‰ โˆ‹ โˆŒ โ„• โ„ค โ„š โ„ โ„‚ โ„ต โ„ถ โ„ท โ„ธ ๐“Ÿ
ยฌ โˆจ โˆง โŠ• โ†’ โ† โ‡’ โ‡ โ‡” โˆ€ โˆƒ โˆ„ โˆด โˆต โŠค โŠฅ โŠข โŠจ โซค โŠฃ โ€ฆ โ‹ฏ โ‹ฎ โ‹ฐ โ‹ฑ
โˆซ โˆฌ โˆญ โˆฎ โˆฏ โˆฐ โˆ‡ โˆ† ฮด โˆ‚ โ„ฑ โ„’ โ„“
๐›ข๐›ผ ๐›ฃ๐›ฝ ๐›ค๐›พ ๐›ฅ๐›ฟ ๐›ฆ๐œ€๐œ– ๐›ง๐œ ๐›จ๐œ‚ ๐›ฉ๐œƒ๐œ— ๐›ช๐œ„ ๐›ซ๐œ… ๐›ฌ๐œ† ๐›ญ๐œ‡ ๐›ฎ๐œˆ ๐›ฏ๐œ‰ ๐›ฐ๐œŠ ๐›ฑ๐œ‹ ๐›ฒ๐œŒ ๐›ด๐œŽ๐œ ๐›ต๐œ ๐›ถ๐œ ๐›ท๐œ™๐œ‘ ๐›ธ๐œ’ ๐›น๐œ“ ๐›บ๐œ”