#1
Just call me Henry
"David"
Sep 2007
Liverpool (GMT/BST)
3×23×89 Posts
Is there any reason why no one has tried implementing QS or NFS sieving on a GPU? It can be divided up into small chunks using little memory (the L1 cache on a CPU), and it wouldn't require a huge amount of data to be transferred between the GPU and the CPU (just the relations found).
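For concreteness, here is a minimal sketch of the kind of chunking I mean: sieve a long interval in blocks small enough to stay in L1 cache, and let only the surviving candidates leave each block. The factor base, roots, block size and threshold below are made up purely for illustration.

Code:
/* A minimal sketch of sieving in L1-sized chunks.  The factor base,
 * roots and block size are toy values; a real QS/NFS siever would add
 * log(p) approximations and turn the surviving positions into relations. */
#include <stdio.h>
#include <string.h>

#define BLOCK 32768                        /* one 32 KB sieve block */

int main(void) {
    const int fb_p[] = { 2, 3, 5, 7, 11, 13 };   /* tiny toy factor base */
    const int fb_r[] = { 1, 2, 3, 5,  7,  9 };   /* a sieve root mod p   */
    const int fb_len = 6;
    unsigned char sieve[BLOCK];
    long interval = 1L << 20;                    /* total sieve length   */

    for (long base = 0; base < interval; base += BLOCK) {
        memset(sieve, 0, sizeof sieve);          /* reuse the same block */
        for (int i = 0; i < fb_len; i++) {
            int p = fb_p[i];
            long start = fb_r[i] - (base % p);   /* first hit >= base    */
            if (start < 0) start += p;
            for (long j = start; j < BLOCK; j += p)
                sieve[j] += 1;                   /* stand-in for log p   */
        }
        /* here a real siever scans sieve[] for positions over a threshold,
         * trial-divides them, and keeps only the resulting relations --
         * those few relations are all that needs to leave the block      */
    }
    printf("sieved %ld positions in %d-byte blocks\n", interval, BLOCK);
    return 0;
}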
#2
"Bob Silverman"
Nov 2003
North of Boston
5×17×89 Posts
Quote:
NO. It cannot be divided into little chunks. The reason why it has not been implemented is clear: lack of memory.
#3
Dec 2010
Monticello
5·359 Posts
Bob, Henry:
With 1-1.5 GB on typical medium- to high-end GPUs these days, I think you are still bound by the memory limits of not long ago. I think we could do MPQS/NFS effectively on a GPU card, as long as it is restricted to the sieving half of the problem. I don't know squat about poly selection, so I have no information there. Memory would be a serious problem for the matrix reduction stage, with estimates of the required amounts for problems like M1061 running into tens of gigabytes. But I have to finish some things in mfaktc before I can even think about doing this, including Wblipp's request for it to factor some bases besides 2, and these are taking me far too long.
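For a sense of where such estimates come from, here is a back-of-envelope calculation; the dimensions are assumptions for illustration only, not the actual M1061 matrix.

Code:
/* Rough illustration of how a matrix-memory estimate arises; the figures
 * are assumed for illustration, not taken from the M1061 job. */
#include <stdio.h>

int main(void) {
    double rows   = 1.0e8;   /* assumed matrix dimension                */
    double weight = 100.0;   /* assumed average nonzero entries per row */
    double bytes  = rows * weight * 4.0;   /* 4-byte column index each  */
    printf("approx. %.0f GB just to hold the matrix\n", bytes / 1e9);
    return 0;
}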
#4
Bamboozled!
"๐บ๐๐ท๐ท๐ญ"
May 2003
Down not across
2·17·347 Posts
Quote:
The situation wrt GPUs is similar, at least in the CUDA model with which I'm familiar. The analogue of L2 cache is shared memory --- relatively small but with very low access latency --- and the system RAM analogue is the graphics RAM. With at least several hundred megabytes of RAM available on a typical graphics card (there's a gigabyte on my Windows laptop, for instance), it seems to me that a GPU could indeed be a very good sieving engine. Perhaps I'll think about writing a proof-of-concept implementation.

Incidentally, the machines which ran the MPQS sievers to factor RSA-129 were 486 and P90 class and typically had 8M of RAM, an amount which is comparable with the L2/L3 cache on many modern x86 processors.

Paul
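To make the suggestion above concrete, here is a very rough outline of what such a proof of concept might look like; every name, size and threshold below is an assumption rather than an existing siever. Each thread block sieves one segment held in shared memory, the factor base sits in the card's global RAM, and only a verdict per position leaves the chip.

Code:
/* Hypothetical proof-of-concept kernel: one thread block sieves one
 * segment in shared memory (the small, low-latency scratchpad); the
 * factor base and the output live in global RAM.  All sizes are assumed. */
#include <cstdio>
#include <cuda_runtime.h>

#define SEG     8192          /* positions per segment (32 KB of shared) */
#define NBLOCKS 128           /* segments sieved per kernel launch       */
#define THRESH  3             /* made-up "looks smooth" threshold        */

__global__ void sieve_segments(const int *fb_p, const int *fb_r, int fb_len,
                               unsigned char *cand)
{
    __shared__ unsigned int seg[SEG];            /* on-chip sieve array   */
    long base = (long)blockIdx.x * SEG;          /* this block's segment  */

    for (int i = threadIdx.x; i < SEG; i += blockDim.x)
        seg[i] = 0;                              /* clear cooperatively   */
    __syncthreads();

    /* threads share out the factor-base primes; atomics resolve clashes
       when two primes hit the same sieve position                        */
    for (int i = threadIdx.x; i < fb_len; i += blockDim.x) {
        int p = fb_p[i];
        int start = fb_r[i] - (int)(base % p);
        if (start < 0) start += p;
        for (int j = start; j < SEG; j += p)
            atomicAdd(&seg[j], 1u);              /* stand-in for log p    */
    }
    __syncthreads();

    /* only a yes/no per position leaves the chip; everything else stays
       in shared memory                                                   */
    for (int j = threadIdx.x; j < SEG; j += blockDim.x)
        cand[base + j] = (seg[j] >= THRESH);
}

int main(void)
{
    const int h_p[] = { 2, 3, 5, 7, 11, 13 };    /* toy factor base       */
    const int h_r[] = { 1, 2, 3, 5,  7,  9 };    /* toy roots mod p       */
    const int fb_len = 6;
    size_t interval = (size_t)SEG * NBLOCKS;
    int *d_p, *d_r;
    unsigned char *d_cand;

    cudaMalloc((void**)&d_p, sizeof h_p);
    cudaMalloc((void**)&d_r, sizeof h_r);
    cudaMalloc((void**)&d_cand, interval);
    cudaMemcpy(d_p, h_p, sizeof h_p, cudaMemcpyHostToDevice);
    cudaMemcpy(d_r, h_r, sizeof h_r, cudaMemcpyHostToDevice);

    sieve_segments<<<NBLOCKS, 256>>>(d_p, d_r, fb_len, d_cand);
    cudaDeviceSynchronize();
    printf("sieved %zu positions in %d segments\n", interval, NBLOCKS);

    cudaFree(d_p); cudaFree(d_r); cudaFree(d_cand);
    return 0;
}

Segment size and the number of segments per launch would obviously need tuning against the shared-memory and occupancy limits of a real card.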
#5
"Bob Silverman"
Nov 2003
North of Boston
5·17·89 Posts
Quote:
And also whether one is using a line or lattice siever.
#6
(loop (#_fork))
Feb 2006
Cambridge, England
1936₁₆ Posts
The gnfs-lasieve lattice siever, with its interesting trick involving the choice of lattice basis, works one line at a time, with the line held in L1 cache; the bucketing procedure uses quite a lot of memory to store information about what to do in the next line, but at no point is there a big two-dimensional array involved.

For a GPU implementation you'd want to do a dozen lines at a time (since the chip has a dozen multiprocessors), with the lines held in the local memory of the multiprocessors on the chip and the bucketing information in the main card memory. A question is whether you can do this without having to store a whole set of bucket information per multiprocessor, for which indeed there wouldn't be enough memory.
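One possible answer, sketched below under entirely assumed sizes and encodings, is to keep a single set of per-line buckets in the card's main memory and let every multiprocessor append to them with atomic counters, rather than giving each multiprocessor a private copy.

Code:
/* Hypothetical bucket-fill kernel: one shared set of per-line buckets in
 * global memory, appended to atomically by all multiprocessors.  The
 * layout, sizes and toy factor base are assumptions, not gnfs-lasieve. */
#include <cstdio>
#include <cuda_runtime.h>

#define NLINES    64        /* lines covered by one bucket pass (assumed) */
#define LINE_LEN  32768     /* sieve positions per line (assumed)         */
#define BUCKETCAP 65536     /* slots per per-line bucket (assumed)        */

struct Update { int offset; int logp; };   /* where to add, and how much  */

__global__ void fill_buckets(const int *fb_p, const int *fb_r, int fb_len,
                             Update *buckets, unsigned int *fill)
{
    /* each thread owns a slice of the (toy) large factor-base primes     */
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < fb_len;
         i += gridDim.x * blockDim.x) {
        int p = fb_p[i];
        /* walk this prime's hits across all NLINES lines of the region   */
        for (long pos = fb_r[i]; pos < (long)NLINES * LINE_LEN; pos += p) {
            int line   = (int)(pos / LINE_LEN);
            int offset = (int)(pos % LINE_LEN);
            unsigned int slot = atomicAdd(&fill[line], 1u); /* shared fill */
            if (slot < BUCKETCAP)            /* drop overflow in this sketch */
                buckets[(size_t)line * BUCKETCAP + slot] =
                    Update{ offset, 1 /* stand-in for log p */ };
        }
    }
    /* a second kernel would then sieve line by line: replay one line's
       bucket into a segment held in shared memory, scan it, and move on  */
}

int main(void)
{
    const int h_p[] = { 101, 103, 107, 109, 113 };   /* toy "large" primes */
    const int h_r[] = {   5,  17,  23,  42,  99 };   /* toy roots          */
    const int fb_len = 5;
    int *d_p, *d_r;
    Update *d_buckets;
    unsigned int *d_fill, h_fill[NLINES];

    cudaMalloc((void**)&d_p, sizeof h_p);
    cudaMalloc((void**)&d_r, sizeof h_r);
    cudaMalloc((void**)&d_buckets, (size_t)NLINES * BUCKETCAP * sizeof(Update));
    cudaMalloc((void**)&d_fill, NLINES * sizeof(unsigned int));
    cudaMemcpy(d_p, h_p, sizeof h_p, cudaMemcpyHostToDevice);
    cudaMemcpy(d_r, h_r, sizeof h_r, cudaMemcpyHostToDevice);
    cudaMemset(d_fill, 0, NLINES * sizeof(unsigned int));

    fill_buckets<<<12, 256>>>(d_p, d_r, fb_len, d_buckets, d_fill);
    cudaMemcpy(h_fill, d_fill, sizeof h_fill, cudaMemcpyDeviceToHost);
    printf("line 0 received %u bucket updates\n", h_fill[0]);

    cudaFree(d_p); cudaFree(d_r); cudaFree(d_buckets); cudaFree(d_fill);
    return 0;
}

Whether a single shared set of buckets like this keeps up, or whether contention on the atomic counters forces some partitioning after all, is exactly the open question.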