#1
(loop (#_fork))
Feb 2006
Cambridge, England
1936₁₆ Posts
These are very random thoughts; I am wondering whether a dumb implementation of lattice sieving running on nvidia's ludicrously fast parallel hardware might manage to run at reasonable speed.
Jason has already written the basic primitives, with a reasonably fast modular inverse. Lattice reduction is not obviously a serious problem, and that at least is grotesquely parallel (one prime per thread).

Enumerating lattice points: I should read the bucket-sieving paper carefully, because at the moment I think it's necessary to divide the sieve region into slices (e.g. X = -2^15..2^15, Y = Y0..Y0+2^11) to get the buckets to fit in the GPU memory, but IIRC their clever lattice reduction lets you list the lattice points in Y order.

Pulling entries out of buckets stored in global memory into 16k shared sieve arrays on each of 27 thread-blocks on the GPU: probably fine with one block per bucket. I'm not sure how persistent the shared memory is, but synchronising and writing 16k back to global memory is trivial.

But I can't see how getting the entries into the buckets is going to be anything other than a nightmare on stilts: it's a random store at best (as opposed to the pleasant CPU case, where you can be reasonably sure that the ends of the buckets are all in cache), contention on the last-index-for-this-bucket array is enormous, and I don't know what the behaviour is when umpteen threads write different values to the same location in global memory.

GPGPU guides tend to advise against bucketing, and instead suggest writing things out in arbitrary order and then doing a compaction pass and a parallel sort; maybe 60 GB/second of bandwidth to global memory is enough to let you be foolish in that way. Sort the factor-base primes by the number of bucket entries they're expected to generate and partition them among threads, so you don't have to be insanely generous with bucket sizes, I suppose.
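For concreteness, the two-phase bucket scheme being discussed can be sketched on the CPU. This is only an illustrative Python toy under my own assumptions: all names and sizes are invented, and it sieves a 1-D region with simple arithmetic-progression hits for each prime rather than real lattice points. Phase one scatters (offset, log p) pairs into one bucket per slice; phase two drains each bucket into a small slice-sized array, which is the part that would live in a thread-block's shared memory.

```python
# Toy two-phase bucket sieve (CPU sketch; all names/sizes are illustrative).
SIEVE_LEN = 1 << 16          # whole sieve region
SLICE_LEN = 1 << 12          # one slice = one shared-memory-sized sieve array
NSLICES = SIEVE_LEN // SLICE_LEN

def bucket_updates(factor_base):
    """Phase 1: for each prime p with root r and rounded log logp,
    scatter its hits as (offset-within-slice, logp) into per-slice buckets.
    On a GPU this scatter is the contended random store."""
    buckets = [[] for _ in range(NSLICES)]
    for p, r, logp in factor_base:
        for x in range(r, SIEVE_LEN, p):      # hits of p in the region
            buckets[x // SLICE_LEN].append((x % SLICE_LEN, logp))
    return buckets

def drain(buckets):
    """Phase 2: process one bucket (= one slice) at a time, so every
    update lands in a small local array; maps to one block per bucket."""
    sieve = [0] * SIEVE_LEN
    for s, bucket in enumerate(buckets):
        local = [0] * SLICE_LEN               # stand-in for shared memory
        for off, logp in bucket:
            local[off] += logp
        sieve[s * SLICE_LEN:(s + 1) * SLICE_LEN] = local
    return sieve
```

The drain phase parallelises cleanly (one block per bucket, as suggested above); the scatter phase is exactly where the contention on the last-index-for-this-bucket array would bite on a GPU.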
#2
Tribal Bullet
Oct 2004
110111100011₂ Posts
Quote:
> Jason has already written the basic primitives, with a reasonably fast modular inverse.
For the record, Alex and Bob are primarily responsible for that fast modular inverse.
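For readers who haven't seen one, the primitive being credited here can be illustrated with the plain extended Euclidean algorithm. This toy is my own sketch, nothing like the optimised routine under discussion; it just shows what a modular inverse computes, since a lattice siever needs one for things like computing sieve roots:

```python
# Illustrative modular inverse via the extended Euclidean algorithm.
# Invariant: old_s * a == old_r (mod m) throughout the loop.
def modinv(a, m):
    """Return a^-1 mod m, assuming gcd(a, m) == 1."""
    old_r, r = a % m, m
    old_s, s = 1, 0
    while r:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
    if old_r != 1:
        raise ValueError("a is not invertible mod m")
    return old_s % m
```

Fast siever implementations replace the division-heavy loop with binary (shift-and-subtract) variants, which is where the real engineering effort goes.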
#3
"Carlos Pinho"
Oct 2011
Milton Keynes, UK
12010₈ Posts
Is the lattice siever now portable to CUDA or OpenCL code?
Carlos
Thread | Thread Starter | Forum | Replies | Last Post |
QS Lattice Siever | R.D. Silverman | Factoring | 31 | 2018-10-08 17:17 |
Compiling 64 bit lattice siever on gcc 4.8.5 | chris2be8 | Factoring | 6 | 2018-02-06 17:22 |
OpenCL accellerated lattice siever | pstach | Factoring | 1 | 2014-05-23 01:03 |
Msieve / lattice siever with degree 7/8 poly | Batalov | Msieve | 54 | 2010-01-13 19:45 |
ggnfs lattice siever misses some primes | fivemack | Factoring | 1 | 2008-01-18 13:47 |