Sieving k * 2^n + c with Nvidia GPUs for fixed k
Status update.
It basically comes down to running the baby-step giant-step (BSGS) algorithm on the GPU to solve the discrete logarithm.
I've spent the past month and a bit developing on an Nvidia GTX 580.
I'm currently testing it for k * 2^n - 1; it should work with minor changes for similar formulas with a single k.
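To make the reduction concrete: for an odd prime p not dividing k, p divides k * 2^n - 1 exactly when 2^n ≡ k^{-1} (mod p), so finding which n to cross off the sieve is a discrete-log problem that BSGS solves in about sqrt(p) steps. Below is a minimal CPU-side sketch in Python; the function names and the deliberately slow order computation are illustrative assumptions of mine, not the actual GPU code:

```python
# Sketch: sieving k*2^n - 1 by reducing it to a discrete log,
# solved with baby-step giant-step (BSGS). Illustrative only.
from math import isqrt

def bsgs(base, target, p):
    """Smallest x >= 0 with base^x == target (mod p), or None."""
    m = isqrt(p) + 1
    # Baby steps: table of base^j mod p for j = 0..m-1.
    table = {}
    e = 1
    for j in range(m):
        table.setdefault(e, j)
        e = e * base % p
    # Giant steps: multiply target by base^{-m} repeatedly.
    factor = pow(base, -m, p)   # modular inverse (Python 3.8+)
    g = target
    for i in range(m):
        if g in table:
            return i * m + table[g]
        g = g * factor % p
    return None

def mult_order(a, p):
    """Multiplicative order of a mod p (slow O(p) loop, sketch only)."""
    d, e = 1, a % p
    while e != 1:
        e = e * a % p
        d += 1
    return d

def sieve_prime(k, p, n_lo, n_hi):
    """All n in [n_lo, n_hi) for which p divides k*2^n - 1."""
    if k % p == 0:
        return []
    target = pow(k, -1, p)      # need 2^n == k^{-1} (mod p)
    n0 = bsgs(2, target, p)
    if n0 is None:
        return []               # p never divides k*2^n - 1
    order = mult_order(2, p)    # solutions repeat with this period
    n = n0
    while n < n_lo:
        n += order
    hits = []
    while n < n_hi:
        hits.append(n)
        n += order
    return hits
```

For example, with k = 3 and p = 13, this finds n = 8, 20, 32, ... (indeed 3 * 2^8 - 1 = 767 = 13 * 59), so those candidates can be removed from the sieve for p = 13.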
It keeps getting faster, though the speedups come step by step. Initially it was slower than NewPGen.
Right now, for an n-range of 7 million, it's about 17x faster than NewPGen on a single CPU core here. For smaller n-ranges the speedup grows roughly linearly: at an n-range of about 4 million it's around 30x faster than NewPGen. This is on a GTX 580.
I'm still trying to speed it up, mainly by reducing cache usage. I hope to post code that works reasonably within a few weeks, maybe sooner.
I also tested a little on a remote GTX 980 in the States, but it will require a totally new kernel. Those diagrams of Maxwell they draw on homepages online are marketing pictures. Right now it's considerably slower than the GTX 580, yet a special kernel using 128 streamcores instead of 32 should speed it up by nearly a factor of 4, though for now a factor of 2 would be nice...
Fermi, Maxwell, and every other GPU generation will require its own kernel.
Right now there is only a Fermi kernel. That means it runs, of course, on all of those GPUs, but it doesn't yet benefit from the Maxwell architecture. That will come!
Fermi (the 4xx and 5xx series) has 32 streamcores per multiprocessor,
the 6xx series has 192 streamcores per multiprocessor (a big problem),
and Maxwell has 128 streamcores per multiprocessor. And where Fermi and Maxwell have similar L1 data cache, the 6xx series has a weirdo design of its own.
I'm using primesieve and intrinsics from TheJudger.
To be continued. What's a good spot to upload working source code so everyone can download it?
Regards,
Vincent Diepeveen
