View Single Post
Old 2011-05-20, 17:30   #4
Tribal Bullet
jasonp's Avatar
Oct 2004

353610 Posts

Kleinjung's algorithm for stage 1 poly selection can be expressed as either memory-bound or compute-bound. The memory-bound version uses the birthday paradox to become efficient as the problem size goes up, and the compute-bound version uses GPU hardware to perform a large number of tests at once but the amount of work to do increases quadratically as the problem size increases. For small problems, the memory bound version cannot build a large enough hashtable to become efficient, and the GPU code gets more results.

Incidentally, you may also want to include the number of stage 1 hits per second as a more accurate measure of efficiency, because msieve checks for time limit exceeded less often in the GPU code, so that especially for small problems you spend much more time running the GPU version. You can also just run stage 1 and count the number of lines in the .m file that are produced, to avoid being misled by the large amount of debug output the library currently produces. Running stage 2 will confuse the timings, because the CPU and GPU versions will both switch off at random times to possibly spend quite a while running stage 2. The two versions will not produce the same answers.
jasonp is offline   Reply With Quote