2020-06-27, 14:49   #18
kriesel
Why don't we run gpu P-1 factoring's gcds on the gpus?

The software doesn't exist.

Currently, CUDAPm1 stalls the gpu it runs on for the duration of a stage 1 or stage 2 gcd, which runs on one core of the system cpu.
Earlier versions of gpuowl that performed P-1 also stalled the gpu while running each stage's gcd on a cpu core. At some point, Mihai reprogrammed it so that a separate thread runs the gcd on one cpu core while the gpu speculatively proceeds: it begins stage 2 of the P-1 factoring in parallel with the stage 1 gcd, or starts the next worktodo assignment (when one is available) in parallel with the stage 2 gcd.
In all cases, these gcds are performed by the gmp library.
(About 98% of the time, a P-1 factoring stage won't find a factor, so continuing is a good bet, and preferable to leaving the gpu idle during the gcd computation.)
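The overlap described above can be sketched roughly as follows. This is a minimal toy in Python, not actual gpuowl code: math.gcd stands in for the gmp library's mpz_gcd, do_stage2 and p_minus_1_finish are hypothetical names for illustration, and the inputs are tiny rather than GIMPS-sized.

```python
import math
import threading

def do_stage2(x, n):
    """Hypothetical placeholder for the (much longer) gpu stage 2 work."""
    return x

def stage1_gcd(x, n, result):
    """gcd(x - 1, n) > 1 means stage 1 found a factor of n.
    In gpuowl this call goes to GMP's mpz_gcd; math.gcd stands in here."""
    result["factor"] = math.gcd(x - 1, n)

def p_minus_1_finish(x, n):
    result = {}
    t = threading.Thread(target=stage1_gcd, args=(x, n, result))
    t.start()                 # gcd runs on one cpu core...
    stage2 = do_stage2(x, n)  # ...while stage 2 proceeds speculatively
    t.join()
    if result["factor"] > 1:  # ~2% case: factor found, stage 2 was wasted
        return result["factor"], None
    return None, stage2       # ~98% case: the speculation paid off

# Toy example: n = 2^11 - 1 = 2047 = 23 * 89; since 23 - 1 = 22 is smooth,
# x = 3^22 mod 2047 gives gcd(x - 1, n) = 23.
factor, _ = p_minus_1_finish(pow(3, 22, 2047), 2047)
```

The key design point is that the gcd thread and the speculative work share nothing until the join, so the only cost of a wrong guess is the discarded stage 2 (or next-assignment) effort in the rare found-a-factor case.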

It was a more efficient use of programmer time to implement it that way, quickly, using an existing library routine.

On a fast cpu the impact is small. On slow cpus hosting fast gpus it is not.

Borrowing a cpu core for the gcd has the undesirable effect of stopping a worker in mprime or prime95 for the duration, and may also slow mlucas, unless hyperthreading is available and effective.

To my knowledge no one has yet written a gpu-based gcd routine for GIMPS size inputs.
For gpu-based gcd in other contexts, see (RSA) and (polynomials).
If one were written for the large inputs of current and future GIMPS work, such a gpu gcd routine could be difficult to share between CUDAPm1 and gpuowl, since gpuowl is OpenCL based while CUDAPm1 is CUDA based, and the data structures involved probably differ significantly.

Top of this reference thread:
Top of reference tree:

Last fiddled with by kriesel on 2021-03-02 at 19:55