frmky 2019-03-20 22:31

msieve LA on CUDA
I finally got a chance to try the msieve LA on an NVIDIA P100. It solved a 4.5M matrix in 2 hours using 5.8 GB of GPU memory. The code needs work to bring it up to date, but it looks promising on the P100 and V100.


pinhodecarlos 2019-03-21 15:01

Have you tried an MPI GPU version? I'm just concerned about the memory usage for such a small matrix.

jasonp 2019-03-22 11:48

The existing CUDA code is definitely not MPI aware; MPI processes can each use a GPU for a smaller matrix multiply, but data transfers to/from the GPU would be required for every such operation. I've never even tried running it that way, so the odds are 100% that it is broken.

A better implementation would host the data buffers on GPU at all times and do direct copies from one GPU to another. Latter-day CUDA makes this possible but it has to be explicitly set up.
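
For anyone curious what that explicit setup looks like, here is a minimal sketch using the CUDA runtime's peer-to-peer calls. It is not msieve code and knows nothing about MPI; it assumes a single process that can see at least two GPUs (devices 0 and 1), and the function name peer_roundtrip and the test pattern are made up for illustration. If the hardware does not support peer access, cudaMemcpyPeer still works, but the runtime may stage the copy through host memory.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message on any CUDA runtime error.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(1);                                                  \
        }                                                             \
    } while (0)

// Let device `dev` address memory that lives on device `peer`.
static void enable_peer(int dev, int peer)
{
    CHECK(cudaSetDevice(dev));
    cudaError_t err = cudaDeviceEnablePeerAccess(peer, 0);
    if (err == cudaErrorPeerAccessAlreadyEnabled)
        cudaGetLastError();   // already set up earlier; reset the error
    else
        CHECK(err);
}

// Hypothetical helper: copies n 64-bit words from GPU 0 to GPU 1
// and checks that they arrived intact.
// Returns 0 on success, 1 on mismatch, -1 if fewer than two GPUs.
int peer_roundtrip(size_t n)
{
    int ndev = 0;
    if (cudaGetDeviceCount(&ndev) != cudaSuccess || ndev < 2)
        return -1;

    // Peer access has to be queried and switched on explicitly,
    // once per direction.
    int a01 = 0, a10 = 0;
    CHECK(cudaDeviceCanAccessPeer(&a01, 0, 1));
    CHECK(cudaDeviceCanAccessPeer(&a10, 1, 0));
    if (a01 && a10) {
        enable_peer(0, 1);
        enable_peer(1, 0);
    }

    // Arbitrary test pattern built on the host.
    std::vector<unsigned long long> host_in(n), host_out(n, 0);
    for (size_t i = 0; i < n; i++)
        host_in[i] = i * 0x9E3779B97F4A7C15ULL;

    size_t bytes = n * sizeof(unsigned long long);
    unsigned long long *d0 = nullptr, *d1 = nullptr;

    // The buffers stay resident on their GPUs.
    CHECK(cudaSetDevice(0));
    CHECK(cudaMalloc(&d0, bytes));
    CHECK(cudaMemcpy(d0, host_in.data(), bytes, cudaMemcpyHostToDevice));
    CHECK(cudaSetDevice(1));
    CHECK(cudaMalloc(&d1, bytes));

    // Device-to-device copy: GPU 0 -> GPU 1.
    CHECK(cudaMemcpyPeer(d1, 1, d0, 0, bytes));

    // Read back from GPU 1 only to verify the transfer.
    CHECK(cudaMemcpy(host_out.data(), d1, bytes, cudaMemcpyDeviceToHost));

    CHECK(cudaFree(d1));
    CHECK(cudaSetDevice(0));
    CHECK(cudaFree(d0));
    return host_in == host_out ? 0 : 1;
}

int main()
{
    int rc = peer_roundtrip(1 << 20);
    if (rc == -1)
        printf("need at least two GPUs; skipping\n");
    else
        printf("%s\n", rc == 0 ? "peer copy verified" : "data mismatch");
    return rc == 1;
}
```

Build with nvcc. In a real solver the host round trip at the end would go away; it is only there to confirm the copy worked.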

pinhodecarlos 2020-09-07 20:14

Hey Greg, any new updates on the above? TIA

