msieve LA on CUDA

I finally got a chance to try the msieve linear algebra (LA) on an Nvidia P100. It solved a 4.5M matrix in 2 hours using 5.8 GB of GPU memory. The code needs work to bring it up to date, but it looks promising on P100 and V100.

