mersenneforum.org GNFS matrix vector multiplications using GPUs

 2019-10-20, 07:24 #1 abmalakh   Feb 2019 2 Posts GNFS matrix vector multiplications using GPUs As this paper mentions: the Block Wiedemann (BW) and Block Lanczos (BL) algorithms are frequently used to solve sparse linear systems over GF(2), and iterative sparse matrix-vector multiplication is the most time-consuming operation of these approaches. On the other hand, LightSpMV is a CUDA-compatible sparse matrix-vector multiplication (SpMV) algorithm. Now, my question is: does there exist any implementation of the GNFS algorithm using GPU SpMV? What's your opinion on the speed of such an implementation compared with a non-GPU SpMV?
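For context on what "SpMV over GF(2)" means here: block algorithms such as BW and BL multiply the matrix by a block of (say) 64 vectors at once, packing one bit per vector into a machine word, so the inner loop is a word-wide XOR rather than a floating-point multiply-add. A minimal CPU-side sketch in C (CSR storage; all names are illustrative, not taken from msieve or LightSpMV):

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative CSR storage for a sparse matrix over GF(2): only the
 * column indices of the 1 entries are kept; there are no values. */
typedef struct {
    size_t          nrows;
    const size_t   *row_ptr;  /* length nrows + 1 */
    const uint32_t *col_idx;  /* length row_ptr[nrows] */
} csr_gf2;

/* y = A * x over GF(2), where x and y each hold a block of 64
 * vectors: bit k of word x[j] is entry j of vector k.  Addition in
 * GF(2) is XOR, and multiplying by a stored 1 just includes the
 * word, so the whole row reduces to XORing the selected x words. */
static void spmv_gf2_block64(const csr_gf2 *A,
                             const uint64_t *x, uint64_t *y)
{
    for (size_t i = 0; i < A->nrows; i++) {
        uint64_t acc = 0;
        for (size_t p = A->row_ptr[i]; p < A->row_ptr[i + 1]; p++)
            acc ^= x[A->col_idx[p]];
        y[i] = acc;
    }
}
```

A GPU version parallelizes the outer loop over rows (or slices of rows); the hybrid formats discussed in the paper exist mainly to keep thread workloads balanced when row weights vary widely.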
 2019-10-20, 15:58 #2 VBCurtis     "Curtis" Feb 2005 Riverside, CA 2⁴×3²×29 Posts Nobody has (publicly) implemented GNFS matrix-solving on GPU. The biggest problem, as I understand it, is that the matrices of interesting problems range from 10 to 50+ GB, while GPUs are rather memory limited. Either data transfer is a hurdle that kills speed gains, or the GPU can only solve matrices that already don't take all that long (say, under GNFS-150). The speedup for small problems would be welcome, but doesn't advance the state of the art.
 2019-10-20, 16:49 #3 pinhodecarlos     "Carlos Pinho" Oct 2011 Milton Keynes, UK 3³×13² Posts https://www.mersenneforum.org/showthread.php?t=24190 (small matrices) Also, Greg managed to run msieve on those Intel GPUs... can't find the thread... Last fiddled with by pinhodecarlos on 2019-10-20 at 16:54
 2019-10-20, 20:39 #4 jasonp Tribal Bullet     Oct 2004 2×3×587 Posts There is a branch of Msieve that uses CUDA to implement block Lanczos on Nvidia GPUs. We were making steady progress before I had to put it down in 2013; the smaller size of GPU memory can be worked around by tiling the matrix across a GPU cluster. Early measurements indicated the performance on a hot GPU at the time was pretty encouraging, but getting the full benefit of CUDA would have required a second restructuring of the code (over and above the first one that let CUDA be used). One complication with block Lanczos is that we need both regular and transpose matrix multiplies, and the consensus at the time was that maximum performance would require two versions of the matrix to be stored, in regular and transpose order. Obviously that makes the memory capacity issues twice as bad.
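To illustrate jasonp's point about needing both multiply orders: computed from the same untransposed CSR copy, y = Aᵀ·x turns into scattered XOR updates on y, which collide between GPU threads (requiring atomic operations or a conflict-free partition of the rows); storing a second copy in transposed order avoids this at exactly the memory cost he describes. A self-contained sketch in C (illustrative names, not msieve's code):

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative CSR storage for a sparse matrix over GF(2): only the
 * column indices of the 1 entries are kept; there are no values. */
typedef struct {
    size_t          nrows;
    const size_t   *row_ptr;  /* length nrows + 1 */
    const uint32_t *col_idx;  /* length row_ptr[nrows] */
} csr_gf2;

/* y = A^T * x from untransposed CSR data: each stored 1 at (i, j)
 * contributes x[i] into y[j].  The writes to y are scattered, so a
 * parallel version needs atomic XORs or a collision-free schedule;
 * a second, transposed copy of the matrix turns this back into the
 * gather-style loop of the regular multiply. */
static void spmv_gf2_block64_trans(const csr_gf2 *A, size_t ncols,
                                   const uint64_t *x, uint64_t *y)
{
    for (size_t j = 0; j < ncols; j++)
        y[j] = 0;
    for (size_t i = 0; i < A->nrows; i++)
        for (size_t p = A->row_ptr[i]; p < A->row_ptr[i + 1]; p++)
            y[A->col_idx[p]] ^= x[i];
}
```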
 2019-10-21, 20:36 #5 abmalakh   Feb 2019 2 Posts
Quote:
 Originally Posted by VBCurtis Nobody has (publicly) implemented GNFS matrix-solving on GPU. The biggest problem, as I understand it, is that the matrices of interesting problems range from 10 to 50+ GB, while GPUs are rather memory limited. Either data transfer is a hurdle that kills speed gains, or the GPU can only solve matrices that already don't take all that long (say, under GNFS-150). The speedup for small problems would be welcome, but doesn't advance the state of the art.

Thanks. But as this paper mentioned:

Quote:
 In this paper we derive an efficient CUDA implementation of this operation using a newly designed hybrid sparse matrix format. This leads to speedups between 4 and 8 on a single GPU for a number of tested NFS matrices compared to an optimized multi-core implementation.
Experimental results in this paper show that, using CUDA SpMV, we can improve the linear algebra step. For example, one of their tables is evaluated using RSA-170 on an NVIDIA Tesla C2070 with 6 GB RAM and an NVIDIA GTX580 with 1.5 GB RAM.

 2019-10-22, 00:05 #6 jasonp Tribal Bullet     Oct 2004 6702₈ Posts Sparse matrix multiplies on GPU have become a lot more sophisticated since 2011 (incidentally, $75 for an e-book is about the most expensive I have ever seen). See older versions of the moderngpu library and the CUB library, though the latter would need modifications to work in GF(2).
 2019-10-22, 01:27 #7 VBCurtis     "Curtis" Feb 2005 Riverside, CA 2⁴×3²×29 Posts Quote:  Originally Posted by abmalakh For example, one of their tables is evaluated using RSA-170 on an NVIDIA Tesla C2070 with 6 GB RAM and an NVIDIA GTX580 with 1.5 GB RAM. Good! I look forward to testing this code when it is made public. If RSA-170 can be done in 1.5 GB, the software will be quite useful.
 2019-10-22, 06:22 #8 LaurV Romulan Interpreter     Jun 2011 Thailand 2×17×251 Posts We would personally donate $75 for jasonp's book, if this would result in improving msieve/yafu to use our (always plentiful) GPUs (as opposed to CPUs, which are always scarce!). Send us the instructions.
 2019-10-22, 07:17 #9 pinhodecarlos     "Carlos Pinho" Oct 2011 Milton Keynes, UK 3³×13² Posts Anyone, please send me the DOIs and I'll see what I can do.
 2019-10-22, 07:47 #10 Branger   Oct 2018 13 Posts

Quote:
 Originally Posted by pinhodecarlos Anyone please send me the DOI’s and I’ll see what I can do.
https://onlinelibrary.wiley.com/doi/....1002/cpe.2896

This is a later version of the paper compared to the conference paper in the original post. The paper does seem to focus on block Wiedemann, but parts should still be of interest for block Lanczos.

 2019-10-22, 11:32 #11 pinhodecarlos     "Carlos Pinho" Oct 2011 Milton Keynes, UK 3³×13² Posts

Quote:
 Originally Posted by Branger https://onlinelibrary.wiley.com/doi/....1002/cpe.2896 Is a later version of the paper compared to the conference paper in the original post. The paper does seem to focus on block Wiedemann, but parts should still be of interest for block Lanczos.

Last fiddled with by pinhodecarlos on 2019-10-22 at 12:02

