2021-08-02, 19:23  #1
"Curtis"
Feb 2005
Riverside, CA
31·179 Posts 
Msieve GPU Linear Algebra

2021-08-02, 19:28  #2
Aug 2002
7×23×53 Posts 
We were told:
Quote:


2021-08-02, 21:48  #3
Jul 2003
So Cal
2·3·421 Posts 

2021-08-03, 11:49  #4
Bamboozled!
"๐บ๐๐ท๐ท๐ญ"
May 2003
Down not across
2D37_{16} Posts 
Quote:
The three systems still in use have a 460, a 970, and a 1060 with drivers 390.138, 390.144 and 390.141 respectively. Do you think your new code might run on any of those? If so, I will try again to get CUDA installed and working. Thanks. 

2021-08-03, 14:15  #5
Jul 2003
So Cal
100111011110_{2} Posts 
Technically yes, but consumer cards don't have enough memory to store interesting matrices. If the GTX 1060 has 6 GB, it could run matrices up to about 5M x 5M. The problem is that block Lanczos requires multiplying by both the matrix and its transpose, but GPUs only seem to work well with the matrix stored in CSR (compressed sparse row) format, which doesn't allow the transpose product to be calculated efficiently. So we load both the matrix and its transpose onto the card.
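As a rough illustration of the CSR point (a sketch in plain Python, not msieve's actual kernels; the function names are made up for this example): with the matrix in CSR, each output word of A·x is an independent gather over one row, while A^T·x computed from the same arrays has to scatter into the output, which on a GPU would mean many threads updating the same locations. Arithmetic is over GF(2), so vector entries are bit-packed words and addition is XOR.

```python
def csr_from_dense(rows):
    """Build CSR arrays (row_ptr, col_idx) for a 0/1 matrix given as a list of rows."""
    row_ptr, col_idx = [0], []
    for row in rows:
        col_idx.extend(j for j, bit in enumerate(row) if bit)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx


def spmv_gf2(row_ptr, col_idx, x):
    """y = A x over GF(2): each output word is a gather-and-XOR over one row."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc ^= x[col_idx[k]]
        y.append(acc)
    return y


def spmv_transpose_gf2(row_ptr, col_idx, x, n_cols):
    """y = A^T x from the same CSR arrays: every nonzero scatters into y."""
    y = [0] * n_cols
    for i in range(len(row_ptr) - 1):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Different rows write to the same y[j]; parallel threads would collide here.
            y[col_idx[k]] ^= x[i]
    return y


# Tiny 2 x 3 example; vector entries stand in for VBITS-wide words.
A = [[1, 0, 1],
     [0, 1, 1]]
row_ptr, col_idx = csr_from_dense(A)
y = spmv_gf2(row_ptr, col_idx, [0b01, 0b10, 0b11])
yt = spmv_transpose_gf2(row_ptr, col_idx, [0b01, 0b10], n_cols=3)
```

Storing a second CSR copy of the transpose turns the scatter back into a gather, at the cost of roughly doubling the matrix memory.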
It would be possible to create a version that stores the matrices in system memory and loads the next matrix block into GPU memory while calculating the product with the current block. The block size is adjustable, but I don't know how performant that would be. 
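The overlap described here is usually called double buffering. A minimal host-side sketch in Python, assuming hypothetical `load` and `compute` callables (in CUDA the same pattern would typically use streams and asynchronous copies; this is not code from msieve):

```python
from concurrent.futures import ThreadPoolExecutor


def stream_blocks(blocks, load, compute):
    """Apply compute() to each loaded block while the next block is loading."""
    results = []
    if not blocks:
        return results
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(load, blocks[0])
        for nxt in blocks[1:]:
            current = pending.result()          # wait for the current block
            pending = pool.submit(load, nxt)    # start fetching the next block
            results.append(compute(current))    # compute overlaps with that fetch
        results.append(compute(pending.result()))
    return results


# Stand-ins: each block is a list of rows, each row a list of column indices.
x = [0b01, 0b10, 0b11]


def load(block):
    """Placeholder for a host-to-device copy."""
    return [list(row) for row in block]


def compute(block):
    """Partial product over GF(2): XOR the selected words of x for each row."""
    out = []
    for row in block:
        acc = 0
        for j in row:
            acc ^= x[j]
        out.append(acc)
    return out


partials = stream_blocks([[[0, 2]], [[1, 2]]], load, compute)
y = [word for part in partials for word in part]
```

Whether this pays off depends on whether the transfer of one block takes less time than the product with the previous one, which is the open question in the post.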
2021-08-03, 14:45  #6
Aug 2002
7·23·53 Posts 
How important is ECC on a video card? (Most consumer cards don't have that, right?)
Our card has it, and we have it enabled, but it runs faster without. We haven't logged an ECC error yet. Note the "aggregate" counter described below. Code:
ECC Errors
NVIDIA GPUs can provide error counts for various types of ECC errors. Some ECC errors are either single or double bit, where single bit errors are corrected and double bit errors are uncorrectable. Texture memory errors may be correctable via resend or uncorrectable if the resend fails. These errors are available across two timescales (volatile and aggregate).
Single bit ECC errors are automatically corrected by the HW and do not result in data corruption. Double bit errors are detected but not corrected. Please see the ECC documents on the web for information on compute application behavior when double bit errors occur.
Volatile error counters track the number of errors detected since the last driver load. Aggregate error counts persist indefinitely and thus act as a lifetime counter.
2021-08-04, 00:32  #7
Jul 2003
So Cal
2×3×421 Posts 
Quote:
Code:
using VBITS=512
matrix is 42100909 x 42101088 (20033.9 MB) with weight 6102777434 (144.96/col)
...
using GPU 0 (Tesla V100-SXM2-32GB) < 32 GB card
...
vector memory use: 17987.6 MB < 7 x matrix columns x VBITS / 8 bytes on card, adjust VBITS as needed
dense rows memory use: 2569.6 MB < on card but could be moved to cpu memory
sparse matrix memory use: 30997.3 MB < Hosted in cpu memory, transferred on card as needed
memory use: 51554.6 MB < significantly exceeds 32 GB
Allocated 357.7 MB for SpMV library
...
linear algebra completed 33737 of 42101088 dimensions (0.1%, ETA 133h21m)
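The vector memory annotation in the log can be checked with a few lines of Python. This is only a sketch of the stated formula (7 x matrix columns x VBITS / 8 bytes); it assumes the log's "MB" means 2^20 bytes, and the function name is made up for the example.

```python
MIB = 1024 * 1024  # assumption: the log reports MB as 2**20 bytes


def vector_memory_mb(n_cols, vbits, n_vectors=7):
    """Vector storage in MB: n_vectors x columns x VBITS / 8 bytes."""
    return n_vectors * n_cols * vbits / 8 / MIB


n_cols = 42101088  # column count from the log above
for vbits in (128, 256, 512):
    print(f"VBITS={vbits}: {vector_memory_mb(n_cols, vbits):.1f} MB")

# Sum of the three itemized figures from the log, for comparison with the total.
itemized_total = 17987.6 + 2569.6 + 30997.3
```

For VBITS=512 this comes out within about 0.1 MB of the logged 17987.6 MB, and the itemized figures sum to within rounding of the logged 51554.6 MB total. Halving VBITS halves the vector storage, which is the adjustment the log annotation suggests.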

2021-08-04, 02:42  #8
Jul 2003
So Cal
2526_{10} Posts 
What's your risk tolerance? msieve has robust error detection, so ECC isn't as important there. But it's usually a small price to pay to guard against memory faults.

2021-08-04, 02:47  #9
Mar 2019
2·5·31 Posts 
Are there instructions on how to check out and build the msieve GPU LA code? Is it in trunk or a separate branch?

2021-08-04, 03:41  #10
"Curtis"
Feb 2005
Riverside, CA
31×179 Posts 
Quote:
I hope this means you'll be digging out of your matrix backlog from the big siever queue. 

2021-08-04, 04:51  #11
Jul 2003
So Cal
2×3×421 Posts 
Quote:
Code:
git clone https://github.com/gchilders/msieve_nfsathome.git -b msieve-lacuda-nfsathome
cd msieve_nfsathome
make all VBITS=128 CUDA=XX

where XX is the two-digit CUDA compute capability of your GPU. Specifying CUDA=1 defaults to a compute capability of 60. You may want to experiment with both VBITS=128 and VBITS=256 to see which is best on your GPU.
If you want to copy msieve to another directory, you need the msieve binary, both *.ptx files, and in the cub directory both *.so files. Or just run it from the build directory.
Last fiddled with by frmky on 2021-08-12 at 08:17 Reason: Add specifying the compute capability on the make command line.
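As a small worked example of the XX convention: the two-digit value is the compute capability with the dot removed, so a Tesla V100 (compute capability 7.0) would use CUDA=70. The helper below is purely illustrative and not part of msieve; it only formats the make command described above.

```python
def msieve_make_command(compute_capability, vbits=128):
    """Format the make command for a compute capability such as "7.0"."""
    major, minor = compute_capability.split(".")
    return f"make all VBITS={vbits} CUDA={major}{minor}"


# Tesla V100 has compute capability 7.0; GTX 1060 has 6.1.
print(msieve_make_command("7.0", vbits=256))
print(msieve_make_command("6.1"))
```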
