mersenneforum.org > Factoring Projects > Msieve
Old 2021-08-02, 19:23   #1
VBCurtis

Msieve GPU Linear Algebra

Quote:
Originally Posted by Xyzzy
GPU = Quadro RTX 8000
LA = 3331s


Now that we can run msieve on a GPU we have no reason to ever run it on a CPU again.
Do you mean you'll just leave the big matrices to others? How big can you solve on your GPU?
Old 2021-08-02, 19:28   #2
Xyzzy

We were told:
Quote:
You can probably run up to a 25M-30M matrix, perhaps a bit larger, on that card.
Old 2021-08-02, 21:48   #3
frmky

Quote:
Originally Posted by VBCurtis
Do you mean you'll just leave the big matrices to others? How big can you solve on your GPU?
Unlike most of us, Mike has a high-end workstation GPU with 48GB of memory. He can fit all but the largest f-small matrices on his GPU.
Old 2021-08-03, 11:49   #4
xilman

Quote:
Originally Posted by frmky
After reimplementing the CUDA SpMV with CUB, the Tesla V100 now takes 36 minutes.
I'm sorry but I am getting seriously out of date with CUDA since the updated compilers stopped working on my Ubuntu and Gentoo systems.

The three systems still in use have a 460, a 970, and a 1060 with drivers 390.138, 390.144 and 390.141 respectively.

Do you think your new code might run on any of those? If so, I will try again to get CUDA installed and working.

Thanks.
Old 2021-08-03, 14:15   #5
frmky

Technically yes, but consumer cards don't have enough memory to hold interesting matrices. If the GTX 1060 has 6GB, it could run matrices up to about 5M x 5M. The problem is that block Lanczos requires multiplying by both the matrix and its transpose, but GPUs only seem to work well with the matrix in CSR format, which doesn't allow efficiently multiplying by the transpose. So we load both the matrix and its transpose onto the card.
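To illustrate the CSR point, here is a pure-Python sketch (not msieve's actual code, which is C/CUDA and works on bit vectors over GF(2); the function names are invented): a row-wise product from CSR is a clean gather per row, while the transpose product from the same storage scatters into the output vector, which on a GPU would require atomics and parallelizes poorly. Hence keeping a second CSR copy of the transpose on the card.

```python
def csr_spmv(indptr, indices, data, x):
    """y = A*x with A in CSR: each row is a contiguous slice (a gather)."""
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]
    return y

def csr_spmv_transpose(indptr, indices, data, x, n_cols):
    """y = A^T*x from A's CSR storage: every nonzero scatters into y[col]."""
    y = [0.0] * n_cols
    for i in range(len(indptr) - 1):
        for k in range(indptr[i], indptr[i + 1]):
            y[indices[k]] += data[k] * x[i]   # scattered writes -> atomics on a GPU
    return y

# A = [[1, 0, 2],
#      [0, 3, 0]]
indptr, indices, data = [0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0]
print(csr_spmv(indptr, indices, data, [1.0, 1.0, 1.0]))          # A * x
print(csr_spmv_transpose(indptr, indices, data, [1.0, 1.0], 3))  # A^T * x
```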

It would be possible to create a version that stores the matrices in system memory and loads the next matrix block into GPU memory while calculating the product with the current block. The block size is adjustable, but I don't know how performant that would be.
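The streaming idea can be sketched in Python (hypothetical and heavily simplified: dense row blocks, sequential copies, invented `blocked_spmv` name): the full matrix stays in host memory and only one block at a time occupies a fixed-size device buffer. A real CUDA version would overlap the transfer of block k+1 with the multiply on block k using streams.

```python
def blocked_spmv(row_blocks, x):
    """y = A*x where A is stored as a list of dense row blocks in host memory."""
    y = []
    for block in row_blocks:      # "transfer" one block to the device buffer...
        device_buffer = block     # ...then multiply while the next block copies
        for row in device_buffer:
            y.append(sum(a * xi for a, xi in zip(row, x)))
    return y

A_blocks = [[[1, 0], [0, 1]],    # block 0 (rows 0-1)
            [[2, 2]]]            # block 1 (row 2)
print(blocked_spmv(A_blocks, [3, 4]))   # [3, 4, 14]
```

The block size trades device-memory footprint against transfer overhead, which is why it is left adjustable.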
Old 2021-08-03, 14:45   #6
Xyzzy

How important is ECC on a video card? (Most consumer cards don't have that, right?)

Our card has it, and we have it enabled, but it runs faster without.

We haven't logged an ECC error yet. Note the "aggregate" counter described below.

Code:
ECC Errors
       NVIDIA GPUs can provide error counts for various types of ECC errors. Some ECC errors are either single or double bit, where single bit errors are corrected and double bit errors are uncorrectable. Texture memory errors may be correctable via resend or uncorrectable if the resend fails. These errors are available across two timescales (volatile and aggregate). Single bit ECC errors are automatically corrected by the HW and do not result in data corruption. Double bit errors are detected but not corrected. Please see the ECC documents on the web for information on compute application behavior when double bit errors occur. Volatile error counters track the number of errors detected since the last driver load. Aggregate error counts persist indefinitely and thus act as a lifetime counter.
Old 2021-08-04, 00:32   #7
frmky

Quote:
Originally Posted by frmky View Post
It would be possible to create a version that stores the matrices in system memory
I did that, and it's not terrible with the right settings...
Code:
using VBITS=512
matrix is 42100909 x 42101088 (20033.9 MB) with weight 6102777434 (144.96/col)
...
using GPU 0 (Tesla V100-SXM2-32GB)   <-------- 32 GB card
...
vector memory use: 17987.6 MB  <-- 7 x matrix columns x VBITS / 8 bytes on card, adjust VBITS as needed
dense rows memory use: 2569.6 MB  <-- on card but could be moved to cpu memory
sparse matrix memory use: 30997.3 MB  <-- Hosted in cpu memory, transferred on card as needed
memory use: 51554.6 MB  <-- significantly exceeds 32 GB
Allocated 357.7 MB for SpMV library
...
linear algebra completed 33737 of 42101088 dimensions (0.1%, ETA 133h21m)
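As a sanity check of the vector-memory annotation in the log above: assuming the column count is padded up to a multiple of VBITS (my assumption; the padding rule is not shown in the log), the stated formula reproduces the logged figure.

```python
cols, VBITS = 42101088, 512
padded_cols = -(-cols // VBITS) * VBITS        # round up to a VBITS multiple (assumed padding)
vector_bytes = 7 * padded_cols * (VBITS // 8)  # 7 vectors of VBITS bits per column
print(round(vector_bytes / 2**20, 1))          # 17987.6, matching the log
```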
Old 2021-08-04, 02:42   #8
frmky

Quote:
Originally Posted by Xyzzy
How important is ECC on a video card? (Most consumer cards don't have that, right?)
What's your risk tolerance? msieve has robust error detection, so ECC is less critical here than in most workloads. But it's usually a small price to pay to ensure no memory faults.
Old 2021-08-04, 02:47   #9
mathwiz

Are there instructions on how to check out and build the msieve GPU LA code? Is it in trunk or a separate branch?
Old 2021-08-04, 03:41   #10
VBCurtis

Quote:
Originally Posted by frmky
I did that, and it's not terrible with the right settings... [log snipped]
This is simply amazing! I'm running a matrix that size for GNFS-201 (from f-small) right now, at ~700 hr on a 12-core single-socket Haswell.
I hope this means you'll be digging out of your matrix backlog from the big siever queue.
Old 2021-08-04, 04:51   #11
frmky

Quote:
Originally Posted by mathwiz
Are there instructions on how to check out and build the msieve GPU LA code? Is it in trunk or a separate branch?
It's very much a work in progress and things may change or occasionally be broken, but you can play with it. It's on GitHub. I recommend CUDA 10.2 because CUDA 11.x incorporates CUB into the toolkit and tries to force you to use that copy, which is missing a few pieces; that complicates things. You can get the source with

Code:
git clone https://github.com/gchilders/msieve_nfsathome.git -b msieve-lacuda-nfsathome
cd msieve_nfsathome
make all VBITS=128 CUDA=XX

where XX is the two-digit CUDA compute capability of your GPU. Specifying CUDA=1 defaults to a compute capability of 60. You may want to experiment with both VBITS=128 and VBITS=256 to see which is best on your GPU.
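For reference, a small hypothetical Python helper (the dict and `make_command` are mine, not part of msieve) mapping the cards mentioned in this thread to their compute capabilities; the two-digit `CUDA=` value is just major*10 + minor.

```python
# Compute capabilities of the GPUs discussed in this thread.
COMPUTE_CAPABILITY = {
    "GTX 460": (2, 1),         # Fermi; no longer supported by CUDA 10.x
    "GTX 970": (5, 2),         # Maxwell
    "GTX 1060": (6, 1),        # Pascal
    "Tesla V100": (7, 0),      # Volta
    "Quadro RTX 8000": (7, 5), # Turing
}

def make_command(gpu, vbits=128):
    """Build the make invocation for a given card."""
    major, minor = COMPUTE_CAPABILITY[gpu]
    return f"make all VBITS={vbits} CUDA={major * 10 + minor}"

print(make_command("Tesla V100"))   # make all VBITS=128 CUDA=70
```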

If you want to copy msieve to another directory, you need the msieve binary, both *.ptx files, and both *.so files from the cub directory. Or just run it from the build directory.

Last fiddled with by frmky on 2021-08-12 at 08:17 Reason: Add specifying the compute capability on the make command line.