I did that, and it's not terrible with the right settings...
using VBITS=512
matrix is 42100909 x 42101088 (20033.9 MB) with weight 6102777434 (144.96/col)
using GPU 0 (Tesla V100-SXM2-32GB)  <-- 32 GB card
vector memory use: 17987.6 MB  <-- 7 x matrix columns x VBITS / 8 bytes on card, adjust VBITS as needed
dense rows memory use: 2569.6 MB  <-- on card but could be moved to cpu memory
sparse matrix memory use: 30997.3 MB  <-- Hosted in cpu memory, transferred on card as needed
memory use: 51554.6 MB  <-- significantly exceeds 32 GB
Allocated 357.7 MB for SpMV library
linear algebra completed 33737 of 42101088 dimensions (0.1%, ETA 133h21m)
This is simply amazing! I'm running a matrix that size for GNFS-201 (from f-small) right now, at ~700 hr on a 12-core single-socket Haswell.
I hope this means you'll be digging out of your matrix backlog from the big siever queue.
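
For anyone sizing this for a smaller card: the "vector memory use" line in the log can be reproduced from the annotation next to it (7 x matrix columns x VBITS / 8 bytes). Below is a rough Python sketch of that arithmetic. The function name is made up for illustration, the list of VBITS values is an assumption about what a build might accept, and it assumes the log's "MB" means 2^20 bytes. It is not code from msieve.

```python
# Rough estimate of on-card vector memory, based on the annotation in the log:
#   7 x matrix columns x VBITS / 8 bytes
# Assumption: the log reports "MB" as 2**20 bytes.
MIB = 1024 * 1024


def vector_memory_mib(ncols, vbits, nvectors=7):
    """Return the estimated vector memory in MB (hypothetical helper)."""
    bytes_per_column = vbits / 8          # VBITS bits per column, per vector
    return nvectors * ncols * bytes_per_column / MIB


ncols = 42101088                          # larger matrix dimension from the log

# Candidate VBITS values are an assumption, not taken from the log.
for vbits in (64, 128, 256, 512):
    print(f"VBITS={vbits:3d}: {vector_memory_mib(ncols, vbits):8.1f} MB")
```

With VBITS=512 this comes out within a fraction of a MB of the 17987.6 MB the log reports, and each halving of VBITS halves the vector footprint. The dense rows and the SpMV library allocation sit on the card on top of that.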
