Old 2021-09-13, 05:22   #36
frmky
 

I spent some time with Nsight Compute profiling the SpMV kernel. As expected for SpMV, it's memory-bandwidth limited, so increasing occupancy to hide memory latency should help. I adjusted parameters to reduce both register and shared memory use, which increased the occupancy. This yielded a runtime improvement of only about 5% on the V100, though the gain may differ on other cards. I also increased the default block_nnz to 1750M to reduce global memory use a bit.
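To illustrate why trimming registers and shared memory raises occupancy, here is a rough Python sketch of the usual theoretical-occupancy calculation. The per-SM limits are the published V100 (compute capability 7.0) values; the function name and the kernel numbers in the example are hypothetical and are not the actual SpMV kernel's settings. The model also ignores register and shared memory allocation granularity, so treat the result as an estimate rather than what Nsight Compute would report.

```python
def theoretical_occupancy(threads_per_block, regs_per_thread, smem_per_block,
                          regs_per_sm=65536,        # 32-bit registers per SM (V100)
                          smem_per_sm=96 * 1024,    # max configurable shared memory per SM (V100)
                          max_warps_per_sm=64,
                          max_blocks_per_sm=32,
                          warp_size=32):
    """Estimate the fraction of an SM's warp slots that can be resident.

    Simplified model: allocation granularity is ignored.
    """
    warps_per_block = -(-threads_per_block // warp_size)  # ceiling division

    # Resident blocks allowed by each resource, taken independently.
    by_warps = max_warps_per_sm // warps_per_block
    regs_per_block = regs_per_thread * warps_per_block * warp_size
    by_regs = regs_per_sm // regs_per_block if regs_per_block else max_blocks_per_sm
    by_smem = smem_per_sm // smem_per_block if smem_per_block else max_blocks_per_sm

    # The scarcest resource decides how many blocks fit on one SM.
    blocks = min(by_warps, by_regs, by_smem, max_blocks_per_sm)
    return blocks * warps_per_block / max_warps_per_sm


# Hypothetical "before" and "after" configurations, for illustration only.
before = theoretical_occupancy(256, regs_per_thread=64, smem_per_block=32 * 1024)
after = theoretical_occupancy(256, regs_per_thread=32, smem_per_block=16 * 1024)
print(f"before: {before:.3f}, after: {after:.3f}")
```

With these made-up numbers, shared memory limits the first configuration to 3 resident blocks per SM (0.375 occupancy), and halving both resources allows 6 blocks (0.75). For a bandwidth-bound kernel, doubling occupancy does not have to translate into a large speedup, which fits the modest 5% gain reported above.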