2021-09-13, 05:22   #36
frmky
Jul 2003
So Cal

3·751 Posts

I spent time with Nsight Compute looking at the SpMV kernel. As expected for SpMV, it's memory-bandwidth limited, so increasing occupancy to hide latency should help. I adjusted parameters to reduce both register and shared memory use, which increased the occupancy. This yielded a runtime improvement of only about 5% on the V100, but it may differ on other cards. I also increased the default block_nnz to 1750M to reduce global memory use a bit.
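
For anyone wondering why trimming registers and shared memory raises occupancy, here is a rough sketch of the arithmetic. It is not the calculation Nsight Compute performs: it ignores register and shared memory allocation granularity, and the function name is made up for illustration. The per-SM limits are the figures I assume for a V100 (compute capability 7.0), and the kernel numbers in the example are hypothetical, not the actual msieve values.

```python
def theoretical_occupancy(threads_per_block, regs_per_thread, smem_per_block,
                          regs_per_sm=65536,       # assumed V100 register file per SM
                          smem_per_sm=96 * 1024,   # assumed max shared memory per SM (bytes)
                          max_warps_per_sm=64,     # assumed V100 resident-warp limit
                          max_blocks_per_sm=32,    # assumed V100 resident-block limit
                          warp_size=32):
    """Simplified estimate: fraction of the SM's warp slots that can be resident."""
    # Warps needed by one block, rounded up.
    warps_per_block = -(-threads_per_block // warp_size)

    # How many blocks fit under each per-SM resource limit.
    limit_warps = max_warps_per_sm // warps_per_block
    limit_regs = regs_per_sm // (regs_per_thread * warp_size * warps_per_block)
    if smem_per_block > 0:
        limit_smem = smem_per_sm // smem_per_block
    else:
        limit_smem = max_blocks_per_sm

    # The tightest limit decides how many blocks are resident at once.
    blocks = min(max_blocks_per_sm, limit_warps, limit_regs, limit_smem)
    return blocks * warps_per_block / max_warps_per_sm


# Hypothetical kernel configurations, for illustration only.
before = theoretical_occupancy(256, regs_per_thread=64, smem_per_block=32 * 1024)
after = theoretical_occupancy(256, regs_per_thread=32, smem_per_block=16 * 1024)
print(before, after)  # 0.375 0.75
```

In the first case shared memory is the binding limit (3 blocks per SM); halving both resources lets 6 blocks fit. Higher theoretical occupancy only gives the scheduler more warps to switch between while loads are in flight, so a bandwidth-bound kernel may still see a small gain, which is consistent with the roughly 5% above.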