Originally Posted by Xyzzy
Our system has a single GPU. When we are doing compute work on the GPU the display lags.
CUDA doesn't allow display updates while a kernel is running, so the only way to improve responsiveness without using a second GPU is to shorten the kernel run times. The longest kernel is the SpMV (sparse matrix-vector multiply), and you can shorten it by lowering block_nnz. Use the lowest value that still allows everything to fit in GPU memory.

Edit: Lowering VBITS will also reduce kernel runtimes, but don't go below 128. See the benchmark a few posts above. Also, you can't change VBITS in the middle of a run. You would need to start over from the beginning. You can change block_nnz during a restart.
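
To illustrate why a smaller block_nnz gives shorter kernels, here is a rough Python sketch. It assumes block_nnz acts as a cap on the number of nonzeros handled per block (and so per kernel launch); the program's actual blocking scheme may differ. The function name split_rows_by_nnz and the row counts are made up for the example.

```python
def split_rows_by_nnz(row_nnz, block_nnz):
    """Group consecutive rows into blocks of at most block_nnz nonzeros.

    row_nnz   -- number of nonzeros in each matrix row (hypothetical input)
    block_nnz -- cap on nonzeros per block (assumed meaning of the setting)

    Returns a list of half-open (start_row, end_row) ranges. A single row
    that exceeds the cap gets a block to itself.
    """
    if block_nnz <= 0:
        raise ValueError("block_nnz must be positive")
    blocks = []
    start = 0
    count = 0
    for i, n in enumerate(row_nnz):
        # Close the current block if adding this row would exceed the cap.
        if count > 0 and count + n > block_nnz:
            blocks.append((start, i))
            start = i
            count = 0
        count += n
    if start < len(row_nnz):
        blocks.append((start, len(row_nnz)))
    return blocks


if __name__ == "__main__":
    rows = [3, 4, 2, 5, 1, 6]  # made-up nonzero counts per row
    for cap in (100, 8, 4):
        blocks = split_rows_by_nnz(rows, cap)
        print(cap, len(blocks), blocks)
```

With a large cap the whole matrix is one block, which corresponds to one long kernel. As the cap shrinks, the same work is split into more, smaller blocks, and the display gets a chance to update between them.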

