Old 2013-11-17, 12:33   #1
"Svein Johansen"
Nvidia CUDA 6.0 unified memory management

Nvidia is about to release CUDA 6.0, which introduces unified memory management, and I created this thread to discuss it.

From reading the links, my first impression is that it will not speed up CUDALucas or P-1 testing with CUDA. The reason is that CUDA will still copy the memory over to the GPU anyway.
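To make that concrete, here is a minimal sketch of the CUDA 6.0 managed-memory API: the explicit cudaMemcpy calls disappear from the source, but the driver still migrates pages between host and device behind the scenes, so the transfer cost doesn't vanish. (The kernel is a trivial placeholder, not anything from CUDALucas.)

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void square(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= data[i];
}

int main() {
    const int n = 1 << 20;
    float *data;
    // One allocation visible to both host and device; the runtime
    // migrates the pages on demand, so no explicit cudaMemcpy is needed.
    cudaMallocManaged(&data, n * sizeof(float));
    for (int i = 0; i < n; ++i) data[i] = (float)i;

    square<<<(n + 255) / 256, 256>>>(data, n);
    cudaDeviceSynchronize();   // must finish before the host touches data again

    printf("data[2] = %f\n", data[2]);  // 2*2 = 4.0
    cudaFree(data);
    return 0;
}
```

The convenience is in the source code, not the hardware: the same host-to-device traffic still happens underneath.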

However, by simplifying the memcpy operations, we can keep all the variables on the GPU and just reference them from the host, cleaning up the code enough that we could actually build a second thread into e.g. CUDALucas and exploit the Hyper-Q functionality of the Titan and Tesla K20x boards.
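The two-thread idea boils down to CUDA streams: on Hyper-Q hardware, a copy queued in one stream can overlap a kernel running in another. A hedged sketch, with a trivial placeholder kernel standing in for the real work:

```cuda
#include <cuda_runtime.h>

// Hypothetical placeholder for one iteration's worth of GPU work.
__global__ void work(float *buf, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] *= 1.000001f;
}

int main() {
    const int n = 1 << 22;
    const size_t bytes = n * sizeof(float);

    float *h_a, *h_b, *d_a, *d_b;
    cudaMallocHost(&h_a, bytes);   // pinned host memory is required
    cudaMallocHost(&h_b, bytes);   // for truly asynchronous copies
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);

    cudaStream_t s0, s1;
    cudaStreamCreate(&s0);
    cudaStreamCreate(&s1);

    // On Hyper-Q boards the copy in one stream can overlap the kernel
    // in the other, so the GPU is not left idle during memcpy.
    cudaMemcpyAsync(d_a, h_a, bytes, cudaMemcpyHostToDevice, s0);
    work<<<(n + 255) / 256, 256, 0, s0>>>(d_a, n);
    cudaMemcpyAsync(h_a, d_a, bytes, cudaMemcpyDeviceToHost, s0);

    cudaMemcpyAsync(d_b, h_b, bytes, cudaMemcpyHostToDevice, s1);
    work<<<(n + 255) / 256, 256, 0, s1>>>(d_b, n);
    cudaMemcpyAsync(h_b, d_b, bytes, cudaMemcpyDeviceToHost, s1);

    cudaDeviceSynchronize();
    cudaStreamDestroy(s0); cudaStreamDestroy(s1);
    cudaFree(d_a); cudaFree(d_b);
    cudaFreeHost(h_a); cudaFreeHost(h_b);
    return 0;
}
```

Whether the streams actually run concurrently depends on the hardware: pre-Kepler GK110 boards serialize through a single work queue, which is exactly the bottleneck Hyper-Q removes.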

I am currently using Hyper-Q for my own mathematical algorithms, and it works just fine, but the CUDALucas code has been too complex to build a second thread into.

What we also might see with applications using the new CUDA 6.0 technique is two threads (two instances of e.g. CUDALucas) running against the same GPU. Since the memcpy is handled by the CUDA API, one instance could execute on the GPU while the other copies its data back to host memory for its CPU-side work; then, when the CPU work is done, that instance copies back and executes on the GPU while the first one copies out to host memory in turn.

This is how it works with Titan boards today on Linux, using the gateway function for CUDA (developed for Linux only, because of the TITAN compute-farm project). But with CUDA now taking over the memcpy operations and the host program just referencing the memory, this could become a reality very quickly. I am hoping for this, as the Titan boards would then be able to run two or maybe even three simultaneous threads on the GPU, keeping it 100% busy instead of, as now, waiting for memcpy operations to finish before it becomes active again.