Originally Posted by flashjh
Is this going to push a re-write of CUDApm1 and CUDALucas?
Well, when I did the research for this and read the suggested code change, it turns out we actually lose some time to overhead when the API does the memcopy instead of us doing it ourselves in the code. But with the Maxwell platform, where there will be an ARM CPU next to the GPU, we can probably do the normalization on the ARM CPU instead of on the host CPU. That means we don't have to memcopy at all. Also, PCI Express 3.0 roughly doubles the bandwidth of PCIe 2.0, so memcopies will be a lot faster.
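To put the PCIe 3.0 remark in numbers, here is a rough back-of-envelope estimate. It assumes theoretical peak link rates (PCIe 2.0 at 5 GT/s with 8b/10b encoding, PCIe 3.0 at 8 GT/s with 128b/130b encoding) and a hypothetical 4M-point double-precision FFT buffer. The function names are illustrative only, and real transfers will be slower because of latency and driver overhead.

```python
def pcie_bandwidth_gbps(gen: int, lanes: int = 16) -> float:
    """Theoretical one-direction PCIe bandwidth in GB/s (10**9 bytes/s)."""
    # (transfer rate in GT/s, line-encoding efficiency) per generation
    params = {
        2: (5.0, 8 / 10),      # PCIe 2.0: 8b/10b encoding
        3: (8.0, 128 / 130),   # PCIe 3.0: 128b/130b encoding
    }
    rate_gt, efficiency = params[gen]
    # one bit per lane per transfer; divide by 8 to get bytes
    return rate_gt * efficiency * lanes / 8


def copy_time_ms(num_bytes: int, gen: int, lanes: int = 16) -> float:
    """Lower bound on one host<->device copy, ignoring latency and driver overhead."""
    return num_bytes / (pcie_bandwidth_gbps(gen, lanes) * 1e9) * 1e3


if __name__ == "__main__":
    # Hypothetical buffer: a 4M-point FFT of doubles (8 bytes each)
    buf = 4 * 2**20 * 8
    for gen in (2, 3):
        print(f"PCIe {gen}.0 x16: {pcie_bandwidth_gbps(gen):.2f} GB/s, "
              f"{copy_time_ms(buf, gen):.2f} ms per copy")
```

Under these assumptions a 32 MiB copy drops from about 4.2 ms on PCIe 2.0 x16 to about 2.1 ms on PCIe 3.0 x16. Faster, but a copy in each direction on every iteration still adds up, which is why skipping the memcopy entirely is the bigger win.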

I also expect Nvidia to release some support for Hyper-Q when the memcopy is done through the API instead of in our own code.
