Also, the new multi-GPU possibilities for the FFT and BLAS libraries are amazing.
In theory, 2-8 GPUs can cooperate on a single FFT, speeding it up.

"Drop-in libraries and multi-GPU scaling are also implemented in CUDA 6. We are told that the drop-in libraries will automatically accelerate “BLAS and FFTW calculations by up to 8X by simply replacing the existing CPU libraries with the GPU-accelerated equivalents”. Multi-GPU scaling is also supported in the new BLAS and FFT GPU libraries. These libraries “automatically scale performance across up to eight GPUs in a single node, delivering over nine teraflops of double precision performance per node, and supporting larger workloads than ever before (up to 512GB)”."

The question is, will it scale 2:1 with 2 cards? I assume there is some overhead here.
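
For a rough feel of the overhead question, here is a back-of-the-envelope sketch using Amdahl's law. It is only a toy model: the "parallel fraction" values are made-up assumptions, not measurements of the CUDA 6 libraries, and the function name is just something I picked for illustration. Real multi-GPU FFTs also pay for transfers between cards, which this model lumps into the non-scaling part.

```python
def amdahl_speedup(n_gpus: int, parallel_fraction: float) -> float:
    """Ideal speedup on n_gpus when only parallel_fraction of the work scales.

    parallel_fraction is a hypothetical input: the share of run time that
    divides evenly across cards. The rest (transfers, sync) does not scale.
    """
    if n_gpus < 1:
        raise ValueError("n_gpus must be at least 1")
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_gpus)


if __name__ == "__main__":
    # Assumed fractions, chosen only to show the trend.
    for fraction in (1.0, 0.95, 0.80):
        row = ", ".join(
            f"{n} GPUs: {amdahl_speedup(n, fraction):.2f}x" for n in (2, 4, 8)
        )
        print(f"parallel fraction {fraction:.2f} -> {row}")
```

Under this model, 2 cards only give exactly 2:1 if nothing at all is spent on overhead. With an assumed 80% parallel fraction, 2 cards come out at roughly 1.67x.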

With this, we can easily get up to 100M Mersenne exponents and bigger on the GPUs, as we have a lot more firepower per node (for those who want to test 100M exponents) and a lot more memory per node with GPUs.
