Ralf Recker
OK. Here is the result from the GPU (details above):

ralf@quadriga ~/llrcuda.0.07 $ time ./llrCUDA -q"5*2^1282755+1" -d
Starting Proth prime test of 5*2^1282755+1, FFTLEN = 131072 ; a = 3
5*2^1282755+1 is prime! Time : 2708.763 sec.

real 45m8.793s
user 45m2.749s
sys 0m5.644s
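The run above is a Proth test with base a = 3. Here is a minimal Python sketch of Proth's theorem, tried on small numbers of the same form 5*2^n+1. It is illustrative only. The name `proth_test` is hypothetical and not part of llrCUDA, and the GPU code does the big modular squarings with FFT multiplication (FFTLEN above) rather than Python's built-in `pow`. The theorem says: if N = k*2^n+1 with odd k < 2^n and a^((N-1)/2) ≡ -1 (mod N), then N is prime. For odd n ≥ 2 we have N ≡ 2 (mod 3) and N ≡ 1 (mod 4), so 3 is a quadratic non-residue whenever N is prime. This makes the test decisive with a = 3, which is presumably why that base appears in the log.

```python
def proth_test(k: int, n: int, a: int = 3) -> bool:
    """Hypothetical helper: Proth test of N = k*2^n + 1 with base a.

    Returns True if a^((N-1)/2) == -1 (mod N), which proves N prime.
    If a is a quadratic non-residue mod N (true for a = 3 when k = 5
    and n is odd), a False result means N is composite.
    """
    # Proth's theorem requires odd k with k < 2^n.
    if k % 2 == 0 or k >= 2 ** n:
        raise ValueError("need odd k < 2^n")
    N = k * 2 ** n + 1
    # N - 1 is written as N - 1 here because -1 mod N is N - 1.
    return pow(a, (N - 1) // 2, N) == N - 1


if __name__ == "__main__":
    for n in (3, 5, 7, 9, 11):
        print(n, 5 * 2 ** n + 1, proth_test(5, n))
```

For example, 5*2^3+1 = 41 and 5*2^7+1 = 641 pass, while 161 = 7*23 and 2561 = 13*197 fail.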

Edit: A version compiled with --arch=sm_21 is slower (2.155 ms per bit), while a version compiled with --arch=sm_20 is a tiny bit faster (2.085 ms per bit).
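To put the per-bit numbers in context, here is a small sketch that converts between total run time and ms per bit for this exponent. It assumes that "ms per bit" means total time divided by n = 1282755, i.e. roughly one modular squaring per bit. That is my reading, not something llrCUDA documents here. The helper names are hypothetical.

```python
N_BITS = 1282755    # exponent n in 5*2^n+1
TOTAL_S = 2708.763  # "Time :" reported by llrCUDA in the run above


def ms_per_bit(total_s: float, bits: int) -> float:
    """Average milliseconds per bit, assuming time scales with n."""
    return total_s * 1000.0 / bits


def total_seconds(ms_bit: float, bits: int) -> float:
    """Estimated full-test time in seconds from a per-bit figure."""
    return ms_bit * bits / 1000.0


if __name__ == "__main__":
    print(f"run above: {ms_per_bit(TOTAL_S, N_BITS):.3f} ms/bit")  # 2.112 ms/bit
    for arch, ms in {"sm_21": 2.155, "sm_20": 2.085}.items():
        print(f"{arch}: about {total_seconds(ms, N_BITS):.0f} s")
```

Under that assumption the run above averages about 2.112 ms per bit, between the two figures. A full test would take roughly 2764 s with the sm_21 build and 2675 s with the sm_20 build.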

