Multiple (PRP) tests at once?
When the FFT loads its cos/sin values from a precomputed table, or computes them on the fly, those values could be reused. Basically you'd hide/halve this computation cost [or just the cost of the load] if you ran multiple FFT tests at once. Of course you'd have to use "close" N values, i.e. close p values [for Mersenne numbers], so that all the problems have the same FFT size.
Say you'd compute cos(d)*a1[i]+sin(d)*b1[i] and cos(d)*a2[i]+sin(d)*b2[i], or for more than 2 tests also cos(d)*a3[i]+sin(d)*b3[i], cos(d)*a4[i]+sin(d)*b4[i], etc. Got the idea?
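To make the idea concrete, here is a minimal Python sketch of the inner loop, not taken from any real PRP program. The function name apply_shared_twiddles and the data layout (one (a, b) pair of lists per test) are assumptions made up for illustration. It only shows the sharing pattern: cos(d) and sin(d) are evaluated once per index and then applied to every test.

```python
import math

def apply_shared_twiddles(angles, tests):
    """Evaluate cos/sin of angles[i] once and reuse it for every test.

    tests is a list of (a, b) pairs of equal-length float lists
    (a hypothetical layout, one pair per test).
    Returns one list per test with out[i] = cos(d)*a[i] + sin(d)*b[i].
    """
    outputs = [[0.0] * len(angles) for _ in tests]
    for i, d in enumerate(angles):
        # Computed (or loaded from a table) once per index i ...
        c, s = math.cos(d), math.sin(d)
        # ... then shared by all tests that use the same FFT size.
        for k, (a, b) in enumerate(tests):
            outputs[k][i] = c * a[i] + s * b[i]
    return outputs

# Two tests sharing the same angles.
angles = [0.0, math.pi / 2]
tests = [([1.0, 2.0], [3.0, 4.0]), ([5.0, 6.0], [7.0, 8.0])]
results = apply_shared_twiddles(angles, tests)
```

With K tests in the batch, the cos/sin cost per index is paid once instead of K times. Whether that is a net win in a real implementation would depend on things like cache and register pressure, which this sketch does not model.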
