Thread: 69 * 2^n - 1
Old 2013-09-27, 01:19   #15
diep (joined Sep 2006, The Netherlands, 2·337 Posts)

Originally Posted by VBCurtis View Post
CUDA-LLR is available, and in my experience stable. It only uses power-of-2 FFT sizes, and speed improves with larger exponents. The main FFT jump we care about is just over 3M for k=69, so your Teslas would be most useful in the upper 2M range, or over 5M (relative to CPU workers, that is).

Check in the hardware/GPU computing forum; I didn't see the thread when I glanced, but I've been running the program for over a year, and even found a prime for k=5 with it in the 3-megabit range.
Thanks Curtis, I downloaded it. I'll try to get it to work!

Is that power-of-2 restriction the only 'disadvantage' compared to the SSE2 IBDWT I currently have running?
I seem to remember that my own FFT implementation, which also used power-of-2 sizes, had a few other disadvantages (to put it politely) :)
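To illustrate the power-of-2 downside Curtis mentions, here is a toy sketch (my own numbers, not CUDA-LLR's actual parameters; the ~19 bits stored per FFT word is an assumed ballpark, and the exact jump point depends on that figure): with only power-of-2 transform lengths, the FFT size doubles at certain exponents, so candidates just past a boundary waste up to half the transform.

```python
# Toy model of power-of-2 FFT length selection for testing k*2^n - 1.
# Assumption (mine): roughly 19 bits of the number per FFT word.

def next_pow2(x: int) -> int:
    """Smallest power of two >= x."""
    return 1 << (x - 1).bit_length()

def fft_len(exponent: int, bits_per_word: float = 19.0) -> int:
    """Power-of-2 FFT length for a number of about `exponent` bits."""
    words = int(exponent / bits_per_word) + 1
    return next_pow2(words)

# Two exponents straddling a boundary: the number grows ~8%,
# but the transform length doubles.
lo = fft_len(2_400_000)
hi = fft_len(2_600_000)
print(lo, hi)  # hi == 2 * lo across the boundary
```

An IBDWT with non-power-of-2 lengths available can pick a size much closer to the minimum needed, which is why the power-of-2-only code is most competitive just below each jump.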

The Teslas I have here are 0.5 Tflop in theory (of course that's always 2x more than they can actually do in instructions, since the peak assumes you can use multiply-add everywhere; not sure whether this FFT can). Looking forward to benchmarking this code on them!
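The "2x" caveat above is just the usual peak-flops arithmetic: vendors quote cores × clock × 2, where the factor 2 assumes every issued instruction is a multiply-add. A quick sketch (the 240-core / ~1.04 GHz figures are hypothetical round numbers chosen to land near 0.5 Tflop, not a specific Tesla's spec sheet):

```python
# Back-of-envelope peak throughput: the factor flops_per_instr = 2
# models a fused multiply-add counting as two flops per instruction.

def peak_gflops(cores: int, clock_ghz: float, flops_per_instr: int = 2) -> float:
    return cores * clock_ghz * flops_per_instr

with_fma = peak_gflops(240, 1.04)      # ~499 GFLOPS, all-FMA assumption
no_fma   = peak_gflops(240, 1.04, 1)   # ~250 GFLOPS, one flop per instruction
print(with_fma, no_fma)
```

So code that can't pair a multiply with an add each cycle tops out at half the advertised number, which is the worry about whether this FFT's inner loops map onto multiply-adds.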

Note that on Nvidia hardware it would be possible to run a different code stream on each SIMD unit. I don't know whether it can still deliver 0.5 Tflop doing that, yet if it can, it should be easier to get rid of that power-of-2 FFT size. Maybe?

Last fiddled with by diep on 2013-09-27 at 01:26