View Single Post
Old 2005-03-03, 00:35   #2
ewmayer
2ω=0
 
ewmayer's Avatar
 
Sep 2002
Rep├║blica de California

1175410 Posts
Default

There's no mathematical reason one can't decrease the FFT stepsize as you propose - my Mlucas code uses 8 equal-size steps between adjacent powers of 2, for example. But some notes about this:

1) The relative payoff decreases with decreasing stepsize - pictorially, think of approximating the area under the slope-1 line segment (y = x) from x = N to x = 2N with a series of equal-width rectangles. The powers-of-2-only FFT case corresponds to approximating the line segment with a single rectangle of height 2N, which has an area 2*N^2, compared to the exact area (N x N square plus right triangle with short sides of length N) of 1.5*N^2, for an excess (dividing out the common N^2) of 0.5. Using two rectangles of width N/2 and heights 1.5*N and 2*N, respectively, corresponds to allowing FFT lengths of form {2,3}*2^k, and has a total area 1.75*N^2, for an excess of 0.25. Each added halving of the approximating-rectangle width (i.e. of the FFT-length stepsize) only saves half the total labor of the previous halving. For GIMPS, going from 4 steps between adjacent powers of 2 (as Prime95 currently allows) to 8 steps might save on the order of 10% total labor. And that estimate is likely generous, because...

2) Supporting 4 equal-length steps between adjacent powers of 2 means supporting FFT lengths of form {1,3,5,7}*2^k. Those odd multipliers are small enough that the corresponding short-length DFT (discrete Fourier transform) algorithms can be efficiently implemented - for Prime95 that also means without horrendous amounts of ugly x86-style assembly code, and without needing overly large numbers of floating-point registers to store intermediate quantities in the DFT computation. Going to 8 equal-length steps between adjacent powers of 2 means supporting FFT lengths of form {1,3,5,7,9,11,13,15}*2^k. Those added larger DFT radices are trickier to code, generally less relatively efficient in terms of average floating-point opcount-per-input (especially for the prime radices 11 and 13), and need lots of FP registers to store temporaries. They'd also be a bitch to implement in assembly code.
ewmayer is offline   Reply With Quote