mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Software (https://www.mersenneforum.org/forumdisplay.php?f=10)
-   -   FFT size question (https://www.mersenneforum.org/showthread.php?t=3784)

 Unregistered 2005-03-02 23:29

FFT size question

I've been wondering recently about the ranges covered by the various FFT sizes. On the benchmarks page the same FFT size increases between columns are colour coded i.e. green for 64K increase between ranges, yellow for 128K and lilac for 256K - is there any easy way to increase the number of ranges i.e. decrease the stepping size between ranges as exponents get bigger to effect an overall speed increase for the Gimps project? (I'm guessing there isn't a way to implement this, but just thought I'd ask) Perhaps there is a problem with accuracy - maybe a certain margin of safety is necessary at higher exponents? (then again - is accuracy/error rate a function of FFT size only? or is time taken also involved, as a slightly smaller FFT relatively would be faster).

I don't know much about FFT's themselves, so can't really say much, but it would seem that much smaller stepping (if technically possible), would give a real boost to project speed particularly at the higher exponents and also bearing in mind that Prime95 on a P4 is probably maxed out now in terms of pure processing power. (I'm doing a 34 million exponent, I've been at it over 2 months when the PCs on anyway, and I'm only 4 million in!)
Anyway, here's to M43, wherever it may lie

 ewmayer 2005-03-03 00:35

There's no mathematical reason one can't decrease the FFT stepsize as you propose - my Mlucas code uses 8 equal-size steps between adjacent powers of 2, for example. But some notes about this:

1) The relative payoff decreases with decreasing stepsize - pictorially, think of approximating the area under the slope-1 line segment (y = x) from x = N to x = 2N with a series of equal-width rectangles. The powers-of-2-only FFT case corresponds to approximating the line segment with a single rectangle of height 2N, which has an area 2*N^2, compared to the exact area (N x N square plus right triangle with short sides of length N) of 1.5*N^2, for an excess (dividing out the common N^2) of 0.5. Using two rectangles of width N/2 and heights 1.5*N and 2*N, respectively, corresponds to allowing FFT lengths of form {2,3}*2^k, and has a total area 1.75*N^2, for an excess of 0.25. Each added halving of the approximating-rectangle width (i.e. of the FFT-length stepsize) only saves half the total labor of the previous halving. For GIMPS, going from 4 steps between adjacent powers of 2 (as Prime95 currently allows) to 8 steps might save on the order of 10% total labor. And that estimate is likely generous, because...

2) Supporting 4 equal-length steps between adjacent powers of 2 means supporting FFT lengths of form {1,3,5,7}*2^k. Those odd multipliers are small enough that the corresponding short-length DFT (discrete Fourier transform) algorithms can be efficiently implemented - for Prime95 that also means without horrendous amounts of ugly x86-style assembly code, and without needing overly large numbers of floating-point registers to store intermediate quantities in the DFT computation. Going to 8 equal-length steps between adjacent powers of 2 means supporting FFT lengths of form {1,3,5,7,9,11,13,15}*2^k. Those added larger DFT radices are trickier to code, generally less relatively efficient in terms of average floating-point opcount-per-input (especially for the prime radices 11 and 13), and need lots of FP registers to store temporaries. They'd also be a bitch to implement in assembly code.

 Unregistered 2005-03-03 12:38

Thanks very much for taking the time to go through the FFT stepping stuff - it's really appreciated, and can give us all confidence that prime95 is pretty much on the limit in terms of FFT related project throughput. I did a quick check of likely benefit to exponent test time and it was about 6% at most (for half the step size at the top end, and that's ignoring the difficulties of point 2).

Something that prompted me to post was that the (SSE2) 34.56 million crossover is rapidly approaching and would mean for me that an exponent would take 4.5 days more (over 100 hours at full speed) for a couple of exponents over the limit as opposed to under - it's something that seems a little crazy at first. Now there's an incentive to grab one of the 10 million digit exponents soon if ever I heard one!
Thanks again

 All times are UTC. The time now is 08:47.