mersenneforum.org (https://www.mersenneforum.org/index.php)
-   PrimeNet (https://www.mersenneforum.org/forumdisplay.php?f=11)
-   -   LL and FFT question (https://www.mersenneforum.org/showthread.php?t=15570)

Christenson 2011-05-02 00:14

LL and FFT question
 
I read under the math header on LL testing that after each and every LL iteration, the transformed result of the squaring is inverse-transformed back to a number, and the rounding errors from the FFT are reconciled.

Question: Can the FFT be made exact? Obviously, this comes at a price, but bear with me for a moment -- the question arises because if the FFT can be made exact, then the inverse transform and re-transform at each step could be avoided, with an attendant, significant advantage, which might outweigh the price.

Corollary: IF the FFT can be made more exact, then fewer inverse transforms and retransforms will be required to perform an LL test. I suspect that the reduced number of transforms will offset the increased cost of the more precise FFT.

Can I get some help from experts to put numbers to this, and determine whether present practice as described above is indeed optimal?
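For concreteness, the recurrence under discussion is s → s² − 2 (mod M_p). A toy pure-Python sketch using native big integers (real clients like Prime95 and Mlucas replace the s*s step with FFT-based squaring, which is where the transform/inverse-transform and rounding questions come in):

```python
# Toy Lucas-Lehmer test on Python's built-in big integers.
# This sketch only illustrates the recurrence; production code
# does the squaring via an FFT-based multiply instead of s * s.
def lucas_lehmer(p):
    M = (1 << p) - 1          # the Mersenne number 2^p - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % M   # square, subtract 2, reduce mod M
    return s == 0             # zero residue <=> M_p is prime

# M13 = 8191 is prime; M11 = 2047 = 23 * 89 is not.
```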

ewmayer 2011-05-02 02:26

This question of whether one can do multiple dyadic squaring steps before taking the inverse transform and doing a normalization step has been much discussed. It is certainly possible, either with a pure-integer transform or with extended-precision floating-point arithmetic. But because each squaring roughly doubles the number of bits which must be preserved, the short answer is that the cost of the software emulation needed for the scheme makes it not cost-effective compared to properly tuned transform-based math, which uses operands with fast hardware-supported arithmetic and fills each word of the input vector with close to the maximum number of bits the scheme allows.
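A back-of-the-envelope way to see the bit-doubling (the toy digit count n and digit size b below are my own assumptions, not GIMPS settings):

```python
# Bit growth per squaring in a length-n convolution of b-bit digits.
n, b = 1024, 16   # toy values: 1024 digits of 16 bits each

# One squaring: each convolution output is a sum of n products of
# two b-bit digits, so it needs at most 2*b + log2(n) bits.
conv_bits = 2 * b + (n - 1).bit_length()

# A second dyadic squaring before normalizing convolves those outputs
# with themselves, roughly doubling the requirement again.
two_step_bits = 2 * conv_bits + (n - 1).bit_length()

print(conv_bits, two_step_bits)   # 42 94: one step fits inside a
                                  # 53-bit double mantissa, two do not
```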

Christenson 2011-05-02 10:33

Wonder if it would be possible to do the normalization without doing the inverse transform? Can you point me at the literature?

retina 2011-05-02 10:45

[QUOTE=Christenson;260243]Wonder if it would be possible to do the normalization without doing the inverse transform? Can you point me at the literature?[/QUOTE]
I would imagine that it is possible, but it would be mighty tricky and probably a lot slower than just the usual inverse and normalise.

ewmayer 2011-05-03 18:34

[QUOTE=Christenson;260243]Wonder if it would be possible to do the normalization without doing the inverse transform? Can you point me at the literature?[/QUOTE]

AFAIK it is impossible, or at least grossly impracticable.

On a somewhat-related note (this specifically has to do with the dyadic-square step in an LL-test setting): We can do the subtract-2 step on the forward-transformed vector, because the length-N FFT of a scalar is just an N-vector containing N copies of the scalar. But that would only be useful if staying in "FFT space" for multiple dyadic squarings were in some way advantageous. (Which it isn't, to the best of my knowledge).

henryzz 2011-05-03 19:30

Are you basically saying that it is possible to do the squaring and -2 without an inverse fft but not the modulus, meaning a doubling in the number of bits each iteration? How come modulus isn't possible when -2 is?

ewmayer 2011-05-03 20:11

I'm saying the subtract-constant step could be combined with the *dyadic* (pointwise) squaring, without requiring an iFFT. But we need the iFFT to be able to normalize and keep operands within exact-arithmetic-permitting size bounds, so it's moot.

The reason you can do the -2 step in FFT space is due to the simplicity of the data being FFTed here - a simple scalar constant.
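A small numeric check of this point (hand-rolled length-4 DFT; the toy digit vector is my own choice). Because the transform is linear and the scalar −2, placed in digit 0 of an otherwise-zero vector, transforms to −2 in every bin, subtracting 2 per bin after the dyadic square lands the −2 in digit 0 of the result:

```python
# Demo: the "-2" of the LL step can be applied in transform space.
import cmath

def dft(x):
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n)
                for j in range(n)) for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * j * k / n)
                for k in range(n)) / n for j in range(n)]

x = [3, 1, 4, 1]                    # toy digit vector
X = dft(x)
Y = [Xk * Xk - 2 for Xk in X]       # dyadic square, minus 2 per bin
y = [round(v.real) for v in idft(Y)]

# Reference: cyclic self-convolution of x, with 2 subtracted at digit 0.
ref = [sum(x[j] * x[(k - j) % 4] for j in range(4)) for k in range(4)]
ref[0] -= 2
print(y == ref)   # True
```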

Christenson 2011-05-03 23:20

I'd say this differently:
The cheapest normalisation operation we know is to do the iFFT followed by bitwise normalization followed by the FFT.

Being able to add a constant in FFT space is a function of the linearity of the FFT -- just do the pointwise addition of FFT(-2).

And the much-discussed claim, which I'm still looking for the pointer to, is that the additional cost of being able to stay in FFT space for two steps, in the FFT stage, exceeds the cost of doing the less-precise iFFT after the multiply and re-doing the less precise FFT for the next multiply.

ewmayer 2011-05-04 00:28

[QUOTE=Christenson;260434]And the much-discussed claim, which I'm still looking for the pointer to, is that the additional cost of being able to stay in FFT space for two steps, in the FFT stage, exceeds the cost of doing the less-precise iFFT after the multiply and re-doing the less precise FFT for the next multiply.[/QUOTE]


Doing a second dyadic squaring doubles the size (bit count) of your convolution outputs, which means (roughly) halving the allowable number of input bits to your forward FFT, which in turn means doubling the length of the forward FFT. So you do half as many FFTs, but each is double the length, which means slightly more than twice the per-FFT runtime. QED.
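Putting numbers to that argument with the standard c·N·log2(N) FFT cost model (the constant c cancels out of the ratio; the toy length N is my assumption):

```python
# Cost of two squarings: usual scheme vs. one double-length transform.
from math import log2

N = 1 << 20                       # toy transform length, 2^20

# Usual scheme: two transform pairs of length N.
one_step = 2 * (N * log2(N))

# Stay-in-FFT-space scheme: halve the bits per word, so the FFT
# length doubles to 2N, but only one transform pair is needed.
two_step = (2 * N) * log2(2 * N)

ratio = two_step / one_step       # = 1 + 1/log2(N)
print(ratio)   # 1.05: the double-length FFT is a slight net loss
```

The ratio 1 + 1/log2(N) is always above 1, matching the "slightly more than twice the per-FFT runtime" claim.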

Christenson 2011-05-04 00:49

OK, what happens to FFT size if I go four steps instead of two?

Also, does a GPU change the cost ratio, since doubling the FFT size might not double the time required?

ewmayer 2011-05-04 01:41

[QUOTE=Christenson;260441]OK, what happens to FFT size if I go four steps instead of two?[/QUOTE]
Your convo outputs are 4x as many bits in size, so you trade 4 length-N FFT/iFFTs for one length-4N transform pair. Repeat after me: There is no free lunch. :)

[QUOTE]Also, does a GPU change the cost ratio, since doubling the FFT size might not double the time required?[/QUOTE]

Alternate hardware only changes the calculus if it affects one scheme differently than the other. Don't see how a GPU does that.

ewmayer 2011-05-05 16:51

I should add one caveat to my assertions above: if one spends a significant fraction of a transform-based squaring's time in the DWT-weighting and output-normalization steps, then there might indeed be a modest savings from doing a longer FFT with smaller input words, which allow for 2 or more dyadic multiplies in a row.

I haven't done precise timings in a while, but I seem to recall the typical overhead for the DWT weight/unweight and normalize-and-propagate-carries steps is 10-20% of the overall modmul time.

For my Mlucas code, there is a roughly 10% additional overhead in Mersenne-mod mode due to the "wrapper step" needed around the dyadic-mul, to convert the outputs of the fundamentally complex-input FFT I use to ones reflective of a real-vector FFT. (I didn't want to code 2 separate FFTs for e.g. Fermat-mod and Mersenne-mod arithmetic... Fermat-mod uses a straight complex FFT if one uses the right-angle-transform weighting trick.)

cheesehead 2011-05-05 17:07

[QUOTE=ewmayer;260613]Fermat-mod uses a straight complex FFT if one uses the right-angle-transform weighting trick).[/QUOTE]
Careful! GIMPS may lose a lot of Republicans if that comment gets out!

