#1
Dec 2010
Monticello
5×359 Posts
I read under the math header on LL testing that after each and every LL iteration, the transformed result of the squaring is inverse-transformed back to a number, and the rounding errors from the FFT are reconciled.
Question: Can the FFT be made exact? Obviously this comes at a price, but bear with me for a moment. The question arises because, if the FFT can be made exact, then the inverse transform and re-transform at each step could be avoided, with an attendant, significant advantage which might outweigh the price.

Corollary: if the FFT can be made more exact, then fewer inverse transforms and re-transforms will be required to perform an LL test. I suspect that the reduced number of transforms would offset the increased cost of the more precise FFT.

Can I get some help from the experts to put numbers to this, and determine whether present practice as described above is indeed optimal?
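For concreteness, the iteration in question is the Lucas-Lehmer recurrence s ← s² − 2 (mod 2^p − 1). A minimal pure-integer sketch (no FFT at all; production codes replace the bignum squaring with the transform-based multiply discussed in this thread):

```python
def lucas_lehmer(p: int) -> bool:
    """Return True iff the Mersenne number 2^p - 1 is prime (p an odd prime)."""
    M = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % M  # the squaring step that FFT multiplication accelerates
    return s == 0

print(lucas_lehmer(7))   # 2^7  - 1 = 127 is prime   -> True
print(lucas_lehmer(11))  # 2^11 - 1 = 2047 = 23*89   -> False
```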
#2
∂²ω=0
Sep 2002
República de California
2D7F₁₆ Posts
This question, whether one can do multiple dyadic squaring steps before taking the inverse transform and doing a normalization step, has been much discussed. It is certainly possible, either with a pure-integer transform or with extended-precision floating-point arithmetic. But because each squaring roughly doubles the number of bits which must be preserved, the short answer is that the cost of the software emulation needed for the scheme makes it not cost-effective compared to properly tuned transform-based math which uses operands having fast hardware-supported arithmetic and fills each word of the input vector with close to the maximum number of bits the scheme allows.
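The bit-doubling is easy to see with plain Python big integers (an illustrative sketch; the starting value is arbitrary):

```python
# Squaring without any modular reduction / normalization roughly doubles
# the bit length each time, so k deferred squarings need ~2^k-fold precision.
x = (1 << 999) + 12345  # arbitrary ~1000-bit starting value
for step in range(1, 5):
    x = x * x
    print(f"after squaring #{step}: {x.bit_length()} bits")
```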
#3
Dec 2010
Monticello
5×359 Posts
I wonder if it would be possible to do the normalization without doing the inverse transform? Can you point me to the literature?
#4
Undefined
"The unspeakable one"
Jun 2006
My evil lair
1854₁₆ Posts
I would imagine that it is possible, but it would be mighty tricky and probably a lot slower than just the usual inverse and normalise.
#5
∂²ω=0
Sep 2002
República de California
26577₈ Posts
On a somewhat related note (this has specifically to do with the dyadic-square step in an LL-test setting): we can do the subtract-2 step on the forward-transformed vector, because the length-N FFT of a scalar is just an N-vector containing N copies of that scalar. But that would only be useful if staying in "FFT space" for multiple dyadic squarings were in some way advantageous, which, to the best of my knowledge, it is not.
#6
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
7×29² Posts
Are you basically saying that it is possible to do the squaring and the -2 without an inverse FFT, but not the modular reduction, which means the number of bits doubles each iteration? How come the modulus isn't possible when the -2 is?
#7
∂²ω=0
Sep 2002
República de California
19·613 Posts
I'm saying the subtract-constant step could be combined with the *dyadic* (pointwise) squaring, without requiring an iFFT. But we need the iFFT to be able to normalize and keep operands within the size bounds that permit exact arithmetic, so it's moot.

The reason you can do the -2 step in FFT space is the simplicity of the data being FFTed here: a simple scalar constant.
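To illustrate with ordinary complex FFTs (a numpy sketch; real LL codes use balanced-digit representations and weighted transforms, none of which is shown here): squaring the forward transform pointwise gives the transform of the cyclic self-convolution, and since FFT([2,0,…,0]) = [2,2,…,2], subtracting 2 from every frequency component subtracts 2 from digit 0 of the convolution output.

```python
import numpy as np

N = 8
rng = np.random.default_rng(42)
x = rng.integers(0, 10, N).astype(float)  # toy "digit vector"

X = np.fft.fft(x)
Y = X * X - 2.0                # dyadic squaring, then -2 done in FFT space
y = np.round(np.real(np.fft.ifft(Y)))

# Reference: cyclic self-convolution with 2 subtracted from digit 0.
ref = np.array([sum(x[j] * x[(i - j) % N] for j in range(N)) for i in range(N)])
ref[0] -= 2.0
print(np.array_equal(y, ref))  # True
```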
#8
Dec 2010
Monticello
5·359 Posts
I'd say this differently:

The cheapest normalisation operation we know is the iFFT, followed by digit-wise carry normalization, followed by the FFT. Being able to add a constant in FFT space is a consequence of the linearity of the FFT: just do the pointwise addition of FFT(-2). And the much-discussed claim, which I'm still looking for a pointer to, is that the additional cost of being able to stay in FFT space for two squaring steps exceeds the cost of doing the less precise iFFT after the multiply and re-doing the less precise FFT for the next multiply.
#9
∂²ω=0
Sep 2002
República de California
10110101111111₂ Posts
Doing a second dyadic squaring doubles the size (bit count) of your convolution outputs, which means (roughly) halving the allowable number of input bits to your forward FFT, which means doubling the length of the forward FFT. So you do half as many FFTs, but each is double the length, which costs slightly more than twice the per-FFT runtime. QED.
#10
Dec 2010
Monticello
3403₈ Posts
OK, what happens to FFT size if I go four steps instead of two?
Also, does a GPU change the cost ratio, since doubling the FFT size might not double the time required?
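Extending the same rough model (my own extrapolation from the bit-doubling argument above, not something stated in the thread: k deferred squarings grow the convolution outputs by roughly 2^(k-1)× relative to one squaring, so the transform length must scale by ~2^(k-1), amortized over k iterations):

```python
import math

def per_iteration_cost(k: int, N: int) -> float:
    # k squarings per normalization: transform length ~ 2^(k-1) * N,
    # one FFT pair amortized over k iterations; cost model n * log2(n)
    n = (2 ** (k - 1)) * N
    return n * math.log2(n) / k

N = 1 << 20  # hypothetical baseline transform length
base = per_iteration_cost(1, N)
for k in (1, 2, 4):
    print(f"k = {k}: relative per-iteration cost {per_iteration_cost(k, N) / base:.3f}")
```

Under this model four deferred steps come out markedly worse than two: the 8×-longer transform more than eats the 4-way amortization. The GPU question is a fair one, though, since real GPU FFT throughput is not strictly ∝ N·log N across sizes.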