mersenneforum.org > Math FFT explanation for non math guys
 Register FAQ Search Today's Posts Mark Forums Read

2011-08-28, 17:11   #45
Mini-Geek
Account Deleted

"Tim Sorbera"
Aug 2006
San Antonio, TX USA

22·11·97 Posts

Quote:
 Originally Posted by ciic My problem comes more with the first mentioned book by Crandall et al. it uses mathm. notations which I do not understand, e.g. an "=" with three horizontal lines and sum-signs and pi-signs and brackets... Could someone please give me a hint, where I probaply could get information on this? Some "search-strings" for an extended google/amazon/wikipedia-search would also be very much helpful to me. Cheers, ciic.
http://en.wikipedia.org/wiki/List_of...atical_symbols is a useful reference. And, e.g., if you saw a pi-sign used in a way you don't understand, you can look up other uses of Pi on Wikipedia.

 2011-09-09, 14:17 #46 ciic     May 2011 FR Germany - Berlin 2×7 Posts Thank you! I found the fitting keyword for me: "Set Theory". Will go w/ "Goldrei, Derek (1996), Classic Set Theory, London: Chapman and Hall, p. 3, ISBN 0-412-60610-0" ... Cheers, ciic. --- P.S.: "Them ain't Bagels, they doughnuts": See http://en.wikipedia.org/wiki/Bagels , please. "PS Can't fool me: there ain't no sanity clause." : 't me either!
2011-09-09, 16:00   #47
R.D. Silverman

Nov 2003

1D2416 Posts

Quote:
 Originally Posted by DJones Marvellous, thank you. Now any chance of being pointed at a suitable site / book (preferably the former) which explains why powers of e^(i * 2 * pi / n) work as a Fourier Transform matrix. Yes, I appreciate I could google this and have done so, but mathematical functions and definitions tend to be very flexible on the web, which isn't surprising when you take into account the fact that any idiot can put a webpage up. I also appreciate I'm coming at this from the wrong end - I stumbled across the Mersenne Primes when looking for a DC project to get involved in, and have tried to backtrack through the proofs and methods so that I understand what's going on. Unfortunately backtracking through proofs, as opposed to building on proofs you already have, is proving to be rather convoluted.

 2011-11-06, 20:46 #48 Maximus   Nov 2011 22·3 Posts I found an algorithm for polynomial multiplying. The algorithm is relative to FFT and has a simple explanation. Is anybody interesting?
2011-11-07, 02:20   #49
LaurV
Romulan Interpreter

Jun 2011
Thailand

24×13×47 Posts

Quote:
 Originally Posted by Maximus I found an algorithm for polynomial multiplying. The algorithm is relative to FFT and has a simple explanation. Is anybody interesting?
I am confused here. Did you find an algorithm to multiply polynomials (in whatever complexity time) or did you find an algorithm to multiply integers in polynomial time?

 2011-11-07, 02:43 #50 Christenson     Dec 2010 Monticello 5·359 Posts Let's have it, with a small example. Be mathematically precise in the sense you mean it...and post in the homework thread, since your post (and my reply) is off-topic, and it will draw fewest flames there.
 2012-01-29, 18:36 #51 henryzz Just call me Henry     "David" Sep 2007 Cambridge (GMT/BST) 10111000111012 Posts I have been fiddling around trying to make small FFTs optimal. Here is my psuedo code for 4x4 multiplication using FFT. Input: (a,b,c,d,0,0,0,0)*(e,f,g,h,0,0,0,0) FFT: Code: brt=b*1/sqrt(2) drt=d*1/sqrt(2) mtmp1=a+c mtmp2=b+d mtmp3=brt-drt mtmp4=brt+drt m1=mtmp1+ntmp2 m2=a+mtmp3 m3=c+mtmp4 m4=a-c m5=b-d m6=a-mtmp3 m7=c-mtmp4 m8=mtmp1-mtmp2 frt=f*1/sqrt(2) hrt=h*1/sqrt(2) ntmp1=e+g ntmp2=f+h ntmp3=frt-hrt ntmp4=frt+hrt n1=ntmp1+ntmp2 n2=e+ntmp3 n3=g+ntmp4 n4=e-g n5=f-h n6=e-ntmp3 n7=g-ntmp4 n8=ntmp1-ntmp2 This makes up the parts of the FFTs. Each of the 2 FFTs is 6 adds, 6 subs and 2 muls. 1/sqrt(2) must also have been precalculated. The multiplication can be done by: Code: r1=m1*n1 r2=m2*n2-m3*n3 r3=m4*n4-m5*n5 r4=m6*n6-m7*n7 r5=m8*n8 r6=m2*n3+m3*n2 r7=m4*n5+m5*n4 r8=m6*n7+m7*n6 This is 3 adds, 3 subs and 14 muls. Inverse FFT: Code: tmp1=(r1+r5)/8 tmp2=r3/4 tmp3=tmp1+tmp2 tmp4=tmp1-tmp2 tmp5=(r2+r4)/4 tmp6=(r6+r8)/4 tmp7=(r1-r5)/8 tmp8=r7/4 tmp9=tmp7+tmp8 tmp10=tmp7-tmp8 tmp11=(r2-r4)*sqrt(2)/8 tmp12=(r6-r8)*sqrt(2)/8 output1: tmp3 + tmp5 output2: tmp9 + tmp11 + tmp12 output3: tmp4 + tmp6 output4: tmp10 - tmp11 + tmp12 output5: tmp3 - tmp5 output6: tmp9 - tmp11 - tmp12 output7: tmp4 - tmp6 output8: tmp10 + tmp11 - tmp12 This is 11 adds, 11 subs, 2 muls and 6 shifts. sqrt(2)/8 must also have been precalculated. Total for 4x4 multiplication by FFT is 26 adds, 26 subs, 20 muls and 6 shifts. This is 78 operations which is slightly less than the 10*N*log2(N) estimate of 80. Can anyone see any mistakes or any improvements? I am going to move onto 8x8 multiplication next.
 2012-01-29, 20:58 #52 ewmayer ∂2ω=0     Sep 2002 República de California 2·3·29·67 Posts Without delving into your code in much detail, several general notes: 1. You need to be clear what type of transform you are working on - real-vector, complex, modular, what? (It seems from your examples you intend a real-signal covolution-based multiply, is that right? 2. You should never use 1/sqrt(N) normalizations - just apply 1/N during the final normalize-and-carry step on the convolution outputs. 3. "Shifts" are not an option on most floating-point hardware. What you call shifts should instead be muls-by-inverse-powers-of-2, should never appear explicitly. These should be absorbable into the other multiplicative constants appearing in the algorithm. 4. You need to think about whether this kind of small-transform-optimization approach will generalize well to larger signal lengths. It appears to be useful for the small-DFTs surrounding the dyadic-mul step, but large FFTs will need more-generic-radix passes (mostly small powers of 2, perhaps one odd-radix pass per fFFT and iFFT) preceding and following the dyadic-mul step. Those will have variable strides, and need to accommodate array accesses with varying strides, as well as complex "twiddle" muls.
2012-01-30, 00:39   #53
R.D. Silverman

Nov 2003

1D2416 Posts

Quote:
 Originally Posted by ewmayer Without delving into your code in much detail, several general notes: 1. You need to be clear what type of transform you are working on - real-vector, complex, modular, what? (It seems from your examples you intend a real-signal covolution-based multiply, is that right? 2. You should never use 1/sqrt(N) normalizations - just apply 1/N during the final normalize-and-carry step on the convolution outputs. 3. "Shifts" are not an option on most floating-point hardware. What you call shifts should instead be muls-by-inverse-powers-of-2, should never appear explicitly. These should be absorbable into the other multiplicative constants appearing in the algorithm. 4. You need to think about whether this kind of small-transform-optimization approach will generalize well to larger signal lengths. It appears to be useful for the small-DFTs surrounding the dyadic-mul step, but large FFTs will need more-generic-radix passes (mostly small powers of 2, perhaps one odd-radix pass per fFFT and iFFT) preceding and following the dyadic-mul step. Those will have variable strides, and need to accommodate array accesses with varying strides, as well as complex "twiddle" muls.
Read Nussbaumer's book!!! It contains a lot of optimized small FFT's
for various domains.

2012-01-30, 00:58   #54
henryzz
Just call me Henry

"David"
Sep 2007
Cambridge (GMT/BST)

61·97 Posts

Quote:
 Originally Posted by ewmayer Without delving into your code in much detail, several general notes: 1. You need to be clear what type of transform you are working on - real-vector, complex, modular, what? (It seems from your examples you intend a real-signal covolution-based multiply, is that right?
All inputs and outputs should be real. All the variables are real however there are some implied imaginary parts. I just changed the sign when necessary when using them later on. I started from your example in the fifth post and used e^(i*2*pi/8)=1/sqrt(2)+1/sqrt(2)*i as E in a 8x8 matrix. Once I had done that I just looked for where there were repeat operations and optimised them out.
Quote:
 2. You should never use 1/sqrt(N) normalizations - just apply 1/N during the final normalize-and-carry step on the convolution outputs.
Are you refering to the 1/sqrt(2)s in the code or something about my iFFT? N is 4 in my case and I am doing a length 2N FFT.
Quote:
 3. "Shifts" are not an option on most floating-point hardware. What you call shifts should instead be muls-by-inverse-powers-of-2, should never appear explicitly. These should be absorbable into the other multiplicative constants appearing in the algorithm.
I didn't realize they didn't exist for floating point. Previously I have almost exclusively used integer arithmatic which does support shifts.
Quote:
 4. You need to think about whether this kind of small-transform-optimization approach will generalize well to larger signal lengths. It appears to be useful for the small-DFTs surrounding the dyadic-mul step, but large FFTs will need more-generic-radix passes (mostly small powers of 2, perhaps one odd-radix pass per fFFT and iFFT) preceding and following the dyadic-mul step. Those will have variable strides, and need to accommodate array accesses with varying strides, as well as complex "twiddle" muls.
Not sure what you mean by generic-radix and odd-radix though it might be the names fooling me. Do you mean that calulating r1 and r5 are odd-radix steps and the others generic radix steps?
I realize that the optimisations will get more complex. With a length 8 FFT E was 1/sqrt(2)+1/sqrt(2)*i and real(E)=im(E). With higher powers real(E)!=im(E).
The code above worked once I had changed m1=mtmp1+ntmp2 to m1=mtmp1+mtmp2.
I have gleaned most of my information on ffts from this thread. I have studied basic linear transformations at uni but nothing like this. I am currently half way through my first year studying maths at Leicester University. I will look at this again tomorrow. Need to get to sleep

2012-01-30, 20:40   #55
ewmayer
2ω=0

Sep 2002
República de California

2D8A16 Posts

Quote:
 Originally Posted by R.D. Silverman Read Nussbaumer's book!!! It contains a lot of optimized small FFT's for various domains.
It was one of my main references when I was writing my own early-stage FFT, although I found it most useful for the optimized short-length-DFT algos, rather than the overall implementation of a large-signal FFT. Or were to talking to Henry?

Quote:
 Originally Posted by henryzz All inputs and outputs should be real. All the variables are real however there are some implied imaginary parts. I just changed the sign when necessary when using them later on. I started from your example in the fifth post and used e^(i*2*pi/8)=1/sqrt(2)+1/sqrt(2)*i as E in a 8x8 matrix. Once I had done that I just looked for where there were repeat operations and optimised them out. Are you refering to the 1/sqrt(2)s in the code or something about my iFFT? N is 4 in my case and I am doing a length 2N FFT.
Now that you made clear you are writing a real-vector FFT, the reason for the sqrt(2)s is clear to me. I mis-read that part of your code, partly because I didn't have time to delve deeply into your implementation yesterday, and instead focused on big-picture stuff.

Quote:
 Not sure what you mean by generic-radix and odd-radix though it might be the names fooling me. Do you mean that calulating r1 and r5 are odd-radix steps and the others generic radix steps?
No, I mean that in general one seeks to write code that will be able to handle a wide variety of transform lengths. So e.g. if length-64 is too small to permit exact multiply of one's target inputs, one needs to be able to go easily to a larger transform length. One generally assembles these out of highly-optimized smaller-DFT routines. For example in my code I found it best to support power-of-2 complex-DFT-pass routines of lengths 8, 16 and 32. Lengths < 8 are inefficient because they lead to too many passes through the data array, but lengths > 32 yields little further gain in that regard. So for n=64 I would use 2 radix-8 passes, for n=128 I would do one pass at radix 8 ad another at radix16.

Once one needs to get seriously efficient one alno needs to support non-powr-of-2 intermediate runlengths, e.g. n=80, which one could do via radix [5,16] or [8,10] combos. That gets nontrivial because one needs to consider how to generalize the concept of "bit reversal reordering" to auch runlengths.

But you seem like you're on the right track, so let us know how things go.

Last fiddled with by ewmayer on 2012-01-30 at 20:42

 Similar Threads Thread Thread Starter Forum Replies Last Post jasong Lounge 31 2015-10-09 20:55 Unregistered Hardware 12 2006-11-03 03:53 jasong Sierpinski/Riesel Base 5 1 2006-03-22 01:06 Crystallize Lounge 6 2003-09-27 13:08 ThomRuley Lone Mersenne Hunters 1 2003-05-29 18:17

All times are UTC. The time now is 07:13.

Wed Oct 20 07:13:48 UTC 2021 up 89 days, 1:42, 0 users, load averages: 1.57, 1.85, 1.79