![]() |
|
|
#1 |
|
P90 years forever!
Aug 2002
Yeehaw, FL
11101011001102 Posts |
Pre AVX-512 prime95 did primarily radix-4 FFTs. AVX-512 prime95 does primarily radix-8 FFTs.
The reason for the change is AVX-512 has double the number of registers. In past Intel architectures, it has generally been profitable to do as much work as possible once data is loaded into registers. Anyway, it has been pointed out by early users that the roundoff error for radix-8 FFTs is worse than radix-4 FFTs. My very preliminary observation is that the maximum roundoff error is about 20% worse. That is if radix-4 FFTs were generating a max roundoff of 0.3125 are now generating an error of 0.375. Have other FFT developers noticed the same issue? I presume the increased error is from radix-8 doing several multiplications by (SQRTHALF + i*SQRTHALF). Any ideas for reducing the increased noise? As is, I estimate this increased noise will result in reduction of 0.1 bits per FFT input word. |
|
|
|
|
|
#2 |
|
Sep 2016
33210 Posts |
There's a messy solution that I've played with in the past. It wasn't to improve precision, but to improve performance.
Detach the twiddle factors from the butterfly. Don't let the implementation dictate the operations you are actually doing. IOW, don't let the implementation dictate where you do your twiddle factors. Some years ago, I implemented an FFT that was mathematically and numerically the same as a split-radix FFT, but was implemented using a standard radix-4 access pattern. The split-radix FFT had a lower operation count, but radix-4 was more memory-friendly. So I was able to get the benefits of both. (Granted, I no longer use this FFT since operation-count is less relevant in today's environment.) The location and values of the twiddle factors are not constant between butterflies. They change from butterfly to butterfly. For the split-radix/radix-4 case, I needed two types of butterflies, one with the twiddles in the middle, and one with the twiddles at the end/beginning. And it also complicated the load-streams from the twiddle tables. This will probably get very ugly for the radix 8 case though. Try to find a layout of twiddle factors that's numerically the best. Then try to fit into (multiple) radix-8 butterfly types to cover all the cases. |
|
|
|
|
|
#3 |
|
∂2ω=0
Sep 2002
República de California
103·113 Posts |
In my own work I have come across examples of counterintuitive behavior involving those muls-by-sqrt2 (or by 1/sqrt2) in the radix-8 DFT, especially when done via FMA. I recall being able to reduce the ROE levels in one case by deliberately rounding the precomputed sqrt2 multiplier "the wrong way", in another by using a mix of correctly and wrong-way rounded such multipliers. Since the number of such muls in a radix-8 DFT is relatively small, it's not terribly hard to try a bunch of combos and tabulate the resulting ROEs for some quasi-random sample of exponents and FFT lengths to see if one particular such combo of roundings shows consistently superior ROE behavior.
|
|
|
|
|
|
#4 | |
|
P90 years forever!
Aug 2002
Yeehaw, FL
2×53×71 Posts |
Quote:
The good news is I've studied my code some and think I can reduce the radix-8 round off somewhat. I'll test my idea out soon and if it works, I have a fair amount of macros to rewrite. The basic idea is to combine the radix-8 twiddle factor (stored as cosine/sine, sine) with the SQRTHALF. This lets me rearrange computations so that there is a more balanced number of floating-point ops in calculating each radix-8 output. |
|
|
|
|
|
|
#5 | |
|
Sep 2016
22×83 Posts |
Quote:
Non-power-of-two split radix also works with Bernstein's conjugate twiddle trick. Speaking of which, do you use Bernstein's trick in any form? Because of the data reordering, I've never figured out how to make it work for the 2-pass or N-pass FFTs where all the high-order twiddles are "squeezed" into a single layer between each pass. |
|
|
|
|
|
|
#6 |
|
"Mihai Preda"
Apr 2015
3·457 Posts |
|
|
|
|
|
|
#7 |
|
Sep 2016
22×83 Posts |
A "normal" split-radix butterfly has twiddle factors of w and w^3 on the last two points.
Instead of using w and w^3, you use w and w^-1. It produces the correct frequency domain outputs, but it further scrambles up the ordering of the points that got flipped from w^3 to w^-1. Since w and w^-1 are complex conjugates, you can then reuse the same twiddle factor load for both. Thus it eliminates half your twiddle factor loads and streams. The trick isn't restricted to the split-radix butterfly. You can do it on the standard radix-4 butterfly as well though you only eliminate 1/3 of the twiddle loads as opposed to 1/2. The concept generalizes to higher radix butterflies and higher split-radix butterflies as well. Last fiddled with by Mysticial on 2018-11-02 at 21:34 |
|
|
|
|
|
#8 | |
|
P90 years forever!
Aug 2002
Yeehaw, FL
11101011001102 Posts |
Quote:
Prime95 really does a standard radix-4 (now radix-8) FFT. When I say prime95 does two passes that simply refers to passes through main memory. Pass 1 reads large-strided data, pass 2 reads single-strided data. I do further reduce the volume of sin/cos data at the expense of some extra complex multiplies. |
|
|
|
|
|
|
#9 |
|
P90 years forever!
Aug 2002
Yeehaw, FL
2×53×71 Posts |
Grrr. My reordering of radix-8 operations did not help. I cannot explain why.
My idea was to turn this: Code:
;;R1 = ((r1+r5)+(r3+r7)) +((r2+r8)+(r4+r6)) ;;R2 = ((r1-r5)+(i3-i7)) +SQRTHALF*(((r2+r8)-(r4+r6))+((i2-i8)+(i4-i6))) Code:
;;R1 = ((r1+r5)+(r3+r7)) +SQRT2*((r2+r8)+(r4+r6)) ;;R2 = ((r1-r5)+(i3-i7)) + (((r2+r8)-(r4+r6))+((i2-i8)+(i4-i6))) The new code works, but generates very slightly worse roundoff behavior. What was wrong with my logic? |
|
|
|
![]() |
Similar Threads
|
||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| Active noise cancelling CPU cooler | kladner | Hardware | 3 | 2013-06-21 23:30 |
| strange problem: efficient 'radix sums' | jasonp | Programming | 13 | 2013-05-16 19:11 |
| Roundoff error | bcp19 | Software | 4 | 2013-02-14 21:23 |
| HDD noise | Jugger | Information & Answers | 10 | 2011-03-30 06:25 |
| Playing with different radix | LoveCraft | Programming | 7 | 2005-11-14 07:59 |