Implementing the Irrational-base discrete weighted transform
Hello all,
Recently I've tried to implement the IBDWT algorithm, with a mind to use it to perform the Lucas-Lehmer test. I know any such implementation would probably be slow, but I'd like to learn how to do it. I used the pseudocode found in "Prime Numbers: A Computational Perspective" by Crandall and Pomerance as a guide. However, I seem to have run into a couple of problems, one conceptual, one a bug.

The conceptual problem is that the algorithm seems to be performing squarings mod 2^p, instead of mod (2^p)-1. I think I've missed something in the description. The second, I think, is a bug: after a couple of iterations, the square of the sum-in and the sum-out no longer match, and I get garbage results. I've wracked my brain trying to figure out what's causing it, but to no avail. (I'm sure it's something stupid I've overlooked.)

I'd appreciate it if anyone with some knowledge of the workings of the transform could briefly look over my code and see if they spot any errors. Here it is: [url]http://cs-people.bu.edu/coldfury/fft.cpp[/url] Thanks!
My 10 second review:
1) Expecting the squared sum of the inputs to equal the sum of the outputs to within plus-or-minus 0.1 may be unrealistic, depending on how many bits you are putting in each double.
2) Use a more accurate value of PI.

Good luck. FFTs are a nightmare to debug.
Thanks!
I think I've fixed the sumin/sumout bug. I was adding the sums at the wrong time: before, I was computing the sumin before multiplying in the weights in the forward transform, and the sumout after dividing out the weights in the inverse transform. I discovered today that if you take the sumin after multiplying by the weights in the forward transform, and the sumout [i]before[/i] dividing out the weights in the inverse, they seem to match. I'm not sure if this is the proper way, but they always seem to match.

Speaking of bits in the doubles, right now I've got it set to 30 bits per double, but I have a feeling I could go higher. Is there any rule of thumb for the maximum number of bits you can use in a double as the base?

Unfortunately, I still have the problem with the mod not being done correctly. In the degenerate case where the signal has only one element, the number is reduced mod 2^p, instead of mod 2^p-1. If the signal has more than one element, the number seems to end up somewhere between mod 2^p and mod 2^p-1. I suspect the reason is that I'm skipping a step in the pseudocode, mainly because the step is quite ambiguous and I don't know what it means. The pseudocode is as follows <right after "modular reduction"; z is the signal we're working with, and q = 2^p-1>:

"z = z mod q"

Now, I thought the whole point of the IBDWT is that the mod is done automatically? In that case, why must it be reduced explicitly as well, after we've already performed the modular reduction? I suspect once I understand what that ambiguous line means, I can get it working.
[quote="ColdFury"]Speaking of bits in the doubles, right now I've got it set to 30 bits per double, but I have a feeling I could go higher. Is there any rule of thumb for the maximum number of bits you can use in a double as the base?[/quote]
A pretty good rule of thumb is that you should be able to use ~22 bits per input (averaged over all N words of the input) for a 1K input vector (i.e. 1K REAL elements, which means a length-512 complex FFT). For every doubling of the FFT length beyond this, subtract 0.25-0.30 bits; for every halving of the FFT length, add 0.25-0.30 bits. For very small FFT lengths you'll probably be able to be a bit more aggressive, but the maximum is set at length 1, where exactly storing the square of your single input in a 53-bit IEEE double mantissa limits your inputs to ~53/2 = 26.5 bits. A more-detailed explanation of this behavior is available [url=http://www.mersenneforum.org/attachments/pdfs/F24.pdf]here.[/url]

[quote]Unfortunately, I still have the problem with the mod not being done correctly. In the degenerate case where the signal only has one element, the number is being done mod 2^p, instead of mod 2^p-1.[/quote]

For n = 1 the single DWT weight = 1, the forward and inverse transforms are trivial, and the only nontrivial operation is the dyadic square, which amounts to just squaring your input. (And don't forget to subtract 2 from that.) You still need to normalize the result and propagate any carry - in this case, if you're working in base 2^b, you simply integer-divide (input^2 - 2) by 2^b and add the result to (input^2 - 2) % 2^b, i.e. you 'fold' the upper bits of the output in with the lower b bits. If you forget to do this carry step and merely take the mod of the output with respect to 2^p, your result will be, as you say, correct mod 2^p rather than mod 2^p-1.

[quote]If the signal has more than one element, the number seems to end up somewhere between mod 2^p and mod 2^p-1.[/quote]

If you're not doing the carry step properly for n = 1, it would most likely affect your n > 1 results as well.
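The n = 1 case described above can be sketched in a few lines (the function name is mine; this assumes p <= 31 so the square fits in a 64-bit integer, and s >= 2 so the subtraction doesn't underflow):

```cpp
#include <cstdint>

// One Lucas-Lehmer step s -> s^2 - 2 (mod 2^p - 1) for the degenerate
// n = 1 signal: square, subtract 2, then fold the bits above bit p
// back onto the low p bits. Assumes p <= 31 and s in [2, 2^p - 1).
uint64_t ll_step_n1(uint64_t s, unsigned p) {
    const uint64_t q = (1ULL << p) - 1;  // q = 2^p - 1
    uint64_t t = s * s - 2;              // dyadic square, minus 2
    t = (t >> p) + (t & q);              // fold upper bits in with the lower p bits
    if (t >= q) t -= q;                  // one fold can still leave t in [q, 2q)
    return t;
}
```

Iterating this p - 2 times from s = 4 gives 0 exactly when 2^p - 1 is prime, e.g. for p = 7 (M7 = 127).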
[quote]For n = 1 the single DWT weight = 1, the forward and inverse transforms are trivial, and the only nontrivial operation is the dyadic square, which amounts to just squaring your input. (And don't forget to subtract 2 from that.) You still need to normalize the result and propagate any carry - in this case, if you're working in base 2^b, you simply integer-divide (input^2 - 2) by 2^b and add the result to (input^2 - 2) % 2^b, i.e. you 'fold' the upper bits of the output in with the lower b bits. If you forget to do this carry step and merely take the mod of the output with respect to 2^p, your result will be, as you say, correct mod 2^p rather than mod 2^p-1.[/quote]

Thanks! I think that's what the book was referring to. Is this folding performed on the signal as a whole, or is it done individually to each element of the signal? The book's pseudocode places the step after the modular reduction, but this doesn't seem intuitive to me. The modular reduction already reduces the number, so folding the signal and reducing it yet again mod 2^p-1 shouldn't do anything. Therefore I think the folding is meant to be done as part of the modular reduction step. Correct?
[quote="ColdFury"]Is this folding performed on the signal as a whole, or is it done individually to each element in the signal? The book pseudo-code places the step after the modular reduction, but this doesn't seem intuitive to me. The modular reduction already reduces the number, so folding the signal and reducing it yet again mod 2^p-1 shouldn't do anything. Therefore I think the folding is meant to be done as part of the modular reduction step. Correct?[/quote]
It's only done with the carry out of the most-significant element of your array. If you have a C-style array of n real elements, the carry left over after processing array[n-1] gets added to array[0]. Of course, along the way, each array[i] has gotten a carry-in from array[i-1].

Actually, I should've used the word "wrapping" rather than "folding", since when talking about the carry out of array[n-1] we're really talking about what is more commonly referred to as the "wraparound carry."
[quote]It's only done with the carry out of the most-significant element of your array. If you have a C-style array of n real elements, the carry left over after processing array[n-1] would get added to array[0]. Of course, along the way, each array[i] has gotten a carryin from array[i-1].
Actually, I should've used the word "wrapping" rather than folding, since when talking about the carryout of array[n-1] we're really talking about what is more-commonly referred to as the "wraparound carry."[/quote]

Thanks! It's finally working correctly! Now I have a question more in terms of optimization. As you know, there are a variety of real-only signal transforms. Would this scheme work on any of those transforms, or does it only work with the FFT?
[quote="ColdFury"]Now I have a question more in terms of optimization. As you know, there's a variety of real-only signal transforms. Would this scheme work on any of those transformations, or does it only work with the FFT?[/quote]
We're dealing with a fundamentally real signal here, so a real-signal transform should work just fine - Prime95 uses such a transform. OTOH a standard complex FFT (treating the n real inputs as n/2 complex numbers, i.e. packing a pair of real inputs into every complex element) with complex/real wrappers to give the correct real-signal output (as described e.g. in Numerical Recipes) also works fine - my Mlucas code and Guillermo Valor's Glucas both use such a scheme. Both types of transforms, properly implemented, have virtually identical opcounts and data-access-pattern issues. It's the latter that's key to getting good performance - how do you perform all the levels of the FFT, which has different index strides on each pass and is very data-nonlocal, while keeping as high a percentage of the data needed at any given time in the high-speed caches of the system memory hierarchy? That's where over 90% of the optimization effort needed to get optimal performance goes.

One other note on the complex transform: when George was developing Prime95 he said the (j, n-j) index correlations in the complex/real wrappers were killing his cache performance, but it turns out there are ways to make this step quite cache-friendly as well. It's nontrivial, though, and depends on how you do your forward and inverse FFTs.

And remember, all of the folks who have written high-performance codes of this type have put literally thousands of hours into them. Writing a basic LL test program using the DWT isn't too bad - writing a world-class one is orders of magnitude more work. I don't want to discourage you, just to make you aware of the realities.
Something really weird is going on here with the phpBB software: I clicked "reply" to reply to Coldfury's last post. When the edit window containing the quoted text appeared, the quote="Coldfury" I expected to see in the first [] quote box was missing, so I added it, then typed my reply below the quoted text. When I submitted it, it appeared under ColdFury's name, as you can see at left!
Ernst (ewmayer)

[quote="ColdFury"]Is it just me or does anyone else find the FFT code in Numerical Recipes in C horrendous? Indexing an array starting at 1, yuck[/quote]

The unit starting index is a legacy from NR's early Pascal and Fortran days - at the time the series was started, C was not considered a "serious" language for scientific computation - and it still has some major drawbacks in this respect, particularly with respect to its handling of multi-dimensional arrays. Of course Fortran makes it very easy to use a zero-offset index (or any starting index one wishes, including negative ones); NR just went with the lazy programmer's default of 1 here. The NR FFT is no great shakes, but it's adequate for someone getting started (my original LL implementation used it) who doesn't mind being restricted to power-of-2 vector lengths and having code that runs 2-5x slower than a really good FFT does. OTOH, NR's multiprecision arithmetic package, now [b]that's[/b] horrendous.

-Ernst (no, really!)

(Weirdly enough, I can edit this post like I actually wrote it - ColdFury)

Me, too (wierdly --> weirdly), but when I edit it, it doesn't add to the "edited xyz times" counter. I do believe we've discovered a wormhole in the spacetime fabric of the forum. -EWM
[quote](Is it just me or does anyone else find the FFT code in Numerical Recipes in C horrendous? Indexing an array starting at 1, yuck)[/quote]
Well, there is a reason, I think. The first version of Numerical Recipes was written in Pascal, and Pascal uses arrays indexed starting from position 1 :-) I should have that 5.25" floppy disk with the Pascal code somewhere in my messy storage room...

Luigi
Whoa, is it just me, or am I actually seeing my reply to ColdFury's last post appearing under his name? That's really bizarre - better add some text to it to explain that it's really from me.
[quote="ewmayer"]Whoa, is it just me, or am I actually seeing my reply to ColdFury's last post appearing under his name? That's really bizarre - better add some text to it to explain that it's really from me.[/quote]
So is space and time "folding" in on itself, or "wrapping" around itself :question:
[quote="eepiccolo"]So is space and time "folding" in on itself, or "wrapping" around itself :question:[/quote]
I think this type of distortion of the bulletin-board continuum can occur when your computer mouse is traveling above "Wrap 9."
[quote="ewmayer"]I think this type of distortion of the bulletin-board continuum can occur when your computer mouse is traveling above "Wrap 9."[/quote]
And slingshotting around the Sun... workstation
What a strange bug...