mersenneforum.org  

Go Back   mersenneforum.org > Great Internet Mersenne Prime Search > Software

Reply
 
Thread Tools
Old 2003-06-15, 09:15   #1
ColdFury
 
ColdFury's Avatar
 
Aug 2002

26·5 Posts
Default Implementing the Irrational-base discrete weighted transform

Hello all,

Recently I've tried to implement the IBDWT algorithm, with a mind to use it to perform the Lucas-Lehmer algorithm. I know any such implementation would probably be slow, but I'd like to learn how to perform it.

I used the pseudocode found in "Prime Numbers: A Computational Perspective" by Pomerance as a guide. However I seem to have run in a couple problems, one conceptual, one a bug.

The conceptual problem is that the algorithm seems to be performing squarings mod 2^p, instead of mod (2^p)-1. I think I've missed something in the description.

The second I think is a bug. After a couple iterations, the square of the sum-in and the sum-out start to no longer match, and I get garbage results. I've wracked my brain trying to figure out what's causing it, but to no avail. (I'm sure it's something stupid I've overlooked)

I'd appreciate it if anyone with some knowledge of the workings of the transformation could briefly look over my code and see if they spot any errors. I'd appreciate it.

Here it is: http://cs-people.bu.edu/coldfury/fft.cpp

Thanks!
ColdFury is offline   Reply With Quote
Old 2003-06-15, 13:59   #2
Prime95
P90 years forever!
 
Prime95's Avatar
 
Aug 2002
Yeehaw, FL

2×53×71 Posts
Default

My 10 second review:

1) Expecting square sum inputs to be equal to sum outputs plus-or-minus 0.1 may be unrealistic, depending on how many bits you are putting in each double.

2) Use a more accurate value of PI.

Good luck. FFTs are a nightmare to debug.
Prime95 is online now   Reply With Quote
Old 2003-06-16, 04:46   #3
ColdFury
 
ColdFury's Avatar
 
Aug 2002

26×5 Posts
Default

Thanks!

I think I've fixed the sumin/sumout bug. I think I was adding the sums at the wrong time. Before I was computing sumin before multiplying the weights in the forward transform, and the sumout after diving out the weights in the inverse transform. I discovered today that if you take the sumin after multiplying by the weights in the forward, and the sumout before dividing out the weights, they seem to match. I'm not sure if this is the proper way, but they always seem to match.

Speaking of bits in the doubles, right now I've got it set to 30 bits per double, but I have a feeling I could go higher. Is there any rule of thumb for the maximum number of bits you can use in a double as the base?

Unfortunately, I still have the problem with the mod not being done correctly. In the degenerate case where the signal only has one element, the number is being done mod 2^p, instead of mod 2^p-1. If the signal has more than one element, the number seems to end up somewhere between mod 2^p and mod 2^p-1. I suspect the reason is because I'm skipping a step in the pseudo-code, mainly because the step is quite ambiguous, and I don't know what it means. The pseudo-code is as follows:

<right after "modular reduction", z is the signal we're working with, and q=2^p-1>
"z = z mod q"

Now I thought the whole point of the IBWDT is that the mod is done automatically? In that case, why must it be reduced explicitely as well, after we've already performed the modular reduction?

I suspect once I understand what that ambiguous line means, I can get it working.
ColdFury is offline   Reply With Quote
Old 2003-06-16, 18:29   #4
ewmayer
2ω=0
 
ewmayer's Avatar
 
Sep 2002
República de California

103×113 Posts
Default

Quote:
Originally Posted by ColdFury
Speaking of bits in the doubles, right now I've got it set to 30 bits per double, but I have a feeling I could go higher. Is there any rule of thumb for the maximum number of bits you can use in a double as the base?
A pretty good rule of thumb is that you should be able to use ~22 bits per input (averaged over all N words of the input) for a 1K input vector (i.e. 1K REAL elements, which means a length-512 complex FFT). For every doubling of the FFT length beyond this, subtract 0.25-0.30 bits. For every halving of the FFT length, add 0.25-0.30 bits. For very small FFT lengths you'll probably be able to be a bit more aggressive, but the maximum will be set at length 1, where exactly storing the square of your single input in a 53-bit IEEE double mantissa limits your inputs to ~53/2 = 26.5 bits.

A more-detailed explanation of this behavior is available here.

Quote:
Unfortunately, I still have the problem with the mod not being done correctly. In the degenerate case where the signal only has one element, the number is being done mod 2^p, instead of mod 2^p-1.
For n = 1 the single DWT weight = 1, the forward and inverse transforms are trivial, and the only actual operation that is not is the dyadic square, which amounts to just squaring your input. (And don't forget to subtract 2 fromt that.) You still need to normalize the result and propagate any carry - in this case, if you're working in base 2^b, you simply integer-divide (input^2 - 2) by 2^b and add the result to (input^2 - 2)%b, i.e. you 'fold' the upper bits of the output in with the lower b bits. If you forget to do this carry step and merely take the mod of the output with respect to 2^p, your result will be as you say, correct mod 2^p, rather than mod 2^p-1.

Quote:
If the signal has more than one element, the number seems to end up somewhere between mod 2^p and mod 2^p-1.
If you're not doing the carry step properly for n=1, it would most likely affect your n > 1 results, as well.
ewmayer is offline   Reply With Quote
Old 2003-06-17, 02:09   #5
ColdFury
 
ColdFury's Avatar
 
Aug 2002

1010000002 Posts
Default

Quote:
For n = 1 the single DWT weight = 1, the forward and inverse transforms are trivial, and the only actual operation that is not is the dyadic square, which amounts to just squaring your input. (And don't forget to subtract 2 fromt that.) You still need to normalize the result and propagate any carry - in this case, if you're working in base 2^b, you simply integer-divide (input^2 - 2) by 2^b and add the result to (input^2 - 2)%b, i.e. you 'fold' the upper bits of the output in with the lower b bits. If you forget to do this carry step and merely take the mod of the output with respect to 2^p, your result will be as you say, correct mod 2^p, rather than mod 2^p-1.
Thanks! I think that's what the book was referring to. Is this folding performed on the signal as a whole, or is it done individually to each element in the signal? The book pseudo-code places the step after the modular reduction, but this doesn't seem intuitive to me. The modular reduction already reduces the number, so folding the signal and reducing it yet again mod 2^p-1 shouldn't do anything. Therefore I think the folding is meant to be done as part of the modular reduction step. Correct?
ColdFury is offline   Reply With Quote
Old 2003-06-17, 16:18   #6
ewmayer
2ω=0
 
ewmayer's Avatar
 
Sep 2002
República de California

103×113 Posts
Default

Quote:
Originally Posted by ColdFury
Is this folding performed on the signal as a whole, or is it done individually to each element in the signal? The book pseudo-code places the step after the modular reduction, but this doesn't seem intuitive to me. The modular reduction already reduces the number, so folding the signal and reducing it yet again mod 2^p-1 shouldn't do anything. Therefore I think the folding is meant to be done as part of the modular reduction step. Correct?
It's only done with the carry out of the most-significant element of your array. If you have a C-style array of n real elements, the carry left over after processing array[n-1] would get added to array[0]. Of course, along the way, each array[i] has gotten a carryin from array[i-1].

Actually, I should've used the word "wrapping" rather than folding, since when talking about the carryout of array[n-1] we're really talking about what is more-commonly referred to as the "wraparound carry."
ewmayer is offline   Reply With Quote
Old 2003-06-18, 04:19   #7
ColdFury
 
ColdFury's Avatar
 
Aug 2002

32010 Posts
Default

Quote:
It's only done with the carry out of the most-significant element of your array. If you have a C-style array of n real elements, the carry left over after processing array[n-1] would get added to array[0]. Of course, along the way, each array[i] has gotten a carryin from array[i-1].

Actually, I should've used the word "wrapping" rather than folding, since when talking about the carryout of array[n-1] we're really talking about what is more-commonly referred to as the "wraparound carry."
Thanks! It's finally working correctly! Now I have a question more in terms of optimization. As you know, there's a variety of real-only signal transforms. Would this scheme work on any of those transformations, or does it only work with the FFT?
ColdFury is offline   Reply With Quote
Old 2003-06-18, 19:05   #8
ewmayer
2ω=0
 
ewmayer's Avatar
 
Sep 2002
República de California

103×113 Posts
Default

Quote:
Originally Posted by ColdFury
Now I have a question more in terms of optimization. As you know, there's a variety of real-only signal transforms. Would this scheme work on any of those transformations, or does it only work with the FFT?
We're dealing with a fundamentally real signal here, so a real-signal transform should work just fine - Prime95 uses such a transform. OTOH a standard complex FFT (treating the n real inputs as n/2 complex numbers, i.e. packing a pair of real inputs into every complex element) with complex/real wrappers to give the correct real-signal output (as described e.g. in Numerical Recipes) also works fine - my Mlucas code and Guillermo Valor's Glucas both use such a scheme. Both types of transfroms, properly implemented, have virtually identical opcounts and data access pattern issuues. It's the latter that's key to getting good performance - how do you perform all the levels of the FFT, which has different index strides on each pass and is very data-nonlocal, while keeping as high a percentage of the data needed at any given time in the high-speed caches of the system memory hierarchy? That's where over 90% of the optimization effort needed to get optimal performance is.

One other note on the complex transform: When George was developing Prime95 he said the (j, n-j) index correlations in the complex/real wrappers were killing his cache performance, but it turns out there are ways to make this step quite cache-friendly as well. But it's nontrivial, and depends on how you do your forward and inverse FFTs.

And remember, all of the folks that have written high-performance codes of this type have put literally thousands of hours into it. Writing a basic LL test program using the DWT isn't too bad - writing a world-class one is orders of magnitude more work. I don't want to discourage you, just to make you aware of the realities.
ewmayer is offline   Reply With Quote
Old 2003-06-19, 02:10   #9
ColdFury
 
ColdFury's Avatar
 
Aug 2002

1010000002 Posts
Default

Something really weird is going on here with the phpBB software: I clicked "reply" to reply to Coldfury's last post. When the edit window containing the quoted text appeared, the quote="Coldfury" I expected to see in the first [] quote box was missing, so I added it, then typed my reply below the quoted text. When I submitted it, it appeared under ColdFury's name, as you can see at left!

Ernst (ewmayer)

Quote:
Originally Posted by ColdFury
Is it just me or does anyone else find the FFT code in Numerical Recipes in C horrendous? Indexing an array starting at 1, yuck
The unit starting index is a legacy from NR's early Pascal and Fortran days - at the time the series was started, C was not considered a "serious" language for scientific computation - and it still has some major drawbacks in this respect, particularly with respect to its handling of multi-dimensional arrays.

Of course Fortran makes it very easy to use a zero-offset index (or any starting index one wishes, including negative ones), NR just went with the lazy programmer's default of 1 here.

The NR FFT isn't great shakes, but is adequate for someone getting started (my orginal LL implementation used it) and who doesn't mind being restricted to power-of-2 vector lengths and having code that runs 2-5x slower than a really good FFT does.

OTOH, NR's multiprecision arithmetic package, now that's horrendous.

-Ernst (no, really!)

(Weirdly enough, I can edit this post like I actually wrote it - ColdFury)

Me, too (wierdly --> weirdly), but when I edit it, it doesn't add to the "edited xyz times counter. I do believe we've discovered a wormhole in the spacetime fabric of the forum. -EWM
ColdFury is offline   Reply With Quote
Old 2003-06-19, 12:43   #10
ET_
Banned
 
ET_'s Avatar
 
"Luigi"
Aug 2002
Team Italia

32×5×107 Posts
Default

Quote:
(Is it just me or does anyone else find the FFT code in Numerical Recipes in C horrendous? Indexing an array starting at 1, yuck)
Well, there is a reason, I think.

First version of Numerical Recipes was written in Pascal, and Pascal uses indexed arrays starting from position 1 :-)

I should have that 5,25" floppy disk with the Pascal code somewhere in my messy storage room...

Luigi
ET_ is offline   Reply With Quote
Old 2003-06-19, 15:51   #11
ewmayer
2ω=0
 
ewmayer's Avatar
 
Sep 2002
República de California

103×113 Posts
Default

Whoa, is it just me, or am I actually seeing my reply to ColdFury's last post appearing under his name? That's really bizarre - better add some text to it to explain that it's really from me.
ewmayer is offline   Reply With Quote
Reply



Similar Threads
Thread Thread Starter Forum Replies Last Post
Irrational Integers :) a1call Miscellaneous Math 16 2017-05-13 22:50
probability a number is prime with a weighted k. Trilo Homework Help 12 2014-06-06 19:17
Weighted Fermat factors Top 20 Merfighters Factoring 0 2010-04-13 14:16
Does Bush think God's irrational? jasong Soap Box 8 2006-05-22 17:32
Fast Odd Discrete Fourier Transform (ODFT) dsouza123 Miscellaneous Math 1 2005-11-13 21:37

All times are UTC. The time now is 18:21.


Fri Jul 16 18:21:21 UTC 2021 up 49 days, 16:08, 1 user, load averages: 2.64, 2.62, 2.24

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.