2019-11-27, 00:05 | #1 |
Pointwise error correction of convolution outputs
Given a multiprecision integer input with respect to some fixed (or dual-size, for Mersenne-mod DWT) base convenient for computer arithmetic, as Schönhage & Strassen famously showed, autosquare of such an int with itself or multiply with a second such int is just a discrete convolution of the base-normalized input vectors. For efficiency we effect such convolution using fast transform, which needs O(n log n) machine-arithmetic operations for an n-word-discretized input integer. Using floating-point arithmetic for the transform we often find ourselves working near the bit limits of the floating-point data type, which manifests as fractional parts in the round-and-carry of the convolution outputs being so large that we can no longer be confident that we are rounding a given convolution output element in the correct direction - is that raw output 147180559945309.4375 a roundoff-error-smudged 147180559945309 or really a 147180559945310? When we hit such errors in practice we typically take the rather drastic measure of discarding the current result and resuming the long-chain-of-modmuls computation from some earlier saved "good" state, now using a larger convoluton length in order to cut the roundoff errors to a smaller size. For runs which are close to such an accuracy-limit boundary, it would be nice if there were a cheaper 'pointwise' form of error correction, in which one or a handful of individual output elements could be 'redone' as needed. There is in fact an obvious way to do this: for 2-input convolution C = A * B, the (j)th output element c_j is just the inner product of vector A with a j-term circularly-shifted copy of vector B. Thus if we have saved copies of the original pure-integer inputs, we can construct an exact version of any desired one of the n pure-integer output elements using O(n) arithmetic operations. 
Doing so for each of the n outputs amounts to grammar-school multiply, with its for-large-n-prohibitive O(n^2) opcount. But redoing just one or a handful of individual output elements, and not every iteration but only every hundred iterations or fewer, costs negligible work relative to the fast O(n log n) convolution, and can be done on an as-needed basis: simply store a list of output elements which had dangerously large roundoff errors during the round-and-carry step (i.e. a list of [index, raw (unrounded, no-carryin-applied) floating-point output element] pairs), then recompute those individual output elements from the saved pure-integer inputs, and adjust by [-1, 0, +1] as needed based on the result.
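The scheme above can be sketched in a few lines of Python (function names are my own invention; Python's arbitrary-precision ints stand in for the saved pure-integer input words):

```python
def exact_output_element(a, b, j):
    """Exact jth element of the cyclic convolution c = a (*) b,
    computed as an O(n) integer inner product of vector a with a
    j-term circularly-shifted copy of vector b."""
    n = len(a)
    return sum(a[i] * b[(j - i) % n] for i in range(n))

def correct_suspect_outputs(a, b, suspects):
    """suspects: list of (index, raw_float_output) pairs flagged
    during the round-and-carry step. Returns {index: exact_int},
    from which the caller can derive the -1/0/+1 adjustment to the
    previously rounded value."""
    return {j: exact_output_element(a, b, j) for j, _raw in suspects}
```

For n suspect elements out of an n-word convolution, this is O(n) integer work per suspect, negligible next to the O(n log n) transform so long as only a handful of elements are flagged.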
One assumption this makes is that the unmodified FFT-mul code is already doing every-word-and-every-iteration roundoff error checking. My Mlucas code does that, since modern instruction sets all have decently efficient round(x) instructions, and having just a single version of the relevant round-and-carry macro makes code maintenance easier. The other gating performance issue, so far as I can see, is: how cheaply can we do the needed pure-integer-input copy-saving during each round-and-carry step? The cost would need to be on the order of 1% of the total computation cost in order to make the slight gain in max-exponent-per-transform-length worthwhile.

Last fiddled with by ewmayer on 2019-11-27 at 02:36 |
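The per-word roundoff check referred to above amounts to comparing each raw output against its nearest integer; a minimal sketch (the 0.4 warning threshold and the function name are illustrative assumptions, not Mlucas's actual values; real code would use the hardware nearest-int instruction rather than Python's round()):

```python
MAX_ERR = 0.4  # hypothetical warning threshold on the fractional part

def round_and_flag(raw, j, suspects):
    """Round one raw floating-point convolution output to nearest int,
    appending (index, raw value) to the suspects list if its roundoff
    error is dangerously close to the 0.5 ambiguity point.
    Note: Python's round() rounds exact ties to even; the values seen
    in practice are never exact ties, so this matches round(x)."""
    r = round(raw)
    if abs(raw - r) > MAX_ERR:
        suspects.append((j, raw))
    return r
```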
2019-12-05, 00:16 | #2 |
Ideas:
I'm not familiar with the structure of an optimized Mersenne transform (due to its asymmetry). But if the FFT is out-of-place and you're expending bandwidth to save the integer time-domain data, you don't really need to save it. Once you have your post-convolution results with suspect bad elements (due to round-off), you can reverse the convolution by taking square roots at the pointwise step. This gives you back the original inputs, which are much smaller and very close to integers. The catch here is that you don't know which square root to take. Assuming it matters and there's no easy way to find out, you can build a compressed bit-vector during the forward convolution recording the sign of each point (the real and imaginary parts); this can then be used to resolve which square root is to be taken. Also, this obviously won't catch hardware errors. There is clearly a high computational cost to this, especially on reversing the iteration due to the square roots. But assuming we're in a bandwidth-constrained environment, we save bandwidth in the steady state.

------

On the error-correction topic: if you have at most 2 suspect output coefficients, you can use the integer sum-in/sum-out checks to recover them. Evaluate the input and output polynomials at x = 1 and x = -1. If I didn't overlook anything, that gives you two equations in two unknowns, which can be solved to recover the two missing coefficients. If you also use i and -i, that would let you correct up to 4 coefficients. It's harder to go beyond that, since the roots of unity become irrational.

Last fiddled with by Mysticial on 2019-12-05 at 00:42 |
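The two-coefficient recovery can be sketched as follows (a minimal Python illustration, not anyone's production code; it assumes a plain cyclic convolution with n even, so that x = -1 is a root of x^n - 1 and C(-1) = A(-1)*B(-1) holds exactly, and it requires the two suspect indices to have opposite parity, otherwise the 2x2 system is singular):

```python
def recover_two_coeffs(a, b, c, j1, j2):
    """Recover two unknown coefficients c[j1], c[j2] of the cyclic
    convolution c = a (*) b from the polynomial evaluations at
    x = 1 and x = -1. Entries c[j1], c[j2] may hold garbage on entry.
    Requires len(a) even and j1, j2 of opposite parity."""
    assert (j1 ^ j2) & 1, "suspect indices must have opposite parity"
    n = len(a)
    # Sum-in/sum-out identities: C(1) = A(1)*B(1), C(-1) = A(-1)*B(-1).
    s1 = sum(a) * sum(b)
    alt = lambda v: sum(x if k % 2 == 0 else -x for k, x in enumerate(v))
    s2 = alt(a) * alt(b)
    known = [k for k in range(n) if k not in (j1, j2)]
    d1 = s1 - sum(c[k] for k in known)             # = c[j1] + c[j2]
    d2 = s2 - sum((-1) ** k * c[k] for k in known) # = e1*c[j1] + e2*c[j2]
    e1, e2 = (-1) ** j1, (-1) ** j2
    # Solve the 2x2 system; e1 - e2 = +-2 since the parities differ.
    cj1 = (d2 - e2 * d1) // (e1 - e2)
    return cj1, d1 - cj1
```

Using i and -i as well, per the post above, would add two more independent equations and extend this to four unknown coefficients, at the cost of Gaussian-integer bookkeeping.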