#1
Sep 2002
Database er0rr
3,739 Posts
On http://primes.utm.edu/notes/faq/multiply.html it says:
For numbers having more than 10,000,000 digits, multiple Number Theoretic Transforms and the Chinese Remainder Theorem should be used, because the accuracy of the floating-point numbers of available processors is not large enough.

Is this applicable to p95/mprime and today's processors?
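The FAQ's suggestion can be illustrated in miniature (a sketch, not p95/mprime's actual code): a number-theoretic transform works over Z/p for an FFT-friendly prime p, so the convolution is exact integer arithmetic with no rounding error at all. The prime 998244353 = 119*2^23 + 1 with primitive root 3 is a common illustrative choice; a production implementation at the sizes the FAQ mentions would use several 64-bit primes and CRT recombination.

```python
# Minimal NTT-based exact big-integer multiply (illustrative single-prime
# version; the FAQ's approach uses several primes plus CRT).
P = 998244353  # 119 * 2^23 + 1, NTT-friendly
G = 3          # primitive root mod P

def ntt(a, invert=False):
    """In-place iterative radix-2 NTT; len(a) must be a power of 2."""
    n = len(a)
    j = 0
    for i in range(1, n):          # bit-reversal permutation
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:
        w = pow(G, (P - 1) // length, P)   # primitive length-th root of unity
        if invert:
            w = pow(w, P - 2, P)
        for i in range(0, n, length):
            wn = 1
            for k in range(i, i + length // 2):
                u = a[k]
                v = a[k + length // 2] * wn % P
                a[k] = (u + v) % P
                a[k + length // 2] = (u - v) % P
                wn = wn * w % P
        length <<= 1
    if invert:
        inv_n = pow(n, P - 2, P)
        a = [x * inv_n % P for x in a]
    return a

def multiply(x, y, base=10):
    """Multiply two non-negative ints exactly via NTT digit convolution."""
    dx = [int(d) for d in str(x)[::-1]]
    dy = [int(d) for d in str(y)[::-1]]
    n = 1
    while n < len(dx) + len(dy):
        n *= 2
    fx = ntt(dx + [0] * (n - len(dx)))
    fy = ntt(dy + [0] * (n - len(dy)))
    conv = ntt([a * b % P for a, b in zip(fx, fy)], invert=True)
    result, carry = 0, 0               # release carries into the result
    for i, c in enumerate(conv):
        carry += c
        result += (carry % base) * base ** i
        carry //= base
    return result + carry * base ** len(conv)
```

Since every convolution output here is at most 81*N, a single 30-bit prime suffices only for modest sizes; for 10M-digit operands the outputs overflow one word, which is exactly why the FAQ prescribes multiple primes and CRT.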
#2
Tribal Bullet
Oct 2004
3,541 Posts
Quote:
To a certain extent, all of the floating-point testers rely on the law of averages to prevent undetected catastrophic convolution error. If they didn't, and instead handled only problem sizes that were guaranteed not to encounter these errors, tests would take several times longer and error-free multiplies could have an advantage.

jasonp

Last fiddled with by jasonp on 2006-09-18 at 15:20
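A toy illustration of the kind of round-off check being relied on here (using a naive recursive FFT, not the code any real tester uses): convolve the digit vectors in double precision and watch how far the raw outputs drift from integers. Testers flag an error when that drift approaches 0.5, at which point rounding to the nearest integer could silently be wrong.

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 complex FFT in double precision; len(a) a power of 2."""
    n = len(a)
    if n == 1:
        return a[:]
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out

def multiply_fp(x, y, base=10):
    """Multiply via floating-point FFT convolution.
    Returns (product, max round-off error seen in the convolution outputs)."""
    dx = [float(d) for d in str(x)[::-1]]
    dy = [float(d) for d in str(y)[::-1]]
    n = 1
    while n < len(dx) + len(dy):
        n *= 2
    fx = fft([complex(d) for d in dx + [0.0] * (n - len(dx))])
    fy = fft([complex(d) for d in dy + [0.0] * (n - len(dy))])
    conv = fft([a * b for a, b in zip(fx, fy)], invert=True)
    outs = [c.real / n for c in conv]          # inverse transform scaling
    # The error check: distance of each output from the nearest integer.
    max_err = max(abs(v - round(v)) for v in outs)
    result, carry = 0, 0
    for i, v in enumerate(outs):
        carry += round(v)
        result += (carry % base) * base ** i
        carry //= base
    return result + carry * base ** len(outs), max_err
```

For small inputs `max_err` is tiny; as transform length and input wordsize grow, it creeps toward the 0.5 cliff, which is the "law of averages" trade-off described above.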
#3
∂²ω=0
Sep 2002
República de California
103·113 Posts
Quote:
Note also that while in theory one might indeed expect an all-integer transform to be at least as fast as a floating-point one (in addition to its being exact, assuming the convolution outputs all fit in an integer word), the reality is that, due to the great versatility of floating-point for all sorts of computation, chip manufacturers have put a huge amount of research and money into maximizing floating-point throughput. As a result, certain key arithmetic operations (most notably multiply) are generally faster in the floating-point units than in the integer ones. This multiply-speed disparity (especially for 64-bit arithmetic) was generally fairly large (typically an order of magnitude or more) until quite recently, in which time architectures like the Alpha, Itanium, and AMD64 have narrowed it quite a lot. But I believe the best anyone has been able to achieve even on one of these 64-bit-int beasts (Jason himself, as it happens) is an integer transform that runs roughly a factor of 2 slower than a floating-point FFT.
#4
P90 years forever!
Aug 2002
Yeehaw, FL
1110101100110₂ Posts
Someone should email Chris Caldwell...
#5
∂²ω=0
Sep 2002
República de California
103×113 Posts
Done.
One more comment about FFT vs. NTT: while the latter is immune to rounding error (though one still needs to be careful about convolution outputs growing too large), it is *not* immune to hardware errors, which, especially on the types of processors used by participants in large DC projects like GIMPS, are at least as big a source of trouble as rounding errors. And even if one could make the actual arithmetic-computation units truly bulletproof in this regard, one still has other potential sources of data corruption to contend with (e.g. badly written third-party device drivers, marginal memory, etc.) which have nothing to do with the actual functioning of the ALU or FPU.

Jason will recall that during our pure-integer distributed "wavefront" triple-checking of the F24 result (also described in the paper I linked to above), there were 3 or 4 occasions on which one of the PC "drones" doing a computational subinterval of Pepin-test iterations via NTT spat out an end-of-subinterval residue which disagreed with the residue for the same iteration given by the "leading-edge" floating-point computation. In all such cases the two machines doing matching FFT-based computations using independently developed codes (Jason's and mine) on different kinds of hardware agreed with each other and disagreed with the NTT result in question, and a rerun of the iteration subinterval in question on a different PC (but again using the same NTT-based code) gave a result that matched the one from the FFT runs.

Perhaps not coincidentally, most of these FFT-versus-NTT interim-residue mismatches occurred during a summer heat wave in Portland, where most of the PCs doing the NTT verification work were housed. (Richard Crandall had recruited a bunch of Reed College colleagues and students for this.) Just to illustrate that the better-known kinds of errors (rounding and "cosmic-ray") are quite often fairly far down on our list of things to worry about in this respect.
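The cross-checking methodology described above can be mimicked in miniature (purely illustrative; the function names are mine, and Python's built-in `pow` stands in for the FFT/NTT multiply that real verifications use): run the Pepin test for F_n = 2^(2^n) + 1 along two independently coded paths and compare 64-bit interim residues at checkpoint iterations. Any mismatch flags a hardware or software error on one of the paths.

```python
# Pepin test: F_n = 2^(2^n) + 1 is prime iff 3^((F_n - 1)/2) == -1 (mod F_n).
# Since (F_n - 1)/2 = 2^(2^n - 1), the test is 2^n - 1 modular squarings of 3.

def pepin_iterative(n, checkpoints):
    """Pepin test by an explicit squaring chain; returns (is_prime, residues)."""
    F = (1 << (1 << n)) + 1
    r, res = 3, {}
    for k in range(1, 1 << n):          # 2^n - 1 squarings in all
        r = r * r % F                   # after k squarings, r = 3^(2^k) mod F
        if k in checkpoints:
            res[k] = r & (2**64 - 1)    # low 64 bits, as DC projects report
    return r == F - 1, res

def pepin_direct(n, checkpoints):
    """Same residues via built-in pow -- an 'independently coded' second path."""
    F = (1 << (1 << n)) + 1
    res = {k: pow(3, 1 << k, F) & (2**64 - 1) for k in checkpoints}
    return pow(3, (F - 1) // 2, F) == F - 1, res
```

In a real verification the two paths run on different machines with independently developed multiply code; here the point is only that matching interim residues localize an error to a specific iteration subinterval, exactly as in the F24 triple-check.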
#6
∂²ω=0
Sep 2002
República de California
10110101110111₂ Posts
Another thing I neglected to clarify: in the aforementioned paper's FFT error analysis, the key result is expression (8). Multiplying that expression by 1/2 and trivially rearranging, one can obtain an expression for log2(W), the maximum allowable number of bits per input word to the transform, in terms of the transform length N. We see 3 terms in (8) which act to reduce the maximum allowable input wordsize as N increases, and these can be grouped into 2 distinct categories. (When I wrote the section on FFT error heuristics I rearranged things on the way to (8) to group terms raised to the 1/2 and 3/2 powers, but that had the side effect of mixing terms arising from the convolution output size and from floating-point rounding errors - in retrospect that was probably a bad decision.)

1) Convolution output size: appears via a [ log2(N)/4 + 3*log2(log(log(N)))/4 ] term. This is irrespective of transform type (floating or integer); it assumes only that balanced-digit representation is used for the transform inputs.

2) FFT floating-point rounding errors: appear via a log2(log2(N))/4 term.

Thus, the dominant part of the 0.25-plus-a-little reduction in bits per word per doubling of the transform length, namely part (1), comes simply from the increase in average convolution output size with increasing convolution length (for fixed-sized inputs).

To put numbers on the relative magnitudes of the 2 effects here: for N=2^20, term (1) is roughly 6 times larger than term (2). In other words, starting with a 1-term convolution (N=1), the simple squaring step limits our input size to roughly half as many bits as our computational mantissa has (i.e. 53/2, or roughly 26 bits, for floating doubles). Going to N=2^20 we lose roughly 7 bits of this N=1 allowable wordsize, giving the aforementioned 19 bits per input word (which translates to a maximum testable Mersenne exponent of ~20M): roughly 6 of those bits are lost simply due to the nature of convolution (i.e. independently of the nature of the machine arithmetic used), and only one is due to FFT rounding errors. (Note that this is not to be taken to mean that "only one bit is ever lost due to rounding" - that would be clearly ludicrous - it's how many bits of relative error in the convolution outputs result from the finite precision, after the IFFT and divide-by-N/2 have been done.)

More importantly, the asymptotic growth in rounding error is extremely slow: for a length-N FFT, it mirrors the average displacement of a random walk having O(log2(N)) steps, i.e. it grows much more slowly than the convolution output sizes, which can be modeled as a random walk of O(N) steps.

Last fiddled with by ewmayer on 2006-09-18 at 23:44
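The arithmetic above can be checked back-of-envelope (I am assuming natural logs for the inner log(log(N)) factor, which should be verified against the paper's expression (8)):

```python
import math

# Evaluate the two wordsize-reduction terms for N = 2^20, per the
# description above (inner logs assumed natural -- an assumption).
N = 2**20

# (1) convolution output growth (transform-type independent)
term1 = math.log2(N) / 4 + 3 * math.log2(math.log(math.log(N))) / 4

# (2) FFT floating-point rounding error
term2 = math.log2(math.log2(N)) / 4

# Start from half the 53-bit double mantissa at N = 1, then subtract.
max_bits = 53 / 2 - (term1 + term2)   # ~19 bits per input word
max_exponent = max_bits * N           # ~20M, the max testable Mersenne exponent
```

This reproduces the figures quoted above: term (1) contributes roughly 6 bits of loss and term (2) roughly 1 bit, leaving ~19 bits per word at N = 2^20.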
#7
Tribal Bullet
Oct 2004
3,541 Posts
Quote:
Regarding the speed of all-integer convolution, I only ever implemented the Fermat-mod variant, and if memory serves it managed a speed within 15% of mlucas on an Opteron. I doubt it could get much faster than that, though.

jasonp
#8
∂²ω=0
Sep 2002
República de California
11639₁₀ Posts
Quote:
Similar Threads

| Thread | Thread Starter | Forum | Replies | Last Post |
| Finding multiples of a real number that are close to a whole number | mickfrancis | Math | 16 | 2017-03-01 07:17 |
| Number of distinct prime factors of a Double Mersenne number | aketilander | Operazione Doppi Mersennes | 1 | 2012-11-09 21:16 |
| Faster Fourier Transforms | Mini-Geek | Lounge | 2 | 2012-01-20 15:31 |
| Number-theoretic FPGA function implementation suggestions? | rdotson | Hardware | 18 | 2005-09-25 13:04 |
| Search for a number theoretic function related to "prime divisor sums" | juergen | Math | 2 | 2004-07-10 23:01 |