mersenneforum.org > Great Internet Mersenne Prime Search > Software
2003-03-12, 15:09   #12
Cyclamen Persicum
Prime95 secret

It seems I am beginning to understand why other freshmen's programs are
very slow.

1) The slowest: don't use an FFT at all, use scholar multiplication,
O(n^2), instead.

2) Slow: use two FFTs every iteration.

3) Fast: use only one FFT at start-up, and then perform all
calculations on the FFT images in the complex field.

Is the 3rd item correct, and does Prime95 do so?
2003-03-12, 17:04   #13
ewmayer

Quote:
Originally Posted by cperciva
http://www.ams.org/journal-getitem?pii=S0025-5718-02-01419-9

Or, for those of you without a subscription, a preprint is at http://www.sfu.ca/~cperciva/mpaper.ps

I refer to your F24 paper in a section discussing previous error bound work -- Crandall sent me a copy a couple years ago.
Note that the error analysis in the final version is greatly improved from the one Richard C. sent you, and from the version available on his perfsci.com website. Our initial version had some significant algebra errors.

I used the revised version to estimate what FFT length would be needed for a Pépin test (or p-1/ECM factoring work) on F31, before Kruppa & Forbes found the small factor of that number. The error analysis (with the key adjustable asymptotic constant set based on data from runs with GIMPS-sized numbers) predicts that N = 2^27 floating doubles should be more than adequate to test F31-sized numbers - in fact the slightly smaller 15*2^23 should also suffice.

Since I didn't have a non-power-of-2 runlength capability in my Fermat code (although one can use such for a Fermat-mod DWT, by combining the standard negacyclic weighting with a Mersenne-like variable-base DWT - the final version of the paper also discusses this), I used my Mersenne code with M(M31) as input for the test. Sure enough, ~100K iterations at lengths 15*2^23 and 2^27 yielded no fractional errors anywhere close to dangerous (or at least what we consider dangerous, based on lots of practical experience with GIMPS), but 14*2^23 quickly yielded fatal RO errors.

The same error analysis also predicts that 16 bits per input will become too large around F35-F36, at which point being able to do a non-power-of-2-runlength Fermat-mod DWT will become very important - otherwise we'd have to drop down to 8 bits per input, which would more than double our runtime.
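[The "fractional errors" monitored in such test runs are simple to state in code. This is a generic sketch of the usual roundoff check, with made-up sample values - an editor's illustration, not Mayer's actual code:]

```python
def max_fractional_error(conv_outputs):
    # After the inverse FFT, every convolution output should be very
    # close to an integer; the largest distance to the nearest integer
    # is the reported "fractional error". Values approaching 0.5 mean
    # some digit may round to the wrong integer - a fatal RO error.
    return max(abs(x - round(x)) for x in conv_outputs)

# Hypothetical inverse-FFT outputs: all near integers, the worst 0.01 off.
print(max_fractional_error([3.0002, 6.99, 12.000001]))
```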
2003-03-12, 17:19   #14
ewmayer
Re: Prime95 secret

Quote:
Originally Posted by Cyclamen Persicum
It seems I am beginning to understand why other freshmen's programs are
very slow.

1) The slowest: don't use an FFT at all, use scholar multiplication,
O(n^2), instead.

2) Slow: use two FFTs every iteration.

3) Fast: use only one FFT at start-up, and then perform all
calculations on the FFT images in the complex field.

Is the 3rd item correct, and does Prime95 do so?
1) There are also FFT-based programs which are still suboptimal in that they use zero-padding of the input vectors and a separate modulo step to effect the FFT-based Mersenne-mod multiply, rather than the Crandall/Fagin irrational-base discrete weighted transform (IBDWT; I actually prefer "mixed-base" DWT, but since Crandall devised it, he gets to name it), which needs no zero-padding of inputs.

2) No, all the best programs still need 2 FFTs per iteration - they just do them very efficiently.

3) I'm not sure what you mean here. If you're referring to doing all the iterations entirely in Fourier space, that has been much discussed, but simply doesn't work - you need to come out of Fourier space to do error removal (in a floating-point-FFT-based scheme), normalization of digits and carry propagation. Then you need to take the resulting new set of input digits and get back into Fourier space to do the squaring, and so on. There's your two FFTs - no way around that, I'm afraid.
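[The round trip described in 3) can be sketched in a few lines. This is a toy illustration - an editor's sketch, nothing like Prime95's actual code - with a naive O(n^2) DFT standing in for the optimized FFT purely for clarity:]

```python
import cmath

def dft(v, sign):
    # Naive O(n^2) DFT - stands in for the optimized FFT, for clarity only.
    n = len(v)
    return [sum(v[j] * cmath.exp(sign * 2j * cmath.pi * j * k / n)
                for j in range(n)) for k in range(n)]

def square_digits(digits, base):
    # One squaring iteration: forward transform, pointwise square,
    # inverse transform, then round and propagate carries.
    n = len(digits)
    N = 2 * n                     # zero-pad so cyclic conv = acyclic conv
    v = [complex(d) for d in digits] + [0j] * n
    V = dft(v, -1)
    W = [z * z for z in V]        # squaring is pointwise in Fourier space...
    w = dft(W, +1)
    raw = [round(z.real / N) for z in w]
    out, carry = [], 0            # ...but normalization and carries are not:
    for r in raw:                 # this nonlinear step forces us back out,
        carry, d = divmod(r + carry, base)   # hence two transforms/iteration
        out.append(d)
    while carry:
        carry, d = divmod(carry, base)
        out.append(d)
    return out

# 123 as little-endian base-10 digits; squaring gives the digits of 15129.
print(square_digits([3, 2, 1], 10))   # -> [9, 2, 1, 5, 1, 0]
```

The round-and-carry step is precisely the part that cannot be expressed in Fourier space, which is why every iteration pays for one forward and one inverse transform.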

So there's no black magic, no incredibly sophisticated mathematical sleight of hand (besides the FFT-based multiply and the DWT, both of which are really nifty but certainly not miraculous by any stretch) - just a small amount of higher mathematics and a hell of a lot of hard work. (Very much in the spirit of Thomas Edison's famous phrase about genius being 10% inspiration and 90% perspiration. Many would argue that the 10% figure is much too high. :))
2003-03-12, 19:28   #15
ewmayer

Quote:
Originally Posted by ewmayer
Quote:
Originally Posted by cperciva
http://www.ams.org/journal-getitem?pii=S0025-5718-02-01419-9

Or, for those of you without a subscription, a preprint is at http://www.sfu.ca/~cperciva/mpaper.ps

I refer to your F24 paper in a section discussing previous error bound work -- Crandall sent me a copy a couple years ago.
Note that the error analysis in the final version is greatly improved from the one Richard C. sent you, and from the version available on his perfsci.com website. Our initial version had some significant algebra errors.
...the most important of which is the egregious model of RO errors using a random walk of N*log(N) steps. In the revised version, we come to the same conclusion you do in your paper: that the appropriate model of FFT RO errors is a random walk of log(N) steps.
2003-03-12, 19:40   #16
cperciva

I feel rather bad, now, about referring to your paper (or rather, the estimate contained in it) so roughly... but my comments were entirely accurate when I wrote them. (In fact, I seem to recall mentioning it to Richard C. at the time, and being told something to the effect of "well, you might be right, but we're not going to change it now" -- but that was a couple years ago, so I might be getting confused with something else.)
2003-03-12, 20:10   #17
ewmayer

Quote:
Originally Posted by cperciva
I feel rather bad, now, about referring to your paper (or rather, the estimate contained in it) so roughly... but my comments were entirely accurate when I wrote them. (In fact, I seem to recall mentioning it to Richard C. at the time, and being told something to the effect of "well, you might be right, but we're not going to change it now" -- but that was a couple years ago, so I might be getting confused with something else.)
No, no, the (somewhat hastily banged-together, but that is no excuse) error analysis in the version of the paper you got deserved to be taken to task. But in doing the revisions we corrected the obvious buffoonery.

I believe the clue to the "mystery" you mention in your paper - how such a flawed analysis could still produce a result that matches observed behavior - is in the difference between FFT convolution outputs and the attendant RO errors: since each of the former is a weighted sum of the N (presumably random) inputs with complex roots of unity for the weights, these behave like an N-step random walk. Since the attendant relative floating-point errors behave like a random walk of O(log(N)) steps, the combination yields absolute RO errors that might well be approximated via a random walk (of machine-epsilon stepsize) with O(N*log(N)) steps.

In the final paper I was very careful to make the distinction between the behavior of convolution outputs and RO errors, especially in the sense that the former (in the sense of the exact value) are independent of the details of the computation, whereas the latter depend crucially on the structure of the computation. It's certainly not the last word in FFT error analysis, but the result seems quite useful in practice.

Obviously if it is important that the result be provably correct, we need some way of guarding against inputs that approximate the worst-case ones you mention in your paper, but for GIMPS work, because of the double-checking that gets done, we really do want to skate quite close to the ragged edge of accuracy in order to maximize speed.
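[The scaling behind these random-walk models is easy to check numerically: the RMS displacement of an S-step unit random walk grows like sqrt(S), so an N-step model and a log(N)-step model predict very different error magnitudes. A quick simulation - an editor's illustration, not taken from either paper:]

```python
import math
import random

def rms_displacement(steps, trials=500, seed=1):
    # Root-mean-square endpoint of a 1-D random walk with +/-1 steps;
    # theory predicts a value close to sqrt(steps).
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        pos = sum(rng.choice((-1, 1)) for _ in range(steps))
        total += pos * pos
    return math.sqrt(total / trials)

# Measured RMS tracks sqrt(steps) as the walk length grows.
for s in (16, 256, 1024):
    print(s, round(rms_displacement(s), 1), math.sqrt(s))
```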
2003-03-12, 20:38   #18
cperciva

Quote:
Originally Posted by ewmayer
No, no, the (somewhat hastily banged-together, but that is no excuse) error analysis in the version of the paper you got deserved to be taken to task.
Yes, but people reading my paper and the published version of your paper aren't going to understand that I was referring to an earlier, unpublished version of your paper.

Quote:
In the final paper I was very careful to make the distinction between the behavior of convolution outputs and RO errors, especially in the sense that the former (in the sense of the exact value) are independent of the details of the computation, whereas the latter depend crucially on the structure of the computation.
Well, the constant factors will vary, but all floating-point FFTs should have the same overall error behaviour.

Quote:
Obviously if it is important that the result be provably correct, we need some way of guarding against inputs that approximate the worst-case ones you mention in your paper, but for GIMPS work, because of the double-checking that gets done, we really do want to skate quite close to the ragged edge of accuracy in order to maximize speed.
Indeed; I made that point in the conclusion of my paper. I wish I could do better than what I've got, actually -- I wish I could do some extra computation alongside the FFT which would prove a stronger error bound specific to the data in question -- but apart from the trivial couple of bits from the weight of the input vectors, I can't see how to do this.
2003-03-13, 11:31   #19
Cyclamen Persicum

Quote:
There are also FFT-based programs which are still suboptimal in that they use zero-padding of the input vectors and a separate modulo step to effect the FFT-based Mersenne-mod multiply
Thank you very much! My naive program does exactly the same.
But it's still incredible that Prime95 is 30 times faster than such
suboptimal programs (instead of 2 or 4 times).

How large can the digits safely and optimally be for 80- and 64-bit FFTs?
(I believe 16 and 8 bits per "digit".)
2003-03-13, 18:26   #20
ewmayer

Quote:
Originally Posted by Cyclamen Persicum
How large can the digits safely and optimally be for 80- and 64-bit FFTs?
(I believe 16 and 8 bits per "digit".)
If you're using x86 80-bit long doubles only while data are in registers (and to use them everywhere is going to be slow, as Prime95 points out, since that doubles the number of loads and stores), there's actually not much difference in accuracy between the x86 and a CPU that uses 64-bit doubles - for a given FFT length, you'll be able to use a max. exponent perhaps 1-2% larger on the x86.

For a guide to how many bits per digit can safely be used, look at http://www.mersenne.org/bench.htm . For instance, at 1024K (that's the number of *real* elements in the FFT - if you're using a complex-data FFT that means 19 radix-2 passes, not 20) one can use ~19 bits per input digit. Of course, to achieve this in practice requires careful coding - you need to use the balanced-digit representation described in the 1994 Crandall/Fagin Math. Comp. paper, use the DWT to halve the vector length relative to a zero-padded implementation, and preferably use a higher-radix FFT to reduce the number of passes through the data (= faster) and reduce the number of complex multiplies by FFT sincos data (= more accurate).
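[The Crandall/Fagin scheme referred to here can be sketched compactly. This is an editor's pure-Python toy with a textbook recursive FFT and ordinary rather than balanced digits; real implementations are far more careful about precision and speed. The variable-width digits and irrational weights let a plain length-N cyclic convolution compute the square mod 2^p - 1 directly, with no zero-padding and no separate modulo step:]

```python
import cmath

def fft(a, invert=False):
    # Textbook recursive radix-2 FFT (length must be a power of 2).
    n = len(a)
    if n == 1:
        return a[:]
    even, odd = fft(a[0::2], invert), fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out

def ibdwt_square(x, p, N):
    # Square x mod 2^p - 1 via the irrational-base DWT: digit j holds
    # the bits of x from position ceil(p*j/N) up to ceil(p*(j+1)/N),
    # and is weighted by 2^(ceil(p*j/N) - p*j/N). A plain length-N
    # cyclic convolution of the weighted digits then computes the
    # Mersenne-mod product directly.
    start = [-(-p * j // N) for j in range(N + 1)]     # digit boundaries
    digits = [(x >> start[j]) & ((1 << (start[j + 1] - start[j])) - 1)
              for j in range(N)]
    weights = [2.0 ** (start[j] - p * j / N) for j in range(N)]
    v = fft([d * w for d, w in zip(digits, weights)])
    v = fft([z * z for z in v], invert=True)
    conv = [round(z.real / (N * w)) for z, w in zip(v, weights)]
    # Leave Fourier space to normalize: variable-base carry propagation.
    out, carry = 0, 0
    for j in range(N):
        carry, d = divmod(conv[j] + carry, 1 << (start[j + 1] - start[j]))
        out += d << start[j]
    return (out + carry) % ((1 << p) - 1)   # final carry wraps: 2^p == 1
```

For example, `ibdwt_square(100, 7, 4)` computes 100^2 mod 127 = 94. Balanced digits (shifting each digit into a signed range around zero) would roughly halve the convolution magnitudes, which is what production codes use.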
2003-03-19, 18:14   #21
Cyclamen Persicum
IBDWT

*** ewmayer
Would you be so kind as to explain to me how IBDWT works?

Going by its name, I've tried to multiply
79 by 83 (mod 100) using 101^0.5 (i.e. 10.04987562112089)
as the base.
In this system 79 and 83 will be, for instance,
(7 8.65087065215377)
and
(8 2.60099503103288).

Two FFTs and one iFFT go here, which lead us to
(87.41393043446033 78.50087158036013)

The cyclic carry propagation gives us:
(4.96504984437232 7.10186661139301)

In the decimal system that is:
57.000000000000

That was rare luck, that the irrationality was eliminated at the
very end - but more often it is not!

What is the correct algorithm?
Thank you in advance!!!
2003-03-20, 22:10   #22
pakaran
Re: Prime95 secret

Quote:
Originally Posted by Cyclamen Persicum
It seems I am beginning to understand why other freshmen's programs are
very slow.

1) The slowest: don't use an FFT at all, use scholar multiplication,
O(n^2), instead.
Meaning "fourth grade" multiplication, or did you mean "scalar" multiplication?