![]() |
|
|
#1 |
|
Mar 2003
5116 Posts |
I'm not a Gimps member, because the
probability to be a winner is zero, zero, module zero and and zero periodically... I have tried other progs available in Internet, but they as slow as continent drift. When I wrote my own, but it still 10 or 100 times slower. I don't know why and what is the secret of Prime95? It seems the 39th number was discovered without any SSE for only 1.5 month. But my prog requier 41 year to check it!!! I use two FFT for squaring, and modular reduction which worth nothing for 11111111...b. |
|
|
|
|
|
#2 |
|
"Richard B. Woods"
Aug 2002
Wisconsin USA
22×3×641 Posts |
The "secret" of Prime95 is twofold -- hardware and software.
Intel Pentium CPUs execute floating-point (FP) instructions more rapidly and with higher precision than most other manufacturers' CPUs. They can pipeline as fast as one FP instruction completion per clock cycle, whereas many others can execute FP no faster than two, three, or more cycles per instruction completed. In addition, Intel FP units have 80-bit internal registers, longer than the 64-bit registers common on other brands. With proper programming to keep computations inside those internal registers as long as possible before the eventual conversion to a 64-bit external result, Intel CPUs can produce 16 more bits of numerical result per FP instruction than other brands. George Woltman has intensively studied the details of Intel CPUs and worked for years to maximize the throughput of Prime95 software, programming the most speed-critical modules in assembler language in order to carefully interleave machine instructions so as to keep the various parts of the CPU as busy as possible churning out results with minimum wasted time. (I'm quite envious of George's work because I've done similar software optimization for speed, in the past, on older computers, but by the time I discovered what George was up to with Prime95, he was waaay ahead of where I'd have started. :) ) |
|
|
|
|
|
#3 | ||||
|
∂2ω=0
Sep 2002
República de California
2·32·647 Posts |
Quote:
In fact George's code is not better on a per-cycle basis than other top-quality LL test codes (Guillermo Valor's Glucas and my Mlucas) - on an Alpha ev6 I get even better per-cycle performance than George's code does on the P4 - but P4's are way cheaper and more numerous, and kudos to George for getting such good performance out of a chip with smaller caches and lower bandwidth to main memory. Also, the speed demons at Intel have been able to ratchet up chip speeds much faster than the RISC manufacturers. Quote:
Quote:
Quote:
|
||||
|
|
|
|
|
#4 | ||
|
Oct 2002
43 Posts |
Quote:
|
||
|
|
|
|
|
#5 | |
|
P90 years forever!
Aug 2002
Yeehaw, FL
165558 Posts |
Quote:
|
|
|
|
|
|
|
#6 | ||
|
Oct 2002
43 Posts |
Quote:
The critical point is that complex multiplication and addition introduce up to sqrt(5)+1 ulp of error, while loading and storing introduce only 1 ulp. (Unavoidable inaccuracies in the trig data contributes the remaining sqrt(1/2) ulp.) |
||
|
|
|
|
|
#7 |
|
Oct 2002
1010112 Posts |
(If anyone hasn't already read it, my paper on FFT error bounds has *finally* been published in Math. Comp.)
|
|
|
|
|
|
#8 | |
|
P90 years forever!
Aug 2002
Yeehaw, FL
35·31 Posts |
Quote:
When using 64-bit FP, each radix-4 "unit" does a complex multiply (4 muls, 2 adds) and two complex adds. That is 2 muls and 3 adds for each data value. In the 1M FFT squaring that means there are about 105 opportunities for error to accumulate. Are you saying this 5 to 1 ratio should result in an extra 1.5 bits of precision? Does that match the 1% to 2% larger exponents the x86 FFTs support? |
|
|
|
|
|
|
#9 |
|
Oct 2002
2B16 Posts |
It's somewhat more complicated than that; the (not perfectly accurate) trig data contributes some error, and the error introduced in complex multiplication can be bounded somewhat more closely than a naive operation count would indicate.
But to answer your second question, in the 20M range you're packing about 20 bits per double, so reducing the error by one bit would mean a change of 2.5% in the size of numbers for which you can use a given FFT length. That said, all my experience is with worst-case bounds; in practice the errors are much smaller; indeed, I could construct values for which the prime95 multiplication would fail -- but it doesn't matter since double-checking would find those anyway. |
|
|
|
|
|
#10 | |
|
∂2ω=0
Sep 2002
República de California
2D7E16 Posts |
Quote:
Now that you mention it, I see that Crandall, Papadopoulos' and my paper on F24 has also appeared, at long last - just over 3 years since we actually submitted the initial draft: http://www.ams.org/journal-getitem?pii=S0025-5718-02-01479-5 The LaTeX manuscript can be gotten for free from http://hogranch.com/mayer/F24.rev3.tex Math. Comp. is very slow, and (at least based on our experience, where one reviewer didn't do anything with the manuscript for over a year, without MC doing anything to remind them of their responsibility as a reviewer) has poor timeliness-of-review controls. But grousing about tardiness aside, that paper also has some words about errors in floating-point FFTs, with an approach that is more heuristic in nature and more geared to most-likely error behavior than Colin's. The error estimates thus derived seem to match real-world experience quite closely. |
|
|
|
|
|
|
#11 |
|
Oct 2002
43 Posts |
http://www.ams.org/journal-getitem?pii=S0025-5718-02-01419-9
Or, for those of you without a subscription, a preprint is at http://www.sfu.ca/~cperciva/mpaper.ps I refer to your F24 paper in a section discussing previous error bound work -- Crandall sent me a copy a couple years ago. |
|
|
|
![]() |
| Thread Tools | |
Similar Threads
|
||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| why is http://www.mersenne.org/ so slow | Unregistered | Information & Answers | 19 | 2012-04-17 03:12 |
| Slow Down? | R.D. Silverman | GMP-ECM | 55 | 2011-10-16 17:28 |
| How hot is too hot? Slow is too slow? | petrw1 | Hardware | 13 | 2008-11-10 23:25 |
| Slow computer | Housemouse | Hardware | 7 | 2008-02-15 18:18 |
| Really slow machines? | Doorbasher | Hardware | 5 | 2004-08-23 22:18 |