Beta version 24.6 - Athlon users wanted
As some of you know, I've been rewriting the FFT code to bring big speed gains to SoB, LLR, OpenPFGW, and other projects. This rewrite is now complete.
The good news: Athlons (except 64-bit CPUs) are about 15% faster. :banana: The bad news: P3s are 33% slower. :yucky: I didn't time P4s, but they should be ever so slightly slower. I expect P2 and Pentium-M will also be slower. Athlon owners running Windows may want to try the new version at [url]ftp://mersenne.org/gimps/p95v246.zip[/url] Let me know if you find any bugs. Running a doublecheck or two would be nice. In the meantime, I'll work on further fine tuning the new FFT code and see if I can recover some of the loss in P3 timings.
I went from about .156 to ~.114 on a 13,xxx,xxx number on my 1.2 GHz Athlon.
THANKS!!
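"Faster" can be read two ways, and the two readings give different percentages for the same pair of timings. Taking the figures above as per-iteration times (the unit isn't stated; seconds per iteration is an assumption), a quick worked comparison:

```latex
\[
\underbrace{\frac{t_{\text{old}} - t_{\text{new}}}{t_{\text{old}}}
  = \frac{0.156 - 0.114}{0.156} \approx 0.269}_{\text{about 27\% less time per iteration}}
\qquad
\underbrace{\frac{t_{\text{old}}}{t_{\text{new}}}
  = \frac{0.156}{0.114} \approx 1.368}_{\text{about 37\% more iterations per unit time}}
\]
```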
Neat!
I'll switch my Athlon 1900MP (2x 1.6GHz) box over to double checking as soon as the current factoring assignment finishes. It will probably take a little less than a month for the first results. Given the various sizes of L1/L2 cache in the Athlon/Duron/Sempron processors, is the new code optimized for one version in particular? Would you like benchmarks posted? I'm sure SalemTheCat100 will be pleased :wink:
-Scott-
Why don't you do the best of both worlds? You already have good speed on Pentiums, so why not check which CPU is installed first and then use the right tweaking for it? Kind of like a driver, only for the FFT: the right driver for the right processor.
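The idea above, detecting the processor and then choosing a matching FFT routine, is usually called runtime CPU dispatch. Prime95 already detects the processor (the benchmark logs in this thread print the CPU name and features), but how it picks code paths internally isn't described here. A minimal Python sketch of the dispatch idea, assuming the CPU name string as it appears in those logs; every function name below is a hypothetical placeholder, not a Prime95 routine:

```python
"""Sketch: pick an FFT code path once, based on the CPU name string."""

from typing import Callable, List, Sequence

FFTFunc = Callable[[Sequence[float]], List[float]]


# Hypothetical stand-ins for separately tuned FFT implementations. In a real
# program each would be a different hand-optimized routine; here they only
# copy their input so the dispatch logic itself can be exercised.
def fft_athlon_tuned(data: Sequence[float]) -> List[float]:
    return list(data)


def fft_pentium_tuned(data: Sequence[float]) -> List[float]:
    return list(data)


def fft_generic(data: Sequence[float]) -> List[float]:
    return list(data)


# Ordered (substring, implementation) rules; the first match wins.
DISPATCH_RULES = [
    ("Athlon", fft_athlon_tuned),
    ("Pentium", fft_pentium_tuned),
]


def pick_fft(cpu_name: str) -> FFTFunc:
    """Return the FFT routine whose rule matches cpu_name, else the generic one."""
    for needle, impl in DISPATCH_RULES:
        if needle in cpu_name:
            return impl
    return fft_generic


if __name__ == "__main__":
    # Decide once at startup, then call through the chosen function.
    fft = pick_fft("AMD Athlon(tm) XP 2400+")
    print(fft.__name__)
```

The choice is made once, so the per-iteration cost is just an indirect call. A real program would more likely key off the CPUID vendor and family fields than the marketing name, and, going by George's note that 64-bit Athlons don't gain, would need a rule that keeps those off the Athlon path.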
I am already doing DCs with my Athlon. Here are the benchmarks in "non-safe mode" (i.e. how I typically run my machine):
The new:
[code]AMD Athlon(tm) processor
CPU speed: 1127.85 MHz
CPU features: RDTSC, CMOV, PREFETCH, MMX
L1 cache size: 64 KB
L2 cache size: 256 KB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
L1 TLBS: 24
L2 TLBS: 256
Prime95 version 24.6, RdtscTiming=1
Best time for 512K FFT length: 65.019 ms.
Best time for 640K FFT length: 87.656 ms.
Best time for 768K FFT length: 105.915 ms.
Best time for 896K FFT length: 129.876 ms.
Best time for 1024K FFT length: 145.119 ms.
Best time for 1280K FFT length: 195.657 ms.
Best time for 1536K FFT length: 235.093 ms.
Best time for 1792K FFT length: 280.729 ms.
Best time for 2048K FFT length: 321.764 ms.[/code]
The old:
[code]AMD Athlon(tm) processor
CPU speed: 1127.80 MHz
CPU features: RDTSC, CMOV, PREFETCH, MMX
L1 cache size: 64 KB
L2 cache size: 256 KB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
L1 TLBS: 24
L2 TLBS: 256
Prime95 version 23.5, RdtscTiming=1
Best time for 384K FFT length: 78.067 ms.
Best time for 448K FFT length: 90.489 ms.
Best time for 512K FFT length: 97.280 ms.
Best time for 640K FFT length: 123.678 ms.
Best time for 768K FFT length: 149.556 ms.
Best time for 896K FFT length: 175.282 ms.
Best time for 1024K FFT length: 200.824 ms.
Best time for 1280K FFT length: 271.674 ms.
Best time for 1536K FFT length: 328.457 ms.
Best time for 1792K FFT length: 399.580 ms.
Best time for 2048K FFT length: 477.366 ms.[/code]
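To turn the two logs above into per-FFT-length speedups, divide each old time by the new time at the same FFT length. A small Python sketch using only the numbers posted above; the dictionary and function names are made up for illustration:

```python
"""Compare the Prime95 23.5 and 24.6 benchmark timings posted above (1.13 GHz Athlon)."""

# Best time per FFT length, copied from the two logs.
# Keys are FFT lengths in K; values are milliseconds per iteration.
NEW_24_6 = {512: 65.019, 640: 87.656, 768: 105.915, 896: 129.876, 1024: 145.119,
            1280: 195.657, 1536: 235.093, 1792: 280.729, 2048: 321.764}
OLD_23_5 = {384: 78.067, 448: 90.489, 512: 97.280, 640: 123.678, 768: 149.556,
            896: 175.282, 1024: 200.824, 1280: 271.674, 1536: 328.457,
            1792: 399.580, 2048: 477.366}


def speedups(old, new):
    """Return {fft_len: old_ms / new_ms} for FFT lengths present in both runs.

    A ratio of 1.40 means the new build completes 40% more iterations in the
    same wall-clock time (each iteration takes 1/1.40 as long).
    """
    common = sorted(old.keys() & new.keys())
    return {k: old[k] / new[k] for k in common}


if __name__ == "__main__":
    for k, ratio in speedups(OLD_23_5, NEW_24_6).items():
        saved = 1 - 1 / ratio  # fraction of per-iteration time saved
        print(f"{k:>5}K  {ratio:5.3f}x  ({saved:6.1%} less time)")
```

Only the lengths both versions benchmarked (512K through 2048K) are compared; the 384K and 448K entries from 23.5 have no 24.6 counterpart in these logs.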
Thanks!!
My Athlon XP 2400+ seems to be working well with it: times are down, productivity is up. Excellent work.
The new:
[code]AMD Athlon(tm) XP 2400+
CPU speed: 1991.65 MHz
CPU features: RDTSC, CMOV, PREFETCH, MMX, SSE
L1 cache size: 64 KB
L2 cache size: 256 KB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
L1 TLBS: 32
L2 TLBS: 256
Prime95 version 24.6, RdtscTiming=1
Best time for 512K FFT length: 36.591 ms.
Best time for 640K FFT length: 47.987 ms.
Best time for 768K FFT length: 58.558 ms.
Best time for 896K FFT length: 69.761 ms.
Best time for 1024K FFT length: 78.666 ms.
Best time for 1280K FFT length: 109.193 ms.
Best time for 1536K FFT length: 132.532 ms.
Best time for 1792K FFT length: 157.138 ms.
Best time for 2048K FFT length: 176.963 ms.[/code]
The old:
[code]AMD Athlon(tm) XP 2400+
CPU speed: 1991.13 MHz
CPU features: RDTSC, CMOV, PREFETCH, MMX, SSE
L1 cache size: 64 KB
L2 cache size: 256 KB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
L1 TLBS: 32
L2 TLBS: 256
Prime95 version 23.8, RdtscTiming=1
Best time for 384K FFT length: 46.405 ms.
Best time for 448K FFT length: 57.387 ms.
Best time for 512K FFT length: 59.590 ms.
Best time for 640K FFT length: 77.468 ms.
Best time for 768K FFT length: 91.095 ms.
Best time for 896K FFT length: 108.234 ms.
Best time for 1024K FFT length: 121.520 ms.
Best time for 1280K FFT length: 165.685 ms.
Best time for 1536K FFT length: 190.610 ms.
Best time for 1792K FFT length: 242.729 ms.
Best time for 2048K FFT length: 273.883 ms.[/code]
I'm also getting a 33% increase on a 26,xxx,xxx number!
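The 1.13 GHz Athlon and the Athlon XP 2400+ in this thread run at very different clock speeds (1127.85 MHz vs 1991.65 MHz), so their raw millisecond figures mostly reflect clock rate. A rough way to compare the 24.6 timings clock for clock is to convert them to CPU cycles per iteration: ms × MHz / 1000 gives millions of cycles. This back-of-the-envelope sketch ignores memory speed and other platform differences, and the variable names are illustrative:

```python
"""Clock-normalize the Prime95 24.6 benchmark logs from the two Athlons in this thread."""

# Best time per FFT length (K -> milliseconds), copied from the posted 24.6 logs.
ATHLON_1128 = {
    "mhz": 1127.85,
    "ms": {512: 65.019, 640: 87.656, 768: 105.915, 896: 129.876, 1024: 145.119,
           1280: 195.657, 1536: 235.093, 1792: 280.729, 2048: 321.764},
}
ATHLON_XP_2400 = {
    "mhz": 1991.65,
    "ms": {512: 36.591, 640: 47.987, 768: 58.558, 896: 69.761, 1024: 78.666,
           1280: 109.193, 1536: 132.532, 1792: 157.138, 2048: 176.963},
}


def mcycles_per_iter(run):
    """Convert ms per iteration to millions of CPU cycles per iteration.

    (ms * 1e-3 s) * (MHz * 1e6 cycles/s) = ms * MHz * 1e3 cycles,
    so ms * MHz / 1000 is the count in millions of cycles.
    """
    return {k: ms * run["mhz"] / 1000 for k, ms in run["ms"].items()}


if __name__ == "__main__":
    a = mcycles_per_iter(ATHLON_1128)
    b = mcycles_per_iter(ATHLON_XP_2400)
    print("FFT    1.13 GHz Athlon   Athlon XP 2400+   (Mcycles/iter)")
    for k in sorted(a):
        print(f"{k:>4}K  {a[k]:15.1f}   {b[k]:15.1f}")
```

Both logs report the same L1 and L2 cache sizes, so this kind of normalization is one way to see how much of the gap is clock speed alone.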
I don't know if this has anything to do with the new version. It's not harmful either, just unexpected:
I only did LMH factoring lately, so I downloaded 24.6 and requested some doublechecks because my queue was empty. Now: all of them were expected to be completed by [b]tomorrow[/b]. So they kept coming in and in and in... till I clicked Stop! Why were they expected to complete immediately?
Will there be an Mprime version?
[QUOTE=Prime95]I didn't time P4s, but they should be ever so slightly slower.[/QUOTE]
Has anyone tried to benchmark a P4 yet? (I would myself, but I'm trying to get an exponent finished before I go on vacation for a few weeks :smile:.)
Dear George,
does this mean you implemented Colin's general DWT for non-SSE2 architectures as well?
Alex