mersenneforum.org


Prime95 2004-12-08 00:25

Beta version 24.6 - Athlon users wanted
 
As some of you know, I've been rewriting the FFT code to bring big speed gains for SoB, LLR, OpenPFGW, and other projects. This rewrite is now complete.

The good news: Athlons (except 64-bit CPUs) are about 15% faster. :banana:

The bad news: P3s are 33% slower. :yucky: I didn't time P4s, but they should be ever so slightly slower. I expect P2 and Pentium-M will also be slower.

Athlon owners running Windows may want to try the new version at [url]ftp://mersenne.org/gimps/p95v246.zip[/url]. Let me know if you find any bugs. Running a doublecheck or two would be nice.

In the meantime, I'll work on further fine tuning the new FFT code and see if I can recover some of the loss in P3 timings.

Uncwilly 2004-12-08 00:49

I went from about 0.156 to ~0.114 on a 13,xxx,xxx number on my 1.2 GHz Athlon.

THANKS!!

sdbardwick 2004-12-08 05:16

Neat!
I'll switch my Athlon 1900MP (2x 1.6GHz) box over to double checking as soon as the current factoring assignment finishes. It will probably take a little less than a month for the first results.

Given the various sizes of L1/L2 cache in the Athlon/Duron/Sempron processors, is the new code optimized for one version in particular?

Would you like benchmarks posted?

I'm sure SalemTheCat100 will be pleased :wink:

-Scott-

moo 2004-12-08 05:49

Why don't you do the best of both worlds? You already have good speed for Pentiums, so why not first check which CPU is installed and then use the right tuning for it? Kind of like drivers, only for the FFT: the right code matched to the right processor.
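
The suggestion above amounts to runtime dispatch: detect the CPU once, then pick the FFT code path that is faster on it. Below is a minimal sketch of that idea in Python, not how Prime95 actually selects its code. The labels `NEW_FFT` and `OLD_FFT`, the function `pick_fft_code`, and the Pentium III brand string are hypothetical, and it assumes a 32-bit Athlon can be recognised by "Athlon" in the brand string together with the absence of SSE2 in the feature list. The rule itself only encodes what the first post reports: the rewrite helps Athlons (except 64-bit CPUs) and is slower, or expected to be slower, elsewhere.

```python
# Hypothetical labels for the two code paths discussed in this thread.
NEW_FFT = "rewritten FFT code (24.6)"
OLD_FFT = "previous FFT code (23.x)"


def pick_fft_code(cpu_brand, features):
    """Return the label of the FFT code path to use for the detected CPU.

    Assumption: a 32-bit Athlon shows "Athlon" in its brand string and
    does not list SSE2 among its features.
    """
    is_athlon = "athlon" in cpu_brand.lower()
    has_sse2 = "SSE2" in features
    if is_athlon and not has_sse2:
        # Reported in the first post as about 15% faster with the rewrite.
        return NEW_FFT
    # P3 reported slower; P4, P2 and Pentium-M expected to be slower.
    return OLD_FFT


if __name__ == "__main__":
    athlon_features = ["RDTSC", "CMOV", "PREFETCH", "MMX", "SSE"]
    print(pick_fft_code("AMD Athlon(tm) XP 2400+", athlon_features))
    # The brand string below is made up for illustration.
    print(pick_fft_code("Pentium III", ["RDTSC", "CMOV", "MMX", "SSE"]))
```

Keeping the decision in one small function means the table of "which code wins on which CPU" can be updated from new benchmarks without touching the FFT code itself.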

Uncwilly 2004-12-08 07:08

I am already doing DCs with my Athlon. Here are the benchmarks in "non-safe mode" (i.e., how I typically run my machine).

The new:
[code]AMD Athlon(tm) processor
CPU speed: 1127.85 MHz
CPU features: RDTSC, CMOV, PREFETCH, MMX
L1 cache size: 64 KB
L2 cache size: 256 KB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
L1 TLBS: 24
L2 TLBS: 256
Prime95 version 24.6, RdtscTiming=1
Best time for 512K FFT length: 65.019 ms.
Best time for 640K FFT length: 87.656 ms.
Best time for 768K FFT length: 105.915 ms.
Best time for 896K FFT length: 129.876 ms.
Best time for 1024K FFT length: 145.119 ms.
Best time for 1280K FFT length: 195.657 ms.
Best time for 1536K FFT length: 235.093 ms.
Best time for 1792K FFT length: 280.729 ms.
Best time for 2048K FFT length: 321.764 ms.[/code]

The old:
[code]AMD Athlon(tm) processor
CPU speed: 1127.80 MHz
CPU features: RDTSC, CMOV, PREFETCH, MMX
L1 cache size: 64 KB
L2 cache size: 256 KB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
L1 TLBS: 24
L2 TLBS: 256
Prime95 version 23.5, RdtscTiming=1
Best time for 384K FFT length: 78.067 ms.
Best time for 448K FFT length: 90.489 ms.
Best time for 512K FFT length: 97.280 ms.
Best time for 640K FFT length: 123.678 ms.
Best time for 768K FFT length: 149.556 ms.
Best time for 896K FFT length: 175.282 ms.
Best time for 1024K FFT length: 200.824 ms.
Best time for 1280K FFT length: 271.674 ms.
Best time for 1536K FFT length: 328.457 ms.
Best time for 1792K FFT length: 399.580 ms.
Best time for 2048K FFT length: 477.366 ms.[/code]
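
To compare the two listings above length by length, here is a minimal Python sketch. It assumes the benchmark lines keep the exact "Best time for NK FFT length: X ms." wording shown above; the helper names `parse_bench` and `compare` are made up for illustration. It reports two different figures per FFT length, because "time saved" and "throughput gained" are easy to mix up when quoting percentages.

```python
import re

# Matches lines such as "Best time for 512K FFT length: 65.019 ms."
BENCH_LINE = re.compile(r"Best time for (\d+)K FFT length: ([\d.]+) ms\.")

NEW_24_6 = """
Best time for 512K FFT length: 65.019 ms.
Best time for 640K FFT length: 87.656 ms.
Best time for 768K FFT length: 105.915 ms.
Best time for 896K FFT length: 129.876 ms.
Best time for 1024K FFT length: 145.119 ms.
Best time for 1280K FFT length: 195.657 ms.
Best time for 1536K FFT length: 235.093 ms.
Best time for 1792K FFT length: 280.729 ms.
Best time for 2048K FFT length: 321.764 ms.
"""

OLD_23_5 = """
Best time for 384K FFT length: 78.067 ms.
Best time for 448K FFT length: 90.489 ms.
Best time for 512K FFT length: 97.280 ms.
Best time for 640K FFT length: 123.678 ms.
Best time for 768K FFT length: 149.556 ms.
Best time for 896K FFT length: 175.282 ms.
Best time for 1024K FFT length: 200.824 ms.
Best time for 1280K FFT length: 271.674 ms.
Best time for 1536K FFT length: 328.457 ms.
Best time for 1792K FFT length: 399.580 ms.
Best time for 2048K FFT length: 477.366 ms.
"""


def parse_bench(text):
    """Map FFT length in K to best time in ms."""
    return {int(k): float(ms) for k, ms in BENCH_LINE.findall(text)}


def compare(old_text, new_text):
    """Return (length_k, time_saved, throughput_gain) for shared lengths."""
    old, new = parse_bench(old_text), parse_bench(new_text)
    rows = []
    for k in sorted(old.keys() & new.keys()):
        time_saved = 1 - new[k] / old[k]       # fraction of time removed
        throughput_gain = old[k] / new[k] - 1  # extra iterations per second
        rows.append((k, time_saved, throughput_gain))
    return rows


if __name__ == "__main__":
    for k, saved, gain in compare(OLD_23_5, NEW_24_6):
        print(f"{k:5d}K  time saved {saved:6.1%}  throughput gain {gain:6.1%}")
```

Only FFT lengths present in both listings are compared, so the 384K and 448K rows from the old version are skipped.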

jebeagles 2004-12-08 14:29

Thanks!!
 
My Athlon XP 2400+ seems to be working well with it; times are down, productivity is up. Excellent work.

jebeagles 2004-12-08 16:32

The new:
[code]AMD Athlon(tm) XP 2400+
CPU speed: 1991.65 MHz
CPU features: RDTSC, CMOV, PREFETCH, MMX, SSE
L1 cache size: 64 KB
L2 cache size: 256 KB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
L1 TLBS: 32
L2 TLBS: 256
Prime95 version 24.6, RdtscTiming=1
Best time for 512K FFT length: 36.591 ms.
Best time for 640K FFT length: 47.987 ms.
Best time for 768K FFT length: 58.558 ms.
Best time for 896K FFT length: 69.761 ms.
Best time for 1024K FFT length: 78.666 ms.
Best time for 1280K FFT length: 109.193 ms.
Best time for 1536K FFT length: 132.532 ms.
Best time for 1792K FFT length: 157.138 ms.
Best time for 2048K FFT length: 176.963 ms.[/code]

The old:
[code]AMD Athlon(tm) XP 2400+
CPU speed: 1991.13 MHz
CPU features: RDTSC, CMOV, PREFETCH, MMX, SSE
L1 cache size: 64 KB
L2 cache size: 256 KB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
L1 TLBS: 32
L2 TLBS: 256
Prime95 version 23.8, RdtscTiming=1
Best time for 384K FFT length: 46.405 ms.
Best time for 448K FFT length: 57.387 ms.
Best time for 512K FFT length: 59.590 ms.
Best time for 640K FFT length: 77.468 ms.
Best time for 768K FFT length: 91.095 ms.
Best time for 896K FFT length: 108.234 ms.
Best time for 1024K FFT length: 121.520 ms.
Best time for 1280K FFT length: 165.685 ms.
Best time for 1536K FFT length: 190.610 ms.
Best time for 1792K FFT length: 242.729 ms.
Best time for 2048K FFT length: 273.883 ms.[/code]

I'm also getting a 33% increase on a 26,xxx,xxx number!

MrHappy 2004-12-08 17:16

I don't know if this has anything to do with the new version. It's not harmful either, just unexpected:
I had only been doing LMH factoring lately, so I downloaded 24.6 and requested some doublechecks because my queue was empty. All of them were expected to be completed by [b]tomorrow[/b], so they kept coming in and in and in... until I clicked Stop!
Why were they expected to complete immediately?

Xyzzy 2004-12-08 17:51

Will there be an mprime version?

JuanTutors 2004-12-08 20:59

[QUOTE=Prime95]I didn't time P4s, but they should be ever so slightly slower.[/QUOTE]
Has anyone tried to benchmark a P4 yet? (I would myself, but I'm trying to get an exponent finished before I go on vacation for a few weeks :smile:.)

akruppa 2004-12-08 21:05

Dear George,

does this mean you implemented Colin's general DWT for non-SSE2 architectures as well?

Alex


All times are UTC.