mersenneforum.org  

Old 2009-01-12, 21:24   #1
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

11·673 Posts
Benchmarks please - 25.9 (round 2)

Download links:
ftp://mersenne.org/gimps/p95v259.zip
ftp://mersenne.org/gimps/p64v259.zip
ftp://mersenne.org/gimps/mprime259.tar.gz
ftp://mersenne.org/gimps/mprime259-linux64.tar.gz
ftp://mersenne.org/gimps/mprime259-MacOSX.tar.gz
ftp://mersenne.org/gimps/mprime259-MacOSX64.tar.gz

I've been neglecting my chores of upgrading the new mersenne.org website to pursue my passion - code optimization.

For the 64-bit version, I made use of the 8 extra SSE2 registers.

For the 32-bit version, I looked at new optimizations for the Core 2 architecture.

There is both good news and bad news. It is cumbersome to create different FFT code paths for different CPUs. Where I had to make a choice, I gave preference to the Core 2 CPU. The new 32-bit version may be marginally slower on the Pentium 4 (for me it is 1% slower, while the 64-bit version is about 3% faster). The Core 2 speedup is hard for me to measure on my Mac. The OS seems to lend itself to more variability in benchmarks than Linux and Windows - maybe because of VMware.

I'd like some before and after benchmarks so that I can update the next whatsnew.txt file. Be sure to specify whether you are using the new 32-bit or 64-bit executable. Don't just post the raw data - try to estimate the overall percent improvement or the percent improvement for the FFT lengths you care about.
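For anyone who would rather not work out the percentages by hand, here is a minimal Python sketch that reads pasted benchmark output and reports the change per FFT length. It assumes the "Best time for NK FFT length: X ms." line format seen in this thread; the function names are made up for illustration, and the sample values are two lines from the Pentium 4 timings posted in this thread (25.8 versus 64-bit 25.9).

```python
import re

# Matches lines such as "Best time for 768K FFT length: 25.671 ms."
FFT_LINE = re.compile(r"Best time for (\d+)K FFT length: ([\d.]+) ms\.")

def parse_fft_times(text):
    """Return {fft_length_in_K: best_time_ms} from pasted benchmark output."""
    return {int(k): float(ms) for k, ms in FFT_LINE.findall(text)}

def percent_faster(before, after):
    """Percent reduction in time per FFT length; positive means 'after' is faster."""
    return {k: 100.0 * (before[k] - after[k]) / before[k]
            for k in sorted(before) if k in after}

before = parse_fft_times("""
Best time for 768K FFT length: 25.671 ms.
Best time for 1024K FFT length: 35.779 ms.
""")
after = parse_fft_times("""
Best time for 768K FFT length: 24.622 ms.
Best time for 1024K FFT length: 34.275 ms.
""")

for k, pct in percent_faster(before, after).items():
    print(f"{k}K: {pct:+.1f}%")
```

Note that if the pasted text also contains a multi-threaded section, the later timings overwrite the single-threaded ones in this sketch, so paste only the section you want to compare.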


Summarizing results thus far (1 single-threaded worker, "+" means faster):
Code:
                    32-bit                     64-bit
Pentium 4     btwn -2% and +1%              btwn  +2% and +4%
Core 2        btwn 0% and +6%               btwn +10% and +13%
Core i7            no data                  btwn +7% and +12% (one data point)
AMD64         0% (one data point)           btwn -1% and +1% (build 2 was btwn -4% and -3%)
Phenom             no data                  btwn +10% and +13% (one data point)

Last fiddled with by Prime95 on 2009-01-23 at 15:20 Reason: added summary and linux URLs
Old 2009-01-12, 21:29   #2
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

1110011101011₂ Posts

Pentium 4 - 2.8 GHz

Version 25.8:
Best time for 768K FFT length: 25.671 ms.
Best time for 896K FFT length: 31.407 ms.
Best time for 1024K FFT length: 35.779 ms.
Best time for 1280K FFT length: 44.130 ms.
Best time for 1536K FFT length: 53.231 ms.
Best time for 1792K FFT length: 64.061 ms.
Best time for 2048K FFT length: 71.499 ms.
Best time for 2560K FFT length: 94.674 ms.
Best time for 3072K FFT length: 114.123 ms.
Best time for 3584K FFT length: 137.035 ms.
Best time for 4096K FFT length: 152.913 ms.
Best time for 5120K FFT length: 194.349 ms.
Best time for 6144K FFT length: 247.920 ms.
Best time for 7168K FFT length: 296.353 ms.
Best time for 8192K FFT length: 328.556 ms.

32-bit 25.9 (about 1-2% slower):
Best time for 768K FFT length: 25.793 ms.
Best time for 896K FFT length: 31.495 ms.
Best time for 1024K FFT length: 36.046 ms.
Best time for 1280K FFT length: 44.521 ms.
Best time for 1536K FFT length: 53.618 ms.
Best time for 1792K FFT length: 64.515 ms.
Best time for 2048K FFT length: 72.495 ms.
Best time for 2560K FFT length: 96.006 ms.
Best time for 3072K FFT length: 115.202 ms.
Best time for 3584K FFT length: 138.457 ms.
Best time for 4096K FFT length: 155.151 ms.
Best time for 5120K FFT length: 196.918 ms.
Best time for 6144K FFT length: 248.199 ms.
Best time for 7168K FFT length: 298.390 ms.
Best time for 8192K FFT length: 332.971 ms.

64-bit 25.9 (about 2-3% faster than 25.8):
Best time for 768K FFT length: 24.622 ms.
Best time for 896K FFT length: 29.896 ms.
Best time for 1024K FFT length: 34.275 ms.
Best time for 1280K FFT length: 43.122 ms.
Best time for 1536K FFT length: 51.935 ms.
Best time for 1792K FFT length: 62.296 ms.
Best time for 2048K FFT length: 69.477 ms.
Best time for 2560K FFT length: 92.739 ms.
Best time for 3072K FFT length: 111.242 ms.
Best time for 3584K FFT length: 133.324 ms.
Best time for 4096K FFT length: 149.147 ms.
Best time for 5120K FFT length: 190.676 ms.
Best time for 6144K FFT length: 237.457 ms.
Best time for 7168K FFT length: 285.003 ms.
Best time for 8192K FFT length: 317.054 ms.

Last fiddled with by Prime95 on 2009-01-12 at 21:34
Old 2009-01-12, 21:32   #3
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

11×673 Posts

I'm running some QA and torture tests now. It looks like the new version is safe to use - though you might want to copy your current save files just in case some obscure bug is uncovered in the next several days.

Linux and Mac versions will be available someday. There will also be a 64-bit Mac version.
Old 2009-01-12, 22:13   #4
Mini-Geek
Account Deleted
 
"Tim Sorbera"
Aug 2006
San Antonio, TX USA

17·251 Posts

All runs are on 32-bit XP on a dual-core Athlon; CPU details are at the top of the results. It looks like 25.9 takes 99.797% of the time that 25.8 does for me (i.e. slightly faster on average than 25.8), so if that's not just random variation, it's negligible. I suppose no changes were made for Athlons.
25.9.2
Code:
AMD Athlon(tm) 64 X2 Dual Core Processor 4800+
CPU speed: 2505.93 MHz, 2 cores
CPU features: RDTSC, CMOV, Prefetch, 3DNow!, MMX, SSE, SSE2
L1 cache size: 64 KB
L2 cache size: 512 KB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
L1 TLBS: 32
L2 TLBS: 512
Prime95 32-bit version 25.9, RdtscTiming=1
Best time for 768K FFT length: 32.566 ms.
Best time for 896K FFT length: 39.064 ms.
Best time for 1024K FFT length: 43.812 ms.
Best time for 1280K FFT length: 55.958 ms.
Best time for 1536K FFT length: 68.876 ms.
Best time for 1792K FFT length: 82.434 ms.
Best time for 2048K FFT length: 92.325 ms.
Best time for 2560K FFT length: 124.386 ms.
Best time for 3072K FFT length: 152.826 ms.
Best time for 3584K FFT length: 184.338 ms.
Best time for 4096K FFT length: 204.152 ms.
Best time for 5120K FFT length: 276.092 ms.
Best time for 6144K FFT length: 341.204 ms.
Best time for 7168K FFT length: 423.880 ms.
Best time for 8192K FFT length: 486.829 ms.
Timing FFTs using 2 threads.
Best time for 768K FFT length: 21.015 ms.
Best time for 896K FFT length: 26.959 ms.
Best time for 1024K FFT length: 30.428 ms.
Best time for 1280K FFT length: 38.282 ms.
Best time for 1536K FFT length: 46.039 ms.
Best time for 1792K FFT length: 55.155 ms.
Best time for 2048K FFT length: 61.290 ms.
Best time for 2560K FFT length: 84.852 ms.
Best time for 3072K FFT length: 100.659 ms.
Best time for 3584K FFT length: 120.500 ms.
Best time for 4096K FFT length: 134.858 ms.
Best time for 5120K FFT length: 170.402 ms.
Best time for 6144K FFT length: 210.071 ms.
Best time for 7168K FFT length: 261.364 ms.
Best time for 8192K FFT length: 303.962 ms.
Best time for 58 bit trial factors: 4.985 ms.
Best time for 59 bit trial factors: 5.005 ms.
Best time for 60 bit trial factors: 5.035 ms.
Best time for 61 bit trial factors: 4.951 ms.
Best time for 62 bit trial factors: 4.999 ms.
Best time for 63 bit trial factors: 9.150 ms.
Best time for 64 bit trial factors: 9.241 ms.
Best time for 65 bit trial factors: 11.622 ms.
Best time for 66 bit trial factors: 11.802 ms.
Best time for 67 bit trial factors: 11.620 ms.
25.8.5
Code:
AMD Athlon(tm) 64 X2 Dual Core Processor 4800+
CPU speed: 2506.35 MHz, 2 cores
CPU features: RDTSC, CMOV, Prefetch, 3DNow!, MMX, SSE, SSE2
L1 cache size: 64 KB
L2 cache size: 512 KB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
L1 TLBS: 32
L2 TLBS: 512
Prime95 32-bit version 25.8, RdtscTiming=1
Best time for 768K FFT length: 33.020 ms.
Best time for 896K FFT length: 39.551 ms.
Best time for 1024K FFT length: 44.099 ms.
Best time for 1280K FFT length: 55.532 ms.
Best time for 1536K FFT length: 68.342 ms.
Best time for 1792K FFT length: 83.123 ms.
Best time for 2048K FFT length: 92.548 ms.
Best time for 2560K FFT length: 124.195 ms.
Best time for 3072K FFT length: 152.490 ms.
Best time for 3584K FFT length: 184.992 ms.
Best time for 4096K FFT length: 205.550 ms.
Best time for 5120K FFT length: 277.408 ms.
Best time for 6144K FFT length: 343.343 ms.
Best time for 7168K FFT length: 422.755 ms.
Best time for 8192K FFT length: 485.835 ms.
Timing FFTs using 2 threads.
Best time for 768K FFT length: 21.147 ms.
Best time for 896K FFT length: 27.116 ms.
Best time for 1024K FFT length: 30.697 ms.
Best time for 1280K FFT length: 38.182 ms.
Best time for 1536K FFT length: 45.842 ms.
Best time for 1792K FFT length: 54.445 ms.
Best time for 2048K FFT length: 61.066 ms.
Best time for 2560K FFT length: 83.925 ms.
Best time for 3072K FFT length: 102.946 ms.
Best time for 3584K FFT length: 121.508 ms.
Best time for 4096K FFT length: 137.034 ms.
Best time for 5120K FFT length: 172.673 ms.
Best time for 6144K FFT length: 215.655 ms.
Best time for 7168K FFT length: 260.716 ms.
Best time for 8192K FFT length: 304.255 ms.
Best time for 58 bit trial factors: 4.999 ms.
Best time for 59 bit trial factors: 4.950 ms.
Best time for 60 bit trial factors: 4.927 ms.
Best time for 61 bit trial factors: 4.963 ms.
Best time for 62 bit trial factors: 5.013 ms.
Best time for 63 bit trial factors: 9.172 ms.
Best time for 64 bit trial factors: 9.125 ms.
Best time for 65 bit trial factors: 11.799 ms.
Best time for 66 bit trial factors: 11.643 ms.
Best time for 67 bit trial factors: 11.673 ms.
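
One way to boil two runs down to a single figure is the geometric mean of the per-length time ratios. The Python sketch below is only an illustration of that approach; it is not necessarily how the 99.797% figure above was computed, and the function name is hypothetical. The sample values are the single-threaded 768K, 1280K and 8192K timings from the two runs above.

```python
import math

def overall_time_ratio(old_ms, new_ms):
    """Geometric mean of new/old time ratios; below 1.0 means the new build is faster."""
    logs = [math.log(new / old) for old, new in zip(old_ms, new_ms)]
    return math.exp(sum(logs) / len(logs))

# Single-threaded 768K, 1280K and 8192K timings (ms) from the listings above.
v258 = [33.020, 55.532, 485.835]
v259 = [32.566, 55.958, 486.829]

ratio = overall_time_ratio(v258, v259)
print(f"25.9 takes {100 * ratio:.2f}% of the time 25.8 takes")
```

A geometric mean treats a 1% change at 768K the same as a 1% change at 8192K, whereas a ratio of summed times would let the largest FFT lengths dominate.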

Last fiddled with by Mini-Geek on 2009-01-12 at 22:16
Old 2009-01-12, 23:00   #5
mdettweiler
A Sunny Moo
 
Aug 2007
USA (GMT-5)

3·2,083 Posts

Quote:
Originally Posted by Prime95
Pentium 4 - 2.8 GHz

...

64-bit 25.9 (about 2-3% faster than 25.8):

Eh? I thought Pentium 4's were 32-bit only?
Old 2009-01-12, 23:43   #6
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

11·673 Posts

Quote:
Originally Posted by mdettweiler
Eh? I thought Pentium 4's were 32-bit only?

Some P4's are 64-bit. I think Intel called it EMT64.
Old 2009-01-13, 01:22   #7
sdbardwick
 
Aug 2002
North San Diego County

2×11×31 Posts

IIRC it was/is called EM64T. The Pentium D (P4 based) series (some of them, anyway) supported 64 bit mode. My old 820 did.
Old 2009-01-13, 01:46   #8
eugene2x
 
Dec 2008
LA, CA

5 Posts

Running 32-bit:

25.8
Code:
Intel(R) Core(TM)2 Duo CPU     E8200  @ 2.66GHz
CPU speed: 3200.05 MHz, 2 cores
CPU features: RDTSC, CMOV, Prefetch, MMX, SSE, SSE2, SSE4
L1 cache size: 32 KB
L2 cache size: 6 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 256
Prime95 32-bit version 25.8, RdtscTiming=1
Best time for 768K FFT length: 12.783 ms.
Best time for 896K FFT length: 15.596 ms.
Best time for 1024K FFT length: 17.720 ms.
Best time for 1280K FFT length: 22.208 ms.
Best time for 1536K FFT length: 27.121 ms.
Best time for 1792K FFT length: 32.647 ms.
Best time for 2048K FFT length: 36.619 ms.
Best time for 2560K FFT length: 47.733 ms.
Best time for 3072K FFT length: 58.073 ms.
Best time for 3584K FFT length: 69.316 ms.
Best time for 4096K FFT length: 77.711 ms.
Best time for 5120K FFT length: 99.462 ms.
Best time for 6144K FFT length: 119.678 ms.
Best time for 7168K FFT length: 145.219 ms.
Best time for 8192K FFT length: 158.250 ms.
Timing FFTs using 2 threads.
Best time for 768K FFT length: 6.835 ms.
Best time for 896K FFT length: 8.150 ms.
Best time for 1024K FFT length: 9.626 ms.
Best time for 1280K FFT length: 11.679 ms.
Best time for 1536K FFT length: 14.252 ms.
Best time for 1792K FFT length: 17.204 ms.
Best time for 2048K FFT length: 19.232 ms.
Best time for 2560K FFT length: 25.397 ms.
Best time for 3072K FFT length: 30.777 ms.
Best time for 3584K FFT length: 36.515 ms.
Best time for 4096K FFT length: 41.279 ms.
Best time for 5120K FFT length: 52.351 ms.
Best time for 6144K FFT length: 63.872 ms.
Best time for 7168K FFT length: 76.976 ms.
Best time for 8192K FFT length: 85.337 ms.
Best time for 58 bit trial factors: 3.230 ms.
Best time for 59 bit trial factors: 3.218 ms.
Best time for 60 bit trial factors: 3.241 ms.
Best time for 61 bit trial factors: 3.219 ms.
Best time for 62 bit trial factors: 3.239 ms.
Best time for 63 bit trial factors: 5.488 ms.
Best time for 64 bit trial factors: 5.474 ms.
Best time for 65 bit trial factors: 5.075 ms.
Best time for 66 bit trial factors: 5.042 ms.
Best time for 67 bit trial factors: 5.052 ms.
25.9
Code:
Intel(R) Core(TM)2 Duo CPU     E8200  @ 2.66GHz
CPU speed: 3200.15 MHz, 2 cores
CPU features: RDTSC, CMOV, Prefetch, MMX, SSE, SSE2, SSE4
L1 cache size: 32 KB
L2 cache size: 6 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 256
Prime95 32-bit version 25.9, RdtscTiming=1
Best time for 768K FFT length: 12.866 ms.
Best time for 896K FFT length: 15.594 ms.
Best time for 1024K FFT length: 17.487 ms.
Best time for 1280K FFT length: 22.540 ms.
Best time for 1536K FFT length: 27.465 ms.
Best time for 1792K FFT length: 33.039 ms.
Best time for 2048K FFT length: 36.867 ms.
Best time for 2560K FFT length: 48.125 ms.
Best time for 3072K FFT length: 58.129 ms.
Best time for 3584K FFT length: 69.402 ms.
Best time for 4096K FFT length: 78.497 ms.
Best time for 5120K FFT length: 99.943 ms.
Best time for 6144K FFT length: 120.046 ms.
Best time for 7168K FFT length: 146.367 ms.
Best time for 8192K FFT length: 160.378 ms.
Timing FFTs using 2 threads.
Best time for 768K FFT length: 6.831 ms.
Best time for 896K FFT length: 8.409 ms.
Best time for 1024K FFT length: 9.965 ms.
Best time for 1280K FFT length: 12.294 ms.
Best time for 1536K FFT length: 14.701 ms.
Best time for 1792K FFT length: 17.878 ms.
Best time for 2048K FFT length: 20.161 ms.
Best time for 2560K FFT length: 26.504 ms.
Best time for 3072K FFT length: 32.357 ms.
Best time for 3584K FFT length: 38.598 ms.
Best time for 4096K FFT length: 43.475 ms.
Best time for 5120K FFT length: 55.813 ms.
Best time for 6144K FFT length: 66.970 ms.
Best time for 7168K FFT length: 81.612 ms.
Best time for 8192K FFT length: 89.446 ms.
Best time for 58 bit trial factors: 3.287 ms.
Best time for 59 bit trial factors: 3.248 ms.
Best time for 60 bit trial factors: 3.232 ms.
Best time for 61 bit trial factors: 3.280 ms.
Best time for 62 bit trial factors: 3.267 ms.
Best time for 63 bit trial factors: 5.572 ms.
Best time for 64 bit trial factors: 5.586 ms.
Best time for 65 bit trial factors: 5.162 ms.
Best time for 66 bit trial factors: 5.126 ms.
Best time for 67 bit trial factors: 5.118 ms.
I have the same applications running each time and I don't see a difference. Actually I see that the results are now slightly worse.
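
To compare how well each build scales to two threads, a rough Python sketch like the one below can split the output at the "Timing FFTs using N threads." marker and divide the single-thread time by the two-thread time. It assumes that timings printed before the first marker are single-threaded; the function names are hypothetical, and only the 8192K lines from the two runs above are used as sample input.

```python
import re

FFT_LINE = re.compile(r"Best time for (\d+)K FFT length: ([\d.]+) ms\.")
THREADS_LINE = re.compile(r"Timing FFTs using (\d+) threads\.")

def parse_by_threads(text):
    """Return {thread_count: {fft_length_in_K: best_time_ms}}."""
    threads = 1  # assumption: timings before any thread marker are single-threaded
    result = {}
    for line in text.splitlines():
        m = THREADS_LINE.search(line)
        if m:
            threads = int(m.group(1))
            continue
        m = FFT_LINE.search(line)
        if m:
            result.setdefault(threads, {})[int(m.group(1))] = float(m.group(2))
    return result

def speedup(times, threads):
    """Single-thread time divided by N-thread time for each FFT length."""
    multi = times.get(threads, {})
    return {k: times[1][k] / multi[k] for k in times[1] if k in multi}

log_258 = """Best time for 8192K FFT length: 158.250 ms.
Timing FFTs using 2 threads.
Best time for 8192K FFT length: 85.337 ms.
"""
log_259 = """Best time for 8192K FFT length: 160.378 ms.
Timing FFTs using 2 threads.
Best time for 8192K FFT length: 89.446 ms.
"""

for name, log in (("25.8", log_258), ("25.9", log_259)):
    s = speedup(parse_by_threads(log), 2)
    print(f"{name}: 2-thread speedup at 8192K = {s[8192]:.2f}x")
```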

Last fiddled with by eugene2x on 2009-01-13 at 01:48 Reason: somehow repeated O_o
Old 2009-01-13, 02:06   #9
ATH
Einyen
 
Dec 2003
Denmark

101111111111₂ Posts

Core2Quad (Yorkfield) Q9450 2.66 GHz, Windows XP 64bit:
25.8.5 32bit: Q9450 25_8_5 32bit.txt
25.8.4 64bit: Q9450 25_8_4 64bit.txt (0-0.5% faster than 32bit)
25.9.2 32bit: Q9450 25_9_2 32bit.txt (3-4% faster than 25.8.5 32bit for 1 thread, same or 1-2% slower at 2-4 threads)
25.9.2 64bit: Q9450 25_9_2 64bit.txt (10-12% faster than 25.8.4 64bit for 1 thread, 3-5% faster for 2 threads, 0-1% faster for 3-4 threads)

For 32bit both 25.8 and 25.9 had a rise in time for 1024K FFT compared to 768K and 896K.

Last fiddled with by ATH on 2009-01-13 at 02:13
Old 2009-01-13, 02:12   #10
ATH
Einyen
 
Dec 2003
Denmark

37·83 Posts

Mobile Core2Duo T7300 (Merom) 2.00 GHz, Windows Vista 32bit.
25.8.5 32bit: T7300 25_8_5 32bit.txt
25.9.2 32bit: T7300 25_9_2 32bit.txt (2-3% faster than 25.8.5 for 1 thread, about same speed as 25.8.5 for 2 threads)

Again a small bump at 1024K FFT.
Old 2009-01-13, 02:19   #11
ATH
Einyen
 
Dec 2003
Denmark

37·83 Posts

Pentium 4 550 (Prescott) 3.4 GHz, Windows XP 32bit
25.8.5 32bit: 3400 25_8_5 32bit.txt
25.9.2 32bit: 3400 25_9_2 32bit.txt (0-1% faster than 25.8 for 1 thread, 0-1% slower for 2 threads)

No spike at 1024K FFT.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.