mersenneforum.org  

Old 2004-12-09, 14:41   #23
Paulie
 
 
Aug 2002

223 Posts
Default Pentium M 1.7 (Thinkpad T42) (2048 cache?)

Intel(R) Pentium(R) M processor 1.70GHz
CPU speed: 1698.55 MHz
CPU features: RDTSC, CMOV, PREFETCH, MMX, SSE, SSE2
L1 cache size: 32 KB
L2 cache size: unknown
L1 cache line size: 64 bytes
L2 cache line size: unknown
TLBS: 128
Prime95 version 23.7, RdtscTiming=1
Best time for 512K FFT length: 45.599 ms.
Best time for 640K FFT length: 57.303 ms.
Best time for 768K FFT length: 70.191 ms.
Best time for 896K FFT length: 84.139 ms.
Best time for 1024K FFT length: 94.577 ms.
Best time for 1280K FFT length: 122.043 ms.
Best time for 1536K FFT length: 149.250 ms.
Best time for 1792K FFT length: 177.972 ms.
Best time for 2048K FFT length: 199.492 ms.


Intel(R) Pentium(R) M processor 1.70GHz
CPU speed: 1698.51 MHz
CPU features: RDTSC, CMOV, PREFETCH, MMX, SSE, SSE2
L1 cache size: 32 KB
L2 cache size: unknown
L1 cache line size: 64 bytes
L2 cache line size: unknown
TLBS: 128
Prime95 version 23.8, RdtscTiming=1
Best time for 512K FFT length: 45.489 ms.
Best time for 640K FFT length: 57.543 ms.
Best time for 768K FFT length: 70.479 ms.
Best time for 896K FFT length: 84.352 ms.
Best time for 1024K FFT length: 95.030 ms.
Best time for 1280K FFT length: 122.612 ms.
Best time for 1536K FFT length: 149.297 ms.
Best time for 1792K FFT length: 178.057 ms.
Best time for 2048K FFT length: 199.734 ms.


Intel(R) Pentium(R) M processor 1.70GHz
CPU speed: 1698.46 MHz
CPU features: RDTSC, CMOV, PREFETCH, MMX, SSE, SSE2
L1 cache size: 32 KB
L2 cache size: unknown
L1 cache line size: 64 bytes
L2 cache line size: unknown
TLBS: 128
Prime95 version 24.6, RdtscTiming=1
Best time for 512K FFT length: 45.619 ms.
Best time for 640K FFT length: 57.735 ms.
Best time for 768K FFT length: 71.017 ms.
Best time for 896K FFT length: 89.719 ms.
Best time for 1024K FFT length: 100.478 ms.
Best time for 1280K FFT length: 130.460 ms.
Best time for 1536K FFT length: 160.626 ms.
Best time for 1792K FFT length: 189.422 ms.
Best time for 2048K FFT length: 210.682 ms.

Old 2004-12-09, 19:03   #24
Prime95
P90 years forever!
 
 
Aug 2002
Yeehaw, FL

7,537 Posts
Default

Paulie, how does the Pentium-M compare if you put "CPUSupportsSSE2=0" in local.ini?
Old 2004-12-09, 19:10   #25
flava
 
 
Feb 2003

2·59 Posts
Default

Works like a charm on my laptop. Thank you! :)

mobile AMD Athlon(tm) XP2800+
CPU speed: 2119.96 MHz
CPU features: RDTSC, CMOV, PREFETCH, MMX, SSE
L1 cache size: 64 KB
L2 cache size: 512 KB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
L1 TLBS: 32
L2 TLBS: 256


Prime95 version 23.7, RdtscTiming=1

Best time for 512K FFT length: 45.603 ms.
Best time for 640K FFT length: 58.825 ms.
Best time for 768K FFT length: 70.577 ms.
Best time for 896K FFT length: 82.925 ms.
Best time for 1024K FFT length: 93.971 ms.
Best time for 1280K FFT length: 129.194 ms.
Best time for 1536K FFT length: 147.872 ms.
Best time for 1792K FFT length: 188.165 ms.
Best time for 2048K FFT length: 198.254 ms.



Prime95 version 24.6, RdtscTiming=1

Best time for 512K FFT length: 32.047 ms.
Best time for 640K FFT length: 42.981 ms.
Best time for 768K FFT length: 52.466 ms.
Best time for 896K FFT length: 63.637 ms.
Best time for 1024K FFT length: 71.055 ms.
Best time for 1280K FFT length: 95.804 ms.
Best time for 1536K FFT length: 114.782 ms.
Best time for 1792K FFT length: 138.144 ms.
Best time for 2048K FFT length: 154.963 ms.
Old 2004-12-09, 19:26   #26
SalemTheCat100
 
Oct 2002

2·13 Posts
Default Very Pleased, Indeed!!!

Quote:
Originally Posted by sdbardwick
Neat!
I'll switch my Athlon 1900MP (2x 1.6GHz) box over to double checking as soon as the current factoring assignment finishes. Probably take a little less than a month for the first results.

Given the various sizes of L1/L2 cache in the Athlon/Duron/Sempron processors, is the new code optimized for one version in particular?

Would you like benchmarks posted?

I'm sure SalemTheCat100 will be pleased

-Scott-
Thanks for the effort in supporting the AMD Athlon processors.

Salem
Old 2004-12-09, 19:39   #27
Wolf
 
 
Jul 2003
UK

3·17 Posts
Default

I think George is being too modest only claiming a 15% improvement!

AMD Athlon(TM) XP2100+
CPU speed: 1746.03 MHz
CPU features: RDTSC, CMOV, PREFETCH, MMX, SSE
L1 cache size: 64 KB
L2 cache size: 256 KB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
L1 TLBS: 32
L2 TLBS: 256


Prime95 version 23.8, RdtscTiming=1

Best time for 512K FFT length: 55.348 ms.
Best time for 640K FFT length: 72.015 ms.
Best time for 768K FFT length: 85.664 ms.
Best time for 896K FFT length: 101.333 ms.
Best time for 1024K FFT length: 115.199 ms.
Best time for 1280K FFT length: 154.964 ms.
Best time for 1536K FFT length: 179.477 ms.
Best time for 1792K FFT length: 229.369 ms.
Best time for 2048K FFT length: 258.630 ms.


Prime95 version 24.6, RdtscTiming=1

Best time for 512K FFT length: 38.292 ms.
Best time for 640K FFT length: 51.586 ms.
Best time for 768K FFT length: 62.646 ms.
Best time for 896K FFT length: 76.130 ms.
Best time for 1024K FFT length: 85.223 ms.
Best time for 1280K FFT length: 115.138 ms.
Best time for 1536K FFT length: 139.228 ms.
Best time for 1792K FFT length: 167.028 ms.
Best time for 2048K FFT length: 187.977 ms.
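
As a rough check of the figures above (not part of the original post), the per-length improvement can be worked out directly from the two timing lists. The dictionary names and the helper `improvement_pct` are made up for this illustration; the numbers are copied from the benchmark output.

```python
# Timings in ms, copied from the two benchmark runs above (Athlon XP2100+).
# Keys are FFT lengths in K; the variable names are illustrative only.
v23_8 = {512: 55.348, 640: 72.015, 768: 85.664, 896: 101.333, 1024: 115.199,
         1280: 154.964, 1536: 179.477, 1792: 229.369, 2048: 258.630}
v24_6 = {512: 38.292, 640: 51.586, 768: 62.646, 896: 76.130, 1024: 85.223,
         1280: 115.138, 1536: 139.228, 1792: 167.028, 2048: 187.977}


def improvement_pct(old_ms, new_ms):
    """Percentage reduction in time per iteration going from old to new."""
    return 100.0 * (old_ms - new_ms) / old_ms


gains = {k: improvement_pct(v23_8[k], v24_6[k]) for k in v23_8}

for fft_k, pct in gains.items():
    print(f"{fft_k:>5}K: {pct:5.1f}% less time per iteration")
```

On these numbers every FFT length comes out well above the 15% mentioned.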
Old 2004-12-09, 20:03   #28
SalemTheCat100
 
Oct 2002

2·13 Posts
Default Results for AMD Athlon 64 3400+

Some observations:

Version 24.6 without SSE2 is about 4.5% faster than Version 23.8.

Version 23.8 without SSE2 is slower whereas 24.6 is faster without SSE2.

Is there an optimized Athlon64 version in the works?

SALEM

-----------------------------------------------------------------------

Version 23.8.1.0

AMD Athlon(tm) 64 Processor 3400+
CPU speed: 2410.86 MHz
CPU features: RDTSC, CMOV, PREFETCH, MMX, SSE, SSE2
L1 cache size: 64 KB
L2 cache size: 512 KB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
L1 TLBS: 32
L2 TLBS: 512
Prime95 version 23.8, RdtscTiming=1
Best time for 384K FFT length: 20.134 ms.
Best time for 448K FFT length: 24.076 ms.
Best time for 512K FFT length: 26.763 ms.
Best time for 640K FFT length: 33.394 ms.
Best time for 768K FFT length: 40.534 ms.
Best time for 896K FFT length: 49.038 ms.
Best time for 1024K FFT length: 55.168 ms.
Best time for 1280K FFT length: 73.898 ms.
Best time for 1536K FFT length: 88.287 ms.
Best time for 1792K FFT length: 107.433 ms.
Best time for 2048K FFT length: 121.641 ms.

No SSE2

AMD Athlon(tm) 64 Processor 3400+
CPU speed: 2410.69 MHz
CPU features: RDTSC, CMOV, PREFETCH, MMX, SSE
L1 cache size: 64 KB
L2 cache size: 512 KB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
L1 TLBS: 32
L2 TLBS: 512
Prime95 version 23.8, RdtscTiming=1
Best time for 384K FFT length: 22.612 ms.
Best time for 448K FFT length: 25.288 ms.
Best time for 512K FFT length: 27.608 ms.
Best time for 640K FFT length: 36.565 ms.
Best time for 768K FFT length: 43.933 ms.
Best time for 896K FFT length: 52.595 ms.
Best time for 1024K FFT length: 58.794 ms.
Best time for 1280K FFT length: 74.902 ms.
Best time for 1536K FFT length: 90.965 ms.
Best time for 1792K FFT length: 109.248 ms.
Best time for 2048K FFT length: 123.113 ms.

Version 24.6.1.0

AMD Athlon(tm) 64 Processor 3400+
CPU speed: 2410.87 MHz
CPU features: RDTSC, CMOV, PREFETCH, MMX, SSE, SSE2
L1 cache size: 64 KB
L2 cache size: 512 KB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
L1 TLBS: 32
L2 TLBS: 512
Prime95 version 24.6, RdtscTiming=1
Best time for 512K FFT length: 26.847 ms.
Best time for 640K FFT length: 33.750 ms.
Best time for 768K FFT length: 40.709 ms.
Best time for 896K FFT length: 49.252 ms.
Best time for 1024K FFT length: 55.465 ms.
Best time for 1280K FFT length: 74.106 ms.
Best time for 1536K FFT length: 88.746 ms.
Best time for 1792K FFT length: 107.562 ms.
Best time for 2048K FFT length: 121.940 ms.

No SSE2

AMD Athlon(tm) 64 Processor 3400+
CPU speed: 2410.92 MHz
CPU features: RDTSC, CMOV, PREFETCH, MMX, SSE
L1 cache size: 64 KB
L2 cache size: 512 KB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
L1 TLBS: 32
L2 TLBS: 512
Prime95 version 24.6, RdtscTiming=1
Best time for 512K FFT length: 25.206 ms.
Best time for 640K FFT length: 33.701 ms.
Best time for 768K FFT length: 40.126 ms.
Best time for 896K FFT length: 48.847 ms.
Best time for 1024K FFT length: 54.395 ms.
Best time for 1280K FFT length: 72.521 ms.
Best time for 1536K FFT length: 87.162 ms.
Best time for 1792K FFT length: 104.300 ms.
Best time for 2048K FFT length: 116.092 ms.
Old 2004-12-09, 20:15   #29
Paulie
 
 
Aug 2002

223 Posts
Default George - Rerun with CPUSupportsSSE2=0

Much better!!!

Intel(R) Pentium(R) M processor 1600MHz
CPU speed: 1594.81 MHz
CPU features: RDTSC, CMOV, PREFETCH, MMX, SSE
L1 cache size: 32 KB
L2 cache size: 1024 KB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 128
Prime95 version 23.7, RdtscTiming=1
Best time for 512K FFT length: 43.403 ms.
Best time for 640K FFT length: 57.332 ms.
Best time for 768K FFT length: 69.735 ms.
Best time for 896K FFT length: 83.268 ms.
Best time for 1024K FFT length: 94.354 ms.
Best time for 1280K FFT length: 123.936 ms.
Best time for 1536K FFT length: 147.184 ms.
Best time for 1792K FFT length: 180.734 ms.
Best time for 2048K FFT length: 199.028 ms.


Intel(R) Pentium(R) M processor 1600MHz
CPU speed: 1594.67 MHz
CPU features: RDTSC, CMOV, PREFETCH, MMX, SSE
L1 cache size: 32 KB
L2 cache size: 1024 KB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 128
Prime95 version 23.8, RdtscTiming=1
Best time for 512K FFT length: 43.555 ms.
Best time for 640K FFT length: 58.165 ms.
Best time for 768K FFT length: 70.387 ms.
Best time for 896K FFT length: 84.209 ms.
Best time for 1024K FFT length: 95.294 ms.
Best time for 1280K FFT length: 125.689 ms.
Best time for 1536K FFT length: 148.689 ms.
Best time for 1792K FFT length: 182.862 ms.
Best time for 2048K FFT length: 199.285 ms.



Intel(R) Pentium(R) M processor 1600MHz
CPU speed: 1594.73 MHz
CPU features: RDTSC, CMOV, PREFETCH, MMX, SSE
L1 cache size: 32 KB
L2 cache size: 1024 KB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 128
Prime95 version 24.6, RdtscTiming=1
Best time for 512K FFT length: 42.945 ms.
Best time for 640K FFT length: 57.472 ms.
Best time for 768K FFT length: 70.577 ms.
Best time for 896K FFT length: 84.819 ms.
Best time for 1024K FFT length: 95.173 ms.
Best time for 1280K FFT length: 120.814 ms.
Best time for 1536K FFT length: 149.215 ms.
Best time for 1792K FFT length: 180.982 ms.
Best time for 2048K FFT length: 200.603 ms.

Last fiddled with by Paulie on 2004-12-09 at 20:18
Old 2004-12-09, 20:31   #30
Paulie
 
 
Aug 2002

223 Posts
Default George - Rerun with CPUSupportsSSE2=0

The T42, which has a different CPU than my T41:

Intel(R) Pentium(R) M processor 1.70GHz
CPU speed: 1698.51 MHz
CPU features: RDTSC, CMOV, PREFETCH, MMX, SSE
L1 cache size: 32 KB
L2 cache size: unknown
L1 cache line size: 64 bytes
L2 cache line size: unknown
TLBS: 128
Prime95 version 23.7, RdtscTiming=1
Best time for 512K FFT length: 39.832 ms.
Best time for 640K FFT length: 54.439 ms.
Best time for 768K FFT length: 66.672 ms.
Best time for 896K FFT length: 80.723 ms.
Best time for 1024K FFT length: 91.565 ms.
Best time for 1280K FFT length: 118.482 ms.
Best time for 1536K FFT length: 141.612 ms.
Best time for 1792K FFT length: 175.459 ms.
Best time for 2048K FFT length: 190.218 ms.

Intel(R) Pentium(R) M processor 1.70GHz
CPU speed: 1698.57 MHz
CPU features: RDTSC, CMOV, PREFETCH, MMX, SSE
L1 cache size: 32 KB
L2 cache size: unknown
L1 cache line size: 64 bytes
L2 cache line size: unknown
TLBS: 128
Prime95 version 23.8, RdtscTiming=1
Best time for 512K FFT length: 39.168 ms.
Best time for 640K FFT length: 53.837 ms.
Best time for 768K FFT length: 65.463 ms.
Best time for 896K FFT length: 79.924 ms.
Best time for 1024K FFT length: 92.466 ms.
Best time for 1280K FFT length: 120.630 ms.
Best time for 1536K FFT length: 142.399 ms.
Best time for 1792K FFT length: 172.767 ms.
Best time for 2048K FFT length: 187.741 ms.

Intel(R) Pentium(R) M processor 1.70GHz
CPU speed: 1698.51 MHz
CPU features: RDTSC, CMOV, PREFETCH, MMX, SSE
L1 cache size: 32 KB
L2 cache size: unknown
L1 cache line size: 64 bytes
L2 cache line size: unknown
TLBS: 128
Prime95 version 24.6, RdtscTiming=1
Best time for 512K FFT length: 37.492 ms.
Best time for 640K FFT length: 49.905 ms.
Best time for 768K FFT length: 60.386 ms.
Best time for 896K FFT length: 73.509 ms.
Best time for 1024K FFT length: 81.655 ms.
Best time for 1280K FFT length: 105.156 ms.
Best time for 1536K FFT length: 129.665 ms.
Best time for 1792K FFT length: 157.141 ms.
Best time for 2048K FFT length: 172.926 ms.
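
For anyone who wants to compare runs without eyeballing them, here is a small sketch that parses the benchmark lines and compares the v24.6 timings on this T42 with and without SSE2. The regular expression and the function name `parse_times` are assumptions made for this illustration, not anything Prime95 provides; only three FFT lengths are included to keep it short.

```python
import re

# Matches lines such as "Best time for 512K FFT length: 45.619 ms."
LINE_RE = re.compile(r"Best time for (\d+)K FFT length: ([\d.]+) ms\.")


def parse_times(text):
    """Return {FFT length in K: best time in ms} for each benchmark line."""
    return {int(k): float(ms) for k, ms in LINE_RE.findall(text)}


# v24.6 on the 1.70GHz Pentium M, SSE2 enabled (from the earlier post).
with_sse2 = parse_times("""
Best time for 512K FFT length: 45.619 ms.
Best time for 1024K FFT length: 100.478 ms.
Best time for 2048K FFT length: 210.682 ms.
""")

# v24.6 on the same machine with CPUSupportsSSE2=0 (from this post).
without_sse2 = parse_times("""
Best time for 512K FFT length: 37.492 ms.
Best time for 1024K FFT length: 81.655 ms.
Best time for 2048K FFT length: 172.926 ms.
""")

for fft_k in sorted(with_sse2):
    ratio = with_sse2[fft_k] / without_sse2[fft_k]
    print(f"{fft_k:>5}K: SSE2 run takes {ratio:.2f}x the time of the non-SSE2 run")
```

A ratio above 1.0 means the run with SSE2 disabled was the faster one for that FFT length.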

Old 2004-12-09, 20:37   #31
delta_t
 
 
Nov 2002
Anchorage, AK

3·7·17 Posts
Default

Interesting on the Pentium M. I know that the 23.8 version ran slower than an equally clocked Pentium 4 M. What is the reason for the improvement on the Pentium M on 24.6 (just curious)? Looks like laptops with Pentium M's should use 24.6?
Old 2004-12-09, 21:07   #32
Prime95
P90 years forever!
 
 
Aug 2002
Yeehaw, FL

1D71₁₆ Posts
Default

Quote:
Originally Posted by delta_t
Interesting on the Pentium M. I know that the 23.8 version ran slower than an equally clocked Pentium 4 M. What is the reason for the improvement on the Pentium M on 24.6 (just curious)? Looks like laptops with Pentium M's should use 24.6?
It's all about good cache usage. The P4 cache architecture, branch prediction, and out-of-order execution are so good you can pretend a Northwood is a CPU with a 512KB L1 cache.

The Athlon has substantial penalties for L2 cache accesses. So it is a CPU with a 64KB L1 cache.

The P3 and earlier have severe penalties for L2 cache access. Plus the CPU only has an 8KB or 16KB L1 cache. The Pentium-M seems to be a P3 with a 32KB L1 cache.

The size of the L1 cache governs how you should "split" a two-pass FFT. For example, a 1M FFT does 20 FFT levels. On a P3, you can do 8 levels entirely within the L1 cache. So it is best to do 12 levels in pass 1 and 8 levels in pass 2. Pass 1 won't fit in L1, but at least pass 2 will. This is what version 23 did.

In v24, the 1M FFT is done 10 levels in pass 1 and 10 levels in pass 2. The Athlon is happy because both pass 1 and pass 2 fit in the L1 cache. The P3 is unhappy because now neither fits in the L1 cache.

Your Pentium-M with a 32KB cache is like the Athlon - it likes the new 10-10 split. It doesn't like the SSE2 split because (for rather complicated reasons) SSE2 uses 2 or 4 times as much L1 cache space. The SSE2's 9 levels / 11 levels split doesn't fit in the L1 cache.
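
To make the split arithmetic concrete, here is a back-of-the-envelope sketch. It is not how Prime95 sizes its passes: it assumes a pass of k levels works on a block of 2^k points at 16 bytes per point (a complex pair of doubles) and ignores sin/cos tables, scratch space, and the extra SSE2 footprint. The function names and the bytes-per-point figure are hypothetical.

```python
import math

BYTES_PER_POINT = 16  # assumption: one complex value stored as two 8-byte doubles


def fft_levels(fft_len):
    """Number of radix-2 levels in a power-of-two FFT (1M -> 20)."""
    levels = int(math.log2(fft_len))
    if 2 ** levels != fft_len:
        raise ValueError("this sketch only handles power-of-two lengths")
    return levels


def block_kb(levels):
    """Approximate data touched by one block of a pass, in KB (model only)."""
    return (2 ** levels) * BYTES_PER_POINT / 1024


total = fft_levels(1024 * 1024)

# (pass 1 levels, pass 2 levels) for the splits described in the post.
splits = {"v23": (12, 8), "v24": (10, 10), "v24 SSE2": (9, 11)}

for name, (p1, p2) in splits.items():
    assert p1 + p2 == total  # every split must cover all 20 levels
    print(f"{name:9s} pass 1: {block_kb(p1):5.0f} KB   pass 2: {block_kb(p2):5.0f} KB")
```

Under these assumptions a 10-level block is about 16 KB, which sits inside a 32 KB or 64 KB L1 but would fill a 16 KB one completely. The SSE2 figures would still need scaling by the 2x or 4x factor before comparing them against the cache size.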

There are other factors at work too. V23 had a P3 TLB-friendly memory layout. The Athlon seems to prefer the new more linear approach. V24 supports copying pass 1 data to a contiguous scratch area; v23 only supported in-place operations.

All-in-all, use v24 with the CPUSupportsSSE2=0 option if that runs faster for you.

P.S. The Opteron remains an enigma. V24 seems to run little better than v23 with SSE2 support turned off. In fact, I think the Opteron now underperforms a similarly clocked AthlonXP.

Last fiddled with by Prime95 on 2004-12-09 at 21:09
Old 2004-12-09, 21:50   #33
gbvalor
 
 
Aug 2002

3×37 Posts
Default

Congratulations, George!

And thanks for your superb work. These last months I've been very busy and a bit out of Mersenne stuff, and today I've seen in the forum this 'incredible and even faster prime95'.

Waiting for the Linux mprime version.

Guillermo

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.