mersenneforum.org  

2008-05-19, 04:17   #1
ixfd64
Bemusing Prompter
"Danny"
Dec 2002
California

larger L2 cache, slower iterations?

The benchmarks page shows that many processors are actually faster than those that have the same clock speed but a larger L2 cache.

For example, the benchmarks page lists three 3.2 GHz Pentium 4 processors with 512 KB, 1,024 KB, and 2,048 KB of L2 cache, respectively. For an exponent in the 49.1-58.52M range, each iteration takes 0.0958 seconds on the processor with 512 KB of L2 cache, 0.1031 seconds on the one with 1,024 KB, and 0.0984 seconds on the one with 2,048 KB.

The processor with 512 KB of L2 cache is actually the fastest. Does anyone know why this is?
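To put the gaps in perspective, here is a small Python sketch that expresses each quoted iteration time as a percentage slowdown relative to the fastest entry. The three timings are the ones from the benchmarks page quoted above; the helper name `slowdown_pct` is just an illustrative choice.

```python
# Per-iteration times (seconds) for the three 3.2 GHz Pentium 4 entries
# quoted above (exponent range 49.1-58.52M), keyed by L2 cache size in KB.
times = {512: 0.0958, 1024: 0.1031, 2048: 0.0984}


def slowdown_pct(t: float, reference: float) -> float:
    """Percentage by which iteration time t exceeds the reference time."""
    return (t - reference) / reference * 100.0


fastest = min(times.values())

for cache_kb, t in sorted(times.items()):
    print(f"{cache_kb:>5} KB L2: {t:.4f} s/iter, "
          f"{slowdown_pct(t, fastest):.1f}% slower than the fastest")
```

With these figures the 1,024 KB part comes out about 7.6% slower and the 2,048 KB part about 2.7% slower than the 512 KB part.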

Last fiddled with by ixfd64 on 2008-05-19 at 04:31
2008-05-19, 04:54   #2
sdbardwick
Aug 2002
North San Diego County

512K cache P4s are Northwood cores, while those with 1024KB and higher are Prescott cores. Northwoods are more efficient per clock. The discrepancy between the Prescott core times could easily be due to different motherboards, chipsets, or memory timing configurations.
2008-05-19, 05:44   #3
S485122
Sep 2006
Brussels, Belgium

Quote:
Originally Posted by ixfd64
The benchmarks page shows that many processors are actually faster than those that have the same clock speed but a larger L2 cache.
...
The processor with 512 kb L2 cache is actually the fastest. Does anyone know why this is?
The benchmarks are a tool for George Woltman to optimize the Prime95 program (see ftp://mersenne.org/gimps/P4notes.doc). They were not intended for benchmarking processors, memory, and mainboards. If you do several consecutive runs of the benchmark, you will get values that can differ by 5% or more. After all, it gives you the BEST iteration time, not the average iteration time. The cache size values for the Core 2 Duos and Quads are not always correct in the table: the L2 cache is shared between two cores, and sometimes this is reflected in the benchmark page data and sometimes not. Finally, to compare apples with apples, one should have data on the memory and chipset used. I benchmarked the same motherboard and processor with three different memory speeds and found that the iteration times were inversely proportional to memory speed.
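The difference between a best-case and an average timing is easy to see with a quick sketch. The Python below times many repetitions of a workload and reports both the minimum and the mean. This is not how Prime95's benchmark is implemented; `dummy_work` is a hypothetical stand-in workload, not a Lucas-Lehmer iteration.

```python
import statistics
import time
from typing import Callable


def time_iterations(work: Callable[[], None], n_iters: int) -> list[float]:
    """Run work() n_iters times and return each wall-clock duration in seconds."""
    samples = []
    for _ in range(n_iters):
        start = time.perf_counter()
        work()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples: list[float]) -> dict[str, float]:
    """Compare the best (minimum) time with the mean over all samples."""
    best = min(samples)
    if best <= 0:
        raise ValueError("timings must be positive")
    mean = statistics.mean(samples)
    return {
        "best": best,
        "mean": mean,
        "stdev": statistics.stdev(samples) if len(samples) > 1 else 0.0,
        # How far the typical iteration sits above the best-case figure.
        "mean_above_best_pct": (mean - best) / best * 100.0,
    }


def dummy_work() -> None:
    # Hypothetical stand-in workload; NOT a Prime95/Lucas-Lehmer iteration.
    sum(i * i for i in range(20_000))


if __name__ == "__main__":
    for name, value in summarize(time_iterations(dummy_work, 200)).items():
        print(f"{name:>20}: {value:.6g}")
```

Reporting the mean (and spread) over many iterations, rather than only the minimum, gives a figure closer to what a machine sustains over a long run.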

If one wants real benchmarks to aid in choosing hardware, there should be an option to get average times over a significant number of iterations. For multicore processors, it should run on all the cores. My experience with NVidia SLI chipsets is that they do not scale well (four cores running together yield little more than two individual cores; Intel chipsets give about 3.2 cores...).
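One way to read "about 3.2 cores" is as total throughput expressed in single-core equivalents. The sketch below assumes every core runs the same test and that per-core iteration times are measured both with one core busy and with all cores busy. The timings used are hypothetical, chosen only to reproduce the 3.2 figure.

```python
def effective_cores(solo_time: float, loaded_time: float, n_cores: int) -> float:
    """Throughput of n_cores running together, in units of one core running alone.

    solo_time:   per-iteration time with only one core busy (seconds)
    loaded_time: per-iteration time on each core when all n_cores are busy
    """
    return n_cores * solo_time / loaded_time


def scaling_efficiency(solo_time: float, loaded_time: float, n_cores: int) -> float:
    """Fraction of ideal linear scaling achieved (1.0 = perfect)."""
    return effective_cores(solo_time, loaded_time, n_cores) / n_cores


# Hypothetical timings: 0.0500 s/iter alone, 0.0625 s/iter per core with all 4 busy.
cores = effective_cores(0.0500, 0.0625, 4)
print(f"{cores:.2f} effective cores, "
      f"{scaling_efficiency(0.0500, 0.0625, 4):.0%} efficiency")
```

On this reading, a chipset giving "little more than two" cores out of four would mean each core's iteration time nearly doubles when all four are busy.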

Jacob

Last fiddled with by S485122 on 2008-05-19 at 05:47 Reason: sdbardwick beat me to it and was more concise :-(
2008-05-19, 20:46   #4
Nelson
Apr 2008
Regensburg..^~^..Plzeň

Better iteration timings

Quote:
Originally Posted by sdbardwick
512K cache P4's are Northwood cores, while 1024KB and higher are Prescott cores. Northwoods are more efficient. The discrepancy between Prescott core times could easily be due to different motherboards, chipsets, or memory timing configuration.
I also discovered, to my great dismay, that a Northwood @ 3 GHz was about 5% faster than a Prescott @ 3.3 GHz. One other factor was that the Northwood was running XP Pro with SP1, while the Prescott was running SP2 plus additional background programs. According to "procexp", those programs were using 0.0% CPU, which probably just means that "procexp" couldn't detect any usage during its update interval; however, their memory use changed from time to time, so some CPU cycles had to be involved, lengthening the iteration times. SP2, with its "security enhancements", also added to the number of processes, reducing overall efficiency further. Both tests were on ASUS P4C800 Deluxe mainboards, the Northwood with 512 MB of RAM and the Prescott with 1024 MB. The FSB is essentially the same, with dual-channel memory at the same timings. Go figure! About the only measurable differences are the OS and Alcohol 120% running additional processes, so the number of processes on a system will have a considerable impact on Prime95 performance.
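Because the slower Prescott also had the higher clock, the per-clock gap is larger than the 5% headline figure. The sketch below assumes "5% faster" means 5% higher iteration throughput and that throughput scales as (work per clock) × (clock speed). Since the two systems also differed in OS service pack, background software, and RAM, the resulting ratio lumps all of those together rather than isolating the core design.

```python
def per_clock_ratio(speedup: float, clock_a_ghz: float, clock_b_ghz: float) -> float:
    """Work per clock cycle of CPU A relative to CPU B.

    speedup: throughput of A divided by throughput of B
             (1.05 means A completes iterations 5% faster than B).
    """
    # throughput = work_per_clock * clock, so
    # work_per_clock_A / work_per_clock_B = speedup * clock_B / clock_A
    return speedup * clock_b_ghz / clock_a_ghz


# Northwood @ 3.0 GHz reported ~5% faster than Prescott @ 3.3 GHz.
ratio = per_clock_ratio(1.05, 3.0, 3.3)
print(f"Northwood does about {(ratio - 1) * 100:.1f}% more work per clock")
```

Under those assumptions the Northwood system completes roughly 15.5% more work per clock cycle (1.05 × 3.3 / 3.0 ≈ 1.155).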

The upshot is to use a stripped-down OS with only essential services running to get the best timings. Services can be disabled to "show off" and reactivated later, but if you're interested in the greatest possible throughput, see what you can do without, including using a low-end graphics card with very minimal drivers and no DirectX, and use another PC for gaming and whatever else; but then the idea of using idle cycles wouldn't amount to a "hill of beans." A Linux system would probably give better results if its processes are really streamlined and it runs no graphical front end (GUI) such as KDE, which would slow everything back down again. Does a Linux expert have anything to say about the concept? Of course, "procexp" or "task mangler" shouldn't be running at the same time either; they are, after all, for troubleshooting and not regular work.

nelson
All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.