mersenneforum.org  


Old 2007-12-27, 04:56   #1
zs6nw
 
Dec 2007

2²×7 Posts
Core2 kicks Core4 silicon butt

A 2.4 GHz dual-core benches 3x faster than a 2.66 GHz quad-core...

That's hard to swallow - comments/suggestions appreciated:

Machine A: DG965WH (Dual Core)

Intel(R) Core(TM)2 CPU E6600 @ 2.40GHz
CPU speed: 2397.81 MHz
L1 cache size: 32 KB
L1 cache line size: 64 bytes
Prime95 32-bit version 24.14, RdtscTiming=1
Best time for 512K FFT length: 11.660 ms.
Best time for 640K FFT length: 15.684 ms.
Best time for 768K FFT length: 19.009 ms.
Best time for 896K FFT length: 22.677 ms.
Best time for 1024K FFT length: 25.295 ms.
Best time for 1280K FFT length: 31.690 ms.
Best time for 1536K FFT length: 38.507 ms.
Best time for 1792K FFT length: 45.824 ms.
Best time for 2048K FFT length: 51.163 ms.
Best time for 2560K FFT length: 66.867 ms.
Best time for 3072K FFT length: 82.881 ms.
Best time for 3584K FFT length: 100.318 ms.
Best time for 4096K FFT length: 112.121 ms.

Machine B: X6DHE-XB (Quad Core)

Intel(R) Xeon(R) CPU E5430 @ 2.66GHz
CPU speed: 2660.61 MHz
L1 cache size: 32 KB
L1 cache line size: 64 bytes
Prime95 32-bit version 24.14, RdtscTiming=1
Best time for 512K FFT length: 32.836 ms.
Best time for 640K FFT length: 44.817 ms.
Best time for 768K FFT length: 56.451 ms.
Best time for 896K FFT length: 67.419 ms.
Best time for 1024K FFT length: 78.335 ms.
Best time for 1280K FFT length: 97.779 ms.
Best time for 1536K FFT length: 119.686 ms.
Best time for 1792K FFT length: 141.810 ms.
Best time for 2048K FFT length: 156.129 ms.
Best time for 2560K FFT length: 199.925 ms.
Best time for 3072K FFT length: 242.916 ms.
Best time for 3584K FFT length: 296.264 ms.
Best time for 4096K FFT length: 336.755 ms.

On the dual-core, cores 1..2 were equally busy.
On the quad-core, cores 1..8 were equally busy.
(The quad-core machine runs two Xeons)
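
For the record, here's the arithmetic behind that "3x" - a minimal C sketch, with the 1024K and 4096K timings copied straight from the tables above:

Code:
#include <stdio.h>

/* Slowdown of the quad-core Xeon box vs. the Core2 Duo, using the
   per-iteration FFT timings posted above (in milliseconds). */
int main(void)
{
    const double core2_1024k = 25.295, xeon_1024k = 78.335;
    const double core2_4096k = 112.121, xeon_4096k = 336.755;

    printf("1024K FFT: Xeon %.2fx slower\n", xeon_1024k / core2_1024k);
    printf("4096K FFT: Xeon %.2fx slower\n", xeon_4096k / core2_4096k);
    return 0;
}

Both ratios land right around 3x.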
Old 2007-12-27, 09:53   #2
sdbardwick
 
Aug 2002
North San Diego County

821 Posts

Set processor affinity from the advanced menu, stop and restart Prime95 and retest.
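
If you're curious what affinity actually does under the hood, here's an illustrative C sketch (not Prime95's code) of pinning a process to core 0 on Linux via sched_setaffinity():

Code:
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/* Illustration only (not Prime95 source): pin the calling process to
   core 0, which is what a processor-affinity setting does for you. */
int main(void)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(0, &set);                     /* allow core 0 only */
    if (sched_setaffinity(0, sizeof set, &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    printf("pinned to core 0\n");
    return 0;
}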
Old 2007-12-27, 10:46   #3
zs6nw
 
Dec 2007

2²·7 Posts

Quote:
Originally Posted by sdbardwick
Set processor affinity from the advanced menu, stop and restart Prime95 and retest.
Thanks - each instance was pinned to a specific core (0..7), then all eight instances were stopped & restarted.

There was a 20% improvement, which is great, but things are still 2x slower than the 2.4 GHz Core2 machine.
Old 2007-12-27, 22:16   #4
Cruelty
 
May 2005

2²×11×37 Posts

What kind of RAM do you use in each system, and what are the chipsets?
I suspect the Xeon system does not have sufficient memory bandwidth...
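
One way to check that: run a minimal STREAM-style triad, one copy per core, and see whether the aggregate GB/s stops climbing as you add copies. A rough sketch:

Code:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Minimal STREAM-style triad: a[i] = b[i] + s*c[i] over arrays far
   larger than cache, to estimate sustained memory bandwidth.  If the
   aggregate rate across simultaneous copies stops rising as you add
   copies, the memory bus is saturated. */
#define N (16 * 1024 * 1024)              /* 3 x 128 MB of doubles */

int main(void)
{
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    struct timespec t0, t1;

    if (!a || !b || !c)
        return 1;
    for (long i = 0; i < N; i++) {
        b[i] = 1.0;
        c[i] = 2.0;
    }
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < N; i++)
        a[i] = b[i] + 3.0 * c[i];
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double s  = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    double gb = 3.0 * N * sizeof(double) / 1e9;   /* 2 reads + 1 write */
    printf("triad: %.2f GB/s\n", gb / s);
    free(a); free(b); free(c);
    return 0;
}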
Old 2007-12-27, 22:22   #5
fivemack
(loop (#_fork))
 
Feb 2006
Cambridge, England

2×7×461 Posts

May I ask exactly what motherboard and what memory configuration you're using with the dual Xeon E5430 chips?

I presume the chips are http://www.newegg.com/Product/Produc...82E16819117145 - the new 45nm low-power-consumption ones with the large caches, which ought to be spectacular performers.

But your original post says you're using an X6DHE-XB board, and as far as I can tell the Xeon 5430 chips don't fit in such a board (they're 771-pin chips, and if the board is http://supermicro.com/products/mothe...0/X6DHE-XB.cfm then it takes 604-pin chips); do you have some sort of adapter, or are you using a different board?
Old 2007-12-28, 00:00   #6
sdbardwick
 
Aug 2002
North San Diego County

1100110101₂ Posts

Following up on fivemack's inquiry: off the top of my head, those Xeon timings are about right for a two-socket, hyperthreaded dual-core system.

Try benchmarking with just one instance of Prime95 (with affinity set to core 0) so we can get a baseline for comparison.
Old 2007-12-28, 00:36   #7
db597
 
Jan 2003

2·103 Posts

Probably a memory bottleneck... more and more cores sharing a limited amount of bandwidth. Even with the Q6600 quads, we see performance level off once the 3rd core is loaded. And with two Xeons, you double the number of cores on top of that.

Prime95 is unique in that the CPU optimisation has been done so well that once you exceed 3 cores, the memory is maxed out. You don't see this in F@H.
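
Rough numbers behind that, as a back-of-envelope (the pass count is an assumption for illustration, not Prime95's real access pattern):

Code:
#include <stdio.h>

/* Back-of-envelope only; PASSES is a guess, not Prime95's actual
   memory traffic.  A 1024K FFT works on ~8 MB of doubles; suppose each
   iteration sweeps that data PASSES times, reading and writing it on
   each sweep. */
int main(void)
{
    const double data_gb = 1024 * 1024 * 8 / 1e9;  /* 1024K doubles */
    const int    passes  = 2;                      /* assumption */
    const double iter_s  = 25.295e-3;              /* 1024K, post #1 */

    double per_worker = data_gb * passes * 2 / iter_s; /* read+write */
    printf("~%.1f GB/s per worker, ~%.1f GB/s for 8 workers\n",
           per_worker, 8 * per_worker);
    return 0;
}

That comes to roughly 10-11 GB/s of demand from eight workers - about the peak of dual-channel DDR2-667, if that's what the box has.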

Last fiddled with by db597 on 2007-12-28 at 00:42
Old 2007-12-28, 09:28   #8
zs6nw
 
Dec 2007

2²×7 Posts

Yeah, most of you were right - memory bandwidth.

First, I had the motherboard wrong: it's a SuperMicro X7DCL-i, which uses the Intel 5100 controller. The drawback of this controller is that it funnels memory requests from both CPUs through a single memory path, so the memory bandwidth available to each CPU is effectively halved.

Fortunately, it seems newer motherboards use the Intel 5400 controller, which has two paths to RAM. I'm pretty sure this would cure my issue; of course, it would cost me a new motherboard...

For those who asked: I did run some single-thread benchmarks, and they were a little better than the Core2 Duo numbers posted earlier. Also, I used 2x 2GB DDR2-667 sticks, the fastest memory the motherboard supports.

Thanks for all the feedback. As far as I'm concerned the issue is closed: I used a motherboard that more than halved the available memory bandwidth of the E5430 processors.

Finally, I used the new mprime 25.5 to benchmark 1-8 threads:

[Thu Dec 27 19:52:47 2007]
Compare your results to other computers at http://www.mersenne.org/bench.htm
Intel(R) Xeon(R) CPU E5430 @ 2.66GHz
CPU speed: 2660.72 MHz, 8 cores
CPU features: RDTSC, CMOV, Prefetch, MMX, SSE, SSE2
L1 cache size: 32 KB
L2 cache size: 6144 KB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 256
Prime95 32-bit version 25.5, RdtscTiming=1
Best time for 768K FFT length: 15.942 ms.
Best time for 896K FFT length: 19.521 ms.
Best time for 1024K FFT length: 22.337 ms.
Best time for 1280K FFT length: 29.252 ms.
Best time for 1536K FFT length: 36.162 ms.
Best time for 1792K FFT length: 43.346 ms.
Best time for 2048K FFT length: 48.503 ms.
Best time for 2560K FFT length: 64.157 ms.
Best time for 3072K FFT length: 78.098 ms.
Best time for 3584K FFT length: 93.058 ms.
Best time for 4096K FFT length: 104.233 ms.
Best time for 5120K FFT length: 132.860 ms.
Best time for 6144K FFT length: 160.176 ms.
Best time for 7168K FFT length: 193.282 ms.
Best time for 8192K FFT length: 212.017 ms.
Timing FFTs using 2 threads.
Best time for 768K FFT length: 8.467 ms.
Best time for 896K FFT length: 10.374 ms.
Best time for 1024K FFT length: 12.455 ms.
Best time for 1280K FFT length: 15.679 ms.
Best time for 1536K FFT length: 19.330 ms.
Best time for 1792K FFT length: 23.183 ms.
Best time for 2048K FFT length: 26.141 ms.
Best time for 2560K FFT length: 34.424 ms.
Best time for 3072K FFT length: 42.137 ms.
Best time for 3584K FFT length: 49.854 ms.
Best time for 4096K FFT length: 56.262 ms.
Best time for 5120K FFT length: 72.103 ms.
Best time for 6144K FFT length: 86.699 ms.
Best time for 7168K FFT length: 104.224 ms.
Best time for 8192K FFT length: 115.461 ms.
Timing FFTs using 3 threads.
Best time for 768K FFT length: 10.528 ms.
Best time for 896K FFT length: 12.265 ms.
Best time for 1024K FFT length: 18.825 ms.
Best time for 1280K FFT length: 16.041 ms.
Best time for 1536K FFT length: 19.485 ms.
Best time for 1792K FFT length: 23.058 ms.
Best time for 2048K FFT length: 26.033 ms.
Best time for 2560K FFT length: 33.656 ms.
Best time for 3072K FFT length: 40.724 ms.
Best time for 3584K FFT length: 47.987 ms.
Best time for 4096K FFT length: 54.553 ms.
Best time for 5120K FFT length: 68.906 ms.
Best time for 6144K FFT length: 86.584 ms.
Best time for 7168K FFT length: 99.504 ms.
Best time for 8192K FFT length: 112.586 ms.
Timing FFTs using 4 threads.
Best time for 768K FFT length: 9.835 ms.
Best time for 896K FFT length: 10.840 ms.
Best time for 1024K FFT length: 16.450 ms.
Best time for 1280K FFT length: 12.603 ms.
Best time for 1536K FFT length: 15.426 ms.
Best time for 1792K FFT length: 18.184 ms.
Best time for 2048K FFT length: 20.463 ms.
Best time for 2560K FFT length: 26.485 ms.
Best time for 3072K FFT length: 32.119 ms.
Best time for 3584K FFT length: 37.634 ms.
Best time for 4096K FFT length: 43.152 ms.
Best time for 5120K FFT length: 53.849 ms.
Best time for 6144K FFT length: 65.468 ms.
Best time for 7168K FFT length: 78.517 ms.
Best time for 8192K FFT length: 88.147 ms.
Timing FFTs using 5 threads.
Best time for 768K FFT length: 10.280 ms.
Best time for 896K FFT length: 11.450 ms.
Best time for 1024K FFT length: 17.234 ms.
Best time for 1280K FFT length: 11.790 ms.
Best time for 1536K FFT length: 13.872 ms.
Best time for 1792K FFT length: 16.275 ms.
Best time for 2048K FFT length: 18.288 ms.
Best time for 2560K FFT length: 23.749 ms.
Best time for 3072K FFT length: 28.409 ms.
Best time for 3584K FFT length: 33.396 ms.
Best time for 4096K FFT length: 38.152 ms.
Best time for 5120K FFT length: 47.313 ms.
Best time for 6144K FFT length: 57.611 ms.
Best time for 7168K FFT length: 68.433 ms.
Best time for 8192K FFT length: 77.099 ms.
Timing FFTs using 6 threads.
Best time for 768K FFT length: 9.409 ms.
Best time for 896K FFT length: 10.248 ms.
Best time for 1024K FFT length: 15.880 ms.
Best time for 1280K FFT length: 11.403 ms.
Best time for 1536K FFT length: 13.095 ms.
Best time for 1792K FFT length: 14.730 ms.
Best time for 2048K FFT length: 16.522 ms.
Best time for 2560K FFT length: 20.779 ms.
Best time for 3072K FFT length: 25.392 ms.
Best time for 3584K FFT length: 29.921 ms.
Best time for 4096K FFT length: 34.275 ms.
Best time for 5120K FFT length: 42.664 ms.
Best time for 6144K FFT length: 51.845 ms.
Best time for 7168K FFT length: 60.954 ms.
Best time for 8192K FFT length: 70.019 ms.
Timing FFTs using 7 threads.
Best time for 768K FFT length: 9.708 ms.
Best time for 896K FFT length: 10.763 ms.
Best time for 1024K FFT length: 16.670 ms.
Best time for 1280K FFT length: 11.166 ms.
Best time for 1536K FFT length: 12.901 ms.
Best time for 1792K FFT length: 14.920 ms.
Best time for 2048K FFT length: 16.754 ms.
Best time for 2560K FFT length: 20.669 ms.
Best time for 3072K FFT length: 24.509 ms.
Best time for 3584K FFT length: 28.760 ms.
Best time for 4096K FFT length: 32.811 ms.
Best time for 5120K FFT length: 40.536 ms.
Best time for 6144K FFT length: 48.947 ms.
Best time for 7168K FFT length: 58.237 ms.
Best time for 8192K FFT length: 67.027 ms.
Timing FFTs using 8 threads.
Best time for 768K FFT length: 9.306 ms.
Best time for 896K FFT length: 10.199 ms.
Best time for 1024K FFT length: 15.993 ms.
Best time for 1280K FFT length: 11.299 ms.
[Thu Dec 27 19:57:48 2007]
Best time for 1536K FFT length: 12.763 ms.
Best time for 1792K FFT length: 14.494 ms.
Best time for 2048K FFT length: 16.334 ms.
Best time for 2560K FFT length: 20.261 ms.
Best time for 3072K FFT length: 24.059 ms.
Best time for 3584K FFT length: 28.201 ms.
Best time for 4096K FFT length: 31.842 ms.
Best time for 5120K FFT length: 39.215 ms.
Best time for 6144K FFT length: 46.868 ms.
Best time for 7168K FFT length: 55.041 ms.
Best time for 8192K FFT length: 62.832 ms.
Best time for 58 bit trial factors: 3.884 ms.
Best time for 59 bit trial factors: 3.866 ms.
Best time for 60 bit trial factors: 3.840 ms.
Best time for 61 bit trial factors: 3.871 ms.
Best time for 62 bit trial factors: 6.580 ms.
Best time for 63 bit trial factors: 6.589 ms.
Best time for 64 bit trial factors: 6.054 ms.
Best time for 65 bit trial factors: 6.016 ms.
Best time for 66 bit trial factors: 6.021 ms.
Best time for 67 bit trial factors: 6.010 ms.
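
The 4096K scaling from that log, for reference (times copied from the runs above):

Code:
#include <stdio.h>

/* 4096K FFT times from the mprime 25.5 log above, 1..8 threads (ms).
   speedup = t(1)/t(n); efficiency = speedup/n. */
int main(void)
{
    const double t[8] = { 104.233, 56.262, 54.553, 43.152,
                           38.152, 34.275, 32.811, 31.842 };

    for (int n = 1; n <= 8; n++)
        printf("%d threads: %.2fx speedup, %3.0f%% efficiency\n",
               n, t[0] / t[n - 1], 100.0 * t[0] / (t[n - 1] * n));
    return 0;
}

Speedup tops out around 3.3x on eight cores - exactly what a bandwidth-starved workload looks like.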
Old 2007-12-28, 14:09   #9
fivemack
(loop (#_fork))
 
Feb 2006
Cambridge, England

2×7×461 Posts

Quote:
Originally Posted by zs6nw
Yeah, most of you were right - memory bandwidth.

First, I had the motherboard wrong: it's a SuperMicro X7DCL-i, which uses the Intel 5100 controller. The drawback of this controller is that it funnels memory requests from both CPUs through a single memory path, so the memory bandwidth available to each CPU is effectively halved.
That's what I suspected; the 5100 controller has two front-side buses but only dual-channel DDR2-667 memory, so you get half the memory bandwidth per core of a standard 965 motherboard with a Q6600 in it, and a quarter the memory bandwidth per core of a 965 with an E6600.
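
The per-core arithmetic, assuming peak dual-channel DDR2-667 (667 MT/s × 8 bytes per channel) on both platforms:

Code:
#include <stdio.h>

/* Peak-bandwidth-per-core comparison, assuming dual-channel DDR2-667
   on both the 965 and 5100 platforms (667 MT/s x 8 bytes/channel). */
int main(void)
{
    const double channel = 0.667 * 8;     /* ~5.3 GB/s per channel */
    const double total   = 2 * channel;   /* ~10.7 GB/s, two channels */

    printf("965 + E6600,     2 cores: %.2f GB/s per core\n", total / 2);
    printf("965 + Q6600,     4 cores: %.2f GB/s per core\n", total / 4);
    printf("5100 + 2x E5430, 8 cores: %.2f GB/s per core\n", total / 8);
    return 0;
}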

Quote:
Fortunately, it seems newer motherboards use the Intel 5400 controller, which has two paths to RAM. I'm pretty sure this would cure my issue; of course, it would cost me a new motherboard...
The 5400 controller has two front-side buses and four-channel memory; the problem is that you'll need to populate all four channels, with four 1GB FBDIMMs at $68 each from Crucial, to get the full memory bandwidth, which is still only the same as a Q6600 in a 965 board - half a channel per core. I think some reviews have suggested that you get better results if you populate all eight slots on a 5400 board, but the FBDIMMs start getting quite expensive.
Old 2007-12-28, 20:20   #10
zs6nw
 
Dec 2007

2²·7 Posts

Quote:
Originally Posted by fivemack
The 5400 controller has two front-side buses and four-channel memory; the problem is that you'll need to populate all four channels, with four 1GB FBDIMMs at $68 each from Crucial, to get the full memory bandwidth, which is still only the same as a Q6600 in a 965 board - half a channel per core. I think some reviews have suggested that you get better results if you populate all eight slots on a 5400 board, but the FBDIMMs start getting quite expensive.
Right, but then you have to factor in the power consumption. This entire system (two Xeons, 4 GB RAM, 160 GB disk, etc.) draws 102W when idle and 204W with all eight cores running mprime.
Old 2007-12-28, 22:22   #11
fivemack
(loop (#_fork))
 
Feb 2006
Cambridge, England

2×7×461 Posts

If going to the faster motherboard made the machine run mprime twice as quickly, would you mind if it drew 408W?

The faster motherboard and FBDIMMs will draw more power, particularly at idle (I'd guess it might be as bad as 160W idle and 300W flat-out); but that's not particularly important if this is a machine whose goal in life is to run eight mprimes 24/7, and which will be idle only by unfortunate accident. I'm slightly wondering why in that case you chose to use a hard disc rather than having the OS on a USB stick; prime95 doesn't use much disc space.

100W is 2.4 kWh a day, which is 20p a day from my quite expensive electricity supplier; seventy pounds a year, not really significant given the cost of quad-core Xeons.
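
A trivial check of those figures, using the ~8.3p/kWh implied by 20p for 2.4 kWh:

Code:
#include <stdio.h>

/* Sanity check of the quoted running cost: 100 W continuous at the
   ~8.3 p/kWh implied by "2.4 kWh a day ... 20p a day". */
int main(void)
{
    const double watts = 100.0;
    const double pence_per_kwh = 20.0 / 2.4;      /* ~8.33 p/kWh */
    double kwh_per_day = watts * 24 / 1000.0;     /* 2.4 kWh */
    double pounds_per_year = kwh_per_day * pence_per_kwh * 365 / 100.0;

    printf("%.1f kWh/day, about %.0f pounds/year\n",
           kwh_per_day, pounds_per_year);
    return 0;
}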

Last fiddled with by fivemack on 2007-12-28 at 22:24
