#1 |
Einyen
Dec 2003
Denmark
2×32×191 Posts |
I got my new computer: an 8-core Haswell-E 5960X with 4x8 GB of 3000 MHz 15-16-16-39 RAM (XMP).
I have been curious about the benefits of quad channel RAM vs dual channel for Prime95, so almost the first thing I did was test Prime95 with 4x8 GB in quad channel and 2x8 GB in dual channel, both setups running at 3000 MHz 15-16-16-39, with the 5960X processor running at 3500 MHz and Hyper-Threading turned off. I tested 40M, 60M and 80M exponents with different setups of threads and cores per thread; each timing is the average over 100,000 iterations: Code:
Quad Channel DDR4 vs Dual Channel DDR4, both at 3000 MHz 15-16-16-39. Iteration times in ms.

| Exponent(s) / FFT | Workers × cores per worker | Quad channel (ms) | Dual channel (ms) | Dual vs quad |
|---|---|---|---|---|
| 40M, FMA3 FFT 2240K | 1 × 8 | 1.356 | 1.366 | |
| 40M, FMA3 FFT 2240K | 2 × 4 | 2.935/2.937 | 4.159/4.122 | +41% |
| 40M, FMA3 FFT 2240K | 4 × 2 | 5.961/5.968/5.924/5.923 | 9.472/9.469/9.472/9.473 | +59% |
| 40M, FMA3 FFT 2240K | 8 × 1 | 11.611/11.590/11.590/11.594/11.598/11.599/11.622/11.623 | 19.372/19.353/19.374/19.379/19.371/19.370/19.368/19.372 | +67% |
| 60M, FMA3 FFT 3200K | 1 × 8 | 2.015 | 2.487 | +23% |
| 60M, FMA3 FFT 3200K | 2 × 4 | 4.296/4.297 | 6.524/6.525 | +52% |
| 60M, FMA3 FFT 3200K | 4 × 2 | 8.563/8.559/8.565/8.606 | 13.777/13.751/13.751/13.751 | +60% |
| 60M, FMA3 FFT 3200K | 8 × 1 | 16.960/16.946/16.931/16.946/16.988/16.961/16.943/16.942 | 27.827/27.823/27.826/27.837/27.842/27.835/27.843/27.874 | +64% |
| 80M, FMA3 FFT 4608K | 1 × 8 | 3.045 | 4.622 | +52% |
| 80M, FMA3 FFT 4608K | 2 × 4 | 6.156/6.156 | 9.908/9.923 | +61% |
| 80M, FMA3 FFT 4608K | 4 × 2 | 12.348/12.347/12.401/12.348 | 20.203/20.126/20.211/20.222 | +63% |
| 80M, FMA3 FFT 4608K | 8 × 1 | 24.542/24.517/24.700/24.736/24.729/24.717/24.720/24.785 | 39.979/39.725/39.934/40.065/40.199/40.080/40.239/40.057 | +62% |

Last fiddled with by ATH on 2015-10-26 at 03:09
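The percentages in the dual channel column can be reproduced with a few lines of Python. This is only a sketch: it assumes each percentage is the mean dual channel iteration time relative to the mean quad channel time for the same setup, which matches the posted figures but is not stated in the post. The function name `slowdown_percent` is made up for illustration.

```python
from statistics import mean


def slowdown_percent(quad_ms, dual_ms):
    """Percent by which the mean dual channel time exceeds the mean quad channel time.

    Assumption: per-worker times are averaged before comparing.
    """
    return (mean(dual_ms) / mean(quad_ms) - 1.0) * 100.0


# 40M exponent, 2 workers with 4 cores each (times in ms, copied from the post)
quad = [2.935, 2.937]
dual = [4.159, 4.122]
print(f"{slowdown_percent(quad, dual):+.0f}%")  # prints +41%
```

The same call with the single-worker 80M timings (3.045 ms vs 4.622 ms) gives about +52%, in line with the table.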
#2 |
Einyen
Dec 2003
Denmark
2×32×191 Posts |
I also ran the benchmark in Prime95 with these settings in prime.txt:
BenchAllComplex=1
BenchHyperthreads=0
BenchMultithreads=1
Results: Prime95Benchmark.html
I also ran standard memory speed benchmarks: MemoryBenchmark.html
#3 |
P90 years forever!
Aug 2002
Yeehaw, FL
41·199 Posts |
Fascinating. For most FFT lengths, it appears that you are better off running 1 multi-threaded worker.
#4 |
Einyen
Dec 2003
Denmark
2·32·191 Posts |
Looking at the (8 CPUs, 1 worker) benchmark, quad and dual throughput are tied up to the 2048K FFT; then quad takes over as memory becomes the bottleneck.
But at (8 CPUs, 8 workers), quad channel wins over dual channel for all the FFTs.
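One possible reading of that crossover is cache capacity. The sketch below is a back-of-the-envelope estimate, not a measurement. It assumes 8 bytes (one double) per FFT element, ignores Prime95's auxiliary tables, and uses 20 MB for the 5960X's shared L3, which comes from Intel's published spec rather than from this thread. The helper name `fft_data_mib` is made up for illustration.

```python
L3_CACHE_MIB = 20.0     # assumption: 20 MB shared L3 on the 5960X (Intel spec, not from the thread)
BYTES_PER_ELEMENT = 8   # assumption: one double-precision value per FFT element


def fft_data_mib(fft_len_k, workers=1):
    """Rough FFT data size in MiB; each worker holds its own FFT of fft_len_k * 1024 elements."""
    return fft_len_k * 1024 * BYTES_PER_ELEMENT * workers / 2**20


for fft_k in (2048, 2240, 3200, 4608):
    for workers in (1, 8):
        size = fft_data_mib(fft_k, workers)
        verdict = "fits in L3" if size <= L3_CACHE_MIB else "spills to RAM"
        print(f"{fft_k}K FFT, {workers} worker(s): {size:6.1f} MiB -> {verdict}")
```

Under these assumptions a single worker's data stays below 20 MiB up to 2240K and exceeds it from 3200K on, while 8 independent workers exceed it at every FFT length listed. That would be consistent with the benchmark pattern, though other effects may well contribute.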
#5 | |
Undefined
"The unspeakable one"
Jun 2006
My evil lair
5·7·191 Posts |
Can you please clarify the meaning of "thread" here? I don't understand how you can have 8 cores on a single thread. Do you mean "worker" instead of "thread"?
#6 |
Einyen
Dec 2003
Denmark
65568 Posts |
Yes, sorry, I meant workers. I called them threads because they are called "WorkerThreads" in local.txt, and somehow I got stuck on threads instead of workers.
#7 | |
Serpentine Vermin Jar
Jul 2014
5×677 Posts |
I still see some percentage loss when adding cores to a worker, but DDR4 with quad channel definitely keeps up a LOT better than DDR3 with quad channel (Xeon E5 26xx v1/v2) or triple channel (Xeon 56xx, for example). I've probably crippled some of my triple-channel systems because I had to add 3 DIMMs per channel to some of them, which lowers the clock rate... oh well. So don't do that. If you want max performance, stick with 1 DIMM per channel at the fastest speed the system allows.
#8 |
Einyen
Dec 2003
Denmark
2·32·191 Posts |
I added total iterations per second to my manual LL test times. 1 worker with 8 cores is the most efficient, as in the benchmarks:
Code:
Quad Channel DDR4 vs Dual Channel DDR4, both at 3000 MHz 15-16-16-39. Iteration times in ms, with total iterations per second (ite/sec) for each setup.

| Exponent(s) / FFT | Workers × cores per worker | Quad channel (ms) | Quad total (ite/sec) | Dual channel (ms) | Dual vs quad | Dual total (ite/sec) |
|---|---|---|---|---|---|---|
| 40M, FMA3 FFT 2240K | 1 × 8 | 1.356 | 737 | 1.366 | | 732 |
| 40M, FMA3 FFT 2240K | 2 × 4 | 2.935/2.937 | 681 | 4.159/4.122 | +41% | 483 |
| 40M, FMA3 FFT 2240K | 4 × 2 | 5.961/5.968/5.924/5.923 | 673 | 9.472/9.469/9.472/9.473 | +59% | 422 |
| 40M, FMA3 FFT 2240K | 8 × 1 | 11.611/11.590/11.590/11.594/11.598/11.599/11.622/11.623 | 689 | 19.372/19.353/19.374/19.379/19.371/19.370/19.368/19.372 | +67% | 413 |
| 60M, FMA3 FFT 3200K | 1 × 8 | 2.015 | 496 | 2.487 | +23% | 402 |
| 60M, FMA3 FFT 3200K | 2 × 4 | 4.296/4.297 | 465 | 6.524/6.525 | +52% | 306 |
| 60M, FMA3 FFT 3200K | 4 × 2 | 8.563/8.559/8.565/8.606 | 466 | 13.777/13.751/13.751/13.751 | +60% | 291 |
| 60M, FMA3 FFT 3200K | 8 × 1 | 16.960/16.946/16.931/16.946/16.988/16.961/16.943/16.942 | 472 | 27.827/27.823/27.826/27.837/27.842/27.835/27.843/27.874 | +64% | 287 |
| 80M, FMA3 FFT 4608K | 1 × 8 | 3.045 | 328 | 4.622 | +52% | 216 |
| 80M, FMA3 FFT 4608K | 2 × 4 | 6.156/6.156 | 325 | 9.908/9.923 | +61% | 202 |
| 80M, FMA3 FFT 4608K | 4 × 2 | 12.348/12.347/12.401/12.348 | 324 | 20.203/20.126/20.211/20.222 | +63% | 198 |
| 80M, FMA3 FFT 4608K | 8 × 1 | 24.542/24.517/24.700/24.736/24.729/24.717/24.720/24.785 | 324 | 39.979/39.725/39.934/40.065/40.199/40.080/40.239/40.057 | +62% | 200 |
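The "total iterations per sec" figures follow from the per-worker times: each worker completes 1000/t iterations per second when one iteration takes t ms, and the workers run concurrently, so their rates add up. A minimal sketch, with a made-up function name and the result rounded to the nearest integer (an assumption that happens to match the posted values):

```python
def total_iterations_per_second(times_ms):
    """Sum of per-worker rates; each worker contributes 1000 / t iterations per second."""
    return sum(1000.0 / t for t in times_ms)


# 40M exponent on quad channel (times in ms, copied from the post)
one_worker = [1.356]
eight_workers = [11.611, 11.590, 11.590, 11.594, 11.598, 11.599, 11.622, 11.623]

print(round(total_iterations_per_second(one_worker)))     # prints 737
print(round(total_iterations_per_second(eight_workers)))  # prints 689
```

Comparing the two totals directly is what shows the single 8-core worker coming out ahead of 8 single-core workers at this FFT size.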
#9 |
Romulan Interpreter
"name field"
Jun 2011
Thailand
282916 Posts |
Very good job ATH!
Very informative, too. It also seems curious to me how a single worker with 8 cores is, in total, more productive than 8 single-threaded workers. The single worker would still need time to put the pieces of the FFT together from the 8 cores. It is only explainable if Intel did wonders with internal process switching and their MESI protocols, or whatever they were called, when sharing cache between cores... (I still have in mind the document about memories shared here on the forum some time ago).
Last fiddled with by LaurV on 2015-10-26 at 05:20
#10 |
Mar 2010
3·137 Posts |
Welcome to the club.
#11 |
Just call me Henry
"David"
Sep 2007
Liverpool (GMT/BST)
37·163 Posts |
Could you do some tests with only 4 and maybe 6 cores in use rather than 8?
That should give an indication of how much the cheaper Haswell CPUs would be limited with DDR4 rather than DDR3.