mersenneforum.org  

mersenneforum.org > Great Internet Mersenne Prime Search > Hardware
2014-09-02, 01:43   #1
sdbardwick
Aug 2002
North San Diego County

5·137 Posts
Haswell-E Prelim. Benchmark

i7-5820K @ stock, 4x4GB RAM @ 2133 (XMP not enabled yet), HT disabled. Absolutely non-optimized setup; for all I know right now it might be thermally throttled by 20%.
Attached file: results.txt (13.4 KB)

Last fiddled with by sdbardwick on 2014-09-02 at 01:45 Reason: RAM info
2014-09-02, 01:49   #2
ewmayer
2ω=0
Sep 2002
República de California

103·113 Posts

Differential diagnosis, please. (I.e., compare, preferably in "overall XY% faster per cycle" summary mode, to an older Haswell running the same benchmarks.)

We seek insight, not data dumps. :)

Comparing the CPU temperature range during the self-tests to an older Haswell doing the same might shed light on the likelihood of thermal throttling. Again, differential diagnosis is your friend.
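A minimal sketch of the "XY% faster per cycle" summary, in Python: multiply each per-iteration time by the core clock it ran at (ms × GHz gives millions of cycles per iteration), take the old/new ratio per FFT length, and combine the ratios with a geometric mean. The function names, the choice of geometric mean, and the sample numbers in the usage block are assumptions for illustration; the clocks would have to be read off during the actual runs, since throttling or a motherboard auto-overclock changes them.

```python
import math


def per_cycle_speedup(t_old_ms, f_old_ghz, t_new_ms, f_new_ghz):
    """Ratio of old to new cycles per iteration (>1 means the new CPU does more per clock)."""
    cycles_old = t_old_ms * f_old_ghz  # ms * GHz = millions of cycles per iteration
    cycles_new = t_new_ms * f_new_ghz
    return cycles_old / cycles_new


def summary_percent_faster(runs):
    """Geometric mean of per-cycle speedups over several FFT lengths, as 'XY% faster'.

    runs: iterable of (t_old_ms, f_old_ghz, t_new_ms, f_new_ghz) tuples, one per FFT length.
    """
    logs = [math.log(per_cycle_speedup(*run)) for run in runs]
    if not logs:
        raise ValueError("need at least one run")
    return (math.exp(sum(logs) / len(logs)) - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical numbers, NOT measurements from this thread.
    runs = [(1.00, 3.5, 0.90, 3.3), (2.10, 3.5, 1.85, 3.3)]
    print(f"{summary_percent_faster(runs):.1f}% faster per cycle")
```

The geometric mean keeps one unusually good or bad FFT length from dominating the summary; a plain average of percentages would also work for a quick look.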
2014-09-02, 02:35   #3
sdbardwick
Aug 2002
North San Diego County

1010101101₂ Posts

Quote:
Originally Posted by ewmayer
Differential diagnosis, please. (I.e., compare, preferably in "overall XY% faster per cycle" summary mode, to an older Haswell running the same benchmarks.)

We seek insight, not data dumps. :)
Tough room.

Data acquisition before analysis.

Attached are results with XMP enabled, so RAM is @ 2800. The motherboard utility also boosted the CPU clock by approx. 300 MHz for some reason.
Attached file: results01.txt (13.4 KB)
2014-09-02, 05:06   #4
ewmayer
2ω=0
Sep 2002
República de California

103·113 Posts

You left out "...and analysis before presentation." :)
2014-09-02, 07:43   #5
Batalov
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

22405₈ Posts

Quote:
Originally Posted by ewmayer
Differential diagnosis, please.
It's not lupus.
Attached image: keep-calm-its-not-lupus.jpg
2014-09-02, 08:28   #6
axn
Jun 2003

2²×3×421 Posts

It would seem that running 1 test with 6 threads is better than running 6 workers with 1 thread each.
Code:
FFT   6t ms/iter  6t iter/s  6w iter/s
1024K      0.852    1173.71    1043.83
1280K      1.072     932.84     822.41
1536K      1.284     778.82     679.03
1792K      1.553     643.92     567.52
2048K      1.802     554.94     479.4
2560K      2.409     415.11     385.5
3072K      2.968     336.93     299.52
3584K      3.591     278.47     254.08
4096K      4.194     238.46     ???.??
5120K      5.309     188.36     ???.??
6144K      6.401     156.23     156.16
7168K      7.583     131.87     132.41
8192K      8.681     115.19     115.17
Code:
FFT   6t ms/iter  6t iter/s  6w iter/s
1024K      0.747    1338.69    1155.58
1280K      0.943    1060.45     915.91
1536K      1.139     877.96     755.36
1792K      1.37      729.93     634.42
2048K      1.575     634.92     542.75
2560K      2.126     470.37     428.64
3072K      2.651     377.22     362.52
3584K      3.203     312.21     305.53
4096K      3.762     265.82     264.48
5120K      4.763     209.95     211.43
6144K      5.729     174.55     175.0
7168K      6.828     146.46     147.47
8192K      7.776     128.6      129.21
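The tables can be boiled down to the "XY% faster" style summary asked for earlier by dividing the 6-thread throughput by the 6-worker throughput at each FFT length. A minimal Python sketch; the rows are copied from the first table above (the ???.?? entries become None and are skipped), and the helper names are hypothetical. Throughput in iter/s is simply 1000 divided by the ms/iter timing, which the first helper makes explicit.

```python
# Rows copied from the first table: (FFT length, 6t ms/iter, 6t iter/s, 6w iter/s).
# None marks the entries that were reported as ???.??.
ROWS = [
    ("1024K", 0.852, 1173.71, 1043.83),
    ("2048K", 1.802, 554.94, 479.4),
    ("4096K", 4.194, 238.46, None),
    ("8192K", 8.681, 115.19, 115.17),
]


def thruput_from_timing(ms_per_iter):
    """Iterations per second implied by a per-iteration time in milliseconds."""
    return 1000.0 / ms_per_iter


def multithread_advantage(rows):
    """Percent by which 1 worker x 6 threads beats 6 workers x 1 thread, per FFT length."""
    result = {}
    for fft, _ms, t6, w6 in rows:
        if w6 is None:  # no multi-worker figure for this FFT length
            continue
        result[fft] = (t6 / w6 - 1.0) * 100.0
    return result


if __name__ == "__main__":
    for fft, pct in multithread_advantage(ROWS).items():
        print(f"{fft}: 6t ahead by {pct:+.1f}%")
```

On these rows the single multi-threaded test comes out roughly 12% to 16% ahead at 1024K and 2048K, while the two approaches are essentially tied by 8192K.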
2014-09-02, 21:14   #7
ewmayer
2ω=0
Sep 2002
República de California

103·113 Posts

Quote:
Originally Posted by axn
It would seem that running 1 test with 6 threads is better than running 6 workers with 1 thread each.
How about the following combos?

o 1 x 4-thread, 1 x 2-thread

o 3 x 2-thread

o 2 x 2-thread (i.e. idling 2 cores)

o 1 x 4-thread (again idling 2 cores)

Not sure if Prime95 allows such combos; if not, you'll have to run several instances and manually derive the throughput, or "effective per-iteration time". Assuming all jobs are running the same FFT length, one just takes the various per-iteration times t1, t2, ... and computes

t* = 1/(1/t1 + 1/t2 + ...)

Things get trickier if the various jobs are running different FFT lengths.
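The formula above is a sum of rates: each job completes 1/t iterations per unit time, so the machine's combined rate is the sum of the reciprocals, and t* is its inverse. A minimal Python sketch for jobs at the same FFT length; the function name and the timings in the usage block are hypothetical.

```python
def effective_iter_time(times_ms):
    """t* = 1 / (1/t1 + 1/t2 + ...) for concurrent jobs at the same FFT length.

    times_ms: per-iteration time of each concurrently running job, in ms.
    Returns the effective time per iteration of the whole machine, in ms.
    """
    if not times_ms or any(t <= 0 for t in times_ms):
        raise ValueError("need at least one positive per-iteration time")
    return 1.0 / sum(1.0 / t for t in times_ms)


if __name__ == "__main__":
    # Hypothetical "1 x 4-thread, 1 x 2-thread" split: 1.1 ms and 2.2 ms per iteration.
    t_star = effective_iter_time([1.1, 2.2])
    print(f"t* = {t_star:.3f} ms, i.e. {1000.0 / t_star:.1f} iter/s total")
```

For jobs at different FFT lengths the per-iteration times are not directly comparable, which is the complication mentioned above; this sketch does not attempt to handle that case.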

Last fiddled with by ewmayer on 2014-09-02 at 21:20
2014-09-02, 21:20   #8
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL

1110101100110₂ Posts

Quote:
Originally Posted by ewmayer
How about the following combos?
There is a way to do this with the prime95 benchmark:

Code:
In addition to the benchmarking options above, the following options are
available in the multiple workers benchmark:
	BenchMultipleWorkers=0 or 1	(default is 1)
	BenchTime=N			(default is 10)
	BenchHyperthreads=0 or 1	(default is 1)
	BenchMultithreads=0 or 1	(default is 0)
	BenchOddMultithreads=0 or 1	(default is 0)
BenchMultipleWorkers can be used to disable benchmarking multiple workers.
BenchTime can be used to change the number of seconds to run each benchmark.
BenchHyperthreads can be used to turn off the hyperthreaded benchmarks.
BenchMultithreads can be used to also benchmark multi-threaded FFTs (e.g. 2 workers on 4 CPUs).
BenchOddMultithreads can be used to also benchmark asymmetric multi-threaded FFT
combinations such as 2 workers on 3 CPUs.
The multithread observation is odd. I don't know of any other CPUs where multithreaded is better than multiworker.
2014-09-02, 21:35   #9
sdbardwick
Aug 2002
North San Diego County

5·137 Posts

Quote:
Originally Posted by ewmayer
How about the following combos?
Yep, I think I did something very similar with my AMD 1090T hex-core back in the day.
Quote:
Originally Posted by Prime95
There is a way to do this with the prime95 benchmark:
Cool! Thanks for the heads-up, and for the new benchmark code. I was surprised by the addition; guess I haven't been diligent in tracking progress lately!

Might not get to it until tomorrow; I've been unexpectedly busy this holiday weekend. The parts for the -E box were purchased Saturday, and they sat (taunting me) until Monday evening (they started mocking me, so I had to do something).
2014-09-02, 22:31   #10
henryzz
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)

2³×3×5×7² Posts

Does doing one test with more than one thread use more memory than doing it with just one thread? If so, how much?

Single-thread-per-test calculations: http://mersenneforum.org/showpost.ph...&postcount=485
2014-09-03, 03:24   #11
LaurV
Romulan Interpreter
Jun 2011
Thailand

7×1,373 Posts

Quote:
Originally Posted by Prime95
The multithread observation is odd. I don't know of any other CPUs where multithreaded is better than multiworker.
It looks like a memory bottleneck to me. A single worker gets all its data in time; multiple workers (which need more) do not.

I think we should wait for the "real", "big chip" DDR4.
The actual "products" are still in their infancy, and I don't know of many boards that take full advantage of the new channeled architecture...
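One rough way to put numbers on the memory argument is to compare FFT data footprints with the i7-5820K's 15 MB shared L3 cache: one multi-threaded test works on a single FFT's data, while six workers each work on their own. A minimal Python sketch, assuming one 8-byte double per FFT element and ignoring auxiliary tables, padding and other overhead, so real footprints are somewhat larger; the helper names are hypothetical, and it gives orders of magnitude only.

```python
L3_BYTES = 15 * 1024 * 1024  # i7-5820K shared L3 cache (15 MB)
BYTES_PER_ELEMENT = 8  # assumption: one double-precision value per FFT element


def fft_bytes(fft_k):
    """Approximate data size of one FFT of length fft_k * 1024 elements (no overhead)."""
    return fft_k * 1024 * BYTES_PER_ELEMENT


def fits_in_l3(fft_k, workers):
    """True if the FFT data of all workers together is no larger than the L3 cache."""
    return workers * fft_bytes(fft_k) <= L3_BYTES


if __name__ == "__main__":
    for fft_k in (1024, 2048, 4096, 8192):
        mib = fft_bytes(fft_k) / 2**20
        print(f"{fft_k}K: {mib:.0f} MiB per test; "
              f"1 test fits in L3: {fits_in_l3(fft_k, 1)}, "
              f"6 workers fit: {fits_in_l3(fft_k, 6)}")
```

Under these assumptions only the 1024K single test fits in L3, and no six-worker case does; treat this as a crude first look rather than a diagnosis.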

All times are UTC.




This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.