Haswell-E Prelim. Benchmark
1 Attachment(s)
i7-5820k @ stock, 4x4GB RAM @ 2133 (XMP not enabled yet), HT disabled. Absolutely non-optimized setup; for all I know right now it might be 20% thermal throttled.
|
Differential diagnosis, please. (I.e. compare - preferably in "overall XY% faster per cycle" summary mode - to older Haswell running same benchmarks).
We seek insight, not data dumps. :) Comparing the CPU temperature range during the self-tests to an older Haswell doing same might shed light on the likelihood of thermal throttling. Again, differential diagnosis is your friend. |
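One way to produce the requested "XY% faster per cycle" figure is to normalize each per-iteration time by the clock it ran at. The sketch below is only illustrative: the function name is made up here, the timings and clock speeds in the usage line are hypothetical rather than taken from any attachment, and it assumes both CPUs held the stated clock for the whole run (i.e. no throttling or turbo).

```python
def per_cycle_speedup_pct(t_new_ms, ghz_new, t_old_ms, ghz_old):
    """Percent per-cycle advantage of the new CPU over the old one.

    time (ms) * clock (GHz) is proportional to cycles per iteration,
    so fewer cycles per iteration means faster per cycle.
    """
    if min(t_new_ms, ghz_new, t_old_ms, ghz_old) <= 0:
        raise ValueError("times and clocks must be positive")
    cycles_new = t_new_ms * ghz_new
    cycles_old = t_old_ms * ghz_old
    return (cycles_old / cycles_new - 1.0) * 100.0


# Hypothetical inputs for illustration only (not measured values).
print(f"{per_cycle_speedup_pct(10.0, 3.3, 11.0, 3.5):.1f}% faster per cycle")
```

A positive result means the new chip needs fewer cycles per iteration; a negative one means it needs more.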
1 Attachment(s)
[QUOTE=ewmayer;381921]Differential diagnosis, please. (I.e. compare - preferably in "overall XY% faster per cycle" summary mode - to older Haswell running same benchmarks).
We seek insight, not data dumps. :) Comparing the CPU temperature range during the self-tests to an older Haswell doing same might shed light on the likelihood of thermal throttling. Again, differential diagnosis is your friend.[/QUOTE]
Tough room. Data acquisition before analysis. Attached are results with XMP enabled, so RAM @ 2800. The motherboard utility also boosted CPU clock by approx. 300MHz for some reason. |
You left out "...and analysis before presentation." :)
|
1 Attachment(s)
[QUOTE=ewmayer;381921]Differential diagnosis, please.[/QUOTE]
It's not lupus. |
It would seem that running 1 test with 6 threads is better than running 6 workers with 1 thread each.
[CODE]FFT     6t timing  6t thruput  6w thruput
1024K   0.852      1173.71     1043.83
1280K   1.072      932.84      822.41
1536K   1.284      778.82      679.03
1792K   1.553      643.92      567.52
2048K   1.802      554.94      479.4
2560K   2.409      415.11      385.5
3072K   2.968      336.93      299.52
3584K   3.591      278.47      254.08
4096K   4.194      238.46      ???.??
5120K   5.309      188.36      ???.??
6144K   6.401      156.23      156.16
7168K   7.583      131.87      132.41
8192K   8.681      115.19      115.17[/CODE]
[CODE]FFT     6t timing  6t thruput  6w thruput
1024K   0.747      1338.69     1155.58
1280K   0.943      1060.45     915.91
1536K   1.139      877.96      755.36
1792K   1.37       729.93      634.42
2048K   1.575      634.92      542.75
2560K   2.126      470.37      428.64
3072K   2.651      377.22      362.52
3584K   3.203      312.21      305.53
4096K   3.762      265.82      264.48
5120K   4.763      209.95      211.43
6144K   5.729      174.55      175.0
7168K   6.828      146.46      147.47
8192K   7.776      128.6       129.21[/CODE] |
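The "6t thruput" column appears to be 1000 divided by the "6t timing" column, which fits timings in milliseconds per iteration and throughput in iterations per second (the post does not state the units, so that is an assumption). A minimal sketch of that conversion and of the 6t-versus-6w gap, using the 1024K row of the first table; the function names are illustrative only:

```python
def throughput_from_timing(ms_per_iter):
    """Iterations per second for a job taking ms_per_iter milliseconds per iteration."""
    if ms_per_iter <= 0:
        raise ValueError("timing must be positive")
    return 1000.0 / ms_per_iter


def relative_gain_pct(a, b):
    """Percent by which throughput a exceeds throughput b."""
    return (a / b - 1.0) * 100.0


# 1024K row, first table: 6t timing 0.852, 6w thruput 1043.83.
six_t = throughput_from_timing(0.852)
gain = relative_gain_pct(six_t, 1043.83)
print(f"6t: {six_t:.2f} iter/s, {gain:.1f}% ahead of 6w")
```

Run over every row, the same two functions show the 6t advantage shrinking as the FFT length grows, until the two columns are roughly equal from 6144K upward.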
[QUOTE=axn;381931]It would seem that running 1 test with 6 threads is better than running 6 workers with 1 thread each.[/QUOTE]
How about the following combos?
o 1 x 4-thread, 1 x 2-thread
o 3 x 2-thread
o 2 x 2-thread (i.e. idling 2 cores)
o 1 x 4-thread (again idling 2 cores)
Not sure if Prime95 allows such combos; if not, you'll have to run several instances and manually derive the throughput, or "effective per-iteration time" -- assuming all jobs are running the same FFT length, one just takes the various per-iter times t1, t2, ... and computes t* = 1/(1/t1 + 1/t2 + ...)
Things get trickier if the various jobs are running different FFT lengths. |
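The effective per-iteration time described above is the reciprocal of the summed per-job rates. A minimal sketch, assuming all jobs run the same FFT length as the post requires; the function name and the sample timings are hypothetical, not measurements:

```python
def effective_iter_time(times_ms):
    """Effective per-iteration time t* = 1/(1/t1 + 1/t2 + ...).

    Each concurrent job completes 1/t iterations per ms; the rates add,
    and t* is the reciprocal of the combined rate.
    """
    if not times_ms:
        raise ValueError("need at least one per-iteration time")
    if any(t <= 0 for t in times_ms):
        raise ValueError("per-iteration times must be positive")
    return 1.0 / sum(1.0 / t for t in times_ms)


# Hypothetical per-iteration times (ms) for a 1 x 4-thread plus 1 x 2-thread run.
t_star = effective_iter_time([2.0, 4.0])
print(f"effective per-iteration time: {t_star:.3f} ms")
```

Comparing t* across the listed combos (or 1000/t* as iterations per second) puts them on the same footing as the single-job timings.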
[QUOTE=ewmayer;381953]How about the following combos?[/QUOTE]
There is a way to do this with the prime95 benchmark:
[CODE]In addition to the benchmarking options above, the following options are available in the multiple workers benchmark
	Benchmultipleworkers=0 or 1 (default is 1)
	Benchtime=N (default is 10)
	Benchhyperthreads=0 or 1 (default is 1)
	Benchmultithreads=0 or 1 (default is 0)
	Benchoddmultithreads=0 or 1 (default is 0)
Benchmultipleworkers can be used to disable benchmarking multiple workers.
Benchtime can be used to change the number of seconds to run each benchmark.
Benchhyperthreads can be used to turn off the hyperthreaded benchmarks.
Benchmultithreads can be used to also benchmark multi-threaded FFTs (e.g. 2 workers on 4 CPUs).
Benchoddmultithreads can be used to also benchmark asymmetric multi-threaded FFT combinations such as 2 workers on 3 CPUs.
[/CODE]
The multithread observation is odd. I don't know of any other CPUs where multithreaded is better than multiworker. |
[QUOTE=ewmayer;381953]How about the following combos?
o 1 x 4-thread, 1 x 2-thread
o 3 x 2-thread
o 2 x 2-thread (i.e. idling 2 cores)
o 1 x 4-thread (again idling 2 cores)
Not sure if Prime95 allows such combos; if not, you'll have to run several instances and manually derive the throughput, or "effective per-iteration time" -- assuming all jobs are running the same FFT length, one just takes the various per-iter times t1, t2, ... and computes t* = 1/(1/t1 + 1/t2 + ...)
Things get trickier if the various jobs are running different FFT lengths.[/QUOTE]
Yep, I think I did something very similar with my AMD 1090T hex-core back in the day.
[QUOTE=Prime95;381954]There is a way to do this with the prime95 benchmark:
[CODE]In addition to the benchmarking options above, the following options are available in the multiple workers benchmark
	Benchmultipleworkers=0 or 1 (default is 1)
	Benchtime=N (default is 10)
	Benchhyperthreads=0 or 1 (default is 1)
	Benchmultithreads=0 or 1 (default is 0)
	Benchoddmultithreads=0 or 1 (default is 0)
Benchmultipleworkers can be used to disable benchmarking multiple workers.
Benchtime can be used to change the number of seconds to run each benchmark.
Benchhyperthreads can be used to turn off the hyperthreaded benchmarks.
Benchmultithreads can be used to also benchmark multi-threaded FFTs (e.g. 2 workers on 4 CPUs).
Benchoddmultithreads can be used to also benchmark asymmetric multi-threaded FFT combinations such as 2 workers on 3 CPUs.
[/CODE]
The multithread observation is odd. I don't know of any other CPUs where multithreaded is better than multiworker.[/QUOTE]
Cool! Thanks for the heads up (and new benchmark code - I was surprised by the addition (guess I haven't been diligent in tracking progress lately))! Might not get to it until tomorrow - I've been unexpectedly busy this holiday weekend. The parts for the -E box were purchased Saturday, and they sat (taunting me) until Monday evening (they started mocking me, so I had to do something). |
Does running one test with more than one thread use more memory than running it with just one thread? If so, how much?
Single-thread-per-test calculations: [url]http://mersenneforum.org/showpost.php?p=368793&postcount=485[/url] |
[QUOTE=Prime95;381954]The multithread observation is odd. I don't know of any other CPUs where multithreaded is better than multiworker.[/QUOTE]
It looks like a memory bottleneck to me. A single worker gets all its data in time; multiple workers (which need more) do not. I think we should wait for the "real", "big chip" DDR4. :smile: The actual "products" are still in their infancy, and I don't know of many boards taking full advantage of the new channeled architecture... |