mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Hardware (https://www.mersenneforum.org/forumdisplay.php?f=9)
-   -   Haswell Preview Benchmark (https://www.mersenneforum.org/showthread.php?t=17982)

kracker 2013-06-30 17:54

+1

One more thing: make sure the heatsink is *properly* seated. I've had it happen before that it was not completely secured... And make sure to use good thermal grease for it.

TheJudger 2013-06-30 21:45

kladner: those AMDs have a bigger die size, thus they can move heat out of the silicon more easily (315mm[SUP]2[/SUP] vs. 177mm[SUP]2[/SUP]). Of course I have double-checked the heatsink.

I'm afraid that I'm very good at putting load on the CPU... in some overclocking forums I've noticed people who claim they can run their 4770k @4.5GHz, 1.4V easily on air while running LinX (Linpack for Windows)... well, they used an old LinX which (I guess) only does SSE. At 4.5GHz the screenshot revealed ~60GFLOPS...
Back to my issue: it seems that the PCU (Power Control Unit, part of the CPU) and the BIOS (OK, OK, EFI) aren't on my side. With default BIOS settings the system runs at 3.9GHz 4-core turbo under heavy load (easily exceeding the TDP)... The CPU should do at most 3.7GHz 4-core turbo to stay within spec. For each step above the non-turbo multiplier the PCU adds some voltage, and there are some comments on the web that it adds even more voltage for AVX code. I've measured ~1.26V under load (~1.1V default vCore). With the voltage manually set to 1.100V at 4GHz I was able to keep the CPU temperatures at ~75°C while running Linpack.
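
A rough way to see why the manual voltage setting helps so much: the usual first-order model for CMOS dynamic power is P ~ f * V^2. The following is only a sketch under that assumption. It ignores leakage, it assumes the ~1.26V reading belongs to the 3.9GHz default-turbo case, and the function name is made up for illustration.

```python
def relative_dynamic_power(f_new_ghz, v_new, f_old_ghz, v_old):
    """Ratio P_new / P_old under the first-order model P ~ f * V^2.

    Assumption: dynamic switching power only; leakage is ignored.
    """
    return (f_new_ghz / f_old_ghz) * (v_new / v_old) ** 2


# Figures from the post: ~1.26 V at 3.9 GHz (BIOS defaults, assumed pairing)
# versus 1.100 V set manually at 4.0 GHz.
ratio = relative_dynamic_power(4.0, 1.100, 3.9, 1.26)
print(f"estimated dynamic power ratio: {ratio:.2f}")
```

Under this model the manually set 1.100V at 4GHz would dissipate roughly a fifth less dynamic power than the default setting, despite the slightly higher clock, which is at least consistent with the lower temperatures reported.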

Oliver

kladner 2013-06-30 22:36

Thanks for the details on voltage and temperature. I know you would have checked the heatsinking, but over 90 C is in the borderlands even for Intel: startling, I would call it, even under extreme loads with anything but a stock cooler. To mention it is on the order of asking "Is the power cord connected?" The Linpack version you reference must be one mean mofo!

TheJudger 2013-06-30 22:59

I'll continue on this next weekend; perhaps (for comparison) I should check the temperatures while running mprime. A finely tuned Linpack (hpl-2.0 + Intel MKL 11.someversion, properly chosen parameters for HPL and process pinning) is my worst-case scenario for temperature and power consumption. If a system can do this I feel pretty comfortable with real-world applications. Linpack makes heavy use of the new dual FMA capability of the Haswell chips (16 DP ops per clock and core).

Edit: the Windows "LinX" isn't that bad if you choose the right version (AVX-capable; check the performance. For comparison: I can do 200GFLOPS with 4 cores @3.9GHz on my system). If you want to give it a try, it is much easier than compiling the whole stuff by yourself.
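
The GFLOPS figures quoted in this thread can be sanity-checked against theoretical peak, which is simply cores * clock * flops per cycle per core. A minimal sketch: the value 16 for AVX+FMA comes from the post above, while the value 4 for SSE-only code (two 128-bit operations per cycle, two doubles each, no FMA) is an assumption, and the helper name is hypothetical.

```python
def peak_dp_gflops(cores, ghz, flops_per_cycle):
    """Theoretical double-precision peak: cores * clock (GHz) * flops/cycle/core."""
    return cores * ghz * flops_per_cycle


FLOPS_AVX_FMA = 16  # from the post: dual 256-bit FMA, 16 DP ops per clock and core
FLOPS_SSE = 4       # assumption: two 128-bit ops per cycle, 2 doubles each, no FMA

haswell_peak = peak_dp_gflops(4, 3.9, FLOPS_AVX_FMA)
efficiency = 200 / haswell_peak              # the 200 GFLOPS LinX result above
sse_peak = peak_dp_gflops(4, 4.5, FLOPS_SSE)

print(f"AVX+FMA peak @3.9GHz: {haswell_peak:.1f} GFLOPS")
print(f"200 GFLOPS is {efficiency:.0%} of that peak")
print(f"SSE-only peak @4.5GHz: {sse_peak:.1f} GFLOPS")
```

Under these assumptions 200GFLOPS is about 80% of a ~250GFLOPS peak, while an SSE-only run at 4.5GHz tops out near 72GFLOPS, so a ~60GFLOPS screenshot fits the guess that the old LinX was not using AVX at all.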

Oliver

ewmayer 2013-07-01 03:30

[QUOTE=ewmayer;344497]After getting your e-mail a couple days ago in which you first described some of the above issues (and with advance knowledge that ADD/SUB are inherently limited to just 1 of the 2 issue ports), I came to the same conclusion - been spending the last 2 days restructuring (so far just the scalar-data C-code version of) one of my FFT-core building blocks, the radix-16 DIF DFT-with-twiddles macro, to use all FMA-based arithmetic.[/QUOTE]

Above radix-16 macro fully C-prototyped - first using FMA4 as the model for simplicity, then converting that to use FMA3 (in which the 'c' in FMA3(a,b,c) = a*b + c gets overwritten by the result), keeping in mind the 16-register constraint of the Intel CPUs. Assembly coding begins tomorrow...

ewmayer 2013-07-02 18:31

[QUOTE=ewmayer;344930]Above radix-16 macro fully C-prototyped - first using FMA4 as the model for simplicity, then converting that to use FMA3 (in which the 'c' in FMA3(a,b,c) = a*b + c gets overwritten by the result), keeping in mind the 16-register constraint of the Intel CPUs. Assembly coding begins tomorrow...[/QUOTE]

IAAI [i am an idiot] - in the Intel FMA3 model it's 'a' [i.e. the first of the 2 multiplicands] which is also used to store the result. Led astray by the opposite-operand-ordering of Intel vs AT&T syntax once again...

Anyhoo, rejiggering the prototype code shouldn't be too hard, just a lot of swapping out what goes into various register-copy temporaries. Still annoyed @myself for wasting my own time, though. Will try to use the extra work to also do some 2nd-pass optimization, so as to not make it feel entirely redundant - save a few register copies and improve the instruction scheduling to better hide latency.

Prime95 2013-07-02 19:55

[QUOTE=ewmayer;345086]IAAI [i am an idiot] - in the Intel FMA3 model it's 'a' [i.e. the first of the 2 multiplicands] which is also used to store the result.[/QUOTE]

You can overwrite either a, b, or c. To overwrite c, use vfmadd231.

I wrote a MASM macro that takes 4 args and outputs the optional register copy and the appropriate 132, 231, 213 version of the FMA instruction.
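
To make the three operand orderings concrete, here is a small scalar model in Python. It is only a sketch of the idea and not the MASM macro itself: `emit_fma` and `run` are hypothetical helpers, the register names are placeholders, and Python's `a*b + c` rounds twice where a real FMA rounds once. In Intel syntax the first operand is always the destination, and the digits say which operands are multiplied (first two digits) and which is added (last digit).

```python
def fma3(form, regs, op1, op2, op3):
    """Scalar model of Intel-syntax vfmaddNNN op1, op2, op3 (op1 is the destination)."""
    x = {"1": regs[op1], "2": regs[op2], "3": regs[op3]}
    m1, m2, add = form  # e.g. "231": multiply operands 2 and 3, add operand 1
    regs[op1] = x[m1] * x[m2] + x[add]


def emit_fma(dest, a, b, c):
    """Return instructions computing dest = a*b + c (hypothetical helper)."""
    if dest == c:   # result overwrites the addend
        return [("vfmadd231pd", dest, a, b)]
    if dest == a:   # result overwrites the first multiplicand
        return [("vfmadd213pd", dest, b, c)]
    if dest == b:   # result overwrites the second multiplicand
        return [("vfmadd213pd", dest, a, c)]
    # dest is a fourth register: copy the addend first, then accumulate into it
    return [("vmovapd", dest, c), ("vfmadd231pd", dest, a, b)]


def run(program, regs):
    """Execute the emitted instructions on a dict of register values."""
    for op, r1, *rest in program:
        if op == "vmovapd":
            regs[r1] = regs[rest[0]]
        else:
            fma3(op[6:9], regs, r1, rest[0], rest[1])
    return regs


for dest in ("ymm0", "ymm1", "ymm2", "ymm3"):
    regs = {"ymm0": 2, "ymm1": 3, "ymm2": 5, "ymm3": 0}
    program = emit_fma(dest, "ymm0", "ymm1", "ymm2")
    run(program, regs)
    print(dest, program, regs[dest])  # regs[dest] is 2*3 + 5 in every case
```

Only the fourth case, where the result goes to a register that holds none of the inputs, needs the extra register copy.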

ewmayer 2013-07-02 20:50

Ah, very good - nice of Intel to at least provide some options here, given that they don't (yet) support the desired FMA4 syntax.

TheJudger 2013-07-05 19:17

my 4770k - continued

I see two options:[LIST=1][*]I'm too stupid to mount the heatsink properly [B]and[/B] I'm too stupid to run Prime95 (mprime)[*]"Others" don't stress their CPUs as hard as I do[/LIST]
i7 4770k + Gigabyte Z87X-UD3H + 2x 8GiB DDR3-2133 1.50V + 1x SATA HDD + 1x SATA SSD + Thermalright HR-02 Macho + Noctua NF-P12 @full speed, [I]80minus[/I] power supply
temporary build open on table, ambient temperature ~22°C
BIOS settings: Gigabyte's BIOS defaults, voltages set to "normal", hyperthreading disabled, memory set to "XMP Profile 1"
OS: openSUSE 12.3, 64-bit of course

Optimized HPL (Linpack) making heavy usage of AVX+FMA: 210W measured on AC, CPU reports ~120W, CPU temperatures 92-95°C
Prime95 (mprime v27.9), "blend test": 150-170W measured on AC, CPU reports 70-90W, CPU temperatures 60-72°C
Prime95 output indicates that it is using AVX FFTs; power and temperatures vary over different FFT lengths, while HPL power consumption is very stable.

Oliver

ewmayer 2013-07-05 19:35

[QUOTE=TheJudger;345339]Optimized HPL (Linpack) making heavy usage of AVX+FMA: 210W measured on AC, CPU reports ~120W, CPU temperatures 92-95°C
Prime95 (mprime v27.9), "blend test": 150-170W measured on AC, CPU reports 70-90W, CPU temperatures 60-72°C[/QUOTE]

Since these are on the same setup, it appears there is a significant load-dependent aspect.

You note that Linpack is making heavy use of AVX+FMA (loosely "AVX2"; strictly speaking FMA is a separate extension that arrived together with AVX2 on Haswell) - that is one obvious difference between it and Prime95: George is busy adding FMA usage, but the version you ran uses just FMA-less AVX. The dual FMA units effectively double the floating-MUL bandwidth (also ADD, but the MUL is the biggie here) - those MULs generate a lot of heat.

Also, linear algebra tends to be able to use the FPU at much closer to max. theoretical capacity than FFTs, because the data access patterns are much simpler and the arithmetic mix is more favorable in the sense that optimized FFTs are ADD-dominated.
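
The ADD-dominated point can be illustrated with a crude issue-port model. This is a sketch only: it assumes MUL and FMA can issue on either of two ports while plain ADD/SUB is limited to one of them (as mentioned earlier in the thread), it ignores latency, dependencies and memory traffic, and the operation counts are made up rather than taken from any real FFT macro.

```python
import math


def min_cycles(n_add, n_mul, n_fma, fp_ports=2, add_ports=1):
    """Lower bound on issue cycles for a block of independent FP vector ops.

    Assumptions: MUL and FMA issue on any of `fp_ports` ports, plain ADD/SUB
    only on `add_ports` of them; latency and memory are ignored.
    """
    total = n_add + n_mul + n_fma
    return max(math.ceil(n_add / add_ports), math.ceil(total / fp_ports))


# Hypothetical ADD-heavy block (made-up counts).
adds, muls = 144, 32
as_written = min_cycles(adds, muls, 0)     # ADDs queue up on a single port
all_fma = min_cycles(0, 0, adds + muls)    # every op recast as an FMA, no merging

print(f"ADD/MUL version: >= {as_written} cycles")
print(f"all-FMA version: >= {all_fma} cycles")
```

In this toy model the ADD-heavy block is bound by the single ADD port, and recasting the arithmetic as FMAs lets both ports share the work. That is the motivation for restructuring the FFT building blocks to use all-FMA arithmetic; real gains depend on how many ADD/MUL pairs can actually be merged.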

George, have you noticed any temperature impact from using FMA in your development code?

TheJudger 2013-07-05 19:50

Hi Ernst,

yepp, HPL is a power virus (but not the worst I can imagine[SUP]*[/SUP]).
This perfectly explains why I have so much trouble with the cooling of my CPU while others say that they can handle the heat of a Haswell.
In the [I]past[/I] (a few years ago) Prime95's power consumption/heat generation was close to Linpack's, but today...

[SUP]*[/SUP]I [B]guess[/B] running only DGEMM (BLAS) on reasonably sized inputs is even worse than Linpack. Linpack spends much of its time in this standard function but not all of it; there are other calls to the BLAS library and communication between processes/threads as well. I'm using [B]Intel[/B] MKL as the BLAS implementation; those functions are designed for optimal performance (not for maximum power consumption/heat generation). So I guess Intel won't say "don't run this code on our CPUs, it is just a stupid power virus".

Oliver

P.S. I've just [I]improved[/I] my HPL settings: 1-2W more, 203GFLOPS @default clock


All times are UTC.
