#144
May 2013
East. Always East.
11×157 Posts
Quote:
If you run something like CPUID Hardware Monitor, you will notice that your current CPU temperature is a few degrees below the max recorded temperature but WELL above the lowest recorded. For example, as I write this: Current 72C, Max 77C, Min 30C. The audible hiccups in the fan speed are an indication that your cooling solution is working properly and that the chip doesn't have much thermal inertia. My chip is at roughly 72C at the moment. If I take off the load for even two seconds, the temperature drops to 40C. Another two seconds and it's into the low 30s.
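"Not much thermal inertia" can be made concrete with a first-order (Newton's law of cooling) model, T(t) = T_idle + (T_load - T_idle) * exp(-t / tau), where a small time constant tau means little thermal inertia. The sketch below fits tau from the readings in this post. It assumes the 30C Min reading is the idle floor; real package temperatures are not strictly first-order, so treat tau as a rough figure only.

```python
import math


def fit_time_constant(t_load, t_later, t_idle, dt):
    """Fit tau (seconds) of T(t) = t_idle + (t_load - t_idle) * exp(-t / tau)
    from one reading t_later taken dt seconds after the load is removed."""
    return dt / math.log((t_load - t_idle) / (t_later - t_idle))


def temperature(t, t_load, t_idle, tau):
    """Predicted temperature t seconds after the load is removed."""
    return t_idle + (t_load - t_idle) * math.exp(-t / tau)


# Readings from the post: 72C under load, 40C two seconds after unloading.
# ASSUMPTION: the 30C Min reading is treated as the idle floor.
tau = fit_time_constant(72.0, 40.0, 30.0, 2.0)
print(f"tau = {tau:.2f} s")
print(f"T(4 s) = {temperature(4.0, 72.0, 30.0, tau):.1f} C")
```

Under these assumptions tau comes out near 1.4 s, and the model predicts about 32C after four seconds, in line with "low 30s". A GPU that takes a minute to settle would correspond to a much larger tau.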
#145
Romulan Interpreter
Jun 2011
Thailand
26×151 Posts
+1 to this (I wanted to say something similar, but you were faster)
Last fiddled with by LaurV on 2013-06-23 at 06:09
#146
May 2013
East. Always East.
11×157 Posts
Load up your GPU, then stop the load and watch as the temperature slowly crawls its way back down to idle. It can take as long as a minute.
#147
∂²ω=0
Sep 2002
República de California
19·613 Posts
The audible hiccups in the fan speed are an indication that I need to multithread and SIMDize my residue conversion routine. :)
#148
May 2013
East. Always East.
11·157 Posts
Well, if you can stagger the saves for each worker, then 7 of 8 (or 3 of 4, I can't remember which processor you have) workers can keep chugging along while the one worker waits for its save to complete. That could be kind of cool.
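Staggering can be as simple as offsetting each worker's save schedule by a fraction of the save interval, so that at most one worker is writing at any given moment. The sketch below assumes every worker uses the same save interval; the 30-minute interval and 4-worker count are hypothetical values, not prime95 settings.

```python
def staggered_save_times(n_workers, interval, horizon):
    """Return {worker: [save times in seconds]} where worker w's saves are
    offset by w * interval / n_workers, so no two workers save at once."""
    offset_step = interval / n_workers
    schedule = {}
    for w in range(n_workers):
        t = w * offset_step
        times = []
        while t < horizon:
            times.append(t)
            t += interval
        schedule[w] = times
    return schedule


# Hypothetical: 4 workers, one save every 30 minutes, one hour simulated.
sched = staggered_save_times(n_workers=4, interval=1800.0, horizon=3600.0)
for worker, times in sched.items():
    print(worker, times)
```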
#149
∂²ω=0
Sep 2002
República de California
19·613 Posts
Good point - probably best to just copy the residue array and spin the existing savefile-write stuff off into a separate thread - no reason to keep the crunching threads waiting.
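A minimal sketch of the copy-then-write-in-a-thread idea, assuming the residue is a plain Python list and using a simple text format (both hypothetical; the real savefile format is not shown here). The write-to-temp-then-rename step is an added assumption for crash safety, not something from the post.

```python
import os
import tempfile
import threading


def write_savefile(path, residue):
    # Write to a temporary file first, then atomically replace the old
    # savefile, so a crash mid-write never leaves a truncated savefile.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(" ".join(str(x) for x in residue))
    os.replace(tmp, path)


def save_in_background(path, residue):
    # Snapshot the residue so the crunching threads can keep modifying the
    # live array while the (slow) write proceeds on its own thread.
    snapshot = list(residue)
    writer = threading.Thread(target=write_savefile, args=(path, snapshot))
    writer.start()
    return writer


residue = [3, 1, 4, 1, 5]
path = os.path.join(tempfile.mkdtemp(), "demo.sav")  # hypothetical file name
writer = save_in_background(path, residue)
residue[0] = 999  # crunching continues; the snapshot being written is unaffected
writer.join()
with open(path) as f:
    print(f.read())  # prints "3 1 4 1 5"
```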
#150
P90 years forever!
Aug 2002
Yeehaw, FL
19·397 Posts
I've been working on figuring out why the key building-block macro won't run in the theoretically possible 13 clocks. I mentioned earlier that this macro is taking 15 clocks.
I believe I have discovered the causes of the 2-clock delay. It is actually a combination of factors:
1) Since Intel ditched the 4-operand FMA instruction (A = B * C + D) in favor of a 3-operand version (one must overwrite B, C, or D), my macro is forced to do a number of register copies. These register copies have zero latency, but they have a cost, as we will see later.
2) If you use both add and subtract instructions as well as FMA (or MUL) instructions on port 1, then you will encounter "dispatch bubbles". This is because the add and subtract instructions take 3 clocks while the FMA takes 5. For example, if FMAs are scheduled for clocks 1-5 and 2-6, then an add or subtract cannot be dispatched for clocks 3-5, because you cannot have two instructions both end on clock 5, nor for clocks 4-6, which would collide with the second FMA on clock 6. The add must be delayed until clocks 5-7, a two-clock dispatch bubble.
3) To avoid this 3-vs-5-clock dispatch bubble, a macro must contain exactly 25% or 50% add and subtract instructions, executing every other clock cycle or every clock cycle, respectively. This is a pretty severe coding restriction.
4) Because of the restrictions in 3, it is best to always use FMA. This means that calculating A+B and A-B (a common FFT operation) requires 3 instructions (a register copy and two FMA instructions) rather than 2 (an add and a subtract instruction).
5) The chip can retire only 4 instructions per clock cycle. The building-block macro does 8 data loads, 4 sin/cos loads, 8 stores, 4 muls, 22 FMAs, and 11 register copies. That is 57 instructions; at 4 retires per clock, you get a 14.25-clock minimum.
6) [retracted]
What does all this mean for prime95? Probably not a lot. If I could achieve 13 clocks, the single-worker case (or Haswell-E multi-worker case) would be a few percent faster. The 4-worker case will still be bandwidth limited.
Last fiddled with by Prime95 on 2013-06-27 at 16:04
#151
∂²ω=0
Sep 2002
República de California
19×613 Posts
Quote:
Going from ADD/SUB to FMA in typically ADD/SUB-dominated DFT macros theoretically doubles our throughput: the resulting number of FMAs should match the previous ADD/SUB total, but 2 FMAs can issue per cycle versus just 1 ADD/SUB. It will be interesting to see how much of that theoretical gain is realizable.
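The "theoretically doubles" argument is an issue-rate bound: the same op count issued at 2 per cycle instead of 1 takes half the cycles. A minimal sketch, with a hypothetical op count of 64; it deliberately ignores loads, stores, dependencies, and the 4-per-clock retire limit, any of which can become the real bottleneck first.

```python
def issue_bound_clocks(n_ops, ops_per_clock):
    """Lower bound on clocks to issue n_ops at ops_per_clock per clock."""
    return n_ops / ops_per_clock


n = 64  # hypothetical ADD/SUB count in one DFT macro
before = issue_bound_clocks(n, 1)  # ADD/SUB: 1 per clock
after = issue_bound_clocks(n, 2)   # same count as FMAs: 2 per clock
print(before, after, before / after)  # 64.0 32.0 2.0
```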
#152
P90 years forever!
Aug 2002
Yeehaw, FL
19×397 Posts
Correction. This is not the case - there was a bug in my test case code.
#153
"Oliver"
Mar 2005
Germany
11·101 Posts
Hi,
Quote:
i7 4770k, Gigabyte Z87X-UD3H, 2x 8GiB DDR3-2133 (1.5V), Thermalright "HR-02 Macho Rev A", Noctua NF-P12 fan running at ~1250rpm, open on the table (no chassis):
I've set the voltage to "normal" instead of "auto". I'm running a self-compiled and self-configured HPL (MPI-parallel Linpack), which has a higher efficiency (and power consumption) than LinX (which Windows users might know). At default clock rates (3.5GHz, turbo enabled, Hyper-Threading disabled) and memory at DDR3-2133, I get temperatures a little above 90°C and 201 GFLOPS. OK, prime95 temperatures will be below that, but this doesn't feel comfortable.
Oliver
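For context on how hard HPL pushes the chip, 201 GFLOPS can be compared with Haswell's theoretical double-precision peak: 2 FMA units x 4 doubles per 256-bit vector x 2 flops per FMA = 16 DP flops per clock per core. The sketch assumes all 4 cores run at the 3.5GHz base clock; with turbo enabled the actual all-core clock may be higher, which would raise the peak and lower the computed efficiency.

```python
def peak_dp_gflops(cores, ghz, flops_per_clock=16):
    """Theoretical double-precision peak in GFLOPS.
    16 = 2 FMA units x 4 doubles (256-bit AVX2) x 2 flops per FMA (Haswell)."""
    return cores * ghz * flops_per_clock


# ASSUMPTION: all 4 cores at the 3.5 GHz base clock.
peak = peak_dp_gflops(cores=4, ghz=3.5)
print(peak)                 # 224.0
print(f"{201 / peak:.1%}")  # 89.7%
```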
#154
"Kieren"
Jul 2011
In My Own Galaxy!
10011110101110₂ Posts
Quote:
Also, what voltage does the "Normal" setting result in? Don't late-model Intel chips run in the 1.2 V range?
EDIT: My FX-8350 (at stock 4 GHz) draws a LOT more power than a Haswell chip, and it stays in the mid-50s C on air cooling, on a warm day, with two substantial GPUs in the case. The CPU is running P-1 x8, and both GPUs are running mfaktc. I can't see how you would reach such scary temps if things were working correctly.
Last fiddled with by kladner on 2013-06-30 at 03:57