#309
Aug 2013
3×29 Posts
2/4 have an EVGA X299 Micro (131-SX-E295), latest BIOS
2/4 have an EVGA X299 Micro 2 (121-SX-E296), latest BIOS
2/4 have G.SKILL Sniper X Series 64GB (4 x 16GB) 288-Pin DDR4 SDRAM DDR4 3600 (PC4 28800) Desktop Memory, Model F4-3600C19D-32GSXKB
2/4 have G.SKILL Ripjaws V Series 64GB (4 x 16GB) 288-Pin DDR4 SDRAM DDR4 3600 (PC4 28800) Desktop Memory, Model F4-3600C19D-32GVRB
4/4 have:
Intel 9800X Skylake-X CPU (lapped)
Noctua NH-D15 coolers with NT-H1 paste (all temps under 80C at load)
XMP settings for 3600 MHz RAM @ CAS 19
400W Seasonic Platinum PSU (one actually has an 850W unit, to see whether the PSU was the problem)
Headless, with junk second-hand graphics cards
120GB SSD
Windows 10 64-bit
Cheap microATX cases
All are running default BIOS settings except for XMP, and except for the one I'm underclocking to reach stability.
All are plugged into a Kill A Watt to measure consumption.
All are stable under 29.4 (no AVX512) and unstable under 29.5b9 (AVX512).
Last fiddled with by simon389 on 2019-02-07 at 19:53

#310
If I May
"Chris Halsall"
Sep 2002
Barbados
9,473 Posts
The only important commonality I see is the Intel 9800X Skylake-X CPU (lapped). If you have the time and the coin, swapping one of those out and testing deeply might be worth your while. Intel have learnt the hard way that selling bad kit is not good for public relations, nor for their stock price....

Edit: Just to be clear, you "lapped" your kit!?!?!? And you're wondering why it's not working reliably?
Last fiddled with by chalsall on 2019-02-07 at 20:13 Reason: s/only commonality/only important commonality/;

#311
"Mihai Preda"
Apr 2015
1342₁₀ Posts
The core problem is not why the system failed, but why the PRP didn't catch it. |
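To make "catching it" concrete, here is a minimal sketch, assuming the PRP run uses a Gerbicz-style error check (the function name, block size, and `fault_at` fault-injection hook are hypothetical, for illustration only). Alongside the running residue u_i = a^(2^i) mod N, it keeps a checkpoint product d of every block-th residue. In error-free arithmetic d_t = a · d_{t-1}^(2^L) mod N, so a flipped bit anywhere inside a block breaks that identity at the next checkpoint.

```python
from typing import Optional, Tuple


def prp_with_gerbicz(a: int, n: int, iters: int, block: int,
                     fault_at: Optional[int] = None) -> Tuple[int, bool]:
    """Compute a**(2**iters) mod n by repeated squaring with a Gerbicz-style check.

    Returns (residue, passed). `fault_at` is a hypothetical test hook that flips
    the low bit of the residue at that iteration to simulate a hardware error.
    """
    u = a % n          # u_0 = a
    d = u              # d_0 = u_0: running product of every block-th residue
    for i in range(1, iters + 1):
        u = u * u % n  # u_i = u_{i-1}^2
        if i == fault_at:
            u ^= 1     # simulated single-bit error
        if i % block == 0:
            prev_d, d = d, d * u % n
            # Error-free arithmetic guarantees d_t == a * d_{t-1}^(2^block) mod n.
            # pow() stands in for the `block` extra squarings a real program would do.
            if d != a * pow(prev_d, 1 << block, n) % n:
                return u, False
    return u, True
```

In this sketch a fault after the last checkpoint would go undetected, so `iters` is assumed to be a multiple of `block`.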

#312
Sep 2016
101001011₂ Posts
But since there are 4 machines here failing the same way, it seems unlikely this would be the cause. It's worth turning off XMP anyway to see whether the instability persists, but I doubt this is the cause. Memory instability is highly variable, even across identical systems, so it seems unlikely that it would put all 4 systems right on the edge of stability.

So while we are blocked trying to solve the software issue, it is still safe to try (in parallel) to debug the hardware issue as well. There are also multiple boxes here exhibiting the same issue, so if we're extra paranoid, we can just leave one of them untouched.
Last fiddled with by Mysticial on 2019-02-07 at 20:25

#313
If I May
"Chris Halsall"
Sep 2002
Barbados
22401₈ Posts
Some of us prefer to work with stable kit before trying to figure out why the software failed. But your point is valid. Possibly simon389 will agree not to fix one of his machines, so George can run tests on it to figure out why the software didn't correctly determine that the hardware was insane.

#314
"Robert Gerbicz"
Oct 2005
Hungary
2·7·103 Posts
If he is getting/using different shift value for PRP then he will squaremod very different numbers.
Agree.
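To illustrate why a different shift means squaring very different numbers, here is a minimal sketch, assuming the usual shifted-residue convention for arithmetic modulo M_p = 2^p - 1 (the function names are hypothetical). The program stores x·2^s instead of x, and because 2^p ≡ 1 (mod M_p), squaring turns shift s into 2s mod p. Two runs with different starting shifts therefore square different values at every iteration, yet agree once the final value is unshifted.

```python
from typing import List, Tuple


def unshift(value: int, shift: int, p: int) -> int:
    """Undo a stored factor of 2**shift modulo M_p = 2**p - 1."""
    m = (1 << p) - 1
    # Multiplying by 2**(p - shift) cancels 2**shift because 2**p == 1 (mod M_p).
    return value * pow(2, (p - shift) % p, m) % m


def shifted_squarings(x0: int, shift: int, p: int,
                      iters: int) -> Tuple[int, int, List[int]]:
    """Square x0 `iters` times mod M_p while storing it shifted by `shift` bits.

    Returns (final stored value, final shift, stored value after each squaring).
    """
    m = (1 << p) - 1
    value = x0 * pow(2, shift, m) % m  # stored form: x0 * 2**shift
    trace = []
    for _ in range(iters):
        value = value * value % m      # (x * 2**s)^2 = x^2 * 2**(2s)
        shift = 2 * shift % p          # so the shift doubles mod p
        trace.append(value)
    return value, shift, trace
```

With p = 89, a run at shift 0 and a run at shift 17 store different intermediate values at every step but unshift to the same residue.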

#315
Aug 2013
3·29 Posts
They had problems before lapping the IHS. I lapped to try and get temps down (and the temps did indeed lower but the errors remained). I have lapped many times. No problems ever.
Last fiddled with by simon389 on 2019-02-07 at 21:09

#316
If I May
"Chris Halsall"
Sep 2002
Barbados
9,473 Posts

#317
Sep 2016
513₈ Posts
The same mobo, with the same memory brand and type (albeit with different decorations).

I've seen these sorts of things show up all the time on my own hardware, even for memory that's on the mobo's QVL. I've had multiple instances where the QVL has failed me. Quite often, downclocking doesn't work, and the only solution is to change either the mobo or the memory.

The only thing that doesn't really support this is that the errors are AVX512-sensitive. Perhaps AVX512 makes the workload sufficiently memory-bound to stress the memory in a way that's not possible with just AVX or scalar code.

I find it hard to believe that sanding down the IHS would cause problems like this. I'd expect it to cause either no issues at all, unrelated issues (like temperature), or catastrophic issues such as a missing memory channel or complete failure of the chip. And then there's the fact that it's 4/4 machines here, all with the exact same symptoms.
Last fiddled with by Mysticial on 2019-02-07 at 21:48

#318
If I May
"Chris Halsall"
Sep 2002
Barbados
9,473 Posts
This might only involve some BIOS settings (set at the lowest possible, initially), to experiment.... |

#319
Aug 2013
3·29 Posts
Lapping was a really smart decision IMHO. Yes, it killed my warranty, but the IHSes were all rounded in the middle and temps were all over the place. It’s amazing, actually, how uneven CPUs and heat sinks are, even of the premium variety.
More than happy to
Last fiddled with by simon389 on 2019-02-07 at 21:50