mersenneforum.org

768k Skylake Problem/Bug (https://www.mersenneforum.org/showthread.php?t=20714)

s1riker 2016-01-16 18:39

[QUOTE=chalsall;422632]So, I'm now under attack for questioning you...

My first question is, are you overclocking any of your systems and/or sub-systems?

My second question is, are you being methodical in your testing?[/QUOTE]

Hey Chalsall,

I've gathered a sense of what your personality is like from reading this thread, so I won't take any attacks personally :) Since you asked, here's a dump of everything I know/tried:

1. Hang happens whether or not I overclock the core CPU frequency. Strangely, it seems to happen less when it is overclocked to 4.5GHz.
2. Hang happens whether I'm running RAM at default 2133MHz or my RAM's XMP settings, which are 3000MHz (15-15-15-35).
3. Hang never happens while I'm using the PC. I can game on it for hours or run the Prime95 torture test for over 12h.
4. I've run HCI memtest for 24 hours to ensure there are no RAM errors. All clean.
5. Hang always happens after PC is left idle (sleep is turned off as I like to have my PC available for RDP at all times).
6. When hung, the PC must be hard shut down; the reset button does not work.
7. I've already tried a different set of RAM sticks (although same manufacturer and same rated speed), different PSU, onboard video (instead of my 970GTX) and the motherboard was RMA'ed. Nothing made any difference; it still hangs.
8. I've tried all available BIOS updates for my motherboard, including beta ones. The newest beta does seem to reduce the frequency of the hang, but does not eliminate it.

That's all I recall for now.

chalsall 2016-01-16 19:14

[QUOTE=s1riker;422707]I've gathered a sense of what your personality is like from reading this thread, so I won't take any attacks personally :)[/QUOTE]

Great. There's a reason I'm called "Sheldon" by my friends... :smile:

[QUOTE=s1riker;422707]5. [B]Hang always happens after PC is left idle (sleep is turned off as I like to have my PC available for RDP at all times)[/B]
8. I've tried all available BIOS updates for my motherboard, including beta ones. The newest beta does seem to reduce the frequency of the hang, but does not eliminate it.[/QUOTE]

Interesting...

Are you able to replicate this on any other machines?

Madpoo 2016-01-16 19:30

[QUOTE=pegnose;422682]...
But I still experience a freeze during Prime95. Yes, testing is a PITA. Currently I try setting 16-16-16-40 (instead of 39), which should be compatible in my case according to the manual...[/QUOTE]

It can be hard to say just why Prime95 would lock a machine, but bear in mind that Prime95 is doing what it's supposed to... stressing the system so that if it has a problem, it will manifest itself.

Sounds like you're doing all the right things by focusing on different things that could be the problem until you finally (and hopefully) figure out what causes it.

One thing I'm not sure if you've tried is to underclock your system and see if it's stable. That might just help you verify if there are thermal issues, or maybe power supply related things. Same with the memory.

Madpoo 2016-01-16 19:39

[QUOTE=s1riker;422707]Hey Chalsall,

I've gathered a sense of what your personality is like from reading this thread, so I won't take any attacks personally :) [/QUOTE]

He's really a lovable old grump, deep down.

Regarding your issue, I know it can be a pain in the butt to nail down what's causing hanging issues, but with the servers I work with, my basic approach is to remove *everything* except the bare essentials needed to get the system booted. That might mean taking out all but a single stick of RAM, maybe just one CPU on a dual socket motherboard, taking out any add-on cards, etc. etc.

Then set the BIOS settings to stock/default. Give it a spin and see if it works any better. If it still fails I could swap the one mem module with another and try again, in case I happened to keep the one bad stick. Changing the BIOS settings to some low power/efficient mode might also be useful to see the effect.

If it works great, then yay, I can start adding things back in one at a time until I find the culprit.

Worst case scenario is when multiple things are bad (2 mem modules, or some funky interaction with different PCI cards, for example), but that's not common.

Oh, I guess another worst case is a bad power supply. The servers I use have dual supplies, and those have never caused an issue, but on desktops it's more likely.
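
The strip-down-and-add-back approach described above can be sketched as a simple loop. This is only an illustration, not anyone's actual tooling: the function name `find_culprit`, the component labels, and the `is_stable` callback are all hypothetical. In practice `is_stable` would mean "install the parts, boot, and run a stress test for long enough", done by hand. The sketch also assumes a single bad component, so it would not untangle the multiple-faults case.

```python
from typing import Callable, Iterable, List, Optional


def find_culprit(
    baseline: List[str],
    extras: Iterable[str],
    is_stable: Callable[[List[str]], bool],
) -> Optional[str]:
    """Add components back one at a time and return the first one
    whose addition makes the system unstable (None if all are fine).

    `is_stable` is a hypothetical stand-in for a manual stress test
    of the given configuration.
    """
    installed = list(baseline)
    # The bare-essentials configuration has to be stable first;
    # otherwise swap a baseline part (e.g. the one RAM stick) and retry.
    if not is_stable(installed):
        raise ValueError("baseline configuration is already unstable")

    for part in extras:
        installed.append(part)
        if not is_stable(installed):
            return part  # the most recently added part is the suspect
    return None


if __name__ == "__main__":
    # Simulated check: pretend the second RAM stick is the bad one.
    def simulated_check(config: List[str]) -> bool:
        return "ram_stick_2" not in config

    suspect = find_culprit(
        baseline=["cpu", "ram_stick_1"],
        extras=["ram_stick_2", "gpu", "pci_card"],
        is_stable=simulated_check,
    )
    print(suspect)  # prints: ram_stick_2
```

Adding parts one at a time keeps the number of tests linear in the number of components, which matters when each "test" is hours of stress testing.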

pegnose 2016-01-16 20:04

[QUOTE=pegnose;422682]Thank you so much, this is exactly what I am dealing with. I got my system stable at least for memtest86 by
- disabling the two unused dram slots
- enabling MCH Full Check (ASUS)
- setting dram voltage tolerance to 110%

But I still experience a freeze during Prime95. Yes, testing is a PITA. Currently I try setting 16-16-16-40 (instead of 39), which should be compatible in my case according to the manual.[/QUOTE]


Looks good so far: Prime95 ran for over 8h for the first time! I even managed to reproduce the 768k bug. :)

pegnose 2016-01-16 20:10

[QUOTE=s1riker;422707]Hey Chalsall,

I've gathered a sense of what your personality is like from reading this thread, so I won't take any attacks personally :) Since you asked, here's a dump of everything I know/tried:

1. Hang happens whether or not I overclock the core CPU frequency. Strangely, it seems to happen less when it is overclocked to 4.5GHz.
2. Hang happens whether I'm running RAM at default 2133MHz or my RAM's XMP settings, which are 3000MHz (15-15-15-35).
3. Hang never happens while I'm using the PC. I can game on it for hours or run the Prime95 torture test for over 12h.
4. I've run HCI memtest for 24 hours to ensure there are no RAM errors. All clean.
5. Hang always happens after PC is left idle (sleep is turned off as I like to have my PC available for RDP at all times).
6. When hung, the PC must be hard shut down; the reset button does not work.
7. I've already tried a different set of RAM sticks (although same manufacturer and same rated speed), different PSU, onboard video (instead of my 970GTX) and the motherboard was RMA'ed. Nothing made any difference; it still hangs.
8. I've tried all available BIOS updates for my motherboard, including beta ones. The newest beta does seem to reduce the frequency of the hang, but does not eliminate it.

That's all I recall for now.[/QUOTE]


Have you read what I wrote above? You probably have, but I'll just post it again:

From what I have heard, a hard lock during idle could be a third problem. I had this when I had ASPM enabled for the link between SA and the PCH. Others suggested the PSU might be incompatible with Haswell C-states (C6/C7). You could try disabling all power saving options altogether.

Did you do that? Also in Windows?

pegnose 2016-01-16 20:14

[QUOTE=Madpoo;422713]It can be hard to say just why Prime95 would lock a machine, but bear in mind that Prime95 is doing what it's supposed to... stressing the system so that if it has a problem, it will manifest itself.

Sounds like you're doing all the right things by focusing on different things that could be the problem until you finally (and hopefully) figure out what causes it.

One thing I'm not sure if you've tried is to underclock your system and see if it's stable. That might just help you verify if there are thermal issues, or maybe power supply related things. Same with the memory.[/QUOTE]


Thanks for reassuring me, Madpoo! I tried the memory @2133 MHz (JEDEC) and the CPU runs at stock values (I haven't gone lower so far, but it is well below 50 °C during 1344k stress testing). However, now with 16-16-16-40 timing it looks pretty good. I think I'll know by tomorrow evening.

chalsall 2016-01-16 20:23

[QUOTE=pegnose;422718]Thanks for reassuring me, Madpoo! I tried the memory @2133 MHz (JEDEC) and the CPU runs stock values (didn't go lower, so far; but it is well below 50 °C during 1344k stress testing). However, now with 16-16-16-40 timing it looks pretty good. I think I'll know by tomorrow evening.[/QUOTE]

On how many machines are these errors manifesting?

Have you tried different PSUs?

pegnose 2016-01-16 21:19

[QUOTE=chalsall;422719]On how many machines are these errors manifesting?

Have you tried different PSUs?[/QUOTE]

I have only the one machine. And no, not yet. But as I said: with 16-16-16-40 no hard lock so far. Plus: neither idle state nor massive load (300 W GTX 980TI OC during Furmark) was able to provoke the issue (at least not within 1h or so).

I didn't believe it, but maybe a) ASUS and b) Crucial were both right: a) my modules with 16-16-16-39 timing actually are not compatible (with my M. VIII Hero), but b) they are the very same hardware as the ones with 16-16-16-40, only with different timings programmed.

Mark Rose 2016-01-16 21:35

You should be able to disable sleep states in the BIOS. I would play with that setting.

Madpoo 2016-01-16 23:25

Nice boost in software downloads thanks to this issue
 
By the way for everyone following along...

The [url]www.mersenne.org[/url] server has seen a decent little boost in traffic to the download page, mostly from links on PC World.

I'm sure many of those are people interested in testing their own Skylake to see if it exhibits the issue, but maybe some will stick around and test a few exponents.


All times are UTC.