#430 | |
|
Jan 2016
31 Posts |
Quote:
Edit: The interesting thing is that it happened for you during high I/O, which is in line with what Solis3 was saying in the Tom's thread. Maybe that's something we should investigate. Last fiddled with by s1riker on 2016-01-20 at 15:12 |
#431 | |
|
Serpentine Vermin Jar
Jul 2014
3,313 Posts |
Quote:
Just guessing... software RAID also uses a lot of RAM, so if there was anything bad with it, you might have bumped into the bad section of memory.

I had a server a few years back that ran great most of the time. I could run Prime95 stress tests for days on end, no worries. I ran memtest on it for about a week solid, no problems. But when this thing was running normally and happened to take over as the primary SQL node in a cluster, once SQL's memory usage crept past a certain point and hit some bad section of memory, the thing would blue screen and an unrecoverable memory error was logged (it was a Proliant, so it actually tells you which DIMM it was).

I mean, I threw everything I could at this to try and replicate the issue with artificial memory tests, using every type of bit pattern imaginable. It "only" had 36 GB, so memtest could run through the whole thing in a fairly short time, and yeah, I left it running from a memtest bootable USB for around a week and it never did throw an error.

To work around it I set up the Proliant in "spare memory" mode, which effectively took the bad module out of service. Then a couple of months later I finally got onsite with a new module and swapped it out, and the problem disappeared. But it goes to show that even the best artificial tests out there are no match for real workloads. |
#432 | |
|
Jan 2016
11111₂ Posts |
Quote:
#433 | |
|
Jan 2016
34 Posts |
Quote:
However, I highly doubt that something is broken, because I can kick my system as hard as I can. It only freezes after many hours, which more or less rules out a power or heat problem. It is hard to understand. The freezes got less and less frequent; this is a gradual thing. If anything, I will swap the memory for a 100% compatible kit. After that I can - again - open a support case with ASUS. ALSO, my system was stable for the first two months! Or it happens mostly with particular load scenarios (games in my case: Fallout 4 and Deus Ex Human Revolution; but not Dying Light, so far). Last fiddled with by pegnose on 2016-01-20 at 19:17 |
#434 | |
|
Jan 2016
34 Posts |
Quote:
RAM? a) I have 16 GB, b) memtest86 ran for 10 h straight. OK, for stress testing you should use HCI's version. I will do that soon. But I am not so sure any more that it is memory for me. OK, so it was a faulty module for you and you couldn't detect it with memtest? That is actually weird. But I would be happy about this. I will probably know soon, because I plan on switching the brand (more people have issues with Crucial on recent ASUS boards). Only, some of the modules on ASUS' HCL for that board are hard to get in Germany, and the rest have some addendum like "(ver. 5.26)". How am I supposed to get exactly THAT version?! Nevertheless, thank you very much for sharing your experience with me! Last fiddled with by pegnose on 2016-01-20 at 19:21 |
#435 | |
|
Jan 2016
34 Posts |
Quote:
My last freeze (and the first AFTER I thought I was finally good) happened with high load on the CPU and HDD (the ASM1061 in particular, which is on the PCH side in platform terms). Memory is on the SA, the other side of the DMI link, if I am correct. Interestingly, I had freezes when I activated native power management for the DMI link (the PCH side of it) in the BIOS. Maybe that is the relevant component:
- idle state (power management) for some
- video streaming (to the HDD) for others
- memory for most of us

Does that make sense? |
#436 |
|
If I May
"Chris Halsall"
Sep 2002
Barbados
2×67×73 Posts |
Certainly. But please keep in mind that modern computers are /incredibly/ complicated, with many different interacting components (often from different suppliers) which each must work perfectly for the system as a whole to work correctly.
Within the software industry there's the term "once a month" bug. This means that something is going demonstrably wrong, but it is not easy to deterministically reproduce. This is where the expression "have you tried turning it off and on again" comes from.
To try to interject something useful into this post: have you looked at your PSU's power rail loading? It is possible your PSU is fine, but by chance you have one or more of your rails /just/ at the edge of its rating because of the power cabling configuration. |
#437 | |
|
Jan 2016
34 Posts |
Quote:
Unfortunately, it is more like "once every 4 days" for many users, so you wouldn't think about just ignoring it. And it really seems to affect a lot of people out there.

Yes, I thought about the power rails. But...
- my system consumes up to 450 W, on a 700 W PSU of a solid brand (BeQuiet)
- with the CPU and mobo connectors there is no choice
- one SSD is M.2 on the board, the other three consume up to... 20 W?
- then there is my GTX980Ti, which takes up to 300 W, but (now) it is connected to both PCIe rails (before, when I was still naive, hehe, I had 2 (!) power outages in 5 months - real black-outs (PC only, I mean: instant-off) - which is nothing compared to the Skylake hard-lock); I even had contact with BeQuiet support on that matter: with both rails I should be more than fine now

So... unless my PSU has a weird defect, which is shared by many other users with many other PSUs... it is not the culprit, I guess.

Last fiddled with by pegnose on 2016-01-20 at 20:34 |
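For anyone wanting to redo this arithmetic for their own box, here is a rough sketch using only the figures quoted in the post (450 W peak draw, 700 W PSU, up to 300 W for the GPU). The function names are made up for illustration, and the even split across 12 V rails is an assumption: it ignores whatever the card draws through the PCIe slot, and the actual per-rail ratings are not given in the thread, so compare the result against the label on your own PSU.

```python
def psu_utilisation(peak_draw_w: float, psu_rating_w: float) -> float:
    """Fraction of the PSU's total rating used at peak draw."""
    return peak_draw_w / psu_rating_w


def amps_per_rail(load_w: float, rails: int, volts: float = 12.0) -> float:
    """Current per rail if the load is split evenly (an assumption) across rails."""
    return load_w / rails / volts


# Figures quoted in the post above.
print(f"{psu_utilisation(450, 700):.0%}")  # 64%
print(amps_per_rail(300, rails=1))         # 25.0 A if the GPU hangs off one rail
print(amps_per_rail(300, rails=2))         # 12.5 A per rail when split over both
```

Total utilisation looks comfortable either way; what changes with the cabling is the per-rail current, which is the quantity the rail-loading question is about.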
#438 | |||
|
If I May
"Chris Halsall"
Sep 2002
Barbados
2·67·73 Posts |
Quote:
Quote:
Quote:
I would suggest (if you and/or others can) to first remove any kit you can (for example, HDs, GPUs, RAM), and rerun the tests you've used to produce the observed crashing (even if not deterministically -- currently you're doing statistical testing). Swap out MBs and PSUs. Make sure your mains power is good. This is not to say this is not a CPU issue, but you don't /know/ it is yet. I hope that makes sense and helps. |
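To put a number on the "statistical testing" point, here is a minimal sketch. It assumes freezes arrive independently at a constant average rate (a Poisson model, which is an assumption and not something established in this thread) and plugs in the "once every 4 days" figure quoted earlier. It estimates how long a changed configuration has to run freeze-free before the change can plausibly be credited.

```python
import math


def p_no_freeze(days: float, mean_days_between_freezes: float) -> float:
    """Chance of seeing zero freezes in `days` if nothing was actually fixed."""
    return math.exp(-days / mean_days_between_freezes)


def days_needed(confidence: float, mean_days_between_freezes: float) -> float:
    """Freeze-free run length needed so that luck alone explains it
    with probability at most 1 - confidence."""
    return -mean_days_between_freezes * math.log(1.0 - confidence)


print(round(p_no_freeze(4.0, 4.0), 2))   # 0.37: four clean days prove very little
print(round(days_needed(0.95, 4.0), 1))  # 12.0 days for 95% confidence
```

Under these assumptions each swap (RAM, PSU, board) needs roughly twelve clean days before it says much, which is why removing several components at once and re-adding them one at a time tends to be faster than testing one change after another.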
#439 | ||
|
"Kieren"
Jul 2011
In My Own Galaxy!
2·3·1,693 Posts |
chalsall:
Quote:
Quote:
Last fiddled with by kladner on 2016-01-20 at 21:53 |
#440 | |
|
If I May
"Chris Halsall"
Sep 2002
Barbados
2×67×73 Posts |
Quote:
Please also note this: "My last freeze (and the first AFTER I thought I was finally good) now was with high load on CPU and HDD". Perhaps pegnose will speak to our questions. |