mersenneforum.org  

2016-01-16, 18:39   #397
s1riker
Jan 2016
31 Posts

Quote:
Originally Posted by chalsall View Post
So, I'm now under attack for questioning you...

My first question is, are you overclocking any of your systems and/or sub-systems?

My second question is, are you being methodical in your testing?

Hey Chalsall,

I've gathered a sense of what your personality is like from reading this thread, so I won't take any attacks personally :) Since you asked, here's a dump of everything I know/tried:

1. Hang happens whether or not I overclock the CPU core frequency. Strangely, it seems to happen less often when overclocked to 4.5GHz.
2. Hang happens whether I'm running RAM at the default 2133MHz or at my RAM's XMP settings, which are 3000MHz (15-15-15-35).
3. Hang never happens while I'm using the PC. I can game on it for hours or run the Prime95 torture test for over 12h.
4. I've run HCI memtest for 24 hours to ensure there are no RAM errors. All clean.
5. Hang always happens after the PC is left idle (sleep is turned off, as I like to have my PC available for RDP at all times).
6. When hung, the PC must be hard shut down; the reset button does not work.
7. I've already tried a different set of RAM sticks (although same manufacturer and same rated speed), a different PSU, onboard video (instead of my GTX 970), and the motherboard was RMA'ed. Nothing made any difference; it still hangs.
8. I've tried all available BIOS updates for my motherboard, including beta ones. The newest beta does seem to reduce the frequency of the hang, but does not eliminate it.

That's all I recall for now.
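
For readers comparing the memory settings in item 2, here is a minimal sketch that converts timings given in clock cycles into nanoseconds. It assumes the quoted "3000MHz" is the effective data rate (3000 MT/s), as is usual for DDR4 kits, and that the four numbers follow the conventional CL-tRCD-tRP-tRAS order. The function name `cycles_to_ns` is made up for this illustration.

```python
def cycles_to_ns(cycles: int, data_rate_mts: float) -> float:
    """Convert a DDR timing given in clock cycles to nanoseconds.

    DDR memory transfers data twice per clock, so the clock period in
    nanoseconds is 2000 / data_rate when data_rate is given in MT/s.
    """
    return cycles * 2000.0 / data_rate_mts


# XMP profile quoted in the post: 3000 MT/s at 15-15-15-35
# (assumed order: CL-tRCD-tRP-tRAS).
xmp_timings = {"CL": 15, "tRCD": 15, "tRP": 15, "tRAS": 35}

for name, cycles in xmp_timings.items():
    print(f"{name}: {cycles} cycles = {cycles_to_ns(cycles, 3000):.2f} ns")
```

The same function can be used to compare any two profiles in absolute time, for example a profile that differs from another by a single tRAS cycle.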

2016-01-16, 19:14   #398
chalsall
If I May
"Chris Halsall"
Sep 2002
Barbados
2·67·73 Posts

Quote:
Originally Posted by s1riker View Post
I've gathered a sense of what your personality is like from reading this thread, so I won't take any attacks personally :)

Great. There's a reason I'm called "Sheldon" by my friends...

Quote:
Originally Posted by s1riker View Post
5. Hang always happens after the PC is left idle (sleep is turned off, as I like to have my PC available for RDP at all times).
8. I've tried all available BIOS updates for my motherboard, including beta ones. The newest beta does seem to reduce the frequency of the hang, but does not eliminate it.

Interesting...

Are you able to replicate this on any other machines?

2016-01-16, 19:30   #399
Madpoo
Serpentine Vermin Jar
Jul 2014
3,313 Posts

Quote:
Originally Posted by pegnose View Post
...
But I still experience a freeze during Prime95. Yes, testing is a PITA. I am currently trying 16-16-16-40 (instead of 39), which should be compatible in my case according to the manual...

It can be hard to say just why Prime95 would lock a machine, but bear in mind that Prime95 is doing what it's supposed to... stressing the system so that if it has a problem, it will manifest itself.

Sounds like you're doing all the right things by focusing on different things that could be the problem until you finally (and hopefully) figure out what causes it.

One thing I'm not sure if you've tried is to underclock your system and see if it's stable. That might just help you verify if there are thermal issues, or maybe power supply related things. Same with the memory.

2016-01-16, 19:39   #400
Madpoo
Serpentine Vermin Jar
Jul 2014
3,313 Posts

Quote:
Originally Posted by s1riker View Post
Hey Chalsall,

I've gathered a sense of what your personality is like from reading this thread, so I won't take any attacks personally :)

He's really a lovable old grump, deep down.

Regarding your issue, I know it can be a pain in the butt to nail down what's causing hanging issues, but with the servers I work with, my basic approach is to remove *everything* except the bare essentials needed to get the system booted. That might mean taking out all but a single stick of RAM, maybe just one CPU on a dual socket motherboard, taking out any add-on cards, etc. etc.

Then set the BIOS settings to stock/default. Give it a spin and see if it works any better. If it still fails I could swap the one mem module with another and try again, in case I happened to keep the one bad stick. Changing the BIOS settings to some low power/efficient mode might also be useful to see the effect.

If it works great, then yay, I can start adding things back in one at a time until I find the culprit.

Worst case scenario is when multiple things are bad (2 mem modules, or some funky interaction with different PCI cards, for example), but that's not common.

Oh, I guess another worst case is a bad power supply. The servers I use have dual supplies, which have never caused an issue, but on desktops it's more likely.
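
The strip-down-and-add-back approach described above can be sketched roughly as follows. This is only an illustration: `find_culprit` and the `is_stable` callback are hypothetical names, and in practice "is_stable" stands for a person running a stress test or leaving the machine idle for hours, not a function call. The sketch also assumes a single bad component, so it does not cover the "multiple things are bad" worst case.

```python
from typing import Callable, Iterable, List, Optional


def find_culprit(
    baseline: List[str],
    extras: Iterable[str],
    is_stable: Callable[[List[str]], bool],
) -> Optional[str]:
    """Add components back one at a time and return the first one that
    makes the system unstable, or None if everything stays stable.

    Raises ValueError if the bare baseline itself is unstable, in which
    case a baseline part (e.g. the one RAM stick) should be swapped first.
    """
    config = list(baseline)
    if not is_stable(config):
        raise ValueError("baseline is unstable; swap a baseline part and retry")
    for part in extras:
        config.append(part)
        if not is_stable(config):
            return part
    return None


# Simulated run: pretend the second RAM stick is the faulty part.
def simulated_test(config: List[str]) -> bool:
    return "RAM stick 2" not in config


culprit = find_culprit(
    baseline=["CPU", "RAM stick 1"],
    extras=["GPU", "RAM stick 2", "sound card"],
    is_stable=simulated_test,
)
print(culprit)
```

Adding parts one at a time costs one test run per part, which is slow when each run takes hours, but it keeps every step easy to interpret.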

2016-01-16, 20:04   #401
pegnose
Jan 2016
34 Posts

Quote:
Originally Posted by pegnose View Post
Thank you so much, this is exactly what I am dealing with. I got my system stable at least for memtest86 by
- disabling the two unused dram slots
- enabling MCH Full Check (ASUS)
- setting dram voltage tolerance to 110%

But I still experience a freeze during Prime95. Yes, testing is a PITA. I am currently trying 16-16-16-40 (instead of 39), which should be compatible in my case according to the manual.

Looks good so far: Prime95 ran for over 8h for the first time! I even managed to reproduce the 768k bug. :)

2016-01-16, 20:10   #402
pegnose
Jan 2016
34 Posts

Quote:
Originally Posted by s1riker View Post
Hey Chalsall,

I've gathered a sense of what your personality is like from reading this thread, so I won't take any attacks personally :) Since you asked, here's a dump of everything I know/tried:

1. Hang happens whether or not I overclock the CPU core frequency. Strangely, it seems to happen less often when overclocked to 4.5GHz.
2. Hang happens whether I'm running RAM at the default 2133MHz or at my RAM's XMP settings, which are 3000MHz (15-15-15-35).
3. Hang never happens while I'm using the PC. I can game on it for hours or run the Prime95 torture test for over 12h.
4. I've run HCI memtest for 24 hours to ensure there are no RAM errors. All clean.
5. Hang always happens after the PC is left idle (sleep is turned off, as I like to have my PC available for RDP at all times).
6. When hung, the PC must be hard shut down; the reset button does not work.
7. I've already tried a different set of RAM sticks (although same manufacturer and same rated speed), a different PSU, onboard video (instead of my GTX 970), and the motherboard was RMA'ed. Nothing made any difference; it still hangs.
8. I've tried all available BIOS updates for my motherboard, including beta ones. The newest beta does seem to reduce the frequency of the hang, but does not eliminate it.

That's all I recall for now.

Have you read what I wrote above? You probably have, but I'll post it again:

From what I have heard, a hard lock during idle could be a third problem. I had this when I had ASPM enabled for the link between the SA and the PCH. Others suggested that the PSU might be incompatible with the Haswell C-states (C6/C7). You could try disabling all power saving options altogether.

Did you do that? Also in Windows?
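
Since the lock only shows up when the machine is idle and unattended, it can help to know roughly when it happened. Below is a minimal heartbeat-logger sketch for that purpose. Nothing like it is mentioned in the thread; the function names and the log file name are made up for illustration. One caveat: a script that wakes up periodically could conceivably change the idle behaviour being investigated, so treat the result with some caution.

```python
import os
import time
from datetime import datetime, timezone
from typing import Optional


def write_heartbeat(path: str) -> str:
    """Append the current UTC time to the log and force it to disk."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with open(path, "a", encoding="utf-8") as f:
        f.write(stamp + "\n")
        f.flush()
        os.fsync(f.fileno())  # ask the OS to write through, so the line survives a hard power-off
    return stamp


def last_heartbeat(path: str) -> Optional[str]:
    """Return the last timestamp written, or None if there is none."""
    try:
        with open(path, encoding="utf-8") as f:
            lines = f.read().splitlines()
    except FileNotFoundError:
        return None
    return lines[-1] if lines else None


def run(path: str, interval_s: float = 60.0, max_beats: Optional[int] = None) -> None:
    """Write a heartbeat every interval_s seconds (forever if max_beats is None)."""
    beats = 0
    while max_beats is None or beats < max_beats:
        write_heartbeat(path)
        beats += 1
        time.sleep(interval_s)
```

Calling `run("heartbeat.log")` before leaving the PC idle and checking `last_heartbeat("heartbeat.log")` after the forced restart gives the time of the hang to within one interval.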

2016-01-16, 20:14   #403
pegnose
Jan 2016
34 Posts

Quote:
Originally Posted by Madpoo View Post
It can be hard to say just why Prime95 would lock a machine, but bear in mind that Prime95 is doing what it's supposed to... stressing the system so that if it has a problem, it will manifest itself.

Sounds like you're doing all the right things by focusing on different things that could be the problem until you finally (and hopefully) figure out what causes it.

One thing I'm not sure if you've tried is to underclock your system and see if it's stable. That might just help you verify if there are thermal issues, or maybe power supply related things. Same with the memory.

Thanks for reassuring me, Madpoo! I tried the memory @2133 MHz (JEDEC) and the CPU runs at stock values (I didn't go lower so far, but it stays well below 50 °C during 1344k stress testing). However, now with 16-16-16-40 timing it looks pretty good. I think I'll know by tomorrow evening.

Last fiddled with by pegnose on 2016-01-16 at 20:17

2016-01-16, 20:23   #404
chalsall
If I May
"Chris Halsall"
Sep 2002
Barbados
2×67×73 Posts

Quote:
Originally Posted by pegnose View Post
Thanks for reassuring me, Madpoo! I tried the memory @2133 MHz (JEDEC) and the CPU runs at stock values (I didn't go lower so far, but it stays well below 50 °C during 1344k stress testing). However, now with 16-16-16-40 timing it looks pretty good. I think I'll know by tomorrow evening.

On how many machines are these errors manifesting?

Have you tried different PSUs?

2016-01-16, 21:19   #405
pegnose
Jan 2016
34 Posts

Quote:
Originally Posted by chalsall View Post
On how many machines are these errors manifesting?

Have you tried different PSUs?

I have only the one machine. And no, not yet. But as I said: with 16-16-16-40, no hard lock so far. Plus: neither idle state nor massive load (300 W GTX 980 Ti OC during FurMark) was able to provoke the issue (at least not within 1h or so).

I didn't believe it, but maybe a) ASUS and b) Crucial were both right: a) my modules with 16-16-16-39 timing actually are not compatible (with my M. VIII Hero), but b) they are the very same hardware as the ones with 16-16-16-40, only with different timings programmed.

Last fiddled with by pegnose on 2016-01-16 at 21:23

2016-01-16, 21:35   #406
Mark Rose
"/X\(‘-‘)/X\"
Jan 2013
5562₈ Posts

You should be able to disable sleep states in the BIOS. I would play with that setting.

2016-01-16, 23:25   #407
Madpoo
Serpentine Vermin Jar
Jul 2014
3,313 Posts
Nice boost in software downloads thanks to this issue

By the way for everyone following along...

The www.mersenne.org server has seen a decent little boost in traffic to the download page, mostly from links on PC World.

I'm sure many of those are people interested in testing their own Skylake to see if it exhibits the issue, but maybe some will stick around and test a few exponents.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.