[QUOTE=chalsall;425652]Perhaps I was wrong. :smile: I was speaking from the spike power load perspective, where SSDs don't tend to spike (much). Possibly it doesn't involve power.
But, to be perfectly honest, all the various different reports of Skylake instability (or, in other cases, no issues) really makes me wonder what is really going on. Regardless, it definitely informs me that I'm not going to deploy any such kit until the reported issue(s) are well understood. Edit: "(it started giving SMART errors)"... This is an excellent point! Is anyone seeing weird Skylake freezes seeing SMART errors in your logs?[/QUOTE]
Not that I know of, so far. I recently put together a list of solutions found over at Tom's. Besides the usual noise (mainboard/heat sink seated incorrectly, faulty display cable...), two main categories appeared:
a) RAM incompatibility (no explicit errors), usually with Crucial RAM (very often Crucial Ballistix Sport)
b) a bad CPU batch showing problems with c-states: resolved either by RMA or by disabling c-states (sometimes also by disabling EIST or slightly overclocking the BCLK to 100.10 MHz or so; I don't know whether this fits into the picture)
From what I know, this explains about 90% of the 'variance', with the rest being individual problems, sometimes, but not always, due to self-assembled PCs. |
[QUOTE=chalsall;425652]Perhaps I was wrong. :smile: I was speaking from the spike power load perspective, where SSDs don't tend to spike (much). Possibly it doesn't involve power.
But, to be perfectly honest, all the various different reports of Skylake instability (or, in other cases, no issues) really makes me wonder what is really going on. Regardless, it definitely informs me that I'm not going to deploy any such kit until the reported issue(s) are well understood. Edit: "(it started giving SMART errors)"... This is an excellent point! Is anyone seeing weird Skylake freezes seeing SMART errors in your logs?[/QUOTE]
Yep, I'm very happy that my 6700K system appears to be fully stable. Makes me feel privileged. :big grin:
Regarding the SSD issue, it's certainly not the first thing I would suspect, but having ruled out most of the other usual suspects I would probably investigate it before replacing the CPU. To be honest, I have never experienced that type of crash with any SATA drives. The SSD in the MacBook Air was a PCI-E (Samsung) drive. Maybe that had something to do with it, or maybe it was all just really bad luck. If I were using an M.2 drive in my Skylake system and had these crashes, I would definitely try a SATA drive instead. M.2 and PCI-E SSDs in general can probably be assumed to be less mature than SATA.
["Fun" fact regarding my MacBook Air issue: Apple refused any sort of goodwill regarding this issue. This happened last year, and they wanted almost $1000 for a replacement 256GB SSD and another $150 for installing it. I obviously didn't fall for that. I eventually found a used drive on eBay. I didn't expect any goodwill, but I didn't expect the drive to cost a grand. My respect for Apple from a customer service point of view is pretty much ruined. How can they possibly expect anyone to pay almost as much for a new drive as for a new computer? Or do they expect their customers to just throw away their otherwise perfectly functioning computer and buy a new one from Apple? In this case they lost me as well as my company as customers.] |
[QUOTE=Brunnis;425711]
Regarding the SSD issue, it's certainly not the first thing I would suspect, but having ruled out most of the other usual suspects I would probably investigate it before replacing the CPU. To be honest, I have never experienced that type of crash with any SATA drives. The SSD in the MacBook Air was a PCI-E (Samsung) drive. Maybe that had something to do with it or maybe it was all just really bad luck. If I was using an M.2 drive in my Skylake system and had these crashes, I would definitely try with a SATA drive instead. M.2 and PCI-E SSDs in general can probably be assumed to be less mature than SATA.[/QUOTE]
Interesting. I have a Samsung SM951 M.2 PCIe AHCI 256 GB SSD. My Skylake machine is running fine now. However, while testing the maximum power saving settings I found that I can make this system drive hang by enabling ASPM for the PCH side of the DMI Link (the PCH is where the SSD is attached, I suspect). The drives stop working while I can still move the mouse and the clock is still ticking. If I click on anything that would eventually cause drive access, the HDD LED lights up constantly and my system locks up for good. Of course, there are no events in the log because they can't be written any more.
But this really only happens if these power saving options are enabled in the BIOS. I suspect this is because my drive supports an extreme power saving state called L1.2, while in the BIOS I can only select L0 or L1 for the DMI Link. Either this, or my Win10 installation is to blame, because if I do a clean install of Win10 on a SATA SSD, the problem is gone.
[QUOTE=Brunnis;425711] ["Fun" fact regarding my MacBook Air issue: Apple refused any sort of goodwill regarding this issue. This happened last year and they wanted almost $1000 for a replacement 256GB SSD and another $150 for installing it. I obviously didn't fall for that. Eventually found a used drive on eBay. I didn't expect any goodwill, but I didn't expect the drive to cost a grand.
My respect for Apple from a customer service point of view is pretty much ruined. How can they possibly expect anyone to pay almost as much for a new drive as a new computer? Or do they expect their customers to just throw away their otherwise perfectly functioning computer and buy a new one from Apple? In this case they lost me as well as my company as customers.][/QUOTE] This is shocking, to say the least. |
[QUOTE=Brunnis;425711]Or do they expect their customers to just throw away their otherwise perfectly functioning computer and buy a new one from Apple?[/QUOTE]
Yes, they do. |
[url]http://arstechnica.com/gadgets/2016/02/intel-to-shut-down-renegade-skylake-overclocking-with-microcode-update/[/url]
|
[QUOTE=s1riker;425654]It seemed unlikely to me as well, but I haven't ruled it out. After I finish testing C6/C7/C8 disabled, if I still have the will, I'll try the drive next.[/QUOTE]
A bad drive can cause (and has caused) performance problems in the past, due to some really horrible retry code. The gist of it was that when trying to read a bad section of disc (or flash, I guess), the drive would enter a "retry of death" where it would try to re-read that sector over and over again for a ridiculous amount of time. As far as I know this was a spinning platter problem; I don't know if SSDs were ever affected. It had to do with some drives that tried way too hard to retry, which would generate interesting interrupt activity and could slow your system to a crawl (during which you could hear your drive reset repeatedly). I'm trying to remember if this was related at all to the old click-of-death, but whatever... it was annoying, and I think a firmware update or something helped.
As usual with things like this, my memory may be fuzzy on the edges, so don't believe everything (or anything) I say.
In summary: let's say a bad SSD had to remap areas from time to time. I don't know if SAS/SATA does the same horrible stuff PATA drives did, but if it does, it may cause a slight burp/hiccup in responsiveness. Using the SSD tools for your drive that (hopefully) show the number of remaps, or the SMART status in general, may be good just to rule that out. |
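For anyone who wants to script the SMART check suggested above, here is a rough Python sketch. It assumes you have saved the attribute table that smartmontools prints with `smartctl -A` to a string; the function names, the watched attribute list, and the sample text are made up for illustration. Attribute names vary between vendors, so treat the list as a starting point rather than anything definitive for your drive.

```python
# Attributes that commonly hint at remapped or unreadable sectors.
# This list is an assumption; vendors name and number attributes differently.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def parse_smart_attributes(text):
    """Parse a `smartctl -A` style attribute table into {name: raw_value}."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the header row starts with "ID#" and is skipped by the isdigit() check.
        if len(fields) >= 10 and fields[0].isdigit():
            raw = fields[9]  # first token of the RAW_VALUE column
            if raw.isdigit():
                attrs[fields[1]] = int(raw)
    return attrs


def suspicious(attrs, watched=WATCHED):
    """Return the watched attributes whose raw value is above zero."""
    return {name: attrs[name] for name in watched if attrs.get(name, 0) > 0}


# Hypothetical sample output, not taken from any real drive in this thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       3
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1234
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""

if __name__ == "__main__":
    print(suspicious(parse_smart_attributes(SAMPLE)))
```

A non-empty result does not prove the drive is behind the freezes; it only says the drive is worth swapping out before blaming the CPU.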
[QUOTE=Xyzzy;425762][URL]http://arstechnica.com/gadgets/2016/02/intel-to-shut-down-renegade-skylake-overclocking-with-microcode-update/[/URL][/QUOTE]
Grrr...:rant: |
@pegnose, please correct me if I'm wrong, but over at the Tom's thread it seems we're coming to the conclusion that several of the early Skylake-S CPUs were manufactured with a defect wherein they would randomly hang with C6/C7 enabled, and that Intel's QA was not good enough to catch it (or they let it slip because they couldn't get good yields). I've currently got C6/C7 disabled and all is well so far. I just want some confirmation before I send this darn thing back. What an ordeal.
|
updated spec posted <https://www-ssl.intel.com/content/www/us/en/processors/core/desktop-6th-gen-core-family-spec-update.html>
|
[QUOTE=s1riker;426648]@pegnose, please correct me if I'm wrong, but over at the Tom's thread it seems we're coming to the conclusion that several of the early Skylake-S CPUs were manufactured with a defect wherein they would randomly hang with C6/C7 enabled, and that Intel's QA was not good enough to catch it (or they let it slip because they couldn't get good yields). I've currently got C6/C7 disabled and all is well so far. I just want some confirmation before I send this darn thing back. What an ordeal.[/QUOTE]
That is my impression. I dare not say whether it is actually a defect. I would call it "extensive quality variance" atm, to the extent that it freezes the system in certain extreme situations (like very low power during c-states or c-state transitions). I speak of "broken" over at Tom's because this is easily understandable. I conclude this from the data we collected and from the info that Solis3 got from the Intel rep. Currently we have no counter-evidence, i.e. everybody who returned their CPU benefited from that.
If disabling c-states is the only thing that actually helps you (how long have you tested? 2 weeks should be the minimum in your case), I see no reason not to RMA the CPU, particularly as you are through with everything else. As usual, it is at your own risk. But without any fancy stuff in your chassis, like 3000+ MHz DDR4 e.g., your system should run out of the box (at default settings). |
It took me a few days to go over this entire thread, and I understand that I am so late to the party, but boy am I happy to have found this.
I work on prepping media servers for projection & video. I recently started using the i7-6700 with the ASUS Z170-WS. I built about 20 systems before I got my first freeze. I usually stress test new systems for about two full weeks before I let them go out in the wild.
So here's the setup:
Intel Skylake i7-6700
ASUS Z170-WS (BIOS ver. 0503)
32GB RAM Crucial
Avago-LSI 9260 RAID Controller
4x500GB Samsung 850 (RAID 10)
Seasonic 600W PSU
EVGA GTX970
Windows 8.1 Pro
So, the first time the machine froze, it drove me up the wall. I swapped out everything in this setup at least once (in the case of the RAID card I also tried two different manufacturers, same for the graphics card), I changed types of RAM, tried different configurations, one DIMM at a time, changed the CPU out, did everything humanly possible, but it kept freezing about 40 hours into rendering video (so you understand it was a looong process). I have established that the issue happens in Windows regardless of what software was running at the time of the freeze. Eventually I gave up on the motherboard and CPU altogether, moved on to a Xeon based system and put it behind me.
And then I found this thread... So I put a system with the bad components back together and ran Prime95. The first time it ran I got a bunch of rounding errors. I am not sure if I ran the test as it's supposed to be run. I think I tested the 14942209 exponent based on Henk_NL's instructions on the Intel forums. However, the second time I tried to run the test, it seemed that only the 1st worker was running and the other 3 were waiting for work. Is that normal? Or should I be seeing all 4 working? I didn't see the errors happening again, so I am quite confused as to how repeatable this is.
I am not at all familiar with the program, and I wish the instructions on how to reproduce this were a little clearer. For example:
If I do /Advanced/Test, exponent 14942209, per the instructions on the Intel forum, only 1 worker is active.
If I use /Options/Torture Test and set the parameters per the instructions on this forum (768K, 8 threads, 120 minutes), the workers look like they are about to start, but eventually all stay in the "No work to do at the present time. Waiting." status.
I know that the issue is the Skylake freezing issue, and I know that ASUS has not yet released anything to address it. I just want to be able to reproduce it so I can pressure ASUS for a solution. Any help with recreating the error is greatly appreciated.
Thanks |