As a pretty active pretester, I have had 50+ 6700Ks go through my socket so far. I was able to test this issue with a bunch of CPUs and different RAM kits, and the problem is always the same.
Prime 27.9 768k will always end up with worker errors; sometimes it takes 3 minutes, sometimes up to 600 minutes (with exactly the same settings!). Reducing the clock speed and/or adding more vcore (3.5 GHz @ 1.3 V, for example) doesn't help at all.
Things I noticed:
- All other K lengths in Prime 27.9 work just fine
- Disabling Hyperthreading will make the problems with 768k disappear
- Using 28.7 with FMA3 works just fine for hours
- Using 28.7 with CpuSupportsFMA3=0 and default FFT size of 3 works just fine as well
- Using 28.7 with CpuSupportsFMA3=0 but FFT size of 15 gives the same errors as 27.9 does (same settings as 27.9 default settings)
However, some people claim to have builds that run 27.9 768k without any problems for hours or even days. It's a really weird problem that doesn't make any sense to me. Either there is a problem with your algorithm/calculations, but that wouldn't explain why some people have Skylake builds that work just fine, or the 768k test is stressing the Skylake architecture in a way that 75% of the CPUs (rough estimate) can't handle, which causes the worker errors.
Hope this is enough to raise your interest in investigating this further. As a Skylake owner, the situation is pretty unsatisfactory, as you can imagine, even though there are no problems in daily usage and all other stress tests (XTU, LinX and so on) work just fine.
Kind regards, Ralf |
[QUOTE=ralleh;417935]Hope this is enough to raise your interest in investigating this further. As a Skylake owner, the situation is pretty unsatisfactory, as you can imagine, even though there are no problems in daily usage and all other stress tests (XTU, LinX and so on) work just fine.[/QUOTE]
Thank you very much for bringing this report forward. We look forward to further reports. To put it on the table, we're not yet sure where the errors come from. Is it the software, or the hardware? Empirical evidence can be tricky... Large sample sets are important. Please continue to submit your empirical results. |
I am one of those too.
I have an i7-6700K and it happens within the first 30 to 45 minutes for me, no matter whether I use stock clocks and voltages, downclock the CPU, give far more voltage than needed, use different memory, different BIOS versions, etc. We have tried everything that came to mind. The only things that seem to work are what ralleh described already, except that disabling hyperthreading doesn't seem to work in my case. I will try again, though, because I tested so much in the last few weeks that I can't remember for sure whether I actually tested it with HT off. |
[QUOTE=AGM;417942]I am one of those too.[/QUOTE]
Thank you for entering the true dragon's den. :smile:
[QUOTE=AGM;417942]I have an i7-6700K and it happens within the first 30 to 45 minutes for me, no matter whether I use stock clocks and voltages, downclock the CPU, give far more voltage than needed, use different memory, different BIOS versions, etc. We have tried everything that came to mind.[/QUOTE]
OK. Then we need to work the problem.
[QUOTE=AGM;417942]The only things that seem to work are what ralleh described already, except that disabling hyperthreading doesn't seem to work in my case. I will try again, though, because I tested so much in the last few weeks that I can't remember for sure whether I actually tested it with HT off.[/QUOTE]
This comes across as a little hysterical. Just a suggestion... A mentor of mine advised me to keep a pen-and-paper log. I found this to be valuable advice. |
Oh, I would, if it were more complex. I can remember what I did in that case, but I tested overclocking settings too, and that included turning HT off. I just couldn't remember whether I ran 768k with it off.
Anyway, I am running it right now with HT off and it does indeed seem to work. No worker has stopped after 1:30 hours so far. I'll update if it crashes after all. |
[QUOTE=AGM;417952]No worker has stopped after 1:30 hours so far. I'll update if it crashes after all.[/QUOTE]
OK. Cool. So this seems like everything is OK until further notice? |
From descriptions thus far, my initial guess is that it is a Skylake defect. Non-reproducible problems can be software bugs, such as inadvertently using an uninitialized variable. However, such a bug would also affect Haswell, Sandy Bridge, and Ivy Bridge chips. With the problem happening on several motherboards, several RAM configurations, and several CPU speed/voltage combinations -- the only variable left is the chip itself.
Intel has a pretty robust QA process, so I may well be wrong. I just don't see what else it could be right now. Does anyone know if Intel is aware of this issue? If we can reach the right people, they will take prime95 issues seriously. We need to find that person and provide them with as much accurate information as possible. For example, apparently some Skylakes have no problems. Can we narrow the problem down to a subset of Skylake steppings? BTW, I will be of little help in debugging/narrowing the problem unless we can come up with a completely reproducible case. |
[QUOTE=Prime95;417959]From descriptions thus far, my initial guess is that it is a Skylake defect....[/QUOTE]
Maybe, but boy, from a cursory look it sure seems heat or voltage related. I know upping the voltage or running at stock speeds didn't seem to work for everyone, but the other clue is that disabling HT helps, which effectively shuts down significant parts of the die.
To get a better idea of whether heat/power are somehow related, I'd recommend simply *under* clocking. Or if the BIOS in question has any support for locking the CPU to lower p-states, that would work too. Disabling turbo boost and running the test could help as well, although I don't think turbo boost offers any kick in speed with all cores enabled on the 6700K. Looks like it's always 4 GHz with dual/quad cores enabled. But on other Skylake models, it'd be worth trying. In fact, on that note, again if the BIOS supports it, disable all but one core (in the BIOS itself).
All of those suggestions are intended to get the CPU running cooler and with lower power demands. If it still throws errors even when the CPU is locked at p-states 1 and above (as opposed to p0 = full throttle), or when underclocked or whatever else, then at least we can start to rule out thermal or underpower issues.
Since it seems to affect AVX and not AVX2/FMA3, that's also a good clue. But that could just be some design issue related to the thermal/power envelope too, so it's not really enough to go on by itself. But (hopefully) taking thermal/power out of the possibilities matrix, you're kind of left with something inside AVX itself. Something that works fine in previous generations... that would be curious. |
In case it is a hardware bug, you should start collecting all the data you possibly can about each and every Skylake chip you test: things like serial number, date of purchase, manufacturing location, etc.
|
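One way to keep the per-chip data suggested above in a consistent shape is a small record type that can be dumped to CSV and shared. This is only a minimal sketch: the class name, field names, and CSV layout are all hypothetical choices, not anything prime95 or Intel defines. The identification fields come from the suggestion above, and the test-condition fields mirror the settings discussed earlier in the thread.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import List, Optional


@dataclass
class ChipTestRecord:
    """One stress-test run on one CPU (hypothetical layout)."""
    # Identification details suggested in the post above.
    serial_number: str
    purchase_date: str            # free-form or ISO date string
    manufacturing_location: str
    batch_code: str
    # Test conditions discussed elsewhere in the thread.
    prime95_version: str
    fft_length_k: int
    hyperthreading: bool
    # None means no worker error was seen before the run was stopped.
    minutes_to_first_error: Optional[int]


def records_to_csv(records: List[ChipTestRecord]) -> str:
    """Serialize records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[f.name for f in fields(ChipTestRecord)],
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()
```

Recording one row per run, including the runs that never failed, keeps the passing and failing configurations comparable later on.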
And if you identify the exact hardware bug and can reproduce it, do not report it for a while, until Intel gets the i7-6990X on the market, so you will be able to ask for a replacement when they recall every 6700K... :razz:
Disclaimer: this post is only an attempt at a joke... :wink: |
[QUOTE=Madpoo;417979]To get a better idea of whether heat/power are somehow related, I'd recommend simply *under* clocking. Or if the BIOS in question has any support for locking the CPU to lower p-states, that would work too.[/QUOTE]
With all due respect, I'm not a casual user; I pretest 200-300 CPUs of each generation for overclocking needs. As I mentioned, I did perform tests with underclocked and/or overvolted CPUs. The average core temps were in the mid-50-degree range, definitely no heat issue there ;)
[QUOTE=Prime95;417959]From descriptions thus far, my initial guess is that it is a Skylake defect.[/QUOTE]
This would be my guess, too! That's essentially why we contacted you... to rule out possible software problems before we make this issue more public and try to make Intel aware of it.
[QUOTE=Prime95;417959]Does anyone know if Intel is aware of this issue?[/QUOTE]
I don't think they are (yet). But I honestly think they have other severe problems with the Skylake architecture, as they promised a new revision with SGX (Software Guard Extensions), which is still not available on the market, even though it was promised for late November (and I think they planned to include it in the originally released CPUs as well, but it didn't work for some reason). Source: [url]http://qdms.intel.com/dm/i.aspx/5A160770-FC47-47A0-BF8A-062540456F0A/PCN114074-00.pdf[/url]
[QUOTE=Prime95;417959]If we can reach the right people, they will take prime95 issues seriously.[/QUOTE]
That would be an awesome thing to do. Unfortunately, most channels will just give the usual answers and assume the user and/or the UEFI settings are the problem. Maybe you know the right employees at Intel to contact about this?
[QUOTE=Prime95;417959]Can we narrow the problem down to a subset of Skylake steppings?[/QUOTE]
There is only one stepping so far, and I have encountered the problem on all of my CPUs. Batches ranged from L519 to L537 (L means produced in Ma[B]L[/B]aysia, in the year 201[B]5[/B], in weeks [B]19[/B] to [B]37[/B]). |
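The batch code reading described in the last post (plant letter, year digit, two-digit week) can be sketched as a small decoder. This is a rough illustration only: the function name and the returned dictionary are hypothetical, the plant table contains only the one letter explained in the thread, and the decade is assumed to be the 2010s because the post reads the digit 5 as 2015.

```python
import re

# Only "L" is explained in the thread; any other letter is reported as unknown.
PLANT_CODES = {"L": "Malaysia"}


def decode_batch_prefix(code: str) -> dict:
    """Decode a prefix such as "L519" into plant, year, and production week.

    Assumes the layout described in the post: one plant letter, one year
    digit (taken to be in the 2010s), and a two-digit week number.
    """
    match = re.fullmatch(r"([A-Z])(\d)(\d{2})", code.strip().upper())
    if match is None:
        raise ValueError(f"unexpected batch prefix: {code!r}")
    plant, year_digit, week = match.groups()
    week_number = int(week)
    if not 1 <= week_number <= 53:
        raise ValueError(f"week out of range in batch prefix: {code!r}")
    return {
        "plant": PLANT_CODES.get(plant, "unknown"),
        "year": 2010 + int(year_digit),  # decade is an assumption
        "week": week_number,
    }
```

With this reading, the two batch prefixes named above decode to weeks 19 and 37 of 2015, which makes it easy to sort a pile of tested chips by production date.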