mersenneforum.org

Zeroed results on Skylake (https://www.mersenneforum.org/showthread.php?t=21485)

XeniaRick 2016-08-07 23:42

[QUOTE=BinaryKhaos;439523]
That on the other hand, wasn't too good after all. I guess the CPU throttled down then to avoid (further) damage. But running it for such an extended period of time at that heat level is not something that was too good.
[/QUOTE]
Yep, I knew when I started I was overclocking. I had a false sense of security due to the manufacturer providing the XMP profile and articles stating the CPU could support much higher rates. I thought I was playing it safe at 3200. I can't recall reading any warnings that the CPU temps would go up so high. Now I know.

[QUOTE=BinaryKhaos;439523]
Your temperatures are generally still very (!) high. No matter if I use AVX or FMA3 with Prime95 v28.9, the maximum core temperature on each core never crosses the 80C mark. It is usually below 70C throughout the test, with very few short spikes that the cooling system (which ramps up then) properly gets under control. So the CPU temperature itself is also below 64C.
[/QUOTE]
I've read that Skylakes are all over the place on temps and Intel doesn't publish expected values.

[QUOTE=BinaryKhaos;439523]
Have you checked that you haven't bent any socket pins?
[/QUOTE]
Yep, got one. Trying to RMA the board.

[QUOTE=BinaryKhaos;439523]
MemTest86 v7 (I have the Pro, but that doesn't matter), should not restart if you use parallel testing.[/QUOTE]
There's a known bug in MemTest86 on the Gigabyte Gaming 7 under UEFI.

[QUOTE=BinaryKhaos;439523]
Regarding lapping the CPU, that I personally wouldn't have done though.
[/QUOTE]
I didn't. I flattened the cooler.

Hope to have the system back up in a couple of weeks tops. No sense in doing anything more until the new MB is in.

Regards,
Rick

chalsall 2016-08-07 23:44

[QUOTE=XeniaRick;439588]Holding off on the CPU until I try a new MB.[/QUOTE]

Excellent move.

The architect of record and I came to terms on fixing the water leak...

There is a very small wall which I might be able to get through. But it's only about 20cm wide, and we've agreed that emergency services, and plenty of lubricant, will be on standby.

This is actually what is required for our having sex....

XeniaRick 2016-08-08 00:13

[QUOTE=chalsall;439592]
The architect of record and I came to terms on fixing the water leak...
[/QUOTE]
Good deal. I was just about to ask..

BinaryKhaos 2016-08-12 17:48

Hey Rick...

[QUOTE=XeniaRick;439577]Having to replace the CPU is my worst fear, but makes the most sense all things considered. I do have a questionable pin on the motherboard's CPU socket as well so it could be both need replaced. Too many variables to debug without a duplicate set of parts.[/QUOTE]

Sorry to hear that. :( But if you finally find the root of all problems, you can at least move on.

You are not alone with a bent socket pin; I have read about it quite a few times around the net. IMHO, with a quality board and proper installation of the CPU and CPU cooler, that should not happen.

Make sure you get yourself a good CPU cooler and be careful when installing the CPU. Also don't over-tighten the mounting screws for the CPU cooler. There is also the problem that a lot of manufacturers don't stay within the pressure specifications set by Intel and thus put more pressure on the CPU and socket than they are supposed to.

My suggestion, if you don't mind: I have only good things to say about Noctua. They build high-quality products, and their [URL="http://noctua.at/en/products/cpu-cooler-retail/nh-d15s"]NH-D15s[/URL] is one of the best CPU air coolers you can get. I am using it myself (and have used the previous iteration on my old machine). Also, Noctua stays within the pressure specifications set by Intel. And, IMHO, their mounting system is rather easy, and they have explanation videos available on YouTube. Oh, should you get the D15s, make sure to get the extra fan. Then you can have both run at lower speeds. Everything to mount the second fan is already included... just not the fan.

Just my two cents...

[QUOTE=XeniaRick;439591]I thought I was playing it safe at 3200. I can't recall reading any warnings that the CPU temps would go up so high. Now I know.[/QUOTE]

A RAM overclock by itself is not much of a temperature problem, as long as the CPU otherwise stays at stock settings. In my case, running at DDR4-3000 raises the CPU temperature by ~5-7C under full Prime95 load.

The problem in your case is that you are already running the CPU way too hot for stock operation. Once you have got the cooling problem properly solved, things will get a lot cooler and more manageable.

I hope you get your problems finally solved. I wish you all the best and good luck! If there is anything, just post...

Have a nice weekend,
Matthias

BinaryKhaos 2016-08-12 17:59

Hello @all,

I have been quiet for a few days because I was doing more extensive testing (all the grey hair, oh well), and my gut feeling was right after all.

To make a long story short: I have updated to microcode 0x9e and was able to finally run Prime95 for 75h straight without any problems. I have now upped the mem clock to DDR4-3000 and have been running Prime95 for almost 25h without any hiccups and I don't expect any.

Apparently, Intel needed a few more iterations to fix their "instability". It would be great if they started publishing changelogs for their microcodes and proper explanations for the errata items. There are rumors about what it might have been, but only Intel knows. :(

Just to clarify: Prime95 does not run stably with either FMA3 or AVX on any microcode before 0x9e. That is what I learnt the very hard way over the course of the last 6 or 7 weeks. Maybe I should send my power bill over to Santa Clara, CA. ;-)

Thanks for everyone's input on the matter and patience! It was very much appreciated. And I am glad that I have, apparently, finally found the culprit and could forgo swapping CPU and board after all -- even though I really would be curious to know how many years I shaved off the life of my CPU with the torture I put it through... and vice-versa, actually. ;-)

Have a nice weekend,
Matthias

PS. I will let the current test run again for 72h. Should the unexpected happen and it aborts, I will naturally let everyone know. Keep your fingers crossed though please that that won't be necessary.

chalsall 2016-08-12 19:56

[QUOTE=BinaryKhaos;439881]To make a long story short: I have updated to microcode 0x9e and was able to finally run Prime95 for 75h straight without any problems. I have now upped the mem clock to DDR4-3000 and have been running Prime95 for almost 25h without any hiccups and I don't expect any.[/QUOTE]

Good to hear of the successful conclusion!

So, then, at the end of the day, it turns out that ATH's suggestion of upgrading the BIOS was the solution?

BinaryKhaos 2016-08-13 06:59

Hi...

[QUOTE=chalsall;439887]So, then, at the end of the day, it turns out that ATH's suggestion of upgrading the BIOS was the solution?[/QUOTE]

No, unfortunately it was not that easy. I was always running the latest firmware, so that would have saved me quite some headache. But ASUS is, as of right now, still stuck at microcode 0x74.

But you can download the microcode from Intel directly and apply it yourself. Intel sometimes publishes it a bit late, but that is still the best way to get an up-to-date version. I have done that on Linux for as long as I can remember, but I am stuck on Windows right now (I have been waiting for the system to become stable before installing my main OS and all my data), and I did not know whether it was even possible there.

For those wondering, you can use a special [URL="https://labs.vmware.com/flings/vmware-cpu-microcode-update-driver"]CPU Microcode Update Driver[/URL] by VMware that works just fine (the microcode needs to be updated after each reboot). And the microcode itself is available directly from [URL="https://downloadcenter.intel.com/search?keyword=Linux+Processor+Microcode+Data+File"]Intel[/URL].

So long,
Matthias

rudi_m 2016-08-13 11:40

[QUOTE=BinaryKhaos;439904]
For those wondering, you can use a special [URL="https://labs.vmware.com/flings/vmware-cpu-microcode-update-driver"]CPU Microcode Update Driver[/URL] by VMware that works just fine (the microcode needs to be updated after each reboot). And the microcode itself is available directly from [URL="https://downloadcenter.intel.com/search?keyword=Linux+Processor+Microcode+Data+File"]Intel[/URL].

So long,
Matthias[/QUOTE]

BTW, most Linux distros have packages for this (named something like intel-ucode, ucode-intel or intel-microcode). Once installed, they update the microcode during boot-up. Or you can trigger the reload manually (as root), which looks like this:
[CODE]
$ echo 1 | sudo tee /sys/devices/system/cpu/microcode/reload
$ dmesg | grep -i microcode
[ 0.982852] microcode: CPU0 sig=0x506e3, pf=0x2, revision=0x8a
[ 0.982926] microcode: CPU1 sig=0x506e3, pf=0x2, revision=0x8a
[ 0.983012] microcode: CPU2 sig=0x506e3, pf=0x2, revision=0x8a
[ 0.983089] microcode: CPU3 sig=0x506e3, pf=0x2, revision=0x8a
[ 0.983267] microcode: Microcode Update Driver: v2.01 <tigran@aivazian.fsnet.co.uk>, Peter Oruba
[672192.282331] microcode: CPU0 sig=0x506e3, pf=0x2, revision=0x8a
[672192.283171] microcode: CPU0 updated to revision 0x9e, date = 2016-06-22
[672192.283248] microcode: CPU1 sig=0x506e3, pf=0x2, revision=0x8a
[672192.284088] microcode: CPU1 updated to revision 0x9e, date = 2016-06-22
[672192.284181] microcode: CPU2 sig=0x506e3, pf=0x2, revision=0x8a
[672192.285024] microcode: CPU2 updated to revision 0x9e, date = 2016-06-22
[672192.285104] microcode: CPU3 sig=0x506e3, pf=0x2, revision=0x8a
[672192.285942] microcode: CPU3 updated to revision 0x9e, date = 2016-06-22
[/CODE]
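
In case anyone wonders what the `sig=0x506e3` in those lines means: it is the CPUID signature, and it packs family, model and stepping into one number. Here is a minimal sketch for decoding it. The function name `sig_to_name` is made up for illustration, and the `family-model-stepping` output form is, as far as I know, how the per-CPU files in the Linux `intel-ucode` firmware directory are named, so treat that part as an assumption to verify on your own system.

```shell
#!/bin/sh
# sig_to_name SIG   (hypothetical helper name, not part of any package)
# Decode an x86 CPUID signature, as printed in the "sig=" field of the
# kernel's microcode log lines, into family-model-stepping (hex).
sig_to_name() {
    sig=$(( $1 ))                          # accepts hex such as 0x506e3
    stepping=$((    sig         & 0xf  ))  # bits 3..0
    model=$((      (sig >> 4)   & 0xf  ))  # bits 7..4
    family=$((     (sig >> 8)   & 0xf  ))  # bits 11..8
    ext_model=$((  (sig >> 16)  & 0xf  ))  # bits 19..16
    ext_family=$(( (sig >> 20)  & 0xff ))  # bits 27..20

    # The extended model only counts for family 6 and family 15.
    if [ "$family" -eq 6 ] || [ "$family" -eq 15 ]; then
        model=$(( (ext_model << 4) | model ))
    fi
    # The extended family only counts for family 15.
    if [ "$family" -eq 15 ]; then
        family=$(( family + ext_family ))
    fi

    printf '%02x-%02x-%02x\n' "$family" "$model" "$stepping"
}
```

For the log above, `sig_to_name 0x506e3` prints `06-5e-03`, i.e. family 6, model 0x5e, stepping 3.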

Note that the packages from official distro repositories are usually not up-to-date.

I have also had a double-check mismatch on a brand new system (i7-6700, 64GB, Fujitsu D3401-H1): 1 bad / 20 good. So far it looks stable, now that I've updated to 0x7c.

Unfortunately I ran one 100M test ([URL="http://www.mersenne.org/report_exponent/?exp_lo=332301763&full=1"]332301763[/URL]) while still on microcode 0x55. Maybe someone could double-check it for me ;) In return, I would do some double checks for whoever checks mine.

Mark Rose 2016-08-13 14:24

[QUOTE=rudi_m;439909]Note the packages from official distro repositories are usually not up-to-date.[/QUOTE]

Interesting. I had thought the distributions would ship updated microcode with their kernel builds. It seems all my Skylake CPUs are running 0x74.
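
For anyone who wants to check their own boxes without reading through dmesg, here is a rough sketch. It assumes an x86 Linux kernel that exposes a `microcode` field in /proc/cpuinfo. The function name `check_ucode` is made up for illustration, and the 0x9e threshold in the usage line is simply the revision reported as stable earlier in this thread, not an official minimum.

```shell
#!/bin/sh
# check_ucode MIN_REV [CPUINFO_FILE]   (hypothetical helper name)
# Prints every distinct microcode revision found and returns 0 only if
# all of them are >= MIN_REV. Returns 2 if no revision could be read.
check_ucode() {
    min=$(( $1 ))                      # accepts hex such as 0x9e
    file=${2:-/proc/cpuinfo}
    status=0

    # One "microcode : 0x..." line per logical CPU; collapse duplicates.
    revs=$(awk -F': *' '/^microcode/ {print $2}' "$file" | sort -u)
    if [ -z "$revs" ]; then
        echo "no microcode field found in $file" >&2
        return 2
    fi

    for rev in $revs; do
        if [ $(( $rev )) -ge "$min" ]; then
            echo "$rev ok"
        else
            echo "$rev older than $1"
            status=1
        fi
    done
    return $status
}
```

Usage would be something like `check_ucode 0x9e && echo "good to go"`. The second argument exists only so the function can be pointed at a saved copy of cpuinfo from another machine.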

chalsall 2016-08-13 19:15

[QUOTE=BinaryKhaos;439904]For those wondering, you can use a special [URL="https://labs.vmware.com/flings/vmware-cpu-microcode-update-driver"]CPU Microcode Update Driver[/URL] by VMware that works just fine (the microcode needs to be updated after each reboot). And the microcode itself is available directly from [URL="https://downloadcenter.intel.com/search?keyword=Linux+Processor+Microcode+Data+File"]Intel[/URL].[/QUOTE]

Thanks Matthias for sticking with the process, and sharing your experience from which others can learn.

Modern "kit" can be harsh mistresses. Sometimes it takes a while to "make friends". Or at least, come to terms....

XeniaRick 2016-08-16 01:58

[QUOTE=BinaryKhaos;439880]Hey Rick... [...] I hope you get your problems finally solved. I wish you all the best and good luck! If there is anything, just post...[/QUOTE]
Matthias, FANTASTIC!!! So glad to hear you found the problem and the solution.

After much delay I am back up and running. Turns out the questionable pin was indeed broken. I had to buy a microscope to actually see it. I found the head of the broken pin embedded between the corresponding pad on the CPU and the CPU PCB. I removed the head, but now there's a sizable gouge in the pad, and I'm not sure the CPU is usable at this point. I didn't want to take a chance, so I bought a new board and CPU. I'm returning the MB for a refund and will try to do the same for the CPU. As mentioned before, the cooler I was using had a convex shape that put maximum pressure on the CPU right above the location of the broken pin, and I had the cooler down tight. I can't prove the cooler caused the damage, but it does make me wonder.

Yes, after some additional research I know you are right about the temperatures being way too high in my old system. Thank you for pointing that out. I downloaded HWMonitor, and I'm starting this system out low and slow and will increase as I gain confidence. I'm only now reading your last message, so it's too late to give your cooler recommendation a try. I'm trying out a water cooler, the Corsair H100i v2. It attaches via thumbscrews, and I've only got them finger tight, no more, and not even as tight as I can get them.

Thanks for all of your help and best wishes!

Regards,
Rick


All times are UTC.