mersenneforum.org

768k Skylake Problem/Bug (https://www.mersenneforum.org/showthread.php?t=20714)

pegnose 2016-01-20 22:09

[QUOTE=chalsall;423258]Please trust me, I understand. When I was responsible for building out the first wireless WAN here in Barbados the manufacturer had a "once a month" bug. Of course, after deploying ~100 radios, the problem manifested ~3 times a day...[/QUOTE]

Oh nose. What was it?

[QUOTE=chalsall;423258]Are the other three drives SSDs or HDs? Please know that "spinning rust storage" can have rather extreme random spikes in their draw (both OS driven, and independent).[/QUOTE]

It's one Samsung 840 Pro 256 GB SSD and two WD Green 3 TB HDDs.

If I understand my PSU correctly, its four rails are: 1) CPU and motherboard, 2) drives and all other devices, 3) PCIe 1, 4) PCIe 2. If that is right, there is no danger here. Unfortunately, there is no information about this on the manufacturer's website.

[QUOTE=chalsall;423258]Don't guess. Test.[/QUOTE]

Many people with many different PSUs have the freeze. There is no reason to believe the PSU is the problem.

[QUOTE=chalsall;423258]I would suggest (if you and/or others can) to first remove any kit you can (for example, HDs, GPUs, RAM), and rerun the tests you've used to produce the observed crashing (even if not deterministically -- currently you're doing statistical testing). Swap out MBs and PSUs. Make sure your mains power is good.[/QUOTE]

That is exactly what I did. As soon as ASUS support pointed out that my memory was not fully compatible, I adjusted the settings and took out all components except the CPU and one RAM module. I even unplugged all the fans. OK, the DVD drive was still connected, but I had to run memtest86 somehow. ;)

Eventually memtest86, Prime95, and idle each ran for a whole day without any issue, with correct memory timings, more DRAM voltage, and some compatibility settings that may be pure placebo.

[QUOTE=chalsall;423258]This is not to say this is not a CPU issue, but you don't /know/ it is yet.

I hope that makes sense and helps.[/QUOTE]

To bring you up to speed: only a few of us were able to improve their situation by RMAing the CPU. My impression is that many think it is a platform, BIOS, or microcode issue (one of those), me included. I don't think that anything is broken, particularly as I had two crash-free months in the beginning. Here is 'our' main discussion thread:

[url]http://www.tomshardware.co.uk/forum/id-2830772/skylake-build-randomly-freezing-crashing/page-9.html#17356820[/url]

Some get the Skylake freeze while idle, others under load, one guy while video streaming. Some improved things by disabling C-states, some with memory timings and voltages, some even with CPU core voltage, if I remember correctly. It is diffuse, obscure, and some other nice words. ;)

These could be very different problems, but interestingly, none of these issues was completely cured on any machine (not that I have read of); they only improved to a more or less substantial extent.

pegnose 2016-01-20 22:18

[QUOTE=kladner;423263]chalsall:
pegnose:
I'm pretty sure, from the OP's post, that all the drives are SSDs. Twenty watts for three spinners also seems pretty unlikely.[/QUOTE]

Samsung 840 Pro: 3.21 W in use
WD Green 3TB: 4.1 W in use (x2)
Add a substantial buffer, call it 30 W altogether, on one PSU rail: what are we talking about?!
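For reference, the sum behind that 30 W figure can be checked with a few lines of Python. This is only a sketch: the per-drive numbers are the nominal "in use" ratings quoted above, and the 30 W is the rounded-up allowance from this post, not a measurement.

```python
# Nominal "in use" power figures quoted above, in watts.
DRIVE_POWER_W = {
    "Samsung 840 Pro 256 GB": 3.21,
    "WD Green 3 TB (drive 1)": 4.1,
    "WD Green 3 TB (drive 2)": 4.1,
}

# Rounded-up allowance from the post, not a measured value.
BUDGET_W = 30.0


def total_nominal_power(drives: dict) -> float:
    """Sum of the nominal per-drive power ratings."""
    return sum(drives.values())


def headroom_factor(total_w: float, budget_w: float) -> float:
    """How many times the nominal total fits into the budget."""
    return budget_w / total_w


if __name__ == "__main__":
    total = total_nominal_power(DRIVE_POWER_W)
    print(f"nominal total: {total:.2f} W")
    print(f"headroom vs. {BUDGET_W:.0f} W: {headroom_factor(total, BUDGET_W):.1f}x")
```

That works out to about 11.4 W nominal, or roughly 2.6x headroom against 30 W. Averaged ratings like these do not capture short peaks such as spin-up current.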

pegnose 2016-01-20 22:23

[QUOTE=chalsall;423265]Please also note this: "My last freeze (and the first AFTER I thought I was finally good) now was with high load on CPU and HDD".[/QUOTE]

Yes, I mentioned this intentionally, but I was thinking of data transfer: my software RAID 1 was resyncing. Of course, that also means power draw, but spikes? Plus, even BeQuiet support told me that in such cases I would have to fear black-outs rather than freezes.

I really appreciate your effort!! But I am afraid we are on the wrong track. The first thing I will do is get compatible memory from a different brand, that is, after I flash the new 1402 BIOS update and get the next freeze (if any, haha).

chalsall 2016-01-20 22:25

[QUOTE=pegnose;423267]It is diffuse, obscure, and some other nice words. ;)

These could be very different problems, but interestingly, none of these issues was completely cured on any machine (not that I have read of); they only improved to a more or less substantial extent.[/QUOTE]

OK. But, you guys are changing many different variables all at the same time, with little cross-correlation and few testable results.

This is not how the Scientific Method works.

To use an analogy, this is worse than shooting a shotgun in the dark hoping to find your keys....

pegnose 2016-01-20 22:29

[QUOTE=chalsall;423272]OK. But, you guys are changing many different variables all at the same time, with little cross-correlation and few testable results.

This is not how the Scientific Method works.

To use an analogy, this is worse than shooting a shotgun in the dark hoping to find your keys....[/QUOTE]

You mean shooting the shotgun at the streetlight you are standing under... ;)

pegnose 2016-01-20 22:31

And what is wrong with getting recommended memory? I say: check one component at a time and make 100% sure it is OK. I am not done with memory yet. If I get new memory from a different brand that is on my HCL and I still have the issue, I move on to the next component.
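Because the freezes are intermittent, each "one component at a time" step is really a statistical test. One rough way to size such a test is sketched below in Python. It assumes, purely for illustration, that freezes arrive independently at a constant average rate (a Poisson process). The 168 h mean time between freezes (one per week) is a hypothetical figure, not a rate measured in this thread.

```python
import math


def p_clean_run(mean_hours_between_freezes: float, test_hours: float) -> float:
    """Probability of zero freezes during test_hours, assuming freezes form a
    Poisson process with the given mean time between freezes (an assumption)."""
    return math.exp(-test_hours / mean_hours_between_freezes)


def hours_needed(mean_hours_between_freezes: float, confidence: float = 0.95) -> float:
    """Clean-run length such that an unfixed machine would have frozen at least
    once with probability `confidence`, under the same Poisson assumption."""
    return -math.log(1.0 - confidence) * mean_hours_between_freezes


if __name__ == "__main__":
    mtbf_h = 168.0  # hypothetical: one freeze per week on an unfixed machine
    print(f"P(clean 24 h run | not fixed) = {p_clean_run(mtbf_h, 24.0):.2f}")
    print(f"clean run needed for 95% confidence: {hours_needed(mtbf_h):.0f} h")
```

Under those assumptions, a machine that is not fixed and freezes about once a week would still pass a 24 h clean run roughly 87% of the time. A clean run of about 500 h (three weeks) would be needed before a pass means much at 95% confidence.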

chalsall 2016-01-20 22:33

[QUOTE=pegnose;423271]I really appreciate your effort!! But I am afraid we are on the wrong track. The first thing I will do is get compatible memory from a different brand, that is, after I flash the new 1402 BIOS update and get the next freeze (if any, haha).[/QUOTE]

Have you, personally, tried a different motherboard supplier with all your other components, including the CPU?

I learnt the hard way to eliminate *all* variables.... :smile:

pegnose 2016-01-20 22:34

[QUOTE=chalsall;423276]Have you, personally, tried a different motherboard supplier with all your other components, including the CPU?

I learnt the hard way to eliminate *all* variables.... :smile:[/QUOTE]

No. As I said: I started out with memory, and I am not yet done with it.

But, of course, part of the problem (and this is more or less the same for all of us) is that I bought my mobo more than 6 months ago. What will my vendor say if I want to return it without being able to prove that it is broken and that it WAS broken from the beginning (after 6 months, that is necessary), PLUS that I want a different one in return? I should be happy if he deems me worthy of even the shortest response.

AND on the other hand: as I said, I don't believe that anything is broken (unless you count all ASUS Z170 boards as sort of broken). If my other components are fine, ASUS support has to deal with it. I will make them. Oh, I will.

EDIT: You HEAR me, ASUS?!?

chalsall 2016-01-20 22:45

HDD Diet: Power Consumption and Heat Dissipation
 
Just to put this out there...

[URL="http://ixbtlabs.com/articles2/storage/hddpower.html"]Almost eleven years old, and yet still relevant[/URL]....

pegnose 2016-01-20 22:51

[QUOTE=chalsall;423278]Just to put this out there...

[URL="http://ixbtlabs.com/articles2/storage/hddpower.html"]Almost eleven years old, and yet still relevant[/URL]....[/QUOTE]

You are right, that is important.

In my case, I have a good cooling solution (and case ;). My HDDs have been resyncing for 16 h now, and they are at 31°C and 32°C.


EDIT: Nighty night.

chalsall 2016-01-20 23:07

[QUOTE=pegnose;423279]In my case, I have a good cooling solution (and case ;). My HDDs have been resyncing for 16 h now, and they are at 31°C and 32°C.[/QUOTE]

You might have missed the point of the article...

Measuring the temperature of the components involved gives only an averaged, high-latency measure of the power consumed.

Taking an instantaneous power consumption measurement is a lot more difficult (particularly when Direct Current rather than Alternating Current is involved).
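To illustrate that point, here is a Python sketch of a lumped first-order (RC) thermal model. This is a standard textbook simplification, not a model of any specific drive. The thermal resistance, heat capacity, ambient temperature, and power profile are all hypothetical values chosen for illustration. With them, a fivefold power spike lasting half a second barely moves the simulated temperature, because temperature integrates power over a time constant of many minutes.

```python
def simulate_temperature(power_w, dt_s, r_th, c_th, t_ambient_c):
    """Lumped first-order thermal model, forward-Euler integration.

    r_th: thermal resistance to ambient (K/W); c_th: heat capacity (J/K).
    Starts at the steady-state temperature for the first power sample.
    """
    temp = t_ambient_c + power_w[0] * r_th
    temps = []
    for p in power_w:
        # Heat in (p) minus heat lost to ambient, divided by heat capacity.
        temp += dt_s * (p - (temp - t_ambient_c) / r_th) / c_th
        temps.append(temp)
    return temps


if __name__ == "__main__":
    dt = 0.1        # s per sample
    r_th = 2.0      # K/W, hypothetical
    c_th = 300.0    # J/K, hypothetical (time constant r_th * c_th = 600 s)
    ambient = 25.0  # degC, hypothetical
    # 60 s at 4 W, a 0.5 s spike to 20 W, then 60 s at 4 W again.
    power = [4.0] * 600 + [20.0] * 5 + [4.0] * 600
    temps = simulate_temperature(power, dt, r_th, c_th, ambient)
    baseline = ambient + power[0] * r_th
    print(f"peak power / baseline power: {max(power) / power[0]:.0f}x")
    print(f"peak temperature rise from the spike: {max(temps) - baseline:.3f} K")
```

With these made-up values the spike raises the simulated temperature by only a few hundredths of a kelvin, well below the whole-degree granularity that drive temperature readouts typically report. Catching such transients needs a fast, direct current measurement on the supply rail instead.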


All times are UTC.