mersenneforum.org  

Old 2016-08-03, 18:14   #1
Akujik
 
Aug 2016

2·3 Posts
Issue with Broadwell-E and mprime?

Hi,

I stress test on Linux using mprime and have been OCing an i7-6950X (the new 10-core Extreme CPU).

I started encountering an issue that I never had before: while stress testing, mprime will kill itself after about 15 minutes. No errors are posted in results.txt or anywhere else. All load just drops on the CPUs, and the system stays up with zero errors, zero temperature issues, etc.

Currently using mprime v27.9.

Is there any known bug with the new Broadwell-E CPUs? I have not experienced this issue on anything ranging from 4.0 GHz to 4.5 GHz, but while I am pushing much higher this keeps happening.
Old 2016-08-03, 19:24   #2
Mark Rose
 
"/X\(‘-‘)/X\"
Jan 2013

3×977 Posts

Quote:
Originally Posted by Akujik
Is there any known bug with the new Broadwell-E CPUs? I have not experienced this issue on anything ranging from 4.0 GHz to 4.5 GHz, but while I am pushing much higher this keeps happening.
There is your answer. Keep it clocked to 4.5 GHz or under. It's not stable higher.
Old 2016-08-03, 21:18   #3
Akujik
 
Aug 2016

2×3 Posts

Quote:
Originally Posted by Mark Rose
There is your answer. Keep it clocked to 4.5 GHz or under. It's not stable higher.
How is that an answer?

I have it stable at 4.7. I just cannot run the cache at default or higher; when I do, these issues always happen. Note, I said these issues happen over 4.5 and haven't before. I did not say I didn't have a system stable over those speeds.

And I don't see how instability is proven by the program just stopping rather than reporting an error.

For it to be an answer, it needs to actually address my questions: the segmentation fault, and why the program will sometimes just stop rather than reporting errors.

As for the segmentation fault, it seems this was a bug before, since there was an official reply about fixing a segmentation fault bug. According to that post my version shouldn't have this bug, and it was not always related to instability issues.

Last fiddled with by Akujik on 2016-08-03 at 21:20
Old 2016-08-03, 21:59   #4
airsquirrels
 
"David"
Jul 2015
Ohio

11×47 Posts

Quote:
Originally Posted by Akujik
How is that an answer?

I have it stable at 4.7. I just cannot run the cache at default or higher; when I do, these issues always happen. Note, I said these issues happen over 4.5 and haven't before. I did not say I didn't have a system stable over those speeds.

And I don't see how instability is proven by the program just stopping rather than reporting an error.

For it to be an answer, it needs to actually address my questions: the segmentation fault, and why the program will sometimes just stop rather than reporting errors.

As for the segmentation fault, it seems this was a bug before, since there was an official reply about fixing a segmentation fault bug. According to that post my version shouldn't have this bug, and it was not always related to instability issues.
If you run mprime in GDB / LLDB then you should actually get useful information when it crashes, such as what it was doing. Most likely Mark is right: your overclock reached a point of instability where a calculation came out wrong or a bit got flipped, and the program reached what should have been an impossible state. mprime isn't exactly high-security, defensively programmed software; it is optimized for performance. As a result it likely makes assumptions about the output of an operation based on the input. Trivial example: adding two 8-bit numbers and storing the result in a 32-bit word should always leave the upper 16 bits as zero, but if an OC becomes unstable those bits might not be 0, and a later operation depending on that state generates a seg fault.
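To make the trivial example concrete, here is a rough simulation in Python. It is only a sketch of the idea and is not taken from mprime's source: `add_u8`, `flip_bit`, `TABLE` and `lookup` are hypothetical names, and the "bit flip" is injected by hand to stand in for an unstable overclock. Python raises an `IndexError` where unchecked C or assembly code might read unmapped memory and die with a segmentation fault.

```python
# Illustrative simulation only; nothing here comes from mprime itself.

def add_u8(a: int, b: int) -> int:
    """Add two 8-bit values into a 32-bit word, as correct hardware would."""
    assert 0 <= a <= 0xFF and 0 <= b <= 0xFF
    return (a + b) & 0xFFFFFFFF  # at most 510, so only the low 9 bits can be set


def flip_bit(word: int, bit: int) -> int:
    """Model a hardware fault flipping one bit of a 32-bit register."""
    return (word ^ (1 << bit)) & 0xFFFFFFFF


# Performance-oriented code might size a table for every *possible* sum
# and then index it without any bounds check.
TABLE = list(range(511))  # valid indices are 0..510


def lookup(word: int) -> int:
    """Index the table directly, trusting the 'upper bits are zero' invariant."""
    return TABLE[word]


good = add_u8(200, 100)
print(lookup(good))        # invariant holds, lookup succeeds

bad = flip_bit(good, 20)   # one flipped high bit breaks the invariant
try:
    lookup(bad)
except IndexError:
    print("index out of range: in C this could be a segmentation fault")
```

The point is that the failure shows up far from the cause: the addition looked fine, and the program only dies later, when something trusts the result.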
Old 2016-08-03, 22:26   #5
VBCurtis
 
"Curtis"
Feb 2005
Riverside, CA

3·1,579 Posts

Less technically: If a program crashes under certain overclocking conditions but does not crash without the overclock, the problem is with the hardware, not the software.

mprime is not the only program I've had just disappear (silently crash) when my OC is too aggressive. It may be the only program that you've tested that does so, but all the same it is a sign you're too close to the edge for real stability. There are uses (e.g. gaming) where this state is acceptable, but for scientific computation you should back off the OC settings.
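If you want to confirm that a program which "just disappears" was actually killed rather than exiting normally, one option is to launch it from a small wrapper and look at how the process ended. The following is a minimal sketch, assuming a POSIX system such as Linux; `describe_exit` and `run_and_report` are hypothetical helper names, and the demo child process only imitates a silent crash.

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a readable summary (POSIX)."""
    if returncode < 0:
        # On POSIX, subprocess reports death by signal as the negated signal number.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


def run_and_report(cmd: list) -> str:
    """Run cmd to completion and describe how it ended."""
    result = subprocess.run(cmd)
    return describe_exit(result.returncode)


# Demo: a child process that dies from a signal without printing anything,
# much like a stress test that silently vanishes.
demo = [sys.executable, "-c",
        "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"]
print(run_and_report(demo))
```

To use it on the real thing, pass your own command line instead of `demo`, for example `run_and_report(["./mprime", ...])` with whatever arguments you normally use. A report of `SIGSEGV` would match the segmentation fault discussed above; it does not say whether hardware or software caused it, only that the process did not exit on its own.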
Old 2016-08-04, 00:25   #6
chalsall
If I May
 
"Chris Halsall"
Sep 2002
Barbados

22470₈ Posts

Quote:
Originally Posted by Akujik
I have it stable at 4.7.
Clearly, you don't.
Old 2016-08-04, 01:56   #7
Batalov
 
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

24AF₁₆ Posts

Quote:
Originally Posted by Akujik
How is that an answer?

I have it stable at 4.7.
So... you can rev your engine to 11,000 RPM while standing still, but not while driving?
Maybe you should not have skipped those physics classes in high school? Just maybe...
[Attached image: red-line-gauge.jpg]
Old 2016-08-04, 02:42   #8
LaurV
Romulan Interpreter
 
Jun 2011
Thailand

83·113 Posts

@OP: The P95 distribution comes with a file called "stress.txt". Read it.
Quote:
Originally Posted by "stress.txt", last FAQ
Q) A forum member said "Don't bother with prime95, it always pukes on me,
and my system is stable!. What do you make of that?"

or

"We had a server at work that ran for 2 MONTHS straight, without a reboot
I installed Prime95 on it and ran it - a couple minutes later I get an error.
You are going to tell me that the server wasn't stable?"

A) These users obviously do not subscribe to the 100% rock solid
school of thought. THEIR MACHINES DO HAVE HARDWARE PROBLEMS.
But since they are not presently running any programs that reveal
the hardware problem, the machines are quite stable. As long as
these machines never run a program that uncovers the hardware problem,
then the machines will continue to be stable.
Old 2016-08-04, 13:39   #9
GP2
 
Sep 2003

2²×647 Posts

Quote:
Originally Posted by Akujik
Currently using mprime v27.9.
Try upgrading to v28.9 (the latest) and see if that works for you.

It works slightly differently internally (with potentially multiple helper threads for each worker thread), so who knows, maybe there's a slight possibility that the problem won't be triggered.
Old 2016-08-04, 14:54   #10
Akujik
 
Aug 2016

6₈ Posts

Quote:
Originally Posted by chalsall
Clearly, you don't.
30 hours on mprime with 47/25 isn't good?

I clearly don't have it stable with a higher cache, though. That is what I was explaining, and I wanted to know about those 2 specific issues/failures. You are right about that.
Old 2016-08-04, 14:55   #11
Akujik
 
Aug 2016

2·3 Posts

Quote:
Originally Posted by GP2
Try upgrading to v28.9 (the latest) and see if that works for you.

It works slightly differently internally (with potentially multiple helper threads for each worker thread), so who knows, maybe there's a slight possibility that the problem won't be triggered.
thanks, will give it a try
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.