mersenneforum.org  

Old 2015-11-30, 11:48   #1
Aurum
 
Nov 2015

2·5² Posts
768k Skylake Problem/Bug

hi,

As some of you might know, there is a problem with the 768k test. When I run my 6700K at stock settings, the 768k test fails. The 768k test is usually associated with the System Agent/IMC. The failure has been reproduced several times in the past few weeks. Does anyone know where I can post a bug report about this problem?

thanks
Old 2015-11-30, 19:05   #2
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

2×3×1,193 Posts
This is the email I just sent ASRock support -- I think they may have been talking about your case. If this does not help, post your prime95 version, the error message, temps, and anything else that might help.

Hi,

Prime95 version 27.x was made available after Sandy Bridge was introduced with AVX instructions. The latest version is 28.7 which uses the AVX2 fused multiply add instructions introduced in Haswell.

Your screenshot does not indicate the error message, but that is not important. Most likely, the problem is the CPU needs a little more voltage. Prime95 works the FPU and memory subsystem awfully hard -- it can turn up problems that other programs do not. I have not personally tried prime95 on Skylake i7 6700k, but I expect that with the right voltage settings it should pass the stress test. Note, I once got a Haswell i5 k-series that did not pass the stress test at stock voltages. I RMA'ed the CPU and the replacement had no problems.

I would try this:
1) Increase voltages and run version 27.9 to see if you can obtain stability. Try the small FFT test which only stresses the FPU. Then the large or blend FFT test to stress both FPU and memory. Keep an eye on temps too.
2) Then try version 28.7. Prime95 using the AVX2 instructions will work the FPU even harder. Temps will rise considerably as (on Haswell) Intel automatically increases voltages 0.1V when AVX2 instructions are in use.

Hope that helps,
George
Old 2015-11-30, 20:29   #3
Aurum
 
Nov 2015

2×5² Posts

hi

That is the official support answer. They are wrong! The problem is not (!) Vcore related. Other voltages such as Vccsa and Vccio also have no effect on it. The 768k problem occurs with every common version of prime95 (27.9, 28.5 and 28.7), although it takes much longer for a worker to fail with 28.7. Sometimes no worker fails within several hours; after restarting the computer several times, a worker might fail within minutes. This is different from common overclocking problems. At first we thought it was related to the RAM training process, but the subtimings are the same whether a worker fails or the test runs for hours. The problem occurs even at stock settings. The issue has been reproduced with CPUs from several forum members: http://www.hardwareluxx.de/community...z-1086608.html

I contacted support, but they said I should post a bug report. An overview of the problem can be found here:

http://www.bilder-hochladen.net/file...0a-9j-6457.jpg
http://www.bilder-hochladen.net/file...0a-9k-3cd1.jpg
http://www.bilder-hochladen.net/file...0a-9l-c9ea.jpg
http://www.bilder-hochladen.net/file...0a-9m-6b0d.jpg
http://www.bilder-hochladen.net/file...0a-9n-e784.jpg
http://www.bilder-hochladen.net/file...0a-9o-7db0.jpg
http://www.bilder-hochladen.net/file...0a-9p-f7a8.jpg

thanks
Old 2015-12-01, 00:21   #4
ATH
Einyen
 
Dec 2003
Denmark

3²·331 Posts

I'm wondering why your torture tests are using the AVX FFT. When I try on my Haswell-E 5960X it uses the FMA3 FFT, and your Skylake supports FMA3 as well, as can be seen in CPU-Z.
Old 2015-12-01, 00:28   #5
Aurum
 
Nov 2015

110010₂ Posts

I have set CpuSupportsFMA3=0. The problem is related to the AVX test.
Old 2015-12-01, 05:20   #6
ATH
Einyen
 
Dec 2003
Denmark

3²·331 Posts

Well there does not seem to be a problem with Haswell-E at 768k FFT.

I ran 1.5 hours of tests with FMA3 FFT ~35 tests on each of the 8 cores, and then 3.25 hours of tests with AVX FFT ~75 on each of the 8 cores with no errors.
Old 2015-12-01, 07:28   #7
ewmayer
2ω=0
 
Sep 2002
República de California

9832₁₀ Posts

Quote:
Originally Posted by ATH
Well there does not seem to be a problem with Haswell-E at 768k FFT.

I ran 1.5 hours of tests with FMA3 FFT ~35 tests on each of the 8 cores, and then 3.25 hours of tests with AVX FFT ~75 on each of the 8 cores with no errors.
George's post - if I read it right - indicates this is likely a motherboard-undervoltage problem, related to one specific mobo manufacturer. Are you using an ASRock mobo, and if so, what specific model?

George, can the OP tweak his mobo voltages via the boot-time BIOS menu to see if that fixes the problem for him, or is that not a user-accessible fiddle?

Last fiddled with by ewmayer on 2015-12-01 at 07:28
Old 2015-12-01, 10:28   #8
Aurum
 
Nov 2015

50₁₀ Posts

Quote:
Well there does not seem to be a problem with Haswell-E
The problem only affects Skylake.

Quote:
- if I read it right - indicates this is likely a motherboard-undervoltage problem
This is not an undervoltage problem.

Quote:
related to one specific mobo manufacturer.
Nope. It affects boards from several Skylake mobo manufacturers.
Old 2015-12-01, 19:34   #9
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

1BF6₁₆ Posts

I was not implying ASRock mobos have a problem. I was suggesting that in the past similar problems have been solved by bumping the voltage -- sometimes past stock settings.

OP: I cannot read your German website. Are you saying this is happening to Skylake CPUs from several users or just the one you own?

OP: From your description it sounds like the problem does not occur in the same place with the same error message every time -- that is, it is not the case that the error always occurs on the exact same exponent and that a test on that exponent never passes.

OTHERS: Are there any Skylake owners that can try to duplicate this? Use CpuSupportsFMA3=0 in local.txt. Run a custom torture test only on 768K FFT.
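
For anyone who prefers to script the local.txt change, here is a minimal sketch. It assumes local.txt is a plain key=value text file and that global options belong before any bracketed section header; both are assumptions about the file layout, not something confirmed in this thread. The helper name set_option is made up for illustration and is not part of prime95. Only the CpuSupportsFMA3=0 setting itself comes from the post above. Exit prime95 before editing the file, and select the 768K FFT size in the custom torture test dialog as usual.

```python
from pathlib import Path


def set_option(path, key, value):
    """Set key=value in a simple key=value text file.

    Hypothetical helper, not part of prime95. An existing entry for the
    key is replaced (duplicates are dropped); otherwise the entry is
    inserted before the first "[section]" line, or appended if there is
    no section line. The file layout is an assumption.
    """
    p = Path(path)
    lines = p.read_text().splitlines() if p.exists() else []
    entry = f"{key}={value}"
    out, done = [], False
    for line in lines:
        # Compare only the text left of the first "=" with the key.
        if line.split("=", 1)[0].strip() == key:
            if not done:
                out.append(entry)
                done = True
            continue
        out.append(line)
    if not done:
        # Keep the new option in the global area, ahead of any section.
        first_section = next(
            (i for i, l in enumerate(out) if l.lstrip().startswith("[")),
            len(out),
        )
        out.insert(first_section, entry)
    p.write_text("\n".join(out) + "\n")
    return out
```

Usage would look like set_option("local.txt", "CpuSupportsFMA3", 0), run from the prime95 directory. Keeping a backup copy of local.txt first is a sensible precaution, since the sketch rewrites the whole file.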
Old 2015-12-01, 19:51   #10
Aurum
 
Nov 2015

50₁₀ Posts

Quote:
I was suggesting that in the past similar problems have been solved by bumping the voltage -- sometimes past stock settings.
Raising Vcore, Vccsa, or Vccio does not solve the problem. We have tried different combinations with different CPUs.

Quote:
Are you saying this is happening to Skylake CPUs from several users or just the one you own?
The problem has been reproduced with ~15 CPUs (6700K) by several forum members. Not all CPUs are affected. There seem to be some working combinations out there.

Quote:
From your description it sounds like the problem does not occur in the same place with the same error message every time -- that is, it is not the case that the error always occurs on the exact same exponent and that a test on that exponent never passes.
That's right. The problem may kick in after hours or after minutes. Sometimes no worker fails within several hours. If you restart the computer, a worker might fail within minutes with the same settings.

Quote:
Are there any Skylake owners that can try to duplicate this? Use CpuSupportsFMA3=0 in local.txt. Run a custom torture test only on 768K FFT.
Yep, there are many. I can ask some of the other guys to post their experiences if needed ^^

Last fiddled with by Aurum on 2015-12-01 at 19:52
Old 2015-12-01, 20:12   #11
chalsall
If I May
 
"Chris Halsall"
Sep 2002
Barbados

22155₈ Posts

Quote:
Originally Posted by Aurum
Yep, there are many. I can ask some of the other guys to post their experiences if needed ^^
Do so.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.