mersenneforum.org

768k Skylake Problem/Bug (https://www.mersenneforum.org/showthread.php?t=20714)

Dubslow 2015-12-08 01:36

The long story short is that

1) either the contact is blowing her off, or

2) basically we need a long and precise list of all systems that failed (all hardware combinations known to fail), together with multiple mobo manufacturers contacting Intel through official channels on the matter.

She'll try a different contact, but for now, it looks like we should focus on the motherboard OEMs and get them to initiate something formal.

Edit: part of the problem is the segregation at Intel - my roommate didn't work on desktop chips, so the people she knows aren't really the people we need. As has already been noted, we should definitely try to reproduce the issue on Skylake Xeons.

ATH 2015-12-08 08:02

There are no Skylake Xeons yet, right?

Dubslow 2015-12-08 08:21

[QUOTE=ATH;418592]There are no Skylake Xeons yet, right?[/QUOTE]

There are, though not many, and none with more than 4 cores as of yet.

Prime95 2015-12-08 15:19

ASRock has contacted Intel

Madpoo 2015-12-08 21:51

[QUOTE=Dubslow;418594]There are, though not many, and none with more than 4 cores as of yet.[/QUOTE]

Correct... Skylake Xeons are out (I mentioned a few posts back that some of them have actually contacted the Primenet server, but thus far none of them have turned any results in).

None of them have run a benchmark either. Who knows... maybe these were Intel folks running Prime95 on samples. :smile:

The new Thinkpad P50/P70 laptops with the option of a mobile Skylake Xeon model are supposed to be out now (P70) or in the next month (for the 15" P50 model)... I think if you chose the Xeon option on those they come with "Xeon E3-1505M v5"

I'm sure other brands probably have the mobile versions in their pipeline as well.

The workstation/single CPU Xeon E3-12xx v5 models should also be available. I'm basing that on this:
[URL="https://en.wikipedia.org/wiki/List_of_Intel_Xeon_microprocessors"]https://en.wikipedia.org/wiki/List_of_Intel_Xeon_microprocessors[/URL]

As with any wiki page, take it with a grain of salt, but there are sellers out there offering the E3-12xx v5 units. I'd paste a link, but then again no I won't because it seems tacky to include a link to a reseller in here, especially one I've never purchased from and could be a scam for all I know. LOL

At any rate, only the uniprocessor "E3" models have been announced and/or released. The E5/E7 models don't have details yet and won't be out until sometime in 2016.

What I'm unclear on is whether the uniprocessor and mobile Xeon Skylakes have the promised AVX-512 or not. I'd like to assume they do and if so, hey, that'd be a fun hobby purchase, but otherwise why bother. :smile:

Prime95 2015-12-08 23:30

It would not hurt if some of our German bug discoverers could pester other motherboard manufacturers. The more sources Intel hears from the more seriously they will take the problem.

tha 2015-12-08 23:43

[QUOTE=Prime95;418652] The more sources Intel hears from the more seriously they will take the problem.[/QUOTE]

Since this issue has potential implications for system security, I mailed the security guys at Intel yesterday. They replied today that they intend to forward it to the chip designers, and that they are already trying to replicate the erratic behaviour on their own systems. I also included a link to this thread in the info I mailed them. What they would like to see is a list of the exact steps taken to get the processor to fail, plus the precise specifics of the hardware and software used, even though they are already aware that the processor itself is the main suspect. You can post any such information here in this thread.

This post is meant to underline George's post above, not to signal that no further action from the community toward Intel is needed.

Dubslow 2015-12-09 00:01

1) Use Skylake chip

2) Run Prime95 stress test, at 768K FFT length, in place (0 mem usage). Either version 27 or 28 (though as I understand sometimes 27 fails faster...?)

Very easy to reproduce.

science_man_88 2015-12-09 00:14

[QUOTE=ralleh;417935]
- Using 28.7 with CpuSupportsFMA3=0 but FFT size of 15 gives the same errors as 27.9 does (same settings as 27.9 default) [/QUOTE]

I must be misinterpreting this then ?

Prime95 2015-12-09 01:24

[QUOTE=Dubslow;418655]1) Use Skylake chip

2) Run Prime95 stress test, at 768K FFT length, in place (0 mem usage). Either version 27 or 28 (though as I understand sometimes 27 fails faster...?)

Very easy to reproduce.[/QUOTE]

Too vague. Improving:

1) Use Skylake 6700 (either K or non-K) with hyperthreading enabled.
2) Use Windows 64-bit (problem can also be replicated using Linux).
3) Run version 27.9 of prime95 available at [url]ftp://mersenne.org/gimps/p95v279.win64.zip[/url]. Run a torture test. Choose custom from the dialog box. Select 8 threads, 768 min and max FFT size, in-place.
4) Failure usually occurs within an hour.
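
For anyone posting a failure report, the steps above and the hardware details Intel asked for could be captured in one structured record. The following is only a sketch: the `TortureSettings` and `build_report` names and all field names are made up for illustration and are not part of Prime95. The CPU model and motherboard are passed in by hand, since the standard library does not report them reliably.

```python
import json
import platform
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class TortureSettings:
    """Torture-test settings from the steps above (field names are hypothetical)."""
    prime95_version: str = "27.9"
    threads: int = 8
    min_fft_k: int = 768      # minimum FFT size, in K
    max_fft_k: int = 768      # maximum FFT size, in K
    in_place: bool = True     # in-place FFTs, no extra memory
    hyperthreading: bool = True


def build_report(cpu_model: str,
                 motherboard: str,
                 minutes_to_failure: Optional[float] = None,
                 settings: Optional[TortureSettings] = None) -> dict:
    """Combine user-supplied hardware details with OS info and test settings."""
    settings = settings or TortureSettings()
    return {
        "cpu_model": cpu_model,            # typed in by the reporter
        "motherboard": motherboard,        # typed in by the reporter
        "os": platform.system(),           # e.g. "Windows" or "Linux"
        "os_release": platform.release(),
        "machine": platform.machine(),     # architecture string from the OS
        "settings": asdict(settings),
        "minutes_to_failure": minutes_to_failure,  # None if no failure seen yet
    }


if __name__ == "__main__":
    # Placeholder hardware strings; replace with the real system's details.
    report = build_report("Core i7-6700K", "ExampleBoard Z170 (placeholder)")
    print(json.dumps(report, indent=2))
```

Pasting the resulting JSON into a post would give every report the same fields, which should make it easier to compile the list of failing hardware combinations.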

Batalov 2015-12-09 02:17

Would anyone with a Skylake chip open their system for SSH access and send credentials via PM?

To increase chances of success (of getting Intel to pay attention), it would be nice to make the debug case as minimal as it can be, with a standalone code of
a) perhaps a few dozen lines, distilled from the prime95 source guts (stripped of the libcurl dependencies etc. which usually give folks trouble compiling) and no command line parameters and no conf files,
b) using just one debug case ad nauseam (e.g. [I]6500 Lucas-Lehmer iterations of M10485761 * using AVX FFT length 768K [/I]only), and
c) linked to the standard libgwnum.a
__________________
* because it is a nice number 5*2^21+1
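
To make clear what one "Lucas-Lehmer iteration" computes, here is a minimal reference sketch in plain Python. It is not the proposed debug case: it uses Python's built-in big integers instead of gwnum or an AVX FFT, so it does not exercise the code path that fails on Skylake. At best it could provide a slow reference residue to compare against. The function name is hypothetical.

```python
# Exponent from the proposed debug case: 5*2^21 + 1
EXPONENT = 5 * 2**21 + 1  # 10485761


def lucas_lehmer_residue(p: int, iterations: int) -> int:
    """Return s after `iterations` steps of s -> s*s - 2 (mod 2**p - 1), starting at s = 4."""
    m = (1 << p) - 1          # the Mersenne number M_p = 2^p - 1
    s = 4
    for _ in range(iterations):
        s = (s * s - 2) % m   # one Lucas-Lehmer iteration
    return s


if __name__ == "__main__":
    # Small sanity check: M13 = 8191 is prime, so the residue after
    # p - 2 = 11 iterations is 0.
    print(lucas_lehmer_residue(13, 11))  # prints 0
    # The full debug case would be lucas_lehmer_residue(EXPONENT, 6500),
    # which is very slow with plain big-integer arithmetic.
```

The small exponent is used only to check the arithmetic; running 6500 iterations at the full exponent this way would take a long time and is left to the reader.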


All times are UTC.