mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Software (https://www.mersenneforum.org/forumdisplay.php?f=10)
-   -   768k Skylake Problem/Bug (https://www.mersenneforum.org/showthread.php?t=20714)

tha 2016-01-02 09:45

[QUOTE=chalsall;420871]Care to reference that article?

It would help to build "the case".[/QUOTE]

Intel x86 processors use microcode to work around design flaws; I believe this is common knowledge. Intel, I believe, publishes the details online somewhere. I recall browsing through a list of such workarounds. It also makes good sense to do it that way.

Of course, today's Intel processors carry an enormous amount of legacy with them. It comes down to the cost of recompiling all the existing software in the world and redistributing it to the customers of every software publisher, versus the cost of carrying the legacy code in the processor.

There are also commercial reasons, which came to light when Intel tried to introduce a new architecture, Itanium, and failed to line up customers behind it because it did not find enough market support. That gave AMD a lot of leeway to attract new customers.

tha 2016-01-02 09:58

I have been running 27.9 now for five hours on the same worktodo test case. So far, no error showed up.

Apart from freezing errors showing up more frequently after restarting the machine, did errors show up more frequently when the machine was used for other purposes, like browsing the internet or so, or did the errors show up as frequently when the machine was left alone to only churn on prime95/mprime?

George, can you write a version of 27.9 that outputs some potentially interesting data, such as the memory addresses used in each iteration, that we could run? Of course, such a modification would cost the processor a lot of time. Even worse, it might actually prevent the error from showing up. But if I can get my machine to exhibit the freezing, I would be willing to run such a version.

Aurum 2016-01-02 10:46

I recommend that you don't use your worktodo test case if you want to reproduce the error.

By freezing, do you mean that a worker stops? There is another bug that will actually freeze the whole system: [url]http://www.tomshardware.co.uk/forum/id-2830772/skylake-build-randomly-freezing-crashing.html[/url]


[QUOTE]did errors show up more frequently when the machine was used for other purposes, like browsing the internet or so, or did the errors show up as frequently when the machine was left alone to only churn on prime95/mprime?[/QUOTE]

It doesn't make any difference. However, der8auer ([url]http://hwbot.org/user/der8auer/[/url]), one of the top German overclockers, said that prime is not the fastest way to produce an error @1344k. I think he said that he runs prime in parallel with another stress-testing program.

megabit8 2016-01-02 12:28

Feedback from developers required
 
[QUOTE=Prime95;420934]
I did not see you using the addr function. This is fortunate, since you are reading the raw doubles straight out of gwnum memory (looping from 0 to gwnum_datasize/8-1). The addr function is used when looping from 0 to fftlen-1. The gwnum array contains a number of unused cache lines to avoid power-of-2 strides, which is why gwnum_datasize is not 768K * 8 bytes.
[/QUOTE]
I use it to get the start address of the full buffer, including the unused cache lines, like this:
[CODE]
gwnum s = gwalloc(gwdata);
double *startAddress = addr(gwdata, s, 0);
unsigned long dataSize = gwnum_datasize(gwdata);
[/CODE]
I need double *startAddress to point to the start of the full buffer, including the unused zones, so that the buffer starting there spans exactly "gwnum_datasize(gwdata)" allocated bytes.
So I need confirmation that this code is right for the job.

[QUOTE=Prime95;420934]
To make sure you are using AVX FFTs you need to call guessCpuType and then turn off
the CPU_FMA3 flag in CPU_FLAGS.
[/QUOTE]
Is this really necessary? I am linking with version 27.9 of the gwnum library, which seems to produce code without FMA3. The "gwnum.c" file has no reference to FMA, only to AVX.

[QUOTE=Prime95;420934]
I don't know what to make of your program working standalone, but not after pausing in the debugger. It could well be a completely different hardware problem or a software problem with MSVC.
[/QUOTE]

The small program is an unimportant in-place torture test using a 768K FFT, trying to replicate the most time-consuming routine from Prime95 27.9, the FFT. But it operates on the same input data over and over, because the seed is kept constant at 7. I left it running for 10 hours and no errors appeared on their own on Skylake. In my experience using VS 2015 for work, it has been pretty solid at pausing and continuing in the debugger.

First, I am trying to rule out that accesses through *startAddress and dataSize touch any memory beyond the allocated space, and to confirm that they cover the entire FFT memory workspace. Otherwise they would interfere with unallocated memory possibly used by VS. So I need some help with this.
Thank you!

megabit8 2016-01-02 14:19

If "The addr function is used when looping from 0 to fftlen-1" then
[CODE]
unsigned long dataSize = (unsigned long)addr(gwdata, s, asm_data->FFTLEN - 1) + sizeof(double) - (unsigned long)startAddress;
[/CODE]
gives the same result as
[CODE]
unsigned long dataSize = gwnum_datasize(gwdata);
[/CODE]
(I have verified) which means that the iteration is done through allocated data only and contains the first and the last element.

I have also called this function once at the beginning:
[CODE]
_declspec(dllexport) void __cdecl InitializeCPU()
{
guessCpuType();
guessCpuSpeed();
CPU_FLAGS &= ~CPU_FMA;
}
[/CODE]
But the result is the same. The errors seem to appear immediately after Pause and Continue, in both Debug and Release builds, but only if the option "Enable native code debugging" is on.

I am wondering what code VS 2015 executes there so that the FFT memory space gets altered.

ATH 2016-01-02 14:33

[QUOTE=tha;420953]I have been running 27.9 now for five hours on the same worktodo test case. So far, no error showed up.[/QUOTE]

I assume you have hyperthreading on? For the stress test case you need 8 workers running on all 8 virtual threads. I noticed that your worktodo had only 4 workers; you should try 8 workers, with "WorkerThreads=8" and "ThreadsPerTest=1" in local.txt.

But if you really want to know whether your system has the error, you should first replicate the exact circumstances; see for example [URL="http://www.mersenneforum.org/showpost.php?p=419502&postcount=184"]post #184[/URL] for the summary.

megabit8 2016-01-02 14:52

I've got some bad news. The same errors happen with another, simpler and slower but AVX-optimized FFT of 768K real points, again with Debug Break and Continue and only on Skylake. It seems to be related to VS 2015 and Skylake, since _mm256_load_pd loads only half of the register when stepping over it. Can you please tell me which compiler you use to build the Prime95 exe? Is it VS 2005, since the .SLN file is from 2005?
Thank you!

Prime95 2016-01-02 15:00

[QUOTE=megabit8;420991]I've got some bad news. The same errors happen with another, simpler and slower but AVX-optimized FFT of 768K real points, again with Debug Break and Continue and only on Skylake. It seems to be related to VS 2015 and Skylake, since _mm256_load_pd loads only half of the register when stepping over it. Can you please tell me which compiler you use to build the Prime95 exe? Is it VS 2005, since the .SLN file is from 2005?[/QUOTE]

Yes, I build with MSVC 2005.

tha 2016-01-02 15:26

[QUOTE=ATH;420989]I assume you have hyperthreading on? [/QUOTE]

Yes, I have hyperthreading on and both versions of mprime reported that the cores 1 & 5, 2 & 6, 3 & 7, and 4 & 8 were working on the respective exponents.

I will finish the current test, which will take another two hours. I will then start a new test with 8 threads working concurrently on the following worktodo.txt test case:

[CODE]
[Worker #1]
Test=N/A,14942209,67,1

[Worker #2]
Test=N/A,14942267,67,1

[Worker #3]
Test=N/A,14942293,67,1

[Worker #4]
Test=N/A,14942437,67,1

[Worker #5]
Test=N/A,14942497,67,1

[Worker #6]
Test=N/A,14942539,67,1

[Worker #7]
Test=N/A,14942563,67,1

[Worker #8]
Test=N/A,14942567,67,1
[/CODE]

If it fails, I will start it again and see whether it takes the same amount of time to fail.

The current test and the other tests done so far should establish the stability of the system up to this point.

I have a GTX 580 and a 590 for this system, but I am not turning them on yet as I don't want to interfere with these tests.

tha 2016-01-02 15:31

@ George:

If I remember correctly, each time an LL test is started a random offset is chosen to eliminate potential errors in the design of the processor. Can we somehow force mprime (or prime95) to use no offset, or at least a reproducible offset? I would like to try to create a case where we can tell beforehand exactly at which point the processor will be thrown off course.

Also I assume you have access to a Skylake system yourself?

ATH 2016-01-02 15:46

[QUOTE=tha;420995]Yes, I have hyperthreading on and both versions of mprime reported that the cores 1 & 5, 2 & 6, 3 & 7, and 4 & 8 were working on the respective exponents[/QUOTE]

But have you tried the actual torture test yet? So far it is the only reported source of the bug, so it is the way to check whether your system even has the error. It seems premature to test new ways to generate it before you know if the only known way works.


All times are UTC.