mersenneforum.org  

Old 2016-01-02, 09:45   #254
tha
 
Dec 2002

5·163 Posts

Quote:
Originally Posted by chalsall
Care to reference that article?

It would help to build "the case".

Intel x86 processors use microcode to correct design flaws; I believe this is common knowledge. I believe Intel publishes the details online somewhere, and I recall browsing through a list of such workarounds. It also makes good sense to do it that way.

Of course, today's Intel processors carry an enormous amount of legacy with them. It comes down to the cost of recompiling all existing software in the world and redistributing it to the customers of every software publisher, versus the cost of carrying the legacy code in the processor.

There are also commercial reasons, which came to light when Intel tried to introduce a new architecture, Itanium, and failed to line up customers behind a deal that did not find enough market support. That gave AMD a lot of leeway to attract new customers.
Old 2016-01-02, 09:58   #255
tha
 
Dec 2002

1457₈ Posts

I have been running 27.9 for five hours now on the same worktodo test case. So far, no error has shown up.

Apart from freezing errors showing up more frequently after restarting the machine, did errors show up more frequently when the machine was used for other purposes, like browsing the internet or so, or did the errors show up as frequently when the machine was left alone to only churn on prime95/mprime?

George, can you write a version of 27.9 that outputs some potentially interesting data, like the memory addresses used in each iteration, that we could run? Of course such a modification will be very time-consuming for the processor. Even worse, it may actually prevent the error from showing up, but if I can make my machine exhibit the freezing I would be willing to run such a version.
Old 2016-01-02, 10:46   #256
Aurum
 
Nov 2015

50₁₀ Posts

I recommend that you don't use your worktodo test case if you want to reproduce the error.

By freezing, do you mean that a worker stopped? There is another bug which will actually freeze the whole system: http://www.tomshardware.co.uk/forum/...-crashing.html


Quote:
did errors show up more frequently when the machine was used for other purposes, like browsing the internet or so, or did the errors show up as frequently when the machine was left alone to only churn on prime95/mprime?

It doesn't make any difference. However, one of the top German overclockers, der8auer (http://hwbot.org/user/der8auer/), said that Prime95 alone is not the fastest way to produce an error at 1344K. I think he said that he runs Prime95 in parallel with another stress-testing program.
Old 2016-01-02, 12:28   #257
megabit8
 
Dec 2015

2³×3 Posts
Feedback from developers required

Quote:
Originally Posted by Prime95
I did not see you using the addr function. This is fortunate, since you are reading the raw doubles straight out of gwnum memory (looping from 0 to gwnum_datasize/8-1). The addr function is used when looping from 0 to fftlen-1. The gwnum array contains a number of unused cache lines to avoid power-of-2 strides, which is why gwnum_datasize is not 768K * 8 bytes.

I use it to get the start address of the full buffer, including the cache strides, like this:
Code:
gwnum s = gwalloc(gwdata);
double *startAddress = addr(gwdata, s, 0);
unsigned long dataSize = gwnum_datasize(gwdata);
I need startAddress to point to the start of the full buffer, including the unused zones, so that exactly gwnum_datasize(gwdata) bytes are allocated from that address onward.
So I need confirmation that this code is right for the job.

Quote:
Originally Posted by Prime95
To make sure you are using AVX FFTs you need to call guessCpuType and then turn off the CPU_FMA3 flag in CPU_FLAGS.

Is this really necessary, since I am linking with version 27.9 of the gwnum library and it seems to produce code without FMA3? "gwnum.c" does not contain any reference to FMA, only to AVX.

Quote:
Originally Posted by Prime95
I don't know what to make of your program working standalone, but not after pausing in the debugger. Could well be a completely different hardware problem or a software problem with MSVC.

The small program is an unimportant in-place torture test using a 768K FFT, trying to replicate the most time-consuming routine of Prime95 27.9, the FFT. It always operates on the same input data, because the seed is kept constant at 7. I left it running for 10 hours and no errors appeared by themselves on Skylake. In my use of VS 2015 for work, pausing and continuing in the debugger has been pretty solid.

First I am trying to make sure that the range described by startAddress and dataSize does not touch any memory beyond the allocated space and that it covers the entire FFT working memory. Otherwise it would interfere with memory outside the allocation, possibly memory used by VS. So I need some help with this.
Thank you!

Last fiddled with by megabit8 on 2016-01-02 at 12:36
Old 2016-01-02, 14:19   #258
megabit8
 
Dec 2015

18₁₆ Posts

If "The addr function is used when looping from 0 to fftlen-1", then
Code:
unsigned long dataSize = (unsigned long)addr(gwdata, s, asm_data->FFTLEN - 1) + sizeof(double) - (unsigned long)startAddress;
gives the same result as
Code:
unsigned long dataSize = gwnum_datasize(gwdata);
(I have verified this), which means that the iteration runs over allocated data only and includes the first and the last element.

I have also called this function once at the beginning:
Code:
    _declspec(dllexport) void __cdecl InitializeCPU()
    {
        guessCpuType();
        guessCpuSpeed();
        CPU_FLAGS &= ~CPU_FMA;
    }
But the result is the same. It seems that the errors appear immediately after Pause and Continue, in both Debug and Release builds, but only if the option "Enable native code debugging" is on.

I am wondering what code VS 2015 executes there so that the FFT memory space gets altered.

Last fiddled with by megabit8 on 2016-01-02 at 14:29 Reason: Add more info
Old 2016-01-02, 14:33   #259
ATH
Einyen
 
Dec 2003
Denmark

2×1,579 Posts

Quote:
Originally Posted by tha
I have been running 27.9 for five hours now on the same worktodo test case. So far, no error has shown up.

I assume you have hyperthreading on? In the stress test case you need 8 workers running on all 8 virtual threads. I noticed your worktodo had only 4 workers; you should try with 8 workers, with "WorkerThreads=8" and "ThreadsPerTest=1" in local.txt.

But if you really want to know whether your system has the error, you should first replicate the exact circumstances; see, for example, post #184 for a summary.

Last fiddled with by ATH on 2016-01-02 at 14:34
Old 2016-01-02, 14:52   #260
megabit8
 
Dec 2015

11000₂ Posts

I've got some bad news. The same errors happen with another, simpler and slower but AVX-optimized FFT of 768K real points, again only after Debug Break and Continue and only on Skylake. It seems to be related to VS 2015 and Skylake, since _mm256_load_pd loads only half of the register when stepping over it. Can you please tell me which compiler you use to build the Prime95 exe? Is it VS 2005, since the .SLN file is in the 2005 format?
Thank you!
Old 2016-01-02, 15:00   #261
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

19×397 Posts

Quote:
Originally Posted by megabit8
I've got some bad news. The same errors happen with another, simpler and slower but AVX-optimized FFT of 768K real points, again only after Debug Break and Continue and only on Skylake. It seems to be related to VS 2015 and Skylake, since _mm256_load_pd loads only half of the register when stepping over it. Can you please tell me which compiler you use to build the Prime95 exe? Is it VS 2005, since the .SLN file is in the 2005 format?

Yes, I build with MSVC 2005.
Old 2016-01-02, 15:26   #262
tha
 
Dec 2002

5×163 Posts

Quote:
Originally Posted by ATH
I assume you have hyperthreading on?

Yes, I have hyperthreading on, and both versions of mprime reported that cores 1 & 5, 2 & 6, 3 & 7, and 4 & 8 were working on the respective exponents.

I will finish the current test, which will take another two hours. I will then start a new test with 8 threads working concurrently on the following worktodo.txt test case:

Code:
[Worker #1]
Test=N/A,14942209,67,1

[Worker #2]
Test=N/A,14942267,67,1

[Worker #3]
Test=N/A,14942293,67,1

[Worker #4]
Test=N/A,14942437,67,1

[Worker #5]
Test=N/A,14942497,67,1

[Worker #6]
Test=N/A,14942539,67,1

[Worker #7]
Test=N/A,14942563,67,1

[Worker #8]
Test=N/A,14942567,67,1
If it fails I will start it again and see if it takes an equal amount of time to failure.

The current test and the other tests done so far should establish the stability of the system up to this point.

I have a GTX 580 and a 590 for this system, but I am not turning them on yet as I don't want to interfere with these tests.

Last fiddled with by tha on 2016-01-02 at 15:33
Old 2016-01-02, 15:31   #263
tha
 
Dec 2002

5×163 Posts

@ George:

If I remember correctly, each time an LL test is started a random offset is chosen to eliminate potential errors in the design of the processor. Can we somehow force mprime (or prime95) to use no offset, or at least a reproducible offset? I would like to try to create a case where we can tell beforehand exactly at which point the processor will be thrown off course.

Also I assume you have access to a Skylake system yourself?
Old 2016-01-02, 15:46   #264
ATH
Einyen
 
Dec 2003
Denmark

2·1,579 Posts

Quote:
Originally Posted by tha
Yes, I have hyperthreading on, and both versions of mprime reported that cores 1 & 5, 2 & 6, 3 & 7, and 4 & 8 were working on the respective exponents.

But have you tried the actual torture test yet? So far it is the only reported source of the bug, so it is the way to test whether your system even has the error. It seems premature to test new ways to generate it before you know whether the only known way works.

Last fiddled with by ATH on 2016-01-02 at 15:47
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.