You should try Chris_Halsall ...
EDIT: Mark was faster ^^ |
[QUOTE=Mark Rose;419535]It may disallow spaces.[/QUOTE]
Thanks. That was it (or maybe a human got involved). Spaces not allowed in a display name? That's just stupid. |
[QUOTE=Aurum;419536]You should try Chris_Halsall ...[/QUOTE]
Yeah. That's what I ended up doing, and it worked. BTW, having your avatar changed by the "Gods" is a badge of honor! :smile: |
MCU = 506E3 2D
CSME = 11.0.0.1166 Graphics Driver = Nvidia: 359.00 (same happening with iGPU) BIOS version = 1.3, 1.7 & 1.9 Motherboard vendor/model = Asrock Z170 OC Formula |
[QUOTE=chalsall;419539]Yeah. That's what I ended up doing, and it worked.
BTW, having your avatar changed by the "Gods" is a badge of honor! :smile:[/QUOTE] Probably the majority of the avatars here from active members have been assigned by the forum admin. The next level of recognition is a bit of text under your name. |
The conversation continues...
[url]https://communities.intel.com/message/361811#361811[/url]
Mike.C (from Intel) questions Henk_NL about his (serious) overclocking. EricHefner then responds to Mike.C saying that clock speed has no function in the errors (and provides a screenshot). I am hopeful that those serious at Intel will get back to work after the holidays are over.... |
[QUOTE=chalsall;420329][URL]https://communities.intel.com/message/361811#361811[/URL]
Mike.C (from Intel) questions Henk_NL about his (serious) overclocking. EricHefner then responds to Mike.C saying that clock speed has no function in the errors (and provides a screenshot). I am hopeful that those serious at Intel will get back to work after the holidays are over....[/QUOTE] Mike.C (from Intel) was already drunk from the Christmas champagne. He repeatedly confused Mega with Giga. :w00t: |
AVX instructions seem to load only 128 bits of data instead of 256
Can someone run a simple test on Skylake?
Like this in C++:
float temp[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
__m256 loadedData = _mm256_loadu_ps(&temp[0]);
// On Skylake it loads only the first 4 floats
// On Ivy Bridge it loads the full 8 floats
OR:
double temp[8] = { 1, 2, 3, 4 };
__m256d loadedData = _mm256_loadu_pd(&temp[0]);
// On Skylake it loads only the first 2 doubles
// On Ivy Bridge it loads the full 4 doubles
Or am I making a mistake? |
It was a false alarm related to Visual Studio 2015 vs. 2012. The bug is specific to VS 2015, so never mind the previous post. It works correctly on Skylake with VS 2012.
|
It is a rounding problem in Skylake AVX after all.... :)
1 Attachment(s)
I managed to isolate this piece of code:
[INDENT][LEFT][B]#include <stdio.h>
#include <math.h>

const double M_PI = 3.1415926535897932384626433832795;

double SkylakeAVXCosine()
{
    double arg = (2 * M_PI) * 33 / 256.0;
    double cosValue = cos(arg);
    return cosValue;
}

int main()
{
    double cosValue = SkylakeAVXCosine();
    unsigned __int64 bytes = *(unsigned __int64*)&cosValue;
    printf("cos((2 * M_PI) * 33 / 256.0) == %llx \r\nPress enter to exit...", bytes);
    getchar();
}[/B][/LEFT][/INDENT]
The above code should be compiled with Visual Studio 2015, use the platform toolset: "Visual Studio 2015 (v140)" and in Code Generation set: "Advanced Vector Extensions (/arch:AVX)" so that the cos function will be computed with AVX.

On Skylake processor the result is:
cos((2 * M_PI) * 33 / 256.0) == 3fe610b7551d2cdf
On Ivy Bridge the result is:
cos((2 * M_PI) * 33 / 256.0) == 3fe610b7551d2cde
(Notice the last/lowest bit difference, on Skylake it is 1 on Ivy it is 0).

After all, this small program shows different result on Skylake vs other processors. I think this is related to the rounding problem posted here. The exe is attached. |
[QUOTE=megabit8;420545]After all, this small program shows different result on Skylake vs other processors. I think this is related to the rounding problem posted here.[/QUOTE]
Maybe, maybe not. Worth investigating further, although I question if this could cause the crashing reported; more likely it would simply cause an error in the end calculation. [QUOTE=megabit8;420545]The exe is attached.[/QUOTE] Just so you know, most here are very hesitant to run precompiled code from "unknown" sources. Further, this bug has been observed under Linux, and your above code doesn't compile under GCC under Linux. I'm not accusing you of posting "malware" (the exec is too small unless you're /really/ good :wink:) but producing source code which can compile under both Linux and Windows would be useful. Happy New Year! :smile: |
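For anyone who wants to repeat the cosine check under both GCC and MSVC, here is a minimal sketch of a portable variant. It is not the original poster's program: the names `kPi`, `double_bits`, `test_cosine` and `print_test_cosine` are made up for illustration, the constant is renamed because many `<math.h>` implementations already define `M_PI` as a macro, and the pointer cast is swapped for `memcpy`. Which last bit you see depends on the math library and compiler options in use.

```cpp
#include <cassert>
#include <cinttypes>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstring>

// Local constant (hypothetical name); deliberately NOT called M_PI,
// because many <math.h> implementations define M_PI as a macro.
static const double kPi = 3.14159265358979323846;

// Copy the IEEE-754 bit pattern of a double into a 64-bit integer.
// memcpy replaces the pointer cast used in the original post.
std::uint64_t double_bits(double x)
{
    static_assert(sizeof(double) == sizeof(std::uint64_t), "expects 64-bit double");
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    return bits;
}

// The value under discussion: cos(2*pi*33/256).
double test_cosine()
{
    // volatile keeps the compiler from folding cos() at compile time,
    // so the run-time math library is what actually gets exercised.
    volatile double arg = (2 * kPi) * 33 / 256.0;
    const double result = std::cos(arg);
    assert(result >= -1.0 && result <= 1.0);
    return result;
}

// Print the result in the same hex format as the original program.
void print_test_cosine()
{
    std::printf("cos((2 * pi) * 33 / 256.0) == %" PRIx64 "\n",
                double_bits(test_cosine()));
}
```

Calling `print_test_cosine()` from `main` gives a line that can be compared directly with the two values reported above.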
It shows different result on Skylake than Ivy for sure
2 Attachment(s)
First of all Happy New Year! :smile:
Then 11KB for an exe is too big for malware. One needs only 1KB of infected code to download an 100MB EXE from a short link over the internet and run on the computer in background all the viruses, trojans and malware in the world.

Now to be serious, that exe is the result of compiling the attached code in Visual Studio 2015. BitDefender was running. I also tested with "Platform tools from Visual Studio 2013" and the bug is reproducible. With "Platform Tools from Visual Studio 2012" the result on Skylake is the same as on Ivy. This makes me think that the Visual Studio 2013 or newer libraries detect the new AVX2 and FMA instructions and use them to compute the cosine, but fail to be backwards compatible.

I do not know if this issue affects the Prime 95 calculations. My Skylake processor does not freeze after updating the BIOS. I had a lot of Windows freezes using the older BIOS. It just shows a rounding error in the Prime95 27.9 version 768 KB in-place FFT (see attached). [Current BIOS is 1402, MB: ASUS Z170-A]

I was wondering if Prime 95 uses some common libraries to compute the sine/cosine which might yield different results on Skylake, just as in that test. Then the errors would propagate.

I do not have any experience with GCC on Linux, so I cannot give you code that executes on Linux. It might not even be reproducible on Linux, since this issue might be related to how Microsoft Visual Studio 2015 computes the sine/cosine functions using newer instructions. It could be a failure of Skylake as well, if Microsoft does the calculations correctly.

I forgot to mention: that exe is compiled for 64 bit, and you might need the x64 version of the Visual Studio C++ redistributables installed from: [url]https://www.microsoft.com/en-us/download/details.aspx?id=48145[/url] to run it correctly. All the best in the new year! |
[QUOTE=megabit8;420747]Then 11KB for an exe is too big for malware. One needs only 1KB of infected code to download an 100MB EXE from a short link over the internet and run on the computer in background all the viruses, trojans and malware in the world.[/QUOTE]
A cursory disassembly of your EXE didn't show anything particularly interesting. That's a good thing. But I give you your point. [QUOTE=megabit8;420747]Now to be serious, that exe is the result of compiling the attached code in Visual Studio 2015. BitDefender was running. I also tested with "Platform tools from Visual Studio 2013" and the bug is reproducible. With "Platform Tools from Visual Studio 2012" the result on Skylake is the same as on Ivy.[/QUOTE] The bug you are reporting might have absolutely nothing to do with the bug which opened this thread. Can you please explain, based on your experience, why only 768k FFTs failed? |
It is a deep question why it fails only for the 768KB FFT. It could be that the Prime 95 code is very sensitive to rounding errors for this size and possibly others with similar properties.

My personal opinion is that there is a rounding error in the twiddle factor computation using sine/cosine. The code is designed to use even the last bit precisely, and that bit is different. This is the engineer's approach.

I tested Prime95 v27.9 for 25 hours with hyper-threading off and no error appeared, whereas with hyper-threading on an error appears after at most 2 hours. This makes me think that there is a deeper issue with the memory controller and instruction scheduling when hyper-threading is on. This is my opinion after reading the entire thread. I use no overclock, and I keep turbo disabled, so the top speed is 4.0GHz on all cores instead of 4.2GHz on 1 core. So no overclock at all, even a bit of downclock.

I got to that magic constant argument for cosine by using a self-made 768KB AVX-optimized FFT which showed differences in the end result between Skylake and Ivy. After tracing the problem I observed that the precomputed twiddle factors differed. The first constant that differed was the 33/256 complex root of unity. Then I fixed the sin/cos functions to be standard, using the VS 2012 non-optimized AVX functions, and left the computers running to compute 100 million transforms each of size 768KB. It took 1.5 days. Skylake finished when Ivy was at 80%. I compared the SHA1 of the result of each FFT and it was identical between Skylake and Ivy. This means that the AVX operations I used to compute the FFT (_mm256_mul_pd, _mm256_add_pd, _mm256_sub_pd) perform identically on Skylake and on Ivy in my case. Hyperthreading was on on both computers and all 8 threads were used.

This is why it is either:
1. a rounding error that propagates, caused by the Intel 6th gen processor in conjunction with a bad library that computes the trigonometric functions; or
2. something far more complex that involves hyperthreading, caches, or a bug in the processor circuitry, so that, for example, running some complex AVX-optimized code in Prime 95 v27.9 on all 8 threads slightly alters the end result.

I have had contact with Intel i7 first gen, second gen Sandy Bridge, and 3rd gen Ivy, which are all rock solid in Windows. Now on Skylake, Firefox crashes most often and it feels that the system is not that solid even though it is faster. This is the general impression. |
[QUOTE=megabit8;420545]I managed to isolate this piece of code:[INDENT][LEFT][B]#include <stdio.h>
#include <math.h>

const double M_PI = 3.1415926535897932384626433832795;

double SkylakeAVXCosine()
{
    double arg = (2 * M_PI) * 33 / 256.0;
    double cosValue = cos(arg);
    return cosValue;
}

int main()
{
    double cosValue = SkylakeAVXCosine();
    unsigned __int64 bytes = *(unsigned __int64*)&cosValue;
    printf("cos((2 * M_PI) * 33 / 256.0) == %llx \r\nPress enter to exit...", bytes);
    getchar();
}[/B][/LEFT][/INDENT]
The above code should be compiled with Visual Studio 2015, use the platform toolset: "Visual Studio 2015 (v140)" and in Code Generation set: "Advanced Vector Extensions (/arch:AVX)" so that the cos function will be computed with AVX.

On Skylake processor the result is:
cos((2 * M_PI) * 33 / 256.0) == 3fe610b7551d2cdf
On Ivy Bridge the result is:
cos((2 * M_PI) * 33 / 256.0) == 3fe610b7551d2cde
(Notice the last/lowest bit difference, on Skylake it is 1 on Ivy it is 0).

After all, this small program shows different result on Skylake vs other processors. I think this is related to the rounding problem posted here. The exe is attached.[/QUOTE]
I replaced M_PI with pi.. on my (windows)/Haswell system, I'm getting
[code]
cos((2 * pi) * 33 / 256.0) == 3fe610b7551d2cdf
Press enter to exit...
[/code] |
This is a good confirmation
[QUOTE=kracker;420768]I replaced M_PI with pi.. on my (windows)/Haswell system, I'm getting
[code] cos((2 * pi) * 33 / 256.0) == 3fe610b7551d2cdf Press enter to exit... [/code][/QUOTE] Thank you for your post. It is the same result as on Skylake. On Ivy the result is different. Debugging the cos function evaluation shows that on Ivy it branches to pure AVX code, and on Skylake it branches to asm code with "vfmadd213sd" and "vfnmadd231sd" instructions, which are FMA. Based on your confirmation, the same happens on Haswell. Theoretically the end result should be the same no matter which processor is used; otherwise this kind of error is spread all over, basically in every program that simply evaluates a cos function. |
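A last-bit difference between an FMA code path and a plain multiply-then-add path is expected behaviour rather than a sign of faulty hardware: a fused multiply-add rounds once, while separate multiply and add round twice. The sketch below only illustrates that general effect and is not taken from the Visual Studio runtime; the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Multiply, round the product to double, then add: two roundings.
// The volatile temporary stops the compiler from fusing the two
// operations into a single FMA instruction behind our back.
double mul_then_add(double a, double b, double c)
{
    volatile double product = a * b;
    return product + c;
}

// Fused multiply-add: a*b+c is computed exactly and rounded once.
double fused_mul_add(double a, double b, double c)
{
    return std::fma(a, b, c);
}

// Inputs chosen so that the two paths must disagree:
// a = 1 + 2^-27, so a*a = 1 + 2^-26 + 2^-54 exactly, and the 2^-54
// term is lost when the product is rounded to a double.
void show_fma_difference()
{
    const double a = 1.0 + std::ldexp(1.0, -27);
    const double c = -(1.0 + std::ldexp(1.0, -26));
    assert(mul_then_add(a, a, c) == 0.0);                   // 2^-54 rounded away
    assert(fused_mul_add(a, a, c) == std::ldexp(1.0, -54)); // 2^-54 preserved
}
```

Both answers are correct for the operation that was actually executed, so a library that picks its code path from the CPU's feature flags can legitimately return results that differ in the last bit.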
Technical questions about the workings of Prime 95
To further narrow down the cause of the error, and maybe exclude this rounding error (because Prime 95 v27.9 works correctly on Haswell): does someone know the error tolerance of the multiplication done through FFT?
Basically the FFT is used to multiply 2 big numbers A and B like this: result = IFFT ( FFT(A, 0) <point.by.point.multiplication> FFT(B, 0) ) If A and B have n coefficients, the middle coefficient of the product is the sum of n intermediate coefficient products, and the FFT and IFFT are performed on size n * 2. So the error tolerance should account for this middle "n" term, and should also account for the log(n) bits lost in the precision of computing the FFT (maybe twice, for the IFFT as well). Then a fixed tolerance should be added. Does someone know what this fixed tolerance is? If it is big enough, the multiplication will not be affected by a possible precision loss in the calculations. |
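The scheme described above, result = IFFT(FFT(A) * FFT(B)), can be sketched in a few lines. This is a textbook complex radix-2 FFT on base-10 digits, written only to show where a roundoff check comes from; it is not Prime95's transform, which is far more specialized, and the names `fft`, `multiply` and `max_roundoff` are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// In-place iterative radix-2 FFT; a.size() must be a power of two.
// invert=true computes the inverse transform, including the 1/n scaling.
void fft(std::vector<cd>& a, bool invert)
{
    const std::size_t n = a.size();
    assert(n != 0 && (n & (n - 1)) == 0);
    // Bit-reversal permutation.
    for (std::size_t i = 1, j = 0; i < n; ++i) {
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(a[i], a[j]);
    }
    const double pi = std::acos(-1.0);
    for (std::size_t len = 2; len <= n; len <<= 1) {
        const double ang = 2 * pi / static_cast<double>(len) * (invert ? -1 : 1);
        for (std::size_t i = 0; i < n; i += len) {
            for (std::size_t k = 0; k < len / 2; ++k) {
                const cd w = std::polar(1.0, ang * static_cast<double>(k)); // twiddle factor
                const cd u = a[i + k];
                const cd v = a[i + k + len / 2] * w;
                a[i + k] = u + v;
                a[i + k + len / 2] = u - v;
            }
        }
    }
    if (invert)
        for (cd& x : a) x /= static_cast<double>(n);
}

// Multiply two little-endian base-10 digit vectors via FFT.
// max_roundoff receives the largest distance of any raw output
// coefficient from the nearest integer.
std::vector<long long> multiply(const std::vector<int>& a,
                                const std::vector<int>& b,
                                double& max_roundoff)
{
    std::size_t n = 1;
    while (n < a.size() + b.size()) n <<= 1; // room for the full product
    std::vector<cd> fa(a.begin(), a.end()), fb(b.begin(), b.end());
    fa.resize(n);
    fb.resize(n);
    fft(fa, false);
    fft(fb, false);
    for (std::size_t i = 0; i < n; ++i) fa[i] *= fb[i]; // pointwise product
    fft(fa, true);

    std::vector<long long> result(n);
    max_roundoff = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double rounded = std::round(fa[i].real());
        max_roundoff = std::max(max_roundoff, std::fabs(fa[i].real() - rounded));
        result[i] = static_cast<long long>(rounded);
    }
    long long carry = 0; // propagate carries to get proper digits
    for (long long& d : result) {
        d += carry;
        carry = d / 10;
        d %= 10;
    }
    return result;
}
```

For example, multiplying the digit vectors `{4,3,2,1}` and `{8,7,6,5}` (1234 and 5678) yields the digits of 7006652, least significant first. As long as `max_roundoff` stays well below 0.5, every coefficient rounds to the right integer, which is why a change in the last bit of a twiddle factor does not by itself change the result.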
Prime95 does not use any trigonometric functions (well it does, but via a library that produces 128-bit floating point values).
Even if it did, a change in the least significant bit would not affect prime95's results. |
Questions?
Are all the trig functions precomputed and stored in memory or do some of them get to be computed during runtime ? At least for the 768KB FFT.... And 768KB FFT is more exactly a 49152 Complex point FFT, right ?
|
[QUOTE=megabit8;420792]Are all the trig functions precomputed and stored in memory or do some of them get to be computed during runtime ? At least for the 768KB FFT.... And 768KB FFT is more exactly a 49152 Complex point FFT, right ?[/QUOTE]
All are precomputed. A 768K is a 786432 point real FFT. |
It starts to make sense...
Now I see,
And a point real occupies 8 bytes or 16 bytes ? In other words is the in-place memory 6MB or 12 MB ? Is more memory used intensely during the In-Place test ? Like a temp buffer of equal size or lower to copy back and forth the transformation ? A good test to exclude this error would be to export all the precomputed 128bit coefficients into a binary file from a Skylake processor and from say an Ivy Bridge processor or another without FMA. I can do this test if someone points me to the point where the precomputed buffer is filled. If the exports match then it is really a complex Intel Architecture problem. |
[QUOTE=megabit8;420778]Thank you for your post. It is the same result as on Skylake. On Ivy the result is different.[/QUOTE]
Too ignorant to even understand the empirical... To those watching this thread, we're (mostly) smarter than bricks here. Particularly those who have a few posts (and a bit of software) under their belts.... |
[QUOTE=megabit8;420794]Now I see,
And a point real occupies 8 bytes or 16 bytes ? In other words is the in-place memory 6MB or 12 MB ? Is more memory used intensely during the In-Place test ? Like a temp buffer of equal size or lower to copy back and forth the transformation ? A good test to exclude this error would be to export all the precomputed 128bit coefficients into a binary file from a Skylake processor and from say an Ivy Bridge processor or another without FMA. I can do this test if someone points me to the point where the precomputed buffer is filled. If the exports match then it is really a complex Intel Architecture problem.[/QUOTE] It occupies 6MB and operates in place with an additional 1.2MB of precomputed constants. All signs point to a complex Intel Architecture problem. If it wasn't, then the torture test would fail in the same place every time with the same error message. |
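The 6MB figure follows directly from the transform size, assuming each real point is stored as one 8-byte double. A minimal sketch of that arithmetic, with hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

constexpr std::size_t kMiB = 1024 * 1024;
constexpr std::size_t kPoints768K = 768 * 1024; // "768K" real points

// Bytes needed to hold `points` real values stored as doubles.
std::size_t real_fft_bytes(std::size_t points)
{
    assert(points <= std::numeric_limits<std::size_t>::max() / sizeof(double));
    return points * sizeof(double);
}

static_assert(sizeof(double) == 8, "this arithmetic assumes 8-byte doubles");
static_assert(kPoints768K == 786432, "768K means 768 * 1024 points");
static_assert(kPoints768K * sizeof(double) == 6 * kMiB,
              "786432 doubles occupy exactly 6 MiB");
```

`real_fft_bytes(kPoints768K)` returns 6291456, i.e. 6 MiB for the data itself; the precomputed constants mentioned above come on top of that.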
Thank you for your compliment chalsall! :tu:
The empirical is logical... the cos thing evaluates differently based on FMA support, which Haswell has. I was not trying to be ignorant, I was trying to help rationally in getting this issue sorted out. Anyways ...

Those questions in my last post seem unimportant but are very important for a tech person. Even the coefficient comparison is important: I do not trust that something that operates on different data produces the same results, because of propagation. And I know a bit about how software is developed.

Theoretically with 2 doubles you have 53x2 = 106 bits of precision, but you lose 3 * log2(768*1024) = 58.8 bits due to the transformations and the middle coefficient, plus 2 bits for tolerance, so 61 bits are lost. You are left with 45 bits of precision for each real number. And this has to hold a square of something, so the numbers assigned to this double real vector should be less than 2^22 * sqrt(2).

And another thing: 1 bit is lost for this happy simple case cos(PI*33/128), but there can be other values for which the rounding error is bigger. This I have not tested yet. But these are all good reasons to compare the coefficients.

In the end I hope that Skylake is fine and that I and thousands of other people did not buy a processor that produces junk from time to time and can make the system freeze and applications crash. This is true for any processor/RAM/electronic device until it is proven to work correctly.

I am trying to take a step-by-step approach to sorting this issue out; otherwise, with an error thrown each hour, there could be other factors which influence a calculation. We have the code and the error each hour. What's left is to make it happen in 1 second or less. I ask you: how do you do it? Because otherwise it is too hard to test. Imagine that Intel uses a program to record all the calculations performed in an hour for comparison: that's hundreds of TB and very slow even with adequate hardware ... |
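The back-of-envelope estimate in this post can be reproduced numerically. The sketch below simply encodes the poster's own assumptions (106 available bits, a loss of 3 * log2(n) bits, 2 tolerance bits); it is not Prime95's documented error model, and the struct and function names are made up for illustration.

```cpp
#include <cassert>
#include <cmath>

// Result of the rough precision estimate, all values in bits.
struct PrecisionBudget {
    double bits_available;  // mantissa bits assumed to be available
    double bits_lost;       // estimated loss, rounded up to whole bits
    double bits_left;       // what remains for the product coefficients
    double max_input_bits;  // half of bits_left, since inputs get squared
};

// Assumption from the post: loss = 3 * log2(fft_length) + tolerance_bits.
PrecisionBudget estimate_budget(double fft_length, double mantissa_bits,
                                double tolerance_bits)
{
    assert(fft_length >= 1.0);
    PrecisionBudget b;
    b.bits_available = mantissa_bits;
    b.bits_lost = std::ceil(3.0 * std::log2(fft_length) + tolerance_bits);
    b.bits_left = mantissa_bits - b.bits_lost;
    b.max_input_bits = b.bits_left / 2.0;
    return b;
}
```

With `estimate_budget(768.0 * 1024.0, 106.0, 2.0)` this gives 61 bits lost, 45 bits left, and 22.5 bits per input value, which matches the 2^22 * sqrt(2) bound stated above.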
I have built my own Skylake system in the days between Christmas and New Year's Day. The build is made of the following parts:
Motherboard: Asus Z170 Deluxe
Processor: Intel 6700K (Skylake)
RAM: Corsair Vengeance DDR4 4*4 GB 3200 MHz

After installing Ubuntu 15.10 I installed mprime version 28.7. The test worktodo.txt file is:
[CODE]
[Worker #1]
Test=N/A,14942209,67,1
[Worker #2]
Test=N/A,14942267,67,1
[Worker #3]
Test=N/A,14942293,67,1
[Worker #4]
Test=N/A,14942437,67,1
[/CODE]
I first let the program run this with the following two lines added to local.txt:
[CODE]
CpuSupportsFMA3=0
CpuNumHyperthreads=1
[/CODE]
This switches hyperthreading off and forces mprime to use the older AVX implementation of the 768K FFT. As expected, the processor finished this in about 12 hours:
[CODE]
M14942209 is not prime. Res64: 8587C9937E3BED22. We8: CA7381D0,2354169,00000000
M14942267 is not prime. Res64: C35562BC4F3511F3. We8: D9111948,9356811,00000000
M14942293 is not prime. Res64: 035EFC95F88CFC27. We8: 361EF8AE,3597260,00000000
M14942437 is not prime. Res64: 683A0DFFC5827CD8. We8: E69D5DB7,323098,00000000
[/CODE]
I then deleted both lines from local.txt, allowing two threads to run on each assignment and also allowing the new instruction set to be used. As also expected, the results were the same as in the first run, but obtained slightly faster.

I then added the line CpuSupportsFMA3=0 to local.txt again to force mprime to use the older FFT implementation while allowing hyperthreading on all 8 logical CPUs. So far it has run for three hours, doing nearly 25% of the work, without anything noticeable happening. I am writing this message on that machine. Any thoughts? |
[QUOTE=tha;420846]I have build my own Skylake system in the days between Christmas and New Years day. The build is made of the following parts:[/QUOTE]
Sweet! Lucky you! :smile: [QUOTE=tha;420846]Any thoughts?[/QUOTE] Try running one exponent which uses the 768K FFT across all the cores (both physical and virtual). Initially don't set affinity. Also, post the information requested of you in post #171 of this thread. Note also that it has been shown that _some_ (possibly most) Skylake systems work fine. This whole exercise is to try to figure out if there is a correlation of many variables as to why. |
[QUOTE=tha;420846]
Any thoughts?[/QUOTE] Have you tried the known failure case? Namely, run version 27.9 torture test on 768K FFT for 8 threads. Data indicates 25% of Skylakes do not exhibit the problem, you may have gotten lucky. |
Version 28.7 will take a lot longer till a worker stops. Even if you use 27.9 it can take hours. If you think that the system is stable @27.9 try to restart the computer a few times. The risk of failure will increase (although I don't really understand why).
[QUOTE] [Worker #1] Test=N/A,14942209,67,1 [Worker #2] Test=N/A,14942267,67,1 [Worker #3] Test=N/A,14942293,67,1 [Worker #4] Test=N/A,14942437,67,1[/QUOTE] I was not able to reproduce the error in a reasonable amount of time by using a worktodo.txt file similar to this one. [QUOTE](possibly most) Skylake systems work fine[/QUOTE] Ralle has tested a lot of CPUs and all have the same problem. Even with the new SGX (Software Guard Extensions) Version of the chip worker will stop. |
[QUOTE=Aurum;420861]I was not able to reproduce the error in a reasonable amount of time by using a worktodo.txt file similar to this one.[/QUOTE]
Could you, then, please provide a worktodo.txt file which /did/ exhibit the error? Specific prime.txt and local.txt files would be useful as well. I know this has been posted above, but it's been rather interleaved. Perhaps a definite test domain would be useful... [QUOTE=Aurum;420861]Ralle has tested a lot of CPUs and all have the same problem. Even with the new SGX (Software Guard Extensions) Version of the chip worker will stop.[/QUOTE] One thing I found interesting is that an Intel representative said they were able to reproduce the bug by /downgrading/ the CPU's microcode. This might (or might not) be the key variable with regards to this issue. |
As I recall, there was no worktodo.txt that could recreate the issue, only the 768K stress test.
|
[QUOTE=Prime95;420856]Have you tried the known failure case? Namely, run version 27.9 torture test on 768K FFT for 8 threads.[/QUOTE]
I will try to complete this test first which will be just under 6 hours to go. I just downloaded 27.9 from the mersenne.ca site and will run that test tomorrow morning. |
[QUOTE=tha;420867]I will try to complete this test first which will be just under 6 hours to go. I just downloaded 27.9 from the mersenne.ca site and will run that test tomorrow morning.[/QUOTE]
"The most exciting phrase to hear in science, the one that heralds new discoveries, is not 'Eureka!' but 'That's funny...'" -- Isaac Asimov |
[QUOTE=Dubslow;420865]As I recall, there was no worktodo.txt that could recreate the issue, only the 768K stress test.[/QUOTE]
That's correct. [QUOTE]One thing I found interesting is that an Intel representative said they were able to reproduce the bug by /downgrading/ the CPU's microcode.[/QUOTE] I read an article a few days ago about the CPU architecture and the microcode. The author basically said that the microcode includes a lot of workarounds for hardware errata. It would take too much time to fix the CPU design itself, so the workarounds will stay in the microcode forever. |
[QUOTE=Aurum;420869]I read an article a few days ago about the CPU architecture and the microcode. The author basically said that the microcode includes a lot of workarounds for hardware errata. It would take too much time to fix the CPU design itself, so the workarounds will stay in the microcode forever.[/QUOTE]
Care to reference that article? It would help to build "the case". |
[QUOTE=Dubslow;420865]As I recall, there was no worktodo.txt that could recreate the issue, only the 768K stress test.[/QUOTE]
In theory it should be recreatable (not a word, I know) with a worktodo that does a 768K FFT test, but the local.txt would also need settings to ensure it's running a solo worker on all physical and HT cores, just like the torture test would. The torture test is using a random exponent whereas the worktodo would be using a specific one (and even better, it could use one with a known final residue to ensure nothing else happened along the way even if no roundoff errors were caught). |
[QUOTE=chalsall;420871]Care to reference that article?
It would help to build "the case".[/QUOTE] I can't find it anymore. |
[QUOTE=Aurum;420875]I can't find it anymore.[/QUOTE]
Your dog ate your homework? |
It's hard to remember all websites/sources.
|
[QUOTE=Aurum;420878]It's hard to remember all websites/sources.[/QUOTE]
Have you heard about Google? It's a little start-up which might help you find things you think you've read.... |
I even searched my bookmarks + history ... I don't even remember if it was a German or English website. This is by far not the only thing I'm working on.
|
[QUOTE=Aurum;420881]I even searched my bookmarks + history ... I don't even remember if it was a german or english website. This is by far not the only thing I'm working on.[/QUOTE]
OK... Understood. But please understand that making a claim, and then not being able to support it, doesn't go down well around here. |
The article also referred to the Sandy Bridge SATA bug and said that a minor design change takes 8 weeks. That's why the workarounds in the microcode are not fixed in the hardware itself. Maybe someone else knows the source I'm talking about.
|
[QUOTE=Aurum;420878]It's hard to remember all websites/sources.[/QUOTE]
<never mind posted without thought again.> is it any of these ? [url]https://www.google.ca/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=%2B%22hardware+errata%22+%2B+%22sata+bug%22[/url] |
[QUOTE=Aurum;420886]The article also referred to the Sandy Bridge SATA bug and said that a minor design change takes 8 weeks. That's why the workarounds in the microcode are not fixed in the hardware itself. Maybe someone else knows the source I'm talking about.[/QUOTE]
Anyone? Anyone at all... Please forgive me for this, but we often have people entering our space who try to distract rather than converge. It is important to have one's "signal to noise ratio" filter set to stun.... |
I found it in my history: [url]http://www.computerbase.de/2015-12/amd-zum-32c3-einblicke-in-die-komplexitaet-eines-x86-prozessordesigns/[/url]
|
[QUOTE=Aurum;420898]I found it in my history: [URL]http://www.computerbase.de/2015-12/amd-zum-32c3-einblicke-in-die-komplexitaet-eines-x86-prozessordesigns/[/URL][/QUOTE]
:tu: Warning! Google Translate link! [URL]https://translate.google.com/translate?sl=de&tl=en&js=y&prev=_t&hl=en&ie=UTF-8&u=http%3A%2F%2Fwww.computerbase.de%2F2015-12%2Famd-zum-32c3-einblicke-in-die-komplexitaet-eines-x86-prozessordesigns%2F&edit-text=[/URL] |
[QUOTE=Aurum;420898]I found it in my history: [url]http://www.computerbase.de/2015-12/amd-zum-32c3-einblicke-in-die-komplexitaet-eines-x86-prozessordesigns/[/url][/QUOTE]
OK... But that article is talking about AMD CPUs. We are talking about Intel CPUs here. Please do try to keep up.... (Just to be clear, this is intentionally confrontational. Deal with it.) |
I managed to make the problem reproduce more quickly
I had an idea to use the Prime 95 v27.9 code to perform a 768K FFT per thread. The input of the FFT is always the same fixed pseudo-random data, so the hash of the transformed data should always be the same. And it is, most of the time, on Skylake. The exception is when the project is run from Visual Studio 2015 in Debug mode, the "Pause" button is hit, some lines of code are stepped through, and execution is then continued: errors appear immediately. Remarkably, on Ivy the Pause and step operations work fine and no error appears.
I have isolated this code below and exported it from a C++ DLL. I need some expert advice from the developers on whether the following code performs an FFT of size 768K, exactly as in the v27.9 torture test:
[INDENT][B]#define norm_routines 10
#define gw_fft(h,a) (*(h)->GWPROCPTRS[0])(a)

_declspec(dllexport) void* __cdecl AllocPrime95Handle()
{
    gwhandle *gwdata = new gwhandle();
    unsigned long fftlen = 768 * 1024;
    unsigned long p = 14942209; //does not matter, used just to initialize. (see the Prime95FFT function).
    gwinit(gwdata);
    gwset_specific_fftlen(gwdata, fftlen);
    gwsetup(gwdata, 1.0, 2, p, -1);
    return gwdata;
}

_declspec(dllexport) void __cdecl Prime95FFT(void *handle, __int64 *fastHashOutput) //fastHashOutput has 32 bytes
{
    gwhandle *gwdata = (gwhandle*)handle;
    int seed = 7; //Use the same calculations.
    gwnum s = gwalloc(gwdata);
    struct gwasm_data *asm_data = (struct gwasm_data *) gwdata->asm_data;
    asm_data->NORMRTN = gwdata->GWPROCPTRS[norm_routines + gwdata->NORMNUM];
    asm_data->DESTARG = s;
    asm_data->DIST_TO_FFTSRCARG = 0;
    asm_data->DIST_TO_MULSRCARG = 0;
    asm_data->ffttype = 2; //type 2 = square.
    double *startAddress = addr(gwdata, s, 0);
    unsigned long dataSize = gwnum_datasize(gwdata);
    int n = dataSize / sizeof(double); //n = 808952
    __int64 v = (__int64)seed;
    for (int i = n; --i >= 0; )
    {
        v = v * 0x2345987094395 + 1;
        startAddress[i] = (double)v; //this is the fixed-random data that is written all the time.
    }
    gw_fft(gwdata, asm_data);
    //sha1::calc(startAddress, dataSize, (unsigned char*)fastHashOutput);
    __int64 *startAddressInt64 = (__int64*)startAddress;
    __int64 hashResults[4] = { 10000, 20000, 30000, 40000 };
    int shifts[4] = { 1, 2, 4, 8 }; //for multipliers: 3, 5, 17 and 257.
    for (int i = n; --i >= 0; )
    {
        hashResults[i & 3] += (hashResults[i & 3] << shifts[i & 3]) + startAddressInt64[i];
    }
    memcpy(fastHashOutput, hashResults, sizeof(hashResults));
    gwfree(gwdata, s);
}[/B][/INDENT]
I also need to know if I use the functions: [B]addr [/B]and[B] gwnum_datasize [/B]correctly so that this code does not touch memory zones outside the transform. I have linked gwnum64.lib (non-debug library) to this project and included all the *.h files from the prime95 v27.9 "gwnum" folder. Thank you! |
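The fill-and-hash part of that test can be tried on its own, without gwnum. The following is a minimal standalone sketch of the same idea, not the posted DLL code: it uses unsigned 64-bit arithmetic so that wraparound is well defined, keeps only the top 53 bits of each pseudo-random value so the conversion to double is exact, and the names `make_fixed_data`, `LaneHash`, `fast_hash` and `same_hash` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

static_assert(sizeof(double) == sizeof(std::uint64_t), "expects 64-bit double");

// Fill a buffer with reproducible pseudo-random doubles from a simple LCG.
std::vector<double> make_fixed_data(std::size_t n, std::uint64_t seed)
{
    std::vector<double> data(n);
    std::uint64_t v = seed;
    for (std::size_t i = n; i-- > 0; ) {
        v = v * 0x2345987094395ULL + 1;        // wraps modulo 2^64
        data[i] = static_cast<double>(v >> 11); // 53-bit value, exact in a double
    }
    return data;
}

// Four independent 64-bit lanes, as in the posted hash.
struct LaneHash {
    std::uint64_t lane[4];
};

// Each lane computes h = h * (1 + 2^shift) + bits, i.e. the multipliers
// are 3, 5, 17 and 257. They are odd, so a change in any single input
// word always changes the final lane value.
LaneHash fast_hash(const std::vector<double>& data)
{
    assert(!data.empty());
    LaneHash h = {{10000, 20000, 30000, 40000}};
    const unsigned shifts[4] = {1, 2, 4, 8};
    for (std::size_t i = data.size(); i-- > 0; ) {
        std::uint64_t bits;
        std::memcpy(&bits, &data[i], sizeof bits); // raw bit pattern of the double
        std::uint64_t& lane = h.lane[i & 3];
        lane += (lane << shifts[i & 3]) + bits;
    }
    return h;
}

// True when two hashes are bit-for-bit identical.
bool same_hash(const LaneHash& a, const LaneHash& b)
{
    return std::memcmp(a.lane, b.lane, sizeof a.lane) == 0;
}
```

Hashing the same generated buffer twice must give identical results on any correct machine, while flipping a single bit in one element gives a different hash, which is what makes this kind of check sensitive to every bit of the transform output.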
2 Attachment(s)
I have attached 2 pictures of this trivial application showing the iterations per thread before "Pause" is set in VS2015, and the "after1.png" file which shows that 2 errors appeared instantly as soon as Debug Pause is pressed then continue.
On my Skylake the error appears instantly. I am going to leave this application running overnight to see if any errors appear on their own. |
[QUOTE=megabit8;420915]I am going to leave this application running overnight to see if any errors appear on their own.[/QUOTE]
Dumber than bricks.... |
Feeling Lucky ??
1 Attachment(s)
Mr. Chris, are you characterizing yourself ? :smile:
Do you know what's the difference between genius and "brick stupid" people ? For those willing to make this test in Visual Studio 2015, I have attached the small project. You need to download the 27.9 source from: [url]http://www.mersenne.org/ftp_root/gimps/p95v279.source.zip[/url] and copy p95v279.source.zip\gwnum\gwnum64.lib into the project "AVXCore" folder. You can skip the project upgrade. That's it. Then compile. The test itself should be almost as good as the torture test, even better because it is sensitive on every processor to every bit changed. |
[QUOTE=megabit8;420917]Mr. Chris, are you characterizing yourself ? :smile:[/QUOTE]
Possibly... But then, maybe not.... |
[QUOTE=chalsall;420919]Possibly...
But then, maybe not....[/QUOTE] Maybe so or maybe no. Ingenious people have limits and bounds whereas stupidity is unlimited... (Einstein) What's the only thing that's infinite ? |
[QUOTE=megabit8;420920]What's the only thing that's infinite ?[/QUOTE]
Thought? Let's be honest here... We might have found a bug on Intel's latest and greatest chip. Let's figure this out. It's not really that big of a deal. |
It will be interesting to see if the Pause (aka Break All) issue happens on Haswell
[B]Now someone with a Haswell processor could validate whether the "Pause" issue happens on the 4th gen Intel processor.[/B] I was thinking of "kracker", since he was the first to validate the cosine rounding problem.
|
[QUOTE=chalsall;420921]Thought?
Let's be honest here... We might have found a bug on Intel's latest and greatest chip. Let's figure this out. It's not really that big of a deal.[/QUOTE] The only thing that's infinite is human stupidity. (Einstein) |
At the very least, you should format your forum posts properly. Avoid unnecessary bold (and I would classify all such usage so far thusly) and use the [code] tags which were designed for code, rather than this bold nonsense.
And then: George? |
[QUOTE=megabit8;420912]I had an idea to use the Prime 95 v27.9 code to perform a 768K FFT per thread.
also need to know if I use the functions: [B]addr [/B]and[B] gwnum_datasize [/B]correctly so that this code does not touch memory zones outside the transform. I have linked gwnum64.lib (non-debug library) to this project and included all the *.h files from the prime95 v27.9 "gwnum" folder. [/QUOTE]
Kudos to someone willing to delve deep into the source code and try to potentially learn something new about the problem at hand.

I did not see you using the addr function. This is fortunate, since you are reading the raw doubles straight out of gwnum memory (looping from 0 to gwnum_datasize/8-1). The addr function is used when looping from 0 to fftlen-1. The gwnum array contains a number of unused cache lines to avoid power-of-2 strides, which is why gwnum_datasize is not 768K * 8 bytes.

To make sure you are using AVX FFTs you need to call guessCpuType and then turn off the CPU_FMA3 flag in CPU_FLAGS.

I don't know what to make of your program working standalone, but not after pausing in the debugger. It could well be a completely different hardware problem or a software problem with MSVC. |
[QUOTE=Prime95;420934]Kudos to someone willing to delve deep into the source code and try to potentially learn something new about the problem at hand.[/QUOTE]
I wanted to chime in and say ditto. It would be nice to have an isolated bit of code that could simply be run and get the error (or not). It would make it easier for folks to troubleshoot and walk through what's going on. I know that Intel would want to take possible problems seriously and would look into it, but getting easily replicated code would certainly let them focus on the issue itself. |
[QUOTE=chalsall;420871]Care to reference that article?
It would help to build "the case".[/QUOTE]
The Intel x86 processors use microcode to correct design flaws; this is, I believe, common knowledge. Intel, I believe, publishes it online, including details, somewhere. I recall browsing through a list of such workarounds. It also makes good sense to do it that way. Of course the Intel processors of today carry an enormous amount of legacy with them. It comes down to the cost of recompiling all existing software in the world and redistributing it to the customers of all software publishers, versus the cost of carrying the legacy code in the processor. There are also some commercial reasons that came to light when Intel tried to introduce a new architecture, Itanium, and failed to line up customers behind a deal that did not find enough market support. That gave AMD a lot of leeway to attract new customers. |
I have been running 27.9 now for five hours on the same worktodo test case. So far, no error showed up.
Apart from freezing errors showing up more frequently after restarting the machine, did errors show up more frequently when the machine was used for other purposes, like browsing the internet, or did they show up just as frequently when the machine was left alone to churn on prime95/mprime?

George, can you write a version of 27.9 that outputs some potentially interesting data, like the memory addresses used in each iteration, that we could run? Of course such a modification will be very time consuming for the processor. Even worse, it may actually prevent the error from showing up, but if I can make my machine exhibit the freezing I would be willing to run such a version. |
I recommend that you don't use your worktodo test case if you want to reproduce the error.
Does "freezing" mean a stopped worker for you? There is another bug which will actually freeze the whole system: [url]http://www.tomshardware.co.uk/forum/id-2830772/skylake-build-randomly-freezing-crashing.html[/url]
[QUOTE]did errors show up more frequently when the machine was used for other purposes, like browsing the internet or so, or did the errors show up as frequently when the machine was left alone to only churn on prime95/mprime?[/QUOTE]
It doesn't make any difference. However, one of the top German overclockers, der8auer ([url]http://hwbot.org/user/der8auer/[/url]), said that prime is not the fastest way to produce an error @1344k. I think he said that he runs prime in parallel with another stress testing program. |
Feedback from developers required
[QUOTE=Prime95;420934]
I did not see you using the addr function. This is fortunate, since you are reading the raw doubles straight out of gwnum memory (looping from 0 to gwnum_datasize/8-1). The addr function is used when looping from 0 to fftlen-1. The gwnum array contains a number of unused cache lines to avoid power-of-2 strides, which is why gwnum_datasize is not 768K * 8 bytes. [/QUOTE]
I use it to get the start address of the full buffer, including the cache strides, like this:
[CODE]
gwnum s = gwalloc(gwdata);
double *startAddress = addr(gwdata, s, 0);
unsigned long dataSize = gwnum_datasize(gwdata);
[/CODE]
I need that double *startAddress to point to the start of the full buffer, including the unused zones, so that exactly "gwnum_datasize(gwdata)" bytes are allocated behind it. So I need confirmation that this code is right for the job.
[QUOTE=Prime95;420934] To make sure you are using AVX FFTs you need to call guessCpuType and then turn off the CPU_FMA3 flag in CPU_FLAGS. [/QUOTE]
Is this really necessary, since I am linking with version 27.9 of the gwnum library and it seems to produce code without FMA3? The "gwnum.c" file does not have any reference to FMA, only to AVX.
[QUOTE=Prime95;420934] I don't know what to make of your program working standalone, but not after pausing in the debugger. Could well be a completely hardware different problem or a software problem with MSVC. [/QUOTE]
The small program is a simple in-place torture test using a 768K FFT, trying to replicate the most time-consuming routine from Prime95 27.9, the FFT. But it always operates on the same input data over and over, because the seed is kept constant at 7. I left it running for 10 hours and no errors appeared by themselves on Skylake. In my experience using VS 2015 for work, pausing and continuing in the debugger has been pretty solid.

First I am trying to rule out that the *startAddress and dataSize variables touch any memory beyond their allocated space, and to confirm that they cover the entire FFT working memory. Otherwise they would interfere with memory possibly used by VS. So I need some help with this. Thank you! |
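One generic way to check for stray writes past a buffer is to surround it with guard bytes and verify them later. The sketch below is an assumption-laden illustration, not part of gwnum or the posted project: gwnum allocates its own memory through gwalloc, so this can only wrap buffers that the test program itself allocates (for example a reference copy of the FFT data). The names `guarded_alloc`, `guards_intact`, `guarded_free`, `GUARD_BYTES` and `GUARD_FILL` are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_BYTES 64   /* size of each guard zone (arbitrary choice) */
#define GUARD_FILL 0xA5  /* fill pattern written into the guard zones */

/* Allocate `size` usable bytes with a filled guard zone on each side.
 * Returns a pointer to the usable region, or NULL on failure. */
unsigned char *guarded_alloc(size_t size)
{
    unsigned char *raw = malloc(size + 2 * GUARD_BYTES);
    if (raw == NULL)
        return NULL;
    memset(raw, GUARD_FILL, GUARD_BYTES);
    memset(raw + GUARD_BYTES + size, GUARD_FILL, GUARD_BYTES);
    return raw + GUARD_BYTES;
}

/* Return 1 if both guard zones still hold the fill pattern, 0 otherwise. */
int guards_intact(const unsigned char *user, size_t size)
{
    const unsigned char *front = user - GUARD_BYTES;
    const unsigned char *back = user + size;
    for (size_t i = 0; i < GUARD_BYTES; i++) {
        if (front[i] != GUARD_FILL || back[i] != GUARD_FILL)
            return 0;
    }
    return 1;
}

/* Release a buffer obtained from guarded_alloc. */
void guarded_free(unsigned char *user)
{
    if (user != NULL)
        free(user - GUARD_BYTES);
}
```

Calling `guards_intact` before and after the debugger pause would show whether anything wrote just outside the buffer; a failing check points at an overrun rather than at corruption inside the data itself.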
If "The addr function is used when looping from 0 to fftlen-1" then
[CODE]
unsigned long dataSize = (unsigned long)addr(gwdata, s, asm_data->FFTLEN - 1) + sizeof(double) - (unsigned long)startAddress;
[/CODE]
gives the same result as
[CODE]
unsigned long dataSize = gwnum_datasize(gwdata);
[/CODE]
(I have verified this), which means that the iteration covers allocated data only and includes the first and the last element.

I have also called this function once at the beginning:
[CODE]
_declspec(dllexport) void __cdecl InitializeCPU()
{
    guessCpuType();
    guessCpuSpeed();
    CPU_FLAGS &= ~CPU_FMA;
}
[/CODE]
But the result is the same. It seems that the errors appear immediately in both Debug and Release builds, on Pause and Continue, but only if the option "Enable native code debugging" is on. I am wondering what code VS 2015 executes there so that the FFT memory space gets altered. |
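A side note on the span calculation above: under 64-bit MSVC, `unsigned long` is only 32 bits wide, so casting a pointer to it drops the upper half of the address. The subtraction still comes out right for spans under 4 GiB, but plain pointer arithmetic avoids the truncation altogether. A minimal sketch, where `span_bytes` is a made-up helper name and not a gwnum function:

```c
#include <assert.h>
#include <stddef.h>

/* Number of bytes covered from the first element through the last
 * element, inclusive. Both pointers must point into the same array. */
size_t span_bytes(const double *first, const double *last)
{
    return (size_t)((const char *)last - (const char *)first) + sizeof(double);
}
```

With the variables from the post, the equivalent check would presumably be `span_bytes(startAddress, addr(gwdata, s, asm_data->FFTLEN - 1)) == gwnum_datasize(gwdata)`.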
[QUOTE=tha;420953]I have been running 27.9 now for five hours on the same worktodo test case. So far, no error showed up.[/QUOTE]
I assume you have hyperthreading on? In the stress test case you need 8 workers running on all 8 virtual threads. I noticed your worktodo had only 4 workers; you should try with 8 workers, and "WorkerThreads=8" + "ThreadsPerTest=1" in local.txt.

But if you really want to know whether your system has the error or not, you should first replicate the exact circumstances; see for example [URL="http://www.mersenneforum.org/showpost.php?p=419502&postcount=184"]post #184[/URL] for the summary. |
I've got some bad news. The same errors happen with another, simpler and slower but AVX-optimized, FFT of 768K real points, again with Debug Break and Continue, and only on Skylake. It seems to be related to VS 2015 and Skylake, since _mm256_load_pd loads only half of the register when stepping over it. Can you please tell me which compiler you use for producing the Prime95 exe? Is it VS 2005, since the .SLN file is from 2005?
Thank you! |
[QUOTE=megabit8;420991]I've got some bad news. The same errors happen on another more dummy, slower but AVX optimized FFT of 768K real points. With the Debug Break and continue only on Skylake. It seems to be VS 2015 and Skylake related since _mm256_load_pd loads only half of the register when stepping over. Can you please tell me which compiler do you use for producing the Prime95 exe. Is it VS 2005 since the .SLN file is 2005 ?[/QUOTE]
Yes, I build with MSVC 2005. |
[QUOTE=ATH;420989]I assume you have hyperthreading on? [/QUOTE]
Yes, I have hyperthreading on and both versions of mprime reported that the cores 1 & 5, 2 & 6, 3 & 7, and 4 & 8 were working on the respective exponents. I will finish the current test, which will be another two hours. I will then start a new test with 8 threads working concurrently on the following worktodo.txt test case:
[CODE]
[Worker #1]
Test=N/A,14942209,67,1
[Worker #2]
Test=N/A,14942267,67,1
[Worker #3]
Test=N/A,14942293,67,1
[Worker #4]
Test=N/A,14942437,67,1
[Worker #5]
Test=N/A,14942497,67,1
[Worker #6]
Test=N/A,14942539,67,1
[Worker #7]
Test=N/A,14942563,67,1
[Worker #8]
Test=N/A,14942567,67,1
[/CODE]
If it fails I will start it again and see if it takes an equal amount of time to failure. The current test and other tests done so far will prove the stability of the system so far. I have a GTX 580 and a 590 for this system, but I am not turning them on yet as I don't want to interfere with these tests. |
@ George:
If I remember correctly each time a LL test is started a random offset is chosen to eliminate potential errors in the design of the processor. Can we somehow force mprime (or prime95) to use no offset or at least a reproducible offset? I would like to try to create a case where we can tell beforehand exactly at which point the processor will be thrown off course. Also I assume you have access to a Skylake system yourself? |
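On the offset question above: here is a toy sketch, not mprime's code, of a Lucas-Lehmer test that carries a bit shift through the iterations. Everything in it is an assumption for illustration: the name `ll_residue`, the way the shift is applied, and the restriction to exponents from 3 to 31 so that all intermediate values fit in 64-bit integers. It is meant to show why the final residue does not depend on the starting shift, so forcing a fixed shift would change the intermediate data patterns but not the result.

```c
#include <assert.h>
#include <stdint.h>

/* Lucas-Lehmer residue of 2^p - 1, computed on values that are kept
 * multiplied by 2^s (a rotation by s bits modulo 2^p - 1).
 * Valid for 3 <= p <= 31. Returns 0 exactly when 2^p - 1 is prime. */
uint64_t ll_residue(unsigned p, unsigned shift)
{
    const uint64_t m = ((uint64_t)1 << p) - 1;
    unsigned s = shift % p;
    uint64_t x = (4 * ((uint64_t)1 << s)) % m;      /* shifted start value 4 */

    for (unsigned k = 0; k + 2 < p; k++) {          /* p - 2 iterations */
        s = (2 * s) % p;                            /* squaring doubles the shift */
        uint64_t two = (2 * ((uint64_t)1 << s)) % m; /* the "-2" must be shifted too */
        x = (x * x) % m;
        x = (x + m - two) % m;
    }

    /* Undo the shift: multiplying by 2^(p - s) completes a full rotation. */
    return (x * ((uint64_t)1 << ((p - s) % p))) % m;
}
```

For example, `ll_residue(11, 0)` and `ll_residue(11, 7)` return the same non-zero value (2^11 - 1 is composite), while `ll_residue(13, s)` is 0 for every shift.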
[QUOTE=tha;420995]Yes, I have hyperthreading on and both versions of mprime reported that the cores 1 & 5, 2 & 6, 3 & 7, and 4 & 8 were working on the respective exponents[/QUOTE]
But have you tried the actual torture test yet? So far it is the only reported source of the bug, so it is the way to test whether your system even has the error. It seems premature to test new ways of generating it before you know whether the only known way works. |
[QUOTE=Prime95;420992]Yes, I build with MSVC 2005.[/QUOTE]
And what compiler do you use to produce the Linux executables for mprime? It has been reported that this bug manifests under Linux as well as Windows (on some machines). |
[QUOTE=chalsall;421010]And what compiler do you use to produce the Linux executables for mprime?[/QUOTE]
GCC but all the critical FFT code is assembled in Windows using Masm. |
[QUOTE=Prime95;421024]GCC but all the critical FFT code is assembled in Windows using Masm.[/QUOTE]
Interesting... So while we thought we had eliminated a variable (it's OS independent) this might actually come down to focusing on Microsoft's assembler's interaction with the Skylake architecture. Or, maybe not... Solving intermittent problems is fun! Not easy, mind you, but rewarding.... |
[QUOTE=tha;420995]I will finish the current test, which will be another two hours. I will then start a new test with 8 threads working concurrently on the following worktodo.txt test case:
[/QUOTE] You might be better off just stopping what you're running now and doing the torture test at 768K since that's known to cause the problem on affected CPUs. Then if you get the roundoff errors you'll know your CPU is "in the club" and you can try to reproduce using a "real" exponent. My guess is that to replicate what the torture test is doing, you'd want to have all 8 (physical and HT) cores in a single worker. Not 8 separate workers doing 8 separate tests. It could be something specific to the threading code and combining the separate chunks from each large multiplication. |
Trying to interpret the results....
If I am making a FFT of type 2 - similar to gwsquare, NORMNUM = 1, NRMRTN = yi3eCORE (whatever this is, there's no source for it) for the data:
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, .... 0, 0 [10 ones then 768*1024 - 10 zeroes]

I get the following output:
1 2 4 6 8 10 12 14 16 18 18 16 14 12 10 8 6 4 2 0 0 ..... all zeroes till the end.

I was expecting:
1 2 3 4 5 6 7 8 9 10 9 8 7 6 5 4 3 2 1 0 .... zeroes till the end.
Or at least double of this array because this is the real square. Is the norm routine doing something to this small data ?
[CODE]
for (int i = 768 * 1024; --i >= 0; )
{
    *addr(gwdata, s, i) = i < 10 ? 1 : 0;
}
gw_fft(gwdata, asm_data);
for (int i = 768 * 1024; --i >= 0; )
{
    output[i] = *addr(gwdata, s, i);
}
[/CODE]
The only explanation I find is that gwsquare(p) actually computes: 2 * p^2 - 2 * p + 1 instead of p^2. Is it right ? |
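The hypothesis in the post above can be checked outside gwnum with a plain polynomial square. This is a standalone sketch: `poly_square` and `observed_form` are names made up here, and nothing in it calls gwnum. It treats the input words as polynomial coefficients, computes the ordinary square, and then applies 2 * p^2 - 2 * p + 1 to see whether that reproduces the observed numbers for the ten-ones input.

```c
#include <assert.h>
#include <stddef.h>

/* Ordinary polynomial square. `out` must have room for 2*n - 1 entries. */
void poly_square(const long *p, size_t n, long *out)
{
    for (size_t k = 0; k < 2 * n - 1; k++)
        out[k] = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            out[i + j] += p[i] * p[j];
}

/* The conjectured form 2*p^2 - 2*p + 1, coefficient by coefficient.
 * `out` must have room for 2*n - 1 entries. */
void observed_form(const long *p, size_t n, long *out)
{
    poly_square(p, n, out);
    for (size_t k = 0; k < 2 * n - 1; k++)
        out[k] *= 2;
    for (size_t i = 0; i < n; i++)
        out[i] -= 2 * p[i];
    out[0] += 1;   /* the constant term only affects coefficient 0 */
}
```

For ten ones, `poly_square` gives 1 2 3 ... 10 ... 3 2 1 and `observed_form` gives 1 2 4 6 ... 18 18 ... 4 2, matching both lists in the post. That confirms the formula fits this input; it does not by itself say why gwnum produces it.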
[QUOTE=megabit8;421033]If I am making a FFT of type 2 - similar to gwsquare, NORMNUM = 1, NRMRTN = yi3eCORE (whatever this is, there's no source for it) for the data:
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, .... 0, 0 [10 ones then 768*1024 - 10 zeroes] I was expecting: 1 2 3 4 5 6 7 8 9 10 9 8 7 6 5 4 3 2 1 0 .... zeroes till the end. Or at least double of this array because this is the real square. Is the norm routine doing something to this small data ? [/QUOTE] Gwnum performs weighted transforms. The initial FFT data values are rarely integers. If you call set_fft_value it will apply the proper weighting factor. Similarly, get_fft_value will return the FFT value after removing the weighting factor. Source for yi3eCORE is in ymult3a.asm. You'll have to wade through a pile of nasty MASM macros to see the generated assembly code. yi3eCORE is the rounding-to-integer and carry propagation code. y=AVX, i=Irrational FFT, e=calc round off error, CORE=optimized for CORE architectures. |
[QUOTE=Madpoo;421030]You might be better off just stopping what you're running now and doing the torture test at 768K since that's known to cause the problem on affected CPUs.
Then if you get the roundoff errors you'll know your CPU is "in the club" and you can try to reproduce using a "real" exponent. My guess is that to replicate what the torture test is doing, you'd want to have all 8 (physical and HT) cores in a single worker. Not 8 separate workers doing 8 separate tests. It could be something specific to the threading code and combining the separate chunks from each large multiplication.[/QUOTE] Amen to part 1. As to part 2, a torture test is more like 8 workers running separate tests. |
[QUOTE=Prime95;421036]Gwnum performs weighted transforms. The initial FFT data values are rarely integers. If you call set_fft_value it will apply the proper weighting factor. Similarly, get_fft_value will return the FFT value after removing the weighting factor.
Source for yi3eCORE is in ymult3a.asm. You'll have to wade through a pile of nasty MASM macros to see the generated assembly code. yi3eCORE is the rounding-to-integer and carry propagation code. y=AVX, i=Irrational FFT, e=calc round off error, CORE=optimized for CORE architectures.[/QUOTE] Thank you for your prompt response. |
[QUOTE=Prime95;421036]Gwnum performs weighted transforms. The initial FFT data values are rarely integers. If you call set_fft_value it will apply the proper weighting factor. Similarly, get_fft_value will return the FFT value after removing the weighting factor.
[/QUOTE] Tried set_fft_value for the input: 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, .... 0 still the magic output appears with get_fft_value: 1 2 4 6 8 10 12 14 16 18 18 16 14 12 10 8 6 4 2 0 ... 0 Tried with 1, 3, 5, 7, 9, 0, 0, ...., 0 The output is: 1 6 28 74 152 248 278 252 162 0 .... 0 Instead of: 1 6 19 44 85 124 139 126 81 0 .... 0 |
[QUOTE=megabit8;421048]Tried set_fft_value for the input:
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, .... 0 still the magic output appears with get_fft_value: 1 2 4 6 8 10 12 14 16 18 18 16 14 12 10 8 6 4 2 0 ... 0 Tried with 1, 3, 5, 7, 9, 0, 0, ...., 0 The output is: 1 6 28 74 152 248 278 252 162 0 .... 0 Instead of: 1 6 19 44 85 124 139 126 81 0 .... 0[/QUOTE]
That is OK. Gwnum stuffs a varying number of bits into each FFT word. In your case, either the floor or the ceiling of 14942209 / 768K. |
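One way to read the explanation above, sketched under assumptions: suppose word i starts at bit ceil(i * p / n), which is the usual layout for this kind of weighted transform but is not something checked against the gwnum source here. With p = 14942209 and n = 768K = 786432 we have p = 19 * n + 1, so word 0 holds 20 bits and every other word holds 19. The product of words i and j lands at bit offset(i) + offset(j), which can sit one bit above offset(i + j), and that extra bit doubles the value seen in word i + j. The function names below are invented for this illustration, and carries are ignored because the inputs are tiny.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: word i starts at bit ceil(i * p / n). */
uint64_t word_offset(uint64_t i, uint64_t p, uint64_t n)
{
    return (i * p + n - 1) / n;
}

/* Square a small number stored as variable-size words, without carry
 * propagation. `out` must have room for 2*len - 1 entries. */
void mixed_radix_square(const uint64_t *a, size_t len,
                        uint64_t p, uint64_t n, uint64_t *out)
{
    for (size_t k = 0; k < 2 * len - 1; k++)
        out[k] = 0;
    for (size_t i = 0; i < len; i++) {
        for (size_t j = 0; j < len; j++) {
            /* 0 or 1: how far the product sits above the start of word i+j */
            uint64_t shift = word_offset(i, p, n) + word_offset(j, p, n)
                           - word_offset(i + j, p, n);
            out[i + j] += (a[i] * a[j]) << shift;
        }
    }
}
```

With ten ones as input this gives 1 2 4 6 8 10 12 14 16 18 18 16 ... 2, and with 1, 3, 5, 7, 9 it gives 1 6 28 74 152 248 278 252 162, the same numbers reported in the quoted post. Only the terms that involve word 0 escape the doubling, which is why the output is not simply twice the plain square.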
So, tomorrow most of the rest of the world gets back to work.
It might be interesting to see what happens, as we have done additional research (and engaged in heated debate) on this issue while other's enjoyed their time off... Sorry for being a prick. It's in my training; and my general nature... Question everything, and take no offence of anything.... |
The last test I embarked on has finished! And the results are very interesting. See the results.txt file:
[CODE]
[Sat Jan 2 18:39:23 2016]
M14942437 is not prime. Res64: 683A0DFFC5827CD8. We8: E57106A7,7379210,00000000
M14942267 is not prime. Res64: C35562BC4F3511F3. We8: D8A74C7B,2423514,00000000
M14942209 is not prime. Res64: 8587C9937E3BED22. We8: CDAD4A41,7713418,00000000
M14942293 is not prime. Res64: 035EFC95F88CFC27. We8: 36084309,4746867,00000000
[Sun Jan 3 22:02:18 2016]
Iteration: 14329935/14942209, ERROR: FFT data has been zeroed!
Possible hardware failure, consult the readme.txt file.
Continuing from last save file.
[Sun Jan 3 22:26:02 2016]
M14942267 is not prime. Res64: D20C84656405F3FB. We8: FCFDD819,14910347,00000000
M14942539 is not prime. Res64: 0A930E56A9284971. We8: 7FE55A2A,1188977,00000000
M14942437 is not prime. Res64: 136153185F4D524F. We8: B81CE272,9576909,00000000
[Sun Jan 3 22:37:01 2016]
M14942497 is not prime. Res64: 80BD5A064693F1C0. We8: 0CAD30A7,2607443,00000000
M14942293 is not prime. Res64: 035EFC95F88CFC27. We8: 36502AEF,8394253,00000000
M14942567 is not prime. Res64: D233F12AC3781E04. We8: 59875C25,3894081,00000000
[Sun Jan 3 22:42:28 2016]
M14942563 is not prime. Res64: 6815BC39FCD7650F. We8: A94AFB88,2473090,00000000
[Sun Jan 3 22:55:12 2016]
M14942209 is not prime. Res64: 0AA69D2EA9100E22. We8: 7D077832,14397436,00010000
[/CODE]
The first four results belong to the first test I did with v27.9, which was done by two threads on each exponent. They match the three tests I did on this machine using v28.7 and the data in the GIMPS database. These three other tests are described in an earlier post of mine.

The last test consists of eight threads (4 cores, 6700K processor) working on eight exponents. Throughout the 28 hours the test run lasted, no errors were reported except for one on thread 1 when 96% of the run was completed. Notice that the results of this run do not match the previous runs, except for one test.

Of the four exponents that were tested on this machine for the first time, concurrently with the other four, one test likewise did not fail, whereas the other three did. The two successfully completed tests matching the database were running on the following threads:
[CODE]
[Worker #3 Jan 3 21:54] Iteration: 14460000 / 14942293 ...
[Worker #6 Jan 3 21:53] Iteration: 14620000 / 14942539 ...
[/CODE]
The threads 1 & 5, 2 & 6, 3 & 7 and 4 & 8 are the four pairs that each share one of the four physical cores.

A small complication: due to glazed frost on the high-tension power lines in the northern parts of The Netherlands, there were some noticeable power cuts lasting milliseconds throughout the last 20% of the test run. This did not stop the machine, and the web browser that was running on it did not fail because of it either.

I will now restart this exact test and expect it to finish in 28 more hours. I will probably be asleep when that test finishes, but I will report a few hours later. |
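Comparing residues between runs by eye is error prone, so here is a small sketch of how the result lines above could be compared mechanically. The helper names `parse_result_line` and `same_residue` are made up for this example, and the line layout is assumed to be exactly the "M<exponent> is not prime. Res64: <16 hex digits>." form shown in the results.txt excerpt.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Extract the exponent and the 64-bit residue from one result line.
 * Returns 1 on success, 0 if the line does not have the expected form. */
int parse_result_line(const char *line, unsigned *exponent, uint64_t *res64)
{
    return sscanf(line, "M%u is not prime. Res64: %" SCNx64,
                  exponent, res64) == 2;
}

/* Returns 1 if both lines are for the same exponent with the same residue,
 * 0 if the exponent matches but the residue differs,
 * -1 if a line cannot be parsed or the exponents differ. */
int same_residue(const char *a, const char *b)
{
    unsigned ea, eb;
    uint64_t ra, rb;
    if (!parse_result_line(a, &ea, &ra) || !parse_result_line(b, &eb, &rb))
        return -1;
    if (ea != eb)
        return -1;
    return ra == rb;
}
```

Applied to the two M14942293 lines above it reports a match, and applied to the two M14942209 lines it reports a mismatch, which is the pattern described in the post.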
If someone has a reasonably fast pre-Skylake Intel machine with four physical cores and hyperthreading available and feels like it, then feel free to run the worktodo file from [URL="http://www.mersenneforum.org/showpost.php?p=420995&postcount=262"]post 262[/URL].
Just for reference, the outcome should be predictable: 8 correct residues, which is what I am looking for. The only such machine I have is about eight years old and would take too much time to run this test. Please post here if you embark on it. If someone with another Skylake wants to run this test, then of course, feel free to do so. Before I started the test for a second time on my Skylake machine I rebooted it. |
[QUOTE=tha;421143]Before I started the test for a second time on my Skylake machine I rebooted it.[/QUOTE]
Did that enter more or less entropy into the system? |
[QUOTE=chalsall;421146]Did that enter more or less entropy into the system?[/QUOTE]
I am assuming it is a joke. But a serious answer to the question is that I don't think it made any difference. Just a safeguard. |
[QUOTE=tha;421148]I am assuming it is a joke.
But a serious answer to the question is that I don't think it made any difference. Just a safeguard.[/QUOTE] It was kind of a joke, but also a serious question... We still don't understand what is happening. So, restarting might make sense. Then again, it might not. In a perfect universe, we could capture the quantum state of the computing devices we use, and run many tests based on their initial states. We humans are not that powerful, but we still have the ability to try.... :smile: |
[QUOTE=chalsall;421146]Did that enter more or less entropy into the system?[/QUOTE]
Yes. (couldn't resist... LOL) |
[QUOTE=Madpoo;421153]Yes. (couldn't resist... LOL)[/QUOTE]
Cool.... |
Haha, I was also reading it like "Did that, more or less, enter entropy into the system?" but you were faster with the answer...
Edit: Reading all this I feel sorry I don't own a Skylake... Itching hands to try some tests by myself. |
[QUOTE=Madpoo;421153]Yes. (couldn't resist... LOL)[/QUOTE]
:tu::tu: |
Sometimes contradictory discussions yield the best results. One thing is sure: time will tell how this issue sorts itself out. I would not rush in.
|
[QUOTE=megabit8;421178]I would not rush in.[/QUOTE]
I was interested to see if this had been picked up by any mainstream media yet. I ran a few Google queries, and it appears it hasn't. This is a good thing in my mind -- we want Intel and the motherboard manufacturers to have as much lead time as possible to solve what appears to be a very subtle bug without the hysteria which mainstream reporting often brings. I did find [URL="http://www.anandtech.com/show/9607/skylake-discrete-graphics-performance-pcie-optimizations"]this article from AnandTech[/URL] on a tangential bug interesting. Also, that the [URL="https://communities.intel.com/message/361811"]Intel Forum thread on this matter hasn't been posted to[/URL] since December 31st. It's only a matter of time before this is "out there". I do hope that Intel are taking this seriously.... |
I am currently trying to build a custom version of mprime on my Ubuntu Skylake system.
I got the following errors during compile:
[CODE]
/usr/bin/ld: cannot find -lgssapi_krb5
/usr/bin/ld: cannot find -lkrb5
/usr/bin/ld: cannot find -lk5crypto
[/CODE]
The original list was longer, but I was able to figure out which dev packages need to be installed to eliminate the other error messages. These 3 are a little harder to Google; if anyone knows which packages they belong to, I would appreciate any help.

I modified the makefile in the following way, but that was more wild guessing than understanding what I am doing. If anyone wants to comment on it, I welcome insight.
[CODE]
# Makefile for Linux 64-bit mprime
#
# Ugh, different linux variants require different makefiles.
# The current makefile is for CentOS 5.10. We prefer to link against
# older Linux versions because linking on the latest, greatest version
# will create an mprime executable that will not run on older
# Linux versions because of glibc incompatibilites.
#
# Some linux versions require some of the variations below:
# "export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig"
# CFLAGS = -I.. -I../gwnum -DX86_64 -O2 -Wno-unused-result
# LFLAGS = -Wl,-M
# LIBS = ../gwnum/gwnum.a ../gwnum/gwnum.ld -lm -lpthread -Wl,-Bstatic $(shell pkg-config --static --libs libcurl) -lstdc++ -Wl,-Bdynamic -ldl

CC = gcc
# CFLAGS = -I.. -I../gwnum -I/usr/local/include -DX86_64 -O2
CFLAGS = -I.. -I../gwnum -DX86_64 -O2 -Wno-unused-result

CPP = g++
CPPFLAGS = -I.. -I../gwnum -DX86_64 -O2

# LFLAGS = -Wl,-M -Wl,-L/usr/local/lib
# LIBS = ../gwnum/gwnum.a ../gwnum/gwnum.ld -lm -lpthread -Wl,-Bstatic -lcurl -Wl,-Bdynamic -lrt -lstdc++ -ldl
LFLAGS = -Wl,-M
LIBS = ../gwnum/gwnum.a ../gwnum/gwnum.ld -lm -lpthread -Wl,-Bstatic $(shell pkg-config --static --libs libcurl) -lstdc++ -Wl,-Bdynamic -ldl
. . .
[/CODE] |
[QUOTE=tha;421238]I am currently trying to build a custom version of mprime on my Ubuntu Skylake system.
I got the following errors during compile:
[CODE]
/usr/bin/ld: cannot find -lgssapi_krb5
/usr/bin/ld: cannot find -lkrb5
/usr/bin/ld: cannot find -lk5crypto
[/CODE]
The original list was longer but I was able to figure which dev packages need to be installed to eliminate these error messages. These 3 are a little harder to Google, if anyone knows in which packages they belong I would appreciate any help. [/quote]
Try sudo apt-get install libkrb5-dev. Some quick Googling suggests that's the answer. |
[QUOTE=tha;421238]I am currently trying to build a custom version of mprime on my Ubuntu Skylake system.[/QUOTE]
Did you take the advice given, and run the torture test on your kit? I will give you that it's really cool that you got an error on a regular test. Reproducibility is questionable, and errors are wonderful. |
[QUOTE=Mark Rose;421239]
Try sudo apt-get install libkrb5-dev . Some quick Googling suggests that's the answer.[/QUOTE] That was the one I tried first, and actually I am still amazed it doesn't work. That package is installed. |
[QUOTE=chalsall;421241]Did you take the advice given, and run the torture test on your kit?
I will give you that it's really cool that you got an error on a regular test. Reproducibility is questionable, and errors are wonderful.[/QUOTE] No, not yet. I have a couple of other tests that I want to run first. It will be about 12 more hours before the current test completes. I lost more time trying to compile mprime than I had expected. |