[QUOTE=chalsall;419539]Yeah. That's what I ended up doing, and it worked.
BTW, having your avatar changed by the "Gods" is a badge of honor! :smile:[/QUOTE] Probably the majority of the avatars here from active members have been assigned by the forum admin. The next level of recognition is a bit of text under your name. |
The conversation continues...
[url]https://communities.intel.com/message/361811#361811[/url]
Mike.C (from Intel) questions Henk_NL about his (serious) overclocking. EricHefner then responds to Mike.C saying that clock speed has no function in the errors (and provides a screenshot). I am hopeful that those serious at Intel will get back to work after the holidays are over.... |
[QUOTE=chalsall;420329][URL]https://communities.intel.com/message/361811#361811[/URL]
Mike.C (from Intel) questions Henk_NL about his (serious) overclocking. EricHefner then responds to Mike.C saying that clock speed has no function in the errors (and provides a screenshot). I am hopeful that those serious at Intel will get back to work after the holidays are over....[/QUOTE] Mike.C (from Intel) was already drunk from the Christmas champagne. He repeatedly confused Mega with Giga. :w00t: |
AVX instructions seem to load only 128 bits of data instead of 256
Can someone run a simple test on Skylake?
Like this in C++:

float temp[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
__m256 loadedData = _mm256_loadu_ps(&temp[0]);
// On Skylake it loads only the first 4 floats
// On Ivy Bridge it loads the full 8 floats

OR:

double temp[8] = { 1, 2, 3, 4 };
__m256d loadedData = _mm256_loadu_pd(&temp[0]);
// On Skylake it loads only the first 2 doubles
// On Ivy Bridge it loads the full 4 doubles

Or am I making a mistake? |
It was a false alarm related to Visual Studio 2015 vs. 2012. The bug is specific to VS 2015, so never mind the previous post. It works correctly on Skylake with VS 2012.
|
It is a rounding problem in Skylake AVX after all.... :)
1 Attachment(s)
I managed to isolate this piece of code:
[INDENT][LEFT][B]#include <stdio.h>
#include <math.h>

const double M_PI = 3.1415926535897932384626433832795;

double SkylakeAVXCosine()
{
    double arg = (2 * M_PI) * 33 / 256.0;
    double cosValue = cos(arg);
    return cosValue;
}

int main()
{
    double cosValue = SkylakeAVXCosine();
    unsigned __int64 bytes = *(unsigned __int64*)&cosValue;
    printf("cos((2 * M_PI) * 33 / 256.0) == %llx \r\nPress enter to exit...", bytes);
    getchar();
}[/B]
[/LEFT]
[/INDENT]
The above code should be compiled with Visual Studio 2015, use the platform toolset: "Visual Studio 2015 (v140)" and in Code Generation set: "Advanced Vector Extensions (/arch:AVX)" so that the cos function will be computed with AVX.

On Skylake processor the result is:
cos((2 * M_PI) * 33 / 256.0) == 3fe610b7551d2cdf

On Ivy Bridge the result is:
cos((2 * M_PI) * 33 / 256.0) == 3fe610b7551d2cde

(Notice the last/lowest bit difference, on Skylake it is 1 on Ivy it is 0).

After all, this small program shows different result on Skylake vs other processors. I think this is related to the rounding problem posted here. The exe is attached. |
[QUOTE=megabit8;420545]After all, this small program shows different result on Skylake vs other processors. I think this is related to the rounding problem posted here.[/QUOTE]
Maybe, maybe not. Worth investigating further, although I question whether this could cause the crashing reported; more likely it would simply cause an error in the end calculation.

[QUOTE=megabit8;420545]The exe is attached.[/QUOTE]
Just so you know, most here are very hesitant to run precompiled code from "unknown" sources. Further, this bug has been observed under Linux, and your above code doesn't compile with GCC under Linux.

I'm not accusing you of posting "malware" (the exec is too small unless you're /really/ good :wink:), but producing source code that compiles under both Linux and Windows would be useful.

Happy New Year! :smile: |
It shows different result on Skylake than Ivy for sure
2 Attachment(s)
First of all Happy New Year! :smile:
Then 11KB for an exe is too big for malware. One needs only 1KB of infected code to download an 100MB EXE from a short link over the internet and run on the computer in background all the viruses, trojans and malware in the world.

Now to be serious, that exe is the result of compiling the attached code in Visual Studio 2015. BitDefender was running. I also tested with "Platform tools from Visual Studio 2013" and the bug is reproducible. With "Platform Tools from Visual Studio 2012" the result on Skylake is the same as on Ivy. This makes me think that the libraries from Visual Studio 2013 or newer detect the new AVX2 and FMA instructions and use them to compute the cosine, but fail to stay backwards compatible.

I do not know if this issue affects the Prime95 calculations. My Skylake processor no longer freezes since I updated the BIOS; I had a lot of Windows freezes with the older BIOS. It now just shows a rounding error in the 768 KB in-place FFT of Prime95 version 27.9 (see attached). [Current BIOS is 1402, MB: ASUS Z170-A]

I was wondering if Prime95 uses some common libraries to compute the sine/cosine, which might yield different results on Skylake just as in that test. Then the errors would propagate.

I do not have any experience with GCC on Linux, so I cannot give you code that runs on Linux. It might not even be reproducible on Linux, since this issue might be related to how Microsoft Visual Studio 2015 computes the sine/cosine functions using newer instructions. It could also be a failure of Skylake, if Microsoft does the calculations correctly.

I forgot to mention: that exe is compiled for 64-bit, and you might need the x64 version of the Visual Studio C++ redistributables installed from [url]https://www.microsoft.com/en-us/download/details.aspx?id=48145[/url] to run it correctly.

All the best in the new year! |
[QUOTE=megabit8;420747]Then 11KB for an exe is too big for malware. One needs only 1KB of infected code to download an 100MB EXE from a short link over the internet and run on the computer in background all the viruses, trojans and malware in the world.[/QUOTE]
A cursory disassembly of your EXE didn't show anything particularly interesting. That's a good thing. But I give you your point.

[QUOTE=megabit8;420747]Now to be serious, that exe is the result of compiling the attached code in Visual Studio 2015. BitDefender was running. I also tested with "Platform tools from Visual Studio 2013" and the bug is reproducible. With "Platform Tools from Visual Studio 2012" the result on Skylake is the same as on Ivy.[/QUOTE]
The bug you are reporting might have absolutely nothing to do with the bug which opened this thread. Can you please explain, based on your experience, why only 768k FFTs failed? |
It is a deep question whether it fails only for the 768KB FFT. It could be that the Prime95 code is very sensitive to rounding errors for this size, and possibly for others with similar properties.
My personal opinion is that there is a rounding error in the computation of the twiddle factors using sine/cosine, and that the code is designed to use even the last bit precisely, and that bit is different. This is the engineer's approach.

I tested Prime95 v27.9 for 25 hours with hyper-threading off and no error appeared, whereas with hyper-threading on an error appears after at most 2 hours. This makes me think that there is a deeper issue with the memory controller and instruction scheduling when hyper-threading is on. This is my opinion after reading the entire thread.

I use no overclock, and I keep turbo disabled, so the top speed goes from 4.2GHz on 1 core to 4.0GHz for all cores. So no overclock at all, even a bit of a downclock.

I got to that magic constant argument for cosine by using a self-made 768KB AVX-optimized FFT, which showed differences in the end result between Skylake and Ivy. After tracing the problem I observed that the precomputed twiddle factors differed. The first constant that differed was the 33/256 complex root of unity.

Then I fixed the sin/cos functions to the standard VS 2012 ones, which are not AVX-optimized, and left the computers running to compute 100 million transforms each, of size 768KB. It took 1.5 days. Skylake finished when Ivy was at 80%. I compared the SHA1 of the result of each FFT and it was identical between Skylake and Ivy. This means that the AVX operations I used to compute the FFT:
_mm256_mul_pd
_mm256_add_pd
_mm256_sub_pd
perform identically on Skylake and on Ivy in my case. Hyper-threading was enabled on both computers and all 8 threads were used.

This is why either:
1. it is a rounding error that propagates, caused by the Intel 6th gen processor in conjunction with a bad library that computes the trigonometric functions; or
2. it is far more complex and involves hyper-threading, caches, or a bug in the processor circuitry, so that, for example, running some complex AVX-optimized code in Prime95 v27.9 on all 8 threads slightly alters the end result.

I have had contact with the Intel i7 first gen, second gen Sandy Bridge and 3rd gen Ivy Bridge, which are all rock solid in Windows. Now on Skylake, Firefox crashes most often and it feels that the system is not that solid, even though it is faster. This is the general impression. |
[QUOTE=megabit8;420545]I managed to isolate this piece of code:[INDENT][LEFT][B]#include <stdio.h>
#include <math.h>

const double M_PI = 3.1415926535897932384626433832795;

double SkylakeAVXCosine()
{
    double arg = (2 * M_PI) * 33 / 256.0;
    double cosValue = cos(arg);
    return cosValue;
}

int main()
{
    double cosValue = SkylakeAVXCosine();
    unsigned __int64 bytes = *(unsigned __int64*)&cosValue;
    printf("cos((2 * M_PI) * 33 / 256.0) == %llx \r\nPress enter to exit...", bytes);
    getchar();
}[/B]
[/LEFT]
[/INDENT]
The above code should be compiled with Visual Studio 2015, use the platform toolset: "Visual Studio 2015 (v140)" and in Code Generation set: "Advanced Vector Extensions (/arch:AVX)" so that the cos function will be computed with AVX.

On Skylake processor the result is:
cos((2 * M_PI) * 33 / 256.0) == 3fe610b7551d2cdf

On Ivy Bridge the result is:
cos((2 * M_PI) * 33 / 256.0) == 3fe610b7551d2cde

(Notice the last/lowest bit difference, on Skylake it is 1 on Ivy it is 0).

After all, this small program shows different result on Skylake vs other processors. I think this is related to the rounding problem posted here. The exe is attached.[/QUOTE]
I replaced M_PI with pi. On my (Windows)/Haswell system, I'm getting
[code]
cos((2 * pi) * 33 / 256.0) == 3fe610b7551d2cdf
Press enter to exit...
[/code] |