Skylake FMA3 round-off error
I have my new Skylake at work, and it completed its first double-check (DC) exponent, which was in the 43M range. The system appears to be stable. Before starting the work, mprime ran a test to see whether the exponent could be done with an FFT size of 2240K; the round-off error was just below the acceptable limit. The DC Lucas-Lehmer test completed successfully.
The system then started on its second exponent, 43175681, with the same procedure, except that halfway through the DC test it started reporting round-off errors. Now, at 90% done, there have already been 10 round-off errors. A stress test I ran after the first few errors appeared did not turn up any errors. I am thinking of running a second test on this exponent using FMA2 instead of FMA3 to see whether the different rounding in the two methods might be the cause. Any thoughts?
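For readers unfamiliar with the round-off error (ROE) that mprime reports, here is a minimal sketch of the idea, assuming the usual definition: square a big integer by floating-point FFT convolution and measure how far the convolution outputs land from the nearest integer. The helper names `fft` and `square_with_roe` are hypothetical, and the sketch uses plain non-negative digits with zero-padding rather than the balanced-digit irrational-base weighted transform Prime95 actually uses, so its ROE grows faster with bits per word than Prime95's does.

```python
import cmath
import random

def fft(a, invert=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two.
    The inverse transform is left unscaled (the caller divides by n)."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1.0 if invert else -1.0
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out

def square_with_roe(x, bits, n):
    """Square x with an n-point floating-point FFT convolution, `bits` bits per word.
    Returns (x*x as reassembled from the FFT, roe), where roe is the largest
    distance of any convolution output from the nearest integer."""
    base = 1 << bits
    words = []
    for _ in range(n // 2):          # upper half stays zero: no wraparound
        words.append(x % base)
        x //= base
    if x:
        raise ValueError("x does not fit in n/2 words of the given size")
    spectrum = fft([complex(w) for w in words] + [0j] * (n // 2))
    conv = [v.real / n for v in fft([s * s for s in spectrum], invert=True)]
    roe = max(abs(c - round(c)) for c in conv)
    # Reassemble the integer; Python's big ints absorb the carries.
    square = sum(round(c) << (bits * i) for i, c in enumerate(conv))
    return square, roe

rng = random.Random(42)
for bits in (16, 20, 24):
    x = rng.getrandbits(bits * 128)
    sq, roe = square_with_roe(x, bits, 256)
    print(f"{bits} bits/word: ROE = {roe:.4f}, exact = {sq == x * x}")
```

Packing more bits into each word makes the convolution outputs larger, so the same relative floating-point error turns into a larger absolute distance from an integer. That is why a given FFT length has a maximum exponent, and why an exponent near that limit shows ROE close to the threshold.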
My experience with a Haswell-E (5820K) is that Prime95 is a bit too aggressive when an exponent is near the limit for an FFT size. I tweaked the "SoftCrossoverAdjust" parameter in prime.txt to get around this.
Jacob
I queued it up for a triple-check.
FMA3 is part of AVX2. There is nothing called FMA2; the other FFT type in Prime95 is just called AVX. You can put SoftCrossover=0.1 or SoftCrossover=0 in prime.txt, and then it will switch to a larger FFT size at a lower exponent.
Edit: Reading undoc.txt again, it might be better to use SoftCrossoverAdjust=-0.002 or SoftCrossoverAdjust=-0.004 instead of disabling the SoftCrossover feature (with SoftCrossover=0).
[QUOTE=ATH;425061]I queued it up for a triple check.
FMA3 is part of AVX2. There is nothing called FMA2, the other FFT type in Prime95 is just called AVX. You can use: SoftCrossover=0.1 or SoftCrossover=0 in prime.txt then it will use a larger FFT size at a lower exponent. Edit: Reading undoc.txt again it might be better to use SoftCrossoverAdjust=-0.002 or SoftCrossoverAdjust=-0.004 instead of disabling the SoftCrossover feature (with SoftCrossover=0).[/QUOTE]
I have set the values in prime.txt to:
SoftCrossover=0.3
SoftCrossoverAdjust=-0.020
Let's see how this works out. The exponent 43175683 is now being done with FFT size 2304K instead of 2240K. This larger size turns out to be about 4-5% faster than the smaller one, perhaps because of some interaction with memory speed.
[QUOTE=tha;425134]
The exponent 43175683 is now being done with FFT size 2304K instead of 2240K. This bigger size turns out to be about 4-5% faster than the smaller one. Maybe some interference with the memory speed or so.[/QUOTE]
George, might this merit some tweaks to Prime95?
My FFT-error-modeling-based length-setting function suggests a maxP in the range 43005794-43235170 for 2240K, with where in that range depending on the details of the short-length DFT algorithms needed to build up that FFT length (2240K = 2^16·5·7) and on whether the FMA versions of those give greater or lesser accuracy than the non-FMA ones. p = 43175683 is definitely pushing things.
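The arithmetic in that paragraph can be checked with a few lines of Python. This is only a sketch: `factorize` is a hypothetical helper, and the maxP bounds are simply the ones quoted above, not output of any error model. It confirms the factorization of 2240K, shows 2304K for comparison, and computes how many bits each FFT word must hold for p = 43175683 and roughly where p sits in the quoted range.

```python
def factorize(n):
    """Trial-division factorization; returns {prime: exponent}."""
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

K = 1024
fft_2240k = 2240 * K
fft_2304k = 2304 * K

p = 43175683
max_p_low, max_p_high = 43005794, 43235170  # range quoted in the post for 2240K

bits_per_word = p / fft_2240k                          # average bits per FFT word
position = (p - max_p_low) / (max_p_high - max_p_low)  # 0 = bottom, 1 = top of range

print(factorize(fft_2240k))  # {2: 16, 5: 1, 7: 1}
print(factorize(fft_2304k))  # {2: 18, 3: 2}
print(f"{bits_per_word:.3f} bits/word, {position:.0%} of the way up the range")
```

At about 18.8 bits per word, p lands roughly three quarters of the way up the quoted maxP range for 2240K, consistent with ROE values hovering near the limit.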
In my own FMA3 code deployment I found that certain DFT radices showed surprisingly higher ROE levels when coded in the obvious fashion. Much of this was due to certain common arithmetic combinations (such as the twiddle-multiply by the odd-indexed 8th roots of unity, (±1±I)/sqrt(2), in the radix-8 DFT) when coded to take advantage of FMA. My workaround was based on a lot of experimentation with 'strategic wrong-way rounding' of various key arithmetic constants, i.e. rounding the LSB of the float64 constant in the opposite direction from that indicated by the quad-float high-precision version of the same constant. The current state of my various sensitivity analyses here is as much voodoo as rigorous science, however.
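The FMA and wrong-way-rounding point can be poked at with a small experiment. This is a sketch under stated assumptions, not the poster's code: Python before 3.13 has no `math.fma`, so `fma()` here is a hypothetical helper emulated exactly with `Fraction`; the two codings of the real part of the (1+i)/sqrt(2) twiddle are just two plausible choices; and errors are measured against a 50-digit `Decimal` reference. It prints the numbers without drawing a conclusion, since which combination wins depends on the inputs and on how the result feeds into the rest of the DFT.

```python
import math
import random
from decimal import Decimal, localcontext
from fractions import Fraction

def fma(x, y, z):
    """Emulated fused multiply-add: x*y + z with a single rounding
    (exact rational arithmetic, then one correctly rounded conversion)."""
    return float(Fraction(x) * Fraction(y) + Fraction(z))

# Nearest float64 to 1/sqrt(2): IEEE sqrt is correctly rounded and sqrt(0.5) == 1/sqrt(2).
C_NEAR = math.sqrt(0.5)
# Did rounding go up or down? Compare the float's exact square with 1/2.
_rounded_up = Fraction(C_NEAR) ** 2 > Fraction(1, 2)
# "Wrong-way" constant: the adjacent float on the other side of the true value.
C_WRONG = math.nextafter(C_NEAR, 0.0 if _rounded_up else 1.0)

def twiddle_re(a, b, c, use_fma):
    """Real part of (a + ib) * (1 + i) * c, i.e. (a - b) * c."""
    if use_fma:
        return fma(a, c, -(b * c))  # one rounding for b*c, one for the fma
    return (a - b) * c              # one rounding for a-b, one for the multiply

def max_abs_error(c, use_fma, trials=2000, seed=1):
    """Largest |computed - exact| over random a, b in [-1, 1], measured
    against a 50-digit Decimal reference for (a - b)/sqrt(2)."""
    rng = random.Random(seed)
    worst = Decimal(0)
    with localcontext() as ctx:
        ctx.prec = 50
        inv_sqrt2 = 1 / Decimal(2).sqrt()
        for _ in range(trials):
            a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
            exact = (Decimal(a) - Decimal(b)) * inv_sqrt2
            worst = max(worst, abs(Decimal(twiddle_re(a, b, c, use_fma)) - exact))
    return float(worst)

for name, c in (("nearest", C_NEAR), ("wrong-way", C_WRONG)):
    for use_fma in (False, True):
        mode = "fma" if use_fma else "mul"
        print(f"{name:9s} {mode}: max abs error = {max_abs_error(c, use_fma):.3e}")
```

The quick check that `fma(0.1, 10.0, -1.0)` returns 2**-54 while `0.1 * 10.0 - 1.0` gives 0.0 shows the single rounding at work: the fused form keeps the representation error of 0.1 that the separate multiply rounds away.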
[QUOTE=Dubslow;425142]George, might this merit some tweaks to Prime95?[/QUOTE]
I noticed this before with another machine too: sometimes a larger FFT is faster than the smaller one. It would be worth extending the benchmark routine (or introducing a separate one) to test each FFT size in turn for speed and then disable in software any size that is slower than a larger one.
Also, I have to make a confession. The errors started showing up after I closed the cabinet. The GTX580 in it has a power connector that presses against the cover, and that may very well have exerted too much pressure on the motherboard. I am not willing to close the cabinet again before I have obtained or custom-made a new power cable.
ATH, are you doing this exponent on a Skylake, or on an older machine?
[QUOTE=tha;425175]ATH, are you doing this exponent on a Skylake, or a less recent machine?[/QUOTE]
I'm doing it on my Titan Black. It will start in about 7 hours and take 20-30 hours depending on how much I use my computer; I have a few days off.
I now believe the cause of the errors was the GPU power cable exerting pressure on the motherboard. I took the tie wraps off the cable, and it now bends easily against the side panel.
Can I have this one faulty result removed from the machine's track record?
[QUOTE=tha;425249]I now believe the cause of the errors was the power cable of the GPU exerting pressure on the motherboard. I took the tie wraps off the cable and they now bend easily against the side panel.
Can I have this one faulty result taken from the machine's track record?[/QUOTE]
Probably not...
EDIT: I pulled up some wrong results... hang on...
Okay, after looking up the correct user's info, your stats for all of your systems combined are:
1 suspect result (M43175681): currently unknown whether it's bad until a triple-check... hang in there!
329 good
0 bad
44 unknown
When I do my own statistical analysis of systems, I break down a machine's results by user, CPU, year, and app version. So if that one does end up being bad, it will only affect that one CPU and app version for 2016 results. And I only use that info to guess at tie-breakers for mismatched results. I currently consider any system with zero bad and >= 15 good results to be the winner, and that's really all the guessing involved.
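The tie-breaking heuristic described in that post can be sketched as follows. This is a hypothetical illustration, not PrimeNet's code: the record layout, the `(user, cpu, year, app_version)` grouping key and all names and sample rows are assumptions; only the "zero bad and at least 15 good" rule comes from the post.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

@dataclass
class TrackRecord:
    good: int = 0
    bad: int = 0
    unknown: int = 0

def build_records(results):
    """Group results by (user, cpu, year, app_version).
    results: iterable of (user, cpu, year, app_version, status),
    with status one of "good", "bad", "unknown"."""
    records = defaultdict(TrackRecord)
    for user, cpu, year, app, status in results:
        rec = records[(user, cpu, year, app)]
        setattr(rec, status, getattr(rec, status) + 1)
    return dict(records)

def is_trusted(rec, min_good=15):
    """The stated rule: zero bad results and at least 15 good ones."""
    return rec.bad == 0 and rec.good >= min_good

def tie_breaker(rec_a, rec_b) -> Optional[str]:
    """Guess which side of a mismatched pair is right; None if the rule can't tell."""
    a_ok, b_ok = is_trusted(rec_a), is_trusted(rec_b)
    if a_ok != b_ok:
        return "a" if a_ok else "b"
    return None  # both or neither trusted: wait for the triple-check

# Hypothetical sample data.
rows = [("alice", "i5-6600K", 2016, "v28", "good")] * 20 + [("bob", "i7-4770", 2016, "v28", "bad")]
records = build_records(rows)
a = records[("alice", "i5-6600K", 2016, "v28")]
b = records[("bob", "i7-4770", 2016, "v28")]
print(a, b, tie_breaker(a, b))
```

Because the key includes CPU, year and app version, one bad result only marks that narrow slice of a user's history, which matches the explanation above.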
[QUOTE=tha;425249]Can I have this one faulty result taken from the machine's track record?[/QUOTE]
Yes, Madpoo can easily do that, [U]after he clears out all the bad results I had between January and March 2012[/U] when I was testing the new CUDALucas (switching from power-of-two FFTs to non-power-of-two FFTs). Didn't I say that many times? :razz:
The rule was that bad results stay bad, for whatever reason. Otherwise we completely mess up the statistics... Are we doing statistics or not?