[QUOTE=Madpoo;418226]I started to wonder if something changed in the AVX that, for whatever bizarre reason, affects the precision in such a way that where Prime95 would normally consider 768K "enough" for a certain range of exponents, it's just not cutting it.
By forcing 768K FFT size on a much smaller exponent where we'd be sure we were VERY far away from that kind of rounding error, if it *still* throws out rounding errors even then, well, I think that's a safe bet that AVX in Skylake got fried in some peculiar way. If, however, it can do a 768K FFT on a much smaller exponent, and it's only the larger ones in the "traditional" 768K range that cause issues, seems like it's still an AVX bug but smaller in scale... basically it's not being as precise as it should.[/QUOTE]
From Dubslow's data in post #31:
[i]Test 19, 6500 Lucas-Lehmer iterations of M10485761 using AVX FFT length 768K
FATAL ERROR: Rounding was 0.5, expected less than 0.4[/i]
That exponent works out to just 13.33... bits per FFT word, which is significantly lower than the default maxP for that FFT length of around 15M ==> ~19.2 bits/word, nearly 6 bits per word larger. That argues against your speculation above.
Note also that if one makes the exponent *too* small (~8 bits per digit or less), one may run into carry-chain-too-long issues, if one uses a fixed-max-length carry chain for the 'wraparound' step of the carry procedure. My code does things that way, not sure about George's.
Long story short, any p <= 90% of the maxP for the FFT length in question is more than small enough to rule out any subtle precision-related effects such as you surmise. |
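The bits-per-word arithmetic above is easy to check by hand or with a few lines of Python. This is only a sketch: `APPROX_MAX_P` is the "around 15M" figure from the post rather than an exact program limit, and the function names are made up for illustration.

```python
FFT_LEN_768K = 768 * 1024   # 786432 words in a 768K FFT
APPROX_MAX_P = 15_000_000   # "around 15M" from the post; an approximation, not an exact limit


def bits_per_word(p, fft_len=FFT_LEN_768K):
    """Average number of bits of M(p) carried by each FFT word."""
    return p / fft_len


def well_below_limit(p, max_p=APPROX_MAX_P, fraction=0.90):
    """True if p is at or under the given fraction of maxP (the 90% rule of thumb)."""
    return p <= fraction * max_p


if __name__ == "__main__":
    for p in (10485761, APPROX_MAX_P):
        print(f"M{p}: {bits_per_word(p):.2f} bits/word, "
              f"well below limit: {well_below_limit(p)}")
```

M10485761 sits far under the 90% mark, which is the basis for saying the failure is not a marginal-precision effect.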
[QUOTE=ewmayer;418279]Long story short, any p <= 90% the maxP for the FFT length in question is more than small enough to rule out any subtle precision-related effects such as you surmise.[/QUOTE]
OK. Please connect the dots for those of us slower than most... What should we put in our worktodo.txt files to take it to the edge, and potentially generate a reproducible error? |
[QUOTE=chalsall;418284]What should we put in our worktodo.txt files to take it to the edge, and potentially generate a reproducible error?[/QUOTE]Our reading of Ernst's post is that we want to pick an exponent that is not likely to generate a reproducible (rounding) error, so an exponent that is not near the FFT boundaries.
:confused: |
[QUOTE=chalsall;418284]OK. Please connect the dots for those of us slower than most...
What should we put in our worktodo.txt files to take it to the edge, and potentially generate a reproducible error?[/QUOTE]
Any p between 10-15M should serve. Here are results of a pair of 1000-iter selftests of my code @768K - first is for the default maxP at that FFT length, 2nd shows how much lower the ROE levels are already at largest-prime-less-than-15M:

[b]Run 1:[/b]
[i]M15094403: using FFT length 768K = 786432 8-byte floats.
this gives an average 19.193525950113933 bits per digit
Using complex FFT radices 192 8 16 16
...
1000 iterations of M15094403 with FFT length 786432 = 768 K
Res64: F673A8D6413923A9. AvgMaxErr = 0.271705612. MaxErr = 0.375000000. Program: E14.1[/i]

[b]Run 2:[/b]
[i]M14999981: using FFT length 768K = 786432 8-byte floats.
this gives an average 19.073462168375652 bits per digit
Using complex FFT radices 192 8 16 16
...
1000 iterations of M14999981 with FFT length 786432 = 768 K
Res64: 38221FD59A59B0D0. AvgMaxErr = 0.224540663. MaxErr = 0.312500000. Program: E14.1[/i] |
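For anyone unsure what the ROE figures measure: after the inverse FFT every output word should be an integer, and the roundoff error is how far the floating-point value lands from the nearest one. Below is a generic sketch of that definition, not the actual code of either program; the 0.4 default is taken from the "expected less than 0.4" message quoted earlier in the thread, and the function names are hypothetical.

```python
def max_roundoff_error(values):
    """Largest distance between any value and its nearest integer."""
    return max(abs(v - round(v)) for v in values)


def roundoff_ok(values, limit=0.4):
    """True if the worst roundoff error stays under the limit.

    The default of 0.4 comes from the error message quoted in the thread.
    """
    return max_roundoff_error(values) < limit


if __name__ == "__main__":
    sample = [3.0, 4.25, 6.9]   # made-up values for illustration only
    print(max_roundoff_error(sample), roundoff_ok(sample))
```

An error of 0.5 means the value was exactly halfway between two integers, so the correct result can no longer be recovered by rounding.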
[QUOTE=ewmayer;418287]Any p between 10-15M should serve. Here are results of a pair of 1000-iter selftests of my code @768K - first is for the default maxP at that FFT length, 2nd shows how much lower the ROE levels are already at largest-prime-less-than-15M:[/QUOTE]
And the mprime/Prime95 worktodo.txt entries would be? What similar entries might push the test case over the edge? |
On a related note: the news in the last couple of days is that, compared to sturdier earlier designs, Skylake CPUs bend so easily that motherboard contact pins have been damaged by overtightened cooler screws or by dynamic stress from shipping with heavy coolers mounted.
|
[URL="http://cdn.overclock.net/6/60/500x1000px-LL-605c96d0_hb0a-9q-a533.jpeg"]http://cdn.overclock.net/6/60/500x1000px-LL-605c96d0_hb0a-9q-a533.jpeg[/URL]
I noticed you recommended "[B]Run FFTs In-place[/B]". I'm not sure exactly what this does, but from undoc.txt: "The default value is the larger of your daytime and nighttime memory settings. If this is set to 8MB or less, then the torture test does FFTs in-place. This may be more stressful but could miss memory errors that only occur at a specific physical address." Have you tried without this option? George can explain the difference, it might be another thing to try? |
[QUOTE=ATH;418298]
I noticed you recommended "[B]Run FFTs In-place[/B]". I'm not sure exactly what this does, [/QUOTE]
In-place FFTs: Square a number, placing the result in the same memory location. Repeat.
Not in-place FFTs: Square a number, placing the result in the next chunk of RAM. Repeat until all allocated RAM is used, then store the result back in the first chunk of allocated RAM. |
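The two modes described above can be sketched with plain Python integers standing in for the FFT buffers. This only illustrates the memory-access pattern and is not Prime95's implementation; the Lucas-Lehmer step, the function names and the `chunks` parameter are assumptions made for the example.

```python
def lucas_lehmer_step(x, modulus):
    """One Lucas-Lehmer iteration: square, subtract 2, reduce mod 2^p - 1."""
    return (x * x - 2) % modulus


def run_in_place(p, iterations):
    """Every result overwrites the same slot."""
    modulus = (1 << p) - 1
    buf = [4]                      # single slot, reused on every iteration
    for _ in range(iterations):
        buf[0] = lucas_lehmer_step(buf[0], modulus)
    return buf[0]


def run_not_in_place(p, iterations, chunks=4):
    """Each result goes to the next slot, wrapping back to the first."""
    modulus = (1 << p) - 1
    bufs = [0] * chunks            # stands in for "all allocated RAM"
    bufs[0] = 4
    src = 0
    for _ in range(iterations):
        dst = (src + 1) % chunks   # next chunk, or the first once all are used
        bufs[dst] = lucas_lehmer_step(bufs[src], modulus)
        src = dst
    return bufs[src]


if __name__ == "__main__":
    # Both modes must agree; only the memory they touch differs.
    print(run_in_place(13, 11), run_not_in_place(13, 11))
```

The arithmetic is identical in both cases, which is why the not-in-place mode can catch faults tied to a specific physical address while the in-place mode keeps hammering the same locations.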
[QUOTE=Xyzzy;418230]Maybe:
Try an exponent that uses a 768K FFT (that fails) and try it with a larger FFT. Try an exponent that uses a smaller FFT (that passes) and try it with a 768K FFT.[/quote]
[QUOTE=chalsall;418289]And the mprime/Prime95 worktodo.txt entries would be? What similar entries might push the test case over the edge?[/QUOTE]
IMO, y'all are getting side-tracked. We already know that the error is not dependent on the exponent being tested. As Ernst pointed out we are nowhere near the max exponent for the 768K FFT. We are not dealing with "edge cases". The whole point of a torture test is to run previously tested cases and see if we get the same results. Think of it as a double-check but only doing a few thousand iterations.
[quote=XYZZY]Has anyone tried Mprime with a Linux "live" CD? (Eliminate the operating system variable!)[/QUOTE]
Now this is a great idea. I did not think to include that in my previous list of commonalities in all the reported failures. |
[QUOTE=Prime95;418304]IMO, y'all are getting side-tracked. We already know that the error is not dependent on the exponent being tested. As Ernst pointed out we are nowhere near the max exponent for the 768K FFT. We are not dealing with "edge cases". The whole point of a torture test is to run previously tested cases and see if we get the same results. Think of it as a double-check but only doing a few thousand iterations.
... Now this is a great idea. I did not think to include that in my previous list of commonalities in all the reported failures.[/QUOTE]
Another (possibly?) good idea... use mlucas to replicate what's going on with Prime95/mprime? Ernst would have to chime in since I'm totally unfamiliar with mlucas options and whether it can be forced to use AVX (not FMA) and essentially set it up to do the same thing that Prime95 is doing when it fails. At least then with a separate code branch (but same underlying technique) it might be useful in some way. Possibly eliminate code issues if mlucas also throws rounding errors. |
More of a side-tracking:
I used pari to generate this worktodo by picking random primes in the 9M-15M interval:
[CODE][Worker #1]
Test=N/A,FFT2=768K,13028909,70,1
Test=N/A,FFT2=768K,10018273,70,1
Test=N/A,FFT2=768K,13009501,70,1
Test=N/A,FFT2=768K,9089261,70,1
Test=N/A,FFT2=768K,11440477,70,1
Test=N/A,FFT2=768K,12655001,70,1
Test=N/A,FFT2=768K,10798133,70,1
[Worker #2]
Test=N/A,FFT2=768K,14707391,70,1
Test=N/A,FFT2=768K,14162843,70,1
Test=N/A,FFT2=768K,11396233,70,1
Test=N/A,FFT2=768K,13947863,70,1
Test=N/A,FFT2=768K,10661239,70,1
Test=N/A,FFT2=768K,13790047,70,1
Test=N/A,FFT2=768K,12768493,70,1
[Worker #3]
Test=N/A,FFT2=768K,10675681,70,1
Test=N/A,FFT2=768K,12324287,70,1
Test=N/A,FFT2=768K,9520739,70,1
Test=N/A,FFT2=768K,13448317,70,1
Test=N/A,FFT2=768K,11051611,70,1
Test=N/A,FFT2=768K,12084151,70,1
Test=N/A,FFT2=768K,14614757,70,1
[Worker #4]
Test=N/A,FFT2=768K,14173669,70,1
Test=N/A,FFT2=768K,12710693,70,1
Test=N/A,FFT2=768K,9656821,70,1
Test=N/A,FFT2=768K,12409589,70,1
Test=N/A,FFT2=768K,10762571,70,1
Test=N/A,FFT2=768K,9320599,70,1
Test=N/A,FFT2=768K,9681097,70,1
[Worker #5]
Test=N/A,FFT2=768K,13680587,70,1
Test=N/A,FFT2=768K,9712259,70,1
Test=N/A,FFT2=768K,14749243,70,1
Test=N/A,FFT2=768K,14698003,70,1
Test=N/A,FFT2=768K,13222663,70,1
Test=N/A,FFT2=768K,10664923,70,1
Test=N/A,FFT2=768K,10161911,70,1
[Worker #6]
Test=N/A,FFT2=768K,10458089,70,1
Test=N/A,FFT2=768K,10974797,70,1
Test=N/A,FFT2=768K,14775599,70,1
Test=N/A,FFT2=768K,9848833,70,1
Test=N/A,FFT2=768K,13317671,70,1
Test=N/A,FFT2=768K,14399617,70,1
Test=N/A,FFT2=768K,12593393,70,1
[Worker #7]
Test=N/A,FFT2=768K,11629151,70,1
Test=N/A,FFT2=768K,9485549,70,1
Test=N/A,FFT2=768K,9162203,70,1
Test=N/A,FFT2=768K,13075291,70,1
Test=N/A,FFT2=768K,11318459,70,1
Test=N/A,FFT2=768K,10594379,70,1
Test=N/A,FFT2=768K,14911957,70,1
[Worker #8]
Test=N/A,FFT2=768K,14641651,70,1
Test=N/A,FFT2=768K,11376217,70,1
Test=N/A,FFT2=768K,11936039,70,1
Test=N/A,FFT2=768K,14732461,70,1
Test=N/A,FFT2=768K,10465967,70,1
Test=N/A,FFT2=768K,9488701,70,1
Test=N/A,FFT2=768K,12477719,70,1
[/CODE]
Then I copied p95 into a fresh folder and added
[CODE]WorkerThreads=8
Affinity=100
ThreadsPerTest=1
CpuSupportsFMA3=0
CpuSupportsFMA4=0
[/CODE]
to local.txt (last line is "just in case" :razz: after reading the undoc file, hehe). As I don't own these new thingies yet, I let it run on a SB [U]and[/U] on a Haswell box overnight. No errors, and the 8 residues matched on both computers (only the first round of exponents finished, then I stopped it). Using mprime, you can just rename your worktodo and put this one in its place. |
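For anyone without pari at hand, a worktodo of the same shape can be produced with a short Python script. This is a sketch only: the entry format is copied verbatim from the list above, while the function names, the trial-division primality test and the `seed` parameter are choices made for the example, and it has not been checked against other Prime95 versions.

```python
import random


def is_prime(n):
    """Trial division; fast enough for exponents in the 9M-15M range."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def make_worktodo(workers=8, per_worker=7, lo=9_000_000, hi=15_000_000, seed=None):
    """Build worktodo text with distinct random prime exponents in [lo, hi)."""
    rng = random.Random(seed)
    used = set()
    lines = []
    for w in range(1, workers + 1):
        lines.append(f"[Worker #{w}]")
        count = 0
        while count < per_worker:
            p = rng.randrange(lo, hi)
            if p in used or not is_prime(p):
                continue
            used.add(p)
            # Entry format copied from the worktodo posted above.
            lines.append(f"Test=N/A,FFT2=768K,{p},70,1")
            count += 1
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_worktodo(seed=2015), end="")
```

Redirect the output to worktodo.txt in a fresh folder; passing a fixed `seed` makes the list reproducible so that two machines can be given identical work.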