[QUOTE=tha;420846]
Any thoughts?[/QUOTE] Have you tried the known failure case? Namely, run version 27.9 torture test on 768K FFT for 8 threads. Data indicates 25% of Skylakes do not exhibit the problem; you may have gotten lucky. |
Version 28.7 will take a lot longer before a worker stops. Even with 27.9 it can take hours. If you think the system is stable on 27.9, try restarting the computer a few times; the risk of failure increases (although I don't really understand why).
[QUOTE]
[Worker #1]
Test=N/A,14942209,67,1
[Worker #2]
Test=N/A,14942267,67,1
[Worker #3]
Test=N/A,14942293,67,1
[Worker #4]
Test=N/A,14942437,67,1[/QUOTE] I was not able to reproduce the error in a reasonable amount of time by using a worktodo.txt file similar to this one. [QUOTE](possibly most) Skylake systems work fine[/QUOTE] Ralle has tested a lot of CPUs and all have the same problem. Even on the new SGX (Software Guard Extensions) version of the chip, workers stop. |
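For anyone trying to build a reproducible test case from a worktodo.txt like the one quoted above: as I understand Prime95's documented worktodo format, each line reads `Test=[assignment ID,]exponent,how_far_factored,has_been_pminus1ed` and is grouped under a `[Worker #N]` section. Below is a minimal Python sketch that parses such a file into per-worker entries. The `TestEntry` record, the field names, and the choice to fill a missing assignment ID with "N/A" are my own assumptions for illustration. This is not Prime95's actual parser.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical record for one "Test=" assignment; field names are illustrative only.
@dataclass
class TestEntry:
    worker: Optional[int]  # number from the preceding "[Worker #N]" header, if any
    assignment_id: str     # "N/A" when no server assignment ID is given
    exponent: int          # Mersenne exponent p, i.e. the test is on 2^p - 1
    tf_bits: int           # how far trial factoring has gone, in bits
    pm1_done: bool         # whether P-1 factoring has already been done


WORKER_HEADER = re.compile(r"\[Worker #(\d+)\]")


def parse_worktodo(text: str) -> List[TestEntry]:
    """Parse "Test=" lines, tagging each with the worker section it appears in."""
    entries: List[TestEntry] = []
    worker: Optional[int] = None
    for raw in text.splitlines():
        line = raw.strip()
        header = WORKER_HEADER.fullmatch(line)
        if header:
            worker = int(header.group(1))
            continue
        if not line.startswith("Test="):
            continue  # this sketch ignores every other work type
        fields = line[len("Test="):].split(",")
        if len(fields) == 3:
            fields.insert(0, "N/A")  # assumption: the assignment ID is optional
        aid, exponent, tf_bits, pm1 = fields
        entries.append(
            TestEntry(worker, aid, int(exponent), int(tf_bits), pm1.strip() == "1")
        )
    return entries


WORKTODO = """\
[Worker #1]
Test=N/A,14942209,67,1
[Worker #2]
Test=N/A,14942267,67,1
[Worker #3]
Test=N/A,14942293,67,1
[Worker #4]
Test=N/A,14942437,67,1
"""

if __name__ == "__main__":
    for entry in parse_worktodo(WORKTODO):
        print(entry)
```

Listing the entries this way makes it easy to see that the quoted file spreads four separate exponents across four workers. The known failure case is a different setup: a single 768K FFT test running with 8 threads.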
[QUOTE=Aurum;420861]I was not able to reproduce the error in a reasonable amount of time by using a worktodo.txt file similar to this one.[/QUOTE]
Could you, then, please provide a worktodo.txt file which /did/ exhibit the error? Specific prime.txt and local.txt files would be useful as well. I know this has been posted above, but it's been rather interleaved. Perhaps a definite test domain would be useful... [QUOTE=Aurum;420861]Ralle has tested a lot of CPUs and all have the same problem. Even on the new SGX (Software Guard Extensions) version of the chip, workers stop.[/QUOTE] One thing I found interesting is that an Intel representative said they were able to reproduce the bug by /downgrading/ the CPU's microcode. This might (or might not) be the key variable with regard to this issue. |
As I recall, there was no worktodo.txt that could recreate the issue, only the 768K stress test.
|
[QUOTE=Prime95;420856]Have you tried the known failure case? Namely, run version 27.9 torture test on 768K FFT for 8 threads.[/QUOTE]
I will try to complete this test first; it has just under 6 hours to go. I just downloaded 27.9 from the mersenne.ca site and will run that test tomorrow morning. |
[QUOTE=tha;420867]I will try to complete this test first; it has just under 6 hours to go. I just downloaded 27.9 from the mersenne.ca site and will run that test tomorrow morning.[/QUOTE]
"The most exciting phrase to hear in science, the one that heralds new discoveries, is not 'Eureka!' but 'That's funny...'" -- Isaac Asimov |
[QUOTE=Dubslow;420865]As I recall, there was no worktodo.txt that could recreate the issue, only the 768K stress test.[/QUOTE]
That's correct. [QUOTE]One thing I found interesting is that an Intel representative said they were able to reproduce the bug by /downgrading/ the CPU's microcode.[/QUOTE] I read an article a few days ago about the CPU architecture and the microcode. The author basically said that the microcode includes a lot of workarounds for hardware errata. It would take too much time to fix the CPU design itself, so the workarounds will stay in the microcode forever. |
[QUOTE=Aurum;420869]I read an article a few days ago about the CPU architecture and the microcode. The author basically said that the microcode includes a lot of workarounds for hardware errata. It would take too much time to fix the CPU design itself, so the workarounds will stay in the microcode forever.[/QUOTE]
Care to reference that article? It would help to build "the case". |
[QUOTE=Dubslow;420865]As I recall, there was no worktodo.txt that could recreate the issue, only the 768K stress test.[/QUOTE]
In theory it should be recreatable (not a word, I know) with a worktodo that does a 768K FFT test, but the local.txt would also need settings to ensure it's running a solo worker on all physical and HT cores, just like the torture test would. The torture test is using a random exponent whereas the worktodo would be using a specific one (and even better, it could use one with a known final residue to ensure nothing else happened along the way even if no roundoff errors were caught). |
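To illustrate the known-final-residue idea: the sketch below is a minimal Python Lucas-Lehmer test. It reports the low 64 bits of the final value (a "Res64"-style residue) and compares that value with an expected one. It uses plain big-integer arithmetic rather than Prime95's FFT multiplication, so it is only practical for small exponents. `ll_res64` and `matches_known_residue` are hypothetical names of my own. For a Mersenne prime the expected residue is 0. For a composite exponent, the expected value would have to come from an earlier trusted run, such as a matching double-check.

```python
MASK64 = (1 << 64) - 1


def ll_res64(p: int) -> int:
    """Lucas-Lehmer test of M_p = 2^p - 1 (p an odd prime); returns the final value mod 2^64."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m  # Python's % keeps s non-negative
    return s & MASK64  # keep only the low 64 bits, like a 64-bit residue


def matches_known_residue(p: int, expected: int) -> bool:
    """True if a fresh run reproduces the expected 64-bit residue."""
    return ll_res64(p) == expected


if __name__ == "__main__":
    # M_p is prime exactly when the final LL value is 0, so 0 is the known residue here.
    for p in (3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127):
        assert matches_known_residue(p, 0), p
    # 2^11 - 1 = 2047 = 23 * 89 is composite, so its final residue is nonzero.
    print(hex(ll_res64(11)))
```

A run that silently went wrong partway through would almost certainly end on a different residue. Comparing against a known value therefore catches errors that per-iteration roundoff checks miss. It does not help with a worker that simply stops, which is the Skylake symptom discussed here.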
[QUOTE=chalsall;420871]Care to reference that article?
It would help to build "the case".[/QUOTE] I can't find it anymore. |
[QUOTE=Aurum;420875]I can't find it anymore.[/QUOTE]
Your dog ate your homework? |