#1 |
|
Einyen
Dec 2003
Denmark
3452₁₀ Posts |
During verification of the new prime (no spoilers), Prime95 on my Haswell-E 5960X suddenly stopped matching the interim residues of the others, without any errors shown. I had OutputRoundoff=1 in prime.txt, and the maximum roundoff was about 0.098. This happened twice with the first FFT size and once with a larger FFT size.
Then I added ErrorCheck=1 and SumInputsErrorCheck=1 to prime.txt and started a new run with the larger FFT. The residues matched up to iteration 23M, then I suddenly got 4 x roundoff error > 0.4. I think the max was 0.5, and it said "confidence in final result is low", but the interim 32M residue still matched the others. Very weird. I started a new run with a much larger FFT, still with ErrorCheck=1 and SumInputsErrorCheck=1, and I got roundoff > 0.4 a few times even before iteration 1M, but the 1M residue still matched the others. At this point I gave up and just finished my CudaLucas run.

I got this computer at the beginning of November and it has 30+ successful double checks, so I assumed one or more of my RAM sticks was FUBAR. I am running the XMP profile at 15-16-16-39 and 3000 MHz, and the processor is overclocked to 3500 MHz. I booted the computer from a USB stick with Memtest86, which tested all 32 GB of RAM, and it ran for ~36 hours without any errors, most of the time on all 8 cores. I booted back into Windows and ran ~45 hours of the Prime95 stress test without any errors! Now I'm running the verification again, and the residues match so far.

Can this be some error at boot-up that goes away again after a restart? I did restart my computer just before starting the initial verification run, and did not restart it again until the Memtest86 run. Any other ideas? I will do some more double checks soon and see what happens.

Last fiddled with by ATH on 2016-01-16 at 13:19 |
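For readers unfamiliar with comparing interim residues, here is a minimal Python sketch of the idea. It is not Prime95's code: it runs a plain Lucas-Lehmer test with big integers and records the low 64 bits of the residue at chosen iterations, so two runs can be compared checkpoint by checkpoint. The names `ll_interim_residues` and `first_mismatch` are hypothetical, and the iteration numbering used here is an assumption that may not match what Prime95 prints.

```python
def ll_interim_residues(p, every):
    """Lucas-Lehmer test for M_p = 2**p - 1 (p an odd prime).

    Returns a list of (iteration, low 64 bits of residue) taken every
    `every` iterations, plus the final iteration p - 2.
    """
    m = (1 << p) - 1
    s = 4
    checkpoints = []
    for i in range(1, p - 1):          # iterations 1 .. p-2
        s = (s * s - 2) % m
        if i % every == 0 or i == p - 2:
            checkpoints.append((i, s & 0xFFFFFFFFFFFFFFFF))
    return checkpoints


def first_mismatch(run_a, run_b):
    """Return the first iteration at which two runs disagree, or None."""
    for (it_a, res_a), (it_b, res_b) in zip(run_a, run_b):
        if it_a != it_b or res_a != res_b:
            return it_a
    return None


if __name__ == "__main__":
    mine = ll_interim_residues(13, 3)
    theirs = ll_interim_residues(13, 3)
    print("first mismatch:", first_mismatch(mine, theirs))
    print("M_13 is prime:", mine[-1][1] == 0)
```

A final residue of zero means the Mersenne number is prime. A mismatch at an earlier checkpoint tells you that one of the runs went wrong somewhere before that iteration, which is why interim residues are useful during a long verification.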
|
#2 |
|
Dec 2009
Peine, Germany
331 Posts |
If it has always been stable before and now fails without any change, I would suspect the power supply. I had a stable system which suddenly started showing sporadic errors in Prime95. This became worse and worse over the months and ended in several blue screens, even without load. Finally, I swapped my power supply and all was good again. I can't say whether there were residue errors without Prime95 showing an error, but I guess not all calculation errors lead to an error message (as you noticed).
The same applies to CUDALucas: before downclocking my Titan's memory from 3000MHz to 2600MHz, I never had a roundoff error message but ended up with wrong residues. |
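To illustrate why a roundoff check cannot catch everything, here is a small Python sketch. It is an illustration only, not the Prime95 or CUDALucas implementation. It squares a number through a floating-point DFT and reports how far the results are from integers, which is the kind of quantity a roundoff check looks at. The helper names `dft` and `square_with_roundoff` are made up for this example, and the naive O(n²) transform stands in for a real FFT.

```python
import cmath


def dft(values, inverse=False):
    """Naive discrete Fourier transform (O(n^2)), good enough for a demo."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t, v in enumerate(values):
            acc += v * cmath.exp(sign * 2j * cmath.pi * t * k / n)
        out.append(acc / n if inverse else acc)
    return out


def square_with_roundoff(digits):
    """Square a number given as little-endian digits using a float DFT.

    Returns (coefficients before carry propagation, max roundoff), where
    max roundoff is the largest distance of any result from an integer.
    """
    padded = list(digits) + [0] * len(digits)   # zero-pad to avoid wraparound
    spectrum = dft(padded)
    raw = dft([v * v for v in spectrum], inverse=True)
    rounded = [round(v.real) for v in raw]
    max_err = max(abs(v.real - r) for v, r in zip(raw, rounded))
    return rounded, max_err


if __name__ == "__main__":
    coeffs, err = square_with_roundoff([3, 2, 1])   # 123 in base 10
    print(coeffs, err)
```

A roundoff close to 0.5 means the rounding could have gone either way, so the result cannot be trusted. The reverse does not hold: a hardware fault that corrupts a value but still leaves every output close to an integer would presumably pass this check, which is consistent with wrong residues appearing without any roundoff message.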
|
#3 |
|
Einyen
Dec 2003
Denmark
2²×863 Posts |
Yeah, you might be right. It is just strange that all the testing did not give a single error, and now it seems fine again. I'll have to keep watching it and do some more double checks.
|
|
#4 |
|
"Oliver"
Mar 2005
Germany
5·223 Posts |
Link training usually gives different results (settings) on each (re-)boot for PCIe, DDRx, QPI, whatever links...

Oliver

P.S. I've seen Xeons with DDR3 reg. ECC memory throwing hundreds of correctable ECC errors per second. After the next reboot that number went down to maybe one error per hour, and on the boot after that the system refused to boot. This is of course an extreme example, not the general case.

Last fiddled with by TheJudger on 2016-01-16 at 15:47 |