#1 |
Nov 2022
12 Posts |
Getting hardware errors for the first time in years of running GIMPS. Can anyone tell me what this means? Clips from the messages:
[Nov 21 07:37] Resuming Gerbicz error-checking PRP test of M115299347 using FMA3 FFT length 6M, Pass1=1536, Pass2=4K, clm=1, 12 threads
[Nov 21 07:37] PRP proof using power=8 and 64-bit hash size.
[Nov 21 07:37] Proof requires 3.7GB of temporary disk space and uploading a 130MB proof file.
[Nov 21 07:37] Iteration: 93123971 / 115299347 [80.76%].
[Nov 21 07:37] Hardware errors have occurred during the test!
[Nov 21 07:37] 1 Gerbicz/double-check error.
[Nov 21 07:37] Confidence in final result is excellent.
[Nov 21 07:37] Iteration: 93130000 / 115299347 [80.77%], ms/iter: 3.391, ETA: 20:53:04
[Nov 21 07:37] Hardware errors have occurred during the test!
[Nov 21 07:37] 1 Gerbicz/double-check error.
[Nov 21 07:37] Confidence in final result is excellent.
[Nov 21 07:38] Iteration: 93140000 / 115299347 [80.78%], ms/iter: 4.655, ETA: 28:39:07
[Nov 21 07:38] Hardware errors have occurred during the test!
[Nov 21 07:38] 1 Gerbicz/double-check error.
[Nov 21 07:38] Confidence in final result is excellent.
[Nov 21 07:39] Iteration: 93150000 / 115299347 [80.78%], ms/iter: 5.218, ETA: 32:06:21
[Nov 21 07:39] Hardware errors have occurred during the test!
[Nov 21 07:39] 1 Gerbicz/double-check error.
[Nov 21 07:39] Confidence in final result is excellent.
#2 | |
Sep 2002
Database er0rr
5·29·31 Posts |
Quote:
Install some temperature measuring software. Things to watch out for are:
HTH, and I hope it allays your fears. Gerbicz EDAC is a recent addition to GIMPS software. May your number be prime. If not then there is consolation.
Last fiddled with by paulunderwood on 2022-11-21 at 18:44
#3 |
Mar 2021
Rockledge, Sunny FL
2·19 Posts |
I am also starting to get these errors. I did raise the CPU voltage on my AMD Ryzen 9 5900X slightly, but that should not be causing the error.
My memory screams, and my average peak speed across the twelve threads is 3,692 MHz. I am well below the max core temp at 87C with water cooling. I see 3 Gerbicz/double-check errors each iteration. Confidence is high, and it moves on to the next iteration, where it alerts again. This is interesting.
A snapshot:
CPU-Z data:
Last fiddled with by FlaJunkie on 2022-12-02 at 17:39 Reason: add pics
#4 |
Jan 2021
California
523 Posts |
#5 | |
Mar 2021
Rockledge, Sunny FL
2×19 Posts |
Quote:
And if I stop and restart the program, it still shows 3 errors. Seems a bit odd.
#6 | |
Jan 2021
California
20B₁₆ Posts |
Quote:
After resuming, it should continue to show the total number of errors that have occurred during the run.
Last fiddled with by slandrum on 2022-12-02 at 18:11
#7 | |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3⁴×7×13 Posts |
Quote:
The buffers that hold the prime95 worker window contents are of finite size. Runs can be very long, with many status updates, so an early or mid-run error message will no longer be in the buffer once newer updates fill it. The odds are quite low of a typical user seeing a message that appears only once, at the moment a GEC error or other error type is detected. (Even the most obsessed user must sleep sometime!)

And frequently, long runs are terminated before completion and resumed from the last save file, due to intentional program shutdown, power loss, hardware errors, Windows updates, etc. Worker window contents are not saved by the program at shutdown, or restored at relaunch.

And it's not each iteration, unless prime95 has been manually misconfigured to be seriously inefficient. As Paul's example shows, updates can be 10,000 or more iterations apart.

Code:
[Nov 21 07:37] Iteration: 93130000 / 115299347 [80.77%], ms/iter: 3.391, ETA: 20:53:04
[Nov 21 07:37] Hardware errors have occurred during the test!
[Nov 21 07:37] 1 Gerbicz/double-check error.
[Nov 21 07:37] Confidence in final result is excellent.
[Nov 21 07:38] Iteration: 93140000 / 115299347 [80.78%], ms/iter: 4.655, ETA: 28:39:07
[Nov 21 07:38] Hardware errors have occurred during the test!
[Nov 21 07:38] 1 Gerbicz/double-check error.
[Nov 21 07:38] Confidence in final result is excellent.

In a somewhat extreme example for total run time, below, it is 50,000 iterations and roughly an hour between status updates.

Code:
[Dec 1 12:21:36] Iteration: 66000000 / 550000007 [11.99%], ms/iter: 76.097, ETA: 426d 06:52
[Dec 1 12:23:08] Gerbicz error check passed at iteration 66000000.
[Dec 1 13:30:58] Iteration: 66050000 / 550000007 [12.00%], ms/iter: 80.856, ETA: 452d 21:28
[Dec 1 14:35:52] Iteration: 66100000 / 550000007 [12.01%], ms/iter: 77.266, ETA: 432d 17:48
[Dec 1 15:40:48] Iteration: 66150000 / 550000007 [12.02%], ms/iter: 77.546, ETA: 434d 06:24
[Dec 1 16:47:06] Iteration: 66200000 / 550000007 [12.03%], ms/iter: 79.164, ETA: 443d 06:49
[Dec 1 17:55:10] Iteration: 66250000 / 550000007 [12.04%], ms/iter: 81.102, ETA: 454d 02:04
[Dec 1 19:02:36] Iteration: 66300000 / 550000007 [12.05%], ms/iter: 80.503, ETA: 450d 16:31
[Dec 1 20:08:53] Iteration: 66350000 / 550000007 [12.06%], ms/iter: 79.146, ETA: 443d 01:05
[Dec 1 21:13:46] Iteration: 66400000 / 550000007 [12.07%], ms/iter: 77.448, ETA: 433d 11:52
[Dec 1 22:18:25] Iteration: 66450000 / 550000007 [12.08%], ms/iter: 77.197, ETA: 432d 01:00
[Dec 1 23:28:11] Iteration: 66500000 / 550000007 [12.09%], ms/iter: 83.183, ETA: 465d 11:52
[Dec 2 00:36:01] Iteration: 66550000 / 550000007 [12.09%], ms/iter: 81.022, ETA: 453d 08:35
[Dec 2 01:41:16] Iteration: 66600000 / 550000007 [12.10%], ms/iter: 77.904, ETA: 435d 20:45
[Dec 2 02:47:31] Iteration: 66650000 / 550000007 [12.11%], ms/iter: 78.826, ETA: 440d 23:31
[Dec 2 03:48:14] Iteration: 66700000 / 550000007 [12.12%], ms/iter: 72.477, ETA: 405d 10:00
[Dec 2 04:50:01] Iteration: 66750000 / 550000007 [12.13%], ms/iter: 73.777, ETA: 412d 15:34
[Dec 2 06:00:48] Iteration: 66800000 / 550000007 [12.14%], ms/iter: 84.367, ETA: 471d 19:56
[Dec 2 07:11:10] Iteration: 66850000 / 550000007 [12.15%], ms/iter: 84.054, ETA: 470d 00:48
[Dec 2 08:18:27] Iteration: 66900000 / 550000007 [12.16%], ms/iter: 80.315, ETA: 449d 01:47
[Dec 2 09:23:53] Iteration: 66950000 / 550000007 [12.17%], ms/iter: 77.964, ETA: 435d 21:11
[Dec 2 10:28:46] Iteration: 67000000 / 550000007 [12.18%], ms/iter: 77.506, ETA: 433d 06:42
[Dec 2 10:30:10] Gerbicz error check passed at iteration 67000000.
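For anyone who wants to check the spacing between status updates in their own worker window output, here is a minimal Python sketch. It assumes only the line layout visible in the logs above (`Iteration: <n> / <total> [..%], ms/iter: <x>`); the function name `update_intervals` is made up for illustration and is not part of prime95.

```python
import re

# Matches status lines of the form seen in the logs above, e.g.
# "Iteration: 66050000 / 550000007 [12.00%], ms/iter: 80.856, ETA: ..."
STATUS = re.compile(r"Iteration: (\d+) / \d+ \[[\d.]+%\], ms/iter: ([\d.]+)")


def update_intervals(log_text):
    """Return (iterations, estimated minutes) between consecutive status lines.

    The time estimate is iterations * ms/iter taken from the later line, so it
    ignores time spent outside the squaring loop (e.g. the Gerbicz check).
    """
    rows = [(int(m.group(1)), float(m.group(2))) for m in STATUS.finditer(log_text)]
    intervals = []
    for (prev_iter, _), (cur_iter, ms_per_iter) in zip(rows, rows[1:]):
        step = cur_iter - prev_iter
        intervals.append((step, step * ms_per_iter / 1000 / 60))
    return intervals


sample = (
    "[Dec 1 12:21:36] Iteration: 66000000 / 550000007 [11.99%], ms/iter: 76.097, ETA: 426d 06:52\n"
    "[Dec 1 12:23:08] Gerbicz error check passed at iteration 66000000.\n"
    "[Dec 1 13:30:58] Iteration: 66050000 / 550000007 [12.00%], ms/iter: 80.856, ETA: 452d 21:28\n"
)
for step, minutes in update_intervals(sample):
    print(f"{step} iterations, about {minutes:.1f} minutes")
```

On the two status lines in `sample` this reports a 50,000-iteration gap of a bit over an hour, which is in line with the timestamps in the log.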
#8 | |
Mar 2021
Rockledge, Sunny FL
46₈ Posts |
Quote:
I agree, but the screen uses the word "Iteration:" and that's what I meant. Once again, thanks for the comments. Very interesting. Now I need to figure out why my high-powered machine burped during the program.
#9 | |||
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3⁴×7×13 Posts |
Yes, but I would advise against that. Reducing its verbosity is also possible. Read undoc.txt. Also, we can increase the number of iterations between worker window updates, which would let both the user and the program use their time more efficiently.
Quote:
Quote:
Quote:
Have fun, and happy sleuthing for what caused the errors. Backing off from pushing the hardware beyond its default clocking or voltage is a place to start.

I don't get too concerned about a few errors in a PRP run. But if a system is error prone, as shown by PRP/GEC's excellent detection (and rewind to known-good point) rate, that system is not a candidate for work with less reliable error detection, such as LLDC or P-1 factoring.

Strictly speaking, I think the GEC error count is a count of detections of errors. (If multiple errors occur within a single check period, which hopefully is rare, I think it detects and counts one check mismatch, then goes back and tries again from the last known good save file and its iteration number and stored interim residue.)

Last fiddled with by kriesel on 2022-12-02 at 20:16
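To make the "detect, count one mismatch, rewind to the last known-good point" behaviour concrete, here is a toy Python sketch of a Gerbicz-style check on a repeated-squaring chain mod 2^p − 1. It is not prime95's code: the function name, the `fault_at` knob that injects a single bit flip, and the choice to verify at every block boundary are all assumptions made for illustration. The check relies on the identity d_t = u_0 · d_(t−1)^(2^L) mod N, where u_i = base^(2^i) and d_t is the running product of u at every L-th iteration.

```python
def prp_with_gec(p, iterations, block, fault_at=None, base=3):
    """Toy squaring chain mod 2^p - 1 with a Gerbicz-style error check.

    Returns (final residue, number of detected check mismatches).
    `fault_at` (hypothetical) flips one bit at that iteration, once.
    """
    n = (1 << p) - 1
    u0 = base % n
    u = u0              # u_i = base^(2^i) mod n
    d = u0              # d_t = u_0 * u_L * ... * u_(tL) mod n
    i = 0
    saved = (u, d, i)   # last known-good state, standing in for a save file
    detected = 0
    while i < iterations:
        u = u * u % n
        i += 1
        if i == fault_at:
            u ^= 1            # simulate a transient hardware error
            fault_at = None   # it does not recur after the rewind
        if i % block == 0:
            d_new = d * u % n
            # Identity check: d_t == u_0 * d_(t-1)^(2^L) mod n
            if d_new == u0 * pow(d, 1 << block, n) % n:
                d = d_new
                saved = (u, d, i)
            else:
                detected += 1     # one count per failed check
                u, d, i = saved   # rewind and redo the block
    return u, detected


residue, errors = prp_with_gec(127, 1000, 100, fault_at=537)
print(errors, residue == pow(3, 1 << 1000, (1 << 127) - 1))
```

In this toy, any number of corrupted iterations inside one block would still produce a single mismatch and a single increment of the counter, which matches the reading above that the count is of detections. As I understand it, real implementations update the running product often but run the more expensive verification much less frequently; the sketch verifies every block only to stay short.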
#10 |
Mar 2021
Rockledge, Sunny FL
2×19 Posts |
Thanks for the responses.
With your explanations, I have theorized that the hardware failure I caused by overclocking (OC) while running the program earlier must have been recorded by Prime95. I had raised the OC parameters to give a ~5,000 MHz peak speed across all 12 threads. The system rebooted within 5 minutes while the Prime95 program was operating. I reset the values, except that I kept the built-in OC function enabled with an increased core voltage. The program and computer have worked well since. The screens I posted earlier show the current settings. The errors reported by Prime95 were probably generated during the OC failure.