#331 | |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
4,903 Posts |
![]() Quote:
#332 |
Aug 2013
3×29 Posts |
Double checking PRP with build 10 on my “bad PRP” machine. Had to turn off Gerbicz verbosity 3 because progress had advanced just 0.74% overnight. Turned it off and speeds returned to normal.
#333 | |
P90 years forever!
Aug 2002
Yeehaw, FL
1CB6₁₆ Posts |
![]() Quote:
Please send the results.txt file to me. |
#334 | |
Sep 2003
2583₁₀ Posts |
![]() Quote:
A string buffer overflow somewhere in all those additional verbose strings, or a change in the patterns of register usage, if registers aren't being saved and restored correctly. That might even tie in with errors happening in the end-of-PRP-test final processing, because new strings get printed at that point, other than the usual iteration count lines. |
#335 | |
Aug 2013
57₁₆ Posts |
![]() Quote:
I also removed the "Gerbicz offset" line. Maybe that affected things? Every half second it was printing "Gerbicz checking iteration X", "Test passed", or whatever. After 7 hours it had advanced less than 1%. I'll forward the results.txt file to you.

I will keep that individual machine "broken" for the time being, but I finally "fixed" the other three, meaning I found CPU/RAM settings that give stable AVX512 tests in AIDA64. The CPU was originally at 4.1 GHz, and that was failing. Tried 3.9 GHz: failed. Tried 3.8 GHz: failed. Tried 3.7 GHz: that failed too. Finally, Mystical suggested undoing the XMP settings for RAM (3600 MHz @ 1.35 V and 19-20-20-40), and I discovered (to my surprise) that a 3.8 GHz CPU with stock RAM settings (2000 MHz) was AVX512-stable in AIDA64 for 37 hours on all 3 machines. SUCCESS!

So now I'm leaving the "broken" PRP double-check system the way it is for testing purposes (it's currently doing a PRP double-check on build 10), but I'm using the other three machines to try different RAM speeds to see which work. Right now I have one still at stock RAM (2000 MHz), another at 3000 MHz (default CAS latency, which I think is 15), and another at 3400 MHz (with 1.3 V and 19-20-20-40 like XMP suggests). We'll see which are stable after 24 hours.
#336 |
P90 years forever!
Aug 2002
Yeehaw, FL
2·3·5²·7² Posts |
This line from Simon's results.txt file is ominous:
Code:
[Fri Feb 08 21:13:48 2019] Start Gerbicz block of size 0 at iteration 0.
Simon, do you know if this was build 9 or build 10?
Also, early last month we see:
Code:
[Fri Jan 04 03:13:35 2019] Iteration: 1/79048733, Possible error: round off (0.193936384) > 0
#337 | |
P90 years forever!
Aug 2002
Yeehaw, FL
1CB6₁₆ Posts |
![]() Quote:
Code:
PRPGerbiczCompareIntervalAdj= -206254
Build 10 would have caught this and set block size to 16 -- which also explains why build 10 was running so slowly for Simon. I'm adding checks for bogus adjustment values.
#338 |
Sep 2016
331 Posts |
Nice! Just like that, it sounds like both software and hardware problems are now resolved?
#339 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
4,903 Posts |
Sanity checks on anything a user could get his hands on are probably a good idea, even if the issue turns out to be created by some unforeseen combination of code behavior and unexpected hardware error rather than user error. Asking what characters a creative user could possibly enter or store here, and what effects those might have, removes any filtering expectations about what the program will, won't, or couldn't put there.
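To make that concrete, here is a minimal sketch of treating a settings line as untrusted text. It is not Prime95's code: the helper names `parse_setting` and `validated_float`, the bounds, and the fallback default are all assumptions chosen for illustration.

```python
import math

def parse_setting(line):
    """Split a 'Key=Value' line into (key, raw_value), trimming whitespace."""
    key, sep, raw = line.partition("=")
    if not sep:
        raise ValueError(f"not a key=value line: {line!r}")
    return key.strip(), raw.strip()

def validated_float(raw, low, high, default):
    """Return raw as a float if it is finite and within [low, high], else default."""
    try:
        value = float(raw)
    except ValueError:
        return default  # not a number at all
    if not math.isfinite(value) or not (low <= value <= high):
        return default  # NaN, infinity, or out of range
    return value

# The setting line quoted earlier in the thread; bounds and default are assumed.
key, raw = parse_setting("PRPGerbiczCompareIntervalAdj= -206254")
adj = validated_float(raw, low=0.001, high=1.0, default=1.0)
```

The point of the sketch is that the check makes no assumption about who or what wrote the value: a typo, a stray edit, and a hardware glitch all get rejected the same way.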
#340 |
P90 years forever!
Aug 2002
Yeehaw, FL
1110010110110₂ Posts |
There are now 4 separate sanity checks that would have caught Simon's problem.
1) The adjustment value is forced to be between 0.001 and 1.0.
2) The Gerbicz block size is forced to be between 25 and the number of iterations remaining.
3) If the iteration counter somehow gets past the end of the Gerbicz block, a rollback occurs.
4) If the PRP test completes and the internal PRP state is "in the middle of a Gerbicz block", a rollback occurs.
More sanity checks are on the way, as well as protection against the copying errors discussed in another thread.
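The four checks above can be sketched roughly as follows. This is an illustration, not the actual Prime95 source: every function name is hypothetical, and the handling of fewer than 25 remaining iterations (the lower bound shrinks to match) is an assumption the post does not cover.

```python
def clamp(value, low, high):
    """Force value into the closed interval [low, high]."""
    return max(low, min(value, high))

def sane_adjustment(adj):
    # Check 1: adjustment value forced between 0.001 and 1.0.
    return clamp(adj, 0.001, 1.0)

def sane_block_size(size, iterations_remaining):
    # Check 2: block size forced between 25 and the iterations remaining.
    # Assumption: with fewer than 25 iterations left, the lower bound shrinks.
    low = min(25, iterations_remaining)
    return clamp(size, low, iterations_remaining)

def needs_rollback(iteration, block_end, test_finished, in_block):
    # Check 3: the iteration counter ran past the end of the current block.
    # Check 4: the test finished while still in the middle of a block.
    return iteration > block_end or (test_finished and in_block)
```

With these, the bogus adjustment of -206254 would be clamped to 0.001, and a block size of 0 would become 25 instead of being used as-is.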
#341 |
Banned
"Luigi"
Aug 2002
Team Italia
2⁶·3·5² Posts |
I was doing a double-check on my 9800X (4 threads used by Prime95, 4 threads used by another sieving program) when I got the following message:
Code:
[Work thread Feb 10 11:49] Running Jacobi error check. Failed. Time: 11.230 sec.
[Work thread Feb 10 11:50] Iteration: 17409895/47905967, ERROR: Jacobi error check failed!
[Work thread Feb 10 11:50] Continuing from last save file.
[Work thread Feb 10 11:50] Setting affinity to run helper thread 1 on CPU core #3
[Work thread Feb 10 11:50] Setting affinity to run helper thread 2 on CPU core #4
[Work thread Feb 10 11:50] Setting affinity to run helper thread 3 on CPU core #5
[Work thread Feb 10 11:50] Running Jacobi error check. Failed. Time: 11.132 sec.
[Work thread Feb 10 11:50] Error reading intermediate file: p9M05967
[Work thread Feb 10 11:50] Renaming p9M05967 to p9M05967.bad1
[Work thread Feb 10 11:50] Trying backup intermediate file: p9M05967.bu
[Work thread Feb 10 11:50] Running Jacobi error check. Failed. Time: 11.175 sec.
[Work thread Feb 10 11:50] Error reading intermediate file: p9M05967.bu
[Work thread Feb 10 11:50] Renaming p9M05967.bu to p9M05967.bad2
[Work thread Feb 10 11:50] Trying backup intermediate file: p9M05967.bu2
[Work thread Feb 10 11:50] Running Jacobi error check. Failed. Time: 11.209 sec.
[Work thread Feb 10 11:50] Error reading intermediate file: p9M05967.bu2
[Work thread Feb 10 11:50] Renaming p9M05967.bu2 to p9M05967.bad3
[Work thread Feb 10 11:50] All intermediate files bad. Temporarily abandoning work unit.
What should I do with the *.bad savefiles?
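For anyone following the log, the fallback sequence it shows (try the save file, then `.bu`, then `.bu2`, renaming each failure to `.badN`) can be sketched roughly like this. It only mirrors the behavior visible in the log and is not Prime95's implementation: `load_with_fallback` and the `is_valid` callback are hypothetical, and numbering the `.bad` files by failure count is an assumption.

```python
import os
import tempfile

def load_with_fallback(base, is_valid):
    """Try base, base.bu, base.bu2 in order; rename each failing file to base.badN.

    Returns the first path that passes is_valid, or None if all of them fail.
    """
    candidates = [base, base + ".bu", base + ".bu2"]
    bad_count = 0
    for path in candidates:
        if not os.path.exists(path):
            continue
        if is_valid(path):
            return path
        bad_count += 1
        os.rename(path, f"{base}.bad{bad_count}")
    return None

# Demo: three placeholder save files that all fail the (stand-in) error check.
with tempfile.TemporaryDirectory() as workdir:
    base = os.path.join(workdir, "p9M05967")
    for suffix in ("", ".bu", ".bu2"):
        with open(base + suffix, "wb") as f:
            f.write(b"placeholder")
    result = load_with_fallback(base, is_valid=lambda path: False)
    leftover = sorted(os.listdir(workdir))
```

In the demo every candidate fails, so `result` is `None` and only the renamed `.bad1`, `.bad2`, and `.bad3` files remain, matching the "All intermediate files bad" outcome in the log.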