#1 |
Oct 2021
U. S. / Maine
2·73 Posts |
First off: GPUOwl 7.2.63 on up-to-date Windows 10.
I decided today to try a modest undervolt for GPUOwl on my Radeon 5700 XT. I set the voltage I wanted in Radeon Software and -log 10000 in config.txt, then began stepping the clock by 25 MHz, starting GPUOwl, and watching for GEC failures. (I did this with the exponent I was already working on rather than loading a test exponent, because I figured the GEC failure rollback would save me, especially with a 10,000-iteration check interval.) I kept doing this until I saw some failures, then backed off one step and started GPUOwl again for a longer burn-in test.

Upon starting up this time, I got a failed proof residue validation. But the problematic residue was stated to be from much earlier in the test, well before I started screwing with anything (I was at iteration 33,xxx,xxx and the mismatch was reported in a residue from around 15,xxx,xxx). Also, GPUOwl automatically tried validating my residues for the next proof power down (I use 10, it tried 9), and that passed and the test resumed.

What happened here? I tried deleting every save file and temporary proof file created after I started my undervolting process, and also reverting to my 5700 XT's base voltage and clock, but neither resolved anything. At that point I didn't want to risk turning in a bad proof, so I cut my losses and unreserved the exponent, but I'm still very curious about exactly how this problem arose. All input appreciated.
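The step-and-back-off procedure described above can be sketched as a small search loop. This is only an illustration: `passes_check` is a hypothetical callback standing in for "run GPUOwl at this clock and see whether any GEC failure shows up", and the sketch assumes the clock is stepped upward at a fixed voltage, which the post does not state explicitly.

```python
from typing import Callable, Optional


def find_stable_clock(start_mhz: int, max_mhz: int, step_mhz: int,
                      passes_check: Callable[[int], bool]) -> Optional[int]:
    """Step the clock until the check first fails, then back off one step.

    passes_check is a hypothetical callback: it should run the workload at
    the given clock and return False if any error (e.g. a GEC failure) was seen.
    Returns the last clock that passed, or None if even the first one failed.
    """
    last_good = None
    clock = start_mhz
    while clock <= max_mhz:
        if not passes_check(clock):
            break  # first failing step: stop and keep the previous value
        last_good = clock
        clock += step_mhz
    return last_good
```

A call such as `find_stable_clock(1800, 2100, 25, my_check)` would return the highest clock that passed before the first failure. The clock values here are made up for the example and are not settings taken from the thread.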
#2 |
Oct 2021
U. S. / Maine
10010010₂ Posts |
I realize I was being a bit unspecific. Here is the actual GPUOwl printout, copied from my log file:
Code:
114482779 OK 38410000 on-load: blockSize 400, 42fed0e4b7671ebf
114482779 validating proof residues for power 10
114482779 checksum 13cd8e0d (expected 3bb9aafa) in '.\114482779\proof\15540145'
114482779 validating proof residues for power 9
114482779 Proof using power 9 (vs 10) for 114482779

This is from after I took the step of deleting the noted files, but the error appeared exactly the same before that, right down to the expected and actual checksums.
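For anyone wanting to scan a long log for this kind of line, here is a minimal sketch that pulls out checksum mismatches. The pattern is written against the single mismatch line quoted above; whether other GPUOwl versions format the line identically is an assumption, and the function name is made up for the example.

```python
import re

# Pattern modeled on the one mismatch line quoted in this post (an assumption
# about the general format, not taken from GPUOwl's source).
CHECKSUM_LINE = re.compile(
    r"^(?P<exponent>\d+) checksum (?P<actual>[0-9a-f]{8}) "
    r"\(expected (?P<expected>[0-9a-f]{8})\) in '(?P<path>[^']+)'$"
)


def find_checksum_mismatches(log_text):
    """Return one dict per log line that reports a checksum mismatch."""
    mismatches = []
    for line in log_text.splitlines():
        match = CHECKSUM_LINE.match(line.strip())
        if match:
            mismatches.append(match.groupdict())
    return mismatches


sample = r"""114482779 validating proof residues for power 10
114482779 checksum 13cd8e0d (expected 3bb9aafa) in '.\114482779\proof\15540145'
114482779 validating proof residues for power 9"""

for hit in find_checksum_mismatches(sample):
    print(hit["path"], "got", hit["actual"], "expected", hit["expected"])
```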
#3 |
Oct 2021
U. S. / Maine
222₈ Posts |
I may have an insight. I reviewed further up in my log to see if I could notice anything about the run just before the problem started appearing. It turns out, I had killed the process when it was in the middle of validating proof residues (I caught an incorrectly set Radeon Software parameter and needed to fix it). Is there a possibility that the problematic residue is the one the program was then in the middle of reading, and aborting the operation corrupted it?
I will feel stupid if this turns out to have nothing to do with undervolting (although I'm now even more confused about how the bad residue still validated for proof power 9).
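One possible explanation for the power-9 pass is that a power-9 proof simply does not use the residue at 15540145. The sketch below assumes that proof checkpoints are placed by repeatedly halving the exponent (rounding up) and summing the resulting spans. That layout is an assumption, not something confirmed in this thread, although it does reproduce the file name 15540145 exactly for power 10.

```python
def proof_points(exponent, power):
    """Iterations at which proof residues would be saved.

    Assumed layout: start from 0 and, for each power level, add the current
    span to every existing point, halving the span (rounding up) each time.
    """
    points = [0]
    span = (exponent + 1) // 2
    for _ in range(power):
        points += [p + span for p in points]
        span = (span + 1) // 2
    return set(points)


E = 114482779
bad = 15540145

in_power_10 = bad in proof_points(E, 10)
in_power_9 = bad in proof_points(E, 9)
print(in_power_10, in_power_9)  # prints: True False
```

Under that assumption the file at 15540145 is one of the extra checkpoints that only exist at power 10, so a power-9 validation would never read it.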
#4 |
Jul 2003
So Cal
3×13×61 Posts |
A stab in the dark... Perhaps the residues were fine, but the voltage was still too low and an error happened during the validation.
#5 |
Oct 2021
U. S. / Maine
92₁₆ Posts |
#6 |
"Mihai Preda"
Apr 2015
2·691 Posts |
I don't understand what happened. The proof residues are written only once; afterwards they are only read. The check of residues at startup is done CPU-side only (it's a very simple checksum over the file). But if the check is suspect, a simple restart would redo the check of the proof files, and if the outcome is reproducible then it's reliable.
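The "repeat the check and see if it is reproducible" idea can also be tried outside GPUOwl with a few lines of Python. This is a rough sketch only: CRC-32 over the whole file is a stand-in, since the thread does not say which checksum GPUOwl uses or which bytes of the file it covers, so the value printed here should not be expected to match the one in the log.

```python
import zlib


def file_crc32(path, chunk_size=1 << 20):
    """CRC-32 of a file's contents, read in chunks (stand-in checksum)."""
    crc = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def is_reproducible(path, runs=3):
    """True if re-reading the file gives the same checksum every time."""
    results = {file_crc32(path) for _ in range(runs)}
    return len(results) == 1
```

If `is_reproducible` returns True for the suspect file, the bytes on disk are at least being read back consistently, which would point at the stored content rather than at a transient read error.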
I understand that you did restart the process a few times, and it did check the 15540145 proof file as correct, only for it to turn bad at some later point. That's strange, because I don't expect that file to mutate. Anyway, you're lucky because you can still generate a power-9 proof, which is perfectly fine. If you still have the data around, I'd suggest you finish the exponent; the proof should be good. OTOH it's true, it's a problem that I don't see how that file error appeared.
Last fiddled with by preda on 2021-11-19 at 14:50