mersenneforum.org Very strange proof problem

2021-10-28, 19:50   #1
techn1ciaN

Oct 2021
U. S. / Maine

2·73 Posts

Very strange proof problem

First off: GPUOwl 7.2.63 on up-to-date Windows 10.

I decided today to try a modest undervolt for GPUOwl on my Radeon 5700 XT. I set the voltage I wanted in Radeon Software and -log 10000 in config.txt, then began stepping the clock in 25 MHz increments, starting GPUOwl, and watching for GEC failures. (I did this with the exponent I was already working on rather than loading a test exponent, because I figured the GEC failure rollback would save me, especially with a 10,000-iteration check interval.) I kept doing this until I saw some failures, then backed off one step and started GPUOwl again for a longer burn-in test.

Upon starting up this time, I got a failed proof residue validation. However, the problematic residue was reported to be from much earlier in the test, well before I started messing with anything (I was at iteration 33,xxx,xxx, and the mismatch was reported in a residue from around 15,xxx,xxx). Also, GPUOwl automatically tried validating my residues for the next proof power down (I use 10; it tried 9), and that passed and the test resumed.

What happened here? I tried deleting every save file and temporary proof file written after I started undervolting, and also reverting my 5700 XT to its base voltage and clock, but neither resolved anything. At that point I didn't want to risk turning in a bad proof, so I cut my losses and unreserved the exponent, but I'm still very curious about exactly how this problem arose. All input appreciated.
2021-10-28, 20:21   #2
techn1ciaN

Oct 2021
U. S. / Maine

10010010₂ Posts

I realize I was a bit vague. Here is the actual GPUOwl printout, copied from my log file:

Code:
    114482779 OK 38410000 on-load: blockSize 400, 42fed0e4b7671ebf
    114482779 validating proof residues for power 10
    114482779 checksum 13cd8e0d (expected 3bb9aafa) in '.\114482779\proof\15540145'
    114482779 validating proof residues for power 9
    114482779 Proof using power 9 (vs 10) for 114482779

(I slightly misremembered how far along the test was.) This is from after I deleted the files mentioned above, but the error looked exactly the same before that, right down to the expected and actual checksums.
2021-10-28, 20:57   #3
techn1ciaN

Oct 2021
U. S. / Maine

222₈ Posts

I may have an insight. I looked further up in my log to see whether anything stood out about the run just before the problem started appearing. It turns out I had killed the process while it was in the middle of validating proof residues (I had caught an incorrectly set Radeon Software parameter and needed to fix it). Is it possible that the problematic residue is the one the program was reading at that moment, and that aborting the operation corrupted it? I will feel stupid if this turns out to have nothing to do with undervolting (although I'm now even more confused about how the bad residue still validated for proof power 9).
2021-10-28, 21:42   #4
frmky

Jul 2003
So Cal

3×13×61 Posts

A stab in the dark... Perhaps the residues were fine, but the voltage was still too low and an error happened during the validation.
2021-10-28, 21:58   #5
techn1ciaN

Oct 2021
U. S. / Maine

92₁₆ Posts

Quote:
Originally Posted by frmky
A stab in the dark... Perhaps the residues were fine, but the voltage was still too low and an error happened during the validation.

Dubious. I applied my 5700 XT's default clock and voltage and restarted GPUOwl, but got the same error with the same bad checksum.

2021-11-19, 14:49   #6
preda

"Mihai Preda"
Apr 2015

2·691 Posts

Quote:
Originally Posted by techn1ciaN
What happened here?
I don't understand what happened. The proof residues are written only once; afterwards they are only read. The check of the residues at startup is done CPU-side only (it's a very simple checksum over the file). If the check itself is suspect, a simple restart will redo the check of the proof files, and if the outcome is reproducible, then it's reliable.
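The startup check is described above only as "a very simple checksum over the file". The Python sketch below shows the general shape of such a check. GPUOwl's actual checksum function and proof-file layout are not given in this thread, so `zlib.crc32` is only a stand-in, and `file_checksum` and `validate_residue` are hypothetical names. What it illustrates is that the check is a pure function of the bytes on disk, so re-running it on an unchanged file must give the same answer.

```python
import zlib
from pathlib import Path


def file_checksum(path: Path) -> int:
    """32-bit checksum over the whole file (CRC-32 used as a stand-in)."""
    crc = 0
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so a large residue file need not fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def validate_residue(path: Path, expected: int) -> bool:
    """Hypothetical check: print a mismatch line and return False on failure."""
    actual = file_checksum(path)
    if actual != expected:
        print(f"checksum {actual:08x} (expected {expected:08x}) in '{path}'")
        return False
    return True
```

Because the result depends only on the file contents and the recorded expected value, a mismatch that reproduces across restarts with identical numbers, as reported in posts #2 and #5, points at the stored data rather than at a transient error during the check itself.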

I understand that you restarted the process a few times, and that it checked the 15540145 proof file as correct, only for it to turn bad at some later point. That's strange, because I don't expect that file to mutate.

Anyway, you're lucky, because you can still generate a power-9 proof, which is perfectly fine. If you still have the data around, I'd suggest you finish the exponent; the proof should be good.
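Post #3 asks how the power-9 validation could pass when power 10 failed. One possible explanation, offered only as a sketch: suppose the proof checkpoints are laid out by starting with a span of (E + 1)/2 iterations and halving it (rounding up) at each power level, with every checkpoint being a sum of a subset of those spans. This is believed to match how GPUOwl places its proof residues, but that is an assumption and is not stated in this thread. `proof_points` is a hypothetical helper. Under that assumption, the power-9 checkpoints are a subset of the power-10 ones, and the flagged iteration 15540145 for exponent 114482779 belongs only to the power-10 set.

```python
def proof_points(exponent: int, power: int) -> set[int]:
    """Checkpoint iterations for a proof of the given power.

    Assumption (not confirmed by the thread): spans start at (E + 1) // 2
    and are halved, rounding up, once per power level.
    """
    points = [0]
    span = (exponent + 1) // 2
    for _ in range(power):
        # Each level doubles the point set by offsetting every existing point.
        points += [p + span for p in points]
        span = (span + 1) // 2
    return set(points)


E = 114482779
p10 = proof_points(E, 10)
p9 = proof_points(E, 9)

print(len(p10), len(p9))   # 1024 512
print(p9 <= p10)           # True: power-9 points are a subset of power-10 points
print(15540145 in p10)     # True
print(15540145 in p9)      # False: a power-9 proof never reads this file
```

Under this layout, 15540145 = 14310348 + 894397 + 223600 + 111800, and the last term is the span that only the tenth power level adds. If the assumption holds, the power-9 validation simply never looks at the flagged file, which fits both the log in post #2 and the advice above that the power-9 proof should be good.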

OTOH, it's true that this is a problem: I don't see how that file error appeared.

Last fiddled with by preda on 2021-11-19 at 14:50

All times are UTC.