#1 |
Sep 2017
USA
3328 Posts |
I recently started a new-to-GIMPS machine on first-time PRP tests. On the assignment rules page, I have it set so that the machine should do one matching double-check yearly. New LLing machines start out with a double check. However, this machine jumped right into a first-time PRP test.
I understand that the Gerbicz error-check for PRPs is very reliable. But wouldn't it be prudent for each PRPing machine to still do an occasional double check? Or is the error check just that good? Thanks! |
#2 |
P90 years forever!
Aug 2002
Yeehaw, FL
13·563 Posts |
Your observation is correct.
In theory, the Gerbicz error-check is so good that an undetected error is virtually impossible. Thus, if your machine is not quite stable, you should see some error messages during the test. |
#3 |
P90 years forever!
Aug 2002
Yeehaw, FL
13×563 Posts |
That said, prime95's implementation of the Gerbicz error check and recovery is flawed somehow. I'm investigating now. It will be fixed in version 29.6.
|
#4 |
Sep 2017
USA
2×109 Posts |
Interesting. Thank you.
Are current PRP tests considered reliable? Would you recommend going back to LLing for now? Thanks in advance! |
#5 |
P90 years forever!
Aug 2002
Yeehaw, FL
13×563 Posts |
PRP tests are more reliable than LL. Carry on. I'll post more when I've finished debugging.
|
#6 |
P90 years forever!
Aug 2002
Yeehaw, FL
13·563 Posts |
Gerbicz PRP investigation thus far:

1) A bug was fixed on Aug. 20, 2018 where a Gerbicz check could erroneously succeed (both values contained an invalid floating point value like INF or NaN). For testing, I tweaked the gwnum code to spit out bad values randomly 2% of the time.
2) If one set GerbiczCompareInterval=100 in prime.txt, then the automatic adjusting code could eventually reduce the compare interval to zero, which resulted in no error checking. In v29.6 the interval will not be allowed to get below 16.
3) If roundoff checking is enabled, there is a bug recovering from intermediate files that will eventually roll back the PRP test to the beginning. I haven't fixed this yet.

Currently, I've successfully run PRP tests of 19937 and 44497. Simon Cunningham's failed test of M79075979 remains unexplained.

Last fiddled with by Prime95 on 2019-02-05 at 18:58 |
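
Items 1 and 2 above suggest two small guards. What follows is a minimal Python sketch, not prime95 or gwnum code: `values_match`, `clamp_compare_interval`, and `MIN_COMPARE_INTERVAL` are hypothetical names, and the post does not say how the faulty comparison was actually written. The sketch only shows that a comparison can report equality on NaN data (here through Python's identity shortcut in list comparison) unless finiteness is checked first, and that a floor of 16 keeps an auto-adjusted interval from reaching zero.

```python
import math
from typing import Sequence

MIN_COMPARE_INTERVAL = 16  # floor quoted in the post for v29.6


def clamp_compare_interval(interval: int) -> int:
    """Keep an auto-adjusted interval from dropping below the floor."""
    return max(MIN_COMPARE_INTERVAL, interval)


def values_match(a: Sequence[float], b: Sequence[float]) -> bool:
    """Equal only if both vectors are fully finite and element-wise equal."""
    if len(a) != len(b):
        return False
    if not all(math.isfinite(x) for x in a):
        return False
    if not all(math.isfinite(x) for x in b):
        return False
    return list(a) == list(b)


nan = float("nan")
# Naive comparison: the same NaN object on both sides compares equal,
# because list comparison short-circuits on identity.
naive_equal = [nan] == [nan]
# Guarded comparison: NaN is rejected before any equality test.
guarded_equal = values_match([nan], [nan])
```

With the guard in place, a block whose values are INF or NaN is treated as a failed check rather than a passed one.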
#7 | |
Sep 2003
3²×7×41 Posts |
![]() Quote:
Say, a memory corruption bug that zeroed out the compare interval? Instead of using zero as the value that means no error checking, maybe it should be some specific randomly chosen magic 64-bit constant. And similar for any other variable that could lead to error checking being turned off. |
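
The sentinel idea can be sketched as follows. This is only an illustration of the suggestion, not anything from prime95: `DISABLE_CHECKS` and `error_checking_enabled` are hypothetical names, and the constant is an arbitrary 64-bit pattern. The point is that a zeroed (or otherwise corrupted) interval is reported as an error instead of silently meaning "no checking".

```python
# Hypothetical magic value; any fixed, unlikely 64-bit pattern would do.
DISABLE_CHECKS = 0x5DEECE66DA3C91F7


def error_checking_enabled(compare_interval: int) -> bool:
    """Return False only for the explicit sentinel; reject zero and negatives."""
    if compare_interval == DISABLE_CHECKS:
        return False  # checking was turned off on purpose
    if compare_interval <= 0:
        # Zero no longer means "off", so it can only be corruption or a bug.
        raise ValueError(f"corrupt compare interval: {compare_interval}")
    return True
```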
|
#8 |
P90 years forever!
Aug 2002
Yeehaw, FL
13×563 Posts |
Another PRP bug fixed. The routine that calculated interim and final residues was not checking the error code from converting the FFT data to binary. The subsequent rotating of the binary data (to undo the shift count) could corrupt memory.
Triggering this bug always caused a crash for me -- not an incorrect final result. I do not think this is related to Simon's problem. |
#9 |
P90 years forever!
Aug 2002
Yeehaw, FL
13×563 Posts |
In my review of the code I believe the biggest vulnerability is at the start of each Gerbicz block. At that point in time there is only one gwnum value sitting in memory. If there is an error reading that value then the final result will be incorrect and undetected.
I believe gpuowl found a way around this vulnerability. Time to dig through preda's forum messages. |
#10 | |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3²·7²·11 Posts |
![]() Quote:
A->B->C; compare A and C to check that B arrived without error, or at least without detectable error.

My notes on the gpuowl development thread say: post 727, preda, re GEC mechanism (gpu-cpu-gpu copy) http://www.mersenneforum.org/showthread.php?t=22204

I did something related long ago to check the disk read/write error rate: large file A->B, many iterations of copy B->C->D->B, then compare A and B.

Last fiddled with by kriesel on 2019-02-09 at 19:58 |
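
The disk test described above could look roughly like this in Python. It is a sketch under assumptions: the file names `B.bin`, `C.bin`, `D.bin` and the helpers `file_digest` and `disk_round_trip_ok` are made up for illustration, and the original test may well have compared the files byte by byte rather than by hash. Note also that the operating system's cache can satisfy reads without touching the disk, so a real test would need files larger than RAM or some way to bypass the cache.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in chunks so large files fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def disk_round_trip_ok(source: Path, workdir: Path, iterations: int = 3) -> bool:
    """Copy A->B, cycle B->C->D->B `iterations` times, then compare A and B."""
    b, c, d = (workdir / name for name in ("B.bin", "C.bin", "D.bin"))
    shutil.copyfile(source, b)
    for _ in range(iterations):
        shutil.copyfile(b, c)
        shutil.copyfile(c, d)
        shutil.copyfile(d, b)
    return file_digest(source) == file_digest(b)
```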
|
#11 | |
"Mihai Preda"
Apr 2015
2·23·29 Posts |
![]() Quote:
As schematic pseudocode, for block size L=1000, doing the check every L2=L^2=1M iterations (though the check can be done at any multiple of L, not only L^2), with Base 3:

Code:

    [init]
    Data:=3
    Check:=3

    [one block]
    repeat L times: Data:=Data^2
    if is-time-to-check:
        Tmp:=Check
        repeat L times: Tmp:=Tmp^2
        Tmp:=Tmp * 3
        Check:=Check * Data
        OK:= (Tmp == Check)
    else: // don't check yet
        Check:=Check * Data

    [repeat one block]

The round-tripping trick I use to work around data corruption during the GPU<->CPU transfer is:

0. initially data is GPU-side
1. read data to CPU
2. write what was just read back to GPU (from CPU)
3. do the check on GPU.

If the check succeeds, then I'm confident that I have good data CPU-side.

An analogy without the GPU would be:

1. write data from RAM to disk (to savefile)
2. read data back from savefile
3. do the check based on what was read from disk.

If the check succeeds, then the data is likely good on disk.

Last fiddled with by preda on 2019-02-10 at 00:00 |
|