[QUOTE=preda;465415]After consideration and input from this thread, this is the approach I ended up using in gpuOwL re: the Jacobi check:

- On startup (from the beginning or from a savefile), establish a "good Jacobi" point that will be used if a rollback is needed. When starting from a savefile this involves running one Jacobi check at the very beginning to verify that the savefile passes (if it doesn't, the run won't start).
- On every Jacobi check, either move the "good Jacobi" point forward if the check passes, or roll back to the most recent rollback point if the check fails. The rollback point is kept in RAM, so no file read is involved in rolling back (hence a simpler implementation). One more Jacobi check is done at the end (after the last iteration), with the same behavior.[/QUOTE]

Just a thought: you may wish to keep two rollback points, to protect against bad bits in RAM, etc., and avoid looping on a bad rollback point.
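A minimal sketch of that scheme in Python (my paraphrase, not gpuOwl's actual code): for the LL sequence s_{i+1} = s_i^2 - 2 mod Mp, a correctly computed iterate satisfies jacobi(s_i - 2, Mp) = -1, since s_i - 2 is 3 times a perfect square and jacobi(3, Mp) = -1 for odd p, so a corrupted residue is caught with probability about 1/2 per check. The function names and the error-injection hook are mine.

```python
def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n > 0, via quadratic reciprocity."""
    a %= n
    result = 1
    while a:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0

def ll_with_jacobi_rollback(p, check_every=100, corrupt_at=None):
    """LL test of Mp = 2^p - 1 with periodic Jacobi checks and rollback."""
    M = (1 << p) - 1
    s, i = 4, 0
    good = (s, i)                      # the in-RAM "good Jacobi" point
    while i < p - 2:
        s = (s * s - 2) % M
        i += 1
        if i == corrupt_at:            # inject one transient error; seed 3's
            s, corrupt_at = 3, None    # LL iterates all have Jacobi symbol +1
        if i % check_every == 0 or i == p - 2:
            if jacobi(s - 2, M) == -1:
                good = (s, i)          # check passed: advance the good point
            else:
                s, i = good            # check failed: roll back (no file I/O)
    return s

assert ll_with_jacobi_rollback(521) == 0                  # M521 is prime
assert ll_with_jacobi_rollback(521, corrupt_at=120) == 0  # detected, recovered
```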
[QUOTE=Prime95;465601]Can you test a number that is 2x and 4x larger? From that we can see if run-time increases as O(n^2) or O(n log n).

Also, can you time GCD for 22,577,252 digits and 2x and 4x? Thanks, I'm unable to run any tests of my own at present. Extra credit would be to time prime95's GCD for 22,577,252 digits.[/QUOTE]

Average over 10 runs (on 1 core of a Haswell-E 5960X), GMP 6.1.2. The "residue"/2nd number is a random number of the same size as Mp.

[CODE]
                          digits      Jacobi        GCD
2[SUP]75000007[/SUP]-1    22577252     28.65 sec    26.12 sec
2[SUP]150000029[/SUP]-1   45154509     67.61 sec    61.20 sec
2[SUP]300000047[/SUP]-1   90309013    156.48 sec   140.86 sec
[/CODE]

By timing Prime95's GCD I assume you mean compiling GWNUM and using that? I have never compiled GWNUM and unfortunately I do not have time to experiment right now; maybe in a few days.
[QUOTE=ATH;465616] I have never compiled GWNUM and unfortunately I do not have time to experiment right now, maybe in a few days.[/QUOTE]
I did it just 3 weeks ago; it's quite easy once you get the right pieces together. Drop me a note if you feel you need it :smile:
[QUOTE=kriesel;465607]If someone has current data on the relative run times of prp efficiently implemented and LL efficiently implemented (such as prime95 or the gpu equivalents) for p~80M or ~100M, on same hardware (cpu or gpu), please share the specifics.[/QUOTE]
I don't have actual gwnum timings available, but from a coder's point of view the two are equal. A PRP test requires some multiplies by 3 which is a negligible O(n) cost in the carry propagation step vs. the LL test which has a -2 operation which is an O(1) cost. Both costs are insignificant compared to the O(n log n) FFT cost.

You are correct that there is minimal gain from this new error-check procedure if one decides double-checking first-time results is still required. As you and I both point out, there are significant costs to transitioning from an LL-based to a PRP-based system. If SoB decided to abandon double-checking because first-time tests were ultra-reliable, then they would get a doubling of throughput.

GIMPS would also gain by implementing this even if it continues double-checking, because the "GIMPS has run a first test on all exponents below N" milestone would actually mean something, given the high confidence we'd have in first-PRP tests. BTW, I've argued for years that the "we've first tested all exponents below N" milestone should be deleted because it means nothing -- we know for a fact there are still several thousand exponents that have not yet had a correct LL test.
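To make the comparison concrete, here is a toy Python sketch of both iterations (schoolbook arithmetic standing in for the FFT multiply; M127 chosen so it runs instantly):

```python
# Toy comparison of the two iterations. The -2 term and the small-base
# multiplies are the only differences; both are dwarfed by the big multiply.
p = 127
M = (1 << p) - 1

# LL test: s_0 = 4, s_{i+1} = s_i^2 - 2; Mp is prime iff s_{p-2} == 0.
s = 4
for _ in range(p - 2):
    s = (s * s - 2) % M        # the -2 is an O(1) extra cost per squaring
assert s == 0                  # M127 is prime

# GIMPS-style PRP test: compute 3^(2^p) mod Mp. If Mp is prime this must
# equal 9, since 3^(Mp+1) = 9 * 3^(Mp-1) == 9 by Fermat's little theorem.
u = 3
for _ in range(p):
    u = (u * u) % M            # pure squarings; multiplies by 3 appear only
                               # in the error-checking machinery
assert u == 9
```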
[QUOTE=Prime95;465600]The server would need to track LL vs PRP residues, assign either LL double-checks or PRP double-checks as necessary, and all other Mersenne programs would need to do a PRP test and report a PRP residue.
[/QUOTE]

Correct: if for a given p there is an available LL residue, then you should do the double check with the LL test, since there is no way to get from the LL residue the 3^(mp+1) mod mp residue.

[QUOTE=Prime95;465600]If we could convince ourselves that this scheme could produce 99.9+% reliable results, should GIMPS consider eliminating double-checking? Probably not. But maybe it would no longer be a big deal that double-checking is 8 years behind the first-time testers.[/QUOTE]

As written this detects all single errors in a given block, and even for multiple errors it seems that the probability of missing an error is on the order of 1/mp. My original post: [url]http://mersenneforum.org/showthread.php?t=22510[/url]

This idea can now be extended in the obvious way to N=k*2^n-1 for k>1: choose an (odd) base=a for which gcd(a,N)=1. And in the general case, where N=k*2^n+c (c=+-1), we can delay the k-th powermod: as for Mersenne numbers, we calculate the similar u(i)=a^(2^i) mod N sequence for i=0..n (with u[0]=u0=a), and at the end we do the k-th powermod (if k!=1, i.e. for non-Mersenne/Fermat numbers). With this we use the same algorithm and we don't need to store a largish u0=a^k mod N.

To get an even more reliable residue: if n=c1*L+c0 (for 0<=c0<L), then double check the last c0 squarings, because there we are naked -- there is no strong error check. The same is true for the general case: repeat the k-th powermod at the end of the algorithm (if there is a mismatch, redo it again). Here the overhead is a smallish O(log2(k)+L) mulmods.

In the above general case you also have to store the base=a used in the computation if you want to do a double check later. Or just use the smallest good base>1, in which case the double check uses the same base.

[QUOTE=Prime95;465623]A PRP test requires some multiplies by 3 which is a negligible O(n) cost in the carry propagation step vs. the LL test which has a -2 operation which is an O(1) cost.
Both costs are insignificant compared to the O(n log n) FFT cost.[/QUOTE]

Here we are in an even better situation, doing very few multiplications by 3, and only in the error-checking part. (Note that in the general case you have to multiply by the small base=a>1.)
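For readers following along, here is a minimal Python paraphrase of the check being discussed (my naming; the real implementations run the expensive verification only every L^2 iterations, whereas this toy does it at every L-th checkpoint for brevity):

```python
# Sketch of the strong error check for a base-3 Mersenne PRP test.
# u_i = 3^(2^i) mod Mp; d accumulates the product of every L-th residue.
# Since d_t = prod_{j<=t} u_{jL}, squaring prev_d L times reproduces every
# factor except u0, so a correct run satisfies d == u0 * prev_d^(2^L).

def prp_with_strong_check(p, L, corrupt_at=None):
    M = (1 << p) - 1
    u0 = 3
    u = u0
    d = u0                     # running product of u_0, u_L, u_{2L}, ...
    for i in range(1, p + 1):
        u = (u * u) % M        # the ordinary PRP squaring
        if i == corrupt_at:
            u ^= 1             # simulate a single flipped bit
        if i % L == 0:
            prev_d, d = d, (d * u) % M   # one extra mulmod every L iters
            # strong check: recompute prev_d^(2^L) with L fresh squarings
            check = prev_d
            for _ in range(L):
                check = (check * check) % M
            if (u0 * check) % M != d:
                return None    # error detected: a real run would roll back
    return u

assert prp_with_strong_check(127, 8) == 9              # M127 prime => 9
assert prp_with_strong_check(127, 8, corrupt_at=5) is None  # error caught
```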
[QUOTE=Prime95;465600]I've been following this new error check test with interest. Bravo to R. Gerbicz.[/quote]
Indeed, I owe him an apology for too-hastily perusing his post and concluding he was simply advocating doing a base-3 PRP test.

[quote]The biggest problem with implementing such a scheme for Mersenne numbers is the massive changes that would be required outside of prime95. The server would need to track LL vs PRP residues, assign either LL double-checks or PRP double-checks as necessary, and all other Mersenne programs would need to do a PRP test and report a PRP residue. I think this makes a great deal of sense to implement in prime95's PRP code. LLR and openPFGW should as well. I think this would allow some projects to proceed faster by eliminating double-checks completely (SoB for example). If we could convince ourselves that this scheme could produce 99.9+% reliable results, should GIMPS consider eliminating double-checking? Probably not. But maybe it would no longer be a big deal that double-checking is 8 years behind the first-time testers.[/QUOTE]

I am hopeful that with renewed interest in this topic, a similarly 'bulletproof' check will be found for the LLT. In the meantime the PRP and Pépin-test codes can take advantage of the new '99.9%' algorithm.
[QUOTE=R. Gerbicz;465584]3rd attempt to explain the algorithm.[/QUOTE]
Yes, this is a very nice idea (now that I understand it). After a bit of informal error analysis it seems to me that it has indeed very good error detection, which does not depend on (is not diminished by) the length of the check step (the L^2 above).

So the computational cost is one additional multiplication every L iterations, plus L additional iterations for every check. A check can be done at any multiple of L; the proposed L^2 check frequency strikes a balance. Doing the check "often" (e.g. every L^2 == 1M iterations, so L == 1K) allows one to create "correct points" for rollback, and to roll back early on error, for, let's say, 0.5% time overhead. At the other extreme, the check could be done only once towards the very end -- so L^2 ~= p (the Mersenne exponent), L == int(sqrt(p)), L around 9K for 80M exponents -- offering 10 times less overhead and good error detection, but no rollback. (This would make sense when there is a high probability of a correct result.)

OTOH, the current LL double-check validates not only the hardware (and a bit the software) but also *the user* -- i.e. it protects against the hopefully theoretical possibility of users submitting bogus results intentionally. Excluding that, I think the result validated with this technique is rock-solid.

BTW, what's a good catchy name for this check?
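The two operating points above can be compared with a rough back-of-the-envelope model (Python; it counts one extra mulmod per L iterations plus L extra squarings per check, and treats mulmod and squaremod as equal cost, so the constants are approximate):

```python
import math

# Rough overhead model: over n squarings the check costs ~n/L extra mulmods
# (the running product) plus L extra squarings for each strong check.
def overhead(n, L, check_every):
    checks = n // check_every
    extra = n // L + checks * L
    return extra / n

p = 80_000_000
# Frequent checks: L = 1K, a check every L^2 = 1M iterations.
print(f"{overhead(p, 1000, 1000**2):.3%}")   # 0.200%, same ballpark as 0.5%
# A single check near the end: L = isqrt(p), around 9K as noted above.
L = math.isqrt(p)
print(f"{overhead(p, L, p):.3%}")            # ~0.022%, roughly 10x less
```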
[QUOTE=ATH;465616]Average over 10 runs (on 1 core of Haswell-E 5960X), GMP 6.1.2. The "residue"/2nd number is a random number of the same size as Mp.
[CODE]
                          digits      Jacobi        GCD
2[SUP]75000007[/SUP]-1    22577252     28.65 sec    26.12 sec
2[SUP]150000029[/SUP]-1   45154509     67.61 sec    61.20 sec
2[SUP]300000047[/SUP]-1   90309013    156.48 sec   140.86 sec
[/CODE]

By timing Prime95's GCD I assume you mean compiling GWNUM and using that? I have never compiled GWNUM and unfortunately I do not have time to experiment right now, maybe in a few days.[/QUOTE]

Thanks for the data. It looks like GMP's Jacobi code uses the O(n log n) algorithm. You should be able to time prime95's GCD by doing P-1 on 2^75000007-1 with B1=120 and B2=120. You may need to use your watch, but accuracy of 1 second is just fine. I'm just trying to gauge how much slower prime95's "giants" code is compared to the latest GMP code.
[QUOTE=Prime95;465672]You should be able to time prime95's GCD by doing P-1 on 2^75000007-1 with B1=120 and B2=120. You may need to use your watch, but accuracy of 1 second is just fine. I'm just trying to gauge how much slower prime95's "giants" code is compared to the latest GMP code.[/QUOTE]
Doesn't it print something like "GCD done in xxx seconds"?
Controlling memory errors
Memory errors might occur in the GPU VRAM, or in the system RAM, or, if particularly unlucky, both. Either can affect the GIMPS calculation results of GPU applications.
Ideally we would all use highly reliable hardware, with ECC present and turned on, but that is not always the case, and not what this thread is about.

1) System RAM can be tested with memtest86+, which can prepare a table of bad physical locations. System RAM is inexpensive, so bad modules can be detected and replaced, and the system retested. Periodic retesting (annually?) is advisable.

1a) For Linux systems, those BadRAM tables can be input to the Linux BadRAM kernel patch, which allocates those bad physical locations and hangs on to them so they don't get allocated to some application we care about whose results could be ruined by memory errors, such as GIMPS computations.

1b) For Windows systems, there is no equivalent user-appliable patch available to my knowledge. For at least some versions, there's a built-in alternative described in detail at [URL]https://superuser.com/questions/420051/running-windows-with-defective-ram#490522[/URL]. Note the caution about possibly causing a boot failure if done incorrectly. This should be a temporary workaround while replacement RAM is on order.

1c) For other OSes, there may be no alternative to RAM replacement.

2) On the GPU side, ECC is often not available, and if present and enabled it reduces performance. (Only high-end pro-quality card models include ECC in their design.) The memory is not subject to the virtual memory management of the host OS, so it appears possible to do bad-GPU-memory lockout at the application level. Whether that results in memory fragmentation that causes problems is to be determined.
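As a concrete example of route 1a), recent GRUB versions accept BadRAM-style patterns directly via GRUB_BADRAM, so the kernel patch may not even be needed. The address/mask pair below is a made-up placeholder; substitute the pattern memtest86+ actually reports:

```shell
# /etc/default/grub -- GRUB_BADRAM takes addr,mask pairs in the BadRAM
# format emitted by memtest86+; GRUB marks those pages as unusable before
# the kernel boots. The pair below is a placeholder example only.
GRUB_BADRAM="0x01e7f000,0xfffff000"

# Then regenerate the GRUB config (path varies by distro):
#   sudo update-grub                                # Debian/Ubuntu
#   sudo grub2-mkconfig -o /boot/grub2/grub.cfg     # Fedora/RHEL
```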
[QUOTE=ewmayer;465628]Indeed, I owe him an apology for too-hastily perusing his post and concluding he was simply advocating doing a base-3 PRP test.[/QUOTE]
Really no problem.

[QUOTE=ewmayer;465628]I am hopeful that with renewed interest in this topic, a similarly 'bulletproof' check will be found for the LLT.[/QUOTE]

I've looked into that, and found no result so far. A similar question is also interesting for the first stage of P-1: get a^(k_1*k_2*...*k_m) mod N with a strong error check, for coprime integers k_1,k_2,...,k_m.

[QUOTE=preda;465664]After a bit of informal error analysis it seems to me that it has indeed very good error detection, which does not depend (does not diminish) on the length of the check step (the L^2 above). So, the computation cost is one additional multiplication every L iterations, plus L additional iterations for every check. A check can be done at any multiple of L; (the proposed L^2 check frequency strikes a balance).[/QUOTE]

At the error check there is also a cheap multiplication by 3 (for Mersenne numbers). Yes, you could do the error check at every (L*H)-th iteration (even in my original implementation the last strong error check was done at such an iteration, where i isn't divisible by L*H=L^2). The optimal H depends on the relative speed of mulmod vs. squaremod: H/L=(time of one squaremod)/(time of one mulmod), by the AM-GM inequality.

[QUOTE=preda;465664]Doing the check "often" (e.g. every L^2==1M iterations, so L==1K iterations) allows to create "correct points" for rollback, and early rollback on error (for let's say 0.5% time overhead)[/QUOTE]

Maybe that is only 0.25%, if mulmod is 1.5 times slower than squaremod.

[QUOTE=preda;465664]At the other extreme, the check could be done only once towards the very end -- so L^2 ~= p (the mersenne exponent), L==int(sqrt(p)), L around 9K for 80M exponents, offering 10 times less overhead, good error detection, but not any rollback.
(this would make sense when there is high probability of correct result).[/QUOTE]

You could even use multiple files: save (i,res,d,prev_d) for various i values (here i is a multiple of L), where only the smallest i value has had the strong error check done on it (at the beginning of the algorithm, i=0).

1. If we pass the error check, then save this (i,res,d,prev_d) and delete all previous (i',res',d',prev_d') files.

2. If there was an error, and say there are 8 saved (i,res,d,prev_d) files, then do a binary search on the i values to find where the error occurred. Note that the file with the smallest i value contains the rock-solid res,d,prev_d, so we have to binary-search over 7 i values, which takes 3 strong error checks. Once we have found the last correct file, keep it and delete all the other (i',res',d',prev_d') files.

Here we still come out ahead, because saving (i,res,d,prev_d) takes less time than doing a strong error check, and with this we get a closer saved point. Note that in this case we save d also; it is needed because we delayed the strong error check.

Using many files is pointless, though: if there was a single error, then on average the best we could gain is to redo L*H/2 iterations instead of L*H, and even with, say, 7 interim (i,res,d,prev_d) files (not counting the rock-solid validated smallest-i file) we are already close to this L*H/2.

One minor note: for i=0 you can save, say, d=1 in the file (not zero!); its value isn't interesting, because we won't use it. (Likewise it doesn't matter that the strong check d==u0*prev_d^(2^L) would likely be false(!) for the saved file at i=0, since there was no L-th, or any, iteration before i=0.)

Not written above, but you should implement the standard check that you do within the LLT: if res=0 due to a possible hardware error, then we are sure we made an error and should roll back immediately (or find the largest good saved state if you have multiple files); the same is true for d -- it can't be zero.
Checking these takes only O(1) time on average (for random numbers). Note that the strong check alone would not catch this case: if res=0, then first we'd see d=0, then prev_d=0, and at a strong check we'd see 0==0 mod mp, so it wouldn't detect the error.

If we want to store the Fermat residues, then for Mersenne numbers we should do a cheap (modular!) division by 9 at the end of the algorithm. For the general case N=k*2^n+c with c=-1, you should divide by a^2; we can do that and get the Fermat residue, because for our chosen base=a it is true that gcd(a,N)=1. Alternatively, we can always save a^(k*2^n) mod N.

Currently I don't have more additions/clarifications to this algorithm, but I'm still open for discussion.

[QUOTE=preda;465664]BTW, what's a good catchy name for this check?[/QUOTE]

Probably this is a new method, so it has no established name; I call it the strong error check.
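The binary-search rollback in point 2 can be sketched as follows (Python; `is_good` stands for one strong error check on a saved state, and all names here are mine):

```python
# Sketch of the binary-search rollback described in point 2 above. Each saved
# state is an (i, res, d, prev_d) tuple; is_good() stands for one strong
# error check on that state -- the expensive operation we want to minimize.

def last_good_state(states, is_good):
    """states is sorted by i, and states[0] is already validated.
    Returns the latest state passing is_good, using ~log2(n) strong
    checks instead of one check per file."""
    lo, hi = 0, len(states) - 1        # states[lo] is known good
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if is_good(states[mid]):
            lo = mid                   # error (if any) lies later
        else:
            hi = mid - 1               # error at or before mid
    return states[lo]

# Toy usage: pretend an error crept in after iteration 5000, so every
# checkpoint saved beyond that fails its strong check.
saved = [(i, None, None, None) for i in (0, 2000, 4000, 6000, 8000)]
good = last_good_state(saved, is_good=lambda s: s[0] <= 5000)
assert good[0] == 4000                 # last checkpoint before the error
```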