mersenneforum.org  

Old 2017-08-10, 22:08   #67
Madpoo

Quote:
Originally Posted by GP2
There is a selection effect because strategic double checks are being done systematically on likely-bad exponents well ahead of the wavefront. So in the interim there is an artificially high error rate for higher exponents, which will settle back down to normal as the advancing wavefront finishes double-checking all the routine correct first-time checks.
There's also a little bias because results marked as suspect (which historically turn out to be wrong about 50% of the time) are double-checked nearly right away. That can push up the perceived error rate if you're just looking at "proven bad vs. proven good"... exponents above the DC threshold haven't really had a chance to be proven good yet.

When I looked at the bad/good ratio (and came up with my 4-5% figure), I tried to only look at exponents below the DC milestones.

Right now, if I look at exponents < 40E6 I come up with:
1,789,553 good
69,308 bad
131,878 unknown (1 or more LL tests run, but a factor was turned in later)

That bad/good ratio right there is 3.87%, but honestly I've managed to skew the numbers a great deal myself, since I did a ton of (unnecessary) triple-checks of smaller exponents (I made sure everything < 2E6 got at least 3 checks).

There are 844,751 exponents below 40E6 that were proven composite by LL (not factored). If we assume each got only 2 matching results, instead of counting my spurious third checks, we could estimate 1,689,502 verified results. That's just over 100K fewer.

Using that figure gives a bad/good ratio of 4.1%.

Looking again at the exponents that were later factored, 51,230 of the 63,878 have 2 or more matching residues, meaning those are good results.

Adding 51,230 × 2 to our total estimated good results means we wind up with a bad/good ratio of 3.87% (oddly, about the same as when I included all the redundant triple-checks).
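
Spelling out that arithmetic:

69,308 / 1,789,553 ≈ 3.87% (all verified results as counted above, triple-checks included)
844,751 × 2 = 1,689,502 estimated verified results, and 69,308 / 1,689,502 ≈ 4.10%
1,689,502 + 51,230 × 2 = 1,791,962, and 69,308 / 1,791,962 ≈ 3.87%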

Of course that still leaves 12,648 exponents that only had one result turned in before they were factored... who knows about those. Probably about the same average error rate. A while back I ran my own double-checks on a bunch of them to see if I matched the solo check; that probably accounts for a good number of those "matched but factored" counts, so once again I've single-handedly skewed the average. I mainly did that on the small exponents, < 10E6.

Actually, if I exclude my own results < 40E6, it only raises the "besides Madpoo" error rate by about 0.03%.

And finally, if I ignore the < 40E6 cutoff and look at all bad and good results of any size, it's about 4.22%, reflecting that "suspect results get DC'd first" bias. And since I've made it my mission to do triple-checks on all mismatches, we've been able to show the goodness/badness of those rather than letting them linger in "unknown" limbo for years.
Old 2017-08-10, 22:23   #68
preda
 

Quote:
Originally Posted by kriesel
That's an interesting approach to memory errors. I have a GPU that shows errors broadly on roughly the middle third of its VRAM, independently of clocking (575 to 1000 MB of a 1.5 GB card). Do you pretest the memory and place the buffers in known-reliable areas? If bad memory could be allocated to do nothing and left that way, with LL testing occurring in memory allocated elsewhere, it might not be necessary to make double runs to get reliable results. (Something like the Linux badram driver, although the necessary memory block size may interfere here, if the allocations need to be contiguous and the bad sections divide up memory too much.) http://rick.vanrein.org/linux/badram/results.html
Placing the buffers to avoid the "bad memory" is a good idea, but I have not implemented that yet. A question I have about memory testing is how the virtual-to-physical address mapping interacts with bad memory locations. I would suppose that if the virtual mapping changes, the same bad physical location can show up at different spots in the virtual address space at different points in time, and pinning down the actual (physical) bad location becomes difficult.
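
To sketch what such a pretest might look like (a host-side illustration only, with made-up names, not gpuOwL code; on a GPU the patterns would be written and read back through the card's buffers):

[CODE]
// Pattern-test a buffer in chunks and record which chunks misbehave, so
// later allocations can avoid them (badram-style).
#include <algorithm>
#include <cstdint>
#include <vector>

// Test 'words' 64-bit words at 'buf' in chunks of 'chunkWords' words;
// return the indices of the chunks that failed either test pattern.
std::vector<size_t> findBadChunks(volatile uint64_t* buf, size_t words,
                                  size_t chunkWords) {
    std::vector<size_t> bad;
    for (size_t c = 0; c * chunkWords < words; ++c) {
        size_t begin = c * chunkWords;
        size_t end = std::min(words, begin + chunkWords);
        bool ok = true;
        for (uint64_t pattern : {0x5555555555555555ULL, 0xAAAAAAAAAAAAAAAAULL}) {
            for (size_t i = begin; i < end; ++i) buf[i] = pattern ^ i;
            for (size_t i = begin; i < end; ++i)
                if (buf[i] != (pattern ^ i)) ok = false;
        }
        if (!ok) bad.push_back(c);
    }
    return bad;
}
[/CODE]

And the caveat above applies: the bad-chunk map is only meaningful for one particular allocation; if the virtual-to-physical mapping changes, it has to be rebuilt.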

Quote:
I'm curious about your choice to run the Jacobi check on the CPU rather than the GPU. Was this a design decision to do the check with different, probably more reliable hardware than your problematic GPU, or with different hardware in general, or for ease of programming, or something else?
The main reason for doing the Jacobi check on the CPU is ease of programming (in fact I just invoke GMP and don't have to program it at all). Implementing that algorithm efficiently on a GPU would be a sizable piece of work.

A secondary benefit of pushing the Jacobi check onto the CPU is that GPU performance does not decrease when the check is on (at the cost of more CPU cycles).

Quote:
Removing nonzero offsets seems unfortunate. The two supersafe runs may need to use the same offset, but the same nonzero offset should work. A nonzero offset is a feature that promotes utility in double checks.
I suppose double checks are best done with different software (e.g. mprime), and if that supports offsets then nothing is lost by having gpuOwL use offset == 0.
Old 2017-08-10, 23:38   #69
ATH

https://en.wikipedia.org/wiki/Jacobi_symbol

I went through the argument just to check (here R_0 = 4 and R_x = R_{x-1}^2 - 2, the LL recurrence mod M_p), and it looks OK to me:

(\frac{R_0-2}{M_p}) = (\frac{2}{M_p}) = (-1)^{(M_p^2-1)/8} = 1 \qquad (since M_p \equiv 7 \pmod{8})

(\frac{R_0+2}{M_p}) = (\frac{6}{M_p}) = (\frac{2}{M_p}) \cdot (\frac{3}{M_p}) = (\frac{3}{M_p})

(\frac{3}{M_p}) \cdot (\frac{M_p}{3}) = -1 \qquad (by quadratic reciprocity, since M_p \equiv 3 \pmod{4} and 3 \equiv 3 \pmod{4})

(\frac{M_p}{3}) = (\frac{M_p \bmod 3}{3}) = (\frac{1}{3}) = 1

so (\frac{R_0+2}{M_p}) = (\frac{3}{M_p}) = -1

Then for x \geq 1:

(\frac{R_x+2}{M_p}) = (\frac{R_{x-1}^2}{M_p}) = 1 \qquad (since (\frac{a^2}{n}) = 1 when \gcd(a,n) = 1)

(\frac{R_x-2}{M_p}) = (\frac{R_{x-1}^2-4}{M_p}) = (\frac{(R_{x-1}-2)(R_{x-1}+2)}{M_p}) = (\frac{R_{x-1}-2}{M_p}) \cdot (\frac{R_{x-1}+2}{M_p}) = -1

(by induction, one of the two factors is 1 and the other is -1; the base case at x = 1 is 1 \cdot (-1) = -1).
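
For anyone who wants to check this numerically, here's a small throwaway GMP program (my own sketch, using the small Mersenne prime p = 31 so it runs instantly) that runs the LL recurrence and confirms (\frac{R_x-2}{M_p}) = -1 at every step:

[CODE]
// Verify the invariant above: iterate R_x = R_{x-1}^2 - 2 from R_0 = 4
// modulo M = 2^p - 1 and check that the Jacobi symbol (R_x - 2 / M)
// stays -1 for all x >= 1. Build with: g++ check.cpp -lgmp
#include <gmp.h>
#include <cstdio>

int main() {
    const unsigned p = 31;            // a small Mersenne prime exponent
    mpz_t M, R, t;
    mpz_inits(M, R, t, NULL);
    mpz_ui_pow_ui(M, 2, p);
    mpz_sub_ui(M, M, 1);              // M = 2^p - 1
    mpz_set_ui(R, 4);                 // R_0 = 4
    for (unsigned x = 1; x <= p - 2; ++x) {
        mpz_mul(R, R, R);
        mpz_sub_ui(R, R, 2);
        mpz_mod(R, R, M);             // R_x = R_{x-1}^2 - 2 (mod M)
        mpz_sub_ui(t, R, 2);
        mpz_mod(t, t, M);             // keep the numerator in [0, M)
        if (mpz_jacobi(t, M) != -1) printf("invariant broken at x = %u\n", x);
    }
    // For a Mersenne prime, the LL test ends with R_{p-2} == 0 (mod M).
    printf("R_{p-2} %s 0 (mod M)\n", mpz_sgn(R) == 0 ? "==" : "!=");
    mpz_clears(M, R, t, NULL);
    return 0;
}
[/CODE]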
Old 2017-08-10, 23:43   #70
preda
 

Quote:
Originally Posted by Madpoo
I don't understand all the technical aspects of the Jacobi stuff you're all chatting about, but it does make me think it could be useful to save an interim residue every xx% of the way through a test. Then, if there's something you could run to catch certain errors, you could roll back through those previous temp files until you find the one with the best odds of being good, and restart from there.
gpuOwL does save "persistent" checkpoints, by default every 10M iterations. The program does not automatically use them for anything yet, but the user can manually roll back to them.

The issue with automatically rolling back on a Jacobi error is that the only previous point guaranteed to be correct is the very beginning.
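
As a rough sketch of what an automatic policy could look like (hypothetical names, not actual gpuOwL code): retry from progressively older persistent checkpoints, with the start of the test as the only guaranteed-good fallback.

[CODE]
#include <cstdint>
#include <vector>

// One persistent checkpoint: iteration count plus the packed residue words.
struct Checkpoint {
    uint64_t iteration;
    std::vector<uint64_t> residue;    // base-2^64 words of R_iteration
};

// After a failed Jacobi check, pick the state to resume from. A checkpoint
// that passed the check earlier is only *probably* good (an error flips the
// Jacobi symbol only about half the time), so each retry steps one
// checkpoint further back; iteration 0 (R_0 = 4) is the only certain point.
Checkpoint rollbackTarget(std::vector<Checkpoint>& saved) {
    if (!saved.empty()) {
        Checkpoint c = saved.back();  // newest remaining checkpoint
        saved.pop_back();             // don't trust it a second time
        return c;
    }
    return Checkpoint{0, {4}};        // guaranteed-correct starting state
}
[/CODE]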

Talking about "wrong LL" probabilities: while the overall error rate, as you say, is small (about 4% or less), I would suspect a very skewed distribution from the point of view of the hardware involved. I think there is a large number of producers with 0% errors, and a small number of producers with a very high error rate (say, 50% errors).

Intuitively, I'd say the hardware splits into two distinct categories: "good", which produces only correct LL results, and "broken", which produces mostly wrong ones.

Now, the moment a self-check detects a Jacobi error, it places that particular piece of hardware in the "broken" category, with a very high expected error rate (not the 4% average, but much higher).
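
To put toy numbers on that: assume a fraction f of machines err 50% of the time and the rest never err. Then

0.5 f \approx 0.04 \Rightarrow f \approx 0.08, \qquad P(broken \mid error\ detected) = \frac{0.5 f}{0.5 f + 0 \cdot (1 - f)} = 1

so under this (over-simplified) model, a single detected error means the machine's expected error rate going forward is 50%, not 4%.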
Old 2017-08-10, 23:49   #71
preda
 

Quote:
Originally Posted by ATH
https://en.wikipedia.org/wiki/Jacobi_symbol

I went through the argument just to check, and it looks ok to me:

(\frac{M_p}{3}) = (\frac{M_p \bmod 3}{3}) = (\frac{1}{3}) = 1
Do you imply there that M_p == 1 (mod 3)? Why?
Old 2017-08-10, 23:55   #72
science_man_88
 

Quote:
Originally Posted by preda
Do you imply there that M_p == 1 (mod 3)? Why?
Because they are... all Mersenne numbers are either 1 or 0 mod 3; only the ones divisible by three can be 0 mod 3, and the rest are 1 mod 3.
Old 2017-08-11, 00:04   #73
Batalov
 

Quote:
Originally Posted by science_man_88
Because they are... all Mersenne numbers are either 1 or 0 mod 3; only the ones divisible by three can be 0 mod 3, and the rest are 1 mod 3.
science_man_88, please check before you post, ok?

Re: preda. It's easy to demonstrate by induction for all odd n (we are not interested in even n's, you know).
Old 2017-08-11, 00:09   #74
preda
 

Quote:
Originally Posted by preda
Do you imply there that M_p == 1 (mod 3)? Why?
If p > 2 then p is odd, so write p = 2k+1. Then:
M_p = 2^(2k+1) - 1 = 2·4^k - 1
4^k - 1 = (4 - 1)·(4^(k-1) + ... + 1) is a multiple of 3,
so 4^k == 1 (mod 3),
so 2·4^k == 2 (mod 3),
so M_p = 2·4^k - 1 == 1 (mod 3).
Old 2017-08-11, 00:20   #75
ewmayer

Like other LL implementers hereabouts, I've been following this thread with interest; kudos to 'error' for the original idea, and to preda for the quick turnaround w.r.t. GMP timings. I'd like to start playing with this check too, but never having interfaced my LL code with GMP, a couple of questions:

[1] What is the minimal GMP version I need to get the fast Jacobi-symbol functionality? Assuming a working GMP install on my system, is gmp.h the only added header I need to include?

[2] AFAICT the mpz_t type is basically a uint64 array to hold the base-2^64 words of a bignum, plus whatever dimension/allocation data GMP uses internally. I already convert LL residues from the doubles used by the FFT-mul to base-2^64 unsigned multiword array form as part of the checkpoint-writing step. Is there a way to init an mpz_t object whose array-object subdatum points to the uint64 array I already use, to avoid needless mem-alloc and data-copying?

@GP2: In response to your 'how was this missed?':

[a] Such misses are far from unusual; consider e.g. the AKS primality test;

[b] Even had the idea occurred to someone previously, as the Brent-Zimmermann paper notes, it's only quite recently that a subquadratic Jacobi-symbol algorithm became known.
Old 2017-08-11, 01:44   #76
preda
 

For interfacing with GMP, see e.g. jacobiCheck() :
https://github.com/preda/gpuowl/blob...puowl.cpp#L381

Yes, gmp.h is the only header needed; link with -lgmp if using GMP as a library.

mpz_import() reads the compacted bits. It does make a copy, though; it won't use your buffer directly (which is probably a good design, and not costly, relatively speaking).
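
A minimal standalone sketch of that interface (simplified relative to the actual jacobiCheck(), and the names here are mine):

[CODE]
// Import a packed base-2^64 LL residue into GMP, subtract 2, and take the
// Jacobi symbol modulo M_p = 2^p - 1. Build with: g++ jacobi.cpp -lgmp
#include <gmp.h>
#include <cstdint>
#include <vector>

// words: little-endian base-2^64 words of the residue R_x.
// Returns true if the invariant (R_x - 2 | M_p) == -1 still holds.
bool jacobiOK(const std::vector<uint64_t>& words, unsigned p) {
    mpz_t M, R;
    mpz_inits(M, R, NULL);
    mpz_ui_pow_ui(M, 2, p);
    mpz_sub_ui(M, M, 1);              // M = 2^p - 1
    // order = -1: least-significant word first; size = 8 bytes per word;
    // endian = 0: native; nails = 0: use all bits. This copies the words.
    mpz_import(R, words.size(), -1, sizeof(uint64_t), 0, 0, words.data());
    mpz_sub_ui(R, R, 2);
    mpz_mod(R, R, M);                 // keep the numerator in [0, M)
    int j = mpz_jacobi(R, M);
    mpz_clears(M, R, NULL);
    return j == -1;
}
[/CODE]

For what it's worth, GMP 6.0+ also has mpz_roinit_n(), which wraps an existing limb array as a read-only mpz_t without copying, though that ties you to GMP's limb size and normalization rules.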
Old 2017-08-11, 01:47   #77
preda
 

Quote:
Originally Posted by ewmayer
[b] Even had the idea occurred to someone previously, as the Brent-Zimmermann paper notes, it's only quite recently that a subquadratic Jacobi-symbol algorithm became known.
The Brent-Zimmermann paper? (Link?)

(Sorry, that just shows my ignorance; I posted the link previously but forgot the authors' names. Silly.)
