mersenneforum.org > Great Internet Mersenne Prime Search > Math
2018-06-03, 14:58   #23
GP2 (Sep 2003)

Quote:
Originally Posted by kriesel
I tried the stats calcs and got probabilities even lower, 0.46% and 0.28%. Either your stats math is wrong, or mine is, or the probabilities per candidate are, or combinations.
A lot of the online binomial probability calculators give bad results, especially when n gets large.

This is the one I used: http://stattrek.com/online-calculator/binomial.aspx

I reran it using scipy.stats in Python and got the same numerical results.

Code:
import scipy.stats as ss

# n trials, success probability p per trial, at most `bets` successes
n = 438; p = 0.0308; bets = 5
hh = ss.binom(n, p)

# cumulative probability P(X <= bets); equivalent to hh.cdf(bets)
sum(hh.pmf(k) for k in range(bets + 1))

Last fiddled with by GP2 on 2018-06-03 at 15:06
2018-06-03, 16:35   #24
kriesel "TF79LL86GIMPS96gpu17" (Mar 2017, US midwest)

Quote:
Originally Posted by GP2
A lot of the online binomial probability calculators give bad results, especially when n gets large.

This is the one I used: http://stattrek.com/online-calculator/binomial.aspx

I reran it using scipy.stats on Python, with the same numerical results.

Code:
import scipy.stats as ss

n=438; p=0.0308; bets=5; hh=ss.binom(n, p)

sum([hh.pmf(k) for k in range(0, bets+1)])

Thanks for the link. I see your numbers match its cumulative probability P(X ≤ x), while mine match its binomial probability P(X = x).
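The distinction can also be reproduced without scipy or an online calculator; here is a minimal sketch using only the Python standard library, with the same illustrative inputs (n=438 candidates, p=0.0308 per candidate, x=5 successes):

```python
from math import comb

def binom_pmf(n, p, k):
    """Binomial probability of exactly k successes: P(X = k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(n, p, k):
    """Cumulative probability of at most k successes: P(X <= k)."""
    return sum(binom_pmf(n, p, i) for i in range(k + 1))

n, p, x = 438, 0.0308, 5
print(f"P(X = {x})  = {binom_pmf(n, p, x):.4f}")   # the 'binomial probability = x' figure
print(f"P(X <= {x}) = {binom_cdf(n, p, x):.4f}")   # the 'cumulative probability' figure
```

Summing exact terms this way avoids the accuracy problems some online calculators have at large n.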
2018-06-04, 19:51   #25
Prime95 "P90 years forever!" (Aug 2002, Yeehaw, FL)

Quote:
Originally Posted by GP2
If there was a bug in mprime, perhaps it might be in the code that writes savefiles and later restores memory structures from those savefiles. The one significant difference between spot instances and physical machines is that the former may get interrupted and resumed a lot, so any rare bug in the savefile code would get triggered more often.
This would be the best explanation for the source of any bugs, although 29.2 and earlier used the same save-file code. The only thing new in 29.3/29.4 is that GMP does the GCD rather than the modified giants code of Richard Crandall.

The best way to proceed would be to rerun several known factors found in stage 2, preferably at a time when they will get preempted often. As soon as we have a proven failure case, we can start debugging.
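A sketch of how such a re-run could be queued, assuming the Pfactor worktodo syntax shown later in this thread (fields: k, b, n, c, bits already trial-factored, primality tests saved); the exponent here is illustrative and would be replaced by the actual known-factor exponents:

```
Pfactor=1,2,86003227,-1,76,2
```

Running a handful of these on a preemptible instance, so the savefile write/restore path gets exercised often, and comparing the reported factors against the known ones would give the proven failure case.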

Last fiddled with by Prime95 on 2018-06-04 at 19:51
2018-06-06, 15:24   #26
GP2 (Sep 2003)

Well, I redid 8 exponents so far with known factors, and it found 7 of them and missed one.

I got all excited at first, but then I remembered Brent-Suyama...

Code:
M86003227 completed P-1, B1=690000, B2=13110000, E=6
Code:
2018-03-11	TheJudger	F-PM1	Factor: 220982747169693074669754599857 / (P-1, B1=695000, B2=13726250, E=12)
k = 2^3 × 3 × 4801 × 21277 × 33589 × 15601387

The second-largest and largest prime factors of k are the minimum values needed for B1 and B2, respectively, in order to find the factor (unless the Brent-Suyama extension saves the day).
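That claim can be checked directly. A short sketch that recomputes k from TheJudger's factor and factors it (plain trial division, which is fine here because k's prime factors are small):

```python
f = 220982747169693074669754599857   # TheJudger's factor of M86003227
p = 86003227

k, r = divmod(f - 1, 2 * p)          # every factor of Mp has the form 2kp + 1
assert r == 0

def prime_factors(n):
    """Factor n by simple trial division, with multiplicity."""
    out, d = [], 2
    while d * d <= n:
        while n % d == 0:
            out.append(d)
            n //= d
        d += 1
    if n > 1:
        out.append(n)
    return out

fac = sorted(prime_factors(k))
print("k =", " * ".join(map(str, fac)))
print("minimum bounds: B1 >=", fac[-2], "and B2 >=", fac[-1])
# The earlier run's B2=13110000 falls short of 15601387, so only
# Brent-Suyama (E=6 there, E=12 in TheJudger's run) could reach it.
```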
2018-06-08, 00:19   #27
preda "Mihai Preda" (Apr 2015)

How does TF affect the probability of P-1 finding factors, and vice versa?
- does higher-bit TF reduce the chances of P-1 finding factors? a bit, or a lot?
- after an unsuccessful P-1, can it be worth continuing to TF?
2018-06-08, 02:21   #28
Prime95 "P90 years forever!" (Aug 2002, Yeehaw, FL)

Quote:
Originally Posted by preda
How does the TF impact the probability of P-1 finding factors, and the other way around?
- does higher-bit TF reduce the chances of P-1 finding factors? a bit, or a lot?
- after unsuccessful P-1, may it be worth to keep TF-ing?
Tough question.

Higher TF reduces the chance of P-1 finding a factor, which in turn changes the optimal P-1 bounds.

There is also a good argument that one should do P-1 before doing the last one or two TF bit levels. This calculation is made even more complex by the fact that one may be doing TF on a GPU and P-1 on a CPU.

The good news is that we are optimizing the last teensy bit out of the primality testing process. In practice, I don't think it matters much if one TFs one bit more or less than optimal.

My personal preference is to err on the side of putting too much effort into finding a factor because I irrationally put extra "value" on an easily verified factor.
2018-06-08, 03:58   #29
GP2 (Sep 2003)

Quote:
Originally Posted by preda
How does the TF impact the probability of P-1 finding factors, and the other way around?
- does higher-bit TF reduce the chances of P-1 finding factors? a bit, or a lot?
- after unsuccessful P-1, may it be worth to keep TF-ing?
I took a look at the 86M range. This range has been fully TF'd to 76 bits, and only 9 exponents have been TF'd higher (seven to 77 bits, two to 78 bits).

This range has mostly been P−1 tested. However, I chose to limit the range to 86.0 to 86.9M, because there are only 192 pending P−1 assignments in that range, whereas 86.9 to 87.0M currently has 1630 pending P−1 assignments.

In that range, P−1 is generally done with limits similar to B1=700k, B2=13M.

So, in the range 86.0 to 86.9 M, there are:
  • 273 factors of bit size 72.0–73.0 (of which 99 could have been found with P−1 using B1=700k, B2=13M)
  • 257 factors of bit size 73.0–74.0 (of which 85 could have been found with P−1)
  • 239 factors of bit size 74.0–75.0 (of which 73 could have been found with P−1)
  • 220 factors of bit size 75.0–76.0 (of which 65 could have been found with P−1)
  • 625 factors of bit size 76.0 or more (of which 604 "could have been found" with P−1, though actually all were, see below)

The last line above is a bit nonsensical, because all of those factors were actually found with P−1. We are misestimating the number of exponents findable by P−1 for various reasons.

One reason is the Brent-Suyama extension, which lets us find a few extra factors beyond the bounds. Also, there are 192 pending P−1 assignments; assuming a 3% success rate, we might expect an additional 5 or 6 factors from those. Finally, we are assuming all the P−1 tests used the same bounds of B1=700k, B2=13M; in fact a small number of P−1 tests were done with smaller bounds (for instance, 259 tests with B1=B2=800k), and a small number with higher B2, a handful even up to 30M.

Nevertheless, the discrepancy is not too bad.

It's easy to determine whether a known factor f of Mp could have been found by P−1 testing with bounds B1 and B2. Since f = 2kp + 1 for some k, we have k = (f − 1)/2p, and we then factor k. If the second-largest prime factor of k is less than B1 and the largest is less than B2, then f can with certainty be found by a P−1 test. And as mentioned earlier, Brent-Suyama sometimes lets us find a few additional factors beyond the bounds.
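The test just described can be written as a small predicate (a sketch; the helper name is mine, and trial division is adequate here because the k values involved are smooth):

```python
def p1_would_find(f, p, B1, B2):
    """True if P-1 with bounds B1, B2 is certain to find the factor
    f = 2kp + 1 of Mp: every prime factor of k must be below B1,
    except possibly the single largest, which must be below B2."""
    k = (f - 1) // (2 * p)
    fac, d = [], 2
    while d * d <= k:               # trial division, with multiplicity
        while k % d == 0:
            fac.append(d)
            k //= d
        d += 1
    if k > 1:
        fac.append(k)
    fac.sort()
    largest = fac[-1]
    second = fac[-2] if len(fac) > 1 else 1
    return second < B1 and largest < B2

# TheJudger's factor from post #26: smooth enough for B1=695000, but its
# largest k-factor (15601387) exceeds B2=13726250, so this predicate says
# no; only the Brent-Suyama extension (E=12) reached it.
print(p1_would_find(220982747169693074669754599857, 86003227, 695000, 13726250))  # False
```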

We also have to be a bit careful because a few exponents have two factors:

86001449 two factors TF 75–76
86024377 P−1, composite split into bit lengths 82 and 84
86219299 two factors TF 72–73
86268041 P−1, composite split into bit lengths 77 and 98
86282017 P−1, composite split into bit lengths 84 and 85
86336231 two factors TF 73–74
86529923 P−1, composite split into bit lengths 78 and 86
86557759 P−1, composite split into bit lengths 78 and 84
86720983 P−1, composite split into bit lengths 80 and 82
86766643 P−1, composite split into bit lengths 89 and 93
86823221 two factors TF 74–75

So really, rather than counting the number of factors of given bit sizes, we should count the number of exponents that have factors of those bit sizes. Instead of 273, 257, 239, 220 and 625 factors in the table above, we should count 272, 256, 238, 219 and 618 exponents.

But really this effect is pretty small; nearly all exponents have no more than one known factor, so we can stay with our original numbers.


So when TF 72–76 is done first, as was in fact the case, 273+257+239+220 = 989 factors are found first, and 625 factors are later found with P−1; those could not have been found by TF 72–76 because their bit sizes exceed 76.0.

Whereas if P−1 had been done first, then it would have found 99+85+73+65+625 = 947 factors, leaving (273−99)+(257−85)+(239−73)+(220−65) = 667 factors to be discovered by TF 72–76.

So out of the 1614 factors of bit sizes 72 and higher in the exponent range 86.0 to 86.9M, there are 625 factors that could only have been discovered by P−1, 667 factors that could only have been discovered by TF 72–76, and (99+85+73+65) = 322 factors that could have been discovered by either. With the caveat that "625" (and therefore also "322") is not an exact figure for the reasons mentioned earlier, but it's probably not off by more than a few percent.
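The bookkeeping above can be double-checked in a few lines (the counts are copied from this post; nothing new is computed):

```python
# Factors findable by TF 72-76, per 1-bit range, and the subset whose k
# is smooth enough for P-1 with B1=700k, B2=13M (counts from this post).
tf_counts = {"72-73": 273, "73-74": 257, "74-75": 239, "75-76": 220}
p1_counts = {"72-73": 99, "73-74": 85, "74-75": 73, "75-76": 65}
above_76 = 625                         # P-1-only territory: too big for TF

tf_first = sum(tf_counts.values())     # found when TF runs first
either = sum(p1_counts.values())       # findable by both methods
p1_first = either + above_76           # found if P-1 had run first
tf_only = tf_first - either            # left for TF after a P-1-first pass
total = tf_first + above_76

print(tf_first, p1_first, tf_only, either, total)
# 989 947 667 322 1614, matching the figures in this post
```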



So, does it make sense to do P−1 first?

Well, for one thing it's hard to compare time spent on GPUs with time spent on CPUs; it's apples-to-oranges. Maybe we could estimate the total kWh of energy needed in each case.

Also, you run an additional risk of annoying the people doing TF, because TF is systematic and finds every factor in a given bit range. So the 322 factors that are discoverable by either TF or P−1 would be found first by P−1 and then rediscovered by TF, but without any credit. Whereas when P−1 is done, the list of already-known factors is filtered out.

Finally, from the PrimeNet work-distribution map, if we look at the 86.0 to 87.0M range, there are 54710 exponents, of which 35553 have been factored, 70 have one or more LL tests, and 19087 have no known factors and no LL tests. The overwhelming majority of those 35553 factored exponents of course have factors smaller than 72.0 bits. Of the 19087, we estimate that roughly 90%, or about 17200, lie within the narrower 86.0 to 86.9M range used above.

So there are only 322 or so exponents where it makes any difference whether we do P−1 or TF first, which is dwarfed by the 17200 exponents where we unsuccessfully do TF 72–76 and unsuccessfully do P−1 without finding any factors.


TL;DR: in the 86M and comparable ranges, TF 72–76 finds factors that P−1 can't and P−1 finds factors that TF 72–76 can't. Roughly only one-third of the factors being found by TF 72–76 have k smooth enough to be found by P−1. Conversely, a lot of the factors being found by P−1 have bit sizes much too large to be feasible at all by TF, for which the difficulty doubles each time the bit size is incremented.

Only about 20% of the factors that can be found by TF 72–76 and/or P−1 can be found by both TF 72–76 and P−1. That small set is the only one where the order of doing TF 72–76 or P−1 matters, but that constitutes only about 2% of the total number of exponents being tested.

In other words,
* in about 2% of the exponents tested, it might matter whether we do TF 72–76 first or P−1 first, because either method can find a factor
* in about 8% of the exponents tested, it doesn't matter which test we do first, because only one of the two methods can find a factor
* in about 90% of the exponents tested, it doesn't matter which test we do first, because neither method finds a factor.
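A rough cross-check of that split, using the counts from this post (322 either-method factors, 667 TF-only, 625 P−1-only, and the roughly 17200 exponents where neither method succeeds); the exact ratios land close to the rounded 2/8/90 figures:

```python
both_methods = 322          # either TF 72-76 or P-1 could find the factor
one_method = 667 + 625      # exactly one of the two methods works
neither = 17200             # no factor found by either (estimated above)
total = both_methods + one_method + neither

for label, n in [("both", both_methods), ("one", one_method), ("neither", neither)]:
    print(f"{label}: {n / total:.1%}")
```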

Last fiddled with by GP2 on 2018-06-08 at 04:57
2018-06-08, 08:11   #30
preda "Mihai Preda" (Apr 2015)

@GP2: the analysis was very informative, thanks!
2018-06-08, 15:19   #31
chalsall "Chris Halsall" (Sep 2002, Barbados)

Quote:
Originally Posted by GP2
... In that range, P−1 is generally done with limits similar to B1=700k, B2=13M. ...
Indeed! Thank you very much for this analysis.

One question I have: does TF'ing to higher levels first cause the P-1 effort to use higher bounds, and thus potentially find factors it wouldn't if the TF'ing were lower?
2018-06-08, 16:25   #32
GP2 (Sep 2003)

Quote:
Originally Posted by chalsall
One question I have is does first TF'ing to higher levels cause the P-1 effort to work on higher bounds, and thus potentially find factors which it wouldn't if the TF'ing was lower?
That's a good point I didn't consider.

Automatically assigned P−1 tests use Pfactor worktodo lines.

It turns out that first TF'ing to higher levels causes P−1 to use lower bounds. So if P−1 were done first, it would actually find more factors, although it would also take longer.

Code:
Pfactor=1,2,86015023,-1,72,2

./mprime -d
[Work thread Jun 8 16:16] Optimal P-1 factoring of M86015023 using up to 3300MB of memory.
[Work thread Jun 8 16:16] Assuming no factors below 2^72 and 2 primality tests saved if a factor is found.
[Work thread Jun 8 16:16] Optimal bounds are B1=875000, B2=21218750
[Work thread Jun 8 16:16] Chance of finding a factor is an estimated 5.1%
Code:
Pfactor=1,2,86015023,-1,76,2

./mprime -d
[Work thread Jun 8 16:20] Optimal P-1 factoring of M86015023 using up to 3300MB of memory.
[Work thread Jun 8 16:20] Assuming no factors below 2^76 and 2 primality tests saved if a factor is found.
[Work thread Jun 8 16:20] Optimal bounds are B1=690000, B2=13110000
[Work thread Jun 8 16:20] Chance of finding a factor is an estimated 3.04%
However, the difference in practice between B2=13M and B2=21M is probably smaller than you might think; there are logarithmically diminishing returns to increasing B2.

The difference between 3.04% and 5.1% looks dramatic, but remember that the 3.04% figure excludes any factors that TF 72–76 would find, because it assumes TF has already been done and has already found all of them. If we did the P−1 test first, it would actually find some of those factors, as mentioned in the earlier message.

Last fiddled with by GP2 on 2018-06-08 at 16:34
2018-06-08, 17:16   #33
petrw1 "Wayne" (Nov 2006, Saskatchewan, Canada)

Quote:
Originally Posted by GP2
In other words,
* in about 2% of the exponents tested, it might matter whether we do TF 72–76 first or P−1 first, because either method can find a factor
* in about 8% of the exponents tested, it doesn't matter which test we do first, because only one of the two methods can find a factor
* in about 90% of the exponents tested, it doesn't matter which test we do first, because neither method finds a factor.
And if I read the rest of it correctly, then:

* in about 1% of the exponents tested, it DOES NOT matter whether we do TF 72–76 first or P−1 first, because BOTH methods can find a factor

Mind you, we are now at 101%, but then the 2, 8, 90 are rounded?
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.