mersenneforum.org  

Go Back   mersenneforum.org > Great Internet Mersenne Prime Search > Math

Reply
 
Thread Tools
Old 2017-09-28, 22:56   #12
GP2
 
GP2's Avatar
 
Sep 2003

5·11·47 Posts
Default

Quote:
Originally Posted by VBCurtis View Post
Absent any knowledge of prior factoring efforts, the chances a number has a factor of n digits is approximated by 1/n. 1/25 is 4%; 1/25 + 1/26 + 1/27 + 1/28 + 1/29 is roughly the 19% you cite.
So, the chances that a number has a factor of any size at all is... the sum of the harmonic series.
GP2 is offline   Reply With Quote
Old 2017-09-28, 23:24   #13
Batalov
 
Batalov's Avatar
 
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

224058 Posts
Lightbulb

Suppose you are in a completely dark room and you are catching the black cat that is supposedly there.

You randomly grab space in front of you; you make random moves around you.
Now, if the cat is there, you may find it after X attempts. Furthermore, if you have established you grab (that should exceed your reach!) and have done this excercise enough times, you can build a model and you can even make it a sophisticated model: it depends on the size of the cat! You can make quite a nice formula for the X, given enough experimentation. That's 'normal' statistics.

But all of the above is correct if the cat is actually there.

Bayesian statistics deals with the possibility that the cat is not there. As you grab the air for the third hour, your knowledge that the cat is actually in the room continues to change. Bayes would say that your initial belief (that the cat is there and that you can catch it), or in short, a 'prior', changes into your new belief called the 'posterior'.
Batalov is offline   Reply With Quote
Old 2017-09-29, 02:00   #14
Dr Sardonicus
 
Dr Sardonicus's Avatar
 
Feb 2017
Nowhere

4,643 Posts
Default

Quote:
Originally Posted by VBCurtis View Post
The 4% you quoted was given to you as specifically a 25-digit factor. You replied with the chance of a 25 to 29 digit factor. Both are correct [estimates]. Absent any knowledge of prior factoring efforts, the chances a number has a factor of n digits is approximated by 1/n. 1/25 is 4%; 1/25 + 1/26 + 1/27 + 1/28 + 1/29 is roughly the 19% you cite.
Quote:
Originally Posted by GP2 View Post
So, the chances that a number has a factor of any size at all is... the sum of the harmonic series.
Obviously, there are some missing caveats. For one thing, the factor can't be bigger than the number.

Still -- taking n = 1, the chances that a number has a factor of 1 digit is approximately... 1! Well, 162/210 is pretty close to 1


However, the sum of the harmonic series between k and round(e*k ) is also about 1 for any positive integer k, so n should have a factor with between k and round(e*k) digits, if round(k) is no larger than the number of digits of n

Last fiddled with by Dr Sardonicus on 2017-09-29 at 02:09
Dr Sardonicus is offline   Reply With Quote
Old 2017-09-29, 04:05   #15
VBCurtis
 
VBCurtis's Avatar
 
"Curtis"
Feb 2005
Riverside, CA

4,861 Posts
Default

Quote:
Originally Posted by GP2 View Post
So, the chances that a number has a factor of any size at all is... the sum of the harmonic series.
Wait, are you saying you're surprised a composite number has factors? What if I change the terminology from probability of a factor to "expected number of factors"? I believe that's my mistake, conflating "probability of a factor" with "expected number of factors".

The expected number of factors up to sqrt(n) is greater than 1, since every composite number has a factor and some have more than one below sqrt(n).
VBCurtis is offline   Reply With Quote
Old 2017-09-29, 05:03   #16
LaurV
Romulan Interpreter
 
LaurV's Avatar
 
Jun 2011
Thailand

7·1,373 Posts
Default

One should not forget that all this calculus is just a coarse approximation. Mostly to avoid logarithms and Euler's constant, and complicate formulas which, in fact, are also mostly intuitive and heuristic, and still unproven. Somehow, the number of "bits" in the representation is here (in this discussion) transformed into number of "digits", but well... this is still an approximation. In fact, George does a nice dissection of it, in the math page, where he explains about chances to find a factor. If you clicked the link, then look there for the "Euler's constant" contraption, and click the links to the UTM page around that contraption...

The (n-1)/n stuff comes from the fact that the harmonic series sums to log(n), and from the fact that there are ~n/log(n) primes up to n. Now, of course, if you look for factors of some N, you only have to look below n=sqrt(N), and also, in case of mersenne numbers, ou look for special ("rarer" than normal) primes q=2kp+1, where p is your exponent (of m=2^p-1), and they are which are 1 or 7 (mod 8). [we try hard to stick to this notation along our posts here, hehe].

But this only complicates the calculus, at not much benefit for the accuracy. You can stick to 1/n and so.

Last fiddled with by LaurV on 2017-09-29 at 05:14
LaurV is offline   Reply With Quote
Old 2017-09-29, 05:55   #17
petrw1
1976 Toyota Corona years forever!
 
petrw1's Avatar
 
"Wayne"
Nov 2006
Saskatchewan, Canada

22×7×167 Posts
Default Well I'm 1 for 2.

Yeah Yeah I know its just a fluke.

But after 2 exponents in my range of interest and a total of 110 curves between them I have my first factor!
petrw1 is offline   Reply With Quote
Old 2017-09-29, 11:08   #18
rcv
 
Dec 2011

11·13 Posts
Default

Going back to petrw1's original question. Assume the people who wrote your favorite ECM factoring program have done all the nasty math involving the Pi and Dickman's function and have come up with the optimal number of curves to run at each B1 level. For example, a certain version of gmp-ecm recommends running 206 curves at the B1=50000 level.

You can think of this very much like a spinner with 206 spaces. 205 spaces say "loser". The final space is secretly marked by a number god to say "winner", but only if there is a 25-digit factor. (To be more precise, I mean if it has a factor in the range of your B1, or about 25-digits.) But, if there is no 25-digit factor, the number god marks that 1 space "loser". Then the number god hides all of the markings. After you expend the work to run an ECM curve, the contents of one random spot on the spinner is revealed to you, and then hidden before the next contestant runs.

If the number has no 25-digit factor, it doesn't matter how many curves you run or how many times you spin the spinner, it will never show "winner".

Now, let's consider if the number actually has a 25-digit factor
The first time you take a crack at the spinner, you really do have a 1/206 chance of finding a 25-digit factor. The other 205 out of 206 times, the spinner shows "loser". Do you want to spin again, or not?

Still assuming the number actually has a 25-digit factor, if you spin a second time, your chance of finding a 25-digit factor is 1/206*205/206. (You only got to ask this question if the first spin was a loser.) So, your chance of finding a factor on the 2nd spin is a little lower than your chances on the first spin. Do you spin again?

Same assumptions. If you spin a third time, your chance of finding a 25-digit factor is 1/206*(205/206)^2. (You only get to ask this question if the first two spins were losers.) Do you spin again?

On a spinner with a large number, n, of positions, the probability that you will have no winner after n spins approaches 1/e.

Code:
1 - Sum[1/206*(205/206)^(i - 1), {i, 1, 206}] // N
0.366985

1/Exp[1] // N
0.367879
And this corresponds more-or-less to the probability that, if a 25-digit factor exists, you will not have found it after 206 ECM curves.

The common wisdom is that you should generally give up on one B1 level when the probability of missing a factor declines to 1/e. But, it is also well proven that the exact place where you change from one B1 level to the next doesn't really matter very much. And this isn't very relevant to your question, except if you want to do a zillion curves at the B1=50000 level.

Now, we can consider the OP's original question. Assuming the OP's goal is to maximize the number of factors he finds per unit time, running only B1=50000 curves, and if all candidates to be tested took the same amount of time to run a single ECM curve, then the OP should run just 1 curve on each number.

However, the OP's choice of candidates have varying digit-lengths. And the cost of running an ECM curve is higher as the candidate's length increases. And this is where the practical answer to his question gets messy.

Ideally, the OP ("you") should measure the cost of running a single curve at various digit-lengths in your range of interest. Then for each potential candidate, you should divide the time required to run the kth curve for that candidate by the probability (as shown by the spinner formulas above) that the candidate's kth curve will produce a factor. This represents the instantaneous expected value of the "time to find a factor, if it exists" for each candidate. Always pick the candidate whose expected time to find a factor is smallest, among your choice of candidates. And especially realize that this value increases slightly with each curve you run.

Finally, to provide an accurate expected "time to find the next factor", you would also need to account for the probability that a candidate actually has a factor in the 25-digit range. (I.e., divide the "time to find a factor, if it exists" by 0.19, if your sources tell you these candidates have a 19% chance of having a factor in the B1=50000 (25-digit) range.)

Last fiddled with by rcv on 2017-09-29 at 11:42
rcv is offline   Reply With Quote
Old 2017-09-29, 12:45   #19
alpertron
 
alpertron's Avatar
 
Aug 2002
Buenos Aires, Argentina

101010101102 Posts
Default

Quote:
Originally Posted by rcv View Post
Going back to petrw1's original question. Assume the people who wrote your favorite ECM factoring program have done all the nasty math involving the Pi and Dickman's function and have come up with the optimal number of curves to run at each B1 level. For example, a certain version of gmp-ecm recommends running 206 curves at the B1=50000 level.

You can think of this very much like a spinner with 206 spaces. 205 spaces say "loser". The final space is secretly marked by a number god to say "winner", but only if there is a 25-digit factor. (To be more precise, I mean if it has a factor in the range of your B1, or about 25-digits.) But, if there is no 25-digit factor, the number god marks that 1 space "loser". Then the number god hides all of the markings. After you expend the work to run an ECM curve, the contents of one random spot on the spinner is revealed to you, and then hidden before the next contestant runs.
To be more exact, for prime factor p, from Hasse theorem and using the elliptic curves selected by Prime95, where the group order is always multiple of 12, you get about 4*sqrt(p)/12 (a number that is a lot greater than 280) different group orders. If p has 25 digits, in average one in 280 curves will allow you to find that number p, the lucky elliptic curves.

So you can easily select 280 curves, and none of them coincide with the lucky ones. Or you can run 280 curves and hit 4 lucky curves, so you would have found 4 times that number p.

So the number of "winner slots" and "loser slots" is not even known.
alpertron is offline   Reply With Quote
Old 2017-09-29, 14:03   #20
rcv
 
Dec 2011

8F16 Posts
Default

Quote:
Originally Posted by alpertron View Post
... If p has 25 digits, in average one in 280 curves will allow you to find that number p, the lucky elliptic curves. ...
Of course, you are right. As long as an *average* of 1 in n ECM curves is one of the "lucky curves", and as long as the designer of the ECM program has applied the "common wisdom" in selecting the number of curves for each B1 level (which decision depends on several factors beyond the scope of the OP's question), then I think my spinner analogy holds, and a regular mortal (non PhD mathematician) can do the arithmetic.
rcv is offline   Reply With Quote
Old 2017-09-29, 14:19   #21
alpertron
 
alpertron's Avatar
 
Aug 2002
Buenos Aires, Argentina

25268 Posts
Default

Quote:
Originally Posted by rcv View Post
If the number has no 25-digit factor, it doesn't matter how many curves you run or how many times you spin the spinner, it will never show "winner".
This is also not correct. Some group orders are smoother than others, so by running 280 curves with B1 = 50000, B2 = 5 million, there is a probability of about 10% to find a 30-digit factor. In this case the largest prime factor of the group order would be less than B2 and the second largest prime factor of the group order would be less than B1.
alpertron is offline   Reply With Quote
Old 2017-09-29, 15:29   #22
rcv
 
Dec 2011

11·13 Posts
Default

Quote:
Originally Posted by alpertron View Post
This is also not correct. Some group orders are smoother than others, so by running 280 curves with B1 = 50000, B2 = 5 million, there is a probability of about 10% to find a 30-digit factor. In this case the largest prime factor of the group order would be less than B2 and the second largest prime factor of the group order would be less than B1.
I stand by my analogy for the purpose of simplifying the problem so that the OP (and others) might be able to answer their own question. I also assert you are discussing 2nd order effects, which don't materially change the results.

Unless you believe my analogy is practically flawed, further discussion of the details probably belongs in another thread or in an offline discussion. I think we all know you can find larger factors than the primary target of a B1 level, and you can find smaller factors that were missed by the previous B1 level. And that your chance of success at a given B1 level also depends on your choice of B2. The following sample table from an older version of gmp-ecm, and which was produced (if I recall correctly) through gmp-ecm's sophisticated implementation and use of Dickman's functions, provided the user with an quantifiable glimpse of the final effects of all of this voodoo. Would you be happier if I suggested an ECM curve at the B1=50000 level is analagous to simultaneously spinning 5 spinners?

Code:
// Data from gmp-ecm 6.2
// Expected number of curves to find a factor of n digits: (digits >= 1301)
//  B1 20   25     30      35      40      45      50      55      60      65
//  2K 817  56083  6456542
// 11K 76   1589   51736   2379385
// 50K 20   204    3115    65277   1739727
//250K 8    47     401     4553    64814   1115706
//  1M 5    21     123     957     9094    102769  1362692
//  3M 3    12     56      336     2462    21115   210246  2369351
// 11M 3    7      26      124     702     4628    34585   292164  2735028
// 43M 2    5      15      56      253     1323    7837    51385   373591  2960912
//110M 2    4      10      34      132     600     3062    17350   108466  732708
//260M 2    3      8       23      83      336     1525    7670    42167   251141
//850M 1    3      6       15      47      168     661     2868    13625   69478
//2.9G 1    2      4       11      29      91      316     1202    4960    22000
//4.3G 1    2      4       9       25      75      252     925     3672    15664
rcv is offline   Reply With Quote
Reply



Similar Threads
Thread Thread Starter Forum Replies Last Post
Odds of Finding a Factor Gordon Factoring 18 2015-09-20 20:33
how much ECM without finding an existing factor dbaugh PrimeNet 4 2013-01-11 16:31
p-1 testing ram vs finding a factor crash893 Software 11 2006-07-03 21:48
Probability of finding a factor JuanTutors Software 20 2004-09-26 09:47
possibility of finding a factor there_is_no_spoon Math 10 2004-03-11 20:05

All times are UTC. The time now is 17:59.


Fri Jul 16 17:59:28 UTC 2021 up 49 days, 15:46, 1 user, load averages: 1.53, 1.39, 1.44

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.