mersenneforum.org > Great Internet Mersenne Prime Search > Math
Old 2017-10-16, 02:45   #45
axn
 
 
Jun 2003

5,051 Posts

Quote:
Originally Posted by VBCurtis View Post
One more request from the dataset: What is the number of curves such that 64% (1-1/e) of the factors were found, for each B1 level? This number is the effective t30-level for each of these B1 values.
I am going to go out on a limb and predict that it will be the "Average # of curves" figure that he quotes!
Old 2017-10-16, 09:30   #46
ATH
Einyen
 
 
Dec 2003
Denmark

3⁵×13 Posts

Quote:
Originally Posted by VBCurtis View Post
One more request from the dataset: What is the number of curves such that 64% (1-1/e) of the factors were found, for each B1 level? This number is the effective t30-level for each of these B1 values.
B1=250k: 63.2% (and 63.3%) of the factors were found with <=559 curves
B1=1M: 63.2% (and 63.6%) of the factors were found with <=153 curves
B1=50k: 63.2% (and 63.3%) of the factors were found with <=4580 curves


Quote:
Originally Posted by axn View Post
I am going to go out on a limb and predict that it will be the "Average # of curves" figure that he quotes!
It is close but not quite. The average number of curves is higher than the "median" (the # of curves where half the runs are lower and half are higher), due to a few "unlucky" runs with very high curve counts.
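The mean-above-median skew can be illustrated with a quick simulation. This is just a sketch, assuming each curve independently finds the factor with probability 1/430 (roughly the B1=250k, 30-digit case from this thread); the curve count to first success is then geometrically distributed:

```python
import random
import statistics

# Assumed per-curve success probability: ~1/430 at B1=250k for a 30-digit factor.
p = 1.0 / 430
random.seed(42)

def curves_until_factor(p):
    """Count curves run until the first 'success' (factor found)."""
    n = 1
    while random.random() >= p:
        n += 1
    return n

runs = [curves_until_factor(p) for _ in range(10000)]
mean = statistics.mean(runs)
median = statistics.median(runs)
print(f"mean   = {mean:.1f}")    # near 1/p = 430
print(f"median = {median:.1f}")  # near 430*ln(2) ~ 298, well below the mean
```

The distribution is skewed right: a handful of unlucky runs with very high curve counts drag the average above the median, exactly as ATH observed.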

Last fiddled with by ATH on 2017-10-16 at 11:07
Old 2017-10-16, 20:33   #47
nordi
 
Dec 2016

73 Posts

Quote:
Originally Posted by ATH View Post
I created a program making random 30-digit primes between 5*10^29 and 6*10^29
Wouldn't that make your primes slightly larger (and therefore harder to find) than the average 30 digit prime? Since the 'density' of primes decreases as numbers get bigger, there are more 30 digit primes starting with the digits 1, 2, 3 or 4 than primes starting with 6, 7, 8 or 9.
Also, shouldn't a randomly chosen number be much more likely to have a certain 30 digit factor starting with 1 than a certain 30 digit factor starting with 9? Both effects would mean your 30 digit factors are unusually large and the numbers in your benchmark harder to factor.
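nordi's first point can be sanity-checked with the prime number theorem. This is a rough sketch, not from the thread: it uses the crude estimate pi(x) ~ x/ln(x) to compare how many 30-digit primes start with each leading digit:

```python
import math

# Crude prime number theorem estimate: pi(x) ~ x / ln(x).
def pi_approx(x):
    return x / math.log(x)

# Approximate count of 30-digit primes with leading digit d,
# i.e. primes in [d*10^29, (d+1)*10^29).
def primes_with_leading_digit(d):
    return pi_approx((d + 1) * 10**29) - pi_approx(d * 10**29)

for d in (1, 5, 9):
    print(d, f"{primes_with_leading_digit(d):.3e}")
# The d=1 bin holds measurably more primes than the d=9 bin, so primes drawn
# from [5*10^29, 6*10^29] are indeed larger than a uniformly chosen 30-digit prime.
```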
Old 2017-10-21, 23:47   #48
ATH
Einyen
 
 
Dec 2003
Denmark

3⁵·13 Posts

Quote:
Originally Posted by nordi View Post
Wouldn't that make your primes slightly larger (and therefore harder to find) than the average 30 digit prime? Since the 'density' of primes decreases as numbers get bigger, there are more 30 digit primes starting with the digits 1, 2, 3 or 4 than primes starting with 6, 7, 8 or 9.
Also, shouldn't a randomly chosen number be much more likely to have a certain 30 digit factor starting with 1 than a certain 30 digit factor starting with 9? Both effects would mean your 30 digit factors are unusually large and the numbers in your benchmark harder to factor.
Looks like you were right. I did some more tests: 1000 runs with 30-digit factors between 1*10^29 and 2*10^29, and with B1=250k I hit exactly the (1-1/e) chance of finding the factor at the recommended 430 curves (429 actually, but very close).

This time I only tested B1=250k, and the larger factor was 60 digits (1*10^59 - 10^60), just to make the runs slightly faster, doing 1000 tests each time and counting the number of curves needed to find the factor.

I tested with the small factor at 30, 29, 28, 27, 26 and 25 digits, each time in the range 1*10^29 - 2*10^29, 1*10^28 - 2*10^28, 1*10^27 - 2*10^27, etc.

Code:
			P30*	P29*	P28*	P27*	P26*	P25*
			P60 	P60	P60	P60	P60	P60
Minimum # of curves:	2	2	2	2	2	2
Maximum # of curves:	3896	1898	1390	735	722	377
Average # of curves:	447	275	205	124	79	51
Median # of curves:	291.5	198.5	148	86	56	34.5
Standard deviation:	478	265	203	119	81	52
factors <=430 curves:	63.3%	79.2%	88.1%	97.0%	99.6%	100%
# curves for (1-1/e):	429	269	205	126	78	50
So the 2nd to last line is the fraction of factors found at the recommended 430 curves for B1=250k, and the last line is the actual number of curves required to find 63.2% (1-1/e) of the factors.

Doing a full t30 makes you almost certain to find any 25- or 26-digit factor (up to 2*10^26), with a very high chance of finding 27- and 28-digit factors as well.
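The table's last two rows match the standard model in which each curve is an independent trial. A sketch, assuming a per-curve success probability of 1/N with N = 430 (the table's "# curves for (1-1/e)" value for P30 at B1=250k):

```python
import math

# Model: each curve independently finds the factor with probability 1/N,
# where N = 430 is the expected curve count for a 30-digit factor at B1=250k.
N = 430

def p_found(curves):
    """Chance a 30-digit factor is found within `curves` curves."""
    return 1 - (1 - 1/N) ** curves

print(f"{p_found(430):.3f}")   # ~0.633, close to 1 - 1/e = 0.632, as in the table
print(f"{p_found(3896):.4f}")  # the unluckiest observed run (3896 curves):
                               # only ~1 in 10^4 factors need that many
```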
Old 2017-11-07, 20:17   #49
petrw1
1976 Toyota Corona years forever!
 
 
"Wayne"
Nov 2006
Saskatchewan, Canada

4676₁₀ Posts
Some recent success rates

I have a few CPUs doing ECM on exponents in the 2M range with B1=50000 B2=5000000 which I manually assign. I look for exponents with the lowest number of curves already done (about 30 or 40). Then I ask them to do 60 or 70 curves on that exponent.

I have 1 CPU doing ECM on exponents in the current assigned ranges: 6M, 11M, 15-17M; also with B1=50000 and B2=5000000.
I am assigned 3 curves per assignment for the 6M range and 1 curve for the rest.
These exponents have very few (0-5) prior curves.

=====================================
The discussion above, as I understand it, suggests to me that doing more curves obviously takes longer but should have a proportionally larger success rate.

Granted my sample sizes are NOT large but I think adequate.

The first batch doing 60 or 70 curves per exponent has done 209 exponents and found 14 factors.
But has a ratio of 975 curves per factor.

The second batch doing 1 or 3 curves per exponent has done 434 exponents and found 3 factors.
But has a ratio of 155 curves per factor. More than 6 times better.

Is there a statistical explanation for this big difference in success rate?
It seems to suggest I should be running more exponents with fewer curves.
But since I am assigning these manually I am trying to find a good balance between this manual assignment effort and the success rate.

As I understand the discussion above, once all 280 curves are done for an exponent the success rate will be as statistically expected. So the more curves I do the better my odds should be of finding the factor (if there is one in that range) and I would expect it to be proportionally better odds. BUT 'TIS NOT
Old 2017-11-07, 20:31   #50
VBCurtis
 
 
"Curtis"
Feb 2005
Riverside, CA

4,861 Posts

Quote:
Originally Posted by petrw1 View Post
I look for exponents with the lowest number of curves already done (about 30 or 40). ....
I have 1 CPU doing ECM on exponents in the current assigned ranges: 6M, 11M, 15-17M; also with B1=50000 and B2=5000000.
I am assigned 3 curves per assignment for the 6M range and 1 curve for the rest.
These exponents have very few (0-5) prior curves.
Prior curves! If you run a bunch of curves on candidates already tested to t30, you're not going to find any 25-digit factors. Your case is a less exaggerated version of this:

The former group has already had 30+ curves run at the bounds you're running. If you run 30 curves on the latter group, remove all those factors, and then run the same 60-70 curves, they'll find factors at about the same rate. The latter group has been tested less, so there are more factors there to find.
Old 2017-11-07, 20:37   #51
petrw1
1976 Toyota Corona years forever!
 
 
"Wayne"
Nov 2006
Saskatchewan, Canada

2²·7·167 Posts

Quote:
Originally Posted by VBCurtis View Post
Prior curves! If you run a bunch of curves on candidates already tested to t30, you're not going to find any 25-digit factors. Your case is a less exaggerated version of this:

The former group has already had 30+ curves run at the bounds you're running. If you run 30 curves on the latter group, remove all those factors, and then run the same 60-70 curves, they'll find factors at about the same rate. The latter group has been tested less, so there are more factors there to find.
I believe that is likely; however, as a percentage:
My first range has already done 30 of 280 curves (about 11%).
The second range basically none.

So shouldn't I expect the success rates to differ by only about this same 11%?
Old 2017-11-08, 01:13   #52
VictordeHolland
 
 
"Victor de Hollander"
Aug 2011
the Netherlands

2³·3·7² Posts

Quote:
Originally Posted by petrw1 View Post
The discussion above, as I understand it, suggests to me that doing more curves obviously takes longer but should have a proportionally larger success rate.
Quote:
Granted my sample sizes are NOT large but I think adequate.

The first batch doing 60 or 70 curves per exponent has done 209 exponents and found 14 factors.
But has a ratio of 975 curves per factor.
14 factored out of 209 exponents = ~6.699% or about 1 in every 15 exponents factored.
Quote:
The second batch doing 1 or 3 curves per exponent has done 434 exponents and found 3 factors.
But has a ratio of 155 curves per factor. More than 6 times better.
3 factored out of 434 exponents = ~0.691% or about 1 in every 144 exponents factored.

I'm currently working in the 1,600,000-1,700,000 range with ~230 curves (B1=50k) per exponent to complete the t25.
52 factors out of 440 exponents = ~11.81% or about 1 every 8.4 exponents factored.

So, yeah, running more curves per exponent increases the ratio of factored exponents.

Quote:
It seems to suggest I should be running more exponents with less curves.
If your goal is to find many ECM factors with minimal effort, then running only a few curves on exponents with little ECM done is probably the best course of action (keep in mind that running a single B1=50k curve on a 2M exponent takes less CPU time than one curve at B1=50k on a 9M exponent). This is what your 'curves per factor' ratio measures. You could convert it to GHz-days per factor for a fairer comparison between factoring 2M exponents and 9M exponents.

Quote:
As I understand the discussion above, once all 280 curves are done for an exponent the success rate will be as statistically expected.
Quote:
So the more curves I do the better my odds should be of finding the factor (if there is one in that range) and I would expect it to be proportionally better odds.
With the standard Prime95 ECM parameters (B1=50k):
The chance to miss a 25-digit factor after 280 curves (a.k.a. a t25) is 1/e ≈ 36.8%.
Another 280 curves reduce this chance to 1/(e^2) ≈ 13.5%,
another 280 curves get it to 1/(e^3) ≈ 5%,
etc., until it is practically 0%.

This is not a very efficient way to go about it, though: you would be doing a lot of curves at B1=50k just to reduce your chance of missing <25-digit factors, effort that could instead be spent on ECM curves with higher B1 and B2, which have better chances of finding larger factors. Also, those higher-B1 curves still 'count' toward reducing the chance of missing the <25-digit factors.
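The diminishing returns above can be tabulated directly: each full t25 pass independently misses a 25-digit factor with probability ~1/e, so k passes miss with probability e^(-k).

```python
import math

# Miss probability for a 25-digit factor after k full t25 passes
# (280 curves each at B1=50k); each pass misses with probability ~1/e.
for k in range(1, 5):
    miss = math.exp(-k)
    print(f"after {k} x 280 curves: miss ~ {miss:.1%}")
# after 1 x 280 curves: miss ~ 36.8%
# after 2 x 280 curves: miss ~ 13.5%
# after 3 x 280 curves: miss ~ 5.0%
# after 4 x 280 curves: miss ~ 1.8%
```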
Old 2017-11-08, 01:29   #53
VBCurtis
 
 
"Curtis"
Feb 2005
Riverside, CA

4,861 Posts

Quote:
Originally Posted by VictordeHolland View Post

If your goal is to find many ECM factors with the minimal effort, than running only a few curves on exponents with little ECM done is probably the best course of action (keep in mind running a single B1=50k curve on a 2M exponent takes less CPU time than 1 curve (B1=50k) on a 9M exponent. This is your 'curves per factor' ratio. You could convert it to GHzdays per factor for better comparison between factoring 2M exponents and 9M exponents.
This!

If you want to find the most factors per ECM-day, you should be able to identify exponent ranges that minimize the cost per factor; it's not always the least-ECM'ed candidates, because the time per curve scales roughly with the exponent. But within a small range of exponents (say 1.6 to 1.7M) that all have the same prior ECM work done, you'll find factors faster doing one curve per candidate than doing 50 curves per candidate. That's because the candidate with the least prior ECM has the highest chance of yielding a factor on the next curve; after you've done one curve, the current candidate is less likely than the next one to have a factor in the appropriate range.
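The "spread one curve across many candidates" argument follows from the marginal value of the next curve. A sketch under the independent-curve model (N = 280 curves per t25 at B1=50k from this thread; the 20% prior chance of a factor is an illustrative assumption, not a measured value):

```python
# N = 280 curves for one t25 at B1=50k (from the thread).
N = 280

def next_curve_yield(prior, p_factor=0.2):
    """Expected factors from ONE more curve on a candidate that has `prior`
    curves done, ignoring the small Bayesian correction for having survived."""
    survived = (1 - 1/N) ** prior   # factor (if any) not yet found and removed
    return p_factor * survived * (1/N)

# A fresh candidate pays better than one with 50 curves already run:
print(next_curve_yield(0) > next_curve_yield(50))   # True
```

So within a uniform range, the greedy strategy is always to put the next curve on the least-worked candidate, which is VBCurtis's point.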
Old 2017-11-08, 14:13   #54
rcv
 
Dec 2011

8F₁₆ Posts

Quote:
Originally Posted by petrw1 View Post
The first batch doing 60 or 70 curves per exponent has done 209 exponents and found 14 factors.
The second batch doing 1 or 3 curves per exponent has done 434 exponents and found 3 factors.

I believe that is likely; however as a percentage:
My first range has already done 30/280 = (about 11%)
The second range basically none.
So shouldn't I expect the success rates to differ by only about this same 11%?
First, your approximation formula is OK only because the numerator is small. A better formula is 1-(279/280)^30 = 10.2%. After you run an additional 65 curves, the chance of having found a factor (if it is there) is approximately 1-(279/280)^95 = 28.8%. I.e., 18.6% of your exponents should have yielded a factor (if it was there) after running curves #31 through #95.

If I make a simplifying assumption that your curves will only find factors in the 25-digit range, and define that range as 22.5 digits to 27.5 digits, then I expect about ln(27.5)-ln(22.5)=20.1% of the candidates will actually have a factor in your range. Unless there is a mistake in my math, you should have found 209*20.1%*18.6% = 7.8 factors from your 209*65 curves. (For your second batch, assuming you are running the first two curves, I estimate that you should have found 434*20.1%*0.7%=0.6 factors from your 434*2 curves.)
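rcv's arithmetic can be reproduced under the same assumptions (N = 280 curves per t25, and a 20.1% chance of a factor in the 22.5-27.5 digit window). A rough sketch:

```python
# rcv's simplified model: N = 280 curves for one t25 at B1=50k,
# and ln(27.5/22.5) ~ 20.1% chance of a factor in the 22.5-27.5 digit window.
N = 280
P_FACTOR = 0.201

def expected_factors(candidates, prior, extra):
    """Expected factors from `extra` curves on candidates that already survived
    `prior` curves (any factor found earlier would have been removed)."""
    survived = (1 - 1/N) ** prior        # factor (if present) missed so far
    hit = 1 - (1 - 1/N) ** extra         # the next `extra` curves find it
    return candidates * P_FACTOR * survived * hit

print(f"{expected_factors(209, 30, 65):.1f}")  # batch 1: ~7.8 expected vs 14 found
print(f"{expected_factors(434, 0, 2):.1f}")    # batch 2: ~0.6 expected vs 3 found
```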

You are doing much better than expected, which means either you are lucky, or more factors were left over from the previous B1 level than expected, or you are picking up more factors from the next B1 level than my highly simplified calculation allows for, or these numbers simply have more factors than random numbers of similar size.

Second, your sample size is very small. During an election, when pollsters sample 3000 to 5000 people, they claim a "margin of error" of 4-5%. I'm not a statistician, but those numbers seem to be universal constants among polling experts, whether they are polling something that's near 50/50 or a 12-person primary race. My point is that your sample size is too small to draw any conclusions. You have "polled" 209 exponents, and 5% of 209 exponents is 10.45. So any number of factors in the range 7.8 ± 10.45 would be within a 5% margin of error.

Third. It is possible unreported curves have been run on some of your numbers.

Last fiddled with by rcv on 2017-11-08 at 14:24
Old 2017-11-08, 16:46   #55
xilman
Bamboozled!
 
 
May 2003
Down not across

2⁹·3·7 Posts
Default

Quote:
Originally Posted by rcv View Post
Second, your sample size is very small. During an election, when they poll 3000 to 5000 people, they claim the "margin of error" is 4-5%. I'm not a statistician, but those numbers seem to be universal constants among polling experts whether they are polling something that's near 50/50 or whether they are polling a 12-person primary race. My point is that your sample size is too small to draw any conclusions. You have "polled" 209 exponents. And 5% of 209 exponents is 10.45. So, any number of factors in the range of 7.8-10.45 to 7.8+10.45 would be within a 5% margin of error.
IF the errors are normally distributed, the error in the estimate scales as the square root of the sample size: √209 ≅ 14.5, which is about 7% of 209.

The above is very hand-wavy, of course, because of the assumptions made and the small sample size.
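xilman's figure is just the square-root rule; a one-line check:

```python
import math

n = 209
# Rough 'one sigma' relative error for an estimate built on n trials: 1/sqrt(n).
rel_err = math.sqrt(n) / n
print(f"{rel_err:.1%}")   # ~6.9%, i.e. roughly 7% of 209
```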





Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.