mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Factoring (https://www.mersenneforum.org/forumdisplay.php?f=19)

 Prime95 2007-12-10 01:33

I'm tracking ECM effort in the v5 server. I compute total_ecm_effort as sum (curve_count * B1). When this total gets to certain levels I issue ECM assignments with a larger B1 value.

The problem? The summed B1 values assume a B2 value of 100*B1. When I use GMP-ECM B2 is often much higher than 100*B1. I need a simple formula that given B1 and B2 it returns a multiplier. I apply this multiplier to the curve count to get a rough equivalent of the number of B2=B1*100 curves. Accuracy is not critical - a 10-20% error would be fine.

For example, on a number I'm currently atttacking B2 = 7000 * B1. According to GMP-ECM one of these curves equals 2.3 curves of B2 = 100 * B1.

My placeholder routine on the server is this:

function normalized_B1 (\$B1, \$B2)
{
\$ratio = (1.0 * \$B2) / \$B1;
if (\$ratio < 7) return (\$B1 * 0.4);
if (\$ratio < 30) return (\$B1 * 0.6);
if (\$ratio < 70) return (\$B1 * 0.8);
if (\$ratio < 300) return (\$B1);
if (\$ratio < 700) return (\$B1 * 1.4);
if (\$ratio < 3000) return (\$B1 * 1.7);
if (\$ratio < 7000) return (\$B1 * 2.3);
return (\$B1 * 2.8);
}

 akruppa 2007-12-11 13:12

I get these figures for the expected number of curves, all without Brent-Suyama extension:

B1=1e6
B2: Curves to find p35: Ratio:
1e6 14317 0.12
1e7 3388 0.49
1e8 1674 1
1e9 993 1.68
1e10 639 2.62
1e11 430 3.89
1e12 297 5.62

B1=1e7
B2: Curves to find p45: Ratio:
1e7 94363 0.12
1e8 22594 0.5
1e9 11286 1
1e10 6779 1.66
1e11 4421 2.55
1e12 3015 3.74
1e13 2113 5.34

B1=1e8
B2: Curves to find p55: Ratio:
1e8 410456 0.12
1e9 101581 0.51
1e10 51549 1
1e11 31434 1.64
1e12 20823 2.47
1e13 14436 3.57
1e14 10291 5.00

The closest function I got to these ratios is
0.12 + 0.88 * (log_10(B2 / B1) / 2) ^ 1.5, which produces for log_10(B2 / B1) = 0 ... 6
0: 0.1200000000000000000000000000
1: 0.4311269837220809107363715193
2: 1.000000000000000000000000000
3: 1.736663230236897544810207489
4: 2.609015869776647285890972155
5: 3.598505426185217265198782899
6: 4.692614131981836054912458342

The values for 1 and 6 are somewhat too low, but such B2/B1 ratios will probably not be used much. For 3 it's somewhat too high. If you want really accurate estimates, you could call Pari/GP as an external program and let the rho.gp script compute probabilities. Or I'll write a small wrapper around the C code in GMP-ECM.

Alex

 Prime95 2007-12-11 15:03

Thanks, Alex. Your formula was exactly what I needed. Like I said too much accuracy in curve counting is not critical as the only consequence is you move on to a larger B1 a liitle too early or a little too late - no big deal.

Another ECM puzzle for you: I want to hand out ECM assignments such that I roughly maximize the chance of finding a factor for a given effort. What formula would you use given the quantities total_ecm_effort (the sum of completed B1 values) and N (the Mersenne exponent). Ignore the step function nature of FFT lengths and assume the time it takes to run an ECM curve is proportional to N (log N).

A simpler way to phrase this is: If an exponent has had a total_ecm_effort X, what is the probability the next ECM curve will find a factor (to make life easier, ignore the step nature of assigned B1 values).

Given such a formula I can then use the expected CPU run time to maximize factors found.

 akruppa 2007-12-12 17:58

This isn't an easy question. You'd want to compute the distribution of remaining factors given an a priori distribution (i.e. 1/n probability of n-bit factor), adjust by taking previous factoring effort into account and choose the ECM parameters so that the scalar product of that distribution and ECM's probability of finding factors of n bits, divided by the time ECM with those parameters would use is maximal. This is a bit messy.

This is what Silverman and Wagstaff described in the "Practical Analysis" paper (which I have to admit still haven't read as carefully as I should! :alex:). As I understood it, the scheme of doing the expected number of curves for a certain factor size, then moving on to a size 5 digits larger is a good approximation to exactly this goal: maximising each curve's probability of finding a factor per unit time. If you want more precise parameter choice, all I can offer is code to compute ECM's proability of success for given parameters/factor sizes, so you can model the Bayesian process.

Alex

 All times are UTC. The time now is 00:53.