2004-06-02, 00:23   #1
dave_dm
Saving computation in ECM

Suppose we have run 2900 curves with B1 = 3e6 and we now want to run 5500 curves with B1 = 11e6. If we continue on from the old residues then we have already done about 15% of the work, which is superficially a significant saving (15% ≈ (3e6 * 2900) / (11e6 * 5500)).

Of course, the snag here is conditional probability: since we know that 2900 curves have failed to yield a factor, continuing them to B1 = 11e6 has a lower probability of success than running 2900 'fresh' curves to B1 = 11e6.

There are some good papers out there which show us how to reselect parameters after failure, but none (that I've found) suggest that we keep the residues from previous trials. Has anyone looked into working out such an algorithm?

Dave
2004-06-02, 18:00   #2
xilman

Quote:
Originally Posted by dave_dm
<snip>
There are some good papers out there which show us how to reselect parameters after failure, but none (that I've found) suggest that we keep the residues from previous trials. Has anyone looked into working out such an algorithm?
Yes, they have.

Exercise: go read up on how to drive version 5 of the GMP-ECM program written primarily by Paul Zimmermann and Alex Kruppa, with assistance from many others. There are enough keywords in the previous sentence to enable you to find the requisite course materials by using any half-way decent search engine.

As you note, whether it's worth continuing is an entirely different question, and the subject of another and rather harder exercise.

Paul
2004-06-03, 00:10   #3
dave_dm

Yes, I'm already very familiar with the gmp-ecm code. It should be fairly easy to produce an algorithm that reuses the old residues, using -save / -resume and some perl scripting, but I'm more concerned with working out the theoretical performance gain, if any.
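For concreteness, such a driver might look something like this (sketched in Python rather than perl; the -save / -resume flags are GMP-ECM's own, but the exact behaviour of -resume with a larger B1, and the file names and curve counts, are my assumptions, to be checked against the ecm README):

[code]
# Sketch of a driver that runs curves at the old B1 while saving the stage-1
# residues, then later resumes those residues at a larger B1.  Assumes the
# `ecm` binary is on the path; parsing the output for a found factor is omitted.
import subprocess

N = "2^971-1"            # number to factor, in GMP-ECM input syntax (placeholder)
SAVEFILE = "stage1.sav"  # residue file written by -save and read by -resume

def run_and_save(b1, curves):
    """Run `curves` curves with B1 = b1, appending each stage-1 residue to SAVEFILE.
    GMP-ECM signals 'factor found' (among other things) through its exit status,
    so a nonzero return code stops the loop rather than being treated as an error."""
    for _ in range(curves):
        result = subprocess.run(["ecm", "-save", SAVEFILE, str(b1)],
                                input=N + "\n", text=True)
        if result.returncode != 0:
            break

def resume_at(b1):
    """Resume every saved residue at the larger B1 (stage 1 continued from the old
    bound, if -resume behaves as I read the README), then run stage 2."""
    subprocess.run(["ecm", "-resume", SAVEFILE, str(b1)], text=True)

run_and_save(3000000, 2900)   # the 2900 old curves at B1 = 3e6
resume_at(11000000)           # continue them to B1 = 11e6
[/code]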

Quote:
Originally Posted by xilman
Yes, they have.
Could you give me a reference to an analysis of such an algorithm, then? Neither my stash of ecm papers nor my favourite half-decent search engine seems to turn one up.

Thanks,

Dave
2004-06-03, 08:24   #4
geoff

Quote:
Originally Posted by dave_dm
Has anyone looked into working out such an algorithm?
I tried to work this out; I am not an expert, so there may be some point I have missed:

Say you have done the expected number of curves to bound B1=m (and saved the stage one residues), and the next higher digit size d has optimal bound B1=n.

Let p, q be the probabilities of one curve finding a factor of size d with B1=m and B1=n respectively. Let r, s, t be the times to complete stage one to B1=m, stage one to B1=n, and stage two from B1=n respectively.

Then it is worth continuing from the saved residue if (q-p)/(s-r+t) > q/(s+t), or equivalently if the ratio (q-p)(s+t) / q(s-r+t) is greater than 1.

As an example I tried curves on 2^971-1 with a P4 2.66GHz and Prime95 (B2=100*B1) using m=11e6, n=43e6. I got the probabilities p=1/79353, q=1/18461 by using Richard McIntosh's Maple program <http://www.mersenneforum.org/showpo...06&postcount=19> with factor size 50.5, and the resulting times were r=151, s=588, t=289 seconds. The ratio came to 0.927, so in this case it is better to start a fresh curve than to continue from the saved one.
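In code, the criterion with the figures above (p, q, r, s, t as defined in this post; the timings are of course machine-dependent):

[code]
# geoff's criterion: continuing a saved residue beats a fresh curve iff
# (q-p)/(s-r+t) > q/(s+t), i.e. iff (q-p)(s+t) / (q(s-r+t)) > 1.
p = 1 / 79353   # P(one curve finds the ~50.5-digit factor) at the old bound B1 = m = 11e6
q = 1 / 18461   # P(one curve finds the ~50.5-digit factor) at the new bound B1 = n = 43e6
r = 151.0       # seconds, stage 1 to B1 = m
s = 588.0       # seconds, stage 1 to B1 = n
t = 289.0       # seconds, stage 2 from B1 = n

ratio = (q - p) * (s + t) / (q * (s - r + t))
print(round(ratio, 3))   # 0.927: less than 1, so a fresh curve is the better use of time
[/code]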
2004-06-03, 15:03   #5
R.D. Silverman

Quote:
Originally Posted by geoff
<snip>
Then it is worth continuing from the saved residue if (q-p)/(s-r+t) > q/(s+t), or equivalently if the ratio (q-p)(s+t) / q(s-r+t) is greater than 1.
Actually it is quite a bit more subtle than this. One also needs to consider the probability that the group order will be smooth up to the new limits given that it was NOT smooth up to the old limit. This involves some very subtle problems in analytic number theory, and I am not sure, offhand, how I would go about solving them. It is also a Bayesian problem, because one must take into account the size distribution of the desired factor as a prior.

My intuition [and I could be quite wrong] is that if a curve fails, it is NOT worthwhile to save the output from Step 1 and then extend the B1 limit on the same curve. This intuition comes from the analysis of the role of large primes in QS & NFS. Pomerance showed that if a large number N is NOT B1-smooth, then one does not get too many additional relations by allowing the introduction of one or two additional large primes; this can be made exact via Dickman's function. Thus, if a curve fails to limit B1, then by extending B1 to a new value B1' you are in essence looking for a 'large prime factorization' of the group order. The 'large prime' variation only affects the o(1) term in the run time.
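For a feel of the numbers, Dickman's function can be tabulated directly. A minimal sketch (a crude trapezoidal integration of rho'(u) = -rho(u-1)/u; this is only the roughest ingredient of a real ECM success estimate, which also has to account for stage 2 and the structure of elliptic-curve group orders):

[code]
import math

def dickman_rho(u, steps_per_unit=1000):
    """Crude tabulation of Dickman's rho via rho'(u) = -rho(u-1)/u (trapezoidal rule).
    rho(u) is roughly the probability that a random integer near x has no prime
    factor larger than x**(1/u)."""
    if u <= 1:
        return 1.0
    h = 1.0 / steps_per_unit
    n = int(math.ceil(u * steps_per_unit))
    rho = [1.0] * (steps_per_unit + 1)      # rho(t) = 1 for t in [0, 1]
    for i in range(steps_per_unit + 1, n + 1):
        t_prev, t_cur = (i - 1) * h, i * h
        f_prev = rho[i - 1 - steps_per_unit] / t_prev   # = rho(t_prev - 1) / t_prev
        f_cur = rho[i - steps_per_unit] / t_cur         # = rho(t_cur - 1) / t_cur
        rho.append(rho[i - 1] - 0.5 * h * (f_prev + f_cur))
    return rho[n]

# Smoothness probabilities fall off very fast with u, which is the heart of the
# 'large prime' heuristic above:
print(round(dickman_rho(2), 4), round(dickman_rho(3), 4), round(dickman_rho(4), 5))
# approx 0.3069, 0.0486, 0.00491
[/code]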

This (very rough handwaving!) argument is what gives me my intuition.

Take it for what it is worth.

Bob
2004-06-06, 05:01   #6
dave_dm

Quote:
Originally Posted by Bob Silverman

This (very rough handwaving!) argument is what gives me my intuition.

Take it for what it is worth.

Bob
Thanks for the input; IMO very rough handwaving arguments are often very useful :)

Dave
2004-06-06, 21:35   #7
dave_dm
Experimental analysis :)

OK here's some experimental analysis. I computed the group orders of 6014 curves over the 30-digit prime p = 19679309647679934749999111. Of these, 16 were (B1=50000, B2=11757135)-smooth and 88 were (B1=250000, B2=116469998)-smooth.

Let a = 1-16/6014 and b = 1-88/6014. If we sample from the generated set of curves then the probability of success with 500 curves at B1=250000 is

1 - b^500 = 0.9993718853.

Now suppose we take 240 curves which are not (50000, 11757135)-smooth. If we continue these to B1=250000 then how many additional 'fresh' curves do we need to compute to B1=250000 to achieve the same probability of success? Well, assuming my brain's in gear, conditional probability tells us that each continued curve fails with probability b/a (a curve that fails at the larger bounds has certainly also failed at the smaller ones), so we need n such that

1 - b^(240+n) / a^240 = 1 - b^500,

so the required n is 304.
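For reference, the same calculation in a few lines of Python (a, b and n as defined above):

[code]
import math

curves = 6014
a = 1 - 16 / curves    # P(a random curve is NOT (50000, 11757135)-smooth)
b = 1 - 88 / curves    # P(a random curve is NOT (250000, 116469998)-smooth)

# 500 fresh curves at B1 = 250000:
print(1 - b**500)      # ~0.99937, the target probability of success

# 240 continued curves (known not to be smooth at the small bounds) fail with
# probability (b/a)**240, so the overall failure probability with n extra fresh
# curves is b**(240 + n) / a**240.  Setting that equal to b**500 and solving:
n = math.ceil(260 + 240 * math.log(a) / math.log(b))
print(n)               # 304
[/code]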

It remains to see which strategy takes more time. On my box, running 500 curves to B1=250000 takes 2960s. Running 240 curves from B1=50000 to B1=250000 and then 304 fresh curves to B1=250000 takes 3067s.

So we conclude that, for this 30-digit prime, if we continue the saved B1=50000 residues then it takes 3.6% more time to achieve the same probability of finding p than if we run fresh curves to B1=250000. I reckon this is mainly due to the stage 2 work that gets duplicated, which cannot be avoided.

This argument extends by handwaving to the conclusion that it's probably never worth continuing failed residues.

Dave
2004-06-08, 15:09   #8
R.D. Silverman

Quote:
Originally Posted by dave_dm
OK here's some experimental analysis.

<snip>

This argument extends by handwaving to the conclusion that it's probably never worth continuing failed residues.

Dave

Nice!
2004-06-12, 14:18   #9
Fusion_power

Extended in another direction, it might also make sense to run ECM with a series of bounds, tossing the residue each time. I suspect this would be a null-gain situation unless some method were devised for selecting the bounds most likely to yield a factor. Also, if the number in question had only very large factors, would the group order preclude finding a factor regardless of the selected bounds?

Fusion
