mersenneforum.org Saving computation in ECM

2004-06-02, 00:23   #1
dave_dm

May 2004

Saving computation in ECM

Suppose we have run 2900 curves with B1 = 3e6 and we now want to run 5500 curves with B1 = 11e6. If we continue on from the old residues then we've already done about 15% of the work (15% ≈ (3e6 × 2900) / (11e6 × 5500)). This saving is superficially significant. Of course, the snag here is conditional probability: since we know that 2900 curves have failed to yield a factor, continuing them to B1 = 11e6 has a lower probability of success than running 2900 'fresh' curves to B1 = 11e6.

There are some good papers out there which show us how to reselect parameters after failure, but none (that I've found) suggest that we keep the residues from previous trials. Has anyone looked into working out such an algorithm?

Dave
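The 15% figure can be sketched in a couple of lines, assuming stage 1 cost scales roughly linearly with B1 times the number of curves (an illustration, not from the original post):

```python
# Back-of-envelope for the fraction of the new job's stage 1 work
# already covered by the saved residues, assuming cost ~ B1 * curves.
old_work = 3e6 * 2900    # 2900 curves run to B1 = 3e6
new_work = 11e6 * 5500   # 5500 curves planned to B1 = 11e6
print(f"{old_work / new_work:.1%}")  # ~14.4%, i.e. "about 15%"
```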
2004-06-02, 18:00   #2
xilman
Bamboozled!

"๐บ๐๐ท๐ท๐ญ"
May 2003
Down not across


Quote:
 Originally Posted by dave_dm Has anyone looked into working out such an algorithm? Dave
Yes, they have.

Exercise: go read up on how to drive version 5 of the GMP-ECM program written primarily by Paul Zimmermann and Alex Kruppa, with assistance from many others. There are enough keywords in the previous sentence to enable you to find the requisite course materials by using any half-way decent search engine.

As you note, whether it's worth continuing is an entirely different question, and the subject of another and rather harder exercise.

Paul

2004-06-03, 00:10   #3
dave_dm

May 2004


Yes, I'm already very familiar with the gmp-ecm code. It should be fairly easy to produce an algorithm that keeps using the old residues using -save / -resume and some perl scripting, but I'm more concerned with working out the theoretical performance gain, if any.

Quote:
 Originally Posted by xilman Yes, they have.
Could you give me a reference to an analysis of such an algorithm, then? Neither my stash of ECM papers nor my favourite half-decent search engine seems to turn up any.

Thanks,

Dave

2004-06-03, 08:24   #4
geoff

Mar 2003
New Zealand


Quote:
 Originally Posted by dave_dm Has anyone looked into working out such an algorithm?
I tried to work this out. I am not an expert, so there may be some point I have missed:

Say you have done the expected number of curves to bound B1=m (and saved the stage one residues), and the next higher digit size d has optimal bound B1=n.

Let p,q be the probability of one curve finding a factor of size d with B1=m,B1=n respectively. Let r,s,t be the time to complete stage one to B1=m, stage one to B1=n, and stage two from B1=n respectively.

Then it is worth continuing from the saved residue if (q-p)/(s-r+t) > q/(s+t), or equivalently if the ratio (q-p)(s+t) / q(s-r+t) is greater than 1.

As an example I tried curves on 2^971-1 with a P4 2.66GHz and Prime95 (B2 = 100·B1) using m = 11e6, n = 43e6. I got the probabilities p = 1/79353, q = 1/18461 by using Richard McIntosh's Maple program <http://www.mersenneforum.org/showpo...06&postcount=19> with factor size 50.5, and the resulting times were r = 151, s = 588, t = 289 seconds. The ratio came to 0.927, so in this case it is better to start a fresh curve than to continue from the saved one.
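The test above packages into a few lines of Python (a sketch using the numbers from this example; the probabilities and timings are specific to that machine and number, and would need re-measuring for any other job):

```python
# geoff's continue-vs-fresh criterion for ECM curves, using his
# 2^971-1 example: m = 11e6, n = 43e6, ~50-digit target factor.
p = 1 / 79353   # P(one curve with B1 = m finds the factor)
q = 1 / 18461   # P(one curve with B1 = n finds the factor)
r = 151         # seconds: stage 1 to B1 = m
s = 588         # seconds: stage 1 to B1 = n
t = 289         # seconds: stage 2 from B1 = n

# Success per unit time of a continued curve vs a fresh curve:
# continuing is worthwhile iff (q - p)/(s - r + t) > q/(s + t),
# i.e. iff this ratio exceeds 1.
ratio = (q - p) * (s + t) / (q * (s - r + t))
print(f"ratio = {ratio:.3f}")  # ~0.927 < 1: fresh curves win here
```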

2004-06-03, 15:03   #5
R.D. Silverman

"Bob Silverman"
Nov 2003
North of Boston


Quote:
 Originally Posted by geoff Then it is worth continuing from the saved residue if (q-p)/(s-r+t) > q/(s+t), or equivalently if the ratio (q-p)(s+t) / q(s-r+t) is greater than 1. ... The ratio came to 0.927, so in this case it is better to start a fresh curve than continuing from the saved one.
Actually it is quite a bit more subtle than this. One also needs to consider the conditional probability that the group order is smooth up to the new limits, given that it was NOT smooth up to the old limits. This involves some very subtle problems in analytic number theory. I am not sure, offhand, how I would go about solving them. It is also a Bayesian problem, because one must take into account the size distribution of the desired factor as a prior.

My intuition [and I could be quite wrong] is that if a curve fails, it is NOT worthwhile to save the output from Step 1 and then extend the B1 limit on the same curve. This intuition comes from the analysis of the role of large primes in QS & NFS. Pomerance showed that if a large number N is NOT B1-smooth, then one does not get too many additional relations by allowing the introduction of one or two additional large primes. This can be made exact via Dickman's function. Thus, if a curve fails to limit B1, then by extending B1 to a new value B1' you are in essence looking for a 'large prime factorization' of the group order. The 'large prime' variation only affects the o(1) term in the run time.
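The Dickman function mentioned here can be tabulated numerically, which gives a feel for how steeply smoothness probabilities fall off. As a rough illustration (not from the thread): Dickman's rho(u) approximates the probability that a random integer x has no prime factor exceeding x^(1/u), and satisfies the delay differential equation u·rho'(u) = -rho(u-1) with rho = 1 on [0, 1].

```python
def dickman_rho(u_max=10.0, h=1e-4):
    """Tabulate Dickman's rho on a grid of step h; rho(u) = 1 for u <= 1."""
    k = round(1 / h)             # grid points per unit of u
    n = round(u_max / h)
    rho = [1.0] * (n + 1)        # initial condition on [0, 1]
    for i in range(k, n):
        u = i * h
        # trapezoidal step for rho'(u) = -rho(u - 1) / u
        rho[i + 1] = rho[i] - 0.5 * h * (rho[i - k] / u
                                         + rho[i + 1 - k] / (u + h))
    return rho, k

rho, k = dickman_rho()
# rho falls superexponentially in u, so modest extensions of the
# smoothness bound buy relatively little extra probability.
print(round(rho[2 * k], 4), round(rho[3 * k], 4))  # rho(2) ~ 0.3069, rho(3) ~ 0.0486
```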

This (very rough handwaving!) argument is what gives me my intuition.

Take it for what it is worth.

Bob

2004-06-06, 05:01   #6
dave_dm

May 2004


Quote:
 Originally Posted by Bob Silverman This (very rough handwaving!) argument is what gives me my intuition. Take it for what it is worth. Bob
Thanks for the input; IMO very rough handwaving arguments are often very useful :)

Dave

2004-06-06, 21:35   #7
dave_dm

May 2004

Experimental analysis :)

OK here's some experimental analysis. I computed the group order of 6014 curves over the 30-digit prime p = 19679309647679934749999111. Of these, 16 were (B1=50000, B2=11757135)-smooth and 88 were (B1=250000, B2=116469998)-smooth. Let a = 1 - 16/6014 and b = 1 - 88/6014. If we sample from the generated set of curves then the probability of success with 500 curves at B1=250000 is 1 - b^500 = 0.9993718853.

Now suppose we take 240 curves which are not (50000, 11757135)-smooth. If we continue these to B1=250000 then how many additional 'fresh' curves do we need to compute to B1=250000 to achieve the same probability of success? Well, assuming my brain's in gear, conditional probability tells us that we need n such that 1 - b^(240+n) / a^240 = 1 - b^500, so the required n is 304.

It remains to see which strategy takes more time. On my box, running 500 curves to B1=250000 takes 2960s. Running 240 curves from B1=50000 to B1=250000 and then 304 fresh curves to B1=250000 takes 3067s.

So we conclude that, for this 30-digit prime, if we continue the saved B1=50000 residues then it takes 3.6% more time to achieve the same probability of finding p than if we run fresh curves to B1=250000. I reckon this is mainly due to the stage 2's duplicating work, which cannot be avoided. This argument extends by handwaving to the conclusion that it's probably never worth continuing failed residues.

Dave
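Reading the conditional-probability step the same way, the n = 304 figure follows from the smoothness counts above (a sketch, not code from the thread — each continued curve fails with conditional probability b/a given that it already failed the smaller bound):

```python
from math import ceil, log

# Smoothness counts from the 6014 sampled group orders over
# p = 19679309647679934749999111.
a = 1 - 16 / 6014    # P(curve fails at B1=50000,  B2=11757135)
b = 1 - 88 / 6014    # P(curve fails at B1=250000, B2=116469998)

# Match the failure probability of 500 fresh curves, b**500, with
# 240 continued curves (each failing with prob b/a) plus n fresh ones:
#   (b/a)**240 * b**n <= b**500
# Solving for the smallest integer n (log(b) < 0 flips the inequality):
n = ceil((500 * log(b) + 240 * (log(a) - log(b))) / log(b))
print(n)  # 304

# Measured wall-clock comparison from the post: extra cost of continuing.
print(f"{3067 / 2960 - 1:.1%}")  # ~3.6% more time
```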
2004-06-08, 15:09   #8
R.D. Silverman

"Bob Silverman"
Nov 2003
North of Boston


Quote:
 Originally Posted by dave_dm OK here's some experimental analysis. ... This argument extends by handwaving to the conclusion that it's probably never worth continuing failed residues. Dave

Nice!

2004-06-12, 14:18   #9
Fusion_power

Aug 2003
Snicker, AL

Extended in another direction, it might also make sense to run ECM with a series of bounds, tossing the residue each time. I suspect this would be a null-gain situation unless some method were devised of selecting the bounds most likely to yield a factor. Also, if the number in question had only very large factors, would the group order preclude finding a factor regardless of the selected bounds?

Fusion

Last fiddled with by Fusion_power on 2004-06-12 at 14:21

