20040602, 00:23  #1 
May 2004
80_{10} Posts 
Saving computation in ECM
Suppose we have run 2900 curves with B1 = 3e6 and we now want to run 5500 curves with B1 = 11e6. If we continue on from the old residues then we've already done about 15% of the work. This saving is superficially significant. (15% = (3e6 * 2900) / (11e6 * 5500)).
Of course, the snag here is conditional probability: since we know that 2900 curves have failed to yield a factor, continuing them to B1 = 11e6 has a lower probability of success than running 2900 'fresh' curves to B1 = 11e6. There are some good papers out there which show us how to reselect parameters after failure, but none (that I've found) suggest that we keep the residues from previous trials. Has anyone looked into working out such an algorithm? Dave 
20040602, 18:00  #2  
Bamboozled!
"๐บ๐๐ท๐ท๐ญ"
May 2003
Down not across
26605_{8} Posts 
Quote:
Exercise: go read up on how to drive version 5 of the GMPECM program written primarily by Paul Zimmermann and Alex Kruppa, with assistance from many others. There are enough keywords in the previous sentence to enable you to find the requisite course materials by using any halfway decent search engine. As you note, whether it's worth continuing is an entirely different question, and the subject of another and rather harder exercise. Paul 

20040603, 00:10  #3  
May 2004
2^{4}×5 Posts 
Yes, I'm already very familiar with the gmpecm code. It should be fairly easy to produce an algorithm that keeps using the old residues using save / resume and some perl scripting, but I'm more concerned with working out the theoretical performance gain, if any.
Quote:
Thanks, Dave 

20040603, 08:24  #4  
Mar 2003
New Zealand
2205_{8} Posts 
Quote:
Say you have done the expected number of curves to bound B1=m (and saved the stage one residues), and the next higher digit size d has optimal bound B1=n. Let p,q be the probability of one curve finding a factor of size d with B1=m,B1=n respectively. Let r,s,t be the time to complete stage one to B1=m, stage one to B1=n, and stage two from B1=n respectively. Then it is worth continuing from the saved residue if (qp)/(sr+t) > q/(s+t), or equivalently if the ratio (qp)(s+t) / q(sr+t) is greater than 1. As an example I tried curves on 2^9711 with a P4 2.66GHz and Prime95 (B2=100B1) using n=11e6, m=43e6. I got the probabilities p=1/79353, q=1/18461 by using Richard McIntosh's Maple program <http://www.mersenneforum.org/showpo...06&postcount=19> with factor size 50.5, and the resulting times were r=151, s=588, t=289 seconds. The ratio came to 0.927, so in this case it is better to start a fresh curve than continuing from the saved one. 

20040603, 15:03  #5  
"Bob Silverman"
Nov 2003
North of Boston
7508_{10} Posts 
Quote:
the probability that the group order will be smooth up to the new limits given the conditional probability that it was NOT smooth up to the old limit. This involves some very subtle problems in analytic number theory. I am not sure, offhand, how I would go about solving them. It is also a Bayseian problem because one must take into account the size distribution of the desired factor as a prior. My intuition [and I could be quite wrong] is that if a curve fails, it is NOT worthwhile to save the output from Step 1 and then extend the B1 limit on the same curve. This intuition comes from the analysis of the role of large primes in QS & NFS. Pomerance showed that if a large number N is NOT B1 smooth, then one does not get too many additional relations by allowing the introduction of 1 or two additional large primes. This can be made exact via Dickman's function. Thus, if a curve fails to limit B1, then by extending B1 to a new value B1' you are in essence looking for a 'large prime factorization' of the group order. The 'large prime' variation only affects the o(1) term in the run time. This (very rough handwaving!) argument is what gives me my intuition. Take it for what it is worth. Bob 

20040606, 05:01  #6  
May 2004
2^{4}·5 Posts 
Quote:
Dave 

20040606, 21:35  #7 
May 2004
120_{8} Posts 
Experimental analysis :)
OK here's some experimental analysis. I computed the group order of 6014 curves over the 30digit prime p = 19679309647679934749999111. Of these, 16 were (B1=50000, B2=11757135)smooth and 88 were (B1=250000, B2=116469998)smooth.
Let a = 116/6014 and b = 188/6014. If we sample from the generated set of curves then the probability of success with 500 curves at B1=250000 is 1  a^500 = 0.9993718853. Now suppose we take 240 curves which are not (50000, 11757135)smooth. If we continue these to B1=250000 then how many additional 'fresh' curves do we need to compute to B1=250000 to achieve the same probability of success? Well, assuming my brain's in gear, conditional probability tells us that we need n such that 1  b^(240+n) / a^240 = 1  a^500, so the required n is 304. It remains to see which strategy takes more time. On my box, running 500 curves to B1=250000 takes 2960s. Running 240 curves from B1=50000 to B1=250000 and then 304 fresh curves to B1=250000 takes 3067s. So we conclude that, for this 30digit prime, if we continue the saved B1=50000 residues then it takes 3.6% more time to achieve the same probability of finding p than if we run fresh curves to B1=250000. I reckon this is mainly due to the stage 2's duplicating work, which cannot be avoided. This argument extends by handwaving to the conclusion that it's probably never worth continuing failed residues. Dave 
20040608, 15:09  #8  
"Bob Silverman"
Nov 2003
North of Boston
2^{2}×1,877 Posts 
Quote:
Nice! 

20040612, 14:18  #9 
Aug 2003
Snicker, AL
2^{6}·3·5 Posts 
Extended in another direction, it might also make sense to run ECM with a series of bounds tossing the residue each time. I suspect this would be a null gain situation unless some method were devised of selecting bounds most likely to yield a factor. Also, if the # in question had only very large factors, would the group order preclude finding a factor regardless of the selected bounds?
Fusion Last fiddled with by Fusion_power on 20040612 at 14:21 
Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
New Computation  JohnFullspeed  Miscellaneous Math  8  20110713 10:54 
A computation worthy of the name  fivemack  Factoring  121  20100220 18:32 
New Pi Computation Record  ldesnogu  Lounge  11  20100107 14:42 
Value of computation  fivemack  Lounge  0  20080905 20:23 
Not Saving  BioRules  Information & Answers  9  20080531 13:52 