2011-04-02, 11:30  #1
Just call me Henry
"David"
Sep 2007
Cambridge (GMT)
161C_{16} Posts 
Optimal ECM bounds
I have just been reading R. D. Silverman's paper on ECM, in which he states optimal bounds for K=100 (i.e. stage 2 processes primes 100 times as fast as stage 1). These seem to be approximately the values we use today. Has anyone ever done such a study specifically for GMP-ECM? He stated that there was a version available at the time capable of K=170 on some machines. For B1=1e6 and N=2^1061-1, K appeared to be about 1000 for GMP-ECM (an old version), and rose further when I upped the bound to 3e6.
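Silverman's K is the per-prime speed ratio between the two stages: stage 1 must cover every prime power up to B1 with curve arithmetic, while stage 2 only has to account for each prime in (B1, B2] once. A toy illustration of why a fast stage 2 lets B2 run far ahead of B1 (the bounds here are illustrative and much smaller than real ECM runs, and this counts primes only, ignoring prime powers and actual per-prime costs):

```python
def primes_up_to(n):
    """Simple sieve of Eratosthenes; returns all primes <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(sieve[p * p :: p]))
    return [i for i, v in enumerate(sieve) if v]

B1, B2 = 10**4, 10**6            # illustrative bounds only, with B2 = 100*B1
ps = primes_up_to(B2)
stage1 = sum(1 for p in ps if p <= B1)   # primes stage 1 must cover
stage2 = sum(1 for p in ps if p > B1)    # primes left for stage 2

print(stage1, stage2, stage2 / stage1)
```

Even with B2 = 100·B1, stage 2 here covers only about 63 primes for every stage-1 prime, so a stage-2 step that is ~100 times cheaper per prime keeps the two stages roughly in balance; a faster stage 2 (larger K) pushes the optimal B2 correspondingly higher.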

2011-04-02, 13:04  #2
Tribal Bullet
Oct 2004
2·3·587 Posts 
Modern high-performance implementations of stage 2 use asymptotically faster methods than were known at the time that paper was published, and the expectation now is that B2 can be around B1^2.

2011-04-02, 14:42  #3
Jun 2005
lehigh.edu
2^{10} Posts 
Quote:
Both programs were highly optimised for the hardware he was using, and one of the more remarkable accomplishments of GMP-ECM is to get the stage-2 performance of ecm/fft (and better) in a much more portable setting (with the help of GMP, of course). There is now lots of ECM being run on constrained hardware, and the info available with Alex's -v option is perhaps more useful than the tables of either. Bruce 

2011-06-03, 14:50  #4
Nov 2006
Terra
4E_{16} Posts 
Quote:
The defaults seem to be roughly in the ballpark for optimality, as computed by "-v". Most of my experience does not involve base 2. Am I missing something? 

2011-06-03, 22:39  #5
Nov 2003
2^{3}×3^{2}×103 Posts 
Quote:
as it does in step 1. The B2/B1 ratio depends on the relative speed of the two steps. In theory, a 'perfect' FFT implementation could take step 2 to B1^2 in the same time as B1. We can come close to this for P-1/FFT, but ECM requires more work. See Peter's thesis. Note that memory requirements are large. 

2011-06-04, 16:56  #6
Nov 2006
Terra
2×3×13 Posts 
Thanks for your response.
Quote:
I've done a little testing and have this summary: Code:
N has 192 digits

B1                    35       40       45       50       55    Step 1 took  Step 2 took
1e6                 8.64h    3.41d   38.42d    1.39y   20.79y      22s          12s
-B2scale 2  1e6     8.39h<   3.30d   36.88d    1.31y   19.69y      22           16
-B2scale 3  1e6     8.66h    3.37d   37.78d    1.34y   19.90y      22           19
3e6                 8.51h    2.58d<  22.21d    220.d    6.75y
-B2scale 2  3e6     9.03h    2.70d   23.05d    227.d    6.89y
-B2scale 3  3e6    11.06h    3.32d   28.09d    276.d    8.37y
11e6               11.23h    2.64d   17.25d<   129.d    3.00y
-B2scale 2  11e6   12.27h    2.86d   18.63d    139.d    3.21y
-B2scale 3  11e6   13.34h    3.10d   20.15d    149.d    3.42y

< indicates minimal estimated time, in column, for B1 optimal (from Sect. 2 of README) Quote:
For B1 = 3e6 or 11e6, "-v" indicates that omitting -B2scale is better than -B2scale 2. For B1 = 1e6, -B2scale 2 is better, but even increasing -B2scale to 3, which makes B2 = 2852890960, the time in step 2 is still less than in step 1, and is much less with -B2scale 2. And multiplying by roughly 3 to get that B2 still leaves it very far short of B1^2. The times estimated by "-v" seem fairly insensitive to minor variation of either B1 or B2. Should I simply believe "-v"? 
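The mixed units in the table (hours/days/years) make per-column comparisons easy to misread; a quick cross-check of the 35- and 40-digit columns against the "<" markers (data copied from the table above, converted to hours):

```python
# Expected-time values from the -v table above, 35- and 40-digit columns only.
UNITS = {"h": 1.0, "d": 24.0, "y": 24.0 * 365.0}  # convert everything to hours

def to_hours(s):
    """'8.39h' -> 8.39, '3.30d' -> 79.2, etc."""
    return float(s[:-1]) * UNITS[s[-1]]

rows = {
    "1e6":             ("8.64h",  "3.41d"),
    "1e6 -B2scale 2":  ("8.39h",  "3.30d"),
    "1e6 -B2scale 3":  ("8.66h",  "3.37d"),
    "3e6":             ("8.51h",  "2.58d"),
    "3e6 -B2scale 2":  ("9.03h",  "2.70d"),
    "3e6 -B2scale 3":  ("11.06h", "3.32d"),
    "11e6":            ("11.23h", "2.64d"),
    "11e6 -B2scale 2": ("12.27h", "2.86d"),
    "11e6 -B2scale 3": ("13.34h", "3.10d"),
}

best35 = min(rows, key=lambda k: to_hours(rows[k][0]))
best40 = min(rows, key=lambda k: to_hours(rows[k][1]))
print(best35, best40)
```

Both minima agree with the "<" markers in the table: -B2scale 2 only wins at B1 = 1e6, and the plain defaults win at 3e6.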

2011-06-04, 22:42  #7
Jun 2005
lehigh.edu
1024_{10} Posts 
Quote:
did some programming of the functions?), not an oracle. The default settings in GMP-ECM represent the judgement(s) of the development group and/or Paul Z. There were 2-3 years when I was in close correspondence with Paul, starting from when we were using Peter's ecm/fft, up until the first p50, through the start of GMP-ECM. (I started even earlier, with the K=170 binary that Peter released on USENET shortly before his thesis.) My recollection is that we always had the time in step 2 well below the time in step 1. Jason's surely better informed than I am about current practice; but the hardware I'm using rarely has more than 1 GB or 2 GB per core (the infiniband nodes have 4 GB/core, but that's for MPI). That means running with -k 20 on both the AMD cluster (B1=900M) and the i3's (B1=400M). That's with default B2. Larger B2's would be even less effective use of the hardware. Bruce 

2011-06-05, 02:00  #8
Nov 2006
Terra
78_{10} Posts 

2011-06-05, 03:52  #9
Jun 2005
lehigh.edu
10000000000_{2} Posts 
Quote:
Code:
Input number is 83448088730818985286425831938561059680680710287805798975942419471510101871708423199644876070287413606324229347981799221990684261107780541410625311368649139559999749566012004773094257570166565025601 (197 digits)
Using B1=900000000, B2=15892679602816, polynomial Dickson(30), sigma=35391187
Step 1 took 5883661ms
Step 2 took 1104159ms
Code:
Input number is 231144178280063672147683937185780710761264019076892572758150933683093152959432859911205316526447726158837969986730856197713290412756834680396303612926255802672088981784372547072723261892218374759839058175659333535274438890981601 (228 digits)
Using B1=400000000, B2=15892277350966, polynomial Dickson(30), sigma=1330596911
Step 1 took 3478089ms
Step 2 took 2141442ms
Code:
Input number is 3885859863425607930702651581959725877839501274801459476185487915134976553905712095613060723341019399849892485034002854616148366895675004749452163514208030151359969357925764767518867383066073369732140010282744175931167592624365319891317 (235 digits)
Using B1=600000000, B2=9535594068016, polynomial Dickson(30), sigma=1779133192
Step 1 took 5009987ms
Step 2 took 820253ms
The first is linux/AMD, the last two Windows 7/Intel. No particular reason for you (or anyone else) to focus on what I'm running: all GMP-ECM defaults. Why not, for the state of the art, compare EPFL's runs on the PlayStations? Their first large runs used B1 = 3e9, B2 = 1.039e14; more recent ones have used B1 = 1e9, B2 = 2.54e13. If I'm recalling correctly, they chose these B2's as the GMP-ECM default, despite running step 1 on the PlayStation cluster, -save/-resume, step 2 "using 4 cores per node on a 56-node cluster (two hex-core processors per node)". Totally different hardware, not a B1^2 in sight. I think if we had more memory available, we might consider larger B2's; it seems rather that we're running the largest B1 we can get so that the default B2 fits in the memory we have. bdodson 
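To put "not a B1^2 in sight" into numbers, here are the B2/B1 and B2/B1^2 ratios for the bound pairs quoted in this post (the three GMP-ECM logs above plus the EPFL PlayStation figures):

```python
# (B1, B2) pairs quoted in this post.
runs = {
    "197-digit run": (900_000_000, 15_892_679_602_816),
    "228-digit run": (400_000_000, 15_892_277_350_966),
    "235-digit run": (600_000_000, 9_535_594_068_016),
    "EPFL (early)":  (3_000_000_000, 1.039e14),
    "EPFL (later)":  (1_000_000_000, 2.54e13),
}

for name, (b1, b2) in runs.items():
    # B2/B1 is in the tens of thousands; B2/B1^2 is tiny in every case.
    print(f"{name}: B2/B1 = {b2 / b1:.0f}, B2/B1^2 = {b2 / b1**2:.2e}")
```

Every run has B2/B1 in the 1e4 range but B2/B1^2 around 1e-4 or smaller, i.e. the defaults sit four or more orders of magnitude below the theoretical B2 = B1^2 regime.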

2011-06-05, 14:19  #10
Nov 2003
2^{3}×3^{2}×103 Posts 
Quote:
was always large enough to hold all of the data. 

2011-06-09, 08:32  #11
Sep 2010
Scandinavia
3·5·41 Posts 
I don't understand the meaning of "as much time in step 2 as in step 1". Minimizing expected time with -v doesn't come close to that. And the paper seems to recommend a ratio of 1:~0.4 rather than 1:1.
What am I missing? Also, is there a more recent analysis available? 
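For comparison with the 1:~0.4 figure, the step 2 / step 1 ratios in the three GMP-ECM logs quoted earlier in this thread work out as follows (millisecond timings copied from bdodson's post):

```python
# (step 1 ms, step 2 ms) from the three GMP-ECM logs quoted above.
timings = {
    "197-digit": (5_883_661, 1_104_159),
    "228-digit": (3_478_089, 2_141_442),
    "235-digit": (5_009_987,   820_253),
}

for name, (s1, s2) in timings.items():
    print(f"{name}: step2/step1 = {s2 / s1:.2f}")
```

All three ratios sit well below 1:1 (roughly 0.2 to 0.6), consistent with Bruce's recollection of keeping step 2 well below step 1 rather than with equal-time bounds.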