mersenneforum.org > Factoring Projects > GMP-ECM
2011-04-02, 11:30   #1
henryzz

Optimal ECM bounds

I have just been reading R.D. Silverman's paper on ECM, in which he states optimal bounds for K=100 (i.e. B2 runs 100 times as fast as B1). These seem to be approximately the values we use today. Has anyone ever done such a study specifically for GMP-ECM? He stated that there was a version available at the time capable of K=170 on some machines. For B1=1e6 and N=2^1061-1, K appeared to be about 1000 for GMP-ECM (an old version) and rose further when I upped the bound to 3e6.
2011-04-02, 13:04   #2
jasonp

Modern high-performance implementations of stage 2 use asymptotically faster methods than were known at the time that paper was published, and the expectation now is that B2 can be around B1^2.
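For a sense of scale, the bounds that come up later in this thread sit far below B2 = B1^2. A quick check (the (B1, B2) pairs are copied from the posts below: GMP-ECM defaults on bdodson's machines and EPFL's PlayStation runs):

```python
# (B1, B2) pairs quoted in this thread: GMP-ECM defaults and EPFL's runs
bounds = [
    (900_000_000, 15_892_679_602_816),   # AMD cluster default
    (400_000_000, 15_892_277_350_966),   # i3 default
    (600_000_000, 9_535_594_068_016),    # dual-core default
    (3_000_000_000, 1.039e14),           # EPFL, first large runs
    (1_000_000_000, 2.54e13),            # EPFL, more recent runs
]

for B1, B2 in bounds:
    print(f"B1={B1:.1e}  B2={B2:.3e}  B2/B1={B2 / B1:,.0f}  B2/B1^2={B2 / B1**2:.1e}")
```

In every case B2/B1 is in the tens of thousands while B2/B1^2 stays around 1e-5 to 1e-4, i.e. these bounds are nowhere near B2 = B1^2.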
2011-04-02, 14:42   #3
bdodson

Quote:
Originally Posted by henryzz
I have just been reading R.D. Silverman's paper on ECM, in which he states optimal bounds for K=100 (i.e. B2 runs 100 times as fast as B1). These seem to be approximately the values we use today. Has anyone ever done such a study specifically for GMP-ECM? He stated that there was a version available at the time capable of K=170 on some machines. For B1=1e6 and N=2^1061-1, K appeared to be about 1000 for GMP-ECM (an old version) and rose further when I upped the bound to 3e6.
Peter's thesis completely reset the recommendations of Bob & Sam's paper.
Both programs were highly optimised for the hardware he was using, and one
of the more remarkable accomplishments of GMP-ECM is to get the stage 2
performance of ecm/fft (and better) in a much more portable setting (with
the help of GMP, of course). There is now lots of ECM being run on
constrained hardware, and the info available with Alex's -v is perhaps more
useful than the tables of either. -Bruce
2011-06-03, 14:50   #4
Walter Nissen

Quote:
Originally Posted by jasonp
Modern high-performance implementations of stage 2 use asymptotically faster methods than were known at the time that paper was published, and the expectation now is that B2 can be around B1^2.
B1^2 is far greater than the default values for B2 which I have seen.
The defaults seem to be roughly in the ballpark for optimality, as
computed by "-v".
Most of my experience does not involve -base2.
Am I missing something?
2011-06-03, 22:39   #5
R.D. Silverman

Quote:
Originally Posted by Walter Nissen
B1^2 is far greater than the default values for B2 which I have seen.
The defaults seem to be roughly in the ballpark for optimality, as
computed by "-v".
Most of my experience does not involve -base2.
Am I missing something?
Optimality is achieved when the algorithm spends as much time in step 2
as it does in step 1. The B2/B1 ratio depends on the relative speed
of the two steps.

In theory, a 'perfect' FFT implementation could take step 2 to B1^2
in the same time that step 1 takes to reach B1. We can come close to this
for P-1/FFT, but ECM requires more work. See Peter's thesis. Note that
memory requirements are large.
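A toy model of that trade-off (entirely illustrative: the u^-u smoothness approximation, the linear cost assumptions, and K=100 from the opening post are assumptions, not GMP-ECM's actual model): take step 1 cost proportional to B1, step 2 cost proportional to (B2 - B1)/K, and minimize the expected time per factor, i.e. time per curve divided by success probability per curve.

```python
import math

def rho(u):
    """Crude Dickman-rho approximation: rho(u) ~ u**-u for u > 1."""
    return u ** -u if u > 1 else 1.0

def success_prob(p, B1, B2, steps=400):
    """Toy per-curve success probability: the group order (size ~ p) is
    B1-smooth, or B1-smooth apart from one prime q in (B1, B2]."""
    u1 = math.log(p) / math.log(B1)
    prob = rho(u1)
    dlogq = math.log(B2 / B1) / steps
    for i in range(steps):
        logq = math.log(B1) + (i + 0.5) * dlogq
        u = (math.log(p) - logq) / math.log(B1)
        prob += rho(u) * dlogq / logq  # prime density ~ d(log q) / log q
    return prob

def expected_time(p, B1, B2, K=100):
    """Expected cost per factor found (arbitrary units), assuming
    step 1 costs B1 and step 2 costs (B2 - B1)/K."""
    t1 = B1
    t2 = (B2 - B1) / K
    return (t1 + t2) / success_prob(p, B1, B2), t1, t2

# scan B2/B1 ratios for a hypothetical 40-digit factor with B1 = 1e6
p, B1 = 1e40, 1_000_000
for k in (10, 100, 1000, 10000):
    exp_t, t1, t2 = expected_time(p, B1, k * B1)
    print(f"B2 = {k:>5}*B1: step2/step1 cost = {t2 / t1:8.2f}, "
          f"expected time = {exp_t:.3e}")
```

With these particular numbers the scan bottoms out near B2 = 100*B1, where the two steps cost about the same, matching the equal-time rule of thumb.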
2011-06-04, 16:56   #6
Walter Nissen

Thanks for your response.

Quote:
Originally Posted by R.D. Silverman
Optimality is achieved when the algorithm spends as much time in step 2
as it does in step 1.
I wonder if this, too, is good theory, but only approximated by GMP-ECM?

I've done a little testing and have this summary:

Code:
N has 192 digits                                          Step took
             B1    35       40      45       50      55    1    2
             1e6  8.64h   3.41d   38.42d   1.39y   20.79y  22s  12s
 -B2scale 2  1e6  8.39h<- 3.30d   36.88d   1.31y   19.69y  22   16
 -B2scale 3  1e6  8.66h   3.37d   37.78d   1.34y   19.90y  22   19
             3e6  8.51h   2.58d<- 22.21d   220.d    6.75y
 -B2scale 2  3e6  9.03h   2.70d   23.05d   227.d    6.89y
 -B2scale 3  3e6 11.06h   3.32d   28.09d   276.d    8.37y
            11e6 11.23h   2.64d   17.25d<- 129.d    3.00y
 -B2scale 2 11e6 12.27h   2.86d   18.63d   139.d    3.21y
 -B2scale 3 11e6 13.34h   3.10d   20.15d   149.d    3.42y

<- indicates minimal estimated time, in column, for B1 optimal
(from Sect. 2 of README)
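One way to read those entries (my interpretation, not a documented formula): the expected time to find a factor is the per-curve time divided by the per-curve success probability. Plugging in the first row's numbers for B1 = 1e6 and a 35-digit factor:

```python
step1_s, step2_s = 22, 12   # per-curve step times from the table
expected_h = 8.64           # table's expected time for a 35-digit factor

per_curve_s = step1_s + step2_s
curves = expected_h * 3600 / per_curve_s
print(f"implied expected number of curves: {curves:.0f}")   # 915
print(f"implied per-curve success probability: {per_curve_s / (expected_h * 3600):.2e}")
```

So the 8.64 h figure corresponds to roughly 900 curves at ~1.1e-3 success probability each.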
Quote:
Originally Posted by bdodson
the info available with Alex's -v is perhaps more
useful than the tables of either. -Bruce
Focusing on the B1s shown in section 2 of the README file,
for B1 = 3e6 or 11e6, "-v" indicates that omitting B2scale
is better than B2scale = 2.
For B1 = 1e6, B2scale = 2 is better, but even increasing B2scale
to 3, which makes B2 = 2852890960, the time in Step 2 is
still less than in Step 1, and is much less with B2scale = 2.
And multiplying by roughly 3 to get that B2 still leaves it
very far short of B1^2.
The times estimated by "-v" seem fairly insensitive to minor
variation of either B1 or B2.

Should I simply believe "-v"?
2011-06-04, 22:42   #7
bdodson

Quote:
Originally Posted by Walter Nissen
...
And multiplying by roughly 3 to get that B2 still leaves it
very far short of B1^2.
The times estimated by "-v" seem fairly insensitive to minor
variation of either B1 or B2.

Should I simply believe "-v"?
It's a computation of the relevant probabilities (for which Alex
did some programming of the functions?), not an oracle. The
default settings in GMP-ECM represent the judgement(s) of the
development group and/or PaulZ. There were 2-3 years when
I was in close correspondence with Paul, starting from when
we were using Peter's ecm/fft, up until the first p50, through
the start of gmp-ecm. (I started even earlier, with the K=170
binary that Peter released on USENET shortly before his thesis.)

My recollection is that we always had the time in step 2 well below
the time in step 1. Jason's surely better informed than I am about
current practice; but the hardware I'm using rarely has more than
1 GB or 2 GB per core (the infiniband nodes have 4 GB/core, but that's
for MPI). That means running with -k 20 on both the AMD cluster
(B1=900M) and the i3's (B1=400M). That's with default B2. Larger
B2's would be even less effective use of the hardware.

-Bruce
2011-06-05, 02:00   #8
Walter Nissen

Quote:
Originally Posted by bdodson
My recollection is that we always had the time in step 2 well below
the time in step 1.
Thanks.

So, then, is it true that you spend the bulk of the time in step 1,
but find that step 2 finds a large percentage of the factors?
2011-06-05, 03:52   #9
bdodson

Quote:
Originally Posted by Walter Nissen
Thanks.

So, then, is it true that you spend the bulk of the time in step 1,
but find that step 2 finds a large percentage of the factors?
Sounds right. Here are timings on three hardware platforms:
Code:
Input number is 834480887308189852864258319385610596806807102878057989759424194715
101018717084231996448760702874136063242293479817992219906842611077
80541410625311368649139559999749566012004773094257570166565025601 (197 digits)
Using B1=900000000, B2=15892679602816, polynomial Dickson(30), sigma=35391187
Step 1 took 5883661ms
Step 2 took 1104159ms
on the new cluster
Code:
Input number is 2311441782800636721476839371857807107612640190768925727581509336830931529
59432859911205316526447726158837969986730856197713290412756834680396303612926255802672088
981784372547072723261892218374759839058175659333535274438890981601 (228 digits)
Using B1=400000000, B2=15892277350966, polynomial Dickson(30), sigma=1330596911
Step 1 took 3478089ms
Step 2 took 2141442ms
on an i3, with the scheduler running four jobs on each i3, and
Code:
Input number is 3885859863425607930702651581959725877839501274801459476185487915134976553
90571209561306072334101939984989248503400285461614836689567500474945216351420803015135996
9357925764767518867383066073369732140010282744175931167592624365319891317 (235 digits)
Using B1=600000000, B2=9535594068016, polynomial Dickson(30), sigma=1779133192
Step 1 took 5009987ms
Step 2 took 820253ms
on a dual core, with the scheduler running two jobs on each.
The first is Linux/AMD, the last two Windows 7/Intel. No particular
reason for you (or anyone else) to focus on what I'm running ---
all GMP-ECM defaults. Why not, for the state of the art, compare
EPFL's runs on the PlayStations. Their first large runs used
B1 = 3e9, B2 = 1.039e14; more recent ones have used B1 = 1e9,
B2 = 2.54e13. If I'm recalling correctly, they chose these B2's
as the GMP-ECM default, despite running step 1 on the PlayStation
cluster, save-resume, step 2 "using 4 cores per node on a 56-node
cluster (two hexcore processors per node)". Totally different hardware,
not a B1^2 in sight. I think if we had more memory available, we
might consider larger B2's; it seems rather that we're running the largest
B1 we can get so that the default B2 fits in the memory we have.

-bdodson
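For reference, the step 2 / step 1 shares implicit in the three timing blocks above work out to roughly 19%, 62%, and 16%:

```python
# (step 1 ms, step 2 ms) from the three GMP-ECM runs quoted above
runs = {
    "AMD cluster, B1=900M": (5883661, 1104159),
    "i3,          B1=400M": (3478089, 2141442),
    "dual core,   B1=600M": (5009987, 820253),
}

for name, (t1_ms, t2_ms) in runs.items():
    print(f"{name}: step2/step1 = {t2_ms / t1_ms:.2f}")
```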
2011-06-05, 14:19   #10
R.D. Silverman

Quote:
Originally Posted by bdodson

<snip>

Totally different hardware,
not a B1^2 in sight. I think if we had more memory available, we
might consider larger B2's; it seems rather that we're running the largest
B1 we can get so that the default B2 fits in the memory we have.

-bdodson
Indeed! The run time analysis in my paper assumed that memory
was always large enough to hold all of the data.
2011-06-09, 08:32   #11
lorgix

I don't understand the meaning of "as much time in step 2 as in step 1". Minimizing expected time with -v doesn't come close to that. And the paper seems to be recommending a step 1 : step 2 time ratio of about 1:0.4 rather than 1:1.

What am I missing?

Also, is there a more recent analysis available?