mersenneforum.org  

Go Back   mersenneforum.org > Factoring Projects > Factoring

Reply
 
Thread Tools
Old 2012-07-20, 21:54   #1
frmky
 
frmky's Avatar
 
Jul 2003
So Cal

2·3·11·31 Posts
Default GNFS poly selection

NFS@Home will likely complete a few GNFS factorizations in the coming year, so I've started paying more attention to poly selection lately. There is scattered information in the forum, but it's not easy to find and somewhat contradictory. Therefore I'd like to bring together in one place data and best practices, and this thread is as good a place as any.

Tools:
The current best tools seem to be msieve and CADO-NFS. I haven't yet used the latter. For the former, all parameters are in the source. The latter has suggested parameters for up to 171, 180, 190, and 204 digits. Is there a sense of the relative speeds and output quality of these two tools?

Skew:
I have seen a couple reports that it's best to keep the skew under, or not much more than 100,000,000. However, a default run of msieve seems to start with a leading coefficient of 1 and that increases very slowly. For larger inputs, this can result in very large skews. Should large skews be avoided? If so, what is a recommended starting leading coefficient at each size?

Degree:
What point is the recommended crossover from degree 5 to degree 6? Around 205 digits? 210 digits? Also, is there an easy way to compare the Murphy E values for degree 5 and degree 6 polynomials?

Murphy E:
Does anyone have a list of "good" Murphy E values for, say, 180-215 digits? Is the estimate from msieve a good starting point? I'm running a poly search on a C180 right now, and I have an E of 1.047e-13. Is this "good?"

Other:
What have I missed?
____________________

Summary so far.

Tools:
The current best tools are msieve and CADO-NFS. For the former, all parameters are in the source. The latter has suggested parameters for up to 171, 180, 190, and 204 digits. Perhaps interpolation/extrapolation of these will provide parameters for additional values. Is there a sense of the relative speeds and output quality of these two tools?

Skew:
A skew of 1G is too large. 100-200M is likely ok. A default run of msieve seems to start with a leading coefficient of 1 and that increases very slowly. For larger inputs, this can result in very large skews. What is a recommended starting leading coefficient at larger sizes?

Degree:
The recommended crossover from degree 5 to degree 6 is somewhere around 205-210 digits. The Murphy E values for degree 5 and degree 6 polynomials are roughly comparable, give or take a factor of 1.5.

Murphy E:
The Murphy E estimate from msieve is a good starting point, but search long enough and you can expect to do 25% to up to 40% better.

Last fiddled with by Batalov on 2012-07-21 at 18:44 Reason: per request
frmky is offline   Reply With Quote
Old 2012-07-21, 01:38   #2
Dubslow
Basketry That Evening!
 
Dubslow's Avatar
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

3·2,399 Posts
Default

From fivemack's spreadsheet:
Code:
Number	Log	Murphy score * 10^12	FB bound	lp	siever	raw rels	uniq rels	qmin	qmax	matrix
a4788.2483	160.35	1.121	50000000	30	14	107814368	84854642	10000000	70000000	6678530
a9282.1052	160.35	1.382	36000000	30	15	91069556	79565413	12000000	36000000	5402330
a8352.1741	160.38	1.281	72000000	30	14	92637288	78614888	24000000	72000000	8269313
a1920.2385	160.66	1.223	75000000	30	14	94865861	80208988	25000000	75000000	8093178
a9436.1257	161.10	1.354	66000000	30	14	90054270	76935917	20000000	70000000	8602493
a9436.1257	161.10	1.354	66000000	30	14	106970831	89686933	20000000	80000000	6802901
a9282.1015	161.26	1.202	43200000	30	15	110357758	94735732	14400000	43200000	4880040
a8352.1737	161.56	1.128	75000000	30	14	101640193	83532964	20000000	80000000	7962677
2+1012	162.16	0.725	80000000	30	15	135844421	115307765	30000000	80000000	5942338
a4788.2672	162.83	0.9121	48000000	30	15	109285743	92477844	14400000	48000000	5424269
a6832.1551	162.90	0.9903	75000000	31	14			25000000	75000000	
6+383	164.36	0.674	50000000	31	14, 15		184469003	24000000	140000000	7735780
a5748.1475	164.37	0.8166	60000000	30	15	112137409	94905542	20000000	60000000	6292857
a3270.620	165.25	0.4943	40000000	30.3a 30r	15	120443951	93204260	15000000	56000000	9199883
a2340.699	168.58	0.5434	42000000	30.3a 30r	15	109366004	86938590	14000000	44400000	10658721
a3270.632	169.20	0.4057	76320000	31	15	196414103	168562029	25440000	81408000	8963259
a2340.693	170.13	0.3204	60000000	31.3a 31r	15	194399054	158222487	20000000	60000000	16262901
a3270.645	171.49	0.2503	108000000	31	15	206766657	175581287	36000000	108000000	10670432
109!+1	176.16	0.1145122	100000000 alg	32a31r	15		277650572	10000000	185000000	13103018
2-877	177.39	0.0976	125000000	31	15	226161090	179686069	25000000	200000000	15247669
5-421	179.93	0.0693	80r150a	32a31r	15		301728045	40000000	340000000	16981887
2+956	186.12	0.02991	150000000	32.3a 31r	16	422973519	314969483	20000000	190000000	22202993
7+374	196.79	0.005627	240000000	33.3a 32r	16	885411825	645669676	80000000	450000000	33423296
Murphy E: Why do CADO-NFS and Msieve calculate different values for the same poly?

And somewhat off topic, how do Msieve/GGNFS compare to CADO-NFS in the other areas besides poly select?

Last fiddled with by Dubslow on 2012-07-21 at 01:39
Dubslow is offline   Reply With Quote
Old 2012-07-21, 02:01   #3
Batalov
 
Batalov's Avatar
 
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

216508 Posts
Default

I've been fitting E values for a long time and added this to gnfs/poly/poly_skew.c quite a while ago (see the hidden comment, too):
Code:
                { /* SB: tried L[1/3,c] fit; it is no better than this */
                        double e0 = (digits >= 121) ? (0.0607 * digits + 2.25):
                                                      (0.0526 * digits + 3.23);
                        if (degree == 4) e0 = 0.0625 * digits + 1.69;
                        e0 = exp(-log(10) * e0);
#ifdef HAVE_CUDA
                        e0 *= 1.15;
#endif
                        logprintf(obj, "expecting poly E from %.2le to > %.2le\n",
                                e0, 1.15 * e0);
                        /* seen exceptional polys with +40% but that's */
                        /* rare. The fit is good for 88..232 digits */
                }
This should print a line in your msieve.log. This fit is underestimating the top end (because the higher we go, the less satisfactorily we test the poly space; there are probably excellent polys but there's not enough time to sample well). The fit for degree 4 is for the degree 4 E value; that curve is discontinuous. The low end of degree 5 has some silly hook, too (don't ask me why; it is a fit of the observed values from tons of aliquot, from Kamada's site and from B+D experiments and a few data points were taken from the MForum community jobs).

E.g. for your c180 the E value is surely good.

Crossover from degree 5 to degree 6 is exactly around 205 to 210 digits. You can compare the Murphy E values (as implemented by msieve) for degree 5 and degree 6 polynomials within a fudge factor of 1-1.5, in my experience. I've been running select quite a bit for the c204 with degree 6 and 5 and degree 5 won, recently. However, for RSA704 deg 6 was used and deg 5 poly seived weaker.
Batalov is offline   Reply With Quote
Old 2012-07-21, 03:43   #4
jasonp
Tribal Bullet
 
jasonp's Avatar
 
Oct 2004

DA116 Posts
Default

Quote:
Originally Posted by Dubslow View Post
Murphy E: Why do CADO-NFS and Msieve calculate different values for the same poly?

And somewhat off topic, how do Msieve/GGNFS compare to CADO-NFS in the other areas besides poly select?
The E-value calculation is a nasty numerical integration; CADO-NFS compues the integral using a midpoint approximation, while msieve uses a fairly sophisticated adaptive numerical integrator. The end effect is that on average E-values reported by msieve are a little bit smaller than E-values reported by CADO-NFS, but the two packages usually give identical results to within (say) 5%.

The sieving in CADO-NFS is worlds better than what msieve has (i.e. it's a lattice sieve). The filtering and square root are quite comparable. I don't know how the linear algebra compares, since the two packages use vastly different algorithms. Paul's work with RSA704 produced a dataset that was a little undersieved, and the CADO filtering code handled it correctly when rerunning the dataset through msieve produced a bad matrix (an internal limit in the filtering code was silently exceeded).

Edit: Greg, others can point out lists of epected final E-values based on input size. Generally if you run the poly selection for a reasonable length of time you can find an E-value 25-35% better than the minimum value in the code. Paul and Thorsten reported that extremely large skew (say 1G) is too much for the siever to deal with, but generally you don't want extremely large skew because on average the polynomial size suffers too much when the skew is extremely large. In general you'll always be able to find a polynomial with good alpha and much better size, whose skew will be less-than-extremely-large.

Last fiddled with by jasonp on 2012-07-21 at 03:53
jasonp is offline   Reply With Quote
Old 2012-07-21, 04:01   #5
Dubslow
Basketry That Evening!
 
Dubslow's Avatar
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

3×2,399 Posts
Default

Do you develop for CADO? How does CADO's siever compare to GGNFS?

And since Greg asked as well, how do the poly selects compare, or has too little comparison work been done?
Dubslow is offline   Reply With Quote
Old 2012-07-21, 05:35   #6
frmky
 
frmky's Avatar
 
Jul 2003
So Cal

111111111102 Posts
Default

Summary so far... Someone with more power than me can put this in the first post if he is so inclined.

Tools:
The current best tools are msieve and CADO-NFS. For the former, all parameters are in the source. The latter has suggested parameters for up to 171, 180, 190, and 204 digits. Perhaps interpolation/extrapolation of these will provide parameters for additional values. Is there a sense of the relative speeds and output quality of these two tools?

Skew:
A skew of 1G is too large. 100-200M is likely ok. A default run of msieve seems to start with a leading coefficient of 1 and that increases very slowly. For larger inputs, this can result in very large skews. What is a recommended starting leading coefficient at larger sizes?

Degree:
The recommended crossover from degree 5 to degree 6 is somewhere around 205-210 digits. The Murphy E values for degree 5 and degree 6 polynomials are roughly comparable, give or take a factor of 1.5.

Murphy E:
The Murphy E estimate from msieve is a good starting point, but search long enough and you can expect to do 25% to up to 40% better.

Thanks!
frmky is offline   Reply With Quote
Old 2012-07-21, 06:08   #7
LaurV
Romulan Interpreter
 
LaurV's Avatar
 
Jun 2011
Thailand

22×7×317 Posts
Default

Quote:
Originally Posted by frmky View Post
Someone with more power than me can put this in the first post if he is so inclined.
Please don't!
It is easier to understand if the evolution of the discussion is not changed, at least for me, as non-native English speaker and beginner in NFS stuff. If one moves all answers in the first posts and overwrite the questions, it will look like the answers in the following posts (2,3,4) are on the thin air. This is not "reservation" thread, please let the discussion to flow, I am interested on the subject. Questions like these (well, dumber version of them) I have myself and I never asked because I was not able to put them in a reasonable form (well, to ask the right questions you have to have some knowledge about the subject, to know where to start from ).
We may need a "summary" post at the beginning of the thread (as a forum feature, not special for this thread), but overwriting the first post is generally bad for discussions like this. If one OP expects this evolution, then he should create an empty post for summary, and starts the discussion in the second post.
LaurV is offline   Reply With Quote
Old 2012-07-21, 10:09   #8
henryzz
Just call me Henry
 
henryzz's Avatar
 
"David"
Sep 2007
Cambridge (GMT/BST)

5×31×37 Posts
Default

It would be possible to edit the first post leaving what is currently there along with an explanation that the first bit is a summary added later. Another alternative is copying a random post from somewhere on the forum with a prior date to this thread and putting what we want in that also with an explanation. I do agree that it would be worth have this all in one place.
henryzz is offline   Reply With Quote
Old 2012-07-21, 20:32   #9
poily
 
Nov 2010

2·52 Posts
Default

Shi Bai keeps record of known factorization's murphy_e's here, might be useful.
poily is offline   Reply With Quote
Old 2012-07-22, 18:07   #10
jasonp
Tribal Bullet
 
jasonp's Avatar
 
Oct 2004

3×1,163 Posts
Default

Quote:
Originally Posted by Dubslow View Post
Do you develop for CADO? How does CADO's siever compare to GGNFS?

And since Greg asked as well, how do the poly selects compare, or has too little comparison work been done?
No, I'm not on the team there, though I have contributed to GMP-ECM and still do so once in a while. Alex can better comment on how the sievers compare, since I understand he wrote a great deal of the CADO version.
jasonp is offline   Reply With Quote
Old 2012-07-22, 22:20   #11
henryzz
Just call me Henry
 
henryzz's Avatar
 
"David"
Sep 2007
Cambridge (GMT/BST)

5×31×37 Posts
Default

Quote:
Originally Posted by jasonp View Post
No, I'm not on the team there, though I have contributed to GMP-ECM and still do so once in a while. Alex can better comment on how the sievers compare, since I understand he wrote a great deal of the CADO version.
The other thing when the question is answered is: How do they compare in how futureproof they are? What is the largest number they can realistically do? Can that be improved easily?

The ggnfs siever struck me as being rather stretched by M1061. We aren't hugely developing it further and there are some architectural difficulties that will be hit when improving it(17e not available(can this be done with any of the lasieve5 source?) etc). It also isn't portable to windows in it's fastest form.
Does the CADO siever share any of these difficulties in the same way?
henryzz is offline   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
msieve parallel poly selection with MPI drone84 Msieve 4 2017-06-28 09:18
C138 poly selection firejuggler Aliquot Sequences 1 2011-02-21 06:38
Different msieve 1.39 poly selection outputs... Jeff Gilchrist Msieve 5 2008-12-29 23:07
Homogeneous Cunningham snfs poly selection? nuggetprime Factoring 22 2008-08-15 10:01
poly selection in MPQS bsquared Factoring 3 2007-02-28 14:22

All times are UTC. The time now is 02:04.

Sun Nov 1 02:04:22 UTC 2020 up 51 days, 23:15, 2 users, load averages: 1.77, 1.81, 1.68

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2020, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.