20120720, 21:54  #1 
Jul 2003
So Cal
100000010100_{2} Posts 
GNFS poly selection
NFS@Home will likely complete a few GNFS factorizations in the coming year, so I've started paying more attention to poly selection lately. There is scattered information in the forum, but it's not easy to find and somewhat contradictory. Therefore I'd like to bring together in one place data and best practices, and this thread is as good a place as any.
Tools: The current best tools seem to be msieve and CADONFS. I haven't yet used the latter. For the former, all parameters are in the source. The latter has suggested parameters for up to 171, 180, 190, and 204 digits. Is there a sense of the relative speeds and output quality of these two tools? Skew: I have seen a couple reports that it's best to keep the skew under, or not much more than 100,000,000. However, a default run of msieve seems to start with a leading coefficient of 1 and that increases very slowly. For larger inputs, this can result in very large skews. Should large skews be avoided? If so, what is a recommended starting leading coefficient at each size? Degree: What point is the recommended crossover from degree 5 to degree 6? Around 205 digits? 210 digits? Also, is there an easy way to compare the Murphy E values for degree 5 and degree 6 polynomials? Murphy E: Does anyone have a list of "good" Murphy E values for, say, 180215 digits? Is the estimate from msieve a good starting point? I'm running a poly search on a C180 right now, and I have an E of 1.047e13. Is this "good?" Other: What have I missed? ____________________ Summary so far. Tools: The current best tools are msieve and CADONFS. For the former, all parameters are in the source. The latter has suggested parameters for up to 171, 180, 190, and 204 digits. Perhaps interpolation/extrapolation of these will provide parameters for additional values. Is there a sense of the relative speeds and output quality of these two tools? Skew: A skew of 1G is too large. 100200M is likely ok. A default run of msieve seems to start with a leading coefficient of 1 and that increases very slowly. For larger inputs, this can result in very large skews. What is a recommended starting leading coefficient at larger sizes? Degree: The recommended crossover from degree 5 to degree 6 is somewhere around 205210 digits. The Murphy E values for degree 5 and degree 6 polynomials are roughly comparable, give or take a factor of 1.5. Murphy E: The Murphy E estimate from msieve is a good starting point, but search long enough and you can expect to do 25% to up to 40% better. Last fiddled with by Batalov on 20120721 at 18:44 Reason: per request 
20120721, 01:38  #2 
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 89<O<88
3·29·83 Posts 
From fivemack's spreadsheet:
Code:
Number Log Murphy score * 10^12 FB bound lp siever raw rels uniq rels qmin qmax matrix a4788.2483 160.35 1.121 50000000 30 14 107814368 84854642 10000000 70000000 6678530 a9282.1052 160.35 1.382 36000000 30 15 91069556 79565413 12000000 36000000 5402330 a8352.1741 160.38 1.281 72000000 30 14 92637288 78614888 24000000 72000000 8269313 a1920.2385 160.66 1.223 75000000 30 14 94865861 80208988 25000000 75000000 8093178 a9436.1257 161.10 1.354 66000000 30 14 90054270 76935917 20000000 70000000 8602493 a9436.1257 161.10 1.354 66000000 30 14 106970831 89686933 20000000 80000000 6802901 a9282.1015 161.26 1.202 43200000 30 15 110357758 94735732 14400000 43200000 4880040 a8352.1737 161.56 1.128 75000000 30 14 101640193 83532964 20000000 80000000 7962677 2+1012 162.16 0.725 80000000 30 15 135844421 115307765 30000000 80000000 5942338 a4788.2672 162.83 0.9121 48000000 30 15 109285743 92477844 14400000 48000000 5424269 a6832.1551 162.90 0.9903 75000000 31 14 25000000 75000000 6+383 164.36 0.674 50000000 31 14, 15 184469003 24000000 140000000 7735780 a5748.1475 164.37 0.8166 60000000 30 15 112137409 94905542 20000000 60000000 6292857 a3270.620 165.25 0.4943 40000000 30.3a 30r 15 120443951 93204260 15000000 56000000 9199883 a2340.699 168.58 0.5434 42000000 30.3a 30r 15 109366004 86938590 14000000 44400000 10658721 a3270.632 169.20 0.4057 76320000 31 15 196414103 168562029 25440000 81408000 8963259 a2340.693 170.13 0.3204 60000000 31.3a 31r 15 194399054 158222487 20000000 60000000 16262901 a3270.645 171.49 0.2503 108000000 31 15 206766657 175581287 36000000 108000000 10670432 109!+1 176.16 0.1145122 100000000 alg 32a31r 15 277650572 10000000 185000000 13103018 2877 177.39 0.0976 125000000 31 15 226161090 179686069 25000000 200000000 15247669 5421 179.93 0.0693 80r150a 32a31r 15 301728045 40000000 340000000 16981887 2+956 186.12 0.02991 150000000 32.3a 31r 16 422973519 314969483 20000000 190000000 22202993 7+374 196.79 0.005627 240000000 33.3a 32r 16 885411825 645669676 80000000 450000000 33423296 And somewhat off topic, how do Msieve/GGNFS compare to CADONFS in the other areas besides poly select? Last fiddled with by Dubslow on 20120721 at 01:39 
20120721, 02:01  #3 
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
3×3,109 Posts 
I've been fitting E values for a long time and added this to gnfs/poly/poly_skew.c quite a while ago (see the hidden comment, too):
Code:
{ /* SB: tried L[1/3,c] fit; it is no better than this */ double e0 = (digits >= 121) ? (0.0607 * digits + 2.25): (0.0526 * digits + 3.23); if (degree == 4) e0 = 0.0625 * digits + 1.69; e0 = exp(log(10) * e0); #ifdef HAVE_CUDA e0 *= 1.15; #endif logprintf(obj, "expecting poly E from %.2le to > %.2le\n", e0, 1.15 * e0); /* seen exceptional polys with +40% but that's */ /* rare. The fit is good for 88..232 digits */ } E.g. for your c180 the E value is surely good. Crossover from degree 5 to degree 6 is exactly around 205 to 210 digits. You can compare the Murphy E values (as implemented by msieve) for degree 5 and degree 6 polynomials within a fudge factor of 11.5, in my experience. I've been running select quite a bit for the c204 with degree 6 and 5 and degree 5 won, recently. However, for RSA704 deg 6 was used and deg 5 poly seived weaker. 
20120721, 03:43  #4  
Tribal Bullet
Oct 2004
2·3·19·31 Posts 
Quote:
The sieving in CADONFS is worlds better than what msieve has (i.e. it's a lattice sieve). The filtering and square root are quite comparable. I don't know how the linear algebra compares, since the two packages use vastly different algorithms. Paul's work with RSA704 produced a dataset that was a little undersieved, and the CADO filtering code handled it correctly when rerunning the dataset through msieve produced a bad matrix (an internal limit in the filtering code was silently exceeded). Edit: Greg, others can point out lists of epected final Evalues based on input size. Generally if you run the poly selection for a reasonable length of time you can find an Evalue 2535% better than the minimum value in the code. Paul and Thorsten reported that extremely large skew (say 1G) is too much for the siever to deal with, but generally you don't want extremely large skew because on average the polynomial size suffers too much when the skew is extremely large. In general you'll always be able to find a polynomial with good alpha and much better size, whose skew will be lessthanextremelylarge. Last fiddled with by jasonp on 20120721 at 03:53 

20120721, 04:01  #5 
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 89<O<88
7221_{10} Posts 
Do you develop for CADO? How does CADO's siever compare to GGNFS?
And since Greg asked as well, how do the poly selects compare, or has too little comparison work been done? 
20120721, 05:35  #6 
Jul 2003
So Cal
2^{2}×11×47 Posts 
Summary so far... Someone with more power than me can put this in the first post if he is so inclined.
Tools: The current best tools are msieve and CADONFS. For the former, all parameters are in the source. The latter has suggested parameters for up to 171, 180, 190, and 204 digits. Perhaps interpolation/extrapolation of these will provide parameters for additional values. Is there a sense of the relative speeds and output quality of these two tools? Skew: A skew of 1G is too large. 100200M is likely ok. A default run of msieve seems to start with a leading coefficient of 1 and that increases very slowly. For larger inputs, this can result in very large skews. What is a recommended starting leading coefficient at larger sizes? Degree: The recommended crossover from degree 5 to degree 6 is somewhere around 205210 digits. The Murphy E values for degree 5 and degree 6 polynomials are roughly comparable, give or take a factor of 1.5. Murphy E: The Murphy E estimate from msieve is a good starting point, but search long enough and you can expect to do 25% to up to 40% better. Thanks! 
20120721, 06:08  #7  
Romulan Interpreter
Jun 2011
Thailand
2^{3}·19·61 Posts 
Quote:
It is easier to understand if the evolution of the discussion is not changed, at least for me, as nonnative English speaker and beginner in NFS stuff. If one moves all answers in the first posts and overwrite the questions, it will look like the answers in the following posts (2,3,4) are on the thin air. This is not "reservation" thread, please let the discussion to flow, I am interested on the subject. Questions like these (well, dumber version of them) I have myself and I never asked because I was not able to put them in a reasonable form (well, to ask the right questions you have to have some knowledge about the subject, to know where to start from ). We may need a "summary" post at the beginning of the thread (as a forum feature, not special for this thread), but overwriting the first post is generally bad for discussions like this. If one OP expects this evolution, then he should create an empty post for summary, and starts the discussion in the second post. 

20120721, 10:09  #8 
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
2·2,909 Posts 
It would be possible to edit the first post leaving what is currently there along with an explanation that the first bit is a summary added later. Another alternative is copying a random post from somewhere on the forum with a prior date to this thread and putting what we want in that also with an explanation. I do agree that it would be worth have this all in one place.

20120722, 18:07  #10 
Tribal Bullet
Oct 2004
2×3×19×31 Posts 
No, I'm not on the team there, though I have contributed to GMPECM and still do so once in a while. Alex can better comment on how the sievers compare, since I understand he wrote a great deal of the CADO version.

20120722, 22:20  #11  
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
13272_{8} Posts 
Quote:
The ggnfs siever struck me as being rather stretched by M1061. We aren't hugely developing it further and there are some architectural difficulties that will be hit when improving it(17e not available(can this be done with any of the lasieve5 source?) etc). It also isn't portable to windows in it's fastest form. Does the CADO siever share any of these difficulties in the same way? 

Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
msieve parallel poly selection with MPI  drone84  Msieve  4  20170628 09:18 
C138 poly selection  firejuggler  Aliquot Sequences  1  20110221 06:38 
Different msieve 1.39 poly selection outputs...  Jeff Gilchrist  Msieve  5  20081229 23:07 
Homogeneous Cunningham snfs poly selection?  nuggetprime  Factoring  22  20080815 10:01 
poly selection in MPQS  bsquared  Factoring  3  20070228 14:22 