2022-10-02, 12:36  #23
"Bob Silverman"
Nov 2003
North of Boston
2^{3}×937 Posts 
Quote:
Quote:


2022-10-02, 17:59  #24
Jul 2003
So Cal
2·3·421 Posts 
We only perform root optimization and get an e-score on the best size-optimized hits. That said, I still have ~8.1 million unique size-optimized hits from my msieve run on 2,1109+. But I did not save the data on prior searches.

2022-10-02, 20:14  #25
"Curtis"
Feb 2005
Riverside, CA
2×3×5^{2}×37 Posts 
Forgive my ignorance, but is RDS asking for the mean score of, say, all GNFS-155 jobs we've run, and to look at the distribution of poly scores for various jobs run at some particular size? That appears to be Sean's interpretation of the question. I can address this, in that if I don't score within 5% of the trendline of the record poly scores, I keep on poly searching. The records aren't THAT great an outlier; only one or two of the record entries are notably lucky / above trend across the entire table.
Greg's interpretation is similar to mine, that RDS was asking how big an outlier the record scores are compared to the typical polynomial score found during a search. However, I don't see how one would measure that mean, since the results depend so heavily on the filters applied at each phase of poly select. I mean, if I root-opt only my best 100 hits while Gimarel root-opts his top 1000, clearly we're going to have different concepts of "mean poly". While these objections are pretty obvious, I don't see a way to pick a standard for either one to yield meaningful distribution data.
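To make the filtering objection concrete, here is a minimal sketch in Python. It assumes a toy model in which every hit has a single score and higher is better; the scores are synthetic draws from a lognormal distribution, not output from msieve or CADO, and `mean_of_top` is a hypothetical helper. It only shows that the "mean" moves with the cutoff used to decide which hits are kept.

```python
import random
import statistics

def mean_of_top(scores, k):
    """Mean of the k largest scores (higher = better in this toy model)."""
    return statistics.fmean(sorted(scores, reverse=True)[:k])

rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic stand-in for poly scores; NOT real msieve or CADO output.
scores = [rng.lognormvariate(0.0, 0.25) for _ in range(100_000)]

mean_all = statistics.fmean(scores)
mean_top_1000 = mean_of_top(scores, 1000)
mean_top_100 = mean_of_top(scores, 100)
best = max(scores)

print(f"mean of all hits : {mean_all:.4f}")
print(f"mean of top 1000 : {mean_top_1000:.4f}")
print(f"mean of top 100  : {mean_top_100:.4f}")
print(f"best             : {best:.4f}")
```

The tighter the filter, the closer the mean of the kept set sits to the best score, so any "distance of the record from the mean" has to state which set the mean was taken over.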
2022-10-02, 20:23  #26
"Bob Silverman"
Nov 2003
North of Boston
2^{3}·937 Posts 
Quote:
for each one, but it does exist. I am asking how the e-scores are distributed across all of the polynomials. What is their mean, and how far from the mean (in terms of the standard deviation) is the best score? This is Stat 101.

2022-10-02, 21:53  #27
"Curtis"
Feb 2005
Riverside, CA
2·3·5^{2}·37 Posts 
Each polynomial... from what set? From stage 1, or after size opt, or after root opt?
What settings do you use for stage 1 norm, or stage 2 norm, or for P or nq (in CADO)? Each setting changes both the number of polys generated per unit of effort and their average quality. You seem to be treating the poly search as a black box that is the same for every search, while it's far from that. I don't think it's stats 101, despite your condescension; rather, I think it reflects that you don't know much about how poly select is actually run. I suppose rather than argue with you, I should just suggest you gather your own data, so you get exactly what you're looking for.
Last fiddled with by VBCurtis on 2022-10-02 at 21:56
2022-10-02, 22:32  #28
"Bob Silverman"
Nov 2003
North of Boston
7496_{10} Posts 
Quote:
We seem to be talking at cross purposes. You are focussing on the specifics of how the search is conducted, and on the specifics of your implementation of that search. I am asking about the mathematics. How the selection is run does not matter. And your accusation that I was being condescending was totally uncalled for.

Let N be an odd composite whose factorization is sought. Polynomial selection finds a set of polynomials F_i, and roots M_i of those polynomials, such that F_i(M_i) = 0 mod N. The second polynomial for NFS is then G_i(x) = x - M_i (or x + M_i, depending on choice of sign). For each such pair of polynomials we seek to maximize

integral integral Prob(norm(G(p))) * Prob(norm(F(p)))

where the norms are the norms of the polynomials taken at a lattice point p, the double integral is taken over the entire lattice, and Prob is the probability that the norms are smooth over the factor base. The likelihood that the polys are jointly smooth is measured by the e-score. Each such pair of polynomials has an e-score. The selection process chooses a final pair of polynomials with maximal e-score.

The set of e-scores has a distribution. I am asking how the set of e-scores is distributed. The exact search process and the inputs that guide that search process do not matter. The question is: for each pair of polynomials (F,G) generated by the search process [regardless of the specifics of how that search is conducted] there is an e-score. It exists even if the search process does not compute it for every pair. What is the distribution of the e-scores that are computed? By how much does the maximal e-score exceed the mean?

One can also ask: if you were to compute an e-score for every pair of polynomials, what would its distribution be? I think this is worth a paper.

Last fiddled with by R.D. Silverman on 2022-10-02 at 22:59
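The statistic being asked for (how many standard deviations the best e-score lies above the mean of the computed e-scores) can be sketched in a few lines of Python. This is only an illustration: `best_vs_mean` is a hypothetical helper, the input list is a placeholder rather than real e-scores, and the sample standard deviation is an assumed choice.

```python
import statistics

def best_vs_mean(escores):
    """Return (mean, sample std dev, distance of the max from the mean in std devs)."""
    mean = statistics.fmean(escores)
    sd = statistics.stdev(escores)  # sample standard deviation (n - 1 divisor)
    best = max(escores)
    return mean, sd, (best - mean) / sd

# Placeholder values only; substitute the e-scores from an actual search.
sample = [1.0, 2.0, 3.0, 4.0, 5.0]
mean, sd, z = best_vs_mean(sample)
print(f"mean={mean:.4f}  sd={sd:.4f}  best is {z:.4f} sd above the mean")
```

A distance in standard deviations is easiest to interpret when the distribution is roughly symmetric; for a skewed set of e-scores the same number can be computed, but it should be read alongside a histogram.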

2022-10-03, 10:01  #29
Just call me Henry
"David"
Sep 2007
Liverpool (GMT/BST)
6,011 Posts 
Quote:
Sounds like your next research project. The CPU version of msieve sounds like it would be fine for this study, although it would be worth considering how the distribution parameters vary with more or less searching.

2022-10-06, 05:46  #30
Jul 2003
So Cal
2·3·421 Posts 
I had a bit over 3 million unique size-optimized hits from 2,1109+. I initially forgot to sort them before using uniq. A quick histogram of their norms is attached.
I then took the 10000 polys with the lowest (best) norms and ran root optimization with msieve. msieve outputs 200 root-optimized polynomials for each input, so this resulted in 2M polynomials. A quick histogram of their e-scores is also attached.
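The bookkeeping described here (deduplicate the hits, keep the ones with the lowest norms, bin the values for a histogram) might look roughly like the Python sketch below. The function names and the `(norm, label)` tuples are hypothetical and do not reflect msieve's file format. The sorting matters because uniq only collapses adjacent duplicate lines, so unsorted input leaves duplicates behind.

```python
import heapq
import math
from collections import Counter

def unique_hits(lines):
    """Drop duplicate lines wherever they occur, keeping first-seen order."""
    return list(dict.fromkeys(lines))

def best_by_norm(hits, k):
    """Return the k hits with the lowest norm; each hit is a (norm, label) tuple."""
    return heapq.nsmallest(k, hits, key=lambda h: h[0])

def log10_histogram(values):
    """Count values per decade: key d covers 10**d <= v < 10**(d + 1)."""
    counts = Counter(math.floor(math.log10(v)) for v in values)
    return dict(sorted(counts.items()))

# Hypothetical hits, invented for illustration only.
raw = ["b", "a", "b", "c", "a"]
deduped = unique_hits(raw)

hits = [(3.0e19, "a"), (2.0e20, "b"), (5.0e18, "c"), (7.0e19, "d")]
top2 = best_by_norm(hits, 2)
hist = log10_histogram(norm for norm, _ in hits)

print(deduped)
print(top2)
print(hist)
```

Binning by decade is just one choice; for e-scores that span a narrow range, linear bins would be more informative.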
2022-10-06, 12:29  #31
"Bob Silverman"
Nov 2003
North of Boston
2^{3}×937 Posts 
Quote:
theorem effect happening here? 

2022-10-06, 16:18  #32
Sep 2009
29·83 Posts 
I've got a fair amount of similar data for GGNFS poly selection for various numbers (most from the Brent tables) ranging from 130 to 176 digits.
My scripts usually kept the best 200-300 polys from msieve -np1 -nps (size optimized) and put them through msieve -npr (root optimized), which generated several times as many polys. I then used the one with the best e-score. The scores from -nps seem to be on a different scale to the scores from -npr though: 6.400158e+19 vs 9.418e-13 for the first entry in both files (this is for a c161). I could send you the data if you want it. I don't know enough statistics to do a useful analysis myself. If you want it, send me a PM with details of what you want (do you want the polys or just the e-scores?) and how to send it (email, or I could post it on a web site for you to download). And how many numbers would you want it for?
2022-10-06, 18:01  #33
Jul 2003
So Cal
9DE_{16} Posts 
Likely, since the e-score is an integral combining contributions from both size and root optimization. msieve reports both the final norm characterizing the size and alpha characterizing the root properties of the polynomials. Attached are histograms of each of these separately for the same 2 million polynomials.
