20110619, 23:12  #1 
Sep 2010
Portland, OR
2^{2}·3·31 Posts 
Estimating minimum relations
I've been digging through the estimation of the minimum number of relations, and I've managed to thoroughly confuse myself. In general, when running factmsieve.py (or the Perl version), the default parameters seem to come from a combination of the defpar.txt file which ships with GGNFS and the factmsieve script itself.
My version of defpar.txt seems to have been last updated in 2005, and says that everything over 118 digits is a guess. Am I overlooking some code in factmsieve which overrides these parameters? Would it be useful to start a data collection effort to update them? I would be happy to contribute to or organize such an effort, but I would probably need guidance from some of the experts on how to interpret the data. Attached is a graph which shows factmsieve's estimated number of relations for lpbr=26,27,28 and the actual number of relations needed for a number of my recent factorizations. The "errorbars" are the relations after the nexttolast and last sieving pass, so when the estimated line passes through the errorbar, it was correct; when the estimated line is below the errorbar, filtering was run unsuccessfully. There are a number of misses in the c107109 range; this may be because msieve seems to produce far few polys in this range. From c115 and up, factmsieve consistently underestimates the number of relations needed. Suggestions/guidance welcome. Thanks! 
20110620, 07:32  #2 
May 2008
Worcester, United Kingdom
2·263 Posts 
I very much welcome your offer of providing an update on the parameters for the ggnfs/msive driver scripts. This would benefit both the Perl and the Python scripts.
When I first built factmsieve.py it was immediately clear that the parameter values were out of date. I started an exercise to improve them but I soon realised that this needed expertise that I did not have. I hoped exxperts here would help with this and many did offer raw data but what was really needed was for someone with deep knowledge of the algorithms to convert this raw data into an algorithm (or tables) for parameter estimation. But nobody took this up so I made only minor improvements. So I will willingly do what I can to incorporate any advances you make into the Python script. 
20110620, 07:43  #3 
Sep 2004
2·5·283 Posts 

20110621, 16:40  #4 
Sep 2010
Portland, OR
174_{16} Posts 
Thanks Brian. Let's start with some data collection and see what happens.
The useful log files are the summary from factmsieve (e.g. g105test.txt) and the full log (e.g. test.log). If folks can post whatever logs they have lying around, and start saving future ones, I'll dig through them and see what I can come up with. I've never done an SNFS factorization so I have no data to contribute there, but if this effort does actually result in changing anything we can incorporate whatever updates are available for SNFS too. I really have no idea what I'm doing here so any suggestions about how to proceed are most welcome. 
20110621, 17:16  #5 
(loop (#_fork))
Feb 2006
Cambridge, England
13·491 Posts 
I have done several hundred factorisations in this range, but since I didn't use factmsieve (I use my own python code to pick gnfs parameters) I imagine my results aren't interesting to this project. I don't see much of a correlation between numberofrelations and sizeofnumber.
The thing I've been noticing recently is that it's reasonable to aim for about 85 million rather than about 100 million unique relations when factoring 155digit numbers with the 14e siever, lpbr=lpba=30 and alim around 3040 million; this obviously saves quite a lot of time (15% of a few dozen CPUdays). Last fiddled with by fivemack on 20110621 at 17:18 
20110621, 18:52  #6 
Sep 2009
977 Posts 
For the 512bit RSA keys we TI calculator community factored in the summer of 2009, factMsieve.pl suggested lpbr=lpba=29 and mfbr=mfbr=58. Postprocessing usually occurred with more than 60M raw relations.
I don't know how much oversieving it represented  at the time, I didn't know the difference between GNFS and SNFS (I learnt it after starting to attend MF) 
20110622, 15:30  #7  
Nov 2007
Halifax, Nova Scotia
2^{3}×7 Posts 
I just checked it out, and it seems that accurately estimating the minimum relations needed isn't trivial:
Quote:
In that paper the authors describe a method based on test sieving a tiny interval, estimating the number of useful relations found, and extrapolating to the number of total relations needed to factor the number. They say their method is ~2% accurate in the cases that they tested. So, I would say that no matter what you do you're not going to get much better than 10% accuracy using a table of numbers and number of relations needed. You could push it to 2% by implementing whatever this person did in their paper. Finding a better way might be an interesting research topic, but that's the level you're talking at. 

20110622, 16:56  #8 
(loop (#_fork))
Feb 2006
Cambridge, England
1100011101111_{2} Posts 
That paper was trying to figure out necessary relation counts for problems big enough that you can't interpolate nearby ones, which is a good and an interesting problem  to decide what the right parameters to use for RSA768 are without having ever done something anything like that size before.
For our purposes we really just need interpolation. I would be tempted just to remove the '>= 160' from the part that switches to using Batalov's interpolated plots, and run a few more tests. (what changes in defpar.txt at 115 digits?) Last fiddled with by fivemack on 20110622 at 16:57 
20110622, 17:59  #9  
Sep 2010
Portland, OR
2^{2}×3×31 Posts 
Quote:
But today, for larger numbers the estimates are (in my experience) guaranteed to be too low. And the specialq range sieved between filtering attempts is fixed, so for a c140 you may retry filtering after increasing the number of relations by only 0.5%. If the estimate is off by 1015%, that means you run filtering 2030 times; this can add a day to your factorization. A simple improvement might be to, if filtering fails, sieve until the number of relations has increased by at least 2.5% (or something) before retrying. That's probably worth doing regardless of any other parameter changes. But looking at the blue line above, the slope looks right but the intercept is off. Just sliding the estimation line up would be an improvement even if still far from perfect. Quote:
It's the midpoint between these two lines: # type,digits,deg,maxs1,maxskew,goodScore,efrac,j0,j1,eStepSize, # maxTime,rlim,alim,lpbr,lpba,mfbr,mfba,rlambda,alambda,qintsize,A,B gnfs,112,5,61,2000,1.6e4,0.25,250,15,50000,2800,3500000,3500000,27,27,50,50,2.6,2.6,100000,4000000,300 gnfs,118,5,63,2000,2.6e5,0.28,250,20,50000,3600,4500000,4500000,27,27,50,50,2.4,2.4,60000,4000000,300 I don't think the maxs1, goodScore, efrac, and j1 fields are used when doing msieve poly selection. That leaves rlambda/alambda as the only relevant difference. Based on the drop in qintsize, I'd guess that it's expecting a siever change here  but I think factmsieve uses 13e from 110139 digits. 

20110622, 18:30  #10  
(loop (#_fork))
Feb 2006
Cambridge, England
13·491 Posts 
Quote:
On the other hand, my driver explicitly does trial sieving: it sets alim as a function of the number of digits, sieves alim .. alim+1000 (or .. alim+10000 if above 105 digits), and then picks the initial sieving range to get enough relations assuming that the yield is constant at alim. What I tend to do manually is a stronger version of that: I pick some roughlyright alim, sieve 10000 to get a yield, and then set alim so that, assuming that I sustain that yield, I'll get enough relations when I sieve Q = alim/3 .. alim. In practice I've found that the dropoff in yield is about at alim/3, and this seems to give about 10% faster finished factorisations than other protocols I've tried (for example aiming to sieve Q = alim/2 .. 3*alim/2). I attach my code here, you want the FactorsByGNFS function. Note that the smaller digitcounts are tuned with comparison to a fairly old msieve; you may not want those bits, but I think the adaptive range selection is probably actually useful. Quote:
Quote:
I suggest you explore doing lpbr=29 with the 14e siever from as low as 140 digits; with my approach of aiming to do all the sieving below alim, that seems to be where the crossover lies. Though my database contains enough changes of strategy and changes of computer that it's pretty difficult to query conclusively. Last fiddled with by fivemack on 20110622 at 18:35 

20110622, 18:32  #11  
Nov 2007
Halifax, Nova Scotia
56_{10} Posts 
Quote:
Quote:
http://myweb.dal.ca/~dstaple/to_bchaffin.tgz The numbers were from wblipp's page of composites: http://oddperfect.org/composites.html Essentially all of those numbers could have been done with SNFS, but several already had factors divided out. Anyway, I factored them all using GNFS. I don't know if that changes things. 

Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
Estimating a sum over primes  brownkenny  Math  6  20100825 01:28 
What is minimum RAM needed for 6 Core CPU for P1?  odin  Hardware  15  20100418 14:22 
Minimum/desired CPU specs for ECM factoring  Kaboom  PrimeNet  10  20090417 14:58 
Msieve NFS minimum size  10metreh  Msieve  35  20090402 19:14 
More relations mean many more relations wanted  fivemack  Factoring  7  20070804 17:32 