#1
Sep 2010
Portland, OR
372₁₀ Posts
I've been digging through the estimation of the minimum number of relations, and I've managed to thoroughly confuse myself. In general, when running factmsieve.py (or the Perl version), the default parameters seem to come from a combination of the def-par.txt file which ships with GGNFS and the factmsieve script itself.
My version of def-par.txt seems to have been last updated in 2005, and says that everything over 118 digits is a guess. Am I overlooking some code in factmsieve which overrides these parameters? Would it be useful to start a data-collection effort to update them? I would be happy to contribute to or organize such an effort, but I would probably need guidance from some of the experts on how to interpret the data.

Attached is a graph showing factmsieve's estimated number of relations for lpbr=26,27,28 and the actual number of relations needed for a number of my recent factorizations. The "errorbars" are the relation counts after the next-to-last and last sieving passes, so when the estimated line passes through the errorbar, the estimate was correct; when the estimated line is below the errorbar, a filtering run was attempted there and failed. There are a number of misses in the c107-109 range; this may be because msieve seems to produce far fewer polys in this range. From c115 and up, factmsieve consistently underestimates the number of relations needed.

Suggestions/guidance welcome. Thanks!
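For context, the usual rule of thumb is that the unique relations needed scale with the number of primes below the large-prime bounds, which is why lpbr/lpba dominate these estimates. Here is a minimal sketch of that heuristic - not factmsieve's actual formula, and the safety factor is a made-up placeholder:

Code:
# Sketch of a relations-target heuristic (NOT factmsieve's exact formula).
# Rule of thumb: unique relations needed is roughly a constant multiple
# of pi(2^lpbr) + pi(2^lpba); the fudge factor is a placeholder.
from math import log

def prime_pi_approx(x):
    # Approximate the prime-counting function pi(x) by x / (ln x - 1).
    return x / (log(x) - 1)

def estimated_min_rels(lpbr, lpba, fudge=1.0):
    return fudge * (prime_pi_approx(2.0 ** lpbr) + prime_pi_approx(2.0 ** lpba))

for lpb in (26, 27, 28):
    print(lpb, int(estimated_min_rels(lpb, lpb)))

Each extra large-prime bit roughly doubles the target under this approximation.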
#2
May 2008
Worcester, United Kingdom
1016₈ Posts
I very much welcome your offer to update the parameters for the ggnfs/msieve driver scripts. This would benefit both the Perl and the Python scripts.

When I first built factmsieve.py it was immediately clear that the parameter values were out of date. I started an exercise to improve them, but I soon realised that this needed expertise I did not have. I hoped experts here would help, and many did offer raw data, but what was really needed was someone with deep knowledge of the algorithms to convert that raw data into an algorithm (or tables) for parameter estimation. Nobody took this up, so I made only minor improvements. I will willingly do what I can to incorporate any advances you make into the Python script.
#4
Sep 2010
Portland, OR
2²·3·31 Posts
Thanks Brian. Let's start with some data collection and see what happens.
The useful log files are the summary from factmsieve (e.g. g105-test.txt) and the full log (e.g. test.log). If folks can post whatever logs they have lying around, and start saving future ones, I'll dig through them and see what I can come up with. I've never done an SNFS factorization so I have no data to contribute there, but if this effort does actually result in changing anything, we can incorporate whatever updates are available for SNFS too. I really have no idea what I'm doing here, so any suggestions about how to proceed are most welcome.
#5
(loop (#_fork))
Feb 2006
Cambridge, England
6382₁₀ Posts
I have done several hundred factorisations in this range, but since I didn't use factmsieve (I use my own python code to pick gnfs parameters) I imagine my results aren't interesting to this project. I don't see much of a correlation between number-of-relations and size-of-number.
The thing I've been noticing recently is that it's reasonable to aim for about 85 million rather than about 100 million unique relations when factoring 155-digit numbers with the 14e siever, lpbr=lpba=30, and alim around 30-40 million; this obviously saves quite a lot of time (15% of a few dozen CPU-days).

Last fiddled with by fivemack on 2011-06-21 at 17:18
#6
Sep 2009
977 Posts
For the 512-bit RSA keys the TI calculator community factored in the summer of 2009, factMsieve.pl suggested lpbr=lpba=29 and mfbr=mfba=58. Post-processing usually occurred with more than 60M raw relations.

I don't know how much oversieving that represented - at the time, I didn't know the difference between GNFS and SNFS (I learnt it after starting to attend MF).
#7
Nov 2007
Halifax, Nova Scotia
2³·7 Posts
I just checked it out, and it seems that accurately estimating the minimum relations needed isn't trivial:
Quote:
In that paper the authors describe a method based on test sieving a tiny interval, estimating the number of useful relations found, and extrapolating to the total number of relations needed to factor the number. They say their method is ~2% accurate in the cases they tested. So I would say that no matter what you do, you're not going to get much better than 10% accuracy using a table of number sizes and relation counts. You could push it to 2% by implementing whatever they did in their paper. Finding a better way might be an interesting research topic, but that's the level of effort you're talking about.
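The extrapolation step itself is easy to sketch (hypothetical numbers below; yield is not actually constant over the special-q range, so a serious implementation would test-sieve at several points and integrate):

Code:
# Crude sketch of test-sieve extrapolation: sieve a tiny special-q
# interval, measure the yield, and scale up to the full range.
# The numbers in the example are hypothetical.

def extrapolate_relations(q_start, q_end, test_width, test_rels):
    yield_per_q = test_rels / float(test_width)  # relations per special-q
    return yield_per_q * (q_end - q_start)

# e.g. a 2000-wide test interval that produced 5400 relations:
print(int(extrapolate_relations(2000000, 14000000, 2000, 5400)))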
#8
(loop (#_fork))
Feb 2006
Cambridge, England
6382₁₀ Posts
That paper was trying to figure out necessary relation counts for problems big enough that you can't interpolate from nearby ones, which is a good and interesting problem - deciding what the right parameters for RSA768 are without ever having done anything like that size before.
For our purposes we really just need interpolation. I would be tempted just to remove the '>= 160' from the part that switches to using Batalov's interpolated plots, and run a few more tests. (What changes in def-par.txt at 115 digits?)

Last fiddled with by fivemack on 2011-06-22 at 16:57
#9
Sep 2010
Portland, OR
101110100₂ Posts
Quote:
But today, for larger numbers the estimates are (in my experience) guaranteed to be too low. And the special-q range sieved between filtering attempts is fixed, so for a c140 you may retry filtering after increasing the number of relations by only 0.5%. If the estimate is off by 10-15%, that means you run filtering 20-30 times; this can add a day to your factorization.

A simple improvement might be: if filtering fails, sieve until the number of relations has increased by at least 2.5% (or something) before retrying, along the lines of the sketch below. That's probably worth doing regardless of any other parameter changes. But looking at the blue line above, the slope looks right but the intercept is off. Just sliding the estimation line up would be an improvement, even if still far from perfect.
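In code the retry policy might look something like this (sieve_one_range() and try_filtering() are hypothetical stand-ins for the driver's actual sieving and msieve-filtering steps):

Code:
# Sketch of the suggested retry policy: after a failed filtering run,
# keep sieving until relations have grown by at least 2.5% before
# trying again. The two callables are hypothetical stand-ins.

MIN_GROWTH = 1.025

def sieve_until_filtering_succeeds(num_rels, sieve_one_range, try_filtering):
    while not try_filtering():
        target = num_rels * MIN_GROWTH
        while num_rels < target:
            num_rels += sieve_one_range()  # relations from one q-interval
    return num_rels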
Quote:
It's the midpoint between these two lines:

Code:
# type,digits,deg,maxs1,maxskew,goodScore,efrac,j0,j1,eStepSize,
# maxTime,rlim,alim,lpbr,lpba,mfbr,mfba,rlambda,alambda,qintsize,A,B
gnfs,112,5,61,2000,1.6e-4,0.25,250,15,50000,2800,3500000,3500000,27,27,50,50,2.6,2.6,100000,4000000,300
gnfs,118,5,63,2000,2.6e-5,0.28,250,20,50000,3600,4500000,4500000,27,27,50,50,2.4,2.4,60000,4000000,300

I don't think the maxs1, goodScore, efrac, and j1 fields are used when doing msieve poly selection. That leaves rlambda/alambda as the only relevant difference. Based on the drop in qintsize, I'd guess that it's expecting a siever change here -- but I think factmsieve uses 13e from 110-139 digits.
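Incidentally, for anyone scraping def-par.txt as part of the data-collection effort, the lines split straightforwardly on commas; a minimal sketch using the field names from the header comment above:

Code:
# Sketch: parse a def-par.txt gnfs line into a dict keyed by the
# field names from the header comment.
FIELDS = ("type,digits,deg,maxs1,maxskew,goodScore,efrac,j0,j1,eStepSize,"
          "maxTime,rlim,alim,lpbr,lpba,mfbr,mfba,rlambda,alambda,"
          "qintsize,A,B").split(",")

def parse_defpar_line(line):
    return dict(zip(FIELDS, line.strip().split(",")))

row = parse_defpar_line("gnfs,112,5,61,2000,1.6e-4,0.25,250,15,50000,"
                        "2800,3500000,3500000,27,27,50,50,2.6,2.6,"
                        "100000,4000000,300")
print(row["lpbr"], row["rlambda"], row["qintsize"])  # -> 27 2.6 100000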
#10
(loop (#_fork))
Feb 2006
Cambridge, England
2·3,191 Posts
Quote:
On the other hand, my driver explicitly does trial sieving: it sets alim as a function of the number of digits, sieves alim .. alim+1000 (or .. alim+10000 above 105 digits), and then picks the initial sieving range to get enough relations, assuming the yield stays constant at its value at alim.

What I tend to do manually is a stronger version of that: I pick some roughly-right alim, sieve 10000 special-q to get a yield, and then set alim so that, assuming I sustain that yield, I'll get enough relations when I sieve Q = alim/3 .. alim (see the sketch below). In practice I've found that the drop-off in yield comes at about alim/3, and this seems to give about 10% faster finished factorisations than other protocols I've tried (for example, aiming to sieve Q = alim/2 .. 3*alim/2).

I attach my code here; you want the FactorsByGNFS function. Note that the smaller digit-counts are tuned against a fairly old msieve; you may not want those bits, but I think the adaptive range selection is probably actually useful.
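The alim selection in that manual protocol reduces to a few lines. This is just a sketch of the idea, not the attached FactorsByGNFS code, and test_sieve() is a hypothetical stand-in for running the siever over a short q-interval:

Code:
# Sketch of the manual protocol: measure yield near a roughly-right
# alim, then choose alim so that sieving Q = alim/3 .. alim at that
# yield produces enough relations. test_sieve(q, width) is a
# hypothetical stand-in returning the relations found in [q, q+width).

def choose_alim(rough_alim, rels_needed, test_sieve, test_width=10000):
    yield_per_q = test_sieve(rough_alim, test_width) / float(test_width)
    # Q = alim/3 .. alim covers (2/3)*alim special-q, so we need
    # yield_per_q * (2/3) * alim >= rels_needed:
    return int(1.5 * rels_needed / yield_per_q)

# Then sieve Q = alim/3 .. alim.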
Quote:
Quote:
I suggest you explore using lpbr=29 with the 14e siever from as low as 140 digits; with my approach of aiming to do all the sieving below alim, that seems to be where the crossover lies. Though my database contains enough changes of strategy and of computer that it's pretty difficult to query conclusively.

Last fiddled with by fivemack on 2011-06-22 at 18:35
#11
Nov 2007
Halifax, Nova Scotia
2³·7 Posts
Quote:
Quote:
http://myweb.dal.ca/~dstaple/to_bchaffin.tgz

The numbers were from wblipp's page of composites: http://oddperfect.org/composites.html

Essentially all of those numbers could have been done with SNFS, but several already had factors divided out. Anyway, I factored them all using GNFS. I don't know if that changes things.