20070508, 21:30  #1 
(loop (#_fork))
Feb 2006
Cambridge, England
6,323 Posts 
Why only three large primes
ggnfs doesn't allow relations to have more than three large primes on either side. This means that, when I try reprocessing some relations against a smaller factor base, I lose a fair number of relations because they fail to factor fully, and end up with #rels << #primes and almost nothing coming out of the combiningpartialrelations phase.
From a brief look at xilman's 'MPQS with three primes' paper, I can't see why you can't have more than three large primes on each side, and have relations with three or more large primes acting as 'crosslinkers' in his polymer model of filtering. I was expecting reprocessing against smaller and smaller factor bases to give me matrices whatever happened, but denser and denser matrices so that N*W became quickly impractically large as N got small ... it appears that this isn't the case, I just get no effective combinations at all. Is this an artefact of losing enough relations from the #p<=3 filtering that I've gone well below the combinationexplosion threshold, or is there some subtler reason why large and small primes are so fundamentally different? 
20070508, 22:21  #2  
Tribal Bullet
Oct 2004
3·1,163 Posts 
Quote:
There's nothing special about the number of large primes that appear in the filtering stage; any number will do. If memory serves for QS with 3 large primes there was never really an explosion; the cycle creation rate was just very low and after a certain point the rate switched to becoming very high. The difference between the two rates was more pronounced when the factor base started off small. I don't know that anyone has looked systematically at the cycle behavior in this case, since it's equivalent to 5 large primes in relations and not just the 4 that papers describe. If it will help, msieve's filtering assumes an arbitrary number of large primes. Rational and algebraic large primes are always dealt with in a unified way, and the filtering bound is dynamically adjusted until the amount of excess is enough to form a reasonable matrix. It handles relations with 7+ large primes fairly regularly. I also meant to mention that with your C200 the large prime bound seems to be 30 bits, and that means a huge number of relations are needed to get to the explosion point. Your run has something like 60% excess with 67M relations, whereas Greg's C204 with 28bit large primes has something like 200% excess with 54M relations Last fiddled with by jasonp on 20070508 at 22:29 

20070508, 23:00  #3  
(loop (#_fork))
Feb 2006
Cambridge, England
6,323 Posts 
Quote:
What I'm running into with ggnfs, aside from seeing relations discarded because they have too many large primes, is the restriction on the number of relations that can be combined into any given cycle ... I get things like Code:
There are 20609 relations with 0 large primes. There are 7212 relations with 1 large primes. There are 435327 relations with 2 large primes. There are 269734 relations with 3 large primes. There are 87287 relations with 4 large primes. There are 13568 relations with 5 large primes. There are 718 relations with 6 large primes. Quote:
Quote:
For the C136 that I'm sieving at the moment, I did preruns with various sizes of large prime and of factor base, and optimised for [time per relation] * [2^lpbd] on the principle that pi(2^30) was near enough 2*pi(2^29) to make little difference, and thinking that the size of the factor base was a secondorder effect in comparison to couponcollecting enough primes to reach the explosion point. Ended up with 29bit large primes (28 was nearly as good, 30 significantly worse) and spbd=9e6, 18 relations/second/CPU and aiming for ~30 million relations in about three CPUweeks. I did the same sort of preruns for 2^8411 out of curiosity (yes, I realise this will take a couple of CPUyears to latticesieve, and I can't afford that many CPUs, but I've only wasted three days); there it looked as if lpb=2^29, smallprime=2^24 would optimise the sieving time; that seemed a very small smallprime bound, which is why I started reprocessing data on a smaller example to see what happens if the smallprime bound was too small. I suspect that after the two CPUyears I would end up with an *impossibly* dense matrix. I'm trying to get some idea of what the densitysmallprimebound curve actually looks like. There may be a good paper which talks about all of this, but I haven't seen it yet; the reallybig GNFS runs have produced only onepage 'look! parameters! factors!' papers, I'm not sure I've ever seen a really clear justification for the bounds chosen for a large factorisation. 

20070509, 02:26  #4  
Jun 2005
lehigh.edu
1024_{10} Posts 
Quote:
Not to be picky, but if you check the title page you'd see xilman, Arjen Lenstra, bdodson and Sam (just of the coauthors I recall ... Moffet, maybe?). While the credit's certainly mainly Paul's (and the writing as well), there was a point early in the computation where Arjen and I helped assure that the computation was feasible (iirc). bd 

20070509, 06:42  #5  
Tribal Bullet
Oct 2004
3×1,163 Posts 
Quote:
If you're willing to do a recompile, you can hardwire the value of the variable 'filtmin' in gnfs/filter/filter.c to force the initial bound on large ideals; otherwise the code sets filtmin to a weighted average of all the ideals encountered when relations were read in during duplicate removal. If it turns out there's a lot of excess with that choice, filtmin is reduced 10% at a time and the singleton and clique processing is rerun until there's ~8% excess. 

20070509, 12:21  #6  
Nov 2003
16100_{8} Posts 
Quote:
computers even 3 large primes is problematic. Suppose we have a norm of a lattice point: f(a,b). We want f(a,b) = II p_i A1 A2 A3 where p_i are in the factor base and A1, A2, A3 are outside, so A1 > max(p_i) and we have C = A1 A2 A3 < max(p_i)^(3 + epsilon) for some selected epsilon. You will find that for numbers we can do today that you will find a LOT of such C. Most of them will get rejected because C will split into (say) q1 q2 where q1 and q2 are beyond the large prime bound, or else C itself will be prime, or else C = r1 r2 r3 where r1 is just barely beyond max(p_i) and r2 and r3 are too big. Note also that once factor base primes are sieved, you must try to factor the candidates you suspect are smooth by resieving. When you have a LOT of such candidates you will find memory requirements for the sieve code will increase by quite a bit. The net result is that you will spend much too much time finding, processing and *rejecting* these false positives. For SNFS my code already implements 3 large primes for the linear side (one, of course, is a special q). For the algebraic side, I estimate that 3 large primes do not become effective until one starts using a septic polynomial. For degree 5 and 6, 3 large primes are not effective. Note also that splitting C = A1 A2 A3 is aso much more time consuming than splitting two primes. Degree 7 does not become effective until about 250 digits. 

20070509, 13:47  #7  
Tribal Bullet
Oct 2004
3·1,163 Posts 
Quote:
The difficulty of resieving would be manageable; if you have really so many sieve reports that you have a ton of resieving to do, it still saves loads of time to resieve the sieve reports in blocks. There are worse problems to have than too many reports out of the sieve :) 

20070509, 14:06  #8  
Nov 2003
2^{6}×113 Posts 
Quote:
ideals during filtering"??? "systematically deal with all of them...."????? Filtering is *easy* and takes little time relative to sieving or the LA. And the amount of resieving has NOTHING to do with the number of reports. The resieving work is fixed by the size of the factor base and the size of the sieve region. What does increase is the *memory* needed to store the primes when resieving detects that a prime has hit a lattice point that is a smooth candidate. I outlined the costs associated with a wrongsized factor base in my recent paper. The cost of a factor base that is too large grows linearly with the size of the factor base. The cost of one that is too small grows faster than exponentially. The cost of dealing with large primes is *negligible* unless you use too many of them [with respect to the size of the norms]. As numbers grow larger it is worthwhile to use more large primes. It is almost certainly worthwhile to use 3 large primes on each size for (say) RSA768. For SNFS, 3 large primes on the algebraic side do not become worthwhile until one starts using degree 7. For degree 5 or 6 you get way too many false positives. 

20070509, 16:49  #9  
Tribal Bullet
Oct 2004
3489_{10} Posts 
Quote:
It's a sign of my ignorance that I don't consider filtering to be easy at all :) Quote:
Last fiddled with by jasonp on 20070509 at 16:53 

20070509, 17:08  #10  
Nov 2003
2^{6}×113 Posts 
Quote:
the distinction between factor base primes and 'large primes' TOTALLY VANISHES. What one has is relations consisting of the product of primes. PERIOD. Filtering gathers those primes together, combining matching primes in different relations while preserving sparseness. It doesn't care what kind of primes they are. Using a smaller sized factor base will increase sieve time faster than exponentially. The added cost of rejecting false hits will only exacerbate this problem. Using a factor base that is too small is VERY EXPENSIVE. Using a too small factor base while trying to compensate with additional large primes will be a DISASTER. 

20070509, 17:21  #11 
Tribal Bullet
Oct 2004
3×1,163 Posts 
I understand that (you can't code it up and not realize that). But I think it's valid to ask if there are ways to tweak NFS parameters that speed up the algorithm for input sizes of interest.

Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
Where do I send my PRP primes with large k?  Trilo  Riesel Prime Search  3  20130820 00:32 
48bit large primes!  jasonp  Msieve  24  20100601 19:14 
lots of large primes  Peter Hackman  Factoring  2  20080815 14:26 
NFS with 5 and 6 large primes  jasonp  Factoring  4  20071204 18:32 
What is the use of these large primes  Prime Monster  Lounge  34  20040610 18:12 