2011-10-13, 05:46  #1 
May 2011
23 Posts 
Size optimization
I have noticed that for larger problems like RSA768 or bigger, the polynomial selection has a very strong tendency to report negative skewness in the initial size optimization. Is this to be expected when we move to a larger composite?

2011-10-13, 10:45  #2 
Tribal Bullet
Oct 2004
3^{2}·5·79 Posts 
That's not expected at all; can you give an example?

2011-10-14, 07:59  #3 
May 2011
23 Posts 
Using Msieve v1.49, I got this result from an RSA768 stage 1 hit during initial optimization. The skew is quite close to 0 after the routine.
Code:
norm 6.9891978e+34 skew 7.533692 95874395962566283976879517727145211059 +2122472720518479292507 +87840551615038595739331407354941471553 35337242809776485267112309864495751433 3856114479612131191693344932738535682 +1020669703390328498616179428060004185 1563710833682459157669146 456 +1584
Code:
12 3535308466027048639492187 570107532450270477086892611321876406536982348
norm 1.5492085e+43 skew 3.243352 570107532450270477086892611321876406536982348 +3535308466027048639492187 442512866685373337176887522011622592629342461 +1783619376608838135560536234346985353366253293 +130308875328793669965090832180115584266425892 232742995070824773894200971356601413407818738 +123811301456959059442938409267 +6 +12
Code:
norm 1.0000000e+50 skew 21968.720494 1497020286335533624272472469866417926156994242006493 +3060129714738514697695981 61512974071785246992630097516374839936951467724821651725 +3224769695783746561958228856476108502074711656233260565 41608949586908930017300382371452135437782031835809457 123217001150933869214201846566669926297110497087536 2518560936318891493017569856712661 2683 +12
norm 1.0000000e+50 skew 142410668.715203 1333693454126410609876682984700939586082833248126732 +4154713801963641711545561 1044189187320323999592363385700694274967127262451502777716728767989 +348063062440139044513995512405594594231066311863890859490518423312 7867305474845877618631173929204554872401418193374191 +490468982860015174126338691966369496986648447647714 3287287660421733701627942163036395 433 +24
So I was wondering whether, for very large composites, there is a greater chance of getting a negative skew during size optimization. 
2011-10-14, 10:53  #4 
Tribal Bullet
Oct 2004
3^{2}×5×79 Posts 
I've never tried to get poly selection to work for > 768-bit inputs; I just assumed it would be broken :) It's on my todo list to fix.

2011-10-14, 16:30  #5 
Tribal Bullet
Oct 2004
DE3_{16} Posts 
Actually, there have been many changes since v1.49 was released that hopefully make polynomial selection for these large jobs behave better. Can you retry with the latest SVN trunk?

2011-10-15, 15:08  #6 
Tribal Bullet
Oct 2004
3^{2}·5·79 Posts 
Okay, I fixed a few problems in the size optimization (the code's notion of 'infinitely large size' needed updating to cope with 1024-bit inputs :)).
Try the latest SVN and see if that works better. By the way, what did you do to get those 1024-bit stage 1 hits, and how long did it take? 
2011-10-17, 06:49  #7 
May 2011
27_{8} Posts 
I see... thanks! I will try it now =) Hmm, I think it takes about 2.5 hours per hit on average. I was using the SVN version and ran it on my P4 desktop. I have added a 309-digit entry using 1.5E+40 as my bound, but it's just a random guess :P
Also, is there any reference I can look at for the size optimization algorithm? I would like to read up on it :) By the way, the root sieve implemented in Msieve seems to have a lot of changes from the one described in Murphy's thesis, especially for degree 6. Is there a new algorithm? 
2011-10-17, 13:41  #8 
Tribal Bullet
Oct 2004
DE3_{16} Posts 
There's no documentation for any of it. Look for slides from the CWI winter workshop, where I'll be giving some details. The short answer is that the algorithm is the same as described by Murphy, but modified to sieve over lattices instead of lines. With degree 6 (and even the larger degree 5 problems), the root sieve would take forever if you didn't limit the search space somehow.
Here's an excerpt from my reply to someone else who asked about this:
Code:
To determine the bounds on the size of a rotation coefficient, the code chooses a multiple of the bound on the L2 norm of the polynomial and then looks at how large the coefficient can get before the polynomial size exceeds that bound. This is tricky since the skew and translation can be adjusted to delay the inevitable. Different bounds on the norm can be tried successively so that small rotations that increase the size only a little can be mixed in with larger rotation coefficients that affect the polynomial size more.

The code you were reading proceeded one J1 at a time, found the bound on J0 and ran the root sieve on that J0 'line', continuing to the next J1 until the J0 line size became too small. In that way the code could rigorously explore even oddly shaped (i.e. rotated elliptical rather than rectangular) search spaces. That worked fine until people started needing huge root sieve problems to be solved, and the code was taking hours for a single one. Also, degree 6 problems have such a huge search space that it is infeasible to rigorously search even the smallest problems in this way.

The updated root sieve works in stages now; the general idea is to choose a lattice size L that is the product of (powers of) many small primes, and to sieve over the J0(i) = R + i*L, where R is a residue class mod L that happens to have many roots modulo the primes dividing L. In this way we limit the sieve to rotations that have a good root score modulo the smallest primes, then find a subset that also have a good root score modulo larger primes.

For degree 4 we only need to do the root sieve over J1=0 and a few R lattices. For small degree 5 problems we can vary J1 over its entire range and save the highest-scoring few R lattices, then lattice sieve over those. For degree 6 and the larger degree 5 problems, this is still too much work, so L also has to be broken up into several pieces. 
For large degree 5 problems we can sieve over an L0 x L0 grid and find the highest-scoring residue classes R0, then choose a grid size L1 coprime to L0 and find the highest-scoring residue classes R1 + i*L1, with the constraint that all such lattice points fall on R0 + (J0*j+J1*k)*L0 for one of the R0 from the first stage. In practice L0 is the product of the first few (powers of) small primes and L1 is the product of the next small primes. The result is a heap of the best-scoring residue classes R2 modulo L0*L1, where R2 is formed via the Chinese remainder theorem, and each residue class modulo L0*L1 is sieved in the conventional way. This method hopefully still finds the best rotations but is L0 times faster than searching each J1 value with a lattice size of L0*L1.

Degree 6 treats L as L0*L1*L2 but otherwise works the same way. Here it is important to allow as much precomputation as possible, because the middle stage has to search an L1 x L1 grid for each of the high-scoring residue classes in the first L0 x L0 x L0 grid.

Once the actual lattice sieve finds a high-scoring rotation, it tries to vary the translation and skew to reduce the size of the resulting polynomial. The rotation then gets a score that combines the size and root scores, and goes into a heap that remembers the top 200 such rotations. After the lattice sieving finishes, any rotations in the heap are applied to the actual polynomial and the translation and skew are varied to optimize the Murphy score.

Last fiddled with by jasonp on 2011-10-17 at 14:04 
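The "R2 is formed via the Chinese remainder theorem" step above can be sketched in a few lines of Python. This is only an illustration of the combination step, not msieve code; the lattice sizes and "high-scoring" residues below are made-up values.

```python
def crt(r0, m0, r1, m1):
    """Combine x ≡ r0 (mod m0) and x ≡ r1 (mod m1), with gcd(m0, m1) = 1,
    into the unique residue class mod m0*m1 (the R2 step)."""
    inv = pow(m0, -1, m1)  # modular inverse of m0 mod m1 (Python 3.8+)
    return (r0 + m0 * ((r1 - r0) * inv % m1)) % (m0 * m1)

# hypothetical lattice sizes: L0 from the first small primes, L1 from the next
L0 = 2**2 * 3 * 5          # 60
L1 = 7 * 11                # 77, coprime to L0
R0, R1 = 17, 30            # made-up 'high-scoring' residue classes
R2 = crt(R0, L0, R1, L1)
assert R2 % L0 == R0 and R2 % L1 == R1
# every rotation J0 = R2 + i*(L0*L1) now lands in both good classes at once
```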
2011-10-17, 19:39  #9 
Sep 2009
11×89 Posts 
In less than 2 hours of CPU time on one core of a Core 2 Duo T7200 (2 GHz), msieve r646 has found ~2000 polynomials (!) for RSA1024.
At the beginning, msieve wrote "expecting poly E from 1.11e-21 to 1.28e-21". The best e score is
# norm 4.371900e-24 alpha -9.729326 e 1.702e-22 rroots 4
About one third of the polynomials have skew 1001.00 or 1002.62, and e scores around ~1e-23. 
2011-10-17, 22:08  #10 
Tribal Bullet
Oct 2004
110111100011_{2} Posts 
The 'expecting poly' line was recently added by Serge, and is a complete extrapolation for inputs this large. Likewise, my sieving bounds are pretty loose, so a lot of hits get generated. I've also walked through some of the choices that stage 1 makes when initializing, and it appears to cut off a lot of the potential search space, because stage 1 was specialized to top out at 32-bit numbers a while ago. Running stage 1 for 1024-bit GNFS really needs key pieces of support code updated to handle 48- to 64-bit numbers, and that's quite tricky.

2011-10-18, 01:02  #11 
May 2008
3·5·73 Posts 
The restriction means that we need m0/a[d-2] to be at most about 2^64. This would be easier to achieve by switching to degree 7, which would likely yield better polynomials for RSA1024 anyway.
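A quick sanity check on why a higher degree relaxes this kind of bound (my own sketch: 2^1023 is just a stand-in for a 1024-bit input, not RSA1024 itself, and a_d is taken small enough that m0 is roughly N^(1/d)):

```python
def iroot(n, k):
    """Integer k-th root by Newton iteration: largest r with r**k <= n."""
    r = 1 << (-(-n.bit_length() // k))  # power-of-two overestimate of the root
    while True:
        nr = ((k - 1) * r + n // r ** (k - 1)) // k
        if nr >= r:
            return r
        r = nr

N = 1 << 1023                 # stand-in for a 1024-bit composite
for d in (6, 7):
    m0 = iroot(N, d)          # rough size of m0 when a_d is small
    print(d, m0.bit_length())  # degree 7 gives an m0 about 24 bits smaller
```

So any fixed bound on quantities derived from m0 is noticeably easier to meet at degree 7.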
