2010-08-20, 16:10   #1
FactorEyes

6,353+

This C178 GNFS ran a little faster thanks to jrk's suggestion that we try triple large primes on the algebraic side:

n:
Code:
3495720155744160390308702080911694892393254003819177989194364433392602119210358965833430295064663146579080152295928278131537398607330890608815038143831212204077824454760817944697
# norm 1.828473e-17 alpha -7.945069 e 9.914e-14
skew: 5242872.26
c0:  1415684462271305089012313057623015234842880
c1:  56489085491552175398550702459917716
c2: -355405802142076475701272154680
c3: -30573939026244353965597
c4:  18914955682611026
c5:  637985160
Y0: -5594293033634964367018772799829077
Y1:  10914669743009582011
rlim: 60000000
alim: 60000000
lpbr: 31
lpba: 31
mfbr: 62
mfba: 90
rlambda: 2.6
alambda: 3.6
This seems to buy around an 8% improvement in speed for GNFS jobs of this magnitude. The individual Q values require more time, but we generate more relations per Q.
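To decode the parameters: lpbr/lpba are the large prime bounds (2^31 here on both sides), and mfbr/mfba cap the bit size of the unfactored cofactor the siever will keep. Note that mfbr = 62 = 2·31 allows up to two 31-bit large primes on the rational side, while mfba = 90 sits just under 3·31 = 93, allowing up to three on the algebraic side. A minimal sketch of the acceptance test these bounds imply - the subroutine name is hypothetical, not a GGNFS routine:

Code:
#!/usr/bin/perl
# Sketch of the cofactor test implied by the lpb*/mfb* bounds above;
# keep_cofactor() is a made-up name, this is not GGNFS code.
use strict;
use warnings;

my %bound = (
    rational  => { lpb => 31, mfb => 62 },  # 62 = 2*31: up to two large primes
    algebraic => { lpb => 31, mfb => 90 },  # 90 < 3*31: up to three large primes
);

# After the factor-base primes are removed, the leftover cofactor is
# kept only if it fits in mfb bits; it must then split into primes
# below 2^lpb.
sub keep_cofactor {
    my ($side, $bits) = @_;
    return $bits <= $bound{$side}{mfb};
}

for my $bits (62, 90, 93) {
    printf "algebraic %d-bit cofactor: %s\n",
           $bits, keep_cofactor('algebraic', $bits) ? 'keep' : 'reject';
}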

The C178 split as P68.P110:

Code:
P68: 76772819749620922859599019064869997729081875556728878881833848945739
P110: 45533304197302470775235834478317413598842411794792485571017691246712640554863455599919842904873592456844389323
2010-08-22, 19:05   #2
frmky

3LP sieving: memory and speed savings!

Quote:
Originally Posted by FactorEyes
This C178 GNFS ran a little faster thanks to jrk's suggestion that we try triple large primes on the algebraic side:
If you still have it, please post the msieve filtering log. I'm interested to know how it progressed.
2010-08-27, 02:37   #3
FactorEyes

6,353+ log

Quote:
Originally Posted by frmky
If you still have it, please post the msieve filtering log. I'm interested to know how it progressed.
Okey-dokey.

I sieved with 15e, on the algebraic side, 24M < Q < 164M. Jayson had the inspiration for the triple large prime approach a few days into the sieving, so about 17% of relations came from a standard double large prime approach with higher factor base limits (rlim=alim=120000000).
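For concreteness, a run like this is typically driven by invocations of the GGNFS lattice siever along the lines of the sketch below; the job file name is a placeholder, and the Q range was of course farmed out in chunks:

Code:
# gnfs-lasieve4I15e: -a = special-q on the algebraic side, -f = first Q,
# -c = length of the Q range, -o = output file.  "6p353.job" is a
# placeholder name for the parameter file posted above.
gnfs-lasieve4I15e -a 6p353.job -f 24000000 -c 1000000 -o rels.24M.out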

Per Jayson, a better range might have been 40M through 180M or so. I am mostly swayed to this viewpoint, but I can't stand to watch the meager yields at the higher end. Low-Q sieving is one of many things I don't understand about this business.

Log attached.

6P353.log.gz

BTW, Greg: Thanks for posting all your logs - they have been educational, so I'm happy to return the favor.

2010-08-27, 03:21   #4
Batalov

Running a whole project of this size twice (for a comparison vs. 2LP) is hardly practical, but would it be possible to devise a data filter that emulates - not necessarily precisely, but somewhat closely - the removal* of relations with strictly three large primes? It would be interesting to do, because this line:
"Thu Jul 22 23:59:04 2010 found 49,137,133 duplicates and 173,981,192 unique relations"
appears to indicate a somewhat higher than usual redundancy rate. But I could be totally wrong.

So it would be interesting to see whether a simulated, equivalent 2LP set would have less redundancy. If it had 8% less redundancy, the speed gain would look like a wash, right?

3LP is an impressive device, and it is not my intent to disrespect it. It deserves an extended postmortem, don't you think? You saved the dataset, right? You could dissect it in a number of ways with some perl scripts.
________

*I sieved a bit with both 3LP and the usual 2LP, and the 2LP output is always nested inside the 3LP output. So the process could be reversed, for simulation purposes.

2010-08-27, 03:37   #5
FactorEyes

Quote:
Originally Posted by Batalov
3LP is an impressive device, and it is not my intent to disrespect it. It deserves an extended postmortem, what do you think? You saved the dataset, right? You could dissect it in a number of ways with some perl scripts.
Sure could.

The duplication rate did look high. When I saw it, I attributed it to the relatively small factor base limits, as if that excused it.

I can tell you that this took just under 10% less CPU time than I was expecting for this number. It was back-o'-the-envelope, and I threw away the envelope, but a C172 I ran (2LP) took 31 days of sieving on all my resources. Assuming a doubling of sieving time for every 5 extra digits - crude, I know - puts a C178 at around 31·2^(6/5) ≈ 71.2 days. I finished the sieving in around 66 days. Figuring that the first 24M or so Q were sieved with 2 large primes puts the speed improvement at around 8-9%.
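Spelling that envelope out, under the same crude doubling-per-5-digits assumption:

Code:
#!/usr/bin/perl
# Back-of-the-envelope scaling from the post: assume sieving time
# doubles for every 5 extra decimal digits.
use strict;
use warnings;

my $t_c172    = 31;                                # days, measured (2LP C172)
my $predicted = $t_c172 * 2 ** ((178 - 172) / 5);  # about 71.2 days for a C178
my $actual    = 66;                                # days, observed (mostly 3LP)

# Raw saving comes out near 7%; crediting the first stretch of Q that
# was still sieved with 2LP parameters pushes the estimate to 8-9%.
printf "predicted %.1f days, actual %d days: %.0f%% faster\n",
       $predicted, $actual, 100 * (1 - $actual / $predicted);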

In summary, it seems to be faster.

Another thing, for all those hesitant to join group factorizations due to memory constraints: this 3LP stuff takes 40% less memory.

EDIT: A lot of this is, obviously, pretty crude estimation on my part. One more impressionistic and imprecise thought: block Lanczos seemed to take a bit longer than I would expect for a 31-bit GNFS job, but this is just an impression.

2010-08-27, 05:05   #6
FactorEyes

Duplication rate

I just ran through my notes on other factorizations: 49137133 duplicates in 222996222 relations seems about right for a 31-bit GGNFS factorization - right around 22%.

EDIT: A local example is the fivemack project for 2^877-1, also a C178 31-bit GNFS.

Code:
Wed Sep  9 11:47:24 2009  found 46267759 hash collisions in 226161090 relations
Wed Sep  9 11:51:08 2009  found 46475021 duplicates and 179686069 unique relations

2010-08-27, 05:44   #7
frmky

I think the most remarkable thing about this log is that there's nothing in it that's remarkable. 20% dupes doesn't seem excessive at this size. It required the usual minimum number of relations. The filtering progressed smoothly. The matrix is a bit large, but that's unsurprising simply because there was little oversieving.

Quote:
Originally Posted by FactorEyes
Another thing, for all those hesitant to join group factorizations due to memory constraints: this 3LP stuff takes 40% less memory.
That's what has caught my interest! I don't really care about a 10% change in runtime either way; that's a few days in a month-long sieving project. But saving significant memory by lowering the factor base size yet still capturing many (most?) of those relations by allowing a third large prime is very attractive!

I noticed that you cut the fb limit in half. Was this optimal, or did you just try it and find it to be good enough? Why did you lower the FB limit on the rational side as well even though you still used 2 large primes on that side?
2010-08-27, 07:04   #8
Batalov

Just in case, here's a script that strips 3LP relations and leaves a simulated 2LP output (I've verified it on a small real set):
Code:
#!/usr/local/bin/perl -w
# Strip 3LP relations, leaving a simulated 2LP output.
# GGNFS relation format: a,b:<rational primes>:<algebraic primes>,
# primes in hex, with the special-q listed among them.
use strict;

use constant SIDE => 2;        # sieved side: field 2 for -a, 1 for -r
use constant ALIM => 60000000; # factor base limit on the sieved side

while (<>) {
    my @t = split ':';
    my @s = split(',', $t[SIDE]);
    $s[$#s] =~ s/\s+$//;           # trim the trailing newline
    my $q0 = hex($s[$#s]);         # last prime on the side = the special-q
    $q0 = ALIM if ($q0 > ALIM);    # threshold = min(special-q, ALIM)
    my $n = 0;                     # count primes above the threshold,
    for my $i (0 .. $#s - 1) {     # excluding the special-q itself
        $n++ if hex($s[$i]) > $q0;
    }
    print $_ if $n < 3;            # keep at most two large primes
}
It is to be expected that if you pipe the existing data through it there will not be enough relations for the matrix, but what is going to happen to the redundancy?
No need to run full filtering: you can instead remdups the original set and the stripped set - what redundancy will you observe? Just curious.

(I am not saying which way it is going to go.)
_________

...And it will probably be inconclusive - a smaller set will have lower redundancy, sure. If someone were inclined to run the equivalent 2LP test, this stripped set would probably amount to 60% of the needed relations, and 40% more 2LP sieving would be needed to get the same volume of relations before checking the redundancy. That would be a fair test. Hmmm...

Could you pipe and prepare the "2LP" set and wc -l it, then head -<same size> full.set > 3LPsmall.set, and then remdups these two sets? At least they will have the same initial relation count. Could be fun.
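In concrete terms, something like this - the file and script names are placeholders, and the exact remdups invocation depends on your GGNFS build, so it is only sketched in comments:

Code:
perl strip3lp.pl < full.set > 2LPsim.set               # simulated 2LP subset
wc -l 2LPsim.set                                       # its relation count
head -n $(wc -l < 2LPsim.set) full.set > 3LPsmall.set  # same-size slice of the full set
# then run remdups on 2LPsim.set and on 3LPsmall.set and compare the
# reported duplicate rates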

2010-08-27, 08:23   #9
jrk

As Al mentioned already, the first few million Q were sieved with 2LP parameters, so those should definitely be omitted from the comparison; otherwise you will get a distorted picture.

The reason I think it is better to start sieving at relatively higher Q is that the smaller Q tend to produce more duplicates than the higher Q do. For instance, if a sieving run goes from Q=20M to 200M, likely more than 2/3 of the relations near Q=20M will end up being duplicates (this comes from looking at some relations and counting how many have large primes inside the Q range). So factor in a 2/3 loss of yield for those first relations you find; it will probably slow you down.
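A minimal sketch of that count, reading relations in the same GGNFS format as the script above; it only flags relations with at least one algebraic large prime inside the Q range, ignoring finer effects such as sieve-area clipping:

Code:
#!/usr/bin/perl
# Count relations having an algebraic large prime inside the special-q
# range: each such prime is another special-q under which the same
# relation can be found again, i.e. a potential duplicate.
use strict;
use warnings;

use constant QMIN => 20_000_000;
use constant QMAX => 200_000_000;

my ($at_risk, $total) = (0, 0);
while (<>) {
    chomp;
    my @t = split ':';    # a,b : rational primes : algebraic primes (hex)
    my $n = grep { my $p = hex $_; $p > QMIN && $p < QMAX }
            split(',', $t[2]);
    $at_risk++ if $n > 0;
    $total++;
}
printf "%d of %d relations have a large prime in the Q range\n",
       $at_risk, $total;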

I don't know how to determine the optimal sieving range for Q, but a gut feeling is that if the estimated range is so big that Q_max is more than, say, 5 times Q_min, then the range should be shifted higher.

Quote:
Originally Posted by frmky
I noticed that you cut the fb limit in half. Was this optimal, or did you just try it and find it to be good enough? Why did you lower the FB limit on the rational side as well even though you still used 2 large primes on that side?
Gut feeling led to halving the fb limit. Trials at 40M, 60M and 80M suggested that 60M was best for 3LP; for 2LP, 120M was optimal. We didn't try using different fb limits on each side, but were content that things still worked well enough when both limits were moved (i.e. the yield was still higher). 3LP on the rational side was too slow, so that side was left at 2LP. Note that RSA-768 did something similar, using different cofactor bounds on each side.

My reaction to this result is:

(assuming we're sieving on the algebraic side)

If I've understood right, a small special-Q lowers the algebraic norm (and increases the rational norm) by a smaller amount than a large special-Q does, so it is harder to factor the algebraic norms at smaller special-Q. It seems the ideal thing would be to use a larger algebraic fb for small Q and a smaller algebraic fb for large Q, to accommodate this difference (and just the opposite for the rational fb). But this introduces its own problems (w.r.t. duplication), and the GGNFS siever by default does not allow the fb limit on the sieving side to be greater than Q (that can be changed, of course).

So 3LP is a sort of compromise which allows more work to be done on the algebraic norms at small Q without increasing the fb limit for them. This is why I think the boost in yield is greater for small Q and less so for larger Q. In fact it may be optimal to switch back over to 2LP once a certain Q has been reached, but this wasn't tried.
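For reference, a sketch of the standard bookkeeping behind the first paragraph, with F the algebraic and G the rational polynomial:

Code:
% every point of an algebraic-side special-q lattice satisfies
% $q \mid F(a,b)$, so the algebraic part left to be made smooth is
\[
    \frac{|F(a,b)|}{q},
\]
% i.e. a factor of exactly $q$ is removed - the larger $q$, the more.
% Meanwhile the reduced lattice basis vectors have length roughly
% $\sqrt{q}$, so for a fixed sieve region the coordinates $(a,b)$, and
% with them the rational norm $|G(a,b)|$, grow with $q$.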

I also wondered if having the additional large primes floating around might help build cycles faster/earlier, since they provide more coverage of the large ideals. But I'm not knowledgeable enough to make this determination.
2010-08-27, 20:51   #10
Batalov

I've separated this sub-thread so it will be easier to find as a technique guide (rather than buried in the 6+ thread as a specific case).

It can probably be moved to Factoring!
2010-08-27, 23:11   #11
FactorEyes

Ignorance <> bliss

Quote:
Originally Posted by jrk
I also wondered if having the additional large primes floating around might help build cycles faster/earlier, since it provides more coverage over the large ideals.
All the action in filtering is about the large primes. Working against 3LP relations is the need for more of them to produce a cycle: using just 3LP relations, at least 3 will be needed for the smallest cycle. In their favor is that, at this magnitude, the cycles are so complicated, and there are so many large primes floating around, that the smaller large primes effectively become part of a larger, virtual factor base, so that many 3LP relations can really be thought of as 2LP relations.

This, anyway, is how I conceive of it in my fevered imagination. I probably have no idea what I'm talking about.
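To make the cycle notion concrete, a toy check with made-up prime labels: a set of relations forms a cycle when every large prime occurs an even number of times, so the primes cancel when the relations are multiplied together.

Code:
#!/usr/bin/perl
# Toy cycle check: do these relations' large primes all pair up?
use strict;
use warnings;

# each relation reduced to the set of its large primes (made-up labels)
my @relations = ( [qw(p q)], [qw(q r)], [qw(r p)] );  # three 2LP relations

my %count;
$count{$_}++ for map { @$_ } @relations;

my @unmatched = grep { $count{$_} % 2 } keys %count;
print @unmatched ? "not a cycle\n" : "cycle\n";       # prints "cycle"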

Quote:
Originally Posted by Batalov
$s[$#s] =~ s/\s+$//;
Hahahaha - I love Perl! That trims off trailing whitespace, right?

I'll run this stuff in the next few hours and report back. It might be worth running this one to completion with 2LP relations - chucking out the 3LP relations - but the principal complication is that proper 2LP sieving would use larger (120M) FB bounds, which further muddies the comparison: we will miss some relations that we would have caught had we done the whole thing with that larger factor base. Got that?