mersenneforum.org  

Go Back   mersenneforum.org > Factoring Projects > Factoring

Reply
 
Thread Tools
Old 2009-05-07, 06:50   #1
jrk
 
jrk's Avatar
 
May 2008

100010001112 Posts
Default ggnfs sieving yield wierdness

Has anyone noticed this? Using gnfs-lasieve4I13e siever*, sieving smaller ranges in separate instances causes the siever to find more relations than if one big range is sieved.

This is confirmed by re-sieving over a small range at the end of the original range and observing that new relations are found that are missing in the original output. In the case shown in the graph below, this increase is about 25%, but the increase seems to be related to the length of the original range.

So what's happening? Is this expected? Does it have anything to do with duplicates being found?

* tested with both the regular and "experimental" versions.
Attached Thumbnails
Click image for larger version

Name:	siever-output-graph.png
Views:	155
Size:	12.4 KB
ID:	3635  

Last fiddled with by jrk on 2009-05-07 at 07:03
jrk is offline   Reply With Quote
Old 2009-05-07, 07:08   #2
smh
 
smh's Avatar
 
"Sander"
Oct 2002
52.345322,5.52471

29×41 Posts
Default

I think this is because you're probably sieving below the factor base. The siever automatically lowers the factor base for you.

If you're resieving the first part of that range, you probably won't notice any difference.
smh is offline   Reply With Quote
Old 2009-05-07, 07:11   #3
Andi47
 
Andi47's Avatar
 
Oct 2004
Austria

2×17×73 Posts
Default

Quote:
Originally Posted by jrk View Post
Has anyone noticed this? Using gnfs-lasieve4I13e siever*, sieving smaller ranges in separate instances causes the siever to find more relations than if one big range is sieved.

This is confirmed by re-sieving over a small range at the end of the original range and observing that new relations are found that are missing in the original output. In the case shown in the graph below, this increase is about 25%, but the increase seems to be related to the length of the original range.

So what's happening? Is this expected? Does it have anything to do with duplicates being found?

* tested with both the regular and "experimental" versions.
Is this from the c138 of the 4th aliquot team sieve? When you start sieving, alim (algebraic factor base bound, IIRC, which is 9M in the input file) will be set to (start of the range -1). With alim as low as e.g. 3.2M, the yield will drop quickly with growing special Q.
When you do everything in one big range, you will still sieve with alim=3.2M for Q=5M. When you do many small ranges, alim will always be "just below Q", thus yielding more relations.
Andi47 is offline   Reply With Quote
Old 2009-05-07, 07:16   #4
jrk
 
jrk's Avatar
 
May 2008

3×5×73 Posts
Default

Quote:
Originally Posted by smh View Post
I think this is because you're probably sieving below the factor base. The siever automatically lowers the factor base for you.
You're right, the limit is automatically lowered. Shouldn't the limit then be kept in step with the current Q being tested? Maybe that can be a RFE for the maintainers?

Over a big range like that above, the difference can be significant!

Quote:
If you're resieving the first part of that range, you probably won't notice any difference.
You are right again. Like I mentioned, retesting smaller ranges shows that the difference increases as the original range increases. The difference increases gradually and is largest at the end of the original range tested.
jrk is offline   Reply With Quote
Old 2009-05-07, 07:21   #5
jrk
 
jrk's Avatar
 
May 2008

3·5·73 Posts
Default

Quote:
Originally Posted by Andi47 View Post
Is this from the c138 of the 4th aliquot team sieve? When you start sieving, alim (algebraic factor base bound, IIRC, which is 9M in the input file) will be set to (start of the range -1). With alim as low as e.g. 3.2M, the yield will drop quickly with growing special Q.
When you do everything in one big range, you will still sieve with alim=3.2M for Q=5M. When you do many small ranges, alim will always be "just below Q", thus yielding more relations.
You are exactly right.

I think the alim/rlim should be increased continually with the Q being tested, so that relations are not "lost." Is that possible to do in the program? Serge?

A workaround would obviously be to split up all ranges below alim/rlim into smaller chunks and merging them afterward.
jrk is offline   Reply With Quote
Old 2009-05-07, 07:47   #6
smh
 
smh's Avatar
 
"Sander"
Oct 2002
52.345322,5.52471

29×41 Posts
Default

Just wondering. How did you create the graph?
smh is offline   Reply With Quote
Old 2009-05-07, 08:09   #7
jrk
 
jrk's Avatar
 
May 2008

3·5·73 Posts
Default

Quote:
Originally Posted by smh View Post
Just wondering. How did you create the graph?
cat relations.out | sed -e 's/.*,//g' | ./graph1.pl

graph11.pl is just a hack-ish perl script I put together:

Code:
#!/usr/bin/perl

$start = 3200000;
$end = 5000000;
$inc = 5000;
$steps = ($end - $start) / $inc;

@y = ();
while(<STDIN>)
{
	chomp;
	#input is represented in hexadecimal
	$x = hex;
	if($x >= $start && $x <= $end)
	{
		$x = int(($x - $start) / $inc);
		
		$y[$x]++;
	}
	else
	{
	}
}
for($i = 0; $i < $steps; $i++)
{
	$p = $start + $i * $inc;
	print $p . ' ' . $y[$i] . "\n";
}
then put the output in openoffice calc to make the graph (overkill...).

I made the assumption that the Q value is at the end of the relation line. But actually it is not always. So I throw away the lines where the last value is out of the range, and if the value is in range I assume it is a Q value. Figured it was close enough for my purposes anyway.
jrk is offline   Reply With Quote
Old 2009-05-07, 09:00   #8
smh
 
smh's Avatar
 
"Sander"
Oct 2002
52.345322,5.52471

29×41 Posts
Default

Thanks.

I'm new to Calc and i can't seem to get a nice looking graph in calc.

What type of graph do you use? Also, the x and y axes are always the other way around.
smh is offline   Reply With Quote
Old 2009-05-07, 09:15   #9
jrk
 
jrk's Avatar
 
May 2008

3×5×73 Posts
Default

I used an "area" graph. And you need to check "first column as label" under "data range".
jrk is offline   Reply With Quote
Old 2009-05-07, 09:54   #10
Batalov
 
Batalov's Avatar
 
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

2·4,787 Posts
Default

Quote:
Originally Posted by jrk View Post
You are exactly right.

I think the alim/rlim should be increased continually with the Q being tested, so that relations are not "lost." Is that possible to do in the program? Serge?

A workaround would obviously be to split up all ranges below alim/rlim into smaller chunks and merging them afterward.
Well, this thing is rooted deep. Changing alim (or rlim) doens't merely change a number or two, -- no, it affects many (practically all?) core structures of the program, so for the purposes of the siever (as it is now) the FB_limits (also known as rlim/alim) are constants and set up once. Re-writing around this paradigm is hard and prone to error. Not just a few lines. (There are many places in the code where no asserts are made, for the speed sake. It is brittle in this sense.)

The workaround is indeed what everyone is doing: small ranges. but not too small, because when the siever starts, it spends some time preparing all the above mentioned housekeeping items. This overhead will negate all potential gains, if the intended region is cut too fine. Cut in 10 ranges or so and you will do well.
Batalov is offline   Reply With Quote
Old 2009-05-07, 17:41   #11
jrk
 
jrk's Avatar
 
May 2008

3·5·73 Posts
Default

Quote:
Originally Posted by Batalov View Post
Well, this thing is rooted deep. [...]
I suspected as much. Thanks for the info.
jrk is offline   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
CADO-NFS and GGNFS sieving ray10may Msieve 1 2017-04-14 14:19
SNFS polynomial with very low yield ryanp Factoring 9 2013-04-03 06:33
1870L : Yield Rate R.D. Silverman Cunningham Tables 44 2010-10-21 14:32
Resume sieving in GGNFS nuggetprime Factoring 5 2007-06-04 14:42
Benchmarks wierdness delta_t Hardware 3 2004-06-24 13:35

All times are UTC. The time now is 09:23.


Tue Oct 26 09:23:18 UTC 2021 up 95 days, 3:52, 0 users, load averages: 1.96, 2.06, 2.07

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.