mersenneforum.org > Factoring Projects > Cunningham Tables
Old 2010-10-20, 21:38   #34
R.D. Silverman
Quote:
Originally Posted by Prime95
I must be missing something in the bucket code. If I read the code correctly, most of the global variables are 1MB at most. The only big memory chunk is allocated via malloc using the MEM_FOR_POINT_UPDATES #define which can be adjusted up or down to suit your needs.

What is MEM_FOR_POINT_UPDATES set to? Which other bucket globals take up significant space?
It is this define that is causing the problem.

For a 10K x 20K lattice, it is set at 100 * 6400 * 1024. If I crank it
down, I get core dumps because the bucket code starts to overwrite
locations that it shouldn't.

For a 9K x 18K lattice, it is set at 81 * 6400 * 1024
For 8K x 16K it is set at 64 * 6400 * 1024

If I reduce it, I get core dumps.
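The three settings follow one pattern: the leading factor is the square of the lattice width in thousands, so the define scales with the sieve area. A sketch of that relationship (the helper name is mine, and the formula is inferred from the three quoted values, not from the siever source):

```python
# Hypothetical helper reproducing the quoted MEM_FOR_POINT_UPDATES values.
def mem_for_point_updates(width):
    """Bucket memory in bytes for a width x 2*width lattice (width in points)."""
    k = width // 1000           # lattice width in thousands of points
    return k * k * 6400 * 1024  # leading factor scales with the sieve area

assert mem_for_point_updates(10_000) == 100 * 6400 * 1024  # 10K x 20K
assert mem_for_point_updates(9_000)  == 81  * 6400 * 1024  # 9K x 18K
assert mem_for_point_updates(8_000)  == 64  * 6400 * 1024  # 8K x 16K
```

For the 10K x 20K case this comes to 655,360,000 bytes, about 655 MB.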
Old 2010-10-20, 21:53   #35
Prime95

Quote:
Originally Posted by R.D. Silverman
It is this define that is causing the problem.

If I reduce it, I get core dumps.
Oh, we have a bug in the bucket code. The code is supposed to handle the case where all the points don't fit in the allocated memory.

If you can't find and fix the bug, I can look at it next week. I may need you to package up a sample case and the current code. My copy of your code is 5 years old.
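The intended behavior, as described here, can be sketched minimally (all names are mine, and this is an illustration of the policy, not the actual bucket code): when a bucket fills, apply its pending updates to the sieve array and reuse the space, instead of writing past the end of the allocation.

```python
# Illustrative sketch only. When a bucket is full, flush its updates into
# the sieve region and reuse the space rather than overrunning memory.
BUCKET_CAPACITY = 4  # tiny on purpose, to force flushes in this demo

class Bucket:
    def __init__(self, region_base):
        self.region_base = region_base
        self.updates = []          # pending (offset, log_p) pairs

    def add(self, sieve, offset, log_p):
        if len(self.updates) == BUCKET_CAPACITY:
            self.flush(sieve)      # out of room: apply the updates now
        self.updates.append((offset, log_p))

    def flush(self, sieve):
        for offset, log_p in self.updates:
            sieve[self.region_base + offset] += log_p
        self.updates.clear()

sieve = [0] * 16
b = Bucket(region_base=0)
for _ in range(6):                 # more updates than one bucket holds
    b.add(sieve, 3, 1)
b.flush(sieve)
assert sieve[3] == 6               # no update lost, nothing overwritten
```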
Old 2010-10-20, 22:28   #36
R.D. Silverman

Quote:
Originally Posted by R.D. Silverman
It is this define that is causing the problem.

For a 10K x 20K lattice, it is set at 100 * 6400 * 1024. If I crank it
down, I get core dumps because the bucket code starts to overwrite
locations that it shouldn't.

For a 9K x 18K lattice, it is set at 81 * 6400 * 1024
For 8K x 16K it is set at 64 * 6400 * 1024

If I reduce it, I get core dumps.
Here is a memory breakdown for a 10K x 20K sieve area (broken into two pieces), 6 million ideals:

factor base primes and their roots (6M x 4 x 2)    48M
storage for primes when resieving                  24M
reduced lattices for ideals (16 bytes/ideal)       96M
logs of primes                                      6M
sieve array                                       105M
memory for buckets                                655M
success locations after the algebraic sieve      180M

(I've seen as many as 45M potential successes for small q on a quartic. I can reduce the success storage for larger q's; it could also be reduced by sieving the rational side FIRST.)

total: ~1.1G (plus incidentals)

I can get rid of the storage of success locations
entirely if I do a time/space tradeoff. After I sieve
the algebraic side, I scan the sieve region, looking
for potential hits. Because algebraic norms are small,
there are a LOT of them. Then, after sieving the rational
side, I scan the algebraic success points only. By scanning
all of the points, I could do away with this array entirely.

For sextics, this array is much much smaller.
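The components listed above do total roughly 1.1 GB:

```python
# Totalling the breakdown above (values in megabytes).
components = {
    "factor base primes and roots (6M x 4 x 2)": 48,
    "prime storage for resieving":               24,
    "reduced lattices (16 bytes/ideal)":         96,
    "logs of primes":                             6,
    "sieve array":                              105,
    "buckets":                                  655,
    "algebraic-success locations":              180,
}
total = sum(components.values())
assert total == 1114   # about 1.1 GB, plus incidentals
```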
Old 2010-10-20, 22:43   #37
R.D. Silverman

Quote:
Originally Posted by Prime95
Oh, we have a bug in the bucket code. The code is supposed to handle the case where all the points don't fit in the allocated memory.

If you can't find and fix the bug, I can look at it next week. I may need you to package up a sample case and the current code. My copy of your code is 5 years old.
I've made a fair number of improvements since then as well. I will bundle
up the current code and input data and send it at your convenience.
Let me know when. I won't bother with the source code for creating the
factor base files. I will just send those as zipped binaries.

I also note the following. Greg did a sample run of the GGNFS siever on my data (same factor base sizes, LP bounds, sieve area, etc.). For q near 60M, he got 8.5 relations per q at an average rate of about 0.65 seconds per relation. This is about 5.5 seconds per q.
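Those figures are mutually consistent:

```python
# Checking the quoted GGNFS rate: relations per special-q times seconds
# per relation gives seconds per special-q.
rels_per_q = 8.5
sec_per_rel = 0.65
assert abs(rels_per_q * sec_per_rel - 5.5) < 0.05   # about 5.5 s per q
```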

Our sieve takes over 6 seconds by itself just to do the bucket sieve on my fastest machine (3.2 GHz i5). My home PC is an i7, but it is only 2.6 GHz/core. Add the other overhead, and my siever is about 2.2 times slower than GGNFS for this quartic. This is one reason why I want to try to rework the bucket sieve. I understand why it is slower: the Pollard approach inherently has quite a bit more overhead in setting up the sieve regions. There are some time/memory tradeoffs that will save me 10-15%, but I have to bring down the memory footprint first.
Old 2010-10-20, 22:45   #38
R.D. Silverman

Quote:
Originally Posted by frmky
Yep, that's low. Using
Code:
n: <the number>
skew: 1.41
c4: 1
c3: -2
c2: -6
c1: 12
c0: -4
Y1: 2^93
Y0: -(2^187+1)
alim: 14500000
rlim: 86000000
lpba: 30
lpbr: 31
mfba: 60
mfbr: 62
alambda: 2.6
rlambda: 2.6
which I believe matches your parameters, and sieving using 14e (which has a slightly smaller sieve area of 16K x 8K), I sieved the range q = 60M to 60M+500. There were 27 special-q in the range, and it found 229 relations at a rate of 0.64 sec/relation on the slow Opteron (a Core 2 would get under 0.4 sec/relation). This is about 8.5 rels/special-q using a smaller sieve region.
I found out why you are getting a slightly greater yield than I am. You are using a skew of 1.41 (4^(1/4)); I am using a skew of 1. The skew of 1.41 is slightly more efficient.
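The 1.41 is the usual end-coefficient rule of thumb, skew = |c0/c4|^(1/degree), applied to frmky's quartic:

```python
# Rule-of-thumb skew from the end coefficients of the quartic above:
# c4 = 1, c0 = -4, degree 4.
c4, c0 = 1, -4
skew = abs(c0 / c4) ** (1 / 4)
assert abs(skew - 2 ** 0.5) < 1e-12   # 4^(1/4) = sqrt(2), about 1.414
```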
Old 2010-10-20, 22:56   #39
Batalov

I have tried to model the initial value for the skew by a simple LSF (log|c_i| ~ i), and for this poly it suggests 1.58. But sims show that this makes almost no difference vs 4^(1/4). (Note that the usual rule of thumb is equivalent to an LSF through just the two points c_0 and c_n, disregarding the wobbly coefficients in the middle. For two-term SNFS polys, e.g. 3x^6+1, it is of course the same.)

(By LSF I mean linear regression; reverse-engineering my thoughts from the Russian, "least square fit".)
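The 1.58 estimate is reproducible. Assuming, as above, that |c_i| * skew^i should be roughly flat, log|c_i| is fit against i by ordinary least squares and the negated slope is log(skew):

```python
# Reproducing the least-squares skew estimate for the quartic
# c4=1, c3=-2, c2=-6, c1=12, c0=-4 from frmky's parameter file.
from math import log, exp

coeffs = {0: -4, 1: 12, 2: -6, 3: -2, 4: 1}   # i -> c_i
xs = list(coeffs)
ys = [log(abs(c)) for c in coeffs.values()]

n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n
slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
        sum((x - xbar) ** 2 for x in xs)

skew = exp(-slope)                  # |c_i| * skew**i roughly constant
assert abs(skew - 1.58) < 0.01      # matches the 1.58 reported above
```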

Last fiddled with by Batalov on 2010-10-20 at 23:04
Old 2010-10-20, 23:55   #40
Prime95

Quote:
Originally Posted by R.D. Silverman
Let me know when. I won't bother with the source code for creating the
factor base files. I will just send those as zipped binaries.
You can email the zip files to me at woltman@alum.mit.edu whenever they are ready. Or, if you'd rather, I can download them from an FTP site or a service like sendspace.

Quote:
Add the other overhead, and my siever is about 2.2
times slower than GGNFS for this quartic.
So if I understand this correctly, if we reduce the bucket sieve time to zero we are still a tad slower than GGNFS. Does GGNFS use the same sieving strategy (i.e. is our code really *that* much slower) or do we need to make some algorithmic changes too?
Old 2010-10-21, 00:30   #41
R.D. Silverman

Quote:
Originally Posted by Prime95
You can email the zip files to me at woltman@alum.mit.edu whenever they are ready. Or, if you'd rather, I can download them from an FTP site or a service like sendspace.



So if I understand this correctly, if we reduce the bucket sieve time to zero we are still a tad slower than GGNFS. Does GGNFS use the same sieving strategy (i.e. is our code really *that* much slower) or do we need to make some algorithmic changes too?
The Pollard approach has inherently more overhead. I do not know what GGNFS uses for its sieve strategy. It is certain that it does not sieve over (e,f) for a prime p such that, e.g.,

(0, -M) < e * v1 + f * v2 < (N, M).

Doing this entails computing sieve boundaries, then figuring out how (e,f) stays inside the skewed parallelogram. GGNFS does not even need to compute the v1 and v2 vectors for each prime. Just computing the sieve start points and the reduced lattices takes about 3 seconds for 6 million primes.

They may also have a more efficient technique for scanning the sieve
array after sieving, looking for successful hits.

I know how to reduce some of the overhead. But it requires additional
memory.
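The per-prime setup cost comes from reducing, for each factor-base pair (p, r), the lattice of points hit by p (those with c = r*d mod p) to a short basis (v1, v2). A standard Gaussian (Lagrange) reduction sketch, with my own naming, not Silverman's code:

```python
# Illustrative Gaussian reduction of the lattice {(c, d) : c = r*d (mod p)},
# spanned by (p, 0) and (r, 1). Doing this for millions of (p, r) pairs is
# the per-prime setup overhead described above.
def reduce_lattice(p, r):
    u, v = (p, 0), (r, 1)
    while True:
        if u[0]*u[0] + u[1]*u[1] < v[0]*v[0] + v[1]*v[1]:
            u, v = v, u                      # keep u the longer vector
        dot = u[0]*v[0] + u[1]*v[1]
        norm_v = v[0]*v[0] + v[1]*v[1]
        k = round(dot / norm_v)
        if k == 0:
            return u, v                      # basis is reduced
        u = (u[0] - k*v[0], u[1] - k*v[1])   # shorten u against v

v1, v2 = reduce_lattice(1000003, 123456)
# Both basis vectors still lie in the lattice: c - r*d divisible by p.
for c, d in (v1, v2):
    assert (c - 123456 * d) % 1000003 == 0
```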
Old 2010-10-21, 00:55   #42
jasonp

Quote:
Originally Posted by R.D. Silverman
I don't see how it can do that. Suppose for a given special q, we are
sieving one ideal with norm p. The hits that p takes will be all over the
(c,d) plane. How can one restrict to some subset of the plane?
You could add a next pointer to each factor base entry, 'sort' them in place, and then proceed region by region. Each region picks the head of a linked list of updates, then proceeds FB entry by FB entry and fills in a narrow rectangle of (c,d) points, then each entry computes its 'next' (c,d) point (say in lexicographic order) and gets prepended to the bucket containing the next point. You get cache misses traversing the linked list but not for any other reason, and you can at least use prefetches for those. Msieve's line sieve works that way, but it only has to deal with one line at a time.
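That scheme can be sketched as follows (all names are mine; this is a toy line-sieve analogue in the spirit of the description, not msieve's or anyone's actual code). Each factor-base entry is filed in the bucket for the region holding its next hit; processing a region consumes its list and re-files each entry under the region of its following hit.

```python
# Toy sketch of the linked-list bucket scheme described above.
REGION = 8                       # rows per region (the "narrow rectangle")
ROWS, COLS = 32, 4

def bucket_line_sieve(sieve, fb):
    # fb: list of [prime, log_p, next_row, next_col]; buckets hold fb indices
    buckets = [[] for _ in range(ROWS // REGION)]
    for i, (p, _, row, _) in enumerate(fb):
        if row < ROWS:
            buckets[row // REGION].append(i)     # file under the first hit
    for reg, bucket in enumerate(buckets):
        while bucket:
            i = bucket.pop()
            p, log_p, row, col = fb[i]
            while row < (reg + 1) * REGION:      # all hits in this region
                sieve[row][col] += log_p
                col += p                         # next hit, lexicographic
                row += col // COLS
                col %= COLS
            fb[i][2], fb[i][3] = row, col
            if row < ROWS:                       # re-file for a later region
                buckets[row // REGION].append(i)

sieve = [[0] * COLS for _ in range(ROWS)]
fb = [[5, 2, 0, 0], [7, 3, 0, 3]]
bucket_line_sieve(sieve, fb)
assert sum(map(sum, sieve)) == 2 * 26 + 3 * 18   # every hit was applied
```

The cache behavior jasonp describes corresponds to the `pop`/`append` traffic on the per-region lists; everything else touches only the current narrow rectangle.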

Not to say that that's what the KF siever does; the only description of that code I could find was 'Continued Fractions and Lattice Sieving', a short paper by Franke and Kleinjung.
Old 2010-10-21, 07:09   #43
R.D. Silverman

Quote:
Originally Posted by jasonp
You could add a next pointer to each factor base entry, 'sort' them in place,
You probably don't want to sort in place. Do an index sort; it involves
less data movement.

Quote:

and then proceed region by region. Each region picks the head of a linked list of updates, then proceeds FB entry by FB entry and fills in a narrow rectangle of (c,d) points, then each entry computes its 'next' (c,d) point (say in lexicographic order) and gets prepended to the bucket containing the next point. You get cache misses traversing the linked list but not for any other reason,
Indeed. It is these misses that concern me. And the 'prepending'
may cause misses as well.

Quote:

Not to say that that's what the KF siever does; the only description of that code I could find was 'Continued Fractions and Lattice Sieving', a short paper by Franke and Kleinjung.
I was asked to referee that paper. I wrote that it left out too much material, did not adequately explain many things, made some unsubstantiated claims, etc., and that it needed a MAJOR overhaul before publishing.

It was first published in a venue known for slack editing. My suggestions were basically ignored and it was published essentially as-is. I complained to the editors. I got ignored.
Old 2010-10-21, 14:30   #44
R.D. Silverman

Quote:
Originally Posted by Prime95
You can email the zip files to me at woltman@alum.mit.edu whenever they are ready. Or, if you'd rather, I can download them from an FTP site or a service like sendspace.



So if I understand this correctly, if we reduce the bucket sieve time to zero we are still a tad slower than GGNFS. Does GGNFS use the same sieving strategy (i.e. is our code really *that* much slower) or do we need to make some algorithmic changes too?
I have mailed the code along with a README file that contains a lot
of info.

GGNFS does indeed use a totally different sieving strategy.


Bob