mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Cunningham Tables (https://www.mersenneforum.org/forumdisplay.php?f=51)
-   -   1870L : Yield Rate (https://www.mersenneforum.org/showthread.php?t=14069)

R.D. Silverman 2010-10-20 21:38

[QUOTE=Prime95;234023]I must be missing something in the bucket code. If I read the code correctly, most of the global variables are 1MB at most. The only big memory chunk is allocated via malloc using the MEM_FOR_POINT_UPDATES #define which can be adjusted up or down to suit your needs.

What is MEM_FOR_POINT_UPDATES set to? Which other bucket globals take up significant space?[/QUOTE]

It is this define that is causing the problem.

For a 10K x 20K lattice, it is set at 100 * 6400 * 1024. If I crank it
down, I get core dumps because the bucket code starts to overwrite
locations that it shouldn't.

For a 9K x 18K lattice, it is set at 81 * 6400 * 1024
For 8K x 16K it is set at 64 * 6400 * 1024

If I reduce it, I get core dumps.
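For what it's worth, the three values quoted above fit one simple pattern. A minimal sketch (my own reading of the numbers, not the actual #define logic in the siever):

```python
# Sketch of how MEM_FOR_POINT_UPDATES appears to scale with lattice size,
# inferred from the three values quoted above (NOT the actual code): for a
# WK-thousand x (2*W)K-thousand lattice, the define is W^2 * 6400 * 1024.

def mem_for_point_updates(width_in_thousands):
    """Bucket memory in bytes for a (W)K x (2W)K lattice."""
    return width_in_thousands ** 2 * 6400 * 1024

for wk in (8, 9, 10):
    b = mem_for_point_updates(wk)
    print(f"{wk}K x {2 * wk}K lattice: {wk * wk} * 6400 * 1024 = {b} bytes "
          f"(~{b / 1e6:.0f} MB)")
```

Note that the 10K x 20K value, 100 * 6400 * 1024 = 655,360,000 bytes, is the 655M "memory for buckets" line in the breakdown below.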

Prime95 2010-10-20 21:53

[QUOTE=R.D. Silverman;234027]It is this define that is causing the problem.

If I reduce it, I get core dumps.[/QUOTE]

Oh, we have a bug in the bucket code. The code is supposed to handle the case where all the points don't fit in the allocated memory.

If you can't find and fix the bug, I can look at it next week. I may need you to package up a sample case and the current code. My copy of your code is 5 years old.

R.D. Silverman 2010-10-20 22:28

[QUOTE=R.D. Silverman;234027]It is this define that is causing the problem.

For a 10K x 20K lattice, it is set at 100 * 6400 * 1024. If I crank it
down, I get core dumps because the bucket code starts to overwrite
locations that it shouldn't.

For a 9K x 18K lattice, it is set at 81 * 6400 * 1024
For 8K x 16K it is set at 64 * 6400 * 1024

If I reduce it, I get core dumps.[/QUOTE]

Here is a memory breakdown for a 10K x 20K sieve area (broken into two pieces),
6 million ideals:

factor base primes and their roots (6M x 4 x 2)    48M
storage for primes when resieving                   24M
reduced lattices for ideals (16 bytes/ideal)        96M
logs of primes                                       6M
sieve array                                        105M
memory for buckets                                 655M
storage to record success locations after the
  algebraic sieve (I've seen as many as 45M
  potential successes for small q on a quartic)    180M
  (I can reduce this for larger q's; it could also
  be reduced by sieving the rational side FIRST)
total:                                             1.1G

(plus incidentals)
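As a sanity check, the line items do sum to the stated total (a trivial tally, using the figures exactly as given):

```python
# Tally of the memory breakdown above (values in decimal megabytes, as quoted).
breakdown_mb = {
    "factor base primes and roots (6M x 4 x 2)": 48,
    "storage for primes when resieving":          24,
    "reduced lattices (16 bytes/ideal)":          96,
    "logs of primes":                              6,
    "sieve array":                               105,
    "buckets":                                   655,
    "success locations after algebraic sieve":   180,
}

total = sum(breakdown_mb.values())
print(f"total: {total} MB (~{total / 1000:.1f} GB)")  # prints: total: 1114 MB (~1.1 GB)
```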

I can get rid of the storage of success locations
entirely if I do a time/space tradeoff. After I sieve
the algebraic side, I scan the sieve region, looking
for potential hits. Because algebraic norms are small,
there are a LOT of them. Then, after sieving the rational
side, I scan the algebraic success points only. By scanning
all of the points, I could do away with this array entirely.

For sextics, this array is much much smaller.

R.D. Silverman 2010-10-20 22:43

[QUOTE=Prime95;234032]Oh, we have a bug in the bucket code. The code is supposed to handle the case where all the points don't fit in the allocated memory.

If you can't find and fix the bug, I can look at it next week. I may need you to package up a sample case and the current code. My copy of your code is 5 years old.[/QUOTE]

I've made a fair number of improvements since then as well. I will bundle
up the current code and input data and send it [b]at your convenience.[/b]
Let me know when. I won't bother with the source code for creating the
factor base files. I will just send those as zipped binaries.

I also note the following. Greg did a sample run of the GGNFS siever
on my data. (same factor base sizes, LP bounds, sieve area etc.)
For q near 60M, he got 8.5 relations per q at an average rate of
about .65 seconds per relation. This is about 5.5 seconds per q.

Our siever takes over 6 seconds by itself just to do the bucket sieve
on my fastest machine (3.2 GHz i5). My home PC is an i7, but it is only
2.6 GHz/core. Add the other overhead, and my siever is about 2.2
times slower than GGNFS for this quartic. This is one reason why I
want to try to rework the bucket sieve. I understand why it is slower.
Inherently, the Pollard approach has quite a bit more overhead in setting
up the sieve regions. There are some time/memory tradeoffs that
will save me 10-15%, but I have to bring down the memory footprint
first.

R.D. Silverman 2010-10-20 22:45

[QUOTE=frmky;233805]Yep, that's low. Using
[CODE]n: <the number>
skew: 1.41
c4: 1
c3: -2
c2: -6
c1: 12
c0: -4
Y1: 2^93
Y0: -(2^187+1)
alim: 14500000
rlim: 86000000
lpba: 30
lpbr: 31
mfba: 60
mfbr: 62
alambda: 2.6
rlambda: 2.6
[/CODE]
which I believe matches your parameters, and sieving using 14e (which has a slightly smaller sieve area of 16K x 8K), I sieved the range q = 60M to 60M+500. There were 27 special-q in the range, and it found 229 relations at a rate of 0.64 sec/relation on the slow Opteron (a Core 2 would get under 0.4 sec/relation). This is about 8.5 rels/special-q using a smaller sieve region.[/QUOTE]

I found out why you are getting a slightly greater yield than I am. You
are using a skew of 1.41 (4^(1/4) ~ 1.414), whereas I am using a skew of 1.
The skew of 1.41 is slightly more efficient.

Batalov 2010-10-20 22:56

I have tried to model the initial value for skew by simple LSF (log|c[SUB]i[/SUB]| ~ i), and for this poly it suggests 1.58. But sims show that this has almost no difference vs 4[SUP]1/4[/SUP] (note that this usual rule of thumb is equivalent to LSF with two points c[SUB]0[/SUB] and c[SUB]n[/SUB], disregarding the wobbly coefficients in the middle. For two-term SNFS polys, e.g. 3x[SUP]6[/SUP]+1, of course this is the same).



(By LSF I meant linear regression, i.e. a "least-squares fit"; I was reverse-engineering my thoughts from Russian.)
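For the curious, both estimates are easy to reproduce. A minimal sketch, assuming the balance condition log|c_i| + i*log(s) ~ const (so the fitted slope of log|c_i| against i gives -log s), using the coefficients from frmky's poly quoted earlier:

```python
import math

# Coefficients of frmky's quartic above: c4=1, c3=-2, c2=-6, c1=12, c0=-4.
coeffs = {0: -4, 1: 12, 2: -6, 3: -2, 4: 1}

# Two-point rule of thumb: balance only |c0| and |cn| -> s = (|c0|/|cn|)^(1/n).
n = max(coeffs)
skew_two_point = (abs(coeffs[0]) / abs(coeffs[n])) ** (1.0 / n)

# LSF / linear-regression rule: fit log|c_i| ~ a + b*i; balancing the
# scaled coefficients |c_i| * s^i means the slope should be -log(s),
# so s = exp(-b).
xs = sorted(coeffs)
ys = [math.log(abs(coeffs[i])) for i in xs]
xbar = sum(xs) / len(xs)
ybar = sum(ys) / len(ys)
b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
    sum((x - xbar) ** 2 for x in xs)
skew_lsf = math.exp(-b)

print(f"two-point skew: {skew_two_point:.3f}")  # ~1.414 = 4^(1/4)
print(f"LSF skew:       {skew_lsf:.3f}")        # ~1.58
```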

Prime95 2010-10-20 23:55

[QUOTE=R.D. Silverman;234035]Let me know when. I won't bother with the source code for creating the
factor base files. I will just send those as zipped binaries.[/quote]

You can email the zip files to me at [email]woltman@alum.mit.edu[/email] whenever they are ready. Or, if you'd rather, I can download them from an FTP site or a service like sendspace.

[quote] Add the other overhead, and my siever is about 2.2
times slower than GGNFS for this quartic.[/QUOTE]

So if I understand this correctly, if we reduce the bucket sieve time to zero we are still a tad slower than GGNFS. Does GGNFS use the same sieving strategy (i.e. is our code really *that* much slower) or do we need to make some algorithmic changes too?

R.D. Silverman 2010-10-21 00:30

[QUOTE=Prime95;234040]You can email the zip files to me at [email]woltman@alum.mit.edu[/email] whenever they are ready. Or, if you'd rather, I can download them from an FTP site or a service like sendspace.



So if I understand this correctly, if we reduce the bucket sieve time to zero we are still a tad slower than GGNFS. Does GGNFS use the same sieving strategy (i.e. is our code really *that* much slower) or do we need to make some algorithmic changes too?[/QUOTE]

The Pollard approach has inherently more overhead. I do not know what
GGNFS uses for its sieve strategy. It is certain that it does not sieve over
(e,f) for a prime p such that (componentwise), e.g.,

(0, -M) < e * v1 + f * v2 < (N, M).

Doing this entails computing sieve boundaries, then figuring out how (e,f)
stays inside the skewed parallelogram. GGNFS does not even need to compute
the v1 and v2 vectors for each prime. Just computing the sieve start points
and the reduced lattices takes about 3 seconds for 6 million primes.

They may also have a more efficient technique for scanning the sieve
array after sieving, looking for successful hits.

I know how to reduce some of the overhead. But it requires additional
memory.
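To make the per-prime overhead concrete, a toy sketch of the condition above: for each prime one must find the (e,f) pairs whose image e*v1 + f*v2 lands inside the region. The basis vectors here are made up for illustration, and the brute-force scan stands in for the boundary computation a real siever would do:

```python
# Toy illustration of sieving over (e, f) for one prime: enumerate the
# pairs whose lattice point e*v1 + f*v2 lies (componentwise) inside
# 0 <= x < N, -M < y < M.  The vectors below are invented for illustration;
# a real siever computes the sieve boundaries rather than scanning.

def hits_in_region(v1, v2, N, M, e_range, f_range):
    hits = []
    for e in range(*e_range):
        for f in range(*f_range):
            x = e * v1[0] + f * v2[0]
            y = e * v1[1] + f * v2[1]
            if 0 <= x < N and -M < y < M:
                hits.append((e, f))
    return hits

# Hypothetical reduced basis for some prime p:
v1, v2 = (5, 2), (-1, 7)
pts = hits_in_region(v1, v2, N=40, M=30, e_range=(-10, 11), f_range=(-10, 11))
print(len(pts), "hits, e.g.", pts[:3])
```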

jasonp 2010-10-21 00:55

[QUOTE=R.D. Silverman;233984]I don't see how it can do that. Suppose for a given special q, we are
sieving one ideal with norm p. The hits that p takes will be all over the
(c,d) plane. How can one restrict to some subset of the plane?
[/QUOTE]
You could add a next pointer to each factor base entry, 'sort' them in place, and then proceed region by region. Each region picks the head of a linked list of updates, then proceeds FB entry by FB entry and fills in a narrow rectangle of (c,d) points, then each entry computes its 'next' (c,d) point (say in lexicographic order) and gets prepended to the bucket containing the next point. You get cache misses traversing the linked list but not for any other reason, and you can at least use prefetches for those. Msieve's line sieve works that way, but it only has to deal with one line at a time.
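A minimal sketch of the linked-list scheme described above, reduced to one dimension for clarity (array indices stand in for the next pointers, and a 1-D sieve with stride p stands in for the (c,d) geometry). Each region holds the head of a list of factor-base entries; an entry stamps its hits inside the region, then advances to its next hit and is prepended to that hit's bucket:

```python
# Sketch of the linked-list bucket scheme described above, in 1-D: each
# factor-base entry (prime p, next hit position) lives in the bucket
# (region) containing its next hit; regions are processed in order, and
# after an entry finishes a region it is prepended to its next hit's bucket.

SIEVE_LEN, REGION = 64, 16                  # tiny sizes for illustration
fb = [(3, 0), (5, 2), (7, 4)]               # (prime, first hit position)

n_regions = SIEVE_LEN // REGION
heads = [-1] * n_regions                    # bucket head per region
nxt = [-1] * len(fb)                        # 'next' pointer per FB entry
pos = [r for (_, r) in fb]                  # current hit per FB entry

for i, (p, r) in enumerate(fb):             # prepend each entry to the
    b = r // REGION                         # bucket of its first hit
    nxt[i], heads[b] = heads[b], i

sieve = [0] * SIEVE_LEN
for region in range(n_regions):
    end = (region + 1) * REGION
    i, heads[region] = heads[region], -1
    while i != -1:                          # walk this region's list
        j = nxt[i]                          # save next before relinking
        p = fb[i][0]
        while pos[i] < end:                 # stamp hits inside the region
            sieve[pos[i]] += 1
            pos[i] += p
        if pos[i] < SIEVE_LEN:              # prepend to next hit's bucket
            b = pos[i] // REGION
            nxt[i], heads[b] = heads[b], i
        i = j

print(sum(sieve), "updates applied")        # prints: 44 updates applied
```

The cache misses Bob mentions come from walking each region's list and from the prepend into a (possibly distant) later bucket; everything else touches only the current region.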

Not to say that that's what the KF siever does; the only description of that code I could find was 'Continued Fractions and Lattice Sieving', a short paper by Franke and Kleinjung.

R.D. Silverman 2010-10-21 07:09

[QUOTE=jasonp;234043]You could add a next pointer to each factor base entry, 'sort' them in place,
[/QUOTE]

You probably don't want to sort in place. Do an index sort; it involves
less data movement.

[QUOTE]

and then proceed region by region. Each region picks the head of a linked list of updates, then proceeds FB entry by FB entry and fills in a narrow rectangle of (c,d) points, then each entry computes its 'next' (c,d) point (say in lexicographic order) and gets prepended to the bucket containing the next point. You get cache misses traversing the linked list but not for any other reason,

[/QUOTE]

Indeed. It is these misses that concern me. And the 'prepending'
may cause misses as well.

[QUOTE]

Not to say that that's what the KF siever does; the only description of that code I could find was 'Continued Fractions and Lattice Sieving', a short paper by Franke and Kleinjung.[/QUOTE]

I was asked to referee that paper... I wrote that it left out too much
material, did not adequately explain many things, made some unsubstantiated
claims, etc., and that it needed a MAJOR overhaul before publishing.

It was first published in a venue known for slack editing. My suggestions
were basically ignored and it was published essentially as is. I complained
to the editors. I got ignored.

R.D. Silverman 2010-10-21 14:30

[QUOTE=Prime95;234040]You can email the zip files to me at [email]woltman@alum.mit.edu[/email] whenever they are ready. Or, if you'd rather, I can download them from an FTP site or a service like sendspace.



So if I understand this correctly, if we reduce the bucket sieve time to zero we are still a tad slower than GGNFS. Does GGNFS use the same sieving strategy (i.e. is our code really *that* much slower) or do we need to make some algorithmic changes too?[/QUOTE]

I have mailed the code along with a README file that contains a lot
of info.

GGNFS does indeed use a totally different sieving strategy.


Bob

