Counting duplicates
fivemack, 2009-08-23, 21:29

I'm wondering what can sensibly be done to predict the number of duplicate relations that a large sieve run will produce.

My initial attempt ignores the fact that lattices are involved, and makes the false assumption that sieving with a given special-Q yields every relation, from some Platonic fixed set of relations, that has that Q dividing the polynomial value.

Under that assumption, to estimate the number of unique relations you'd get sieving from Q0 to Q1, sieve short test intervals at q=Q0, q=(Q0+Q1)/2, q=Q1-epsilon, measuring the sieving time, and then either (both counting rules are sketched in code after the list):

* count only relations containing no factor between q+1 and Q1.
* count relations with one factor between Q0 and Q1 as having weight 1, relations with two factors as weight 1/2, with three as weight 1/3 ...
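
A minimal sketch of the two counting rules in Python; the relation representation here (a list of large prime factors together with the special-q the relation was found at) is my assumption, not any particular siever's output format:

```python
def weight_strict(factors, q, Q1):
    """Rule 1: count the relation only if no factor lies in (q, Q1]."""
    return 0.0 if any(q < p <= Q1 for p in factors) else 1.0

def weight_fractional(factors, Q0, Q1):
    """Rule 2: weight 1/k when the relation has k factors in [Q0, Q1]."""
    k = sum(1 for p in factors if Q0 <= p <= Q1)
    return 1.0 / k if k else 1.0   # k == 0 shouldn't occur when q itself is in [Q0, Q1]
```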

This gets you a unique-yield-per-q figure at three points, which you integrate by the trapezium rule; also integrate the raw yield per q and the sieve time per q that you have measured, because that's easy, and the sieve time gives some idea of how long the full run will take.
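
The integration step is just the usual trapezium formula over the three sampled rates; a hedged sketch, where Q0 = 25M, Q1 = 75M and all the measured rates are made-up numbers for illustration only:

```python
def trapezium(xs, ys):
    """Trapezium-rule integral of the sampled rate ys over the points xs (ascending)."""
    return sum((ys[i] + ys[i + 1]) / 2.0 * (xs[i + 1] - xs[i])
               for i in range(len(xs) - 1))

# Rates measured in the short test intervals at q = Q0, (Q0+Q1)/2, Q1-epsilon.
qs = [25e6, 50e6, 75e6]
unique_yield_per_q = [2.1, 1.6, 1.2]   # weighted relations per unit of q (made up)
raw_yield_per_q = [2.4, 2.0, 1.8]      # all relations per unit of q (made up)
seconds_per_q = [0.8, 1.1, 1.5]        # sieve time per unit of q (made up)

print("estimated unique relations:", trapezium(qs, unique_yield_per_q))
print("estimated raw relations:   ", trapezium(qs, raw_yield_per_q))
print("estimated sieve seconds:   ", trapezium(qs, seconds_per_q))
```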

Measuring the timings means this does account for large Q taking longer than small Q; on the other hand, a large Q samples a significantly larger space. I sieved the ranges R1 = 25M..25M+10k and R2 = 75M..75M+10k: there was one relation Z in R1 with a factor in R2, and five relations in R2 with a factor in R1 (obviously including the relation Z). So this technique gives an underestimate of the number of unique relations, which isn't terribly helpful.
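
That asymmetry is straightforward to measure on the test data: for each relation found in one range, check whether any of its other factors lands in the other range. A hedged sketch, again assuming relations are available as (special-q, factor-list) pairs; parsing them out of the relation files is siever-specific and omitted:

```python
def cross_hits(relations, lo, hi):
    """Count relations whose factor list contains a prime in [lo, hi]
    other than the special-q they were found at."""
    return sum(1 for special_q, factors in relations
               if any(lo <= p <= hi and p != special_q for p in factors))

# rels_R1, rels_R2: (special_q, factor_list) pairs from the two 10k test ranges.
# cross_hits(rels_R1, 75_000_000, 75_010_000)  # -> 1 in the run described above
# cross_hits(rels_R2, 25_000_000, 25_010_000)  # -> 5 in the run described above
```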

I have enormous amounts of data from many large sieving jobs to play with; what should I do next?