Counting duplicates
I'm wondering what can sensibly be done to predict the number of duplicate relations that a large sieve run will produce.
My initial attempt ignores the fact that lattices are involved, and makes the false assumption that sieving with some special-Q gives all the relations, from some Platonic fixed set of relations, which have that Q dividing the polynomial value.
In that case, to estimate the number of unique relations you'd get sieving from Q0 to Q1: sieve short intervals at q = Q0, q = (Q0+Q1)/2 and q = Q1 - epsilon, measuring sieving time, and then either
* count only relations containing no factor between q+1 and Q1.
* count relations with one factor between Q0 and Q1 as having weight 1, relations with two such factors as weight 1/2, with three as weight 1/3, and so on.
This gets you a unique-yield-per-q-range figure at three points, which you integrate by the trapezium rule; also integrate the raw-yield-per-q and the sieve-time-per-q that you have measured, because that's easy, and the sieve time gives some idea of how long the full run will take.
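A minimal sketch of the second weighting scheme and the trapezium-rule estimate, assuming relations are given as lists of prime factors; the sample q-points and yield figures below are invented purely for illustration:

```python
def unique_weight(relation_factors, q0, q1):
    """Weight a relation by 1/k, where k is the number of its factors
    lying in the special-Q range [q0, q1]; a relation with k such
    factors will be found k times over the full run."""
    k = sum(1 for p in relation_factors if q0 <= p <= q1)
    return 1.0 / k if k else 0.0

def trapezium(xs, ys):
    """Integrate y over x by the trapezium rule."""
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2
               for i in range(len(xs) - 1))

# Three sample points q = Q0, (Q0+Q1)/2, Q1, with made-up
# unique-yield-per-q figures measured at each.
Q0, Q1 = 25_000_000, 75_000_000
qs = [Q0, (Q0 + Q1) // 2, Q1]
yield_per_q = [0.85, 0.60, 0.45]

estimated_unique = trapezium(qs, yield_per_q)
```

The same `trapezium` call applied to the measured raw-yield and sieve-time samples gives the other two integrals.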
Measuring the timings means this does account for large Q taking longer than small Q; on the other hand, a large Q samples a significantly larger space. I sieved the ranges R1 = 25M..25M+10k and R2 = 75M..75M+10k: there was one relation Z in R1 with a factor in R2, and five relations in R2 with a factor in R1 (obviously including relation Z). So this technique gives an underestimate of the number of unique relations, which isn't terribly helpful.
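The asymmetry above can be measured directly from two sampled ranges; a small sketch, assuming the same invented relations-as-factor-lists representation (the helper name is mine, not from any siever):

```python
def cross_range_count(relations, other_lo, other_hi):
    """Count relations with at least one factor in another special-Q
    range; these are candidates for rediscovery when that range is
    sieved, i.e. potential duplicates."""
    return sum(1 for factors in relations
               if any(other_lo <= p <= other_hi for p in factors))

# Toy data: two relations from the low range, one with a factor
# falling in the high range.
r1_relations = [[26_000_011, 75_000_017], [26_000_023, 40_000_003]]
low, high = (25_000_000, 25_010_000), (75_000_000, 75_010_000)
dupes = cross_range_count(r1_relations, *high)
```

Running this in both directions over many range pairs would quantify how lopsided the duplicate flow is as Q grows.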
I have enormous amounts of data from many large sieving jobs to play with; what should I do next?
