Thread: Counting duplicates

2009-08-23, 21:29  #1
fivemack (loop (#_fork))
Feb 2006
Cambridge, England
1100100100001₂ Posts

Counting duplicates

I'm wondering what can sensibly be done to predict the number of duplicate relations that a large sieve run will produce.

My initial attempt ignores the fact that lattices are involved, and makes the (false) assumption that sieving with some special-Q gives all the relations from some Platonic fixed set of relations which have that Q dividing the polynomial value. In that case, to estimate the number of unique relations you'd get sieving from Q0 to Q1, sieve short intervals at q=Q0, q=(Q0+Q1)/2 and q=Q1-epsilon, measuring sieving time, and then either

* count only relations containing no factor between q+1 and Q1, or
* count relations with one factor between Q0 and Q1 as having weight 1, relations with two such factors as weight 1/2, with three as weight 1/3, and so on.

This gets you a unique-yield-per-q figure at three points, which you integrate by the trapezium rule; also integrate the raw-yield-per-q and the sieve-time-per-q that you have measured, because that's easy, and the sieve time gives some idea how long the full run will take.

Measuring the timings means this does account for large Q taking longer than small Q. On the other hand, a large Q does sample a significantly larger space: I sieved ranges R1=25M..25M+10k and R2=75M..75M+10k; there was one relation Z in R1 with a factor in R2, and five relations in R2 with a factor in R1 (obviously including the relation Z). So this technique gives an underestimate of the number of unique relations, which isn't terribly helpful.

I have enormous amounts of data from many large sieving jobs to play with; what should I do next?
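For concreteness, the second counting rule above might be sketched like this (a hypothetical helper, assuming a relation is represented simply as the list of its prime factors; the function name and representation are mine, not from any siever):

```python
def unique_weight(factors, Q0, Q1):
    """Weight a relation by 1/k, where k is the number of its prime
    factors in the special-Q range [Q0, Q1].  A relation with k such
    factors is found at k different special-Q values, so weighting each
    sighting 1/k makes it contribute 1 unique relation in total.

    A relation found while sieving q in [Q0, Q1] always has q itself as
    a factor, so k >= 1 in practice; the guard is just for safety."""
    k = sum(1 for p in factors if Q0 <= p <= Q1)
    return 1.0 / k if k else 1.0
```

For example, a relation with large primes 30000001 and 80000011 sieved over Q0=25M..Q1=100M would be seen at both special-Q values and so gets weight 1/2 at each.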
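The trapezium-rule step over the three sample points could be as simple as (again only a sketch; the sample values in the usage note are invented, not measured):

```python
def trapezium(qs, ys):
    """Integrate per-q samples ys taken at special-Q values qs by the
    trapezium rule, estimating the total over [qs[0], qs[-1]].
    With three samples at Q0, (Q0+Q1)/2 and Q1-epsilon this is the
    three-point estimate described above; the same routine serves for
    unique yield, raw yield and sieve time per q."""
    return sum((qs[i + 1] - qs[i]) * (ys[i] + ys[i + 1]) / 2.0
               for i in range(len(qs) - 1))
```

So e.g. `trapezium([25e6, 62.5e6, 100e6], [y0, y1, y2])`, where the `y` values are unique relations per unit q measured in the three short test intervals, gives the predicted unique-relation total for the whole run.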