Quote:
Originally Posted by henryzz
@CraigLo How are you working out your probabilities? The obvious calculation is to sum the probabilities for each start point x < 0 that a gap >=xx merit starts there. However, these probabilities don't strike me as independent so summing them wouldn't be correct.
I assume that this calculation is always going to be too slow to optimize based on this.

1. Calculate the probability that a gap >= G starts at the first point < 0.
2. Calculate the probability that a gap >= G starts at the second point < 0. A gap starting here will also include the first point so add this to the probability for the first point as well.
3. Calculate the probability that a gap >= G starts at the third point < 0. A gap starting here will also include the first and second points so add this to the probability for the first two as well.
4. Continue this for all points < 0.
When you are finished we have the probability that a point is included in a gap >= G. Every gap will include the first point. A gap will include a point at G/2 only about half the time so sieving a point at G/2 is only half as helpful as sieving a point at 0. A gap will only include a point at 3G/4 about 1/10th the time so sieving a point here is only 1/10th as helpful as sieving a point at 0.
The probabilities calculated above will change depending on the offset chosen but I think the changes will be small enough that they can be ignored. With this assumption the probabilities can be precomputed and the optimization problem is similar to the one you are already doing.
You are currently subtracting 1 for each element sieved in the interval G/2 to G/2 and 0 outside this region and minimizing the result.
I am suggesting we subtract the precomputed probability for each element sieved over the interval G to G and minimize this.