To summarize, the algorithm I use is:
If an exponent has had 1 LL test done:
 We can't draw any conclusions.
If an exponent has had (N > 1) LL tests done, with a match:
 We know exactly how many of the N tests are good and how many are bad
 (the tests that match are good; the rest are bad).
If an exponent has had (N > 1) LL tests done, with no match:
 We know that at least N-1 of the tests are bad.
 Assume N-1 bad and 1 good, because that's much more likely than all N bad.
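
The rules above can be sketched in a few lines of Python. This is only an illustration, not the code actually used: the function name classify_tests and the representation of each test as a residue string are assumptions made for the example. It also assumes that at most one residue value ever appears more than once for a given exponent.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def classify_tests(residues: Sequence[str]) -> Optional[Tuple[int, int]]:
    """Return (good, bad) test counts for one exponent, or None if unknown.

    `residues` holds one final residue per LL test (hypothetical input
    format, chosen for this sketch).
    """
    n = len(residues)
    if n <= 1:
        # A single test (or none): no conclusion can be drawn.
        return None

    # Find the most frequent residue and how often it occurs.
    _, top_count = Counter(residues).most_common(1)[0]

    if top_count > 1:
        # A match exists: the matching tests are good, the rest are bad.
        return top_count, n - top_count

    # No match: all but one test must be bad; assume exactly one is good.
    return 1, n - 1
```

For example, classify_tests(["A", "B", "A"]) counts two good tests and one bad, while classify_tests(["A", "B"]) falls into the no-match case and counts one good and one bad.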
