Sieving discussion thread
Post all discussions about sieving here, in order to avoid confusion.
I've removed all recent posts dealing with sieving, and I put them here.
by biwema (4/16)
Maybe it is better not to go into the top 5000: the list would be flooded, and the 250000-bit candidates will fall out of the list soon anyway.
Some data (measured on a P4 at 3.4 GHz):
size (bits)   test time   twins in 100G   CPU time for 1 million candidates
180000        108 s       8.6             3.4 y
200000        133 s       7               4.2 y
250000        205 s       4.4             6.5 y
300000        326 s       3.1             10.3 y
400000        586 s       1.7             18.6 y
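The last column appears to be the per-candidate test time multiplied by one million candidates and converted to years. A minimal check, assuming a 365.25-day year (the names below are illustrative, not from any tool):

```python
# Per-candidate test times (seconds) from the table above, keyed by size in bits.
TEST_TIME_S = {180000: 108, 200000: 133, 250000: 205, 300000: 326, 400000: 586}
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: Julian year

def cpu_years(test_time_s, candidates=1_000_000):
    """CPU time, in years, to test `candidates` numbers at `test_time_s` seconds each."""
    return test_time_s * candidates / SECONDS_PER_YEAR

for bits, t in TEST_TIME_S.items():
    print(f"{bits:>7} bits: {cpu_years(t):.1f} years")
```

Each printed value rounds to the figure in the table, so the column needs no extra assumption beyond one test per candidate.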
Test factoring (also on the P4 at 3.4 GHz; an Athlon would be faster):
250000 bits, k-range of 10G (larger ranges do not take much longer).
Sieve limit 10^12: 5,449,847 candidates left (fits the estimate above).
80 million/s at p = 1T; 33 candidates removed per second.
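The 33 removals per second is close to what a simple heuristic predicts. Near p, about one integer in ln p is prime, and each sieving prime p knocks out the k in two residue classes (those with k·2^n ≡ +1 or −1 mod p), i.e. roughly 2N/p of the N survivors. A rough sketch, assuming "80 million/s" means 80 million integers of the p-range scanned per second:

```python
import math

def removal_rate(survivors, p, p_range_per_s):
    """Heuristic number of candidates removed per second when sieving near p.

    Assumes the sieve advances through `p_range_per_s` integers per second,
    about 1/ln(p) of which are prime, and that each prime p eliminates about
    2/p of the surviving candidates (one residue class of k for each of -1, +1).
    """
    primes_per_s = p_range_per_s / math.log(p)
    return primes_per_s * 2 * survivors / p

print(f"{removal_rate(5_449_847, 1e12, 80e6):.1f} removals/s")
```

This gives a little over 31 per second, in line with the measured 33.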
Candidates removed in a 100G range:
factor range        removed   time
46.66-50 bits       5.2M      0.36 years
50-53.33 bits       4.2M      3.6 years    (optimal limit for 100kbit candidates)
53.33-56.66 bits    3.5M      36 years     (optimal limit for 300kbit candidates)
56.66-60 bits       3.9M      360 years    (optimal limit for 1Mbit candidates)
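The bit boundaries correspond to powers of ten via 2^10 ≈ 10^3 (10^e ≈ 2^(10e/3), so 46.66 bits ≈ 10^14), which means each band spans a factor of ten in p. A sketch that rebuilds both columns under stated assumptions: the sieve keeps its 80 million/s throughput, the 100G range starts with 10 × 5,449,847 survivors at 10^12, and survivors of a twin sieve shrink like 1/(ln p)^2 (a Mertens-style heuristic, not something NewPGen reports):

```python
import math

P_RANGE_PER_S = 80e6             # assumption: throughput stays at 80 million/s
SECONDS_PER_YEAR = 365.25 * 24 * 3600
N0, P0 = 10 * 5_449_847, 1e12    # survivors in a 100G k-range after sieving to 10^12

def survivors(p):
    """Heuristic survivor count after sieving to p: N(p) ~ N0 * (ln P0 / ln p)^2."""
    return N0 * (math.log(P0) / math.log(p)) ** 2

for e in range(14, 18):          # bands 10^14-10^15, ..., 10^17-10^18
    lo, hi = 10.0 ** e, 10.0 ** (e + 1)
    removed = survivors(lo) - survivors(hi)
    years = (hi - lo) / P_RANGE_PER_S / SECONDS_PER_YEAR
    print(f"{e * 10 / 3:.2f}-{(e + 1) * 10 / 3:.2f} bits: "
          f"{removed / 1e6:.1f}M removed, {years:.2f} years")
```

The first three rows match the table; for the last band the heuristic gives about 2.9M rather than 3.9M, so that figure may deserve a second look.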
I recommend sieving up to 53 or 54 bits (10Q), assuming you choose a range of about 25G candidates at 180000 or 200000 bits (25G contain about 2 twins).
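The two numbers in this recommendation can be checked directly: 10Q = 10^16 is about 2^53.15, and scaling the "twins in 100G" column linearly to 25G gives roughly 2 twins. A small sketch (the helper name is hypothetical):

```python
import math

# Twins per 100G of k, from the test-time table, keyed by size in bits.
TWINS_PER_100G = {180000: 8.6, 200000: 7.0}

def expected_twins(bits, k_range_g):
    """Expected twins in a k-range of `k_range_g` G, scaled linearly from the table."""
    return TWINS_PER_100G[bits] * k_range_g / 100

print(f"10Q = 10^16 ~ 2^{math.log2(10**16):.2f}")
for bits in TWINS_PER_100G:
    print(f"{bits} bits, 25G: about {expected_twins(bits, 25):.2f} twins")
```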
_____________________________________________________________
by gribozavr (4/30)
Just an update on sieving progress:
n=195000, kmin=1e8, kmax=5e9, odd k only. I'm now at p=7.0 trillion with 2,294,824 k's left; the sieve removes one k every 1.4 seconds.
____________________________________________________________
by davar55 (5/4)
To Moooomoo: who will do the double-checking of the smaller primes?
The sieving algorithm requires these to be constantly rechecked.
____________________________________________________________
by gribozavr (5/4)
Please explain: what do you mean by "smaller primes"?
____________________________________________________________
by davar55 (5/4)
M1 thru M1000
____________________________________________________________
by gribozavr (5/4)
I don't think we will ever double-check everything. Maybe at some point in the future, when we have many participants, we will check just, say, 5 random numbers from each "chunk" of the presieved ranges. If one or more residues do not match, the whole range will be released again for double-checking.
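A minimal sketch of that spot-check idea, with hypothetical function names and a hypothetical residue format: pick 5 random candidates from a finished chunk, retest them, and release the chunk if any residue differs. It also computes how likely 5 samples are to catch a chunk in which a fraction f of the results is wrong, 1 − (1 − f)^5:

```python
import random

def pick_spot_checks(chunk, samples=5, rng=random):
    """Choose `samples` distinct candidates (a list of k values) for retesting."""
    return rng.sample(chunk, min(samples, len(chunk)))

def chunk_needs_recheck(reported, recomputed):
    """True if any recomputed residue disagrees with the one originally reported.

    Both arguments map candidate k -> residue (hypothetical format).
    """
    return any(reported[k] != recomputed[k] for k in recomputed)

def detection_probability(bad_fraction, samples=5):
    """Chance that at least one of `samples` random picks hits a bad result."""
    return 1 - (1 - bad_fraction) ** samples

for f in (0.01, 0.1, 0.5, 1.0):
    print(f"{f:.0%} bad -> caught with probability {detection_probability(f):.2f}")
```

Five samples reliably catch a machine that returns mostly wrong results, but an isolated error in a large chunk will usually slip through.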
___________________________________________________________
by davar55 (5/4)
The point of sieving is to do multiple tasks at the same time.
Each higher level must recheck all lower levels first.
____________________________________________________________
by gribozavr (5/4)
I can't really understand what you are talking about. I'm sieving on a Prime Stable computer with NewPGen, a program proven BugFree (TM) by years of experience. I'm 99.999% sure that the sieve hasn't removed a single number on the strength of a false factor.
___________________________________________________________
by biwema (5/5)
NewPGen is safe in that it does not wrongly remove twin candidates, and it has an option to verify all factors in (almost) zero time.
PRP or LLR tests could miss twins because of hardware failure. Nevertheless, double-checking makes no sense. We don't need to find *all* twins in a range, so it is more effective to test a new range than to double-check an old one. Each candidate takes only a short time to test, so a hardware error affects only a very small fraction of the candidates (unlike Mersenne numbers, where a single fault during a month-long test can ruin the whole result).
Probability of finding a twin in a range:
My calculations for n=195000 give:
k-range size   chance of finding a twin
5G             31%
10G            52%
20G            77%
25G            84%
30G            89%
40G            95%
50G            97.5%
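These figures fit a Poisson model: if twins occur at an average rate λ per G of k, the chance of at least one twin in a range of size K is 1 − e^(−λK). A rate of about 0.0742 per G (roughly 7.4 twins per 100G, between the 180000-bit and 200000-bit rows of the earlier table) reproduces every row to within half a percentage point. That λ is back-fitted here from the 5G row; it is an assumption, not necessarily the value used in the original calculation:

```python
import math

TWINS_PER_G = 0.0742   # assumption: fitted so that a 5G range gives ~31%

def chance_of_twin(k_range_g, rate=TWINS_PER_G):
    """Poisson probability of at least one twin in a k-range of `k_range_g` G."""
    return 1 - math.exp(-rate * k_range_g)

for size in (5, 10, 20, 25, 30, 40, 50):
    print(f"{size}G: {chance_of_twin(size):.1%}")
```

Because the model is memoryless, the chance grows with diminishing returns: doubling from 25G to 50G raises it from about 84% to about 97.5%.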
gribozavr, maybe it makes sense to start a new range from k=5G to k=25G and merge it with the previous one once it reaches the same sieve depth.
This only makes sense if the project does not jump to a new exponent before reaching 5G.
