Sieving discussion thread
100M-150M reserved (I'm just going to sieve this)
|
jasong, I'm already sieving 100M-5G. I've reached a point where I can release 100M-150M for LLR; I'm just waiting for MooooMoo to become a moderator and set up a thread like "Team drives" in Riesel Prime Search.
|
[QUOTE=gribozavr]100,000,000 - 5,000,000,000
The second is way too large ;) Will we ever have enough resources for LLR'ing it?[/QUOTE] lol, I managed to sieve my range to 1 billion, but the second step required so much memory that the computer started thrashing (did I say that correctly? no matter). I'm going to try again by doing it in two batches. As to it being too big, I'm doing it for fun and have no idea what I've gotten myself into in terms of whether or not I can accomplish the task. It'll be fun to attempt, though. (If it takes more than two weeks, I'll try to find a way to send it back. That will be an interesting problem, as the file at 1 billion p is more than a gig.) |
[QUOTE=jasong]lol, I managed to sieve my range to 1 billion, but the second step required so much memory that the computer started thrashing (did I say that correctly? no matter).
I'm going to try again by doing it in two batches. As to it being too big, I'm doing it for fun and have no idea what I've gotten myself into in terms of whether or not I can accomplish the task. It'll be fun to attempt, though. (If it takes more than two weeks, I'll try to find a way to send it back. That will be an interesting problem, as the file at 1 billion p is more than a gig.)[/QUOTE] Remember to choose the "twin search" on NewPGen, and do not include even k values. For a range of 5G, you should have less than 3 million candidates after sieving to 1 billion |
[QUOTE=MooooMoo]Remember to choose the "twin search" on NewPGen, and do not include even k values. For a range of 5G, you should have less than 3 million candidates after sieving to 1 billion[/QUOTE]
I seem to be making errors left and right, but since there are plenty of other k's for people to sieve in the interim, I'm going to try one last time. Btw, I'm unreserving 125 billion to 150 billion so I can concentrate on 100 billion to 125 billion. |
My range is too big to edit as a text file. I'm considering asking the dude who does prothsieve.com to help me deal with it. He has a huge file too, but he developed a reservation method that would let you download certain ranges.
Btw, if anyone wants to start a website for this, I'd be willing to donate $5 to help pay for it. |
[QUOTE=jasong]My range is too big to edit as a text file. I'm considering asking the dude who does prothsieve.com to help me deal with it. He has a huge file too, but he developed a reservation method that would let you download certain ranges.
Btw, if anyone wants to start a website for this, I'd be willing to donate $5 to help pay for it.[/QUOTE] Pacionet is working on a website (see the "twin prime search?" thread). I don't know what you're talking about for 100 billion to 125 billion though:
- Everything above 100 million (not billion) is available for LLR
- Everything above 5 billion is available for sieving
- We expect to find a twin before 20 billion |
[QUOTE=MooooMoo]Pacionet is working on a website (see "twin prime search?" thread). I don't know what you're talking about for 100 billion to 125 billion though:
- Everything above 100 million (not billion) is available for LLR
- Everything above 5 billion is available for sieving
- We expect to find a twin before 20 billion[/QUOTE] I think we're dealing with two different definitions of ranges. In one thread, 100M-125M actually means 100 billion to 125 billion. I think in this thread 100M-125M means exactly what it says. I don't like wasting time, so I'm going to do some Riesel Sieve LLRing until we get this straightened out. In the meantime, I have a 1.75GHz 32-bit Sempron, and would appreciate it if someone would assign me a range that takes about 2 weeks to get down to 2 minutes an n. Thanks in advance. |
I use SI prefixes and advise everyone to do so:
1M = 1e6
1G = 1e9
1T = 1e12
Billion is a bit fuzzy: it is defined as 1e9 on the short scale, while it is equal to 1e12 on the long scale. That's why I prefer not to use this word. Links: [url]http://en.wikipedia.org/wiki/SI_prefix[/url] [url]http://en.wikipedia.org/wiki/Long_and_short_scales[/url] |
pacionet,
It is 100,000,000-104,999,999. |
Sieving discussion thread
[b]Post all discussions about sieving here, in order to avoid confusion.[/b]
I've removed all recent posts dealing with sieving, and I put them here.

by biwema (4/16)
Maybe it is better not to go into the top 5000. The list would be flooded, and the 250000-bit candidates will fall out of the list soon anyway.

Some data (based on a P4, 3.4 GHz):

size (bits)   test time   twins in 100G   CPU time of 1 million candidates
180000        108s        8.6             3.4y
200000        133s        7               4.2y
250000        205s        4.4             6.5y
300000        326s        3.1             10.3y
400000        586s        1.7             18.6y

Test factoring: also P4, 3.4 GHz (an Athlon would be faster), 250000 bits, range 10G (larger ranges do not take much longer). Limit 10^12; 5449847 candidates left (fits the estimate above). 80 million / s at 1T; 33 candidates removed per second.

Candidates removed in a 100G range:
                    removed   factoring time
46.66-50 bits:      5.2M      0.36 years
50-53.33 bits:      4.2M      3.6 years    (optimal limit for 100-kbit candidates)
53.33-56.66 bits:   3.5M      36 years     (optimal limit for 300-kbit candidates)
56.66-60 bits:      3.9M      360 years    (optimal limit for 1-Mbit candidates)

I recommend sieving up to 53 or 54 bits (10Q), assuming you choose a range of about 25G of candidates at 180000 or 200000 bits (25G contains about 2 twins).
_____________________________________________________________
by gribozavr (4/30)
Just an update on sieving progress: n=195000, kmin=1e8, kmax=5e9, without even k's. Now I'm at p=7.0 trillion, 2,294,824 k's left, sieving rate is 1 k every 1.4 sec.
____________________________________________________________
by davar55 (5/4)
To MooooMoo: Who will do the double checking of the smaller primes? The sieving algorithm requires these to be constantly rechecked.
____________________________________________________________
by gribozavr (5/4)
Please explain, what do you mean by "smaller primes"?
____________________________________________________________
by davar55 (5/4)
M1 thru M1000
____________________________________________________________
by gribozavr (5/4)
I don't think we will ever be doublechecking everything. Maybe, at some point in the future, when we have gathered many participants, we will check just, say, 5 random numbers from a "chunk" in presieved ranges. If one or more residues do not match, the whole range will be released once more for doublechecking.
____________________________________________________________
by davar55 (5/4)
The point of sieving is to do multiple tasks at the same time. Each higher level must re-check all lower levels first.
____________________________________________________________
by gribozavr (5/4)
Can't really understand what you are talking about. I'm sieving on a Prime Stable computer with NewPGen -- a program which is proven BugFree (TM) by years of experience. I'm 99.999% sure that sieving hasn't removed even a single number by finding a false factor.
____________________________________________________________
by biwema (5/5)
NewPGen is safe in that it does not remove twin candidates. There is an option to verify all factors in (almost) zero time. PRP and LLR could miss twins due to hardware failure. Nevertheless it makes no sense to doublecheck. We don't need to find *all* twins in a range, hence it is more effective to check a new range instead of doublechecking. The calculation time of a candidate is short, so in case of a hardware error only a very small fraction of candidates is faulty (unlike Mersenne numbers, where one fault in one month could destroy the test).

Probability of finding a twin in a range: my calculations for n=195000 give...

range k=5 G    chance of finding a twin: 31%
range k=10 G   chance of finding a twin: 52%
range k=20 G   chance of finding a twin: 77%
range k=25 G   chance of finding a twin: 84%
range k=30 G   chance of finding a twin: 89%
range k=40 G   chance of finding a twin: 95%
range k=50 G   chance of finding a twin: 97.5%

gribozavr, maybe it makes sense to start a new range of 5G to 25G and merge it into the previous one when it reaches the same level. This only makes sense if the project does not jump to a new exponent before reaching 5G. |
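A quick PARI/GP sketch (added here, not from the original posts) that reproduces the last two columns of biwema's table: the CPU-time column is just the test time times 10^6 seconds expressed in years, and the twin count per 100G scales roughly as 1/(size in bits)^2. The timings themselves are the quoted P4 figures, not mine.
[CODE]
\\ sketch: reproduce two columns of biwema's table from the quoted P4 timings
{
  sizes = [180000, 200000, 250000, 300000, 400000];
  times = [108, 133, 205, 326, 586];               \\ seconds per LLR test (quoted above)
  for(i = 1, 5,
    twins    = 8.6 * (180000/sizes[i])^2;          \\ twins per 100G, scaled from the first row
    cpuyears = times[i] * 10^6 / (365.25*86400);   \\ CPU time of 1 million candidates, in years
    printf("%d bits: %.1f twins/100G, %.1f CPU years per 1M candidates\n",
           sizes[i], twins, cpuyears));
}
[/CODE]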
biwema,
I'm now at p=20T. I'll start another sieving of [5e9; 25e9] and when it reaches 20T, I will merge them. I will start right now, so there is not much work to catch up on. |
I have merged [150e6; 5e9] and [5e9; 25e9] this morning. Now the whole range [150e6; 25e9] is at p=21.4T, 10,947,000 k's left, rate is about 1 k/sec.
|
[QUOTE=gribozavr]I have merged [150e6; 5e9] and [5e9; 25e9] this morning. Now the whole range [150e6; 25e9] is at p=21.4T, 10,947,000 k's left, rate is about 1 k/sec.[/QUOTE]
Excellent progress! |
I have released [150e6; 200e6] for LLR at p=30.6T. Now the range [200e6; 25e9] is at p=30.6T, 10,667,000 k's left, rate is ~1.4-1.6 k/sec.
MooooMoo's edit: The ranges are now in the "pre sieved range reservation thread". |
Now the range [250e6; 25e9] is at p=51.7T, 10,334,553 k's left, rate is ~2.5-2.7 k/sec.
|
Now the range [250e6; 25e9] is at p=61.0T, 10,226,786 k's left, rate is ~2.5-2.7 k/sec.
|
[QUOTE=MooooMoo]
Probability of finding a twin in a range: my calculations for n=195000 give...
range k=5 G    chance of finding a twin: 31%
range k=10 G   chance of finding a twin: 52%
range k=20 G   chance of finding a twin: 77%
range k=25 G   chance of finding a twin: 84%
range k=30 G   chance of finding a twin: 89%
range k=40 G   chance of finding a twin: 95%
range k=50 G   chance of finding a twin: 97.5%
[/QUOTE] I can confirm that your calculation is correct. Here is a small (PARI) program to compute the probabilities from 5G to 50G in steps of 5G:
[CODE]
T=4.0;forprime(p=3,10^5,T*=(p-2)/p/(1-1/p)^2);p=T*(195000*log(2))^-2;\
forstep(i=5,50,5,print("k=",i," G ","chance of finding a twin: ",1-(1-p)^(i*10^9/2)))

k=5 G chance of finding a twin: 0.3032663771235724704662848430
k=10 G chance of finding a twin: 0.5145622587534880610868001205
k=15 G chance of finding a twin: 0.6617792038603679416316561856
k=20 G chance of finding a twin: 0.7643501993734845214281243748
k=25 G chance of finding a twin: 0.8358148606793800287773442252
k=30 G chance of finding a twin: 0.8856066930586734521235834760
k=35 G chance of finding a twin: 0.9202983368219543694155290908
k=40 G chance of finding a twin: 0.9444691714646835051627448017
k=45 G chance of finding a twin: 0.9613098046532592367695271155
k=50 G chance of finding a twin: 0.9730432400262686098144673574
[/CODE]
Only about the first 5 digits of the probabilities are correct. |
Nice.
My approach was slightly different: I was a bit worried about how much the two members of a twin depend on each other. Therefore I sieved a 10G range up to p=1T, and did the probability calculation with the number of remaining candidates, to minimize the dependency. Of course, this number also has a statistical distribution. |
The range [300e6; 25e9] is at p=83.0T, 10,005,865 k's left, rate is ~5.0-8.0 sec/k.
Please excuse me, in posts #15, #16 and #17 I made a mistake: I wrote "k/sec", while it is "sec/k". :redface: |
hi gribozavr,
Could it be that you are quite unlucky?
[QUOTE=gribozavr]I have released [150e6; 200e6] for LLR at p=30.6T. Now the range [200e6; 25e9] is at p=30.6T, 10,667,000 k's left, rate is ~1.4-1.6 k/sec. [/QUOTE]
[QUOTE=gribozavr] The range [300e6; 25e9] is at p=83.0T, 10,005,865 k's left [/QUOTE]
My calculation gives: 10667000 * (24.7G / 24.8G) * (log2(30.6T) / log2(83T))^2 = 9972751 candidates, so there are about 28500 factors too few. (The formula: candidates * new range ratio * (new factoring depth ratio)^2, the square because both members of the twin must survive.) Is there a flaw in my calculations?

Edit: More detailed calculations with Excel give:

p          range (G)   k's left     predicted    difference
3.06E+13   24.8        10667000
5.17E+13   24.75       10334553     10294806     39747
6.10E+13   24.75       10226768     10227121     -353
8.30E+13   24.7        10005867     10010909     -5042

From 30.6T to 51.7T: 39747 too few factors; from 51.7T to 61T: approximately correct; from 61T to 83T: about 5000 too many. |
The range [400e6; 25e9] is at p=101.1T, 9,849,791 k's left, rate is ~5.0-10.0 sec/k.
Sieving has passed 100T :banana: :showoff: biwema, I'm interested in this math, but I don't understand the "Newfactoring Depth ratio" part. Can you explain it to me, please? (I understand that log2(x) is the number of bits in x) |
[QUOTE=gribozavr]
biwema, I'm interested in this math, but I don't understand the "Newfactoring Depth ratio" part. Can you explain it to me, please? (I understand that log2(x) is the number of bits in x)[/QUOTE] I've figured out the correct calculation: if you want an estimate of the number of survivors when sieving by primes up to H, with k running from Nmin to Nmax (sieving only odd numbers), then there will be about

prodprime(p=3,H,(p-2)/p) * (Nmax-Nmin)/2

survivors, because there are two forbidden residue classes for every prime p>2 and the primes are "independent". So we only need to approximate prodprime(p=3,H,(p-2)/p). Known limits:

limit(H->infinity, prodprime(p=3,H, p*(p-2)/(p-1)^2)) = C2 = 0.6601618158, where C2 is the twin prime constant.
limit(H->infinity, prodprime(p=2,H, (p-1)/p) * log(H)) = exp(-gamma), where gamma is the Euler-Mascheroni constant = 0.5772156649.

From these limits we get:

limit(H->infinity, prodprime(p=3,H, (p-2)/p) * log(H)^2) = 4*exp(-2*gamma)*C2.

So a good approximation for the number of survivors is 2*exp(-2*gamma)*C2*(Nmax-Nmin)/log(H)^2, where log is the natural logarithm. Using your latest data, Nmax=25e9, Nmin=400e6 and H=101.1T, the estimate is:

2*exp(-2*0.5772156649)*0.6601618158*(25*10^9-4*10^8)*log(101.1*10^12)^-2 = 9846234

A very good prediction, because your actual count was 9,849,791. |
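For completeness, a one-line PARI/GP check of this closed form (added here, not from the original post), plugging in the values quoted above; Euler is gp's built-in Euler-Mascheroni constant.
[CODE]
\\ sketch: survivors ~ 2*exp(-2*Euler)*C2*(Nmax-Nmin)/log(H)^2, with Nmin=400e6, Nmax=25e9, H=101.1T
C2 = 0.6601618158;   \\ twin prime constant
print(round(2*exp(-2*Euler)*C2*(25*10^9 - 4*10^8)/log(101.1*10^12)^2))
\\ prints roughly 9846000, close to the 9,849,791 k's actually left
[/CODE]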
[QUOTE=gribozavr]
biwema, I'm interested in this math, but I don't understand the "Newfactoring Depth ratio" part. Can you explain it to me, please? (I understand that log2(x) is the number of bits in x)[/QUOTE] The mathematics behind it is not that difficult: the probability of not finding a factor of a number is log(beginning factor range)/log(end factor range). That is independent of the logarithm base (if you take the same base in the numerator and the denominator). I squared this number because we are searching for twins (after sieving to 10^12 the two sides are independent enough). If you multiply it by the number of candidates, you get the remaining number of candidates. The (24.7G / 24.8G) is just the proportional shortening of the range. My prediction just says that far too few candidates were removed between 30.6T and 51.7T. Maybe your information was rounded, or your initial data was taken from a range starting at 250M instead of 200M.
---
You can see that I'm a typical engineer (as you know from many jokes): take some input (your statistical data on sieving) and predict the future with as little mathematics as possible. :grin: R. Gerbicz is a typical mathematician: take the whole equation with all the constants etc. It is amazing how accurate these numbers are (if you consider that the impact of statistical variation is bigger at the beginning of sieving). |
An Athlon64 machine sieved 150T-200T in parallel with a Celeron sieving 100T-150T.
Range [550e6; 25e9] at p=150.1T had 9,553,770 k's left. Now the range [550e6; 25e9] is at p=215.6T, 9,345,423 k's left. |
The range [550e6; 25e9] is at p=271.1T, 9,216,786 k's left.
|
Again, I sieved in parallel:
Range [550e6; 25e9] at p=300.0T had 9,160,689 k's left. Now the range [550e6; 25e9] is at p=352.4T, 9,073,142 k's left. |
I've got my hands on an Athlon XP 2500+ @1800MHz and got a sieving rate of 9-15 sec/k. LLR takes about 150-200 sec, depending on the CPU. Maybe we should postpone LLR and start distributed sieving? With a couple of Athlon boxes we could sieve to 1000T quite easily.
The data file is 170 MB uncompressed, about 35-40 MB compressed. I have hosting/bandwidth for it. I wonder, is there some ready-made software that would allow reporting only the factors, without uploading the whole data file? |
Range [600e6; 25e9] at p=400.4T had 8,985,536 k's left.
Now the range [600e6; 25e9] is at p=550.8T, 8,878,628 k's left. |
[QUOTE=gribozavr;87148]Range [600e6; 25e9] at p=400.4T had 8,985,536 k's left.
Now the range [600e6; 25e9] is at p=550.8T, 8,878,628 k's left.[/QUOTE] Would it be beneficial for my 2.8GHz Pentium-D to do some sieving? I know Pentiums are significantly slower, but I thought the help would be appreciated anyway. |
I was doing a little surfing and discovered the following theorem about twin primes:
[quote]Theorem: (Clement 1949) The integers n, n+2, form a pair of twin primes if and only if 4[(n-1)!+1] = -n (mod n(n+2))[/quote] I don't pretend to understand the math, but wonder if this theorem could be used to help sieving? Edit: if the "!" means factorial, then this formula will most definitely NOT help sieving, but just in case it means something different, I'll go ahead and post this. |
[QUOTE=jasong;89624]but just in case it means something different, I'll go ahead and post this.[/QUOTE]
nope, it is factorial |
[quote]Theorem: (Clement 1949)
The integers n, n+2, form a pair of twin primes if and only if 4[(n-1)!+1] = -n (mod n(n+2))[/quote] I've been thinking about this, and I'd like to throw something out there again. Would it be possible to find patterns in the (n-1)! part of the equation? Maybe a situation where the last 6 to some odd digits are calculated correctly? Even a partial pattern might be helpful. I know there are plenty of people out there who have better skills than I do! |
I am assuming you got this from this page: [url]http://primes.utm.edu/top20/page.php?id=1[/url]
If so, the key sentence would be: [quote="prime pages"]Nice--too bad it is of virtually no practical value![/quote] |
[quote]I am assuming you got this from this page: [url]http://primes.utm.edu/top20/page.php?id=1[/url]
If so, the key sentence would be: [quote="prime pages"]Nice--too bad it is of virtually no practical value![/quote][/quote] Didn't people once say the same thing about imaginary numbers? Anyway, I just thought I'd throw it out there, just in case it was overlooked. Never mind. |
FYI, I started sieving 25G-50G. I'm planning to stop when p reaches 3T.
|
I don't think it is worthwhile. I can do that in 7-10 hours (or so I think) and the data file will be quite large (> 300MB) -- and thus not really transferable.
|
[QUOTE=gribozavr;91511]I don't think it is worthful. I can do that in 7-10 hours (or so I think) and the datafile will be quite large (> 300Mb) -- and thus not really transferrable.[/QUOTE]
Ok, then I'll keep testing numbers ;) |
Splitting up horizontal sieving does not make sense at all. (Of course, for the first G (before merging) this rule does not apply; there a bitmap instead of a hashtable is used due to the large number of candidates.)
The sieving speed does not depend on the size of the range, therefore it is recommended that the range is initially sized larger than necessary. We just need to take care that the computer has enough memory. (I am also doing some sieving on a 100G range and see no impact on the sieving speed. The save file is 750 MBytes (but it can be compressed down to 20% if the sieving is distributed) and NewPGen takes 320 MBytes for the hashtable.) n=195000 also follows this strategy; the chance of failing is only 1/6. In case we do not find a twin in the first 20G, we can discuss our future plans. Either continuing from 25G or choosing a new N is an option. (I suggest choosing an N which is about 100 below an FFT change point (SSE2 and non-SSE2) in order to improve efficiency.) |
I think that we'll find a twin prime before 20G.
|
[QUOTE=biwema;92467]n=195000 does also follow this strategy; they chance of failing is only 1/6. In case we do not find a twin after the first 20G we can discuss our future plans. Either continuing from 25G or choosing a new N is an option. (I suggest to choose a N which is some 100 below a FFT change point (SSE2 and nonSSE2) in order to improve efficiency)[/QUOTE]I know it's a bit early, but shouldn't we start thinking about the next N already?
|
Range [1200e6; 25e9] is at p=930.0T, 8,397,809 k's left.
|
[QUOTE=smh;92818]I know it's a bit early, but shouldn't we start thinking about the next N already?[/QUOTE]
Some time ago I did some testing and continued quite a bit. My favourite is N=333333. It has a little bit more than 100000 digits and is just below an FFT change for SSE2. So I already started sieving on that number. I have chosen a range of 100G (it has the same probability of finding a twin as a 34G range at 195000). At the moment I am somewhere beyond 150T and have fewer than 39 million candidates left. I will continue sieving that range and we can start with that N after finishing N=195000. Maybe even distributed sieving is possible; the ideal sieving depth is between 50P and 100P, depending on the architecture used. What do you think about it? Edit: Imagine, when we are working on that N, it is very likely that we will find thousands of primes that are not part of a twin. These primes will flood the top5000 page and have quite an impact on the diversity of primes. biwema |
Looking at [url]http://primes.utm.edu/primes/lists/all.txt[/url] I see that 17 primes have already been reported for n=333333.
|
[QUOTE=gribozavr;93034]Looking at [url]http://primes.utm.edu/primes/lists/all.txt[/url] I see that 17 primes have already been reported for n=333333.[/QUOTE]Looks like Biwema already started searching for primes in that range, and that Daniel Heuer missed one :smile:
If you need help, I do have a pc available that can do some sieving. |
Well observed. :smile:
I already performed some tests on that range. I assume that Daniel Heuer did not miss any prime. He was searching for -1 Riesel primes and was not sieving for twins. I was sieving for twins and then testing +1 first (I still like the Fermat factor idea). That's why I have fewer primes per range, even though I have to confess that I was very lucky (5 primes in k = 2 - 7M); at the moment I expect 1 prime per 10M. I searched the range up to k=20M. I agree that it is a good idea to distribute sieving, if the twin prime search community agrees on this exponent. At the moment I am removing around 80 candidates per minute (even though my P4 3.4 GHz is not optimal for sieving) at 160T. One primality test takes 267 seconds on the same machine. We can expect a sieving depth of 55-60P (60 million G) - maybe even more, depending on how much faster AMD is. (On my P4 that depth needs 24 CPU years of sieving) |
I have an Athlon and a Celeron doing sieving for n=195000. Will it slow down sieving if I merge both k?
|
If the program supports that I think it will not slow down. On the other hand, I don't know if the overall performance gets better.
The most important thing to consider is the memory. My range is 4 times larger and NewPGen uses more than 300 MBytes while running. Maybe there is a little slowdown because the 2 ranges (0-25G and 0-100G) are not matching. Regarding the range of sieved p, you can start at the point you reached with N=195000. The range in between can be sieved by someone else at a later time and merged afterwards. |
Range [1700e6; 25e9] is at p=1033.9T, 8,171,379 k's left.
|
I normally use NewPGen for sieving. I don't know how much it is optimized for the new architectures. It has not been maintained for more than a year. If we compare the speed to vertical sieving tools like sobsieve, we can assume that there is still some potential for improvement.
While implementing a tool that is specialised for horizontal sieving, specific optimizations are possible. Sieving constellations of more than one prime (twin, triplet, various chains...) has the problem that the initial range/number of candidates is very large and drops very quickly. The first part is done with an array in which the numbers are crossed out. NewPGen uses a bitmap for sieving up to 1 billion. After that point, the number of candidates has dropped so much that several subranges can be grouped together. The feature of reduced screen output below 1G is a bit flawed, with the opposite effect that every divisor below 1G is printed. The extensive screen output slows down the whole process.

Optimizations below 1G:
- Adapt the bitmap to the characteristics of the constellation: k must be divisible by 3 (so that both k*2^n-1 and k*2^n+1 can avoid the factor 3), so the bitmap only needs to contain those candidates. Hence the same amount of memory can hold a 3 times larger range.
- Instead of splitting the range into pieces of (0..2G, 2..4G, 4..6G), one can split into modulo groups (1 mod 35, 2 mod 35, 3 mod 35, 4 mod 35, 6 mod 35, 8 mod 35 ...). That needs just 24 instead of 35 groups, and the slow sieving by 5 and 7 can be avoided.
- Define the bitmap size as a product of small primes (11*13*17*19*23*29*31 = 955049953). These primes can be sieved out once. After that, each time the presieved range can be copied and sieving continued from 37. Maybe this step is not necessary if the second point is implemented.
- While sieving small primes, it is not necessary to count the sieved candidates. Just delete the appropriate bit without checking whether that candidate was already sieved earlier. The number of candidates can be counted when they are saved.

When I sieved a 100G range of N=333333, I was quite lucky that the merged file after 1G was less than 2 GBytes.

Optimizations beyond 1G:
- Use a hash table (NewPGen does that already). It is faster than a list or a tree.
- Finding candidate divisors: they don't need to be prime. If they are composite, they simply don't sieve out a candidate. So the candidate divisors can be sieved (some of the optimizations above can be applied here, too) up to maybe 1000 (the optimal threshold has to be found), then a quick Fermat test (or similar) is run on these numbers, and the remaining divisors are used for trial division. Trial dividing by some pseudoprimes is faster than proving the primality of the candidates. (Prime95 already uses that strategy; there even a Fermat test is too slow, so that part is omitted.)

Multithreading (architecture specific):
- Splitting up the divisor generator and the siever into different threads (maybe also one thread each for -1 sieving and +1 sieving). To go further: is one thread for 6n-1 divisors and one thread for 6n+1 divisors even faster, or could it be that they drift apart from each other?

Any ideas? |
See, biwema:
By hashtable the sieve wouldn't be faster. In NewPGen everything is in memory. NewPGen uses Eratosthenes' or Atkin's fast method to find all prime numbers in an interval, and this is much faster than checking whether p is a divisor of some k*2^n+-1 when n is large, and in this project n is large. So NewPGen isn't sieving by composite numbers, as far as I know. Sobsieve uses a complicated multiple-k sieve; twin sieving isn't such a problem. The math of the twin prime sieving is very easy, and I think this is the method: by powmod determine A=((p+1)/2)^333333 mod p (here p is prime); it requires 18 modular squarings and 7 multiplies by (p+1)/2, which is really fast, because that is only a bitshift or add+bitshift. If A or (-A) mod p is in your range, then you can cancel that k value, because k*2^333333+-1 will be divisible by p, and if p is small then by sieving you can also cancel k+p, k+2*p, k+3*p and so on. There is also another method: compute B=2^333333 mod p by the same powmod, but then do A=modinv(B,p). I think this is faster, because it requires only about 13 modular squarings + 4 bitshifts, plus one modular inverse operation, which isn't so cheap, but the total cost will be less, and I think NewPGen is using this method. Therefore your multithreading idea is wrong: NewPGen has almost the same speed for a single k*2^n+1 sieve and for a twin k*2^n+-1 sieve.
About memory: it is possible to write a sieve that uses, say, less than 1M of memory: store the remaining k numbers in a sieve file and create another, say, 210000 files: for example c_129589.txt will contain the k numbers where k*2^n+-1 has a prime divisor that the sieve has found (here 129589*10^6<=k<129590*10^6). If you've done some work, say you've sieved from 1000G to 1001G, then you can open the sieve files and merge the k factors. After this, delete all 210000 files and continue sieving. In theory it wouldn't be slow; it is very rare that you have to merge the lists. |
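To make the arithmetic concrete, here is a toy PARI/GP sketch of the second (modinv) method described above. It is only an illustration with made-up small parameters, not NewPGen's actual code; a real sieve would also restrict k to the allowed residue classes, look the k up in the candidate list, and so on.
[CODE]
\\ toy twin-sieve step (sketch): for each sieving prime p, find the k values it eliminates
{
  n = 333333; kmin = 1; kmax = 10^6;        \\ illustration values only
  forprime(p = 10^6, 10^6 + 200,
    B = lift(Mod(2, p)^n);                  \\ B = 2^n mod p by fast powmod
    A = lift(1/Mod(B, p));                  \\ A = modinv(B, p) = 2^(-n) mod p
    kminus = A;                             \\ k == A  (mod p)  =>  p | k*2^n - 1
    kplus  = p - A;                         \\ k == -A (mod p)  =>  p | k*2^n + 1
    if(kminus >= kmin && kminus <= kmax, print(p, " divides ", kminus, "*2^", n, "-1"));
    if(kplus  >= kmin && kplus  <= kmax, print(p, " divides ", kplus, "*2^", n, "+1")));
}
[/CODE]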
I think there are a few misunderstandings...
[QUOTE=R. Gerbicz;94859]By hashtable the sieve wouldn't be faster. In NewPGen everything is in memory.[/QUOTE] I am thinking about a hashtable in memory so that a p can be found in O(1), assuming our list has millions of candidates left. A sorted list or a tree is O(log n). On the other hand: if we are sieving a 100G range at 1P, only every 10000th p divides a k in [0..100G], so lookups are quite rare (a few 1000 per second). Maybe offline sieving could be an option when we are working on really large candidates (300000 bits or 1M digits).
[QUOTE]NewPGen uses Eratosthenes' or Atkin's fast method to find all prime numbers in an interval, and this is much faster than checking whether p is a divisor of some k*2^n+-1 when n is large, and in this project n is large. So NewPGen isn't sieving by composite numbers, as far as I know.[/QUOTE] Yes, but I assume that they are not sieving up to sqrt(p), because that is too slow. Sieving the divisors beyond 100000 removes too few of them, so it makes more sense to accept composite p. (Confusing, I know: there are two cascaded sieves; the outer one just sieves the divisors used by the inner sieve.)
[QUOTE]Sobsieve uses a complicated multiple-k sieve; twin sieving isn't such a problem.[/QUOTE] I agree. Actually, I wanted to say: if there are improvements in the vertical sieve (sobsieve), then there might also be some in the horizontal sieve. The algorithms are different, of course.
[QUOTE]Therefore your multithreading idea is wrong: NewPGen has almost the same speed for a single k*2^n+1 sieve and for a twin k*2^n+-1 sieve.[/QUOTE] OK, you're right. So this part of multithreading makes no sense. How about the other way around? Which part of the twin sieving takes more time:
- the generation of the p which are used for sieving, or
- testing whether a p divides a twin candidate?
If the first part takes much more time, generating more (but also composite) p in much less time could make sense. Maybe setting the threshold of the part-1 sieve...
[QUOTE]About memory: it is possible to write a sieve that uses, say, less than 1M of memory: store the remaining k numbers in a sieve file and create another, say, 210000 files: for example c_129589.txt will contain the k numbers where k*2^n+-1 has a prime divisor that the sieve has found (here 129589*10^6<=k<129590*10^6). If you've done some work, say you've sieved from 1000G to 1001G, then you can open the sieve files and merge the k factors. After this, delete all 210000 files and continue sieving. In theory it wouldn't be slow; it is very rare that you have to merge the lists.[/QUOTE] That could work, but it depends on the speed of the disks.
New idea (adapted from R. Gerbicz's approach): don't hold the list of still-remaining twin candidates in memory. Just sieve and keep all the removed p in the sieve range (here 0..100G) in memory. They appear randomly in the range, therefore build up a tree where you can read all p in a sorted way. Once the memory reserved for the sieving application is used up, merge the p with the twin-candidate file on the disk (that process replaces the intermediate save). The whole thing is only efficient after reaching an adequate level of sieving (some T). |
I run newpgen on Linux and I have a question.
While I'm always well informed by the program of what value the sieving is at, I only seem to see updates of how many k are left, or the rate, when a k is removed. This could be a problem if a siever wants to sieve until the rate is more than about 2 minutes per k, but wants to stop the sieve manually. Am I just not patient enough, or is this a problem that hasn't been considered? |
Where is the problem? Just wait 2 minutes. You are not watching the program 24/7, are you? So it's no big deal, even if one sieves a day longer.
|
Question: If you set it to log factors, will it log factors as soon as it gets them, or go by the update interval? The reason I ask is that I set the update interval to 10,000 minutes, thinking that only applied to the main file. I'm hoping the factors will update more frequently, and if there's a power outage I can simply update the starting point with vi.
Is this the correct way to do things? |
[QUOTE=jasong;95035]Question: If you set it to log factors, will it log factors as soon as it gets them, or go by the update interval? The reason I ask is that I set the update interval to 10,000 minutes, thinking that only applied to the main file. I'm hoping the factors will update more frequently, and if there's a power outage I can simply update the starting point with vi.
Is this the correct way to do things?[/QUOTE] Yes. The factors are written immediately. If you set the update interval to a high level, you risk losing all the sieving time after the last factor was found. In this sieving project the factor density is so high that it will be no problem at all. Besides that, you save the time for the intermediate file saves (around 90 sec each time). |
I'm averaging about 3.5-4.0 k/sec (times 2) sieving 333,333 on my X2 3800+, how is everybody else doing? I just want to benchmark myself is all.
|
Celeron 1200Mhz: 14.5-15.5 k/sec
Athlon 2500+ 1800Mhz: 4-6 k/sec Athlon64 3000+ 1800Mhz: 4-5.5 k/sec |
I must be doing something wrong. My AMD Athlon(tm) 64 3400+ is only doing 1 k per 2.5-3.0 seconds. Any ideas???
I'm currently working on 500T-550T. |
I mis-typed..
3.5-4 sec/k |
[QUOTE=jmblazek;96298]I must be doing something wrong. My AMD Athlon(tm) 64 3400+ is only doing 1 k per 2.5-3.0 seconds. Any ideas???
I'm currently working on 500T-550T.[/QUOTE] The sieving rate (sec/k) depends on the sieving depth: you sieve about 2 times slower at 500T than you would at 250T. |
[quote=Skligmund;96303]I mis-typed..
3.5-4 sec/k[/quote] Whew...that makes sense now...since you're working about 300T deeper. |
[quote=MooooMoo;95056]
The problems with both ideas is the [SIZE=4]huge[/SIZE] size of the new output file and the factor list.[/quote] Is not the problem, I think. We can write a program that give us all reduce numbers ( The number is not in the new file after sieving ) The MASTER ADMIN taks the small file and reduce from file [URL="http://www.sendspace.com/file/32ye9b"]32ye9b[/URL] also with a special program. I work with 2 PCs and this is for my project very helpfull when I sieve serveral ranges. best |
If anyone's curious...
For n=195000, we actually sieved far more than the previous record holders did. They sieved only to 281T, while we got over 1000T (see [url]http://www.hpc-europa.org/index.php?section=Transnational&subsection=scientific_highlights&page=sci_high/csajbok[/url])
The number of Intel Itanium 2 processors they used "varied between 1 and 96" while they were working on getting their previous record twin. They had a total of 5 million candidates after sieving (out of an original 8.5 billion). This means they were probably choosing a range that had an ~80% certainty of containing a twin. It would be equivalent to sieving n=195000 from 1-22G. |
[quote=MooooMoo;95056]- somewhere to upload the new output file, so that NewPGen's feature for " merging files across several machines" can be used.
[/quote] I can't figure out how to use that on my own computers. The instructions are confusing at best. I know this is the wrong thread for this, but it only made sense here. If anybody knows how this works, let me know. Maybe make a simple tutorial for it? |
If you want to report results, you don't need to upload the data file; only the newpgen.del file is needed. Of course, unless you forgot to set NewPGen to log factors.
|
[QUOTE=Skligmund;96565]The instructions are confusing at best. I know this is the wrong thread for this, but it only made sense here. If anybody knows how this works, let me know. Maybe make a simple tutorial for it?[/QUOTE]
I hope this tutorial helps (written by jmblazek): [url]http://www.elaurian.org/tpsieving/[/url] |
[quote=MooooMoo;96570]I hope this tutorial helps (written by jmblazek):
[URL]http://www.elaurian.org/tpsieving/[/URL][/quote] sieve across several machines...duplicate and merge... I'll see if I can add that to the page. |
Thanks guys.
The reason I ask is that I run it across 7 cores (5 computers) and I think I could really reduce the size of that 736MB file. If I was able to merge all my files together, I could probably be down to 730 or less already, but I only get one range's worth (usually like 200 kilobytes or something) instead of 7 ranges' worth (1400 KB ish). After a while, it adds up. |
[quote=jmblazek;96572]sieve across several machines...duplicate and merge...
I'll see if I can add that to the page.[/quote] done...let me know if it is clear. [URL]http://www.elaurian.org/tpsieving/[/URL] |
Ah hah!
I was doing everything manually, I just didn't understand how to merge all the files. Now I know! Thanks for the tutorial. I shall use it when all my subranges complete. |
What is the "Verify result" option in NewPGen ? Should we use it ?
|
from the NewPGen Help/Contents page:
If "Verify results" is checked, NewPGen will do a careful check to make sure that the prime actually does divide the number that it thinks it does. This will make NewPGen very slightly slower, but it is recommended that it is always checked, particularly as p gets large (where the scope for programming errors increases), and only uncheck it during the early parts of the sieving if you are using a primorial mode and many k's are being thrown out every second. If you are not using a primorial sieve then leave this checked, as then it makes no difference to NewPGen's performance. "particularly as p gets large" That means what??? 500T, 1P, 10P??? Let me know when and if it should be checked and I'll note it on the help page. |
[QUOTE=jmblazek;96610]from the NewPGen Help/Contents page:
If "Verify results" is checked, NewPGen will do a careful check to make sure that the prime actually does divide the number that it thinks it does. This will make NewPGen very slightly slower, but it is recommended that it is always checked, particularly as p gets large (where the scope for programming errors increases), and only uncheck it during the early parts of the sieving if you are using a primorial mode and many k's are being thrown out every second. If you are not using a primorial sieve then leave this checked, as then it makes no difference to NewPGen's performance. "particularly as p gets large" That means what??? 500T, 1P, 10P??? Let me know when and if it should be checked and I'll note it on the help page.[/QUOTE] It doesn't need to be checked if we can come up with a factor verification script. |
I will check all factors when I will be preparing files for PG.
|
[QUOTE=gribozavr;96627]I will check all factors when I will be preparing files for PG.[/QUOTE]How can you check all factors? If I send you my newpgen output file, you would have to refactor all the numbers I removed? |
|
We plan to collect newpgen.del files.
|
Sieve reservation
Hi
I was contacted by jmblazek asking for my support doing sieving, so now I'm asking: can the range 1120T - 1170T (approximately 10 days of work) be reserved for me? Also, which file do I have to sieve from, the one in this thread or the one on the tutorial site? Regards! |
It actually doesn't matter which. I think the one in the tutorial might be a little smaller, so I'd go that route (correct me if I'm wrong).
If my upload was faster, I would let you download mine, as it is a few megs smaller yet. |
[QUOTE=Skligmund;96672]It actually doesn't matter which. I think the one in the tutorial might be a little smaller, so I'd go that route (corrrect me if I'm wrong).
If my upload was faster, I would let you download mine, as it is a few megs smaller yet.[/QUOTE] Thanks, I've actually also decided to use that file from the tutorial, it was updated according to the time table in Windows, on January 19 2007, and it was approximately 10 megs smaller, so I'll stick with that one... thanks for your replys, happy sieving everyone :) |
I tried to start sieving n=500,000 for a little while to test that everything works (0-300G).
NewPGen successfully generated a PROTHGEN.TMP file of 5.58 GB, but after that a Windows popup appeared with an error caused by the application (when NewPGen goes to the next step after generating PROTHGEN.TMP). Is that normal? |
No, that is not normal.
If you are on FAT32, files over 4GB or so are not supported. Try it again on an NTFS file system, or split the ranges. |
No, I am on NTFS, and the file PROTHGEN.TMP is still there (5.58 GB).
Probably the range is too large and this causes the error? Is it better to split it? |
Ok I have received this answer from Paul Jobling (NewPGen author):
[QUOTE]Hi Andrea,
[...]
> Now we are switching to exponent n=333,333 and after,
> probably, n=500,000.
>
> I began to sieve n=500,000 from 0 to 300G. NewPGen has created a file,
> PROTHGEN.TMP of 5.58GB sieving all k to 1 billion, but immediately
> before start next step, it crashes (windows popup error saying there
> is an error caused by newpgen). The file PROTHGEN.TMP is still there.
> [...]
> Is the range too large ? I have to split it ?

The space required to store those numbers in a bitmap is too large: 34 GB are required. To store them in a list is also too large: if the file is 5.5 GB, I would imagine that it contains about 330 million different k values, so the space required is 1.2 GB. You will have to use a smaller range of k, unless you can find a PC with more RAM, and simply load the file PROTHGEN.TMP into that.

Regards, and good luck,

Paul.[/QUOTE] So MooooMoo, I won't be able to sieve the entire range, I am sorry. I'll run some tests over the next few days and let you know the maximum range of k I can reserve. |
Why not do 0-207G? From biwema in a previous post in a different thread:
500000
FFT size: 48K
Time 1 test: 500 s
Optimal sieve bits: 57.5
Tests per twin: 23.7M
Twin every G: 90
90% chance after G: 207
CPU years per twin: 376

Actually, biwema suggests 250G...
[URL]http://www.mersenneforum.org/showpost.php?p=96504&postcount=19[/URL] |
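As a quick sanity check of the "90% chance after G: 207" line (a sketch added here, not from the original post): with one twin expected per 90G, the chance of at least one twin in 207G follows from a Poisson approximation.
[CODE]
\\ sketch: Poisson check of the 90%-after-207G figure for n=500000
print(1 - exp(-207.0/90))    \\ about 0.90
[/CODE]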
I can try.
How can I split the PROTHGEN.TMP file into 2 files: 1) 0-207G 2) 207G-300G (on Windows)? |
OK, I'll be able to sieve, for the beginning, the range 0-50G for exponents around n=500,000.
As soon as we decide on n, I'll start with that range. |
[QUOTE]Why not do 0-207G? From biwema in a previous post in a different thread:
500000
FFT size: 48K
Time 1 test: 500 s
Optimal sieve bits: 57.5
Tests per twin: 23.7M
Twin every G: 90
90% chance after G: 207
CPU years per twin: 376 [/QUOTE]
[QUOTE=pacionet;97003]OK, I'll be able to sieve, for the beginning, the range 0-50G for exponents around n=500,000. As soon as we decide on n, I'll start with that range.[/QUOTE] Pacionet, I can sieve the rest of the range (50G-207G) if your computer doesn't have enough RAM to sieve the whole 207G at once. It's a lot more efficient to sieve the whole range at once than to split it into 50G ranges of k and then sieve each k range individually. Please let me know if you want to sieve the whole range (1-207G) at once, or sieve 1-50G and let me sieve the rest (50G-207G). It's your choice, but the first option is more efficient. |
New master file
I will upload a new file. Unzipped it is 714 MB. Name: twin333333-017.txt (zip: 123 MB)
NewPGen works fine with it. First line: "35909316 k's remaining. p=1175000396142961 divides k=48988743009" |
[QUOTE=MooooMoo;97004]Pacionet, I can sieve the rest of the range (50G-207G), if your computer doesn't have enough RAM to sieve the whole 207G at once. It's a lot more efficient to sieve the whole range at once, than to split it into 50G ranges of k and then sieve each k range individually.
Please let me know if you want to sieve the whole range (1-207G) at once, or sieve 1-50G and let me sieve the rest (50G-207G). It's your choice, but the first option is more efficient.[/QUOTE] OK, I'll sieve 0-50G because I don't have enough RAM. Do you have enough RAM to sieve 50-207G at once? :surprised |
[QUOTE=pacionet;97105]OK, I'll sieve 0-50G because I have not enough RAM.
Do you have enough RAM to sieve 50-207 at once ? :surprised[/QUOTE] I have (barely) enough for sieving 50-207 at once. Sieving 1-207G at once is too much for my computer to handle, though :sad: |
What kind of RAM requirements for that? Just curious.
|
[QUOTE=Skligmund;97110]What kind of RAM requirements for that? Just curious.[/QUOTE]
Around 800-900MB of RAM for the early parts (~2T), but only ~570MB is needed once the sieving depth is quite large (500T or more) |
I recommend sieving the entire range at once, either 0G-207G or 0G-250G. Maybe someone can sieve it until RAM requirements lower and then turn it over.
OR, if splitting the file is necessary, then I suggest 0G-90G and 90G-207G or 250G. The faster computer would get the 0G-90G and the slower computer would get the other range. Reasoning...a twin is expected every 90G at n=500000...so if the dat file is going to be split, might as well sieve deeper into 0G-90G. By the time 90G is LLR'd, the other range will be sufficiently sieved. Also, 1G RAM can be as low as $80...so for a $160 investment 2G will adequately cover the RAM requirements. :smile: |
[QUOTE=jmblazek;97119]Also, 1G RAM can be as low as $80...so for a $160 investment 2G will adequately cover the RAM requirements. :smile:[/QUOTE]
Only until the next n after 500,000. Beyond that, you'll probably need around 4G of RAM :smile: Anyway, I've started sieving 50G-208G. My progress is at 26T. Unless something unexpected happens, it'll be close to (or at) the optimal sieving depth after n=333333 is complete, and after 1-50G is done for n=500000. |
Master-File Size
With special methods the 714MB file is only 64.4MB. Is that useful?
|
I achieved the same compression ratio.
Cybertronic, what methods did you use? |