![]() |
![]() |
#89 | |
"Mark"
Apr 2003
Between here and the
11000011001112 Posts |
![]() Quote:
When the application starts all of the workers are started and put themselves in a state called "waiting for work". The main thread works in a loop like this: 1) find thread in "waiting for work" 2) if none are found, sleep for 10 ms. 3) if one is found, get the next 1e6 primes then tell the worker to start processing that chunk 4) return to step 1 You have 6 threads, so the assumption is that this loop is iterated 6 times and thus all workers have work. That assumption is not correct. At some point the worker started by step 3 will finish. It will then go back to the status of "waiting for work". What you see happening with this sieve is that the worker started by the first iteration of the loop finishes its chunk of work before the main thread has iterated 6 times. Because of this the first worker (one that already did a chunk of 1e6 primes) will get the next chunk of 1e6 primes instead of the 5th or 6th worker. In essence getting 6 chunks of 1e6 primes takes more time than testing a single chunk of 1e6 primes. This is going to happen with fncsieve, fkbnsieve, gfndsieve, and dmdsieve because of how fast their worker threads can process a single chunk of work. This is why I suggested using -w to bump from 1e6 to 1e7. This should give the main thread the opportunity to feed all of the workers before any individual worker completes it chunk of work. The only solution to this problem would be to modify the framework such that the worker threads work on a range of primes, i.e. 100e9 to 101e9, 101e9 to 102e9, etc instead of a count of primes. The downside of such a change is that each worker will be testing a variable number of primes rather than a fixed number of primes. This would not be good for GPU workers as they are most efficient using a number of primes that is a multiple of the GPU work group size. |
|
![]() |
![]() |
![]() |
#90 | |
Dec 2011
After milion nines:)
3×7×67 Posts |
![]() Quote:
To explain: when I start your latest released sieve then p=0 ( and should not be zero since it doesnot start from zero , it start from header value in sieve) And since sieve in previous version works ok then it is cosmetic bug not flaw in program itself ( also sieve speed is same) fbncsieve.exe -p 347880187459691 -P 6000000000000000 -i 1100.txt -o 1100.txt -f N -W6 -w 1e7 -O s53factodes.txt fbncsieve v1.4, a program to find factors of k*b^n+c numbers for fixed b, n, and c and variable k Sieve started: 347880187459691 < p < 6e15 with 75104 terms (1014 < k < 1999948, k*10^1100000+1) (expecting 5887 factors) p=0, 46.94M p/sec, no factors found But in one part you were correct , using -w 1e7 it nearly doubles speed!!!! Last fiddled with by pepi37 on 2021-01-02 at 02:13 |
|
![]() |
![]() |
![]() |
#91 | |
"Alexander"
Nov 2008
The Alamo City
503 Posts |
![]() Quote:
|
|
![]() |
![]() |
![]() |
#92 | |
"Mark"
Apr 2003
Between here and the
6,247 Posts |
![]() Quote:
|
|
![]() |
![]() |
![]() |
#93 |
"Alexander"
Nov 2008
The Alamo City
503 Posts |
![]()
I'm not able to create new sieve files using srsieve2 (SVN rev 84):
Code:
$ ./srsieve2 -W 4 -n 5e4 -N 145e3 -P "1e9" -o 't17_b2.prp' -f B -s "34767*2^n-1" srsieve2 v1.3, a program to find factors of k*b^n+c numbers for fixed b and variable k and n Sieving with generic logic Fatal Error: Expected 95001 terms, but only set 0 |
![]() |
![]() |
![]() |
#94 | |
"Mark"
Apr 2003
Between here and the
6,247 Posts |
![]() Quote:
Here is the OS X output: Code:
./srsieve2 -W 4 -n 5e4 -N 145e3 -P "1e9" -o 't17_b2.prp' -f B -s "34767*2^n-1" srsieve2 v1.3, a program to find factors of k*b^n+c numbers for fixed b and variable k and n Sieving with generic logic Sieve started: 3 < p < 1e9 with 95001 terms (50000 < n < 145000, k*2^n+c) (expecting 89964 factors) Sieving with generic logic Split 1 base 2 sequence into 10 base 2^20 sequences. Sieve completed at p=1000000009. Processor time: 153.19 sec. (0.53 sieving) (3.14 cores) 8181 terms written to t17_b2.prp Primes tested: 50843824. Factors found: 86820. Remaining terms: 8181. Time: 48.66 seconds. |
|
![]() |
![]() |
![]() |
#95 | |
"Alexander"
Nov 2008
The Alamo City
503 Posts |
![]() Quote:
![]() |
|
![]() |
![]() |
![]() |
#96 |
"Mark"
Apr 2003
Between here and the
624710 Posts |
![]() |
![]() |
![]() |
![]() |
#97 |
"Mark"
Apr 2003
Between here and the
6,247 Posts |
![]()
I ran range of Sierpinski/Riesel sequences thru the various programs that can sieve them. This can be used as a guide to determine which program to use for your sieving purposes. Note that all tested the same input file for the same range of P:
Code:
srsieve -P2002626803 -m1e10 b25_old.abcd srsieve 1.1.0 -- A sieve for integer sequences in n of the form k*b^n+c. Read 1532485 terms for 81 sequences from ABCD format file `b25_old.abcd'. Split 81 base 25 sequences into 1432 base 25^90 subsequences. srsieve started: 300000 <= n <= 1000000, 1000950101 <= p <= 2002626803 Sieving 1000950101 <= p <= 2002626803 eliminated 49708 terms, 1482777 remain. Wrote 1482777 terms for 81 sequences to srsieve format file `srsieve.out'. srsieve stopped: at p=2002626803 because --pmax was reached. These two lines are from srsieve.log: 01/07/21 08:51:03 srsieve started: 300000 <= n <= 1000000, 1000950101 <= p <= 2002626803 01/07/21 09:33:45 srsieve stopped: at p=2002626803 because --pmax was reached. -- 2562 seconds sr2sieve -ib25_old.abcd -P2002626803 -q sr2sieve 1.9.1 -- A sieve for multiple sequences k*b^n+/-1 or b^n+/-k. Read 1532485 terms for 81 sequences from ABCD format file `b25_old.abcd'. Split 81 base 25 sequences into 1432 base 25^90 subsequences. Expecting to find factors for about 49622.16 terms in this range. sr2sieve 1.9.1 started: 300000 <= n <= 1000000, 1000950101 <= p <= 2002626803 p=1990805857, 2188973 p/sec, 49245 factors, 98.8% done, 0 sec/factor, ETA 07 Jan 08:47 sr2sieve 1.9.1 stopped: at p=2002626803 because range is complete. Found factors for 49708 terms in 486.927 sec. (expected about 49622.16) sr2sieve -ib25_old.abcd -P2002626803 -q -x sr2sieve 1.9.1 -- A sieve for multiple sequences k*b^n+/-1 or b^n+/-k. Read 1532485 terms for 81 sequences from ABCD format file `b25_old.abcd'. Split 81 base 25 sequences into 1432 base 25^90 subsequences. Expecting to find factors for about 49622.16 terms in this range. sr2sieve 1.9.1 started: 300000 <= n <= 1000000, 1000950101 <= p <= 2002626803 p=1982941577, 1086937 p/sec, 48960 factors, 98.0% done, 0 sec/factor, ETA 07 Jan 10:20 sr2sieve 1.9.1 stopped: at p=2002626803 because range is complete. Found factors for 49708 terms in 977.900 sec. (expected about 49622.16) srsieve2 -ib25_old.abcd -P2002626803 srsieve2 v1.3.1, a program to find factors of k*b^n+c numbers for fixed b and variable k and n Sieving with generic logic Split 81 base 25 sequences into 1432 base 25^90 sequences. Sieve started: 1000950101 < p < 2e9 with 1532485 terms (300000 < n < 1000000, k*25^n+c) (expecting 49531 factors) p=1996357801, 19.78K p/sec, 49480 factors found at 15.61 f/sec (last 1 min), 99.6% done. ETC 2021-01-07 10:16 Sieve completed at p=2000000087. CPU time: 2319.69 sec. (0.23 sieving) (1.00 cores) 1482777 terms written to b25_n.abcd Primes tested: 47452160. Factors found: 49708. Remaining terms: 1482777. Time: 2325.74 seconds. srsieve2cl -ib25_old.abcd -P2e9 srsieve2cl v1.3.1, a program to find factors of k*b^n+c numbers for fixed b and variable k and n Sieving with generic logic Split 81 base 25 sequences into 1432 base 25^90 sequences. GPU primes per worker is 143360 Sieve started: 1000950101 < p < 2e9 with 1532485 terms (300000 < n < 1000000, k*25^n+c) (expecting 49531 factors) p=1925947301, 65.81K p/sec, 46892 factors found at 40.83 f/sec (last 1 min), 92.6% done. ETC 2021-01-07 08:38 Sieve completed at p=2002626803. CPU time: 189.30 sec. (0.12 sieving) (0.26 cores) GPU time: 713.29 sec. 1482777 terms written to b25_n.abcd Primes tested: 47452160. Factors found: 49708. Remaining terms: 1482777. Time: 723.70 seconds.
So the conclusion is this:
"too many sequences for the GPU" is going to be dependent on GPU memory. I do not know (at this time) how to determine beforehand if the GPU can handle all of the sequences thrown at it. What I do know is that my Quadro P3200 can handle more than 5000 sequences at a time (but less than 10000) and still be at least 3x faster than srsieve2. If you have read between the lines you will see stats for srsieve2cl. Nobody here has seen it as it is in development. It should be ready for release this weekend. After srsieve2cl is released, you will need to do some independent testing to compare srsieve2cl to sr2sieve to see which is faster. Maybe someone the forum could package up a script that take an input file and range of P and tell the user which program to use. That same script could compare the outputs to verify that there are no missed factors. One last thing, srsieve2cl does not end on the exact boundary specified on the command line. That is okay. The key is that I had to use the max p from that run to set the max p for srsieve2, srsieve, and sr2sieve to ensure that I was comparing apples to apples. |
![]() |
![]() |
![]() |
#98 |
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
132768 Posts |
![]()
What is holding srsieve2 back from reaching sr2sieve's speed? How come the sieving % was only 23%? How does srsieve2cl compare when running on the cpu? I assume sr1sieve is still much faster than srsieve2 for one sequence.
Last fiddled with by henryzz on 2021-01-07 at 22:24 |
![]() |
![]() |
![]() |
#99 |
"Mark"
Apr 2003
Between here and the
6,247 Posts |
![]()
There are significant differences between sr2sieve and srsieve. srsieve2 only handles what srsieve can handle, which is fairly general. I will have to write and test a lot of code before srsieve2 can do what sr2sieve does. I work on it in bits and pieces, when I am motivated, but sr2sieve is a mess from the coding perspective. Until today I didn't think I could write code to compete (speed wise) with sr2sieve and I didn't just want to "copy and paste" its code into srsieve2 because it isn't quite that easy. I was given an ask to support p > 2^52 on sr1sieve for non-x86 CPUs. I know how to do that so I think I also have a change of pulling sr2sieve into srsieve2 and getting better performance without all of myriad code paths that sr2sieve supports.
|
![]() |
![]() |
![]() |
Thread Tools | |
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
mtsieve | rogue | Software | 543 | 2021-02-27 18:43 |
srsieve/sr2sieve enhancements | rogue | Software | 287 | 2021-01-16 08:02 |
LLRnet enhancements | kar_bon | No Prime Left Behind | 10 | 2008-03-28 11:21 |
TODO list and suggestions/comments/enhancements | Greenbank | Octoproth Search | 2 | 2006-12-03 17:28 |
Suggestions for future enhancements | Reboot It | Software | 16 | 2003-10-17 01:31 |