![]() |
|
|
#133 |
|
"Mark"
Apr 2003
Between here and the
3·2,447 Posts |
I have done some very limited tests with srsieve2. Here are some numbers comparing srsieve to srsieve2:
Code:
srsieve -n25000 -N50000 -P1e8 -m1e8 r56.in srsieve 1.1.4 -- A sieve for integer sequences in n of the form k*b^n+c. Read 200 sequences from input file `r56.in'. srsieve started: 25000 <= n <= 50000, 3 <= p <= 100000000 Split 200 base 3 sequences into 1471 base 3^24 subsequences. Sieving 3 <= p <= 100000000 eliminated 4838822 terms, 161378 remain. Wrote 161378 terms for 200 sequences to srsieve format file `srsieve.out'. srsieve stopped: at p=100000000 because --pmax was reached. Processor time: 555.30 sec Code:
srsieve2 -n25000 -N50000 -P1e8 -fA -sr56.in srsieve2 v2.0.0, a program to find factors of k*b^n+c numbers for fixed b and variable k and n Sieve started: 2 < p < 1e8 with 5000200 terms (25000 < n < 50000, k*3^n+c) (expecting 4812049 factors) Split 200 base 3 sequences into 1470 base 3^24 sequences. Sieve completed at p=100000007. Processor time: 340.83 sec. (0.00 sieving) (1.00 cores) 161378 terms written to b3_n.pfgw Primes tested: 4761456. Factors found: 4838822. Remaining terms: 161378. Time: 341.24 seconds. ![]() I do not know why there is a different number of subsequences, but I will look into that. It doesn't seem to be impacting the results. I have not incorporated the logic from sr1sieve or sr2sieve into it yet. I really want to focus on making sure this code is solid before I start working on that. I will release the code "as is" once I feel confident in what I've written. I have to run more tests before I release it. To help me in my testing, could people send me (via PM) sequences that they would like me to run my tests with? Whether it is one sequence or a thousand, I don't care. I will limit the upper bound of sieving based upon the sequences. The key to the testing is ti compare the outputs and times between the two programs. Last fiddled with by rogue on 2019-05-24 at 16:27 |
|
|
|
|
|
#134 |
|
"Mark"
Apr 2003
Between here and the
3×2,447 Posts |
Here is a comparison between sr2sieve and srsieve2:
Code:
sr2sieve -p1e7 -P4e7 -q -i sr_3.abcd sr2sieve 2.0.0 -- A sieve for multiple sequences k*b^n+/-1 or b^n+/-k. Read 2489826 terms for 1007 sequences from ABCD format file `sr_3.abcd'. Split 1007 base 3 sequences into 7082 base 3^12 subsequences. Expecting to find factors for about 197186.63 terms in this range. sr2sieve 2.0.0 started: 25000 <= n <= 50000, 10000000 <= p <= 40000000 p=35035009, 95979 p/sec, 179938 factors, 83.5% done, 0 sec/factor, ETA 24 May 14:53 sr2sieve 2.0.0 stopped: at p=40000000 because range is complete. Found factors for 197251 terms in 513.223 sec. (expected about 197186.63) Code:
srsieve2 -ib3_n.pfgw -ob3.out -p1e7 -P4e7 srsieve2 v2.0.0, a program to find factors of k*b^n+c numbers for fixed b and variable k and n Split 1007 base 3 sequences into 3028 base 3^4 sequences. Sieve started: 1e7 < p < 4e7 with 2489826 terms (25000 < n < 50000, k*3^n+c) (expecting 197187 factors) p=26274431, 4.040K p/sec, 141072 factors found at 435.4 f/sec, 54.2% done. ETC 2019-05-24 15:01 Sieve completed at p=40000003. Processor time: 441.67 sec. (0.03 sieving) (1.00 cores) 2292575 terms written to b3.out Primes tested: 1769076. Factors found: 197251. Remaining terms: 2292575. Time: 439.72 seconds. |
|
|
|
|
|
#135 |
|
"Mark"
Apr 2003
Between here and the
11100101011012 Posts |
I forgot that I had mixe FPU and SSE code for testing. Switching to SSE only, the numbers look even better:
Code:
Primes tested: 1769076. Factors found: 197251. Remaining terms: 2292575. Time: 341.86 seconds. |
|
|
|
|
|
#136 |
|
Apr 2019
5×41 Posts |
I built this on Linux, but couldn't figure out how to get it to link with OpenCL, so I ended up settings "ENABLE_GPU=no".
Also had this build error which forced me to remove srsieve2 from the list of programs in the makefile, after which I was able to build: Code:
make: *** No rule to make target 'srsieve2', needed by 'all'. Stop. I got this line at one point Code:
p=10519166057, 3.179K p/sec, 140559 factors found at 15.29 sec per factor, 0.0% done. ETC 4219-12-25 02:25 Code:
p=36020724671, 3.132K p/sec, 150902 factors found at 7.623 f/sec, 0.0% done. ETC 3946-08-09 02:00 Code:
Primes tested: 1552746720. Factors found: 150913. Remaining terms: 191421. Time: 487670.27 seconds. Last fiddled with by hansl on 2019-06-11 at 23:25 |
|
|
|
|
|
#137 | |
|
Apr 2019
5·41 Posts |
Another note (*emphasis mine)
Quote:
https://github.com/kimwalisch/primes...master/COPYING It seems to me that modification is 100% ok, as far as licensing is concerned. However that "COPYING" file should probably be included in your "sieve" directory(if not your top level), especially since all the code comments make reference to it as such. Last fiddled with by hansl on 2019-06-11 at 23:56 |
|
|
|
|
|
|
#138 |
|
"Mark"
Apr 2003
Between here and the
734110 Posts |
srsieve2 is still a "work in progress". It is not officially released yet.
The calculation of the factoring rate is rather complex. The first few minutes that the sieve is running significantly skew that rate so the code tries to avoid using data from the first few minutes in the calculation. That being written, I'm sure there is more room for improvement. Anyone is welcome to submit suggestions for improvement. I put a copy of COPYING into the folder with the primesieve code. |
|
|
|
|
|
#139 |
|
Apr 2019
5·41 Posts |
Question about using gfndsieve: how do you determine a good pmax -P setting? At what point is it optimal to move over to PFGW?
|
|
|
|
|
|
#140 |
|
Dec 2011
After 1.58M nines:)
1,699 Posts |
|
|
|
|
|
|
#141 | |
|
"Mark"
Apr 2003
Between here and the
1CAD16 Posts |
Quote:
Of course if any of the mtsieve programs is outputting bad factoring rates, that can be a problem. |
|
|
|
|
|
|
#142 | |
|
Apr 2019
110011012 Posts |
Quote:
So I have been testing out GPU capability in gfndsieve, and as was mentioned earlier in the thread by pepi37, it doesn't do much. I have been trying a bunch of different settings combinations to see if any particular setup is more efficient. Starting at p=3 I don't see any noticeable load, but it does seem to increase with larger p. So trying pmin=2^34 and pmax=2^35 just as a benchmark, I see 60-75% load on my laptop's Quadro M1000M. Not sure if its really doing more work or just dealing with much more overhead in some way. Nothing I did to try tuning -G or -g at this point would max the load though. Its difficult to measure what optimal settings are, since the CPU throughput dwarfs the GPU, so maybe its a moot point to try tuning at this stage, but would it be possible to support -W0 when -G is nonzero to make GPU-only tuning more testable? Anyways I don't know a whole lot about optimizing OpenCL or CUDA configurations, but I'm wondering if there's a general rule of thumb for setting -G and -g? Does it make most sense to set -G equal to the number of "Compute Units" aka "Streaming Multiprocessors"? Maybe add 1 or 2 to help keep the work queue filled or does that not help? Then I guess set -g to whatever still fits comfortably in GPU memory? Here's a portion of output from "clinfo" for what its worth. Code:
Device Name Quadro M1000M
Device Vendor NVIDIA Corporation
Device Vendor ID 0x10de
Device Version OpenCL 1.2 CUDA
Driver Version 418.39
Device OpenCL C Version OpenCL C 1.2
Device Type GPU
Device Topology (NV) PCI-E, 01:00.0
Device Profile FULL_PROFILE
Device Available Yes
Compiler Available Yes
Linker Available Yes
Max compute units 4
Max clock frequency 1071MHz
Compute Capability (NV) 5.0
Device Partition (core)
Max number of sub-devices 1
Supported partition types None
Max work item dimensions 3
Max work item sizes 1024x1024x64
Max work group size 1024
Preferred work group size multiple 32
Warp size (NV) 32
Preferred / native vector sizes
char 1 / 1
short 1 / 1
int 1 / 1
long 1 / 1
half 0 / 0 (n/a)
float 1 / 1
double 1 / 1 (cl_khr_fp64)
|
|
|
|
|
|
|
#143 |
|
"Mark"
Apr 2003
Between here and the
3·2,447 Posts |
I could allow for -W0, but some sieves have other constraints which require a minimum p before using the GPU.
It is certainly possible that someone could write better OpenCL code than I. Some sieves will see a noticeable gain when using the GPU and others will not. |
|
|
|