#1
"Seth"
Apr 2019
2×3×83 Posts
I got nerd sniped by a comment in A217259 related to differences in nearby values of sigma (sum of divisors)
Quote:
Overview

This guided my 1st (of 3) attempts.

Code:
for i in segment:
    sigma[i] = i + 1                      # divisors 1 and i
for d in range(2, isqrt(n) + 1):
    sigma[d * d] += d                     # d is the repeated divisor of d*d
    for m in range(d + 1, n // d + 1):
        sigma[m * d] += d + m             # divisor pair (d, m)
2nd attempt

I tried to reduce the number of divisions by keeping the index of the next multiple of each prime. This means there's a vector<uint64_t> next_multiple over all the primes. The code (SegmentedSieveSigma) is 10-20% faster. It's harder to make multi-threaded: my plan would be to assign workers consecutive intervals and delay processing of the results until all workers complete, which is sub-optimal vs the 1st attempt's iterative, embarrassingly parallel nature.

3rd attempt

Sigma has another definition based on the factorization of n. This attempt tries to reduce the overall work done by looking only at prime divisors. It requires keeping two mutable arrays: factored[i] (initialized to 1), which is the partial product of the prime powers of i; and sigma[i] (initialized to 1), which iteratively builds up sigma_1 via multiplication. A final pass is required to handle any remaining prime factor (larger than sqrt(n)). This approach (SegmentedPrimeSieveSigma) is the slowest (by 10-30%), I think because of excessive division, and it has the same parallelization limitations as the 2nd attempt.

Optimization

I've tried to optimize my code by
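As a standalone sketch of what the 3rd attempt relies on (this is my own illustrative version, not the actual SegmentedPrimeSieveSigma): sigma is multiplicative, and sigma(p^a) = 1 + p + ... + p^a, so trial division up to sqrt(n) plus a final step for one leftover large prime factor gives:

```python
def sigma_from_factorization(n):
    # sigma is multiplicative: sigma(p^a * q^b) = sigma(p^a) * sigma(q^b)
    total = 1
    d = 2
    while d * d <= n:
        if n % d == 0:
            power_sum = 1
            while n % d == 0:
                n //= d
                power_sum = power_sum * d + 1   # builds 1 + d + d^2 + ...
            total *= power_sum
        d += 1
    if n > 1:                  # leftover prime factor > sqrt(original n)
        total *= 1 + n
    return total
```

The final `if n > 1` step is the analogue of the "final pass" described above for prime factors larger than sqrt(n).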
On my Ryzen 3900X I can calculate the sigma of 70-130 million integers per second single-threaded! This is something like 30-55 cycles per integer for segments around 1-10 billion, and around 50-90 M/s at 10^12. With 11 worker threads and 1 coordinator I can calculate the sigma of the first billion numbers in under a second, but the speed decreases quickly: ~500 M/s at 10^11 down to around 150 M/s at 10^13.

All of the code is available in my OEIS repository at https://github.com/sethtroisi/OEIS/tree/main/A217259

If people have advice or pointers I'd appreciate it.

Last fiddled with by LaurV on 2022-10-18 at 04:33 Reason: fixed the sigma symbol in TeX lines
#2
"Robert Gerbicz"
Oct 2005
Hungary
3×547 Posts
Have you used buckets in the algorithm? If not, you are dead lost, and way behind the state of the art; ref:
http://sweet.ua.pt/tos/software/prime_sieve.html
#3
If I May
"Chris Halsall"
Sep 2002
Barbados
11374₁₀ Posts
Quote:
But... As a deep coder... I have found that using an array (AKA a vector) in a well-defined temporal domain can be very useful when doing empirical measurements. Basically, idx = t mod N to get your index into your vector. My apologies if this comes across as stupid.
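A minimal sketch of the "idx = t mod N" idea (my own illustration, with a made-up window size N): the array becomes a ring buffer that always holds the N most recent samples.

```python
# Ring buffer indexed by time (illustrative only).
N = 8                      # window size: keep the N most recent samples
ring = [None] * N
for t in range(20):        # t plays the role of the timestamp
    ring[t % N] = t * t    # overwrite whatever slot this instant maps to
# ring now holds the samples for t = 12..19, the most recent N instants
```

Old entries are overwritten in place, so memory stays constant no matter how long the measurement runs.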
#4
"Seth"
Apr 2019
2·3·83 Posts
Thanks for the link; bucketing is one of the items I mentioned in "Optimizations".

I have not implemented it (yet) because I was unsure how much impact it would have. My authority on this subject is Kimwalisch. Their simple segmented sieve guide doesn't make use of bucketing, so I was unsure if it's a high-priority optimization. Obviously their full primesieve makes use of a clever two-stage bucketing, but it's only 2-5x faster than the simple version and includes dozens of additional optimizations. I was also unsure how best to cache the buckets (vector<vector<>>? vector<linked list>?), so hopefully Tomás' code will give me some insight into what might be a good implementation.
#5
"Robert Gerbicz"
Oct 2005
Hungary
3·547 Posts
Quote:
The bucket part is in pure C, so at least in that part there is no dirty asm. Using vectors just slows everything down. To store all the buckets, a simple 1D array is enough.
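One way to read "a simple 1D array is enough" (my own sketch, with toy capacities, not code from the linked sieve): give every bucket a fixed-size slice of one flat array plus a fill counter, so no per-bucket allocation ever happens.

```python
# Fixed-capacity buckets packed into one flat array (illustrative sketch).
B, C = 4, 3                  # B buckets, capacity C each (toy sizes)
flat = [0] * (B * C)         # bucket b owns flat[b*C : (b+1)*C]
fill = [0] * B               # number of entries currently in each bucket

def push(b, value):
    flat[b * C + fill[b]] = value
    fill[b] += 1

push(2, 99)
push(2, 7)
bucket2 = flat[2 * C : 2 * C + fill[2]]   # bucket 2's live entries
```

Emptying a bucket is just resetting fill[b] to 0; the storage is contiguous and cache-friendly.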
#6
If I May
"Chris Halsall"
Sep 2002
Barbados
2×11²×47 Posts
#7
"Seth"
Apr 2019
2·3·83 Posts
Quote:
For the largest new divisors, near sqrt(n), the next multiple can be placed in a bucket sqrt(n) / sieve_length away. For my bounds this isn't likely to be more than 90 buckets away, which I have a plan for implementing. For 2^64 it would be sqrt(2^64) / 2^16 = 2^16 = 65536 buckets! Do you know what strategies people use here? Allocate all of those buckets? Allocate the first X then have a >X bucket (that they empty every X-1 cycles)?

Can you elaborate on using a 1D array? Do you place bucket[1] starting at 0, bucket[2] starting at 100, bucket[3] starting at 180, ...?
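For concreteness, the bucket-distance arithmetic above can be written out as a tiny helper (the function name is mine, the numbers are from the post):

```python
# Which future segment does a divisor's next multiple fall into?
sieve_length = 1 << 16

def bucket_for(next_multiple, segment_start):
    return (next_multiple - segment_start) // sieve_length

# A divisor near sqrt(2^64) = 2^32 can land up to
# 2^32 / 2^16 = 2^16 = 65536 buckets ahead of the current segment.
```

So the worst-case bucket count scales as sqrt(limit) / sieve_length, which is why the 2^64 case explodes compared to the ~90-bucket case above.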
#8
"Robert Gerbicz"
Oct 2005
Hungary
3·547 Posts
Quote:
You really don't need to use sqrt(2^32) = 65536 buckets; it is better to use fewer, but of course it depends on the computer. Let the i-th bucket contain the array positions in [i*2^20, (i+1)*2^20), so you'll have 2^32 / 2^20 = 2^12 buckets. But one bucket's length is not 2^20! Say we choose its length to be 2^11, continuous in the array where you store the buckets. So at any point you have 2^12 buckets, each of length 2^11, which is only 2^23 array positions; and what is also important, for random data these 2^12 buckets are almost continuous, so you can hold these 2^23 positions in L3 cache. Plus, when you're processing a bucket you're updating an interval of length 2^20, which also fits in L3. Maybe it'd be better to choose parameters so that 2^20 and 2^23 are closer.

You really don't need linked lists/vectors, since you only need to know the (small) bucket's order when you're processing the bucket; also store all 2^12 buckets' last positions. How do you check that you will run out of the (small) bucket at the next update? You'd think you should store the size of the current small bucket, but it is better to check whether the next position would be divisible by 2^11, so check only (offset & 2047) == 0.

You also need a stack/deque to know which (small) buckets are available (you can do this using, again, a 1D array). When you've processed one, you give it back; when you need a bucket, you request one. Of course, when you request a small bucket there should be at least one available; I have checked this. Calculate also how many small buckets you'd need at any point; the worst case is when all of them are full and the active ones are empty.

PS: Maybe you'd need (2^12)+1 buckets rather than 2^12 buckets of length 2^20, since at an update pos2 = pos + p could be in the same bucket where pos was, if p is close to 2^32.
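A hedged sketch of how I read this scheme (toy sizes instead of 2^11/2^12, and my own naming, not the actual sieve code): each logical bucket is a chain of fixed-size small blocks drawn from a free stack backed by one flat array.

```python
# Pooled small-bucket sketch (toy sizes; the post uses 2^11 and 2^12).
SMALL = 32                    # small-block capacity
NBUCKETS = 4                  # logical buckets (position ranges)
POOL = 16                     # total small blocks available

flat = [0] * (POOL * SMALL)   # one flat array backs every block
free = list(range(POOL))      # stack of free block ids
head = [-1] * NBUCKETS        # current block per logical bucket
top = [0] * NBUCKETS          # write offset inside the current block
chain = [[] for _ in range(NBUCKETS)]   # block order, needed when draining

def push(b, value):
    if head[b] < 0 or top[b] == SMALL:  # current block missing or full
        head[b] = free.pop()            # request a block from the pool
        chain[b].append(head[b])
        top[b] = 0
    flat[head[b] * SMALL + top[b]] = value
    top[b] += 1

def drain(b):
    # Yield bucket b's values in insertion order, then return its blocks.
    for blk in chain[b]:
        count = top[b] if blk == head[b] else SMALL
        yield from flat[blk * SMALL : blk * SMALL + count]
        free.append(blk)
    chain[b].clear(); head[b] = -1; top[b] = 0
```

The `top[b] == SMALL` test plays the role of the `(offset & 2047) == 0` check, and `free` is the stack of available small buckets.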