20200826, 19:30  #1 
Feb 2016
3·5 Posts 
How to optimize the sieving stage of QS?
My latest version of QS is moving in fifth of the speed of light as it only 5 times slower than Tills version now, lol.
40% of my sieving code is spent on the below loop. Do you have an idea how to improve it? Code:
// that loop takes 40% of the performance for (Wheel wheel : wheels) { wheel.update(logs); } // Wheel update method public void update(byte[] logs) { for (; currentPosition < logs.length; currentPosition += prime) { // currentPosition is int logs[currentPosition] += log; } this.currentPosition = logs.length; } 
20200826, 19:50  #2 
"Robert Gerbicz"
Oct 2005
Hungary
17×83 Posts 

20200826, 20:17  #3  
"Ben"
Feb 2007
2×17×97 Posts 
Quote:
cp1 = currentPosition[i] cp2 = currentPosition[i+1] cp3 = currentPosition[i+2] cp4 = currentPosition[i+3] p1 = primes[i] p2 = primes[i+1] p3 = primes[i+2] p4 = primes[i+3] maxCP = MAX(cp1, cp2, cp3, cp4) maxP = MAX(p1, p2, p3, p4) for (; maxCP < logs.length; ) { logs[cp1] += log; logs[cp2] += log; logs[cp3] += log; logs[cp4] += log; cp1 += p1; cp2 += p2; cp3 += p3; cp4 += p4; maxCP += maxP; } // now finish the last part of each range with individual loops // finally update all positions for the next block currentPosition[i] = cp1  logs.length currentPosition[i+1] = cp2  logs.length currentPosition[i+2] = cp3  logs.length currentPosition[i+3] = cp4  logs.length Last fiddled with by bsquared on 20200826 at 20:24 

20200826, 20:22  #4 
"Ben"
Feb 2007
2·17·97 Posts 
Also, usually people initialize the sieve byte values to a cutoff:
for (i=0; i < sieve_length; i++) { sieve[i] = cutoff; } Then sieve just like normal, but subtract instead of add. Now, you can easily find locations that are ready for trial division by ANDing with a hibit mask. if (sieve[i] & 0x80) { trial_divide(i) } But most locations won't need trial division, so you can check and eliminate many at once by overloading the sieve to an 8byte unsigned long long type: if (((uint64_t)sieve[i] & 0x8080808080808080ULL) == 0) { continue;' } Makes scanning for sieve reports much faster. 
20200826, 21:36  #5  
Feb 2016
3·5 Posts 
Quote:
My sieving bound M is represented by an array of the size S where Sk = M for a small value of k(I use it as a square of M), I found it more memory efficient. Then my sieving loop looks like this: Code:
for (i=0;i<k;i++){ for (Wheel wheel : wheels) { wheel.update(logs); } }  Quote:
Quote:
Last fiddled with by Ilya Gazman on 20200826 at 21:37 

20200826, 21:41  #6  
Feb 2016
3×5 Posts 
Quote:
Code:
public void update(byte[] logs) { int length = logs.length; int currentPosition = this.currentPosition; byte log = this.log; int prime = this.prime; while (currentPosition < length) { logs[currentPosition] += log; currentPosition += prime; } this.currentPosition = currentPosition  length; } 

20200826, 22:03  #7  
"Ben"
Feb 2007
2·17·97 Posts 
Quote:
All this might be moot, as the bottleneck of the loop is probably memory access and not loop overhead. Unrolling 4x might actually help with memory access since some percentage of the time the writes might hit the same cache line (this is a good thing). Anyway, it might not be a big win but it might help. 

Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
Bypass Sieving Stage?  Sam Kennedy  Factoring  1  20161129 02:59 
Stage 1 with mprime/prime95, stage 2 with GMPECM  D. B. Staple  Factoring  2  20071214 00:21 
Need help to run stage 1 and stage 2 separately  jasong  GMPECM  9  20071025 22:32 
Stage 1 and stage 2 tests missing  Matthias C. Noc  PrimeNet  5  20040825 15:42 
A distributedcomputing project to optimize GIMPS FFT? Genetic algorithms  GP2  Software  10  20031209 20:41 