20200611, 15:38  #1 
Oct 2012
1010010_{2} Posts 
Optimising Sieving & Trial Division for SIQS
I finally managed to get a working implementation of SIQS
However it still has a long way to go in terms of efficiency and speed. My code can be found here: https://gitlab.com/SamKennedy/siqs/.../MPQS/siqs.cpp I'm not too concerned right now about the linear algebra step, since the sieving and trial division need a lot more work. I would be really grateful for any suggestions on optimising these steps. It currently takes 1 minute 53 seconds (on an i77700) to find enough smooth relations for the hardcoded N, but based on other implementation I've seen this should be possible in just a few seconds. I can't ignore a potential speed up of almost two orders of magnitude, however I'm out of ideas! (Edit: Tweaked the threshold, sieve size and factor base and got it down to approximately 70 seconds) Last fiddled with by Sam Kennedy on 20200611 at 16:36 
20200611, 17:43  #2 
"Tilman Neumann"
Jan 2016
Germany
3×139 Posts 
I really like your code because it is so concise.
To improve performance I'ld advice you to take timings of the subalgorithms (polynomial determination, sieving, smooth finding, linear algebra). This will be quite helpful to show you which part needs most improvement. Later on you'ld want to check the timings of parts of the subalgorithms (like sieve initialization, sieving, sieve collection)... 
20200611, 19:38  #3  
Oct 2012
2·41 Posts 
Quote:
Polynomials used: 8676 Trial divisions: 500849 Time spent sieving: 33.6548 seconds Time spent trial dividing: 24.2576 seconds Time spent changing polynomials: 2.61052 seconds Time spent solving roots: 0.0203665 seconds (Sieve size was 500k, total relations found was 477). 

20200611, 21:07  #4 
"Tilman Neumann"
Jan 2016
Germany
417_{10} Posts 
It seems that your trial dividing stage needs most improvement. Typically the sieve should use about 70% of the total CPU usage.
Apart from that, it is possible that your parameter adjustent is still far from the optimal. Your sieve array size seems to be too big, and you didn't report the prime base size ... 
20200611, 21:46  #5 
Mar 2003
1001011_{2} Posts 
You're spending a lot of time on trial division. This will get much worse with larger factor bases. Like R.D. Silverman recommended the other thread, use resieving instead of trial division on sieve report locations. You basically sieve again but instead of adding logs at each location, you check the existing value and if it's above the threshold, you save the factor. In C++ a map of vectors with the location as the key works well. When you're done, iterate over the map to get your relations, factors and all.

20200611, 22:52  #6  
Oct 2012
2·41 Posts 
Quote:
Factor base size: 4340 M: 30000 Threshold: 0.7 Polynomials used: 45838 Trial divisions: 90298 Time spent sieving: 6.2506 seconds Time spent trial dividing: 7.83779 seconds Time spent changing polynomials: 11.4207 seconds Time spent solving roots: 0.017022 seconds Quote:
After sieving, perform the sieve step again, checking for values above the threshold. If a value is above the threshold then you have a map of (Sieve Index) > (List of primes divisible at this location). Then in order to trial divide, rather than iterating over the sieve again, lookup the sieve index in the map and divide by the primes it maps to? 

20200612, 00:21  #7  
Mar 2003
1001011_{2} Posts 
Quote:
On a related note, it's also common to not use the smallest primes and prime powers in the main sieve. They are much more computeintensive. Just skip them but add some leeway in how you evaluate the thresholds when sieving is complete. You'll end up more false positives but, especially when using resieving, the benefits are enormous. My best achievement with my SIQS code was factoring a 114 digit composite in about 14 days using about 60 cores, if I recall correctly. I used my siever with someone else's Block Lanczos for the linear algebra. That was back in 2002. Like you, I worked mainly based on Contini's paper. Last fiddled with by dleclair on 20200612 at 00:36 

20200612, 18:43  #8 
Oct 2012
2×41 Posts 
I tried implementing a map of sieve indices to a set of indices in the factor base which are divisible at that location. It worked but was slower than scanning the sieve array and checking the sieve index against soln1 and soln2. I might revisit it at some point and see if I can get it implemented more efficiently.
I'm going to split the sieve array into smaller blocks which should hopefully fit in cache and see if that helps any. My polynomial changing routine could do with some optimisation now, I'm converting strings to mpz_t types which I can avoid all together by using an array of mpz_t. I've been thinking about partial relations and I have an idea, but I'm guessing there's a reason not to do this (or maybe it is an optimisation and I've misunderstood): If there are 5 almostsmooth remainders which share a common bigprime factor: A, B, C, D and E, I've been combining AB, AC, AD and AE, so I've been getting N  1 full relations. However there are actually (N * N1) / 2 unique pairings so I should be able to get 10 full relations instead of 4. I'm guessing there is a risk of duplicate relations? 
20200612, 21:48  #9 
"Tilman Neumann"
Jan 2016
Germany
3·139 Posts 

20200613, 16:14  #10  
"Tilman Neumann"
Jan 2016
Germany
641_{8} Posts 
Quote:
I pass the smallest solutions of (ax+b)^2kN == 0 (mod p) to the smooth finding stage. Lets call it x1Array and x2Array, each of size pCount. The smooth finding stage has two passes. The first pass identifies the primes p that need further consideration. Using some pseudocode, this looks like Code:
for each smooth candidate x proposed by the sieve { for pIndex = 0 to pCount1 { p = pArray[pIndex]; xModP = x%p; if xModP == x1Array[pIndex] or xModP == x2Array[pIndex] { add p to pass 2; } } } Pass 2 simply walks over the primes collected in pass 1 and divides Q(x) by those primes. Finally, with the 2/nlarge variants we need to factor the undivided rest that remains of Q(x). Last fiddled with by Till on 20200613 at 16:16 

20200613, 17:01  #11 
"Tilman Neumann"
Jan 2016
Germany
3·139 Posts 
Edit: Pass 2 must be done inside the loop iterating over all x, too...

Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
MPQS Trial Division Optimisation  Sam Kennedy  Factoring  13  20200610 16:14 
Sublinear complexity of trial division?  yih117  Math  5  20180202 02:49 
Mersenne trial division implementation  mathPuzzles  Math  8  20170421 07:21 
P95 trial division strategy  SPWorley  Math  8  20090824 23:26 
Need GMP trialdivision timings  ewmayer  Factoring  7  20081211 22:12 