mersenneforum.org  


Old 2020-06-11, 15:38   #1
Sam Kennedy
 
 
Oct 2012

122₈ Posts
Optimising Sieving & Trial Division for SIQS

I finally managed to get a working implementation of SIQS.

However, it still has a long way to go in terms of efficiency and speed.

My code can be found here: https://gitlab.com/SamKennedy/siqs/-.../MPQS/siqs.cpp

I'm not too concerned right now about the linear algebra step, since the sieving and trial division need a lot more work.

I would be really grateful for any suggestions on optimising these steps. It currently takes 1 minute 53 seconds (on an i7-7700) to find enough smooth relations for the hard-coded N, but based on other implementations I've seen, this should be possible in just a few seconds.

I can't ignore a potential speed-up of almost two orders of magnitude; however, I'm out of ideas!

(Edit: Tweaked the threshold, sieve size and factor base and got it down to approximately 70 seconds)

Last fiddled with by Sam Kennedy on 2020-06-11 at 16:36
Old 2020-06-11, 17:43   #2
Till
 
 
"Tilman Neumann"
Jan 2016
Germany

401 Posts

I really like your code because it is so concise.


To improve performance, I'd advise you to take timings of the sub-algorithms (polynomial determination, sieving, smooth finding, linear algebra). This will be quite helpful in showing you which part needs the most improvement.


Later on you'd want to check the timings of parts of the sub-algorithms (like sieve initialization, sieving, sieve collection)...
Old 2020-06-11, 19:38   #3
Sam Kennedy
 
 
Oct 2012

82₁₀ Posts

Quote:
Originally Posted by Till View Post
I really like your code because it is so concise.


To improve performance, I'd advise you to take timings of the sub-algorithms (polynomial determination, sieving, smooth finding, linear algebra). This will be quite helpful in showing you which part needs the most improvement.


Later on you'd want to check the timings of parts of the sub-algorithms (like sieve initialization, sieving, sieve collection)...
Thank you for this suggestion. I've just run the timings now, and I've also calculated how many polynomials are used and how many times the trial division routine is called, which seems like a lot compared to other implementations:

Polynomials used: 8676
Trial divisions: 500849
Time spent sieving: 33.6548 seconds
Time spent trial dividing: 24.2576 seconds
Time spent changing polynomials: 2.61052 seconds
Time spent solving roots: 0.0203665 seconds


(Sieve size was 500k, total relations found was 477).
Old 2020-06-11, 21:07   #4
Till
 
 
"Tilman Neumann"
Jan 2016
Germany

401 Posts

It seems that your trial dividing stage needs the most improvement. Typically the sieve should account for about 70% of the total CPU time.


Apart from that, it is possible that your parameter adjustment is still far from optimal.

Your sieve array size seems to be too big, and you didn't report the prime base size ...
Old 2020-06-11, 21:46   #5
dleclair
 
 
Mar 2003

3·5² Posts

You're spending a lot of time on trial division, and this will get much worse with larger factor bases. As R.D. Silverman recommended in the other thread, use re-sieving instead of trial division on sieve report locations. You basically sieve again, but instead of adding logs at each location, you check the existing value, and if it's above the threshold, you save the factor. In C++, a map of vectors with the location as the key works well. When you're done, iterate over the map to get your relations, factors and all.
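That strategy might be sketched like this (an illustrative sketch only; FBPrime, the root names, and the threshold convention are stand-ins for whatever the real siever uses):

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical factor-base entry: a prime p and its two sieve roots.
struct FBPrime { int p; int root1; int root2; };

// Re-sieve: walk the same arithmetic progressions as the main sieve,
// but instead of adding logs, record p at every location whose sieve
// value already exceeds the report threshold. (A real siever would
// skip the duplicate root when root1 == root2, e.g. for p = 2.)
std::map<int, std::vector<int>> resieve(const std::vector<uint8_t>& sieve,
                                        const std::vector<FBPrime>& fb,
                                        uint8_t threshold) {
    std::map<int, std::vector<int>> factors;  // sieve index -> primes dividing there
    const int M = static_cast<int>(sieve.size());
    for (const FBPrime& f : fb) {
        for (int r : {f.root1, f.root2}) {
            for (int x = r; x < M; x += f.p)
                if (sieve[x] >= threshold)
                    factors[x].push_back(f.p);  // save the factor, no division yet
        }
    }
    return factors;
}
```

Iterating over the returned map then gives each candidate relation together with its factor-base divisors, so no scan over the full sieve array is needed afterwards.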
Old 2020-06-11, 22:52   #6
Sam Kennedy
 
 
Oct 2012

2×41 Posts

Quote:
Originally Posted by Till View Post
It seems that your trial dividing stage needs the most improvement. Typically the sieve should account for about 70% of the total CPU time.


Apart from that, it is possible that your parameter adjustment is still far from optimal.

Your sieve array size seems to be too big, and you didn't report the prime base size ...
I've played around with the parameters, and I've also taken jasonp's suggestion of sieving over [0, 2*M] instead of [-M, M]; the time for sieving and trial division is down to 26 seconds now :)


Factor base size: 4340
M: 30000
Threshold: 0.7
Polynomials used: 45838
Trial divisions: 90298
Time spent sieving: 6.2506 seconds
Time spent trial dividing: 7.83779 seconds
Time spent changing polynomials: 11.4207 seconds
Time spent solving roots: 0.017022 seconds



Quote:
Originally Posted by dleclair View Post
You're spending a lot of time on trial division, and this will get much worse with larger factor bases. As R.D. Silverman recommended in the other thread, use re-sieving instead of trial division on sieve report locations. You basically sieve again, but instead of adding logs at each location, you check the existing value, and if it's above the threshold, you save the factor. In C++, a map of vectors with the location as the key works well. When you're done, iterate over the map to get your relations, factors and all.
I'm reading this at almost midnight, but I think I understand:
After sieving, perform the sieve step again, checking for values above the threshold. If a value is above the threshold, you build a map of (sieve index) |-> (list of primes dividing Q(x) at that location). Then, to trial divide, rather than iterating over the sieve again, look up the sieve index in the map and divide by the primes it maps to?
Old 2020-06-12, 00:21   #7
dleclair
 
 
Mar 2003

3×5² Posts

Quote:
Originally Posted by Sam Kennedy View Post
I'm reading this at almost midnight, but I think I understand:
After sieving, perform the sieve step again, checking for values above the threshold. If a value is above the threshold, you build a map of (sieve index) |-> (list of primes dividing Q(x) at that location). Then, to trial divide, rather than iterating over the sieve again, look up the sieve index in the map and divide by the primes it maps to?
Yes, that was my strategy. If I remember correctly (it's been a while), when re-sieving it's also beneficial to skip primes below some limit; the smallest primes take the longest to (re)sieve, of course. Just record the larger factors; then, when you finally validate the relation, divide by them and factor the remainder with trial division below the chosen limit, or maybe Pollard rho instead.

On a related note, it's also common to leave the smallest primes and prime powers out of the main sieve, since they are much more compute-intensive. Just skip them, but add some leeway in how you evaluate the thresholds when sieving is complete. You'll end up with more false positives, but, especially when using re-sieving, the benefits are enormous.
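One way to estimate the leeway is as the log contribution the skipped primes would have added to a smooth location. A rough sketch (the one-expected-power-per-prime estimate and the names here are assumptions to tune empirically, not a fixed rule):

```cpp
#include <cmath>
#include <vector>

// If primes below skip_bound are left out of the sieve, a genuinely
// smooth Q(x) is missing their log contributions, so the report
// threshold must be loosened by roughly that amount.
double adjusted_threshold(double base_threshold,
                          const std::vector<int>& factor_base,  // sorted ascending
                          int skip_bound) {
    double leeway = 0.0;
    for (int p : factor_base) {
        if (p >= skip_bound) break;
        leeway += std::log2(static_cast<double>(p));  // one expected power of p
    }
    return base_threshold - leeway;
}
```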

My best achievement with my SIQS code was factoring a 114 digit composite in about 14 days using about 60 cores, if I recall correctly. I used my siever with someone else's Block Lanczos for the linear algebra. That was back in 2002. Like you, I worked mainly based on Contini's paper.

Last fiddled with by dleclair on 2020-06-12 at 00:36
Old 2020-06-12, 18:43   #8
Sam Kennedy
 
 
Oct 2012

1010010₂ Posts

I tried implementing a map from sieve indices to the set of factor-base indices whose primes divide at that location. It worked, but it was slower than scanning the sieve array and checking the sieve index against soln1 and soln2. I might revisit it at some point and see if I can get it implemented more efficiently.

I'm going to split the sieve array into smaller blocks, which should hopefully fit in cache, and see if that helps. My polynomial-changing routine could also do with some optimisation now; I'm converting strings to mpz_t types, which I can avoid altogether by using an array of mpz_t.
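A cache-blocked sieve along those lines might look like this (a sketch with hypothetical names; block_size is the tunable to match against L1/L2 cache sizes, and the struct layout is made up for illustration):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical factor-base entry: prime, its rounded log, two roots.
struct FBPrime { int p; uint8_t logp; int root1; int root2; };

// Sieve [0, M) in blocks small enough to stay in cache. Each prime
// carries its next-hit offsets across block boundaries, so no work
// is repeated. 32-64 KiB is a common starting point for block_size.
void sieve_blocks(std::vector<uint8_t>& sieve, const std::vector<FBPrime>& fb,
                  int block_size = 32768) {
    const int M = static_cast<int>(sieve.size());
    std::vector<int> next1(fb.size()), next2(fb.size());
    for (size_t i = 0; i < fb.size(); ++i) {
        next1[i] = fb[i].root1;
        next2[i] = fb[i].root2;
    }
    for (int lo = 0; lo < M; lo += block_size) {
        const int hi = std::min(lo + block_size, M);
        for (size_t i = 0; i < fb.size(); ++i) {
            const int p = fb[i].p;
            int x = next1[i];
            while (x < hi) { sieve[x] += fb[i].logp; x += p; }
            next1[i] = x;  // resume here in the next block
            x = next2[i];
            while (x < hi) { sieve[x] += fb[i].logp; x += p; }
            next2[i] = x;
        }
    }
}
```

The payoff is that each block of the sieve array stays hot in cache while every prime visits it, instead of every prime streaming across the whole multi-megabyte array.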

I've been thinking about partial relations and I have an idea, but I'm guessing there's a reason not to do this (or maybe it is an optimisation and I've misunderstood):

If there are 5 almost-smooth remainders which share a common big-prime factor: A, B, C, D and E, I've been combining A-B, A-C, A-D and A-E, so I've been getting N - 1 full relations.

However, there are actually N(N-1)/2 unique pairings, so I should be able to get 10 full relations instead of 4. I'm guessing there is a risk of duplicate relations?
Old 2020-06-12, 21:48   #9
Till
 
 
"Tilman Neumann"
Jan 2016
Germany

401 Posts

Quote:
Originally Posted by Sam Kennedy View Post
I'm guessing there is a risk of duplicate relations?

Hi Sam. Certainly there is.
But for large enough numbers it is negligible, and treating it would not help performance.
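Incidentally, the extra pairings beyond the N - 1 can add nothing new over GF(2): combining two partials that share a large prime q means XOR-ing their exponent vectors (q² contributes 0 mod 2), so the pair (B, C) equals (A, B) XOR (A, C) and is linearly dependent on the pairs already collected. A toy check with made-up 8-bit exponent vectors:

```cpp
#include <bitset>

// Exponent vectors mod 2 over a toy 8-prime factor base (made-up data).
using ExpVec = std::bitset<8>;

// Combining two partials with a common large prime XORs their vectors.
ExpVec combine(const ExpVec& a, const ExpVec& b) { return a ^ b; }
```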
Old 2020-06-13, 16:14   #10
Till
 
 
"Tilman Neumann"
Jan 2016
Germany

401 Posts

Quote:
Originally Posted by dleclair View Post
You're spending a lot of time on trial division, and this will get much worse with larger factor bases. As R.D. Silverman recommended in the other thread, use re-sieving instead of trial division on sieve report locations. You basically sieve again, but instead of adding logs at each location, you check the existing value, and if it's above the threshold, you save the factor. In C++, a map of vectors with the location as the key works well. When you're done, iterate over the map to get your relations, factors and all.
I am using a different approach. Assume the prime base is stored in an array pArray of size pCount.
I pass the smallest solutions of (ax+b)^2 - kN == 0 (mod p) to the smooth-finding stage; let's call them x1Array and x2Array, each of size pCount.
The smooth finding stage has two passes. The first pass identifies the primes p that need further consideration.
Using some pseudo-code, this looks like
Code:
 
for each smooth candidate x proposed by the sieve {
    for pIndex = 0 to pCount-1 {
        p = pArray[pIndex];
        xModP = x%p;
        if xModP == x1Array[pIndex] or xModP == x2Array[pIndex] {
            add p to pass 2;
        }
    }
}
Pass 1 finds the primes we need to consider using integer mods instead of big-number divisions. This is pretty fast, and it can be made even faster for primes p > |x|, because then no modulus operation is needed at all.

Pass 2 simply walks over the primes collected in pass 1 and divides Q(x) by those primes.

Finally, with the 2-large-prime or n-large-prime variants, we need to factor the undivided remainder of Q(x).

Last fiddled with by Till on 2020-06-13 at 16:16
Old 2020-06-13, 17:01   #11
Till
 
 
"Tilman Neumann"
Jan 2016
Germany

401 Posts

Edit: Pass 2 must also be done inside the loop iterating over all x...
