![]() |
![]() |
#1 |
Oct 2012
1228 Posts |
![]()
I finally managed to get a working implementation of SIQS
![]() However it still has a long way to go in terms of efficiency and speed. My code can be found here: https://gitlab.com/SamKennedy/siqs/-.../MPQS/siqs.cpp I'm not too concerned right now about the linear algebra step, since the sieving and trial division need a lot more work. I would be really grateful for any suggestions on optimising these steps. It currently takes 1 minute 53 seconds (on an i7-7700) to find enough smooth relations for the hard-coded N, but based on other implementation I've seen this should be possible in just a few seconds. I can't ignore a potential speed up of almost two orders of magnitude, however I'm out of ideas! (Edit: Tweaked the threshold, sieve size and factor base and got it down to approximately 70 seconds) Last fiddled with by Sam Kennedy on 2020-06-11 at 16:36 |
![]() |
![]() |
![]() |
#2 |
"Tilman Neumann"
Jan 2016
Germany
433 Posts |
![]()
I really like your code because it is so concise.
To improve performance I'ld advice you to take timings of the sub-algorithms (polynomial determination, sieving, smooth finding, linear algebra). This will be quite helpful to show you which part needs most improvement. Later on you'ld want to check the timings of parts of the sub-algorithms (like sieve initialization, sieving, sieve collection)... |
![]() |
![]() |
![]() |
#3 | |
Oct 2012
2×41 Posts |
![]() Quote:
Polynomials used: 8676 Trial divisions: 500849 Time spent sieving: 33.6548 seconds Time spent trial dividing: 24.2576 seconds Time spent changing polynomials: 2.61052 seconds Time spent solving roots: 0.0203665 seconds (Sieve size was 500k, total relations found was 477). |
|
![]() |
![]() |
![]() |
#4 |
"Tilman Neumann"
Jan 2016
Germany
1B116 Posts |
![]()
It seems that your trial dividing stage needs most improvement. Typically the sieve should use about 70% of the total CPU usage.
Apart from that, it is possible that your parameter adjustent is still far from the optimal. Your sieve array size seems to be too big, and you didn't report the prime base size ... |
![]() |
![]() |
![]() |
#5 |
Mar 2003
7·11 Posts |
![]()
You're spending a lot of time on trial division. This will get much worse with larger factor bases. Like R.D. Silverman recommended the other thread, use re-sieving instead of trial division on sieve report locations. You basically sieve again but instead of adding logs at each location, you check the existing value and if it's above the threshold, you save the factor. In C++ a map of vectors with the location as the key works well. When you're done, iterate over the map to get your relations, factors and all.
|
![]() |
![]() |
![]() |
#6 | ||
Oct 2012
2·41 Posts |
![]() Quote:
Factor base size: 4340 M: 30000 Threshold: 0.7 Polynomials used: 45838 Trial divisions: 90298 Time spent sieving: 6.2506 seconds Time spent trial dividing: 7.83779 seconds Time spent changing polynomials: 11.4207 seconds Time spent solving roots: 0.017022 seconds Quote:
After sieving, perform the sieve step again, checking for values above the threshold. If a value is above the threshold then you have a map of (Sieve Index) |-> (List of primes divisible at this location). Then in order to trial divide, rather than iterating over the sieve again, lookup the sieve index in the map and divide by the primes it maps to? |
||
![]() |
![]() |
![]() |
#7 | |
Mar 2003
7×11 Posts |
![]() Quote:
On a related note, it's also common to not use the smallest primes and prime powers in the main sieve. They are much more compute-intensive. Just skip them but add some leeway in how you evaluate the thresholds when sieving is complete. You'll end up more false positives but, especially when using re-sieving, the benefits are enormous. My best achievement with my SIQS code was factoring a 114 digit composite in about 14 days using about 60 cores, if I recall correctly. I used my siever with someone else's Block Lanczos for the linear algebra. That was back in 2002. Like you, I worked mainly based on Contini's paper. Last fiddled with by dleclair on 2020-06-12 at 00:36 |
|
![]() |
![]() |
![]() |
#8 |
Oct 2012
1228 Posts |
![]()
I tried implementing a map of sieve indices to a set of indices in the factor base which are divisible at that location. It worked but was slower than scanning the sieve array and checking the sieve index against soln1 and soln2. I might revisit it at some point and see if I can get it implemented more efficiently.
I'm going to split the sieve array into smaller blocks which should hopefully fit in cache and see if that helps any. My polynomial changing routine could do with some optimisation now, I'm converting strings to mpz_t types which I can avoid all together by using an array of mpz_t. I've been thinking about partial relations and I have an idea, but I'm guessing there's a reason not to do this (or maybe it is an optimisation and I've misunderstood): If there are 5 almost-smooth remainders which share a common big-prime factor: A, B, C, D and E, I've been combining A-B, A-C, A-D and A-E, so I've been getting N - 1 full relations. However there are actually (N * N-1) / 2 unique pairings so I should be able to get 10 full relations instead of 4. I'm guessing there is a risk of duplicate relations? |
![]() |
![]() |
![]() |
#9 |
"Tilman Neumann"
Jan 2016
Germany
433 Posts |
![]() |
![]() |
![]() |
![]() |
#10 | |
"Tilman Neumann"
Jan 2016
Germany
433 Posts |
![]() Quote:
I pass the smallest solutions of (ax+b)^2-kN == 0 (mod p) to the smooth finding stage. Lets call it x1Array and x2Array, each of size pCount. The smooth finding stage has two passes. The first pass identifies the primes p that need further consideration. Using some pseudo-code, this looks like Code:
for each smooth candidate x proposed by the sieve { for pIndex = 0 to pCount-1 { p = pArray[pIndex]; xModP = x%p; if xModP == x1Array[pIndex] or xModP == x2Array[pIndex] { add p to pass 2; } } } Pass 2 simply walks over the primes collected in pass 1 and divides Q(x) by those primes. Finally, with the 2/n-large variants we need to factor the undivided rest that remains of Q(x). Last fiddled with by Till on 2020-06-13 at 16:16 |
|
![]() |
![]() |
![]() |
#11 |
"Tilman Neumann"
Jan 2016
Germany
6618 Posts |
![]()
Edit: Pass 2 must be done inside the loop iterating over all x, too...
|
![]() |
![]() |
![]() |
Thread Tools | |
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
MPQS Trial Division Optimisation | Sam Kennedy | Factoring | 13 | 2020-06-10 16:14 |
Sublinear complexity of trial division? | yih117 | Math | 5 | 2018-02-02 02:49 |
Mersenne trial division implementation | mathPuzzles | Math | 8 | 2017-04-21 07:21 |
P95 trial division strategy | SPWorley | Math | 8 | 2009-08-24 23:26 |
Need GMP trial-division timings | ewmayer | Factoring | 7 | 2008-12-11 22:12 |