View Single Post
Old 2020-07-12, 16:06   #37
Till's Avatar
"Tilman Neumann"
Jan 2016

3·139 Posts

Originally Posted by bsquared View Post
I found this thesis comparing Kleinjung's algorithm with traditional SIQS:

The re-explanation of the algorithm is very helpful. Lots of comparisons are presented on up to 480-bit discriminants. One thing it appears to be lacking, however, is a comparison of a well-tuned traditional siqs with a well-tuned Kleinjung approach. Of course, traditional siqs will be very slow in comparison with Kleinjung when the parameters are the same (e.g., block size 2048). And vice versa (Kleinjung does not do well with large sieve area). I am most curious to see how they compare when each are tuned to their respective strengths.

Still, it is a very nice thesis report. I can see better now that Kleinjung's algorithm is compatible with the rest of siqs factorization (e.g., polynomial generation, filtering and matrix steps), which might ease integration of the new approach with existing implementations.
Instead of comparing a well-tuned Kleinjung SIQS to a well-tuned polynomial-major sieve, I did an intermediate step:
Compare a polynomial-major SIQS to a prime-major SIQS. (both single-threaded)

The conversion of my old (polynomial-major) SIQS to a prime-major SIQS took two steps:
1. Compute the solution arrays for all (a, b)-polynomials of one a-parameter outside the sieve.
2. In the sieve, have outer loop over primes, inner loop over polynomials.

At first, performance was very bad. I had to switch two parameters to improve that:
1. Increase the sieve array size.
2. Reduce qCount = the number of q's whose product gaves the a-parameter.

These measures improved performance a lot, but the result is still bad.
Here is a comparison to the polynomial-major sieve:

200 bit: Poly-mj      4s, 789ms, Prime-mj      11s, 224ms, ratio = 2.34370
210 bit: Poly-mj      8s, 22ms,  Prime-mj      21s, 269ms, ratio = 2.65133
220 bit: Poly-mj     19s, 609ms, Prime-mj      55s, 826ms, ratio = 2.84695
230 bit: Poly-mj     40s, 232ms, Prime-mj  2m, 12s, 960ms, ratio = 3.30483
240 bit: Poly-mj     55s, 305ms, Prime-mj  3m, 12s, 622ms, ratio = 3.48290, Prime-mj sieveArraySize = 119552
250 bit: Poly-mj 2m, 15s, 526ms, Prime-mj  8m, 11s, 63ms,  ratio = 3.62338, Prime-mj sieveArraySize = 159744
260 bit: Poly-mj 4m, 49s, 561ms, Prime-mj 43m, 30s, 551ms, ratio = 9.01554, Prime-mj sieveArraySize = 206848
All timings of the prime-major SIQS were taken with qCount=6; using qCount=7 at 260 bit took 47m, 17s, 109ms.

For smaller N, the performance gap stems mostly from additional array accesses and another loop for the largest primes that is not necessary in the polynomial-major version.
The performance break-in at 260 bit might point out too many L3 cache misses.

Kleinjungs SIQS might remedy many of the problems of a silly prime-major sieve. But maybe we can see here what the challenge is.


@Jason: It seems that Colin Percivals stuff has never been published? The only thing I found were some comments in another forum...

Last fiddled with by Till on 2020-07-12 at 16:24
Till is offline   Reply With Quote