Quote:
Originally Posted by bsquared
I found this thesis comparing Kleinjung's algorithm with traditional SIQS:
https://prism.ucalgary.ca/bitstream/...=2&isAllowed=y
The reexplanation of the algorithm is very helpful. Lots of comparisons are presented on up to 480bit discriminants. One thing it appears to be lacking, however, is a comparison of a welltuned traditional siqs with a welltuned Kleinjung approach. Of course, traditional siqs will be very slow in comparison with Kleinjung when the parameters are the same (e.g., block size 2048). And vice versa (Kleinjung does not do well with large sieve area). I am most curious to see how they compare when each are tuned to their respective strengths.
Still, it is a very nice thesis report. I can see better now that Kleinjung's algorithm is compatible with the rest of siqs factorization (e.g., polynomial generation, filtering and matrix steps), which might ease integration of the new approach with existing implementations.

Instead of comparing a welltuned Kleinjung SIQS to a welltuned polynomialmajor sieve, I did an intermediate step:
Compare a polynomialmajor SIQS to a primemajor SIQS. (both singlethreaded)
The conversion of my old (polynomialmajor) SIQS to a primemajor SIQS took two steps:
1. Compute the solution arrays for all (a, b)polynomials of one aparameter outside the sieve.
2. In the sieve, have outer loop over primes, inner loop over polynomials.
At first, performance was very bad. I had to switch two parameters to improve that:
1. Increase the sieve array size.
2. Reduce qCount = the number of q's whose product gaves the aparameter.
These measures improved performance a lot, but the result is still bad.
Here is a comparison to the polynomialmajor sieve:
Code:
200 bit: Polymj 4s, 789ms, Primemj 11s, 224ms, ratio = 2.34370
210 bit: Polymj 8s, 22ms, Primemj 21s, 269ms, ratio = 2.65133
220 bit: Polymj 19s, 609ms, Primemj 55s, 826ms, ratio = 2.84695
230 bit: Polymj 40s, 232ms, Primemj 2m, 12s, 960ms, ratio = 3.30483
240 bit: Polymj 55s, 305ms, Primemj 3m, 12s, 622ms, ratio = 3.48290, Primemj sieveArraySize = 119552
250 bit: Polymj 2m, 15s, 526ms, Primemj 8m, 11s, 63ms, ratio = 3.62338, Primemj sieveArraySize = 159744
260 bit: Polymj 4m, 49s, 561ms, Primemj 43m, 30s, 551ms, ratio = 9.01554, Primemj sieveArraySize = 206848
All timings of the primemajor SIQS were taken with qCount=6; using qCount=7 at 260 bit took 47m, 17s, 109ms.
For smaller N, the performance gap stems mostly from additional array accesses and another loop for the largest primes that is not necessary in the polynomialmajor version.
The performance breakin at 260 bit might point out too many L3 cache misses.
Kleinjungs SIQS might remedy many of the problems of a silly primemajor sieve. But maybe we can see here what the challenge is.

@Jason: It seems that Colin Percivals stuff has never been published? The only thing I found were some comments in another forum...