 mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Factoring (https://www.mersenneforum.org/forumdisplay.php?f=19)

 tabidots 2019-05-16 02:45

SIQS diagnostics

Hello all, this is my first post on the forum.

I recently got interested in factoring while implementing some number theory-related programs in Clojure to teach myself the language. As far as algorithms go, I have managed to get as far as MPQS. My implementation is hardly competitive (the speed of the Alperton calculator blows my mind), but for a language that is not exactly known for speed, I'm quite satisfied with it right now.

My MPQS implementation is based on [URL="https://codegolf.stackexchange.com/a/9088/86875"]this Python implementation[/URL] (and it's faster than the CPython version—don't know about PyPy) and a bunch of papers.

I recently tried to "upgrade" my MPQS to SIQS using Contini's paper and [URL="https://github.com/skollmann/PyFactorise/blob/master/factorise.py"]this Python implementation[/URL], but it's very slow.

So far my MPQS beats the Python MPQS. But the Python SIQS (which was programmed by someone else) is slower than that, and my SIQS is slower still, for the same numbers. For example, a number that my MPQS could factor in a couple of seconds takes my SIQS 30 seconds or more.

In both the Clojure and Python implementations, it seems like it has to run through way too many polynomials to pick up enough smooth relations. It's not uncommon for an entire cycle through a polynomial with a unique "a" and all of its "b"s to generate no smooth relations.

If SIQS is supposed to be massively faster than MPQS, shouldn't just a couple of "brand-new" polynomials be enough? (Brand-new meaning a new "a" coefficient.)

I did come across [URL="https://www.mersenneforum.org/showthread.php?t=10797"]this thread[/URL] where people mentioned that there was a typo in Contini's thesis. However, it is not clear to me what the typo actually is (and when I fiddled with the signs, it didn't make a difference).

So my basic question is, [B]what kind of gotchas should I look for with the polynomials[/B] to make sure they can capture as many smooth relations as possible? I have already verified that b^2 or (a - b)^2 is congruent to N mod a for each b.

Other details of my implementation: In the sieving stage, I am ignoring small primes below some threshold, which is adjusted to compensate. The parameters for this were taken from the Python MPQS and they contribute to a significant speedup in SIQS as well.

The Python SIQS uses a Monte Carlo method to generate the factors of "a", with the candidate pool restricted to primes between 400 and 4000. I've kept the same parameters in my implementation (tinkering with these bounds seems to make it take more time to generate an acceptable "a"—Contini's paper mentions that his minimum prime factor of "a" was >2000, but this seems crazy to me).

As a side note, I'm testing on semiprimes that are considered "small" in the factoring world—40 to 50 digits—because I don't have time to sit around all day waiting for my program to finish 😅

Thanks!

 henryzz 2019-05-16 08:15

The aim with SIQS is to make switching polynomial very fast such that many more polynomials can be used than MPQS. However, I believe that my implementation still spends 10% of its time switching A. The more factors you have in A the less you will need to switch B, however, the factors get smaller the more you have and are excluded from sieving so they can't get too small.
Are you using large primes?
At 50 digits I use about 11 As. This increases rapidly from here as numbers get bigger. Unless you are spending all your time generating them I wouldn't worry about having a few As. Each A had 6 factors allowing for 64 Bs per A.

 tabidots 2019-05-16 09:49

[QUOTE]Are you using large primes?[/QUOTE]
For the factors of A, I am constraining them to between 400 and 4000, following the Python implementation I referenced.

[QUOTE]At 50 digits I use about 11 As.[/QUOTE]
My SIQS is picking up so few smooths per A that even for this 40-digit number (4493699349619351139582065363011367885289), 133 As are required to come up with the ~500 smooths required. All told it takes 75 seconds to factor (compared to my MPQS, 2.5s). I could further try to optimize how A is generated, but as your example illustrates, clearly 133 As should not be required for a 40-digit number!

 tabidots 2019-05-16 10:01

[QUOTE]Each A had 6 factors allowing for 64 Bs per A.[/QUOTE]

Shouldn't that be 32 Bs per A given 6 factors? (2^(6-1) = 32)

For that 40-digit number I mentioned above, my As have 5 factors, generating 16 bs per A. (2^(5-1) = 16)

Contini's paper says:

[QUOTE]To generate a polynomial, we need to find `b` satisfying `b^2 = N mod a`. There are actually 2^s distinct `b (mod a)` that satisfy this, but only half of the b's will be used since `g_{a,b}(x)` gives the same residues as `g_{a,-b}(x)`. Hence we can get 2^{s-1} polynomials.[/QUOTE]

 henryzz 2019-05-16 11:10

I was wondering the same thing about the number of Bs per A. I will check that when I have more time. It would be nice if this turns into a factor of 2 improvement. My implementation never goes below 6 factors per A. The factors get smaller but it is worth it to need less As. Eventually MPQS or even QS get faster as the numbers get smaller due to this sort of issue.

The amount of As required may be fairly implementation dependent. For example my implementation is unusual in that it stores the sum of the logs of the found factors in enough precision that it is possible to work out the remaining factor(as long as it is <~53 bits) without having to refactor. This saves some time but uses more memory which means less cache efficiency.

The large primes I referenced were remaining prime factors of x^2-n larger than the factor base bound. These cost very little to collect and improve yield. A third of my relations are made by combining these.
For 50 digits, my factor base bound is 60k but large primes will be accepted upto 2^23. For each polynomial I sieve a range of 600k. My SIQS is 2.3s for 50 digits.

I think your factor base bounds is smaller than mine. I need around 1500 relations for 40 digits to your ~500. Again this is implementation dependent.

 tabidots 2019-05-16 11:34

[QUOTE]The large primes I referenced were remaining prime factors of x^2-n larger than the factor base bound.[/QUOTE]

Ah, right. Yes, I'm doing that, but have not set a limit for accepting them. Any value that is not 1 after trial division is saved, and future remainders are checked against it, and merged if they match.

[QUOTE]My implementation never goes below 6 factors per A. The factors get smaller but it is worth it to need less As.[/QUOTE]

A greater number of A-factors decreases the number of As necessary?

My implementation filters the factor base for primes between 400 and 4000 (call them "candidates"), and accumulates random candidates until their product is greater than the "tipping point" (sqrt((max-candidate - min-candidate) / 2)). It generates these until there are 30 that fall within plus-or-minus 10% of the ideal A, and chooses the one with the least error. In practice, the error of the A is 1% or less. It's not the fastest method, but the results are good (insofar as the obtained A is close to the ideal A) and the code is very clean—this sort of thing is quite legible and idiomatic in Clojure.

I tried an alternate approach (as described in some other paper) to generate lexicographic combinations for the first `s-2` A-factors and find the best-fitting product log for the last pair of A-factors. But it was much slower and very complicated.

[QUOTE]For 50 digits, my factor base bound is 60k... For each polynomial I sieve a range of 600k.[/QUOTE]

Wow, those numbers are very different from mine. (To be fair, the Python MPQS I adapted used parameters that are quite different to what I read in the papers, but the weird parameters worked initially and the ones in the papers didn't).

I just generated a random 50-digit semiprime (from two equal-sized factors) (55720182748350551450373114705225729163236062899069). My MPQS generates these parameters:

B = 5*(log_10(N))^2 ~= 12373.32, which ends up yielding a factor base of...
#F = 725 primes, before including -1, so looking for 727 relations
M = 60 * #F = 43500 (so sieving over a range of 87000 in total b/c positive & negative)

75s with MPQS.

For SIQS I am using the #F parameter from msieve, and using that rather than B to constrain the factor base. So, for a 50-digit number, that's 1200 primes (excluding -1). Many As are not generating smooths—sometimes even 10 in a row go by without any new smooths.

Edited to add: I just switched out my parameters for yours in my MPQS and it did not make a huge difference—71s vs 75s—so that is indeed implementation-dependent. Interesting!

 jasonp 2019-05-16 12:08

tabidots is correct, k factors in A will generate 2^(k-1) polynomials to sieve. Each factor has 2 roots (a positive and negative root) to contribute to each B, but if you generated all 2^k combinations the B values from all the negative roots are just the negative of the B values from the positive roots, so you would find the same relations twice. So in practice you can pick the first root, force it to be positive, then enumerate all combinations of the other k-1 roots.

Contini showed that if the time for switching A values was neglected and the size of polynomial coefficients and sieving interval was chosen optimally, you would expect to accumulate relations twice as fast as with MPQS, so that's the maximum speedup I would expect. The lower overhead allows the sieving interval to be much smaller for SIQS, which is what improves the probability of finding relations (the size of numbers to factor increases with larger sieving interval).

Debugging the sieving when only a few relations are found is extremely difficult. I would start with maintaining the full value of each A, B and C, verifying that repeated B do not appear, then finding the values mod each prime in the factor base and verifying that your starting sieve roots (after all setup is complete) cause the polynomial to be zero. Note also that for MPQS all B are positive but for SIQS it is possible to have negative B.

This process is what led me to believe that Contini's thesis has a typo in the arithmetic for computing the next B given the previous one; instead of adding the next root you have to subtract it.

 tabidots 2019-05-16 12:13

[QUOTE]Debugging the sieving when only a few relations are found is extremely difficult. I would start with maintaining the full value of each A, B and C, verifying that repeated B do not appear, then finding the values mod each prime in the factor base and verifying that your starting sieve roots (after all setup is complete) cause the polynomial to be zero. Note also that for MPQS all B are positive but for SIQS it is possible to have negative B. [/QUOTE]

Perfect, thanks! I will give this a shot later. I had no idea where to start with debugging this.

 henryzz 2019-05-16 18:58

[QUOTE=jasonp;516904]tabidots is correct, k factors in A will generate 2^(k-1) polynomials to sieve. Each factor has 2 roots (a positive and negative root) to contribute to each B, but if you generated all 2^k combinations the B values from all the negative roots are just the negative of the B values from the positive roots, so you would find the same relations twice. So in practice you can pick the first root, force it to be positive, then enumerate all combinations of the other k-1 roots.

Contini showed that if the time for switching A values was neglected and the size of polynomial coefficients and sieving interval was chosen optimally, you would expect to accumulate relations twice as fast as with MPQS, so that's the maximum speedup I would expect. The lower overhead allows the sieving interval to be much smaller for SIQS, which is what improves the probability of finding relations (the size of numbers to factor increases with larger sieving interval).

Debugging the sieving when only a few relations are found is extremely difficult. I would start with maintaining the full value of each A, B and C, verifying that repeated B do not appear, then finding the values mod each prime in the factor base and verifying that your starting sieve roots (after all setup is complete) cause the polynomial to be zero. Note also that for MPQS all B are positive but for SIQS it is possible to have negative B.

This process is what led me to believe that Contini's thesis has a typo in the arithmetic for computing the next B given the previous one; instead of adding the next root you have to subtract it.[/QUOTE]

This leads me to wonder why I am not drowning in duplicate relations. I do get some but very few. I was assuming this was me finding x^2-N and (-x)^2-N
I wonder if this is something to do with me not changing the centre of my sieving range when I change B.
I do end up having both B and -B in my list of Bs. Is it these you are suggesting should have the same relations?
I end up sieving twice as many As if I don't sieve the second half of the Bs for each A. I do completely loose all duplicates though. Something strange is going on.

 henryzz 2019-05-16 19:33

[QUOTE=henryzz;516937]This leads me to wonder why I am not drowning in duplicate relations. I do get some but very few. I was assuming this was me finding x^2-N and (-x)^2-N
I wonder if this is something to do with me not changing the centre of my sieving range when I change B.
I do end up having both B and -B in my list of Bs. Is it these you are suggesting should have the same relations?
I end up sieving twice as many As if I don't sieve the second half of the Bs for each A. I do completely loose all duplicates though. Something strange is going on.[/QUOTE]

I have worked out how my sieving is overlapping only slightly for -B and B. My sieve bound has ended up being selected such that at the bottom of the sieve range x is a very small negative number in (Ax+b)^2-N. Those negative x end up overlapping with the positive x for -B. It looks like I need to sieve only the Bs with the same sign as the one I use to set the centre of my sieve range.Unfortunately this ends up being a random selection of my list of Bs so switching from one B to the next has on average just become twice as expensive as I can't really change the order to fix this. I suppose I should look at recentring the sieve range each time and then I can just use the first 2^(n-1) Bs. As a positive of this it looks like I might be able to raise my sieve range without half the extra being duplicated now.

 bsquared 2019-05-16 20:06

[QUOTE=henryzz;516891] The aim with SIQS is to make switching polynomial very fast such that many more polynomials can be used than MPQS. However, I believe that my implementation still spends 10% of its time switching A.
[/QUOTE]

Is that time spent just building new A polys or does it also include setup time for the new poly (e.g., computing A^-1 mod p for all factor base p)? 10% seems high either way but if that's just for building new A polys then it seems *really* high. In contrast I usually spend < 3% total for building new A's and all of the setup for new A's.

[QUOTE=henryzz;516891]
At 50 digits I use about 11 As. This increases rapidly from here as numbers get bigger. Unless you are spending all your time generating them I wouldn't worry about having a few As. Each A had 6 factors allowing for 64 Bs per A . [/QUOTE]

Seems reasonable given your other parameters. I typically see anywhere from 50 to 150 A's at 50 digits, using 6 factors per A, but I use a far smaller factor base and sieve interval. It is all a tradeoff.

[QUOTE=tabidots;516881]

So my basic question is, what kind of gotchas should I look for with the polynomials to make sure they can capture as many smooth relations as possible? I have already verified that b^2 or (a - b)^2 is congruent to N mod a for each b.

[/QUOTE]

I would ask, how are you determining smoothness exactly? In other words, the sieve value above (or below) which you try to find a relation. If your polynomials and roots are good then maybe it is as simple as tweaking your smoothness bound so as to generate more potential relation reports.

The value I use is obtained from a bunch of empirically determined magic numbers, loosely related to Contini's suggestion along the lines of:
"compute the number of bits in M/2*sqrt(N/2), then subtract some slack". M is the size of your sieve interval.

All times are UTC. The time now is 09:30.