I was wondering the same thing about the number of Bs per A. I will check that when I have more time. It would be nice if this turns into a factor of 2 improvement. My implementation never goes below 6 factors per A. The factors get smaller but it is worth it to need less As. Eventually MPQS or even QS get faster as the numbers get smaller due to this sort of issue.
The amount of As required may be fairly implementation dependent. For example my implementation is unusual in that it stores the sum of the logs of the found factors in enough precision that it is possible to work out the remaining factor(as long as it is <~53 bits) without having to refactor. This saves some time but uses more memory which means less cache efficiency.
The large primes I referenced were remaining prime factors of x^2n larger than the factor base bound. These cost very little to collect and improve yield. A third of my relations are made by combining these.
For 50 digits, my factor base bound is 60k but large primes will be accepted upto 2^23. For each polynomial I sieve a range of 600k. My SIQS is 2.3s for 50 digits.
I think your factor base bounds is smaller than mine. I need around 1500 relations for 40 digits to your ~500. Again this is implementation dependent.
