20190722, 15:09  #1 
"Ben"
Feb 2007
2^{2}·3^{2}·7·13 Posts 
three large primes
Thought about posting this in the Happy Me thread, but didn't want to pollute it with
any discussion that may ensue. I've recently completed implementing the triple large prime variation for yafu's siqs. I think it is really neat, and along the way sparked some good discussion about the best way to process larger residues. As a weekend stress test, and due to readily available parameterization info, I decided to refactor the C135 from Xilman's paper. The job actually finished and produced the factorization. Yay! Code:
==== sieve params ==== n = 135 digits, 446 bits factor base: 550016 primes (max prime = 17158543) single large prime cutoff: 1073741824 (2^30) double large prime range from 294415597882849 to 18014398509482000 triple large prime range from 5051742696143573745664 to 154742504910672259484483584 allocating 17 large prime slices of factor base buckets hold 2048 elements large prime hashtables have 8912896 bytes using AVX512 enabled 32k sieve core sieve interval: 32 blocks of size 32768 polynomial A has ~ 18 factors using multiplier of 1 using SPV correction of 21 bits, starting at offset 32 trial factoring cutoff at 127 bits ==== sieving in progress (256 threads): 550080 relations needed ==== last: 588203, now: 75365 full, 957982 slp, 2002324 dlp (2037katt, 15316kprp), 9687124 tlp (986318katt, 1416158kprp), (44 r/sec) sieving required 917086811 total polynomials (27829 'A' polynomials) trial division touched 1734948798 sieve locations out of 1923270439862272 brentrho: 159 failures, 2037449 attempts, 135291144 outside range, 15316534 prp, 2002324 useful total reports = 1734948798, total surviving reports = 153678474 total blocks sieved = 2858995264, avg surviving reports per block = 0.05 TLP filtering time = 2543.1387 seconds. QS elapsed time = 266753.9892 seconds. ==== post processing stage (msieve1.38) ==== read 12722795 relations begin singleton removal with 12722795 relations now at pass 0: 12722795 relations now at pass 1: 5763606 relations now at pass 2: 4099820 relations now at pass 3: 3545870 relations now at pass 4: 3330790 relations now at pass 5: 3240557 relations now at pass 6: 3201113 relations now at pass 7: 3183243 relations now at pass 8: 3175295 relations now at pass 9: 3171772 relations now at pass 10: 3170185 relations now at pass 11: 3169442 relations now at pass 12: 3169148 relations now at pass 13: 3169006 relations now at pass 14: 3168943 relations now at pass 15: 3168908 relations now at pass 16: 3168895 relations now at pass 17: 3168890 relations reduce to 3168890 relations in 18 passes attempting to read 3168890 relations recovered 3168803 relations recovered 3163236 polynomials commencing rbp/pbr table initialization found 75365 full relations pbr table size = 3168803 rbp table size = 2562969 commencing cycle formation pass 1 commencing cycle formation pass 2 commencing cycle formation pass 3 commencing cycle formation pass 4 commencing cycle formation pass 5 commencing cycle formation pass 6 commencing cycle formation pass 7 commencing cycle formation pass 8 commencing cycle formation pass 9 commencing cycle formation pass 10 commencing cycle formation pass 11 commencing cycle formation pass 12 expected 605834 cycles, found 605834 cycles found 530469 cycles from partial relations maximum cycle length = 776 82.5% of cycles from partials involve a tlp found 605834 cycles from 3168803 relations in 12 passes distribution of cycle lengths: length 1 : 75365 length 2 : 37018 length 3 : 24248 length 4 : 19387 length 5 : 16863 length 6 : 14880 length 7 : 13851 length 9+: 404222 largest cycle: 776 relations building matrix with 605834 columns matrix is 550016 x 605834 (1070.5 MB) with weight 275787690 (455.22/col) sparse part has weight 275787690 (455.22/col) filtering completed in 4 passes matrix is 548649 x 548713 (785.0 MB) with weight 201395563 (367.03/col) sparse part has weight 201395563 (367.03/col) saving the first 48 matrix rows for later matrix is 548601 x 548713 (725.5 MB) with weight 193262029 (352.21/col) sparse part has weight 184707217 (336.62/col) matrix includes 64 packed rows using block size 65536 for processor cache size 16896 kB commencing Lanczos iteration memory use: 519.5 MB lanczos halted after 8678 iterations (dim = 548596) recovered 12 nontrivial dependencies Lanczos elapsed time = 3164.9000 seconds. Sqrt elapsed time = 11.7200 seconds. SIQS elapsed time = 3235.4824 seconds. ***factors found*** P66 = 337779774700456816455577092228603627733197301999086530154776370553 P69 = 346129173115857975809709331088291920685569205287238835924196565083957 Of course, most of that improvment is advances in hardware. And also of course, this size number is best done by NFS. I have done many many smaller jobs as well, learning some things about the parameters that influence tlp jobs. I was hoping that it would be effective even at smaller (90100 digit) inputs. But unfortunately, so far it is not really any faster than the double large prime variation. The trouble seems to be that in order to get the critical mass of TLP's necessary to see an explosion of cycles, one has to greatly increase the number of sieve reports processed by trial division and the TLP cofactoring code. This tends to slow things down to the point where the DLP implementation can overtake it. I ran a C120 with both DLP and TLP and the DLP was faster. I will continue to run experiments, and I hope to have the code in a state where I can check it in soon. 
20190725, 18:56  #2 
Just call me Henry
"David"
Sep 2007
Cambridge (GMT)
1011001000000_{2} Posts 
Where is the majority of time spent here? Is it the trial division or the rho?
I am a little surprised that you used rho given your recent posts about your super fast ecm routine. 
20190725, 20:39  #3  
"Tilman Neumann"
Jan 2016
Germany
2·7·29 Posts 
Quote:
Sorry, but taking Moore's law into account your result does not look too impressive. Original computation time (2001): 231 days ~= 5544 hours. New Computation time (2019): 75 hours Assuming that computing speed doubles every 2 years we have Original computation time/2^9 = 10.8 hours 

20190726, 05:15  #4  
"Ben"
Feb 2007
2^{2}·3^{2}·7·13 Posts 
Quote:
So far I have been disappointed by the results as it doesn't seem to be any faster than DLP. I'm hoping that I just haven't yet figured out the proper parameterizations. Smaller factor base bounds and lower trial division thresholds seem to help, to a point. I would also like to try jasonp's batch factoring/gcd method. 

20190726, 05:51  #5  
"Ben"
Feb 2007
2^{2}×3^{2}×7×13 Posts 
Quote:
More concerning to me is the fact that the TLP doesn't seem any better than DLP at the moment. Its very possible that I'm doing something wrong still, some bug or oversight that is limiting the TLP speed, or maybe it is parameters. Or maybe it is a matter of trying a large enough number. If I were to try something big enough to take the KNL 230 days to do, then maybe TLP would be the previously observed 1.7x faster than DLP. How big would that have to be? About 6 doublings of 75 hours, which at 3 digits per doubling (rough QS rule of thumb) is about a C153. That seems ridiculous to contemplate doing by QS. I think I will continue trying things at smaller sizes :) Last fiddled with by bsquared on 20190726 at 05:59 Reason: another thought... 

20190726, 15:41  #6  
"Tilman Neumann"
Jan 2016
Germany
2·7·29 Posts 
Quote:
Actually I am impressed too (despite my calculation above). Do you know Thorsten Kleinjungs SIQS variant? http://www.ams.org/journals/mcom/201...2015030580/ He seems to have made an algorithmic improvement to SIQS. There are some numbers for up to 130 digit factor arguments (measured in core years). 

20190726, 16:27  #7  
"Ben"
Feb 2007
2^{2}×3^{2}×7×13 Posts 
Quote:


20190726, 18:45  #8  
"Ben"
Feb 2007
2^{2}×3^{2}×7×13 Posts 
Quote:
Code:
size params polybucket sievemed sievelp tdivlp tdivresieve tdivmed ecm sum C95 default TLP 40.5 30.7 7.5 2.8 1 0.55 12.29 95.34 C95 default DLP 44.7 35.5 9.5 3.7 1.4 0.6 0.84 96.24 C100 default TLP 38.5 26.2 7.5 2.4 0.9 0.4 20.19 96.09 C100 default DLP 47.3 34 9.4 3.2 1.2 0.52 0.61 96.23 C105 default TLP 36.8 25.1 6.9 1.9 0.56 0.24 25.81 97.31 C105 default DLP 50.5 33.9 9 2.33 1 0.4 0.23 97.36 C110 default TLP 41 24.5 7.6 2 0.56 0.26 21.68 97.6 C110 default DLP 53.5 31.9 9.4 2 0.63 0.3 0.1 97.83 C110 TLP MFBT 2.9 TF 107 26.9 16.4 5.1 1.6 0.4 0.23 46.87 97.5 C135 TLP MFBT 3.0 LPB 32 B 496656 49.6 18.7 6.1 0.93 0.24 0.1 22.98 98.65 C135 TLP MFBT 2.9 LPB 30 B 550000 59.2 21.3 7.3 1.1 0.28 0.13 9.58 98.89 C135 TLP MFBT 2.9 LPB 30 B 550000 TF 120 58 20.8 7.2 1.6 0.4 0.18 10.57 98.75 C135 DLP LPB 30 B 550000 65.4 23.4 8 1.5 0.4 0.15 0 98.85 polybucket: root calculations for new Bpolys for each prime, and placement into buckets sievemed: traditional sieving of primes less than about about 1.5*32768 sievelp: emptying the appropriate bucket into the sieved block tdivlp: dividing out bucketed primes from sieve reports tdivresieve: dividing out primes between 2^13 and 1.5*2^15 from sieve reports tdivmed: dividing out primes between 2^10 and 2^13 from sieve reports ecm: processing TLP or DLP residues after tdiv* have finished processing sieve reports TLP: triple large prime variation DLP: double large prime variation LPB: large prime bound (bits) B: factor base bound (# of primes) MFBT: exponent of LPB. reports below (2^LPB)^MFBT are subjected to TLP processing TF: trial factoring bound The above routines are > 95% of the runtime. Up to C110, TLP transfers about 1025% of the runtime from polybucket and elsewhere into ECM. Or quite a bit more if TF is tweaked down. Overall this is a speed loss, as the TLPs are not as effective at combining into cycles as DLPs; the default parameters do not allow discovery of enough TLPs. Forcing the parameters into a state where TLP cycle formation is high only results in a more significant speed loss at these sizes... at least in experiments so far. The runtimes for complete factorizations are actually pretty close between DLP and TLP up to C110. I view this as a good thing in the long run since further optimization of polybucket is probably not going to happen, but there is still room to influence ECM (e.g., jasonp's batch factoring). EDIT: I am rerunning the LPB 32 case for the test C135 right now. Relation discovery rate is quite a bit higher, as expected. We'll see if this parameterization is better for TLP cycle growth and a net reduction in job time. Last fiddled with by bsquared on 20190726 at 19:51 Reason: lpb mfbt detail 

20190726, 18:50  #9 
Tribal Bullet
Oct 2004
3528_{10} Posts 
How do you handle the filtering for datasets that include TLP relations?
Last fiddled with by jasonp on 20190726 at 18:50 
20190726, 19:09  #10 
Bamboozled!
May 2003
Down not across
2·5,101 Posts 
I'm sure Ben is aware of this but others may not be: in our paper we stated that GNFS was a better option than TMPQS for numbers of this size. Our experiment was purely to investigate the cycle explosion behaviour and to characterize the phase transition when it happened.
Apropos another comment: SSW ran a modified version of his SIQS implementation for his contribution; the rest of us used Lenstra&Manasse's "factoringbyemail" MPQS as modified by me for TLP. Also as stated in the paper, we chose to use (at the time) an elderly implementation so that we could make a direct comparison with the RSA129 effort. I even fired up a DECstation which had been living in my loft so that identically the same hardware ran the sieving code for each project. It's not surprising that Ben sees somewhat different results because the underlying multiprecision arithmetic libraries are surely very different. I agree with another comment: a weekend is a remarkably short time for a single machine to factor a C135 by QS. Last fiddled with by xilman on 20190726 at 19:17 
20190726, 19:09  #11 
Bamboozled!
May 2003
Down not across
2·5,101 Posts 

Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
Where do I send my PRP primes with large k?  Trilo  Riesel Prime Search  3  20130820 00:32 
48bit large primes!  jasonp  Msieve  24  20100601 19:14 
NFS with 5 and 6 large primes  jasonp  Factoring  4  20071204 18:32 
Why only three large primes  fivemack  Factoring  18  20070510 12:14 
What is the use of these large primes  Prime Monster  Lounge  34  20040610 18:12 