20190726, 19:19  #12  
"Ben"
Feb 2007
2×5×7×47 Posts 
Quote:
The resulting matrix for the C135 wasn't very large, but it was pretty dense. Some of that can be improved by rejecting cycles that are huge, which I didn't do. Even so, singlethread Lanczos only took an hour or so to run on the dense matrix. 

20190726, 19:25  #13 
Tribal Bullet
Oct 2004
3,529 Posts 
Ok, I was asking because if you want to tune that part of the process you can convert relations into the intermediate format that Msieve's NFS filtering uses and then run all the filtering as if it was an NFS job. I can almost guarantee the resulting matrix would wind up much better because you can leverage the algorithm research that goes into NFS filtering.
Of course that takes a backseat to tuning the sieving. 
20190726, 19:30  #14  
Bamboozled!
"πΊππ·π·π"
May 2003
Down not across
2×3^{2}×569 Posts 
Quote:
Back in the day there were still many composites of interest which were still best done by QS, not least because GNFS implementations were not easily available and, even if you could get hold of one, it was bleeding edge code which need skill and patience to nurse its way through a factorization. Sic transit gloria mundi. 

20190726, 19:36  #15  
"Ben"
Feb 2007
2·5·7·47 Posts 
Quote:
An NFS run on the same number and same computer took about 8 hours for poly find and sieving. The matrix was larger, of course, but used Jason's multithreaded and better optimized Lanczos. The matrix+sqrt took about another hour (on a different machine on which those tasks are better suited). In round numbers, QS ran about 75hrs / 9 hrs ~ 8 times slower than NFS. The NFS in question is yafu's automated msieve+ggnfs version. 

20190726, 19:39  #16  
Bamboozled!
"πΊππ·π·π"
May 2003
Down not across
2×3^{2}×569 Posts 
Quote:
The hardware architectures are radically different. Back in the day memory could almost keep up with the cpu. That hasn't been true for a long time now. Our computation ran for a large part on MIPS and Sparc processors. That which did use x86 ran on 486 or early (inevitably 32bit) Pentium cpus. You also appear to be comparing elapsed times. A fairer comparison would be to use cpucore time. Last fiddled with by xilman on 20190726 at 19:41 

20190726, 19:42  #17  
"Ben"
Feb 2007
2×5×7×47 Posts 
Quote:


20190726, 19:52  #18 
Tribal Bullet
Oct 2004
3,529 Posts 
I've been remiss in adding my congrats, getting endtoend QS with three large primes is a major major undertaking.

20190726, 21:00  #19 
Bamboozled!
"πΊππ·π·π"
May 2003
Down not across
2×3^{2}×569 Posts 
I'll take that as a compliment to Arjen and myself
Last fiddled with by xilman on 20190726 at 21:01 
20190727, 15:57  #20  
Sep 2009
764_{16} Posts 
Quote:
Chris 

20190727, 17:45  #21  
"Tilman Neumann"
Jan 2016
Germany
2·11·19 Posts 
Quote:
Quote:
Otherwise, I might be able to deduce the parameters of Ben's hardware from other threads, but https://infoscience.epfl.ch/record/1...es/NPDF28.pdf does not seem to contain the necessary information. Anyway, whoever wants more accurate numbers is free to compute them :) 

20190729, 19:10  #22  
"Ben"
Feb 2007
CDA_{16} Posts 
Quote:
The paper has timings for the main computational steps of the class group computations for imaginary quadratic fields with ddigit discriminants. They mention that the sieving step of this computation is the same as that used during SIQS integer factorization, but I don't know much more than that about class group computations. Anyway, the table reports a time of 2.23 core years spent in the sieving step for a C130 input. Their core is a 1.9GHz Opteron core. I used a 7210 KNL processor that has 256 virtual cores (64 physical cores, each with 4 hyperthreads) that run at 1.5 GHz. The C135 sieving took 74 wall clock hours * 256 cores = 18,944 corehours = 2.16 core years. Although I'm not sure how to handle the hyperthreads... if you only count physical cores (performance certainly does not scale linearly past 64 cores on this processor) then it spent 0.54 core years. Call it double that to 1.08 coreyears to account for the diminishing return of the hyperthreads. Scaling the ratio of coreyears by the ratio of clock frequencies and inversely by the ratio of effort I get: 2.23 / 1.08 * 1.9 / 1.5 * exp(sqrt(ln C135 ln ln C135)) / exp(sqrt(ln C130 ln ln C130)) which if I did that right means 4.88 times the effort for equivalent job sizes. This still isn't right because the sievers are probably memory bound, not cpu bound, and the two implementations use way different instruction sets (and AVX512 could maybe also accelerate their ideas). But it's enough to tell me that it might not be worth the effort to try to implement it. Even so, I do need to read more and try to understand it better. 

Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
Where do I send my PRP primes with large k?  Trilo  Riesel Prime Search  3  20130820 00:32 
48bit large primes!  jasonp  Msieve  24  20100601 19:14 
NFS with 5 and 6 large primes  jasonp  Factoring  4  20071204 18:32 
Why only three large primes  fivemack  Factoring  18  20070510 12:14 
What is the use of these large primes  Prime Monster  Lounge  34  20040610 18:12 