20100130, 18:15  #1 
Tribal Bullet
Oct 2004
3537_{10} Posts 
Block Lanczos with a reordering pass
If anyone feels adventurous, it would be nice to test out the msieve distribution in branches/lanczosopt on a fairly large problem. This code adds an extra pass to the matrixbuilding phase that permutes the rows and columns of the matrix so that sparse regions are packed together for increased efficiency when performing matrix multiplies.
The method for doing so involves:  reading the matrix from disk and ignoring dense rows and columns  running a graph partitioning algorithm recursively on the sparse core that is left over. Currently the algorithm used is a crappy version of KernighanLin with improvements by FidducciaMatheyses; about the only good thing I can say about it is that it should be able to scale up to large size problems with memory use somewhat less than is needed for NFS filtering.  the graph partitioning algorithm divides the matrix into two halves that are independent of each other plus a border region that hopefully is narrow, then permutes the rows and columns so that the border region comes first and then the independent regions come last, then recurses on each region  the next time the matrix is read in, the final permutation is applied and the permuted matrix is written to disk. If running with multiple threads, the number of matrix columns given to each thread is now nonuniform, because the density of matrix entries decreases as you move from the top left to bottom right of the matrix. I haven't had time to test it out on a large problem, but I think the LA should at least work okay, if not faster. The quality of the partitioning has a crucial effect on performance, and currently the quality is not very good (plus partitioning an NFS matrix seems to be a very large and difficult problem). A similar bunch of partitioning code has been in the works for a while now, intended to be applied to the filtering stage, but I think right now that we need some tweaks to make the LA go faster and this is a good candidate to get extra performance for free. It goes without saying that the modified code is furiously preliminary, and hopefully will help out for the largest size matrices (2M columns and up). Please let me know if it helps! Last fiddled with by jasonp on 20100130 at 18:26 
20100130, 19:27  #2 
Oct 2004
Austria
2×17×73 Posts 
What is "fairly large"? c13x? c150? c165?
I am currently sieving a c133 (should finish in a few days from now) and could give it a try on this one, but I can't find out where to download the file? (BTW: is there any place where I can download a version (post 1.43, but not a GPU version as my graphics card is too old) which includes the fix to the issue which causes FactMsieve.pl to crash when I have just not enough relations and msieve therefore can't build a matrix? I would need a windows binary of this too  I don't like it when I run aliquot sequences and aliqueit gets stuck in "GNFS failed, running ECM forever" due to this issue when I'm not around to catch it...) 
20100130, 23:23  #3 
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
9,391 Posts 
We'll have a 2,913+/ coming up soon. Will try.

20100131, 04:03  #4 
"Frank <^>"
Dec 2004
CDP Janesville
84A_{16} Posts 
I also have a c133 sieving right now. I've also got two systems I can test on: 32 & 64 bit Vista, both on dual cores....

20100131, 09:24  #5 
Oct 2004
Austria
2·17·73 Posts 
Where can I download the files? The link seems to be very well hidden (or I am too blind to see...)

20100131, 14:28  #7  
Oct 2004
Austria
100110110010_{2} Posts 
Quote:


20100131, 19:05  #8 
Sep 2009
977 Posts 
A few hours ago, I started the LA for XYYXF C175_113_65, a 5043428 x 5043605 matrix, the initial ETA was around 97h (on a Core 2 Duo @ 2.33 GHz).
When it's over, in several days, I'll try this experimental msieve version :) 
20100131, 22:22  #9 
(loop (#_fork))
Feb 2006
Cambridge, England
13×491 Posts 
I'm running the 2845972 x 2846197 matrix for aliquot 4800 step 1449 (C146) as I type; it took 58878 seconds (4 threads, Core2 Q6600) with 1.43 last time.
I'm not seeing any messages at all from the partitioning pass  do I need some commandline option to invoke it? ETA at 1%complete is 16h43m which suggests it might be a little (say 2000 seconds) slower than the last run. Last fiddled with by fivemack on 20100131 at 23:20 
20100131, 23:23  #10 
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
9,391 Posts 
I've built the experimental binary and the trunk binary, SVN 182. (k8, 64bit linux)
Ran both with t 4 for the toy tests below with a gnfs135, matrix size ~1.8M. 1. To start with, I've tested it incorrectly, I think: I took an existing matrix from a finished gnfs135 and a .chk file and restarted ncr; then, the experimental binary was 2x slower than the trunk sibling. 2. I restarted from .dat, nc1, nc2. It reported "permuting matrix for faster multiplies" and then proceeded but also about 2x slower. 3. The experimental binary says "using block size 5000 for processor cache size 1024 kB", while the trunk binary says "using block size 43690..."  is that as planned? Maybe, that alone would explain the slowdown? 
20100201, 02:12  #11 
Tribal Bullet
Oct 2004
3^{3}×131 Posts 
Sorry about the 5000 stuff, I added it so that I could simulate what the matrix would look like for a much bigger problem, without having to build it. Unfortunately I left that hack in the source when committing the changes; its been reverted now.
Greg: did you pull out the SVN tree in branches/lanczosopt and not trunk? Last fiddled with by jasonp on 20100201 at 02:13 
Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
Msieve with MPI block Lanczos  jasonp  Msieve  104  20130617 11:28 
block wiedemann and block lanczos  ravlyuchenko  Msieve  5  20110509 13:16 
Why is lanczos hard to distribute?  Christenson  Factoring  39  20110408 09:44 
Lanczos error  Andi47  Msieve  7  20090111 19:33 
Msieve Lanczos scalability  Jeff Gilchrist  Msieve  1  20090102 09:32 