20071106, 15:13  #1 
Nov 2007
4_{10} Posts 
GF(2) Matrix request
Hi everyone. I'm now in a process of writing and testing the linear algebra
step for GNFS (or QS), whoose goal is to solve linear system on GF(2). My realization is a realization of BlockCoppersmith algorithm and is intended for use on parallel computers (using MPIinterface). BlockCoppersmith can be parallelized much better, than BlockLanscoz, which is now used in GGNFS and msieve programs, so that may be a good option for handling really large numbers. As I understood from this forum, many of you are doing hard job in factoring large integers using GGNFS or msieve programs, sometimes sieving takes a very long time. My question is, that if someone can give me some test matrices for linear algebra step, I will appreciate it very much. I hope, that after some testing I will be able to present a version of BlockCoppersmith for clusters and maybe even for distributed computations (I have some ideas and progs for that). So, I ask for matrices, of any size :) Thank you in advance, Ivan. 
20071106, 16:35  #2  
Tribal Bullet
Oct 2004
DD1_{16} Posts 
Quote:
The largest matrix I have on disk has size about 4M, which is only moderately large considering the size of current NFS efforts. With enough compression it should fit on a single CD. Send me email to coordinate delivery if you're interested; others here can offer far larger matrices 

20071106, 16:54  #3  
(loop (#_fork))
Feb 2006
Cambridge, England
14357_{8} Posts 
Quote:
And what sort of size are you really after? The standard processing programs tend not to write out matrices in full, though obviously they could be persuaded to; a large factorisation I have lying around is Code:
Constructed a 2916896 x 2917143 matrix of weight 202031153 in 82 minutes Solved in 23.5 hours (two CPUs), using 1.5G memory 

20071106, 18:57  #4  
Tribal Bullet
Oct 2004
DD1_{16} Posts 
Quote:
Actually, maybe it would be a valuable exercise to document all the ondisk formats of the intermediate files. That would let other programs interoperate, and it's a big win to use the same relation format as GGNFS :) 

20071106, 20:27  #5 
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
5,869 Posts 
would it be possible for someone to write a parellized siever as well so all the slowest bits can be shared

20071106, 21:18  #6 
"Nancy"
Aug 2002
Alexandria
2,467 Posts 
Sieving already is embarrasingly parallel. Just assign different ranges of bvalues or specialq values to different machines.
Alex 
20071106, 21:41  #7 
Jul 2003
So Cal
2^{3}×3^{2}×29 Posts 
I can send a 6.4M matrix your way once I know the format you want, and how to put an msieve matrix in that format. Perhaps msieve 1.29's binary format would be easiest.
Greg Last fiddled with by frmky on 20071106 at 21:42 
20071107, 08:17  #8 
Nov 2007
2^{2} Posts 
Thank you for you replies. I've PMed everyone with details :)
The format I'm looking for can be any format I can read from file and convert in to GGNFS matrix format (I looks very convenient for handling sparse matrices). I'll report the results as soon as they are available and hope to present them and even the code :) 
20071107, 14:51  #9 
Tribal Bullet
Oct 2004
3^{3}×131 Posts 
Greg, the easiest way to produce the matrix file is to rerun the filtering using msieve 1.29, then start the linear algebra. This will produce msieve.dat.mat; for linear algebra experiments that will not use the dependencies that are generated, you only need this file. Apologies that this will take hours though :(
In case anyone is interested, here's the binary format (it's really easy): The matrix file is an array of 32bit words, with byte ordering determined by the machine that produced the matrix (it will be littleendian for just about everybody). The nonzero matrix entries are stored by column, because they are easy to generate that way. The first word in the file is the number of rows, the second is the number of dense rows ('D'), and the third is the number of matrix columns. Each column is then dumped out. For each column, there is a 32bit word indicating the number of sparse entries in the column, then a list of that many nonzero column positions (each a 32bit word, with words in arbitrary order), then (D+31)/32 words for the dense rows in that column. The latter should be treated as an array of D bits in littleendian word order. Rows are numbered from 0 to (number of rows  1). Rows 0 to D1 are considered dense, with a set bit in the bitfield indicating a nonzero entry for that column. The sparse entries in each column are numbered from D onward. Functions dump_matrix and read_matrix, in gnfs/gf2.c, give sample code for writing to / reading from the matrix file. Does the above make sense? Does anyone think it would be a good idea to compress the above somehow? For example, for matrices with < 16M rows and columns (this is enough for 200digit GNFS) all the 32bit quantities can be stored as 24byte integers, saving 1/4 of the disk space. 
20071107, 15:27  #10 
Nov 2007
100_{2} Posts 
Yes it maybe worth for large matrices.Another idea may be to "zip" a matrix somehow, because if you store a sparse matrices (from GGNFS factorization) it is smaller in zip format, so there is a room for improvement :). Maybe you can store not indices
but their differences in each column (not the index, but the distance to the next nonzero entry), so fewer bits for the matrix storage. Last fiddled with by oslik on 20071107 at 16:09 
20071107, 19:21  #11  
Jul 2003
So Cal
4050_{8} Posts 
A few hours of time is the least of my worries, especially if it will lead to an efficient, parallel, public LA implementation.
Quote:
Greg 

Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
Connectivity Matrix  Xyzzy  Lounge  13  20170221 18:29 
Coprime matrix little puzzle  Damian  Puzzles  18  20121208 19:01 
12+256 matrix job  fivemack  Factoring  11  20090818 17:39 
[Need help] about Matrix Polynomial  buan  Homework Help  3  20070717 15:07 
CWI Matrix Solver  R.D. Silverman  NFSNET Discussion  6  20060319 17:56 