mersenneforum.org  

2007-11-06, 15:13   #1
oslik

GF(2) Matrix request

Hi everyone. I'm now in the process of writing and testing the linear algebra
step for GNFS (or QS), whose goal is to solve a linear system over GF(2).
My implementation is of the Block-Coppersmith algorithm and is intended
for use on parallel computers (using the MPI interface). Block-Coppersmith can be parallelized much better than Block-Lanczos, which is currently used in
the GGNFS and msieve programs, so it may be a good option for handling
really large numbers.

As I understand from this forum, many of you are doing the hard job of factoring
large integers using the GGNFS or msieve programs, and sieving sometimes takes a very long time. If someone can give me some test matrices for the linear algebra step, I would appreciate it very much. I hope that after some testing I will be able to present a version of Block-Coppersmith for clusters, and maybe even for distributed computation (I have some ideas and programs for that).

So, I ask for matrices of any size :-)

Thank you in advance,
Ivan.
2007-11-06, 16:35   #2
jasonp

Quote:
Originally Posted by oslik
If someone can give me some test matrices for the linear algebra step, I would appreciate it very much. I hope that after some testing I will be able to present a version of Block-Coppersmith for clusters, and maybe even for distributed computation (I have some ideas and programs for that).

So, I ask for matrices of any size :-)
I hope you have lots of disk space :)

The largest matrix I have on disk is about 4M, which is only moderately large considering the size of current NFS efforts. With enough compression it should fit on a single CD. Send me email to coordinate delivery if you're interested; others here can offer far larger matrices.
2007-11-06, 16:54   #3
fivemack

Quote:
Originally Posted by oslik
So, I ask for matrices of any size :-)

Thank you in advance,
Ivan.
In what format do you want these matrices? Tens of millions of lines of 'X Y\n'?

And what sort of size are you really after? The standard processing programs tend not to write out matrices in full, though obviously they could be persuaded to; a large factorisation I have lying around is

Code:
Constructed a 2916896 x 2917143 matrix of weight 202031153 in 82 minutes
Solved in 23.5 hours (two CPUs), using 1.5G memory
but writing out that matrix would take most of a CD in compressed form. I have significantly smaller ones available too, from 120-digit rather than 206-digit SNFS.
2007-11-06, 18:57   #4
jasonp

Quote:
Originally Posted by fivemack
In what format do you want these matrices? Tens of millions of lines of 'X Y\n'?

And what sort of size are you really after? The standard processing programs tend not to write out matrices in full, though obviously they could be persuaded to; a large factorisation I have lying around is
Since the latest msieve version does checkpoint and restart, the whole matrix is in fact written to disk, in a completely undocumented binary format.

Actually, maybe it would be a valuable exercise to document all the on-disk formats of the intermediate files. That would let other programs interoperate, and it's a big win to use the same relation format as GGNFS :)
2007-11-06, 20:27   #5
henryzz

Would it be possible for someone to write a parallelized siever as well, so that all the slowest bits can be shared?
2007-11-06, 21:18   #6
akruppa

Sieving is already embarrassingly parallel: just assign different ranges of b-values or special-q values to different machines.
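For illustration, a toy sketch of the range split with MPI (the special-q interval here is made up, and the actual siever invocation is omitted):

Code:
/* Toy sketch: each MPI rank takes a contiguous slice of special-q
   values and would run the siever over its own slice. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* example interval; in practice it comes from the job file */
    const long q_lo = 10000000, q_hi = 20000000;
    long span  = (q_hi - q_lo + size - 1) / size;   /* ceiling division */
    long first = q_lo + (long)rank * span;
    long last  = first + span < q_hi ? first + span : q_hi;

    printf("rank %d sieves special-q in [%ld, %ld)\n", rank, first, last);
    /* ...launch the siever on [first, last) here... */

    MPI_Finalize();
    return 0;
}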

Alex
2007-11-06, 21:41   #7
frmky

I can send a 6.4M matrix your way once I know the format you want, and how to put an msieve matrix in that format. Perhaps msieve 1.29's binary format would be easiest.

Greg

2007-11-07, 08:17   #8
oslik

Thank you for your replies. I've PMed everyone with details :-)
The format I'm looking for can be any format I can read from a file and convert into the GGNFS matrix format (it looks very convenient for handling sparse matrices). I'll report the results as soon as they are available, and I hope to present them, and even the code :-)
2007-11-07, 14:51   #9
jasonp

Greg, the easiest way to produce the matrix file is to rerun the filtering using msieve 1.29, then start the linear algebra. This will produce msieve.dat.mat; for linear algebra experiments that will not use the dependencies that are generated, you only need this file. Apologies that this will take hours though :(

In case anyone is interested, here's the binary format (it's really easy):

The matrix file is an array of 32-bit words, with byte ordering determined by the machine that produced the matrix (little-endian for just about everybody). The nonzero matrix entries are stored by column, because they are easy to generate that way. The first word in the file is the number of rows, the second is the number of dense rows ('D'), and the third is the number of matrix columns. Each column is then dumped out. For each column, there is a 32-bit word giving the number of sparse entries in the column, then a list of the row indices of those entries (each a 32-bit word, with the words in arbitrary order), then (D+31)/32 words for the dense rows in that column. The latter should be treated as an array of D bits in little-endian word order.

Rows are numbered from 0 to (number of rows - 1). Rows 0 to D-1 are considered dense, with a set bit in the bitfield indicating a nonzero entry for that column. The sparse entries in each column are numbered from D onward. Functions dump_matrix and read_matrix, in gnfs/gf2.c, give sample code for writing to / reading from the matrix file.
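A minimal sketch of a reader for this format (no error handling, and it assumes the file is read on a machine with the same byte ordering that wrote it):

Code:
/* Read an msieve .mat file as described above and count its nonzeros. */
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    FILE *f = fopen(argv[1], "rb");
    uint32_t nrows, ndense, ncols;

    fread(&nrows,  sizeof(uint32_t), 1, f);  /* number of rows */
    fread(&ndense, sizeof(uint32_t), 1, f);  /* number of dense rows D */
    fread(&ncols,  sizeof(uint32_t), 1, f);  /* number of columns */

    uint32_t dense_words = (ndense + 31) / 32;
    uint64_t nonzeros = 0;

    for (uint32_t c = 0; c < ncols; c++) {
        uint32_t num_sparse;
        fread(&num_sparse, sizeof(uint32_t), 1, f);

        /* row indices of the sparse entries (each >= ndense) */
        uint32_t *rows = malloc(num_sparse * sizeof(uint32_t));
        fread(rows, sizeof(uint32_t), num_sparse, f);

        /* bitfield covering dense rows 0 .. ndense-1 */
        uint32_t *bits = malloc(dense_words * sizeof(uint32_t));
        fread(bits, sizeof(uint32_t), dense_words, f);

        nonzeros += num_sparse;
        for (uint32_t w = 0; w < dense_words; w++)
            nonzeros += __builtin_popcount(bits[w]);  /* gcc builtin */

        free(rows);
        free(bits);
    }
    printf("%u x %u matrix, %u dense rows, %llu nonzeros\n",
           nrows, ncols, ndense, (unsigned long long)nonzeros);
    fclose(f);
    return 0;
}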

Does the above make sense? Does anyone think it would be a good idea to compress the above somehow? For example, for matrices with < 16M rows and columns (enough for 200-digit GNFS), all the 32-bit quantities could be stored as 24-bit integers, saving 1/4 of the disk space.
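Something like this for the 3-byte packing, keeping the little-endian convention of the rest of the file:

Code:
/* Pack/unpack an index as a 24-bit little-endian value. Only valid
   while the matrix has fewer than 2^24 rows and columns. */
#include <stdint.h>

static void put24(uint8_t *p, uint32_t x)
{
    p[0] = x & 0xff;
    p[1] = (x >> 8) & 0xff;
    p[2] = (x >> 16) & 0xff;
}

static uint32_t get24(const uint8_t *p)
{
    return p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16);
}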
2007-11-07, 15:27   #10
oslik

Yes, it may be worthwhile for large matrices. Another idea is to "zip" the matrix somehow: if you store a sparse matrix (from a GGNFS factorization), it is smaller in zip format, so there is room for improvement :-). Maybe you can store not the indices
but their differences within each column (not the index, but the distance to the next non-zero entry), so fewer bits are needed for the matrix storage.
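A sketch of what I mean: sort the row indices within a column, then store each gap as a variable-length integer (7 bits per byte), so the small gaps typical of a sparse column take one byte instead of four. (The current format stores the indices in arbitrary order, so they would have to be sorted first.)

Code:
#include <stdint.h>
#include <stddef.h>

/* Encode one column's sorted row indices as varint-coded gaps.
   Returns the number of bytes written to out. */
size_t encode_column(const uint32_t *rows, uint32_t n, uint8_t *out)
{
    size_t pos = 0;
    uint32_t prev = 0;

    for (uint32_t i = 0; i < n; i++) {
        uint32_t gap = rows[i] - prev;   /* rows assumed sorted ascending */
        prev = rows[i];
        while (gap >= 0x80) {            /* high bit = "more bytes follow" */
            out[pos++] = (uint8_t)(gap & 0x7f) | 0x80;
            gap >>= 7;
        }
        out[pos++] = (uint8_t)gap;
    }
    return pos;
}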

2007-11-07, 19:21   #11
frmky

Quote:
Originally Posted by jasonp
Apologies that this will take hours though :(
A few hours of time is the least of my worries, especially if it will lead to an efficient, parallel, public LA implementation.
Quote:
Originally Posted by jasonp
Does anyone think it would be a good idea to compress the above somehow?
I personally don't think it really matters. If there are lots of 0's, then that just means it will compress better when shipped from one place to another. Hard drives are large now, so the space during a run doesn't matter. It will take a bit longer to access, but as long as it is rarely accessed on disk during a run, it doesn't seem to make much difference.

Greg