20080808, 21:09  #1 
Aug 2008
19_{10} Posts 
Yet another Block Wiedemann thread
I'm trying to implement block wiedemann to see if I can reduce the per system memory requirements for the linear algebra step of nfs. I'm currently using block lanczos code with the matrix divided into submatrices and the resultant of each matvector or transmatvector per submatrix is summed, although as my matrix size increases obviously the communication bandwidth increases. I've read Coppersmith's paper, Villard's, and the FFT minpoly paper and am left with the following questions:
Am I to assume this algorithm requires matrix B to be symmetric such that i have to do tmp = B*x, then y = trans(B) * tmp? I can get results with symmetric matrices, but not with assymetric. Although my code is based on the wlss2 code, which contains many bugs. Does anyone have a reference implementation of the FFT minpoly? PS  Its been a decade since I took a linear algeba course 
20080809, 22:10  #2 
(loop (#_fork))
Feb 2006
Cambridge, England
2^{4}·3·7·19 Posts 
I'm interested that you get results at all  I'm stuck at getting a kernel vector from the minpoly, on small test cases (small enough to do the linear algebra densely) with nonsymmetric square matrices. The minpoly I get is correct  at least it divides MinimalPolynomial(M) as given by magma  but I don't see how you get a kernel from it.
I've not seen a working FFT minpoly, it's one of the things I'm hoping to see at Nancy in October; isn't the FFT used basically to multiply the polynomials, so you could use NTL's poly arithmetic, which is pretty decent (Zimmermann did better but I don't know if his changes have been merged), initially? 
20080811, 01:44  #3 
Aug 2008
19 Posts 
Asymmetric to symmetric trick
So I don't know if this falls under the realm of obvious, but I've not seen it presented in any implementations or papers. What I've been doing to get symmetric matrices is to pad the row count with extra QCB entries (beyond the 128 I'm using). This removes the necessity of computing y = B' * B * x and you are left with y = A * x (A = sym, B = asym)
I guess the additional QCB entries only pose a problem if you heavily oversieved (ie: ncols  nrows > 128 or such) Patrick 
20080813, 01:02  #4 
Aug 2008
10011_{2} Posts 
Sorry for the confusion, I meant nonsquare to square not asym to sym.

20080813, 15:26  #5  
Tribal Bullet
Oct 2004
2×29×61 Posts 
Quote:
Modifying my code to find out would be fairly straightforward, but could take a while. 

20080813, 16:53  #6  
Aug 2004
130_{10} Posts 
Quote:
But you never have to calculate (or store) the symmetric matrix in the BL implementation, so does it matter? Chris 

20080813, 17:27  #7 
Tribal Bullet
Oct 2004
2×29×61 Posts 
As Patrick mentioned, if the matrix only has to be square, then you can add quadratic character rows (most of which will be redundant), so that you end up with a matrix that is square but still singular. Then the core of the Lanczos iteration only needs one matrix multiply instead of now, when you need two matrix multiplies to construct a symmetrized matrix. That could make block Lanczos 50% faster.

20080813, 17:54  #8  
Aug 2004
2·5·13 Posts 
Quote:
Chris 

20080814, 05:09  #9  
Aug 2008
19 Posts 
Quote:
What I've been dabbling with is a way to work Lanczos with a square matrix (not a symmetric matrix) with only matrix vector products. However I've yet to find a tridiagonal without calculating some form of generator polynomial (ala block wiedemann). A buddy of mine and I have been working on alternate schemes. Here's some good docs on the background of Lanczos algorithms: http://elib.zib.de/netlib/lanczos/vol2/ None of these are related to GF(2), but the concepts can be carried forward when reading Montgomery's paper. So back to my mischief. I've put the Lanczos work on halt, because of the previously mentioned problem of needing a generator polynomial. Which is exactly what the Block Wiedemann algorithm computes in Step 2. Step 1 and Step 3 of Wiedemann require repeated calculation of matrix vector products and scalar products. Note: no transpose matrix vector products, which means you can optimize the hell out of your storage structures and get higher performance. My implementation, which has minpoly code is based on Kaltofen's paper and probably less than optimal, seems to require similar number of matrix (regardless of nontranspose or transpose) to that of lanczos. I'm trying to better understand the heuristic nature of these algorithms with regards to overall number of operations (as I discussed with jasonp via email). I've been putting together a possible submission to the CADO workshop on optimizing the expensive parts of block wiedemann and block lanczos and debunking a few pseudo myths about the distributed nature of Wiedemann (splitting its blocking factor). I've targetted most of my numbers around the matrices I've constructed from RSA100, RSA110, RSA130, and now a fake SNFS1024, using parameters from the Aoki, Franke, Kleinjung, Lenstra, and Osvik report. If the talk is accepted or no, I'll forward some finalized code to whomever wants a copy once I get an implementation of the subquadratic FFT minpoly (used in Stage 2 of Wiedemann). Patrick 

20080814, 05:34  #10 
Aug 2008
10011_{2} Posts 
Oh, forgot to post a couple of questions about the SNFS 1024 if anyone knows as I've emailed Kleinjung and Aoki but have yet to hear back.
The matrix used was 66,718,154 x 66,718,154 with a weight of ~9.5b. However later it mentions quadratic character tests were performed after block wiedemann? Am I to read this as no quadratic character base was present in the matrix? Further if one was present, how many entries? The fake matrix I've generated takes into account a random yield between 50 and 100 per q for the lattice ranges mentioned, a 1/p distribution for the factor base, and 128 qcb entries. 
20080814, 18:40  #11  
Jul 2003
So Cal
100000110001_{2} Posts 
Quote:
Do you think Block Wiedemann would scale well with 32 threads? Even if overall throughput wouldn't be better, the storage space advantages of running only one matrix at a time would make it preferable. We have been hoping that a good implementation would come along, but in light of your comment I'm wondering if it will be the answer to our problem. Thanks, Greg 

Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
Block Wiedemann for beginners  paul0  Factoring  7  20151116 17:09 
block wiedemann and block lanczos  ravlyuchenko  Msieve  5  20110509 13:16 
Very basic question about Wiedemann methods  fivemack  Math  0  20080616 10:57 
P1: Block of 73: 19.619.65M  dave_0273  Completed Missions  2  20050516 17:55 
Deutscher Thread (german thread)  TauCeti  NFSNET Discussion  0  20031211 22:12 