#1
Aug 2008
19 Posts
I'm trying to implement Block Wiedemann to see if I can reduce the per-system memory requirements for the linear algebra step of NFS. I'm currently using block Lanczos code with the matrix divided into sub-matrices; the result of each matrix-vector or transpose-matrix-vector product per sub-matrix is summed, but as my matrix size increases, so does the communication bandwidth required. I've read Coppersmith's paper, Villard's, and the FFT minpoly paper, and am left with the following questions:
Am I to assume this algorithm requires matrix B to be symmetric, such that I have to do tmp = B*x, then y = trans(B)*tmp? I can get results with symmetric matrices, but not with asymmetric ones, although my code is based on the wlss2 code, which contains many bugs. Does anyone have a reference implementation of the FFT minpoly?

PS - It's been a decade since I took a linear algebra course
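(For concreteness, here is a minimal sketch of the two-step product described above, over GF(2), with the matrix stored as sparse rows of column indices and each vector block packed 64 wide, one uint64_t per row. The names and layout are illustrative, not taken from wlss2 or any real NFS code; note that B itself is never symmetric here — the symmetric operator trans(B)·B is only ever applied implicitly.)

```cpp
// Sketch: symmetrized product y = trans(B) * (B * x) over GF(2).
// B is sparse rows of column indices; x, y, tmp are blocks of 64
// vectors, packed one uint64_t per matrix row/column.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

using Row = std::vector<uint32_t>;   // column indices of the 1s in one row

// tmp = B * x : gather along each row; GF(2) addition is XOR
void matvec(const std::vector<Row>& B, const std::vector<uint64_t>& x,
            std::vector<uint64_t>& tmp) {
    for (std::size_t r = 0; r < B.size(); ++r) {
        uint64_t acc = 0;
        for (uint32_t c : B[r]) acc ^= x[c];
        tmp[r] = acc;
    }
}

// y = trans(B) * tmp : same row-major storage, scatter instead of
// gather, so no transposed copy of B is ever built or stored
void transmatvec(const std::vector<Row>& B, const std::vector<uint64_t>& tmp,
                 std::vector<uint64_t>& y) {
    std::fill(y.begin(), y.end(), 0);
    for (std::size_t r = 0; r < B.size(); ++r)
        for (uint32_t c : B[r]) y[c] ^= tmp[r];
}
```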
#2
(loop (#_fork))
Feb 2006
Cambridge, England
2×7×461 Posts
I'm interested that you get results at all - I'm stuck at getting a kernel vector from the minpoly on small test cases (small enough to do the linear algebra densely) with non-symmetric square matrices. The minpoly I get is correct - at least it divides MinimalPolynomial(M) as given by Magma - but I don't see how you get a kernel from it.
I've not seen a working FFT minpoly; it's one of the things I'm hoping to see at Nancy in October. Isn't the FFT used basically to multiply the polynomials? If so, you could initially use NTL's polynomial arithmetic, which is pretty decent (Zimmermann did better, but I don't know if his changes have been merged).
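(A minimal sketch of the NTL route suggested above — GF2X, SetCoeff, and mul are NTL's real interface; the polynomials here are toy placeholders. For block Wiedemann the sequence elements are 64×64 matrices of polynomials, so scalar GF(2)[x] arithmetic like this is only the building block.)

```cpp
// Sketch: GF(2)[x] polynomial multiplication with NTL's GF2X, the
// building block one could use before writing a dedicated FFT minpoly.
#include <NTL/GF2X.h>
#include <iostream>

int main() {
    NTL::GF2X a, b, c;
    NTL::SetCoeff(a, 0);   // a = 1 + x^5
    NTL::SetCoeff(a, 5);
    NTL::SetCoeff(b, 1);   // b = x + x^3
    NTL::SetCoeff(b, 3);
    NTL::mul(c, a, b);     // c = a*b; NTL switches multiplication
                           // algorithms automatically at large degrees
    std::cout << c << "\n"; // coefficient vector, lowest degree first
    return 0;
}
```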
#3
Aug 2008
23₈ Posts
So I don't know if this falls under the realm of the obvious, but I've not seen it presented in any implementations or papers. What I've been doing to get symmetric matrices is to pad the row count with extra QCB entries (beyond the 128 I'm using). This removes the need to compute y = B' * B * x, and you are left with y = A * x (A = sym, B = asym).
I guess the additional QCB entries only pose a problem if you heavily oversieved (i.e., ncols - nrows > 128 or so).

-Patrick
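(A sketch of the padding step described above; make_qc_row is a hypothetical helper standing in for evaluating one extra quadratic character against every relation.)

```cpp
// Sketch: square the matrix up by appending extra quadratic-character
// rows. make_qc_row is hypothetical; it stands for evaluating one more
// quadratic character against every relation (column).
#include <cstddef>
#include <cstdint>
#include <vector>

using Row = std::vector<uint32_t>;   // sparse row: column indices of the 1s

Row make_qc_row(std::size_t ncols);  // hypothetical helper, defined elsewhere

void pad_to_square(std::vector<Row>& B, std::size_t ncols) {
    while (B.size() < ncols)         // append QC rows until nrows == ncols
        B.push_back(make_qc_row(ncols));
}
```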
#4
Aug 2008
19 Posts
Sorry for the confusion: I meant non-square to square, not asym to sym.
#5
Tribal Bullet
Oct 2004
DED₁₆ Posts

Quote:
Modifying my code to find out would be fairly straightforward, but could take a while.
#6
Aug 2004
205₈ Posts

Quote:
But you never have to calculate (or store) the symmetric matrix in the BL implementation, so does it matter?

Chris
#7
Tribal Bullet
Oct 2004
3565₁₀ Posts
As Patrick mentioned, if the matrix only has to be square, you can add quadratic character rows (most of which will be redundant), so that you end up with a matrix that is square but still singular. Then the core of the Lanczos iteration needs only one matrix multiply, instead of the two matrix multiplies currently required to apply the symmetrized matrix. That could make block Lanczos 50% faster.
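(Schematically, the saving in question, reusing matvec/transmatvec from the sketch in post #1; whether a Lanczos-style recurrence can actually be built around the one-pass operator is what the following posts dig into.)

```cpp
// Schematic cost contrast only; matvec/transmatvec are the routines
// from the earlier sketch, declared here so this compiles standalone.
#include <cstdint>
#include <vector>

using Row = std::vector<uint32_t>;
using Block = std::vector<uint64_t>;   // 64 packed vectors, one word per row

void matvec(const std::vector<Row>& M, const Block& in, Block& out);
void transmatvec(const std::vector<Row>& M, const Block& in, Block& out);

// Rectangular B: two passes over the matrix per application
void apply_sym(const std::vector<Row>& B, const Block& x,
               Block& tmp, Block& y) {
    matvec(B, x, tmp);        // tmp = B * x
    transmatvec(B, tmp, y);   // y = trans(B) * tmp
}

// Squared-up A (extra QC rows, still singular): one pass per application
void apply_square(const std::vector<Row>& A, const Block& x, Block& y) {
    matvec(A, x, y);          // y = A * x
}
```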
#8
Aug 2004
7·19 Posts

Quote:
Chris
#9
Aug 2008
13₁₆ Posts

Quote:
What I've been dabbling with is a way to make Lanczos work with a square matrix (not a symmetric matrix) using only matrix-vector products. However, I've yet to find a tridiagonalization without calculating some form of generator polynomial (à la Block Wiedemann). A buddy of mine and I have been working on alternate schemes. Here are some good docs on the background of Lanczos algorithms: http://elib.zib.de/netlib/lanczos/vol2/ — none of these are related to GF(2), but the concepts carry forward when reading Montgomery's paper.

So back to my mischief. I've put the Lanczos work on hold because of the previously mentioned problem of needing a generator polynomial, which is exactly what the Block Wiedemann algorithm computes in Step 2. Steps 1 and 3 of Wiedemann require repeated calculation of matrix-vector products and scalar products. Note: no transpose matrix-vector products, which means you can optimize the hell out of your storage structures and get higher performance. My implementation, which has minpoly code based on Kaltofen's paper and is probably less than optimal, seems to require a similar number of matrix-vector products (transpose or not) to that of Lanczos. I'm trying to better understand the heuristic nature of these algorithms with regard to the overall number of operations (as I discussed with jasonp via email).

I've been putting together a possible submission to the CADO workshop on optimizing the expensive parts of Block Wiedemann and block Lanczos, and debunking a few pseudo-myths about the distributed nature of Wiedemann (splitting its blocking factor). I've targeted most of my numbers at the matrices I've constructed from RSA-100, RSA-110, RSA-130, and now a fake SNFS-1024, using parameters from the Aoki, Franke, Kleinjung, Lenstra, and Osvik report. Whether or not the talk is accepted, I'll forward some finalized code to whoever wants a copy once I have an implementation of the subquadratic FFT minpoly (used in Step 2 of Wiedemann).

-Patrick
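(A sketch of the Step 1 loop described above, assuming blocking factor m = n = 64 so each sequence element trans(x)·Bⁱ·y is a 64×64 bit matrix; matvec is the forward product from the sketch in post #1, and __builtin_ctzll is a GCC/Clang builtin. Illustrative only — not Patrick's code.)

```cpp
// Sketch: Block Wiedemann Step 1 with m = n = 64. Emits the sequence
// a_i = trans(x) * B^i * y using only forward matrix-vector products
// (no transposes anywhere), as noted above.
#include <array>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Row = std::vector<uint32_t>;
using Block = std::vector<uint64_t>;   // N rows x 64 vectors, one word per row

// forward product only -- matvec as in the sketch in post #1
void matvec(const std::vector<Row>& B, const Block& in, Block& out);

// a = trans(x) * v : a 64x64 bit matrix, one uint64_t per row of a
std::array<uint64_t, 64> inner(const Block& x, const Block& v) {
    std::array<uint64_t, 64> a{};
    for (std::size_t r = 0; r < x.size(); ++r) {
        uint64_t w = x[r];
        while (w) {                       // for each set bit j of x[r] ...
            int j = __builtin_ctzll(w);
            a[j] ^= v[r];                 // ... accumulate row r of v into row j
            w &= w - 1;
        }
    }
    return a;
}

// Step 1: a_i = trans(x) * B^i * y for i = 0 .. niter-1, where
// niter ~ 2N/64 + O(1) for blocking factor m = n = 64
std::vector<std::array<uint64_t, 64>>
step1(const std::vector<Row>& B, const Block& x, Block y, std::size_t niter) {
    std::vector<std::array<uint64_t, 64>> seq;
    Block next(y.size());
    for (std::size_t i = 0; i < niter; ++i) {
        seq.push_back(inner(x, y));   // block scalar product
        matvec(B, y, next);           // y <- B * y : forward product only
        std::swap(y, next);
    }
    return seq;
}
```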
#10
Aug 2008
19 Posts
Oh, I forgot to post a couple of questions about the SNFS-1024 job, in case anyone knows; I've emailed Kleinjung and Aoki but have yet to hear back.
The matrix used was 66,718,154 × 66,718,154 with a weight of ~9.5 billion. However, the report later mentions that quadratic character tests were performed after Block Wiedemann. Am I to read this as meaning no quadratic character base was present in the matrix? And if one was present, how many entries did it have? The fake matrix I've generated assumes a random yield between 50 and 100 relations per special-q for the lattice ranges mentioned, a 1/p distribution for the factor base, and 128 QCB entries.
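(A sketch of how the 1/p column-density assumption mentioned above might be implemented when generating a fake relation; the prime list and RNG wiring are placeholders, and the naive loop over the whole factor base is far too slow for a 66M-row matrix but fine for illustrating the distribution.)

```cpp
// Sketch: one fake relation (matrix column). Factor-base prime p_i is
// included with probability ~1/p_i, mimicking how often p_i divides a
// random sieve value. Placeholder wiring, illustration only.
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

std::vector<uint32_t> fake_relation(const std::vector<uint32_t>& primes,
                                    std::mt19937_64& rng) {
    std::uniform_real_distribution<double> u(0.0, 1.0);
    std::vector<uint32_t> rows;
    for (std::size_t i = 0; i < primes.size(); ++i)
        if (u(rng) < 1.0 / primes[i])
            rows.push_back(static_cast<uint32_t>(i));  // row index of prime p_i
    return rows;
}
```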
#11
Jul 2003
So Cal
2·3³·7² Posts

Quote:
Do you think Block Wiedemann would scale well with 32 threads? Even if overall throughput wouldn't be better, the storage-space advantages of running only one matrix at a time would make it preferable. We have been hoping that a good implementation would come along, but in light of your comment I'm wondering if it will be the answer to our problem.

Thanks,
Greg
Similar Threads

Thread | Thread Starter | Forum | Replies | Last Post
Block Wiedemann for beginners | paul0 | Factoring | 7 | 2015-11-16 17:09
block wiedemann and block lanczos | ravlyuchenko | Msieve | 5 | 2011-05-09 13:16
Very basic question about Wiedemann methods | fivemack | Math | 0 | 2008-06-16 10:57
P-1: Block of 73: 19.6-19.65M | dave_0273 | Completed Missions | 2 | 2005-05-16 17:55
Deutscher Thread (German thread) | TauCeti | NFSNET Discussion | 0 | 2003-12-11 22:12