![]() |
Block Lanczos with a reordering pass
If anyone feels adventurous, it would be nice to test out the msieve distribution in branches/lanczos-opt on a fairly large problem. This code adds an extra pass to the matrix-building phase that permutes the rows and columns of the matrix so that sparse regions are packed together for increased efficiency when performing matrix multiplies.
The method for doing so involves: - reading the matrix from disk and ignoring dense rows and columns - running a graph partitioning algorithm recursively on the sparse core that is left over. Currently the algorithm used is a crappy version of Kernighan-Lin with improvements by Fidduccia-Matheyses; about the only good thing I can say about it is that it should be able to scale up to large size problems with memory use somewhat less than is needed for NFS filtering. - the graph partitioning algorithm divides the matrix into two halves that are independent of each other plus a border region that hopefully is narrow, then permutes the rows and columns so that the border region comes first and then the independent regions come last, then recurses on each region - the next time the matrix is read in, the final permutation is applied and the permuted matrix is written to disk. If running with multiple threads, the number of matrix columns given to each thread is now nonuniform, because the density of matrix entries decreases as you move from the top left to bottom right of the matrix. I haven't had time to test it out on a large problem, but I think the LA should at least work okay, if not faster. The quality of the partitioning has a crucial effect on performance, and currently the quality is not very good (plus partitioning an NFS matrix seems to be a very large and difficult problem). A similar bunch of partitioning code has been in the works for a while now, intended to be applied to the filtering stage, but I think right now that we need some tweaks to make the LA go faster and this is a good candidate to get extra performance for free. It goes without saying that the modified code is furiously preliminary, and hopefully will help out for the largest size matrices (2M columns and up). Please let me know if it helps! |
What is "fairly large"? c13x? c150? c165?
I am currently sieving a c133 (should finish in a few days from now) and could give it a try on this one, but I can't find out where to download the file? (BTW: is there any place where I can download a version (post 1.43, but not a GPU version as my graphics card is too old) which includes the fix to the issue which causes FactMsieve.pl to crash when I have [I]just not enough[/I] relations and msieve therefore can't build a matrix? I would need a windows binary of this too - I don't like it when I run aliquot sequences and aliqueit gets stuck in "GNFS failed, running ECM forever" due to this issue when I'm not around to catch it...) |
We'll have a 2,913+/- coming up soon. Will try.
|
I also have a c133 sieving right now. I've also got two systems I can test on: 32 & 64 bit Vista, both on dual cores....
|
Where can I download the files? The link seems to be very well hidden (or I am too blind to see...)
|
I believe he's referring to [URL="http://msieve.svn.sourceforge.net/viewvc/msieve/branches/lanczos-opt/"]this code[/URL], which is downloadable via [URL="http://msieve.svn.sourceforge.net/viewvc/msieve/branches/lanczos-opt.tar.gz?view=tar"]this link[/URL]. (no prebuilt verisons available AFAICT)
|
[QUOTE=Mini-Geek;203972]I believe he's referring to [URL="http://msieve.svn.sourceforge.net/viewvc/msieve/branches/lanczos-opt/"]this code[/URL], which is downloadable via [URL="http://msieve.svn.sourceforge.net/viewvc/msieve/branches/lanczos-opt.tar.gz?view=tar"]this link[/URL]. (no prebuilt verisons available AFAICT)[/QUOTE]
Thanks: I have now tried to build a binary and it fails, see here: [url]http://www.mersenneforum.org/showthread.php?t=13040[/url] |
A few hours ago, I started the LA for XYYXF C175_113_65, a 5043428 x 5043605 matrix, the initial ETA was around 97h (on a Core 2 Duo @ 2.33 GHz).
When it's over, in several days, I'll try this experimental msieve version :) |
I'm running the 2845972 x 2846197 matrix for aliquot 4800 step 1449 (C146) as I type; it took 58878 seconds (4 threads, Core2 Q6600) with 1.43 last time.
I'm not seeing any messages at all from the partitioning pass - do I need some command-line option to invoke it? ETA at 1%-complete is 16h43m which suggests it might be a little (say 2000 seconds) slower than the last run. |
I've built the experimental binary and the trunk binary, SVN 182. (k8, 64-bit linux)
Ran both with -t 4 for the toy tests below with a gnfs-135, matrix size ~1.8M. 1. To start with, I've tested it incorrectly, I think: I took an existing matrix from a finished gnfs-135 and a .chk file and restarted -ncr; then, the experimental binary was 2x slower than the trunk sibling. 2. I restarted from .dat, -nc1, -nc2. It reported "permuting matrix for faster multiplies" and then proceeded but also about 2x slower. 3. The experimental binary says "using block size 5000 for processor cache size 1024 kB", while the trunk binary says "using block size 43690..." - is that as planned? Maybe, that alone would explain the slowdown? |
Sorry about the 5000 stuff, I added it so that I could simulate what the matrix would look like for a much bigger problem, without having to build it. Unfortunately I left that hack in the source when committing the changes; its been reverted now.
Greg: did you pull out the SVN tree in branches/lanczos-opt and not trunk? |
I get
[code] using block size 65536 for processor cache size 4096 kB commencing Lanczos iteration (4 threads) memory use: 879.8 MB linear algebra at 0.0%, ETA 17h49m46197 dimensions (0.0%, ETA 17h49m) linear algebra completed 2845775 of 2846197 dimensions (100.0%, ETA 0h 0m) lanczos halted after 45009 iterations (dim = 2845971) recovered 32 nontrivial dependencies BLanczosTime: 60345 [/code] using a version of msieve obtained by downloading lanczos-opt.tgz from msieve.sourceforge.org and building it. So no messages about the partitioning, and a couple of percent slower than the last time I ran this matrix. |
There is a lanczos-opt directory at the top level that should not be there; are you sure you zipped up branches/lanczos-opt ? (The note about the partitioning should occur right after the initial matrix build and right before the text you quote)
I also spent some time yesterday feeding the matrix to a high-quality graph partitioning library (Metis is really nice), and it appeared to do quite a good job. Unfortunately, multilevel graph partitioning with simultaneous constraints takes around 6x the memory use of my crappy code, so it's a non-starter for really big problems. |
[QUOTE=jasonp;204169]There is a lanczos-opt directory at the top level that should not be there; are you sure you zipped up branches/lanczos-opt ? (The note about the partitioning should occur right after the initial matrix build and right before the text you quote)
I also spent some time yesterday feeding the matrix to a high-quality graph partitioning library (Metis is really nice), and it appeared to do quite a good job. Unfortunately, multilevel graph partitioning with simultaneous constraints takes around 6x the memory use of my crappy code, so it's a non-starter for really big problems.[/QUOTE] Do you mean SVN184 which should not be there? Is SVN185 part of the lanczos-opt brach or is it the main "branch"? |
SVN 185 was a fix to branches/lanczos-opt; the repository copy in SVN 184 actually does not have the reordering code at all.
You can go to 'SVN Browse' in the 'Develop' menu on sourceforge, then click on a particular revision number and see exactly which files in the repository are affected. |
matrix step of a c133 with 1.43:
[CODE] Fri Feb 5 06:41:40 2010 Msieve v. 1.43 Fri Feb 5 06:41:40 2010 random seeds: 35296e5c f31b436b Fri Feb 5 06:41:40 2010 factoring 5335366387062594103281599506619078830675568770597674764072096981029494235489350569590888813457656823560321885511353040519174462222557 (133 digits) Fri Feb 5 06:41:41 2010 no P-1/P+1/ECM available, skipping Fri Feb 5 06:41:41 2010 commencing number field sieve (133-digit input) Fri Feb 5 06:41:41 2010 R0: -43590657298344915874817701 Fri Feb 5 06:41:41 2010 R1: 467844431521753 Fri Feb 5 06:41:41 2010 A0: 3052516397340263636960237838715200 Fri Feb 5 06:41:41 2010 A1: 29100578051888094818724416280 Fri Feb 5 06:41:41 2010 A2: 36523592624186545117942 Fri Feb 5 06:41:41 2010 A3: -123602480596394341 Fri Feb 5 06:41:41 2010 A4: -18461007596 Fri Feb 5 06:41:41 2010 A5: 33900 Fri Feb 5 06:41:41 2010 skew 863842.14, size 7.089820e-13, alpha -7.025253, combined = 4.837199e-11 Fri Feb 5 06:41:41 2010 Fri Feb 5 06:41:41 2010 commencing relation filtering Fri Feb 5 06:41:41 2010 estimated available RAM is 1985.8 MB Fri Feb 5 06:41:41 2010 commencing duplicate removal, pass 1 Fri Feb 5 06:44:27 2010 found 3339904 hash collisions in 14676617 relations Fri Feb 5 06:45:06 2010 added 62981 free relations Fri Feb 5 06:45:06 2010 commencing duplicate removal, pass 2 Fri Feb 5 06:45:46 2010 found 3492350 duplicates and 11247248 unique relations <snip> Feb 5 06:55:03 2010 commencing linear algebra Fri Feb 5 06:55:04 2010 read 1428636 cycles Fri Feb 5 06:55:08 2010 cycles contain 4538304 unique relations Fri Feb 5 06:56:02 2010 read 4538304 relations Fri Feb 5 06:56:10 2010 using 20 quadratic characters above 134217690 Fri Feb 5 06:56:42 2010 building initial matrix Fri Feb 5 06:58:32 2010 memory use: 595.5 MB Fri Feb 5 06:58:34 2010 read 1428636 cycles Fri Feb 5 06:58:42 2010 matrix is 1428459 x 1428636 (435.0 MB) with weight 135788023 (95.05/col) Fri Feb 5 06:58:42 2010 sparse part has weight 96885657 (67.82/col) Fri Feb 5 06:59:07 2010 filtering completed in 2 passes Fri Feb 5 06:59:07 2010 matrix is 1427844 x 1428021 (434.9 MB) with weight 135765621 (95.07/col) Fri Feb 5 06:59:07 2010 sparse part has weight 96880921 (67.84/col) Fri Feb 5 06:59:24 2010 read 1428021 cycles Fri Feb 5 06:59:26 2010 matrix is 1427844 x 1428021 (434.9 MB) with weight 135765621 (95.07/col) Fri Feb 5 06:59:26 2010 sparse part has weight 96880921 (67.84/col) Fri Feb 5 06:59:26 2010 saving the first 48 matrix rows for later Fri Feb 5 06:59:27 2010 matrix is 1427796 x 1428021 (418.5 MB) with weight 108661836 (76.09/col) Fri Feb 5 06:59:27 2010 sparse part has weight 95415355 (66.82/col) Fri Feb 5 06:59:27 2010 matrix includes 64 packed rows Fri Feb 5 06:59:27 2010 using block size 65536 for processor cache size 4096 kB Fri Feb 5 06:59:36 2010 commencing Lanczos iteration (2 threads) Fri Feb 5 06:59:36 2010 memory use: 415.9 MB Fri Feb 5 06:59:47 2010 linear algebra at 0.1%, ETA 5h13m Fri Feb 5 12:58:11 2010 lanczos halted after 22580 iterations (dim = 1427795) Fri Feb 5 12:58:15 2010 recovered 29 nontrivial dependencies Fri Feb 5 12:58:15 2010 [B]BLanczosTime: 21792[/B][/CODE] Same c133 on the same C2D laptop with SVN185: [CODE]Fri Feb 5 22:20:34 2010 Msieve v. 1.44 Fri Feb 5 22:20:34 2010 random seeds: eeced92d 786ed506 Fri Feb 5 22:20:34 2010 factoring 5335366387062594103281599506619078830675568770597674764072096981029494235489350569590888813457656823560321885511353040519174462222557 (133 digits) <snip> Fri Feb 5 22:33:54 2010 commencing linear algebra Fri Feb 5 22:33:55 2010 read 1428636 cycles Fri Feb 5 22:33:59 2010 cycles contain 4538304 unique relations Fri Feb 5 22:34:53 2010 read 4538304 relations Fri Feb 5 22:35:02 2010 using 20 quadratic characters above 134217690 Fri Feb 5 22:35:34 2010 building initial matrix Fri Feb 5 22:37:25 2010 memory use: 595.5 MB Fri Feb 5 22:37:27 2010 read 1428636 cycles Fri Feb 5 22:37:35 2010 matrix is 1428459 x 1428636 (435.0 MB) with weight 135788023 (95.05/col) Fri Feb 5 22:37:35 2010 sparse part has weight 96885657 (67.82/col) Fri Feb 5 22:38:00 2010 filtering completed in 2 passes Fri Feb 5 22:38:00 2010 matrix is 1427844 x 1428021 (434.9 MB) with weight 135765621 (95.07/col) Fri Feb 5 22:38:00 2010 sparse part has weight 96880921 (67.84/col) Fri Feb 5 22:38:18 2010 read 1428021 cycles Fri Feb 5 22:38:19 2010 matrix is 1427844 x 1428021 (434.9 MB) with weight 135765621 (95.07/col) Fri Feb 5 22:38:19 2010 sparse part has weight 96880921 (67.84/col) Fri Feb 5 22:38:19 2010 saving the first 48 matrix rows for later Fri Feb 5 22:38:20 2010 matrix is 1427796 x 1428021 (418.5 MB) with weight 108661836 (76.09/col) Fri Feb 5 22:38:20 2010 sparse part has weight 95415355 (66.82/col) Fri Feb 5 22:38:20 2010 matrix includes 64 packed rows Fri Feb 5 22:38:20 2010 using block size 65536 for processor cache size 4096 kB Fri Feb 5 22:38:29 2010 commencing Lanczos iteration (2 threads) Fri Feb 5 22:38:29 2010 memory use: 415.9 MB Fri Feb 5 22:38:40 2010 linear algebra at 0.1%, ETA 5h12m Sat Feb 6 04:44:09 2010 lanczos halted after 22579 iterations (dim = 1427795) Sat Feb 6 04:44:14 2010 recovered 29 nontrivial dependencies Sat Feb 6 04:44:14 2010 [B]BLanczosTime: 22220[/B] [/CODE] Seems SVN185 is slower. |
I made some tests on C149_113_58 (SNFS difficulty 200) on Core 2 Duo T7200, 2 GB RAM (but that didn't work out, see below) and Core 2 Duo T7600, 3 GB RAM:
SVN185: [code]Msieve v. 1.44 Thu Feb 4 18:02:13 2010 random seeds: c08d3a90 82f55b5b factoring 58726049878536391557011831786679470473096960150546862137088288968129115887237799139638233172513763094056500204963331970846586616031413324264998772683 (149 digits) searching for 15-digit factors commencing number field sieve (149-digit input) R0: -624332378391008467754527498760580235264 R1: 38358611506121121577937 A0: 1442897 A1: 0 A2: 0 A3: 0 A4: 0 A5: 195112 skew 1.49, size 1.169368e-14, alpha -1.135403, combined = 6.064487e-12 commencing linear algebra read 3259953 cycles cycles contain 10103761 unique relations read 10103761 relations using 20 quadratic characters above 536870768 building initial matrix memory use: 1326.4 MB read 3259953 cycles matrix is 3259753 x 3259953 (1002.3 MB) with weight 304669742 (93.46/col) sparse part has weight 226894247 (69.60/col) filtering completed in 1 passes matrix is 3259753 x 3259953 (1002.3 MB) with weight 304669742 (93.46/col) sparse part has weight 226894247 (69.60/col) permuting matrix for faster multiplies matrix is 3259753 x 3259953 sparse core is 2313986 x 1549672 graph has 11341062 edges memory use: 160.22 MB max edge weight is 36 read 3259953 cycles matrix is 3259753 x 3259953 (1002.3 MB) with weight 304669742 (93.46/col) sparse part has weight 226894247 (69.60/col) saving the first 48 matrix rows for later matrix is 3259705 x 3259953 (960.0 MB) with weight 242856059 (74.50/col) sparse part has weight 219071221 (67.20/col) matrix includes 64 packed rows using block size 65536 for processor cache size 4096 kB commencing Lanczos iteration (2 threads) memory use: 975.0 MB linear algebra at 0.0%, ETA 33h37m59953 dimensions (0.0%, ETA 33h37m) ^Cnear algebra completed 6836 of 3259953 dimensions (0.2%, ETA 33h18m) received signal 2; shutting down lanczos halted after 111 iterations (dim = 7024) BLanczosTime: 976 elapsed time 00:16:17[/code] trunk (SVN 189, I think): [code]Msieve v. 1.44 Thu Feb 4 18:19:19 2010 random seeds: 4c27aea9 6f01d9ad factoring 58726049878536391557011831786679470473096960150546862137088288968129115887237799139638233172513763094056500204963331970846586616031413324264998772683 (149 digits) searching for 15-digit factors commencing number field sieve (149-digit input) R0: -624332378391008467754527498760580235264 R1: 38358611506121121577937 A0: 1442897 A1: 0 A2: 0 A3: 0 A4: 0 A5: 195112 skew 1.49, size 1.169368e-14, alpha -1.135403, combined = 6.064487e-12 commencing linear algebra read 3259953 cycles cycles contain 10103761 unique relations read 10103761 relations using 20 quadratic characters above 536870768 building initial matrix memory use: 1326.4 MB read 3259953 cycles matrix is 3259753 x 3259953 (1002.3 MB) with weight 304669742 (93.46/col) sparse part has weight 226894247 (69.60/col) filtering completed in 1 passes matrix is 3259753 x 3259953 (1002.3 MB) with weight 304669742 (93.46/col) sparse part has weight 226894247 (69.60/col) read 3259953 cycles matrix is 3259753 x 3259953 (1002.3 MB) with weight 304669742 (93.46/col) sparse part has weight 226894247 (69.60/col) saving the first 48 matrix rows for later matrix is 3259705 x 3259953 (960.0 MB) with weight 242856059 (74.50/col) sparse part has weight 219071221 (67.20/col) matrix includes 64 packed rows using block size 65536 for processor cache size 4096 kB commencing Lanczos iteration (2 threads) memory use: 974.9 MB linear algebra at 0.0%, ETA 33h18m59953 dimensions (0.0%, ETA 33h18m) ^Cnear algebra completed 7343 of 3259953 dimensions (0.2%, ETA 33h35m) received signal 2; shutting down lanczos halted after 119 iterations (dim = 7530) BLanczosTime: 806 elapsed time 00:13:28[/code] I guess I need to mention that despite the "memory use: 974.9 MB" and "memory use: 975.0 MB"; traces, consistent wrt. each other, the memory consumption of the reordering-enabled binary was at least 700 MB (!) higher than that of the other binary. On the T7200 with 2 GB RAM, the reordering-enabled binary was killed by Linux's OOM killer before it could display "commencing Lanczos iteration (2 threads)", so I switched to the T7600. jasonp, if you think that it would be useful for testing the algorithm (all of our tasks are beyond 2M cycles), I can send you how to get the raw sieving data of the RSALS grid :smile: |
For some reason, I can't seem to be allowed to edit my own posts (even after logging out and logging back in), so I'm creating a new post.
[EDIT: the Edit button appears on this post right after I've made it - is there a time limit ?] [quote]Unfortunately, multilevel graph partitioning with simultaneous constraints takes around 6x the memory use of my crappy code, so it's a non-starter for really big problems.[/quote]If the total memory requirements of the other graph partitioning code are lower than the requirements of the rest of the msieve LA pass, it can remain a valid option (barring e.g. licensing issues, of course). |
[QUOTE=debrouxl;204761][EDIT: the Edit button appears on this post right after I've made it - is there a time limit ?][/QUOTE]
Yes there is: 1 hour (unless you're a moderator). |
| All times are UTC. The time now is 11:42. |
Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.