mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Msieve (https://www.mersenneforum.org/forumdisplay.php?f=83)
-   -   Block Lanczos with a reordering pass (https://www.mersenneforum.org/showthread.php?t=13036)

jasonp 2010-01-30 18:15

Block Lanczos with a reordering pass
 
If anyone feels adventurous, it would be nice to test out the msieve distribution in branches/lanczos-opt on a fairly large problem. This code adds an extra pass to the matrix-building phase that permutes the rows and columns of the matrix so that sparse regions are packed together for increased efficiency when performing matrix multiplies.

The method for doing so involves:
- reading the matrix from disk and ignoring dense rows and columns
- running a graph partitioning algorithm recursively on the sparse core that is left over. Currently the algorithm used is a crappy version of Kernighan-Lin with improvements by Fidduccia-Matheyses; about the only good thing I can say about it is that it should be able to scale up to large size problems with memory use somewhat less than is needed for NFS filtering.
- the graph partitioning algorithm divides the matrix into two halves that are independent of each other plus a border region that hopefully is narrow, then permutes the rows and columns so that the border region comes first and then the independent regions come last, then recurses on each region
- the next time the matrix is read in, the final permutation is applied and the permuted matrix is written to disk. If running with multiple threads, the number of matrix columns given to each thread is now nonuniform, because the density of matrix entries decreases as you move from the top left to bottom right of the matrix.

I haven't had time to test it out on a large problem, but I think the LA should at least work okay, if not faster. The quality of the partitioning has a crucial effect on performance, and currently the quality is not very good (plus partitioning an NFS matrix seems to be a very large and difficult problem). A similar bunch of partitioning code has been in the works for a while now, intended to be applied to the filtering stage, but I think right now that we need some tweaks to make the LA go faster and this is a good candidate to get extra performance for free.

It goes without saying that the modified code is furiously preliminary, and hopefully will help out for the largest size matrices (2M columns and up). Please let me know if it helps!

Andi47 2010-01-30 19:27

What is "fairly large"? c13x? c150? c165?

I am currently sieving a c133 (should finish in a few days from now) and could give it a try on this one, but I can't find out where to download the file?

(BTW: is there any place where I can download a version (post 1.43, but not a GPU version as my graphics card is too old) which includes the fix to the issue which causes FactMsieve.pl to crash when I have [I]just not enough[/I] relations and msieve therefore can't build a matrix? I would need a windows binary of this too - I don't like it when I run aliquot sequences and aliqueit gets stuck in "GNFS failed, running ECM forever" due to this issue when I'm not around to catch it...)

Batalov 2010-01-30 23:23

We'll have a 2,913+/- coming up soon. Will try.

schickel 2010-01-31 04:03

I also have a c133 sieving right now. I've also got two systems I can test on: 32 & 64 bit Vista, both on dual cores....

Andi47 2010-01-31 09:24

Where can I download the files? The link seems to be very well hidden (or I am too blind to see...)

Mini-Geek 2010-01-31 13:58

I believe he's referring to [URL="http://msieve.svn.sourceforge.net/viewvc/msieve/branches/lanczos-opt/"]this code[/URL], which is downloadable via [URL="http://msieve.svn.sourceforge.net/viewvc/msieve/branches/lanczos-opt.tar.gz?view=tar"]this link[/URL]. (no prebuilt verisons available AFAICT)

Andi47 2010-01-31 14:28

[QUOTE=Mini-Geek;203972]I believe he's referring to [URL="http://msieve.svn.sourceforge.net/viewvc/msieve/branches/lanczos-opt/"]this code[/URL], which is downloadable via [URL="http://msieve.svn.sourceforge.net/viewvc/msieve/branches/lanczos-opt.tar.gz?view=tar"]this link[/URL]. (no prebuilt verisons available AFAICT)[/QUOTE]

Thanks: I have now tried to build a binary and it fails, see here: [url]http://www.mersenneforum.org/showthread.php?t=13040[/url]

debrouxl 2010-01-31 19:05

A few hours ago, I started the LA for XYYXF C175_113_65, a 5043428 x 5043605 matrix, the initial ETA was around 97h (on a Core 2 Duo @ 2.33 GHz).
When it's over, in several days, I'll try this experimental msieve version :)

fivemack 2010-01-31 22:22

I'm running the 2845972 x 2846197 matrix for aliquot 4800 step 1449 (C146) as I type; it took 58878 seconds (4 threads, Core2 Q6600) with 1.43 last time.

I'm not seeing any messages at all from the partitioning pass - do I need some command-line option to invoke it? ETA at 1%-complete is 16h43m which suggests it might be a little (say 2000 seconds) slower than the last run.

Batalov 2010-01-31 23:23

I've built the experimental binary and the trunk binary, SVN 182. (k8, 64-bit linux)
Ran both with -t 4 for the toy tests below with a gnfs-135, matrix size ~1.8M.

1. To start with, I've tested it incorrectly, I think: I took an existing matrix from a finished gnfs-135 and a .chk file and restarted -ncr; then, the experimental binary was 2x slower than the trunk sibling.

2. I restarted from .dat, -nc1, -nc2. It reported "permuting matrix for faster multiplies" and then proceeded but also about 2x slower.

3. The experimental binary says "using block size 5000 for processor cache size 1024 kB", while the trunk binary says "using block size 43690..." - is that as planned? Maybe, that alone would explain the slowdown?

jasonp 2010-02-01 02:12

Sorry about the 5000 stuff, I added it so that I could simulate what the matrix would look like for a much bigger problem, without having to build it. Unfortunately I left that hack in the source when committing the changes; its been reverted now.

Greg: did you pull out the SVN tree in branches/lanczos-opt and not trunk?

fivemack 2010-02-01 16:03

I get

[code]
using block size 65536 for processor cache size 4096 kB
commencing Lanczos iteration (4 threads)
memory use: 879.8 MB
linear algebra at 0.0%, ETA 17h49m46197 dimensions (0.0%, ETA 17h49m)
linear algebra completed 2845775 of 2846197 dimensions (100.0%, ETA 0h 0m)
lanczos halted after 45009 iterations (dim = 2845971)
recovered 32 nontrivial dependencies
BLanczosTime: 60345
[/code]

using a version of msieve obtained by downloading lanczos-opt.tgz from msieve.sourceforge.org and building it. So no messages about the partitioning, and a couple of percent slower than the last time I ran this matrix.

jasonp 2010-02-01 17:16

There is a lanczos-opt directory at the top level that should not be there; are you sure you zipped up branches/lanczos-opt ? (The note about the partitioning should occur right after the initial matrix build and right before the text you quote)

I also spent some time yesterday feeding the matrix to a high-quality graph partitioning library (Metis is really nice), and it appeared to do quite a good job. Unfortunately, multilevel graph partitioning with simultaneous constraints takes around 6x the memory use of my crappy code, so it's a non-starter for really big problems.

Andi47 2010-02-01 18:05

[QUOTE=jasonp;204169]There is a lanczos-opt directory at the top level that should not be there; are you sure you zipped up branches/lanczos-opt ? (The note about the partitioning should occur right after the initial matrix build and right before the text you quote)

I also spent some time yesterday feeding the matrix to a high-quality graph partitioning library (Metis is really nice), and it appeared to do quite a good job. Unfortunately, multilevel graph partitioning with simultaneous constraints takes around 6x the memory use of my crappy code, so it's a non-starter for really big problems.[/QUOTE]

Do you mean SVN184 which should not be there?

Is SVN185 part of the lanczos-opt brach or is it the main "branch"?

jasonp 2010-02-01 20:32

SVN 185 was a fix to branches/lanczos-opt; the repository copy in SVN 184 actually does not have the reordering code at all.

You can go to 'SVN Browse' in the 'Develop' menu on sourceforge, then click on a particular revision number and see exactly which files in the repository are affected.

Andi47 2010-02-06 07:36

matrix step of a c133 with 1.43:

[CODE]
Fri Feb 5 06:41:40 2010 Msieve v. 1.43
Fri Feb 5 06:41:40 2010 random seeds: 35296e5c f31b436b
Fri Feb 5 06:41:40 2010 factoring 5335366387062594103281599506619078830675568770597674764072096981029494235489350569590888813457656823560321885511353040519174462222557 (133 digits)
Fri Feb 5 06:41:41 2010 no P-1/P+1/ECM available, skipping
Fri Feb 5 06:41:41 2010 commencing number field sieve (133-digit input)
Fri Feb 5 06:41:41 2010 R0: -43590657298344915874817701
Fri Feb 5 06:41:41 2010 R1: 467844431521753
Fri Feb 5 06:41:41 2010 A0: 3052516397340263636960237838715200
Fri Feb 5 06:41:41 2010 A1: 29100578051888094818724416280
Fri Feb 5 06:41:41 2010 A2: 36523592624186545117942
Fri Feb 5 06:41:41 2010 A3: -123602480596394341
Fri Feb 5 06:41:41 2010 A4: -18461007596
Fri Feb 5 06:41:41 2010 A5: 33900
Fri Feb 5 06:41:41 2010 skew 863842.14, size 7.089820e-13, alpha -7.025253, combined = 4.837199e-11
Fri Feb 5 06:41:41 2010
Fri Feb 5 06:41:41 2010 commencing relation filtering
Fri Feb 5 06:41:41 2010 estimated available RAM is 1985.8 MB
Fri Feb 5 06:41:41 2010 commencing duplicate removal, pass 1
Fri Feb 5 06:44:27 2010 found 3339904 hash collisions in 14676617 relations
Fri Feb 5 06:45:06 2010 added 62981 free relations
Fri Feb 5 06:45:06 2010 commencing duplicate removal, pass 2
Fri Feb 5 06:45:46 2010 found 3492350 duplicates and 11247248 unique relations

<snip>

Feb 5 06:55:03 2010 commencing linear algebra
Fri Feb 5 06:55:04 2010 read 1428636 cycles
Fri Feb 5 06:55:08 2010 cycles contain 4538304 unique relations
Fri Feb 5 06:56:02 2010 read 4538304 relations
Fri Feb 5 06:56:10 2010 using 20 quadratic characters above 134217690
Fri Feb 5 06:56:42 2010 building initial matrix
Fri Feb 5 06:58:32 2010 memory use: 595.5 MB
Fri Feb 5 06:58:34 2010 read 1428636 cycles
Fri Feb 5 06:58:42 2010 matrix is 1428459 x 1428636 (435.0 MB) with weight 135788023 (95.05/col)
Fri Feb 5 06:58:42 2010 sparse part has weight 96885657 (67.82/col)
Fri Feb 5 06:59:07 2010 filtering completed in 2 passes
Fri Feb 5 06:59:07 2010 matrix is 1427844 x 1428021 (434.9 MB) with weight 135765621 (95.07/col)
Fri Feb 5 06:59:07 2010 sparse part has weight 96880921 (67.84/col)
Fri Feb 5 06:59:24 2010 read 1428021 cycles
Fri Feb 5 06:59:26 2010 matrix is 1427844 x 1428021 (434.9 MB) with weight 135765621 (95.07/col)
Fri Feb 5 06:59:26 2010 sparse part has weight 96880921 (67.84/col)
Fri Feb 5 06:59:26 2010 saving the first 48 matrix rows for later
Fri Feb 5 06:59:27 2010 matrix is 1427796 x 1428021 (418.5 MB) with weight 108661836 (76.09/col)
Fri Feb 5 06:59:27 2010 sparse part has weight 95415355 (66.82/col)
Fri Feb 5 06:59:27 2010 matrix includes 64 packed rows
Fri Feb 5 06:59:27 2010 using block size 65536 for processor cache size 4096 kB
Fri Feb 5 06:59:36 2010 commencing Lanczos iteration (2 threads)
Fri Feb 5 06:59:36 2010 memory use: 415.9 MB
Fri Feb 5 06:59:47 2010 linear algebra at 0.1%, ETA 5h13m
Fri Feb 5 12:58:11 2010 lanczos halted after 22580 iterations (dim = 1427795)
Fri Feb 5 12:58:15 2010 recovered 29 nontrivial dependencies
Fri Feb 5 12:58:15 2010 [B]BLanczosTime: 21792[/B][/CODE]

Same c133 on the same C2D laptop with SVN185:

[CODE]Fri Feb 5 22:20:34 2010 Msieve v. 1.44
Fri Feb 5 22:20:34 2010 random seeds: eeced92d 786ed506
Fri Feb 5 22:20:34 2010 factoring 5335366387062594103281599506619078830675568770597674764072096981029494235489350569590888813457656823560321885511353040519174462222557 (133 digits)

<snip>

Fri Feb 5 22:33:54 2010 commencing linear algebra
Fri Feb 5 22:33:55 2010 read 1428636 cycles
Fri Feb 5 22:33:59 2010 cycles contain 4538304 unique relations
Fri Feb 5 22:34:53 2010 read 4538304 relations
Fri Feb 5 22:35:02 2010 using 20 quadratic characters above 134217690
Fri Feb 5 22:35:34 2010 building initial matrix
Fri Feb 5 22:37:25 2010 memory use: 595.5 MB
Fri Feb 5 22:37:27 2010 read 1428636 cycles
Fri Feb 5 22:37:35 2010 matrix is 1428459 x 1428636 (435.0 MB) with weight 135788023 (95.05/col)
Fri Feb 5 22:37:35 2010 sparse part has weight 96885657 (67.82/col)
Fri Feb 5 22:38:00 2010 filtering completed in 2 passes
Fri Feb 5 22:38:00 2010 matrix is 1427844 x 1428021 (434.9 MB) with weight 135765621 (95.07/col)
Fri Feb 5 22:38:00 2010 sparse part has weight 96880921 (67.84/col)
Fri Feb 5 22:38:18 2010 read 1428021 cycles
Fri Feb 5 22:38:19 2010 matrix is 1427844 x 1428021 (434.9 MB) with weight 135765621 (95.07/col)
Fri Feb 5 22:38:19 2010 sparse part has weight 96880921 (67.84/col)
Fri Feb 5 22:38:19 2010 saving the first 48 matrix rows for later
Fri Feb 5 22:38:20 2010 matrix is 1427796 x 1428021 (418.5 MB) with weight 108661836 (76.09/col)
Fri Feb 5 22:38:20 2010 sparse part has weight 95415355 (66.82/col)
Fri Feb 5 22:38:20 2010 matrix includes 64 packed rows
Fri Feb 5 22:38:20 2010 using block size 65536 for processor cache size 4096 kB
Fri Feb 5 22:38:29 2010 commencing Lanczos iteration (2 threads)
Fri Feb 5 22:38:29 2010 memory use: 415.9 MB
Fri Feb 5 22:38:40 2010 linear algebra at 0.1%, ETA 5h12m
Sat Feb 6 04:44:09 2010 lanczos halted after 22579 iterations (dim = 1427795)
Sat Feb 6 04:44:14 2010 recovered 29 nontrivial dependencies
Sat Feb 6 04:44:14 2010 [B]BLanczosTime: 22220[/B]
[/CODE]

Seems SVN185 is slower.

debrouxl 2010-02-06 09:55

I made some tests on C149_113_58 (SNFS difficulty 200) on Core 2 Duo T7200, 2 GB RAM (but that didn't work out, see below) and Core 2 Duo T7600, 3 GB RAM:

SVN185:
[code]Msieve v. 1.44
Thu Feb 4 18:02:13 2010
random seeds: c08d3a90 82f55b5b
factoring 58726049878536391557011831786679470473096960150546862137088288968129115887237799139638233172513763094056500204963331970846586616031413324264998772683 (149 digits)
searching for 15-digit factors
commencing number field sieve (149-digit input)
R0: -624332378391008467754527498760580235264
R1: 38358611506121121577937
A0: 1442897
A1: 0
A2: 0
A3: 0
A4: 0
A5: 195112
skew 1.49, size 1.169368e-14, alpha -1.135403, combined = 6.064487e-12

commencing linear algebra
read 3259953 cycles
cycles contain 10103761 unique relations
read 10103761 relations
using 20 quadratic characters above 536870768
building initial matrix
memory use: 1326.4 MB
read 3259953 cycles
matrix is 3259753 x 3259953 (1002.3 MB) with weight 304669742 (93.46/col)
sparse part has weight 226894247 (69.60/col)
filtering completed in 1 passes
matrix is 3259753 x 3259953 (1002.3 MB) with weight 304669742 (93.46/col)
sparse part has weight 226894247 (69.60/col)
permuting matrix for faster multiplies
matrix is 3259753 x 3259953
sparse core is 2313986 x 1549672
graph has 11341062 edges
memory use: 160.22 MB
max edge weight is 36
read 3259953 cycles
matrix is 3259753 x 3259953 (1002.3 MB) with weight 304669742 (93.46/col)
sparse part has weight 226894247 (69.60/col)
saving the first 48 matrix rows for later
matrix is 3259705 x 3259953 (960.0 MB) with weight 242856059 (74.50/col)
sparse part has weight 219071221 (67.20/col)
matrix includes 64 packed rows
using block size 65536 for processor cache size 4096 kB
commencing Lanczos iteration (2 threads)
memory use: 975.0 MB
linear algebra at 0.0%, ETA 33h37m59953 dimensions (0.0%, ETA 33h37m)
^Cnear algebra completed 6836 of 3259953 dimensions (0.2%, ETA 33h18m)
received signal 2; shutting down

lanczos halted after 111 iterations (dim = 7024)
BLanczosTime: 976
elapsed time 00:16:17[/code]

trunk (SVN 189, I think):

[code]Msieve v. 1.44
Thu Feb 4 18:19:19 2010
random seeds: 4c27aea9 6f01d9ad
factoring 58726049878536391557011831786679470473096960150546862137088288968129115887237799139638233172513763094056500204963331970846586616031413324264998772683 (149 digits)
searching for 15-digit factors
commencing number field sieve (149-digit input)
R0: -624332378391008467754527498760580235264
R1: 38358611506121121577937
A0: 1442897
A1: 0
A2: 0
A3: 0
A4: 0
A5: 195112
skew 1.49, size 1.169368e-14, alpha -1.135403, combined = 6.064487e-12

commencing linear algebra
read 3259953 cycles
cycles contain 10103761 unique relations
read 10103761 relations
using 20 quadratic characters above 536870768
building initial matrix
memory use: 1326.4 MB
read 3259953 cycles
matrix is 3259753 x 3259953 (1002.3 MB) with weight 304669742 (93.46/col)
sparse part has weight 226894247 (69.60/col)
filtering completed in 1 passes
matrix is 3259753 x 3259953 (1002.3 MB) with weight 304669742 (93.46/col)
sparse part has weight 226894247 (69.60/col)
read 3259953 cycles
matrix is 3259753 x 3259953 (1002.3 MB) with weight 304669742 (93.46/col)
sparse part has weight 226894247 (69.60/col)
saving the first 48 matrix rows for later
matrix is 3259705 x 3259953 (960.0 MB) with weight 242856059 (74.50/col)
sparse part has weight 219071221 (67.20/col)
matrix includes 64 packed rows
using block size 65536 for processor cache size 4096 kB
commencing Lanczos iteration (2 threads)
memory use: 974.9 MB
linear algebra at 0.0%, ETA 33h18m59953 dimensions (0.0%, ETA 33h18m)
^Cnear algebra completed 7343 of 3259953 dimensions (0.2%, ETA 33h35m)
received signal 2; shutting down

lanczos halted after 119 iterations (dim = 7530)
BLanczosTime: 806
elapsed time 00:13:28[/code]

I guess I need to mention that despite the "memory use: 974.9 MB" and "memory use: 975.0 MB"; traces, consistent wrt. each other, the memory consumption of the reordering-enabled binary was at least 700 MB (!) higher than that of the other binary.
On the T7200 with 2 GB RAM, the reordering-enabled binary was killed by Linux's OOM killer before it could display "commencing Lanczos iteration (2 threads)", so I switched to the T7600.

jasonp, if you think that it would be useful for testing the algorithm (all of our tasks are beyond 2M cycles), I can send you how to get the raw sieving data of the RSALS grid :smile:

debrouxl 2010-02-07 08:00

For some reason, I can't seem to be allowed to edit my own posts (even after logging out and logging back in), so I'm creating a new post.
[EDIT: the Edit button appears on this post right after I've made it - is there a time limit ?]


[quote]Unfortunately, multilevel graph partitioning with simultaneous constraints takes around 6x the memory use of my crappy code, so it's a non-starter for really big problems.[/quote]If the total memory requirements of the other graph partitioning code are lower than the requirements of the rest of the msieve LA pass, it can remain a valid option (barring e.g. licensing issues, of course).

10metreh 2010-02-07 08:33

[QUOTE=debrouxl;204761][EDIT: the Edit button appears on this post right after I've made it - is there a time limit ?][/QUOTE]

Yes there is: 1 hour (unless you're a moderator).


All times are UTC. The time now is 11:42.

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.