mersenneforum.org  

Go Back   mersenneforum.org > Factoring Projects > Msieve

Reply
 
Thread Tools
Old 2010-01-30, 18:15   #1
jasonp
Tribal Bullet
 
jasonp's Avatar
 
Oct 2004

353710 Posts
Default Block Lanczos with a reordering pass

If anyone feels adventurous, it would be nice to test out the msieve distribution in branches/lanczos-opt on a fairly large problem. This code adds an extra pass to the matrix-building phase that permutes the rows and columns of the matrix so that sparse regions are packed together for increased efficiency when performing matrix multiplies.

The method for doing so involves:
- reading the matrix from disk and ignoring dense rows and columns
- running a graph partitioning algorithm recursively on the sparse core that is left over. Currently the algorithm used is a crappy version of Kernighan-Lin with improvements by Fidduccia-Matheyses; about the only good thing I can say about it is that it should be able to scale up to large size problems with memory use somewhat less than is needed for NFS filtering.
- the graph partitioning algorithm divides the matrix into two halves that are independent of each other plus a border region that hopefully is narrow, then permutes the rows and columns so that the border region comes first and then the independent regions come last, then recurses on each region
- the next time the matrix is read in, the final permutation is applied and the permuted matrix is written to disk. If running with multiple threads, the number of matrix columns given to each thread is now nonuniform, because the density of matrix entries decreases as you move from the top left to bottom right of the matrix.

I haven't had time to test it out on a large problem, but I think the LA should at least work okay, if not faster. The quality of the partitioning has a crucial effect on performance, and currently the quality is not very good (plus partitioning an NFS matrix seems to be a very large and difficult problem). A similar bunch of partitioning code has been in the works for a while now, intended to be applied to the filtering stage, but I think right now that we need some tweaks to make the LA go faster and this is a good candidate to get extra performance for free.

It goes without saying that the modified code is furiously preliminary, and hopefully will help out for the largest size matrices (2M columns and up). Please let me know if it helps!

Last fiddled with by jasonp on 2010-01-30 at 18:26
jasonp is offline   Reply With Quote
Old 2010-01-30, 19:27   #2
Andi47
 
Andi47's Avatar
 
Oct 2004
Austria

2×17×73 Posts
Default

What is "fairly large"? c13x? c150? c165?

I am currently sieving a c133 (should finish in a few days from now) and could give it a try on this one, but I can't find out where to download the file?

(BTW: is there any place where I can download a version (post 1.43, but not a GPU version as my graphics card is too old) which includes the fix to the issue which causes FactMsieve.pl to crash when I have just not enough relations and msieve therefore can't build a matrix? I would need a windows binary of this too - I don't like it when I run aliquot sequences and aliqueit gets stuck in "GNFS failed, running ECM forever" due to this issue when I'm not around to catch it...)
Andi47 is offline   Reply With Quote
Old 2010-01-30, 23:23   #3
Batalov
 
Batalov's Avatar
 
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

9,391 Posts
Default

We'll have a 2,913+/- coming up soon. Will try.
Batalov is offline   Reply With Quote
Old 2010-01-31, 04:03   #4
schickel
 
schickel's Avatar
 
"Frank <^>"
Dec 2004
CDP Janesville

84A16 Posts
Default

I also have a c133 sieving right now. I've also got two systems I can test on: 32 & 64 bit Vista, both on dual cores....
schickel is offline   Reply With Quote
Old 2010-01-31, 09:24   #5
Andi47
 
Andi47's Avatar
 
Oct 2004
Austria

2·17·73 Posts
Default

Where can I download the files? The link seems to be very well hidden (or I am too blind to see...)
Andi47 is offline   Reply With Quote
Old 2010-01-31, 13:58   #6
Mini-Geek
Account Deleted
 
Mini-Geek's Avatar
 
"Tim Sorbera"
Aug 2006
San Antonio, TX USA

10AB16 Posts
Default

I believe he's referring to this code, which is downloadable via this link. (no prebuilt verisons available AFAICT)

Last fiddled with by Mini-Geek on 2010-01-31 at 13:59
Mini-Geek is offline   Reply With Quote
Old 2010-01-31, 14:28   #7
Andi47
 
Andi47's Avatar
 
Oct 2004
Austria

1001101100102 Posts
Default

Quote:
Originally Posted by Mini-Geek View Post
I believe he's referring to this code, which is downloadable via this link. (no prebuilt verisons available AFAICT)
Thanks: I have now tried to build a binary and it fails, see here: http://www.mersenneforum.org/showthread.php?t=13040
Andi47 is offline   Reply With Quote
Old 2010-01-31, 19:05   #8
debrouxl
 
debrouxl's Avatar
 
Sep 2009

977 Posts
Default

A few hours ago, I started the LA for XYYXF C175_113_65, a 5043428 x 5043605 matrix, the initial ETA was around 97h (on a Core 2 Duo @ 2.33 GHz).
When it's over, in several days, I'll try this experimental msieve version :)
debrouxl is offline   Reply With Quote
Old 2010-01-31, 22:22   #9
fivemack
(loop (#_fork))
 
fivemack's Avatar
 
Feb 2006
Cambridge, England

13×491 Posts
Default

I'm running the 2845972 x 2846197 matrix for aliquot 4800 step 1449 (C146) as I type; it took 58878 seconds (4 threads, Core2 Q6600) with 1.43 last time.

I'm not seeing any messages at all from the partitioning pass - do I need some command-line option to invoke it? ETA at 1%-complete is 16h43m which suggests it might be a little (say 2000 seconds) slower than the last run.

Last fiddled with by fivemack on 2010-01-31 at 23:20
fivemack is offline   Reply With Quote
Old 2010-01-31, 23:23   #10
Batalov
 
Batalov's Avatar
 
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

9,391 Posts
Default

I've built the experimental binary and the trunk binary, SVN 182. (k8, 64-bit linux)
Ran both with -t 4 for the toy tests below with a gnfs-135, matrix size ~1.8M.

1. To start with, I've tested it incorrectly, I think: I took an existing matrix from a finished gnfs-135 and a .chk file and restarted -ncr; then, the experimental binary was 2x slower than the trunk sibling.

2. I restarted from .dat, -nc1, -nc2. It reported "permuting matrix for faster multiplies" and then proceeded but also about 2x slower.

3. The experimental binary says "using block size 5000 for processor cache size 1024 kB", while the trunk binary says "using block size 43690..." - is that as planned? Maybe, that alone would explain the slowdown?
Batalov is offline   Reply With Quote
Old 2010-02-01, 02:12   #11
jasonp
Tribal Bullet
 
jasonp's Avatar
 
Oct 2004

33×131 Posts
Default

Sorry about the 5000 stuff, I added it so that I could simulate what the matrix would look like for a much bigger problem, without having to build it. Unfortunately I left that hack in the source when committing the changes; its been reverted now.

Greg: did you pull out the SVN tree in branches/lanczos-opt and not trunk?

Last fiddled with by jasonp on 2010-02-01 at 02:13
jasonp is offline   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
Msieve with MPI block Lanczos jasonp Msieve 104 2013-06-17 11:28
block wiedemann and block lanczos ravlyuchenko Msieve 5 2011-05-09 13:16
Why is lanczos hard to distribute? Christenson Factoring 39 2011-04-08 09:44
Lanczos error Andi47 Msieve 7 2009-01-11 19:33
Msieve Lanczos scalability Jeff Gilchrist Msieve 1 2009-01-02 09:32

All times are UTC. The time now is 11:25.

Tue Apr 20 11:25:40 UTC 2021 up 12 days, 6:06, 0 users, load averages: 2.42, 2.20, 2.03

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.