mersenneforum.org  

2010-07-02, 23:57   #34
frmky

Quote:
Originally Posted by Andi47
BTW: Did I get this right - this was a SNFS-257 from 11^287-1?
Yes, that's right.
2010-07-03, 07:30   #35
henryzz

What size matrix would frmky be able to solve in a couple of months? What size number would that apply to?
2010-07-03, 08:00   #36
frmky

Quote:
Originally Posted by henryzz
What size matrix would frmky be able to solve in a couple of months? What size number would that apply to?
As another data point, an 18.8M matrix would take under two weeks. I estimate a 30-35M matrix would take about two months. M941, a 284-digit SNFS, resulted in a 24.1M matrix, so probably an SNFS in the high 280's would give a 30M or so matrix. The bad news is that this would consume about 100,000 hours of CPU time on the big iron, which I will only have if the grant proposal I'm writing is funded. The good news is that Jason may still have tricks up his sleeve for improving performance on the local cluster that I can use for free.
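A rough sanity check on these figures (my own back-of-envelope, not frmky's actual method): block Lanczos runs about n iterations, each costing work proportional to the number of matrix nonzeros, so at fixed density the runtime grows roughly quadratically with the matrix dimension. Scaling the 18.8M-matrix, roughly-13-day data point:

```python
# Assumption: runtime ~ n_iterations * cost_per_iteration ~ n * nnz ~ n^2
# at fixed density and fixed hardware. The reference numbers come from
# frmky's post above ("an 18.8M matrix would take under two weeks").

def estimate_days(n_millions, ref_n=18.8, ref_days=13.0):
    """Scale a reference timing quadratically to another matrix dimension."""
    ratio = n_millions / ref_n
    return ref_days * ratio * ratio

print(f"30M: ~{estimate_days(30.0):.0f} days")  # ~33 days
print(f"35M: ~{estimate_days(35.0):.0f} days")  # ~45 days
```

This gives roughly 33-45 days for a 30-35M matrix; since average column weight also tends to grow with matrix size, "about two months" is consistent with the quadratic estimate.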
2010-07-07, 00:49   #37
pancoast.3
My Sliced MPI implementation

OK, I read a book about MPI and just finished my first MPI program. The goal was to divide a search interval into slices and distribute the work more evenly among processors. The next step for me is to apply this sliced approach to the polynomial selection.

Please take a look and let me know if I'm missing anything here:
http://mancoast.chickenkiller.com/primempi.tgz
2010-07-07, 01:22   #38
jasonp

If you run a complex program where the only difference between nodes is stuff that is passed in on the command line, and you want to statically allocate the search space over the nodes you have, then just put the bounds in an array and use MPI_Bcast to send them to the other nodes.

There isn't a lot of difference between that and just executing a compiled binary using RPC, though; i.e. 'ssh user@node my_binary x,y'. If you want load balancing too, then just install one of the many free batch schedulers and let your big pile of jobs queue up waiting for a free CPU.
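For illustration, a minimal sketch of the static split jasonp describes, in plain Python with the MPI parts left as comments (`slice_bounds` is a hypothetical helper, not part of msieve):

```python
def slice_bounds(lo, hi, rank, nranks):
    """Return this rank's slice [my_lo, my_hi) of the interval [lo, hi),
    with the remainder spread over the first few ranks.

    In a real MPI program the root would put (lo, hi) in an array and
    MPI_Bcast it to the other nodes, and each process would pick its
    slice using the rank returned by MPI_Comm_rank."""
    length = hi - lo
    base, extra = divmod(length, nranks)
    my_lo = lo + rank * base + min(rank, extra)
    my_hi = my_lo + base + (1 if rank < extra else 0)
    return my_lo, my_hi

# e.g. [0, 10) over 4 ranks -> [0,3) [3,6) [6,8) [8,10)
for r in range(4):
    print(r, slice_bounds(0, 10, r, 4))
```

Note that the slices cover the interval exactly and differ in length by at most one, which is all the load balancing a static split can give; anything smarter needs a scheduler, as jasonp says.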
2010-07-08, 17:17   #39
jasonp

I've now modified the msieve-mpi branch to use a 2-D grid of MPI processes when running the linear algebra. This should allow the speedup on a cluster with many nodes to keep increasing as more machines are added to a Lanczos run. The previous code was rather limited in the total speedup possible from adding machines, and I suspect the new code will only be faster for very large problems, perhaps 10M and up.

Running with -nc2 as before will use a 1xN grid, given N processes by mpirun. For a 2-D grid of MxN MPI processes, run with '-nc2 M,N' or '-ncr M,N'.

The code now performs a row permutation on the matrix as it is read from disk, to better balance the load across many machines; a side effect is that one can only restart from a checkpoint if the new grid dimensions match the old ones.
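A toy sketch of how a 2-D grid splits the matrix-vector product at the heart of Lanczos (plain Python standing in for the MPI version; msieve's actual data structures and communicators differ): process (i, j) owns the intersection of row block i and column block j, multiplies it by its slice of x, and the partial results are summed along each grid row.

```python
def block_range(n, idx, nblocks):
    """[start, end) of block idx when n items are split into nblocks."""
    base, extra = divmod(n, nblocks)
    start = idx * base + min(idx, extra)
    return start, start + base + (1 if idx < extra else 0)

def grid_matvec(A, x, grid_rows, grid_cols):
    """y = A*x computed block by block, as an MxN process grid would do it.

    'Process' (gi, gj) multiplies its local block of A by its slice of x;
    the += across gj plays the role of the reduction along each grid row
    (an MPI_Reduce over a row communicator in a real MPI code)."""
    n = len(A)
    y = [0.0] * n
    for gi in range(grid_rows):
        r0, r1 = block_range(n, gi, grid_rows)
        for gj in range(grid_cols):
            c0, c1 = block_range(n, gj, grid_cols)
            for r in range(r0, r1):
                for c in range(c0, c1):
                    y[r] += A[r][c] * x[c]
    return y
```

The point of the 2-D layout is that each process only ever communicates with its own grid row and column, so per-node traffic grows like the vector slice size rather than the whole vector.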
2010-07-15, 22:40   #40
jasonp

Now changed to be *much* faster (30-40% with many nodes)
2010-07-16, 06:04   #41
frmky

Quote:
Originally Posted by jasonp
Now changed to be *much* faster (30-40% with many nodes)
How fast, you ask?
Code:
Sat Jun 26 22:24:21 2010  matrix is 9140582 x 9140759 (3918.8 MB) with weight 1121375182 (122.68/col)
Wed Jul 14 10:21:15 2010  initialized process (0,0) of 4 x 8 grid
Wed Jul 14 10:23:01 2010  linear algebra at 0.0%, ETA 39h58m
Granted, this is using 8 nodes of an Infiniband-connected cluster, not your average PC. Eight nodes of a Gigabit Ethernet connected cluster take more like 90 hours. But still...
2010-07-17, 13:59   #42
jasonp

The scalability of the latest code on Infiniband-connected nodes is also much better than previous computational experience would suggest. For N nodes we're seeing a speedup of O(N^0.75), instead of the predicted O(N^0.5)!
2010-07-18, 03:42   #43
frmky

Actually, compiling all of the data, it empirically appears to be a bit better than that. On the newer Abe cluster, we're seeing close to N^0.86, where N is the number of computational nodes used, out to 48 nodes, and perhaps a bit better than that out to 32 nodes. On the older IB connected Lonestar cluster, it's still N^0.81 to 32 nodes. Our local GigE cluster scales as perhaps N^0.6, but there aren't enough nodes to pin the exponent down well.
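For anyone wanting to reproduce this kind of fit: with runtime t proportional to N^(-alpha), i.e. speedup proportional to N^alpha, the exponent follows from any two (nodes, runtime) measurements. A small sketch with made-up runtimes chosen to match the reported exponent (not real cluster data):

```python
import math

def scaling_exponent(n1, t1, n2, t2):
    """Fit alpha in t ~ n**(-alpha) from two (nodes, runtime) points:
    alpha = log(t1/t2) / log(n2/n1)."""
    return math.log(t1 / t2) / math.log(n2 / n1)

# Hypothetical runtimes: 8 nodes take 100 h; 32 nodes take 100 / 4**0.86 h,
# i.e. numbers constructed to exhibit alpha = 0.86 exactly.
t8, t32 = 100.0, 100.0 / 4 ** 0.86
print(f"alpha = {scaling_exponent(8, t8, 32, t32):.2f}")  # alpha = 0.86
```

With more than two node counts, as in the attached plot, one would fit log(t) against log(N) by least squares instead of using a single pair of points.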
[Attached image: Rev331Scaling.GIF]
2010-08-05, 19:29   #44
Jeff Gilchrist

I have finally built msieve with MPI=1 after fighting with some issues. Our mpicc uses Intel's ICC compiler, so I'm not even sure whether this will work; the Opteron systems use pathcc, which was less difficult to get to compile.

Are there any special command line options or flags for the MPI version? There isn't anything in the readme or -h that I could see. Do I just use the command line as I normally would and tell my MPI launcher to use, say, 16 nodes, and msieve will automatically figure out how many ranks there are? Or do I need to use -t 16 as well to tell it there will be 16 "threads"?

Greg, can you post the command line you use for post-processing so I can see an example?

Thanks.
Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.