Running msieve LA with openmpi - do all machines need to be same/similar

2013-10-16, 20:57   #1
EdH

If I try to set up openmpi with msieve, will it work across varying hardware, or does it need all the machines to be similar?

I have machines that range from a single-core 1.3 GHz P4 up to a 2.66 GHz Core(TM)2 Quad. Would the slow machines get the same amount of work as the faster ones, or is there a way to match the matrix portions to the individual machines?

2013-10-16, 23:59   #2
jasonp

MPI doesn't care about whether the machines are the same or not, but the communication pattern in Msieve's LA does; it will work, but the slowest machine will hold back all the others.

2013-10-17, 04:15   #3
EdH

Quote:
Originally Posted by jasonp
MPI doesn't care about whether the machines are the same or not, but the communication pattern in Msieve's LA does; it will work, but the slowest machine will hold back all the others.
Thanks! Can I overcome this by making the grid resolution small enough? Or does msieve just start that many processes all at once, rather than issuing out segments and waiting for returns?

Maybe I'm seeing this wrong, but won't openmpi allow for issuing tasks such that all "slots" run one process to completion before they accept another? Maybe this would allow the slower machines to process fewer portions than the faster ones and balance out in the end? Of course, I suppose the added communication overhead may offset any potential gain.

I have openmpi running on several machines now and will try some experiments as soon as I can make some more time. Also, msieve compiled with MPI=1 without any issues.
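
For reference, the build itself was nothing special; a minimal sketch, assuming the stock msieve Makefile (which accepts MPI=1 and needs mpicc in the PATH):
Code:
# build msieve with MPI support enabled
make all MPI=1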

Thanks for all...

2013-10-17, 11:27   #4
jasonp

There is no dynamic parallelism here, all the MPI processes are assigned to physical machines at the outset. You can assign more MPI processes to the faster machines and they will do comparatively more work, but if your P4 is 5x slower than the other machines then unbalancing the workload will not correct that. The LA is a tightly-coupled job, all the machines have to frequently synchronize with each other.
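
With Open MPI, for example, the per-machine rank counts come from the hostfile; a rough sketch, with hypothetical host names:
Code:
# hostfile: 'slots' controls how many MPI ranks land on each box
quadbox1 slots=4
quadbox2 slots=4
p4box    slots=1

# 9 ranks total; the count must match the -nc2 grid, e.g. 3 x 3
mpirun --hostfile hostfile -np 9 ./msieve ... -nc2 3,3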

The LA attempts to give all MPI processes the same amount of work to do, and forcing the faster machines to do 5x as much as the slower ones will make the total time worse, to the point that it would be better to run multithreaded on a single machine. That's the comparison to beat, not making MPI keep your 'cluster' as busy as possible.

Also, you get a single binary for all the machines to use, so beware that new CPU instructions are not given to old machines (this has bitten you before).


2013-10-17, 12:53   #5
EdH

Thanks again. I've got two Core(TM)2 Quads that aren't too far apart in speed. Maybe I'll restrict the LA portion to them for now and see how that works out. Actually, due to memory restrictions, they run LA faster on only two cores, so maybe I can set up mpi with two slots each and see how it compares; something like the sketch below.
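
A minimal sketch of that layout (host names are hypothetical):
Code:
# hostfile: two quads, two LA ranks each
quad1 slots=2
quad2 slots=2

# 4 ranks total, matching a 2 x 2 grid
mpirun --hostfile hostfile -np 4 ./msieve ... -nc2 2,2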

Quote:
Also, you get a single binary for all the machines to use, so beware that new CPU instructions are not given to old machines (this has bitten you before)
Yeah, I've got an AMD machine sitting here, waiting for other work because it doesn't have SSE2.

Thanks for all...

2013-10-17, 22:34   #6
EdH

Apparently, I'm not in the region to need mpi yet.
Code:
error: MPI size 1 incompatible with 2 x 1 grid, or 1 x 2 or etc...
The only thing that would work was 1 x 1. I did try a rather small set of data from a recent c116. I'll play more later when something larger comes along...

2013-10-18, 01:12   #7
jasonp

If you are using mpirun, you need to add '-np 2' and also pass '-nc2 1,2' on the Msieve command line. There is no lower bound on problem size below which MPI is turned off, but you will get silent failures for matrices smaller than 50k. If you don't want multithreaded runs, don't pass any '-t' option.
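
The two have to agree: the mpirun process count must match the product of the grid dimensions. A minimal sketch of the pairing (other msieve arguments elided):
Code:
# 2 MPI processes driving a 1 x 2 grid
mpirun -np 2 ./msieve -nc2 1,2 ...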


2013-10-19, 15:03   #8
EdH

Well, I appear to be missing something. I thought perhaps I needed to start from scratch with msieve, so I just manually stepped through a c127, but I'm getting the same results: msieve appears to be locked into a single MPI process. The log entry for each step says:
Code:
MPI process 0 of 1
Do I need to start earlier in the process to invoke more MPI processes?
I performed the following:

Code:
./msieve -i number.ini -np
(used gnfs-lasieve4I14e across several machines to collect and combine relations)
cat number.dat | ./remdups4 200 -v > numberd.dat
./msieve -i number.ini -s numberd.dat -l number.log -nf msieve.fb -t 4 -nc1
./msieve -i number.ini -s numberd.dat -l number.log -nf msieve.fb -nc2 2,2
All I get is:
Code:
error: MPI size 1 incompatible with 2 x 2 grid
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD 
with errorcode 11.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
I get the same for -nc2 2,1 and -nc2 1,2. The log shows:
Code:
Sat Oct 19 10:11:48 2013  
Sat Oct 19 10:11:48 2013  
Sat Oct 19 10:11:48 2013  Msieve v. 1.52 (SVN 886M)
Sat Oct 19 10:11:48 2013  random seeds: f2bb119f 236e6081
Sat Oct 19 10:11:48 2013  MPI process 0 of 1
Sat Oct 19 10:11:48 2013  factoring 6784444815212871648987113498975198812582601017150300513612649329477342951718412056224904684878476341466112558732473153528742691 (127 digits)
Sat Oct 19 10:11:49 2013  searching for 15-digit factors
Sat Oct 19 10:11:49 2013  commencing number field sieve (127-digit input)
Sat Oct 19 10:11:49 2013  R0: -6466522433420943924999750
Sat Oct 19 10:11:49 2013  R1: 21859154389253
Sat Oct 19 10:11:49 2013  A0: 141396535680280515764228165941687
Sat Oct 19 10:11:49 2013  A1: -198220819293965173286828926
Sat Oct 19 10:11:49 2013  A2: -2508680094167911717131
Sat Oct 19 10:11:49 2013  A3: -2449278542247386
Sat Oct 19 10:11:49 2013  A4: 3727847252
Sat Oct 19 10:11:49 2013  A5: 600
Sat Oct 19 10:11:49 2013  skew 934828.37, size 3.065e-12, alpha -7.032, combined = 1.147e-10 rroots = 3
Sat Oct 19 10:11:49 2013  
Sat Oct 19 10:11:49 2013  commencing linear algebra
And, then the error message above.

Sorry that I always seem to have these troubles...

Thanks for all.

2013-10-19, 15:16   #9
fivemack

You need to start msieve with

'mpirun -n {number of machines} msieve ...'
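
Applied to the run in post #8 above, that would be something like this (a sketch; -np must equal the product of the -nc2 grid dimensions, so 4 for a 2 x 2 grid):
Code:
mpirun -np 4 ./msieve -i number.ini -s numberd.dat -l number.log -nf msieve.fb -nc2 2,2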

2013-10-19, 15:26   #10
EdH

Quote:
Originally Posted by fivemack
You need to start msieve with

'mpirun -n {number of machines} msieve ...'
I think that clears it up. Thanks. I thought msieve would give me a command line for mpi, but I see I need to run the msieve command under mpirun. Sorry I'm so dense...

2013-10-19, 15:47   #11
EdH

Thanks! That seems to have gotten it running...