#1
"Ed Hall"
Dec 2009
Adirondack Mtns
5×727 Posts
If I try to set up openmpi with msieve, will it work across varying hardware, or does it need all the machines to be similar?
I have machines that range from a single-core P4 at 1.3GHz through a Core(TM)2 Quad at 2.66GHz. Would the slow machines get the same amount of work as the faster ones, or is there a way to match the matrix portions to the individual machines?
#2
Tribal Bullet
Oct 2004
3534₁₀ Posts
MPI doesn't care whether the machines are the same or not, but the communication pattern in Msieve's LA does: it will work, but the slowest machine will hold back all the others.
#3
"Ed Hall"
Dec 2009
Adirondack Mtns
5·727 Posts
Maybe I'm seeing this wrong, but won't openmpi allow for issuing tasks such that all "slots" run one process to completion before they accept another? Maybe that would let the slower machines process fewer portions than the faster ones and balance out in the end? Of course, I suppose the added communication overhead may offset any potential gain.

I have openmpi running on several machines now and will try some experiments as soon as I can make some more time. And msieve compiled with MPI=1 with no issues.

Thanks for all...
#4
Tribal Bullet
Oct 2004
3534₁₀ Posts
There is no dynamic parallelism here; all the MPI processes are assigned to physical machines at the outset. You can assign more MPI processes to the faster machines and they will do comparatively more work, but if your P4 is 5x slower than the other machines then unbalancing the workload will not correct that. The LA is a tightly-coupled job; all the machines have to frequently synchronize with each other.

The LA attempts to give all MPI processes the same amount of work to do, and forcing the faster machines to do 5x as much as the slower ones will make the total time worse, to the point that it would be better to run multithreaded on a single machine. That's the comparison to beat, not making MPI keep your 'cluster' as busy as possible. Also, you get a single binary for all the machines to use, so beware that new CPU instructions are not given to old machines (this has bitten you before).

Last fiddled with by jasonp on 2013-10-17 at 11:29
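For anyone who wants to try the weighted approach anyway, a minimal sketch with Open MPI is a hostfile that grants the faster boxes more slots (the hostnames below are placeholders, not from this thread, and the -np count must still equal the product of the -nc2 grid dimensions):
Code:
# hostfile -- hostnames are hypothetical; more slots = more MPI processes on that box
quad1 slots=4
quad2 slots=4
p4box slots=1

mpirun --hostfile hostfile -np 9 ./msieve -nc2 3,3 ...
The caveat above still applies: each grid cell gets an equal-sized piece of the matrix, so extra slots only change how many pieces land on each box.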
#5
"Ed Hall"
Dec 2009
Adirondack Mtns
5×727 Posts
Thanks again. I've got two Core(TM)2 Quads that aren't too far apart in speed. Maybe I'll restrict the LA portion to them for now and see how that works out. Actually, due to memory restrictions, they run LA faster on only two cores, so maybe I can set up mpi with two slots each and see how it compares.
Thanks for all...
#6
"Ed Hall"
Dec 2009
Adirondack Mtns
111000110011₂ Posts
Apparently, I'm not in the region to need mpi yet.
Code:
error: MPI size 1 incompatible with 2 x 1 grid, or 1 x 2 or etc...
#7
Tribal Bullet
Oct 2004
2·3·19·31 Posts
If you are using mpirun, you need to add '-np 2' and also pass '-nc2 1,2' on the Msieve command line. There is no problem size below which MPI is turned off, but you will get silent failures for matrices below 50k in size. If you don't want multithreaded runs, don't pass any '-t' option.
Last fiddled with by jasonp on 2013-10-18 at 01:13 |
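Put together, that's something like the following (an untested sketch; the filenames are just illustrative):
Code:
mpirun -np 2 ./msieve -i number.ini -s numberd.dat -l number.log -nf msieve.fb -nc2 1,2
The '-np 2' matches the 1 x 2 grid, since the MPI size must equal the product of the grid dimensions.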
#8
"Ed Hall"
Dec 2009
Adirondack Mtns
5×727 Posts
Well, I appear to be missing something. I thought perhaps I needed to start from scratch with msieve, so I just manually stepped through a c127. I'm having the same results: msieve appears to be locked into a single MPI setting. The log entry for each step says:
Code:
MPI process 0 of 1
I performed the following:
Code:
./msieve -i number.ini -np
(used gnfs-lasieve4I14e across several machines to collect and combine relations)
cat number.dat | ./remdups4 200 -v > numberd.dat
./msieve -i number.ini -s numberd.dat -l number.log -nf msieve.fb -t 4 -nc1
./msieve -i number.ini -s numberd.dat -l number.log -nf msieve.fb -nc2 2,2
All I get is:
Code:
error: MPI size 1 incompatible with 2 x 2 grid
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 11.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
Code:
Sat Oct 19 10:11:48 2013
Sat Oct 19 10:11:48 2013
Sat Oct 19 10:11:48 2013  Msieve v. 1.52 (SVN 886M)
Sat Oct 19 10:11:48 2013  random seeds: f2bb119f 236e6081
Sat Oct 19 10:11:48 2013  MPI process 0 of 1
Sat Oct 19 10:11:48 2013  factoring 6784444815212871648987113498975198812582601017150300513612649329477342951718412056224904684878476341466112558732473153528742691 (127 digits)
Sat Oct 19 10:11:49 2013  searching for 15-digit factors
Sat Oct 19 10:11:49 2013  commencing number field sieve (127-digit input)
Sat Oct 19 10:11:49 2013  R0: -6466522433420943924999750
Sat Oct 19 10:11:49 2013  R1: 21859154389253
Sat Oct 19 10:11:49 2013  A0: 141396535680280515764228165941687
Sat Oct 19 10:11:49 2013  A1: -198220819293965173286828926
Sat Oct 19 10:11:49 2013  A2: -2508680094167911717131
Sat Oct 19 10:11:49 2013  A3: -2449278542247386
Sat Oct 19 10:11:49 2013  A4: 3727847252
Sat Oct 19 10:11:49 2013  A5: 600
Sat Oct 19 10:11:49 2013  skew 934828.37, size 3.065e-12, alpha -7.032, combined = 1.147e-10 rroots = 3
Sat Oct 19 10:11:49 2013
Sat Oct 19 10:11:49 2013  commencing linear algebra

Sorry that I always seem to have these troubles...

Thanks for all.
#9
(loop (#_fork))
Feb 2006
Cambridge, England
6382₁₀ Posts
You need to start msieve with
'mpirun -n {number of machines} msieve ...'
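A minimal sketch for the 2 x 2 grid attempted above, reusing the filenames from post #8 (the -np value has to equal the product of the grid dimensions, so a 2 x 2 grid needs 4 processes):
Code:
mpirun -np 4 ./msieve -i number.ini -s numberd.dat -l number.log -nf msieve.fb -nc2 2,2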
#10
"Ed Hall"
Dec 2009
Adirondack Mtns
111000110011₂ Posts
#11
"Ed Hall"
Dec 2009
Adirondack Mtns
5×727 Posts
Thanks! That seems to have gotten it running...