mersenneforum.org: Msieve LA with openmpi in the Current Age

 2019-11-03, 23:44 #1 EdH     "Ed Hall" Dec 2009 Adirondack Mtns 5·727 Posts Msieve LA with openmpi in the Current Age Back in the Pentium4 days I was able to get multiple machines to run msieve LA (-nc2) and save time. Recently I revisited this with a couple of i7s to see whether there was anything to be gained. The answer, which we all probably already knew: not with Gigabit Ethernet. With a relations set that took ~10 hours on one machine, the two machines showed ~12 to ~22 hours (after settling), depending on the thread/grid combination. The one area that might be of use is increasing my memory capacity by using two machines, something VBCurtis brought up a while back. I haven't explored that to any extent, but if I need more than the 16G of one machine, perhaps I can use two and only lose some time (although possibly as much as 20%).
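For reference, a two-machine run of the kind described might be launched roughly like this (hostnames, grid size, and thread counts are placeholders, not the actual settings used in the experiment):

```shell
# Hypothetical two-node msieve LA launch over Gigabit Ethernet.
# "node1"/"node2" are placeholder hostnames.
cat > hosts.txt <<'EOF'
node1 slots=1
node2 slots=1
EOF

# Written to a launcher script rather than executed here, since it
# needs an msieve binary and a finished sieving job.
cat > run_la_2node.sh <<'EOF'
#!/bin/sh
# One MPI rank per machine on a 1x2 grid, 4 threads per rank (assumed values).
mpirun -np 2 --hostfile hosts.txt ./msieve -nc2 1,2 -t 4 -v
EOF
chmod +x run_la_2node.sh
```

The grid argument to -nc2 (rows,cols) has to multiply out to the rank count given to -np, which is what the various "thread/grid combinations" above vary.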
 2019-11-04, 03:26 #2 VBCurtis     "Curtis" Feb 2005 Riverside, CA 2·2,339 Posts 1. Older-generation infiniband cards are not very expensive; if one were serious about tackling jobs too large for one's best machine, a pair of IB cards with a single cable should net good msieve-MPI results. 2. I think I have openMPI installed and configured on my Z620, but msieve-MPI (self-compiled) does not function. Might you email me your msieve-MPI linux binary?
2019-11-04, 04:24 #3 EdH     "Ed Hall" Dec 2009 5·727 Posts
Quote:
 Originally Posted by VBCurtis 1. Older-generation infiniband cards are not very expensive; if one were serious about tackling jobs too large for one's best machine, a pair of IB cards with a single cable should net good msieve-MPI results. 2. I think I have openMPI installed and configured on my Z620, but msieve-MPI (self-compiled) does not function. Might you email me your msieve-MPI linux binary?
1. I will have to check out the IB cards. I'm also revisiting the mpi aware bwc part of CADO-NFS, which I have experimented with already. I had some of the examples running, but not a real test case.

2. As to openmpi, if you're running Ubuntu 18.04, the repository openmpi is broken. It will not work if you try to use more than the localhost, which makes it rather useless. I've tried installing the latest version from the source site, but never got that to run either. This is actually keeping me from upgrading all my 16.04 machines. I will try to search out the binaries and send them your way. I know there are three main packages to install: openmpi-common, openmpi-bin and libopenmpi-dev. Give me a day or so; locate and whereis didn't turn up anything.

 2019-11-04, 07:36 #4 VBCurtis     "Curtis" Feb 2005 Riverside, CA 2×2,339 Posts I believe I am running 18.04 on the Z620; though for my use case localhost-only does help, as others have found that using MPI for each socket is much faster than using a single pool of threads. I'm running -t 20 right now on the dual-10-core, where I'd like to be running MPI 2x10-threads. Good to know it's likely openMPI that I should try to address, rather than my build of msieve.
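A sketch of the 2x10 layout described above, assuming OpenMPI's socket-binding options (the msieve path and grid values are placeholders):

```shell
# One MPI rank per socket, 10 msieve threads per rank, on a dual-10-core box.
# --map-by socket / --bind-to socket are OpenMPI options that pin one rank
# to each physical socket so its threads stay near local memory.
cat > run_2x10.sh <<'EOF'
#!/bin/sh
mpirun -np 2 --map-by socket --bind-to socket ./msieve -nc2 1,2 -t 10 -v
EOF
chmod +x run_2x10.sh
```

The appeal over a single `-t 20` pool is NUMA locality: each rank's ten threads work on a block of the matrix held in its own socket's memory instead of contending across the interconnect.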
2019-11-04, 13:48 #5 EdH     "Ed Hall" Dec 2009 5×727 Posts
Quote:
 Originally Posted by VBCurtis I believe I am running 18.04 on the Z620; though for my use case localhost-only does help, as others have found that using MPI for each socket is much faster than using a single pool of threads. I'm running -t 20 right now on the dual-10-core, where I'd like to be running MPI 2x10-threads. Good to know it's likely openMPI that I should try to address, rather than my build of msieve.
I will have to do more checking, but I believe that if you make sure you have

openmpi-bin, openmpi-common and libopenmpi-dev

installed, and then compile msieve with MPI=1, it should work as long as you don't include a hostfile.

 2019-11-04, 15:21 #6 jasonp Tribal Bullet     Oct 2004 2×3×19×31 Posts If you are working with MPI over multiple machines, can you see if the postprocessing runs to completion using the latest Msieve svn? Greg has reported that it does not (the square root doesn't like the data it's given). It doesn't have to be for a large problem, just a C100 on a 1x2 or 2x2 grid would be enough. Last fiddled with by jasonp on 2019-11-04 at 15:22
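A minimal reproduction along the lines requested might be scripted like this (the composite is left as an argument rather than invented; the grid follows the 1x2 suggestion, and relations are assumed to be in place already):

```shell
# Sketch of the requested test: run the full postprocessing of a small
# number with LA under MPI, to see whether -nc3 (square root) still works.
cat > mpi_sqrt_test.sh <<'EOF'
#!/bin/sh
# Usage: ./mpi_sqrt_test.sh <c100>
# Assumes relations from sieving are already in msieve.dat.
N="$1"
./msieve -v -nc1 "$N"                   # filtering + matrix build (serial)
mpirun -np 2 ./msieve -v -nc2 1,2 "$N"  # LA on a 1x2 MPI grid
./msieve -v -nc3 "$N"                   # square root: the step reported to fail
EOF
chmod +x mpi_sqrt_test.sh
```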
2019-11-04, 17:41 #7 EdH     "Ed Hall" Dec 2009 7063₈ Posts
Quote:
 Originally Posted by jasonp If you are working with MPI over multiple machines, can you see if the postprocessing runs to completion using the latest Msieve svn? Greg has reported that it does not (the square root doesn't like the data it's given). It doesn't have to be for a large problem, just a C100 on a 1x2 or 2x2 grid would be enough.
I can do this, but it will be later today. Currently I have an LA job that will take another 2.5+ hours to run on the host machine.

2019-11-05, 01:43 #8 EdH     "Ed Hall" Dec 2009 5×727 Posts
Quote:
 Originally Posted by jasonp If you are working with MPI over multiple machines, can you see if the postprocessing runs to completion using the latest Msieve svn? Greg has reported that it does not (the square root doesn't like the data it's given). It doesn't have to be for a large problem, just a C100 on a 1x2 or 2x2 grid would be enough.
They didn't solve for me, either:
Code:
. . .
Mon Nov  4 20:30:25 2019  reading relations for dependency 63
Mon Nov  4 20:30:25 2019  read 0 cycles
Mon Nov  4 20:30:25 2019  reading relations for dependency 64
Mon Nov  4 20:30:25 2019  read 0 cycles
Mon Nov  4 20:30:25 2019  sqrtTime: 119
Mon Nov  4 20:30:25 2019  elapsed time 00:02:00
I've attached all the logs for your review. I forgot to run one without mpi. I'm off to do that now. . .
Attached Files: comp.log.1x2.zip (4.8 KB), comp.log.2x2.zip (6.3 KB)

2019-11-05, 02:00 #9 EdH     "Ed Hall" Dec 2009 5×727 Posts
A run without mpi worked fine:
Code:
. . .
Mon Nov  4 20:49:50 2019  initial square root is modulo 4203473
Mon Nov  4 20:49:56 2019  sqrtTime: 28
Mon Nov  4 20:49:56 2019  p50 factor: 26221114229909593079493944061795669970670518225931
Mon Nov  4 20:49:56 2019  p51 factor: 116701385250110252294900265085979409741229672958141
Mon Nov  4 20:49:56 2019  elapsed time 00:00:28
The log is attached.

To note, I ran -nc1 once and saved all the files to another directory. Then I copied them into the working directory before each -nc2 run.
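That save-and-restore workflow can be sketched as follows (dummy files stand in here so the sketch is self-contained; in a real run msieve.dat and msieve.fb are produced by sieving and -nc1, along with other intermediates):

```shell
# Dummy stand-ins for msieve's working files, so this sketch runs as-is.
touch msieve.dat msieve.fb

# After running -nc1 once, stash the post-filtering state:
mkdir -p saved
cp msieve.dat msieve.fb saved/

# Before each -nc2 trial, restore a clean copy so every MPI
# configuration starts from the identical matrix:
cp saved/msieve.dat saved/msieve.fb .
```

This keeps the LA timings comparable across runs, since each one begins from the same filtering output rather than re-running -nc1.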
Attached Files: comp.log.mpi00.txt (11.1 KB)

2019-11-05, 14:26 #10 EdH     "Ed Hall" Dec 2009 111000110011₂ Posts
Quote:
 Originally Posted by VBCurtis 1. Older-generation infiniband cards are not very expensive; if one were serious about tackling jobs too large for one's best machine, a pair of IB cards with a single cable should net good msieve-MPI results. 2. I think I have openMPI installed and configured on my Z620, but msieve-MPI (self-compiled) does not function. Might you email me your msieve-MPI linux binary?
Quote:
 Originally Posted by EdH 1. I will have to check out the IB cards. I'm also revisiting the mpi aware bwc part of CADO-NFS, which I have experimented with already. I had some of the examples running, but not a real test case. 2. As to openmpi, if you're running Ubuntu 18.04, the repository openmpi is broken. It will not work if you try to use more than the localhost, which makes it rather useless. I've tried installing the latest version from the source site, but never got it to run, either. This is actually keeping me from upgrading all my 16.04 machines. I will try to search out the binaries and send them your way. I know there are three main files to install - openmpi-common, openmpi-bin and libopenmpi-dev. Give me a day or so. Locate and whereis didn't turn up anything.
1. The cards are pretty inexpensive, but the cables are more than the cards. I might have to try this, perhaps "just for fun."

2. I was confused as to which binary (or binaries) you were interested in. Rather troubling, since you were specific! I'm assuming you're not interested right now, since the MPI msieve seems to be having some trouble per the previous posts, and I believe your issue is really the 18.04 openmpi problem?

 2019-11-05, 15:55 #11 VBCurtis     "Curtis" Feb 2005 Riverside, CA 1246₁₆ Posts Agree to both on #2; the 18.04 openMPI is likely the problem, and the msieve bug means I shouldn't be trying it on the C207 matrix. My original intent was for you to send an msieve binary, because it didn't occur to me that MPI might be broken.

