2019-04-04, 22:33 #1
EdH ("Ed Hall")

How I Install and Run ecmpi Across Several Ubuntu Machines

(Note: I expect to keep the first post of each of these "How I Install..." threads up-to-date with the latest version. Please read the rest of each thread to see what may have led to the current set of instructions.)

This thread explains the steps I use to install* ecmpi (by Cyril Bouvier) onto several computers which are already running Ubuntu. This procedure should work for other Linux distributions as well, but the only other one I've tested so far is Debian, and I haven't gotten it to work with that OS yet.

*In this instance, "install" refers only to acquiring and compiling the ecmpi package. The binaries will have to be called using their respective paths.

Special Note: I have found that I cannot get ecmpi to work past the host machine, on any combination involving Ubuntu 18.04, due to openmpi troubles.

I expect the user of these steps to be able to use the sudo command. For ecmpi to run, mpi or openmpi must be installed on the machines, and all machines must have a user of the same name. I will use the name mpi for a standard user on all the machines in this thread. The necessary programs must be installed through an admin account before ecmpi is installed at the mpi user level. I will use a directory called ecmpi and a subdirectory ecmpi/work for these installations. You may use names of your own choice.

You can use these instructions as a reference only, if you want, but I will provide specifics that work for me. If you follow these steps as I provide them, you should end up with a working installation.

First, follow the procedures in:
How I Install GMP on my Ubuntu Machines
and
How I Install GMP-ECM on my Ubuntu Machines
or install GMP and ECM from the repositories.
If you have already installed most of the following packages, you can skip them; however, notice that I have added git and three openmpi packages to the list. You will need git to retrieve the latest version of ecmpi using my steps, and openmpi is needed to distribute work to the other machines. Also, cmake is needed to compile ecmpi.

Open a terminal and enter:
Code:
sudo apt-get update
You should be prompted for your password. This prompt should only appear once for your terminal session, unless you leave it idle for a long time.

After the update completes and the user prompt reappears, enter:
Code:
sudo apt-get install g++ m4 zlib1g-dev make p7zip git openmpi-bin openmpi-common libopenmpi-dev cmake
Accept the prompt.

In order for the openmpi cluster to run properly with more than three total machines, every machine has to have free and clear communication with all the others. In my case, using openssh, I had to make sure all keys were listed in the authorized_keys file on every machine and all machines were included in all known_hosts files. If you are already running openssh, you should be able to achieve the two-way links between all your machines. If not, you should research openssh to cover all the details of making sure your systems are secure.

As mentioned previously, all machines must have a user with the same username. To accomplish this, go to System Settings>User Accounts and add your user. For this "How I" I'll use mpi as the new user and create a Standard account. Go ahead and password protect the new mpi user.

If you will be using ssh, then for each new mpi account you will need to do the ssh setup: create keys and add them to all the other mpi accounts. Every time you add a new machine, you will need to add it to all the others and vice versa.

At this point, if you're not already at a terminal in one of the mpi accounts, open one, either by logging in with the GUI or by sshing in from one of the main accounts.
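As a rough sketch of the ssh setup described above, the following script only prints the commands one machine would need, so it is safe to run as-is. The peer names node1 and node2 are placeholders I made up, and the ed25519 key type is just one reasonable choice, not something the steps above require. ssh-copy-id appends this machine's public key to the peer's authorized_keys file, and the first ssh login to a peer is what adds that peer to known_hosts.

```shell
#!/bin/sh
# Hypothetical peer list: replace with the names or addresses of your other machines.
PEERS="node1 node2"
# The shared username used throughout this thread.
MPI_USER="mpi"

# Print (do not run) the commands this machine needs so that the mpi user
# can log in to every peer without a password prompt.
plan_ssh_setup() {
    # Create a key pair for the mpi user (key type is an assumption).
    echo "ssh-keygen -t ed25519"
    for peer in $PEERS; do
        # Append our public key to the peer's authorized_keys.
        echo "ssh-copy-id $MPI_USER@$peer"
        # First login records the peer in known_hosts; 'true' just exits.
        echo "ssh $MPI_USER@$peer true"
    done
}

plan_ssh_setup
```

The same commands would have to be repeated from every machine toward all the others, since each machine needs to reach every other one.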
Let's consider this mpi account to be the host. The rest will be slaves in our cluster. In the terminal (mpi@:~$) enter the following:
Code:
git clone https://gite.lirmm.fr/bouvier/ecmpi.git ecmpi
When this completes, there should be a directory named ecmpi. Move into it with:
Code:
cd ecmpi
Now, issue the following commands:
Code:
cmake ./
make
If no errors appear, you are ready to set up and test a working environment.

The first thing is to create a hostfile that will hold the info for the host and all the slaves that will eventually be run. Create a file named hostfile within ecmpi and, for now, give it one line which reads:
Code:
localhost slots=2
Now, create a directory called work within the ecmpi directory and cd into it.

Next, enter the following to test the setup:
Code:
mpirun -np 2 --hostfile ../hostfile ../ecmpi -q -N 1623678511619010615822065755621691733061933624824934705601768533043353 -nb 4 -B1 300000
The factors should be returned:
Code:
Results: 1623678511619010615822065755621691733061933624824934705601768533043353 = 1510114122412521897224212426928165813328765695962607 * 1075202521134668279
Note: Occasionally, the above will return without finding the factors. If this occurs, run it again and the factors should be found.

To add slaves, go through the entire above procedure on each new machine and note its IP address. Perform all the ssh necessities and then add the new machine to the hostfile:
Code:
localhost slots=<# of cores>
mpi@ slots=<# of cores>
mpi@ slots=<# of cores>
...
After each addition of a new machine, test the operation with the previous command, but vary the -np and -nb values. -np is the number of processes to be run simultaneously on the cluster. It must be <= the total number of cores (slots) listed in the hostfile. -nb is the number of curves to be run.

Last fiddled with by EdH on 2019-09-12 at 22:54
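Since -np must not exceed the number of slots listed in the hostfile, it can help to total them before choosing a value. The sketch below is only an illustration: count_slots is a helper name I made up, the demo hostfile is a temporary file, and 192.0.2.10 is a placeholder address rather than a real machine.

```shell
#!/bin/sh
# Sum every slots=N entry found in a hostfile (helper name is hypothetical).
count_slots() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^slots=[0-9]+$/) {
                sub(/^slots=/, "", $i)   # keep only the number
                total += $i
            }
    } END { print total + 0 }' "$1"
}

# Demo on a throwaway hostfile; 192.0.2.10 is a placeholder address.
demo_hostfile=$(mktemp)
printf '%s\n' 'localhost slots=2' 'mpi@192.0.2.10 slots=4' > "$demo_hostfile"

max_np=$(count_slots "$demo_hostfile")
echo "largest usable -np: $max_np"

rm -f "$demo_hostfile"
```

With the two demo lines this prints "largest usable -np: 6". From the work directory, running count_slots ../hostfile would give the upper limit for -np on the real cluster.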