
mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Hardware (https://www.mersenneforum.org/forumdisplay.php?f=9)
-   -   Network of moderate speed (https://www.mersenneforum.org/showthread.php?t=18986)

fivemack 2013-12-07 21:00

Network of moderate speed
 
Infiniband HCAs like

[url]http://www.ebay.co.uk/itm/HP-452372-001-Infiniband-PCI-E-4X-DDR-Dual-Port-Storage-Host-Channel-Adapter-HCA-/360657396651?pt=UK_Computing_ComputerComponents_InterfaceCards&hash=item53f8db23ab[/url]

are ubiquitously available on ebay, so I've bought two and connected them with a

[url]http://www.ebay.co.uk/itm/INFINIBAND-CABLE-3-METER-CX4-CX4-/171114854785?pt=UK_Computing_Other_Computing_Networking&hash=item27d73d7981[/url]

[code]
sudo modprobe ib_umad
sudo modprobe mlx4_ib
sudo modprobe ib_ipoib
[/code]
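One thing the module list above doesn't mention: on a direct card-to-card link with no switch, something has to run a subnet manager, or the ports stay in INIT and IPoIB won't pass traffic. A sketch of getting one going, assuming Debian/Ubuntu package names (an assumption about the distribution):

```shell
# Assumption: Debian/Ubuntu package names. Run the subnet manager on
# ONE of the two machines only; one instance serves the whole fabric.
sudo apt-get install opensm infiniband-diags
sudo service opensm start

# Check the link: the port state should reach ACTIVE once opensm is up
ibstat
ibstatus
```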

on both machines, then
[code]
sudo ifconfig ib1 10.10.10.1 netmask 255.255.255.0
[/code]
on one and
[code]
sudo ifconfig ib1 10.10.10.2 netmask 255.255.255.0
[/code]

on the other, and Robert appears to be some relative of your father:

[code]
pumpkin@pumpkin:~$ netperf -H 10.10.10.2
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.10.10.2 () port 0 AF_INET : demo
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00    12408.11
[/code]
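For anyone repeating the measurement: netperf talks to a netserver daemon on the far end, which has to be started first (the packaging below is a Debian-ish assumption):

```shell
# On the machine being tested against (10.10.10.2 here):
sudo apt-get install netperf    # ships both netperf and netserver
netserver                       # listens on TCP port 12865 by default

# Then, from the other machine:
netperf -H 10.10.10.2
```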

There is perceptible CPU usage involved with the connection (in fact, probably just with netperf); with both ends running two gnfs-lasieve4I14e processes on each core, the throughput drops to 7384 Mbps.
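Two knobs worth trying if the CPU cost matters: netperf can report CPU utilisation on both ends with -c/-C, and IPoIB can be switched from datagram to connected mode, which usually raises TCP throughput by allowing a much larger MTU (the sysfs path is standard for the ipoib driver; the exact gain is machine-dependent):

```shell
# Report local (-c) and remote (-C) CPU utilisation alongside throughput
netperf -H 10.10.10.2 -c -C

# Switch IPoIB to connected mode and raise the MTU (repeat on both machines)
echo connected | sudo tee /sys/class/net/ib1/mode
sudo ifconfig ib1 mtu 65520
```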

It's still not to be sniffed at for less than £100 including postage.

Is there anything complicated I need to do to set up an MPI cluster between the two machines using the nice fast interconnect?

EdH 2013-12-27 16:29

[QUOTE=fivemack;361387]Is there anything complicated I need to do to set up an MPI cluster between the two machines using the nice fast interconnect?[/QUOTE]
Did this ever get answered?

I found the following site helpful in setting up my OpenMPI tests (though you probably already know all of this):

[URL="http://techtinkering.com/2009/12/02/setting-up-a-beowulf-cluster-using-open-mpi-on-linux/"]Setting up a Beowulf Cluster Using Open MPI on Linux[/URL]

I have found that I need exactly the same OpenMPI version (and therefore the same Linux distribution) on all the machines for success. All the machines also need to have the same architecture, because I share a single msieve binary: the host machine's binary sits in a folder on the host, and every other machine reaches that folder through sshfs:

They all have the same account name and basic folder structure:

The host has /home/math85/Math/folder, which has all the programs/files/subdirectories.

The slaves have /home/math85/Math/folder, which is empty until I use sshfs to link in the host's folder.
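As a concrete sketch of the layout above (the host address, slot counts, and program name are made up for illustration): mount the host's folder on each slave, list the machines in a hostfile, and launch from the host:

```shell
# On each slave: mount the host's working folder over sshfs
sshfs math85@10.10.10.1:/home/math85/Math/folder /home/math85/Math/folder

# On the host: a hostfile naming both machines
cat > ~/hosts <<'EOF'
10.10.10.1 slots=4
10.10.10.2 slots=4
EOF

# Launch an MPI job across both; the binary must exist at the
# same path on every machine (here via the sshfs mount)
mpirun -np 8 --hostfile ~/hosts /home/math85/Math/folder/some_mpi_program
```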

I hope this is helpful...

