Any Infiniband Gurus Available?
I have acquired two cards and a cable and am hoping to connect two Z620s for LA work. I have installed one card in one machine, but nothing in the other yet. So far, I've tried several web pages that are supposed to walk me through the process, but the steps don't work for me the way they do on those pages.
One example: I installed rdma-core and was supposed to run systemctl start rdma.service, but the OS says there is no rdma.service. opensm installed, and starting its service returned as expected. The cards don't appear to carry a brand name, but their model number is HSTNS-BN80. The machines are Z620 dual Xeons running Ubuntu 20.04. A forum search for "infiniband" turned up three pages of threads, but none of the titles looked like they would help. Any assistance would be appreciated.
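For reference, here is roughly the sequence the guides seem to boil down to on Ubuntu 20.04. This is only a sketch, and it assumes the HP card is a rebranded Mellanox ConnectX-family part whose driver is already in the stock kernel; interface and service names may differ on other setups.
[CODE]# Userspace RDMA stack, diagnostics, and a subnet manager
sudo apt install rdma-core infiniband-diags ibverbs-utils opensm

# With two hosts cabled directly, one opensm instance on the link is enough
sudo systemctl enable --now opensm

# The port should go from "Initializing" to "Active" once a subnet manager is running
ibstat
ibv_devinfo

# Optional: IP-over-InfiniBand so ordinary tools (ping, ssh, MPI over TCP) work on the link.
# The interface may be called ib0 or something like ibp2s0; check "ip link".
sudo modprobe ib_ipoib
sudo ip addr add 192.168.50.1/24 dev ib0    # use .2 on the other machine
sudo ip link set ib0 up
ping 192.168.50.2
[/CODE]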
I am very interested in doing this exact same thing, if Ed gets his setup working. Two cards and a cable are fairly cheap used, and connecting just two machines means no need for a switch.
I'm told (well, I read on a website like the ones Ed found) that even with 2-port cards, one cannot connect three machines without a switch. I'd like to know if that is true!
I haven't tried this in quite a while, but when I built my own small clusters I used [URL="http://www.rocksclusters.org/"]Rocks[/URL]. That configures IB out of the box, as well as MPI supporting IB, a resource manager, and a job scheduler.
[QUOTE=frmky;575475]I haven't tried this in quite a while, but when I built my own small clusters I used [URL="http://www.rocksclusters.org/"]Rocks[/URL]. That configures IB out of the box, as well as MPI supporting IB, a resource manager, and a job scheduler.[/QUOTE]
Thanks. This looks like it requires CentOS, or am I mistaken? It also looks like nothing has been done with it since 2017. I'll study it a bit more.
Yes, in my experience most cluster systems are built on a Red Hat-based distro. Although it looks like Rocks hasn't been updated in a while, and the community appears to be moving to OpenHPC.
[url]https://openhpc.community/[/url] [url]https://github.com/XSEDE/CRI_XCBC/tree/master/doc[/url]
[QUOTE=frmky;575520]Yes, in my experience most cluster systems are built on a Red Hat-based distro. Although it looks like Rocks hasn't been updated in a while, and the community appears to be moving to OpenHPC.
[URL]https://openhpc.community/[/URL] [URL]https://github.com/XSEDE/CRI_XCBC/tree/master/doc[/URL][/QUOTE] Thanks! I'll look these over. I currently have an openmpi cluster of various machines running over Ethernet. I was kind of hoping I could just link two of the machines with a couple of Infiniband cards, so I could get enough bandwidth to run Msieve LA a bit faster. I used to run it over Gigabit until the machines got so fast that Gigabit couldn't handle it.
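For anyone curious, the Ethernet side is nothing exotic: the Open MPI hostfile is just the machines' names or addresses on the Gigabit network with slot counts. The names and counts below are made up for illustration.
[CODE]# hostfile — one line per machine on the Ethernet network
node1  slots=8
node2  slots=12
node3  slots=8

# launched with something along the lines of
mpirun --hostfile hostfile -np 28 ./my_mpi_program
[/CODE]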
After quite a while of just letting everything sit, I thought about this again. Everything I was reading pointed to the brand Mellanox. Of course, the brand of cards I had acquired was HP. Well, I bought two Mellanox cards and three cables (because they came that way). These actually gave me connectivity between the two machines without much trouble, kind of. I have Infiniband connected between the two machines, but now my Ethernet cluster for ecmpi doesn't work when I have the Infiniband enabled, even though the hostfile still uses the Ethernet addresses. It actually fails when it tries to use the Infiniband interface for the localhost machine.
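In case anyone else hits the same thing: my understanding is that once the card is up, Open MPI will happily pick the InfiniBand path on its own, so the transport and interface may have to be pinned explicitly. A sketch of the kind of flags I mean (eno1 and the program name are placeholders; check ip link for the real interface):
[CODE]# Force plain TCP over the Ethernet interface only
mpirun --hostfile hostfile -np 16 \
       --mca pml ob1 --mca btl self,vader,tcp \
       --mca btl_tcp_if_include eno1 \
       ./my_mpi_program

# Or just exclude the InfiniBand (verbs) transport and leave the rest alone
mpirun --hostfile hostfile -np 16 --mca btl ^openib ./my_mpi_program
[/CODE]
If that brings the Ethernet cluster back, then presumably pointing btl_tcp_if_include at the IPoIB interface instead (or letting the verbs path take over) is how the LA job gets onto the fast link.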
So, I have made some progress and am looking forward to making more as I spend more time "playing."