
EdH 2019-11-03 23:44

Msieve LA with openmpi in the Current Age
 
Back in the Pentium4 days I was able to get multiple machines to run msieve LA (-nc2) and save time.

Recently I revisited this with a couple of i7s to see whether there was any gain to be had. The answer, which we all probably already knew, is: not over Gigabit. With a relations set that took ~10 hours on one machine, the two machines showed ~12 to ~22 hours (after settling), depending on the thread/grid combination.

The one area that might be of use is increasing my memory capacity by using two machines. This is something VBCurtis brought up a while back. I haven't explored it to any extent, but if I need more than the 16G of one machine, perhaps I can use two and only lose some time (although possibly as much as 20%).
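For reference, the two-machine runs were launched roughly along these lines (hostnames, slot counts and the grid below are illustrative placeholders rather than my exact settings, and the "X,Y" argument to -nc2 is the MPI grid as I understand msieve's syntax):
[code]
# hosts.txt - one machine per line with the MPI slots it provides:
#   i7-one  slots=1
#   i7-two  slots=1

# 1x2 MPI grid across the two machines, 4 LA threads per MPI process:
mpirun --hostfile hosts.txt -np 2 ./msieve -v -nc2 "1,2" -t 4
[/code]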

VBCurtis 2019-11-04 03:26

1. Older-generation infiniband cards are not very expensive; if one were serious about tackling jobs too large for one's best machine, a pair of IB cards with a single cable should net good msieve-MPI results.

2. I think I have openMPI installed and configured on my Z620, but msieve-MPI (self-compiled) does not function. Might you email me your msieve-MPI linux binary?

EdH 2019-11-04 04:24

[QUOTE=VBCurtis;529600]1. Older-generation infiniband cards are not very expensive; if one were serious about tackling jobs too large for one's best machine, a pair of IB cards with a single cable should net good msieve-MPI results.

2. I think I have openMPI installed and configured on my Z620, but msieve-MPI (self-compiled) does not function. Might you email me your msieve-MPI linux binary?[/QUOTE]1. I will have to check out the IB cards. I'm also revisiting the MPI-aware bwc part of CADO-NFS, which I have experimented with before. I had some of the examples running, but not a real test case.

2. As to openmpi, if you're running Ubuntu 18.04, the repository openmpi is broken: it will not work if you try to use more than localhost, which makes it rather useless. I've tried installing the latest version from the source site, but never got that to run, either. This is actually keeping me from upgrading all my 16.04 machines. I will try to search out the binaries and send them your way. I know there are three main packages to install - openmpi-common, openmpi-bin and libopenmpi-dev. Give me a day or so; locate and whereis didn't turn up anything.
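For what it's worth, the install itself is just those three packages, and a trivial mpirun across a hostfile is enough to see whether MPI will talk to more than localhost (node names below are placeholders):
[code]
# Install the three openmpi packages from the repository:
sudo apt install openmpi-bin openmpi-common libopenmpi-dev

# Sanity check: run a trivial command on each host in the file.
# On the broken 18.04 packages this is where things fall over as soon
# as a second machine is listed.
mpirun --hostfile hosts.txt -np 2 hostname
[/code]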

VBCurtis 2019-11-04 07:36

I believe I am running 18.04 on the Z620, though for my use case localhost-only would still help, since others have found that using one MPI process per socket is much faster than a single pool of threads. I'm running -t 20 right now on the dual-10-core, where I'd like to be running MPI as 2x10 threads.

Good to know it's likely openMPI that I should try to address, rather than my build of msieve.
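For concreteness, the sort of invocation I have in mind on the dual-socket box is something like the following (untested here, and the grid syntax and binding flags are my best understanding rather than a verified recipe):
[code]
# Two MPI processes, one pinned to each socket, 10 LA threads apiece
# (a 1x2 grid), instead of a single 20-thread pool:
mpirun -np 2 --map-by socket --bind-to socket ./msieve -v -nc2 "1,2" -t 10
[/code]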

EdH 2019-11-04 13:48

[QUOTE=VBCurtis;529608]I believe I am running 18.04 on the Z620, though for my use case localhost-only would still help, since others have found that using one MPI process per socket is much faster than a single pool of threads. I'm running -t 20 right now on the dual-10-core, where I'd like to be running MPI as 2x10 threads.

Good to know it's likely openMPI that I should try to address, rather than my build of msieve.[/QUOTE]
I will have to do more checking, but I believe that if you make sure you have

openmpi-bin, openmpi-common and libopenmpi-dev

installed, and then compile msieve with MPI=1, it should work as long as you don't include a hostfile.
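Roughly what I do here, for what it's worth (the make invocation is from memory, so double-check it against your copy of the Makefile):
[code]
# Build msieve with MPI support (needs the libopenmpi-dev headers):
make clean
make all MPI=1

# With no hostfile, mpirun stays on localhost, which is the case that
# still works with the 18.04 packages:
mpirun -np 2 ./msieve -v -nc2 "1,2" -t 4
[/code]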

jasonp 2019-11-04 15:21

If you are working with MPI over multiple machines, can you see if the postprocessing runs to completion using the latest Msieve svn? Greg has reported that it does not (the square root doesn't like the data it's given). It doesn't have to be for a large problem, just a C100 on a 1x2 or 2x2 grid would be enough.

EdH 2019-11-04 17:41

[QUOTE=jasonp;529622]If you are working with MPI over multiple machines, can you see if the postprocessing runs to completion using the latest Msieve svn? Greg has reported that it does not (the square root doesn't like the data it's given). It doesn't have to be for a large problem, just a C100 on a 1x2 or 2x2 grid would be enough.[/QUOTE]I can do this, but it will be later today. Currently I have an LA job that will take another 2.5+ hours to run on the host machine.

EdH 2019-11-05 01:43

2 Attachment(s)
[QUOTE=jasonp;529622]If you are working with MPI over multiple machines, can you see if the postprocessing runs to completion using the latest Msieve svn? Greg has reported that it does not (the square root doesn't like the data it's given). It doesn't have to be for a large problem, just a C100 on a 1x2 or 2x2 grid would be enough.[/QUOTE]
The MPI runs didn't solve for me, either:
[code]
. . .
Mon Nov 4 20:30:25 2019 reading relations for dependency 63
Mon Nov 4 20:30:25 2019 read 0 cycles
Mon Nov 4 20:30:25 2019 reading relations for dependency 64
Mon Nov 4 20:30:25 2019 read 0 cycles
Mon Nov 4 20:30:25 2019 sqrtTime: 119
Mon Nov 4 20:30:25 2019 elapsed time 00:02:00
[/code]I've attached all the logs for your review. I forgot to run one without mpi. I'm off to do that now. . .

EdH 2019-11-05 02:00

1 Attachment(s)
A run without mpi worked fine:
[code]
. . .
Mon Nov 4 20:49:50 2019 initial square root is modulo 4203473
Mon Nov 4 20:49:56 2019 sqrtTime: 28
Mon Nov 4 20:49:56 2019 p50 factor: 26221114229909593079493944061795669970670518225931
Mon Nov 4 20:49:56 2019 p51 factor: 116701385250110252294900265085979409741229672958141
Mon Nov 4 20:49:56 2019 elapsed time 00:00:28
[/code]The log is attached.

To note, I ran -nc1 once and saved all the files to another directory. Then I copied them into the working directory before each -nc2 run.
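In outline, the test procedure was something like this (paths and the grid are illustrative, and the usual number/-i arguments for the job are omitted; a sketch rather than the exact commands):
[code]
./msieve -v -nc1                      # filtering, run once
mkdir saved && cp msieve.* saved/     # save everything -nc1 produced

cp saved/msieve.* .                   # restore before the MPI attempt
mpirun -np 4 ./msieve -v -nc2 "2,2"   # LA on a 2x2 MPI grid
./msieve -v -nc3                      # square root

cp saved/msieve.* .                   # restore again for the non-mpi run
./msieve -v -nc2
./msieve -v -nc3
[/code]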

EdH 2019-11-05 14:26

[QUOTE=VBCurtis;529600]1. Older-generation infiniband cards are not very expensive; if one were serious about tackling jobs too large for one's best machine, a pair of IB cards with a single cable should net good msieve-MPI results.

2. I think I have openMPI installed and configured on my Z620, but msieve-MPI (self-compiled) does not function. Might you email me your msieve-MPI linux binary?[/QUOTE]

[QUOTE=EdH;529604]1. I will have to check out the IB cards. I'm also revisiting the MPI-aware bwc part of CADO-NFS, which I have experimented with before. I had some of the examples running, but not a real test case.

2. As to openmpi, if you're running Ubuntu 18.04, the repository openmpi is broken: it will not work if you try to use more than localhost, which makes it rather useless. I've tried installing the latest version from the source site, but never got that to run, either. This is actually keeping me from upgrading all my 16.04 machines. I will try to search out the binaries and send them your way. I know there are three main packages to install - openmpi-common, openmpi-bin and libopenmpi-dev. Give me a day or so; locate and whereis didn't turn up anything.[/QUOTE]1. The cards are pretty inexpensive, but the cables cost more than the cards.:smile: I might have to try this, perhaps "just for fun."

2. I was confused as to which binary (or binaries) you were interested in. Rather troubling, since you were specific! I'm assuming you're not interested right now: the MPI msieve seems to be having some trouble, per the previous posts, and I believe your issue is really the 18.04 openmpi problem?

VBCurtis 2019-11-05 15:55

Agreed on both points for #2; the 18.04 openMPI is likely the problem, and the msieve bug means I shouldn't be trying it on the C207 matrix. My original intent was for you to send an msieve binary, because it didn't occur to me that MPI itself might be broken.

