mersenneforum.org 3,748+ c204 Smaller-but-Needed

2021-11-18, 14:56   #276
kruoli

"Oliver"
Sep 2017
Porta Westfalica, DE

1101001011₂ Posts

Unfortunately, I have little experience with MPI. I used it at university, but there it was always set up for me and I only needed to use it. My plan was to run the LA on my 5950X, which has 64 GB of RAM. If it would help, I could connect that computer to my 1950X via 10 GbE LAN. I do not have InfiniBand hardware. The motherboard of that system is a bit flaky: I used to run it fully loaded (128 GB, 8 slots), but now I can only run it dual-channel (32 GB). In this state, it is at least stable again. So...

...does MPI make sense with my setup?
...if not, would it make sense if I got my 1950X up and stable with 64 GB RAM? I have wanted to replace the mainboard for some time now; since the cheapest boards are still above 250 $ new, I wanted to look for used ones.
...how much of a speedup could be expected?
...would having 2x10 GbE networking (two parallel connections) speed things up further?

Last fiddled with by kruoli on 2021-11-18 at 15:03 Reason: Word order.

2021-11-18, 16:43   #277
charybdis

Apr 2020

1001010101₂ Posts

Quote:
 Originally Posted by kruoli Wed Nov 17 19:18:16 2021 found 300859501 duplicates and 861473338 unique relations That looks good, I would guess?
Very good! This confirms that A=30 was the right choice.

Quote:
 Originally Posted by EdH Are you going to employ MPI in Msieve LA? I'm not sure of your hardware setup, but I used MPI across two Xeons for a larger run a while back. It did save some time, but it was difficult to learn the nuances. charybdis was quite helpful with it. Unfortunately, I can't locate the posts about getting my setup to work best. They were PMs, not posts.

Quote:
 Originally Posted by kruoli Unfortunately, I have little experience with MPI. [...] ...does MPI make sense with my setup? ...if not, would it make sense if I got my 1950X up and stable with 64 GB RAM? [...] ...how much of a speedup could be expected? ...would having 2x10 GbE networking (two parallel connections) speed things up further?
The answer to these is generally "maybe - give it a try". As long as the matrix remains above 32GB you won't be able to run it on the 1950X, but the eventual matrix ought to fit in 32GB.

The 5950X has enough threads that it might be worth trying MPI even without the second machine. To start out, you can try:
Code:
mpirun --bind-to none -np 2 msieve -nc2 2,1 -t 16
You'll need to set --bind-to when running with 2 processes, as otherwise MPI will bizarrely default to binding each process to a single core! For Ed's dual Xeon, the solution ought to have been --bind-to socket, but for some reason this didn't work as it was supposed to.
Experiment with different numbers of threads and MPI processes to see what works best.
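(If you want to see what MPI is actually doing with placement, OpenMPI's --report-bindings option prints each rank's binding at startup; a quick sketch, reusing the command above:)
Code:
# show each MPI rank's CPU binding (or "not bound") at launch
mpirun --report-bindings --bind-to none -np 2 msieve -nc2 2,1 -t 16
# for comparison: the default with -np 2 binds each rank to one core
mpirun --report-bindings -np 2 msieve -nc2 2,1 -t 16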

2021-11-18, 16:58   #278
EdH

"Ed Hall"
Dec 2009
Adirondack Mtns

2·11·191 Posts

Although I got my Infiniband working across two machines, I think that may have been after I did the MPI (OpenMPI) LA for a large composite. In that case, I used it to gain some time across the two Xeon processors of my Z620 machine. It did seem to cut some time off the LA. My machines mostly run Ubuntu 20.04 ATM, and OpenMPI is easily installed on that OS. (I did discover that the Ubuntu 18.04 OpenMPI was broken and never fixed, as far as I could tell.) I just remembered that all my info is actually in some PMs. I will dig out some of it and post it in a little while...
2021-11-18, 17:37   #279
VBCurtis

"Curtis"
Feb 2005
Riverside, CA

5·1,033 Posts

Quote:
 Originally Posted by charybdis You'll probably want to wait at least until a matrix can be built with target_density=120; I expect VBCurtis will chime in later with his thoughts. Chances are if you tried TD=120 now you would get "too few cycles, matrix probably cannot build". How is the duplication rate looking?
Using the data in https://mersenneforum.org/showthread.php?t=24054 as a guide, and adding 35% to the relation counts to account for 33/34LP rather than the 33LP used for all the 16e jobs run on NFS@Home, I figured 950M unique relations would be sufficient to get a decent matrix. We're at 860M now, which is a higher uniques ratio than I expected. Yay! We might make 950M uniques with 1.3G raw.
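(For reference, the uniques ratio follows directly from the counts quoted in post #277; a throwaway one-liner:)
Code:
# uniques as a fraction of all raw relations found so far
awk 'BEGIN { u=861473338; d=300859501; printf "uniques ratio: %.1f%%\n", 100*u/(u+d) }'
# prints: uniques ratio: 74.1%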

I agree that target_density=120 is the minimum for this filtering job. My opinion (not based on enough data, I'm afraid) is that once we have enough relations to build a matrix at TD=124, we've likely gathered enough that further sieving time would be mostly wasted compared to the time those extra relations save on the matrix.

I'd run filtering again with TD=100 to see if a matrix builds and how much it shrinks. 87M is big. 17,000 [thread-] hours!

Then I'd gather relations again and filter with TD=120 when we reach 1.25G raw relations.

Last fiddled with by VBCurtis on 2021-11-18 at 17:39

2021-11-18, 17:38   #280
EdH

"Ed Hall"
Dec 2009
Adirondack Mtns

106A₁₆ Posts

I see I missed charybdis' post. Sorry 'bout that! But I did find the PMs, and here are some details. I just looked over the experimentation I did with OpenMPI for Msieve LA; the experiments were done with the Msieve benchmark files. Many of the OpenMPI tests actually added a great deal of time, but with charybdis' help I was able to find a command set that saved some. This was all done on a Z620 dual-Xeon 6c/12t machine. The first set of tests showed the following:
Code:
mpirun -np 2 msieve -nc2 2,1 -t 6     ETA 47 h 55 m
mpirun -np 2 msieve -nc2 2,1 -t 12    ETA 48 h 22 m
mpirun -np 4 msieve -nc2 2,2 -t 3     ETA 10 h 25 m
mpirun -np 4 msieve -nc2 2,2 -t 6     ETA  8 h 29 m
msieve -nc2 -t 12                     ETA 10 h 20 m
msieve -nc2 -t 24                     ETA  9 h 27 m
We did a lot of experimenting with options; to make it short, "--bind-to none" gave the best results. The command
Code:
mpirun --bind-to none -np 2 ./msieve -nc2 2,1 -t 12   ETA 7 h 34 m
turned out to show some time savings for my machine.

Last fiddled with by EdH on 2021-11-18 at 17:41 Reason: command correction
2021-11-18, 18:56   #281
frmky

Jul 2003
So Cal

2⁴·3·47 Posts

Quote:
 Originally Posted by kruoli Unfortunately, I have little experience with MPI. [...] My plan was to run the LA on my 5950X, which has 64 GB of RAM. If it would help, I could connect that computer to my 1950X via 10 GbE LAN. I do not have InfiniBand hardware. [...] ...does MPI make sense with my setup? [...] ...would having 2x10 GbE networking (two parallel connections) speed things up further?
I wouldn't try using MPI to run on both the 5950X and the 1950X. The vectors need to be transferred on each iteration; the bandwidth is OK, but the much higher Ethernet latency kills performance. It would likely be slower than the 5950X alone.
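(For a rough sense of the gap, assuming the two boxes were already connected over the 10 GbE link; the address below is a placeholder. Even a good Ethernet round trip is tens of microseconds, orders of magnitude above InfiniBand:)
Code:
# measure average round-trip latency to the other machine (placeholder address)
ping -c 100 -q 192.168.1.50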

With cores divided into chiplets on the 5950X, MPI might help. It's not NUMA, but I would still try it. On Ubuntu, getting a working MPI installed is as simple as
Code:
sudo apt install openmpi-bin openmpi-doc libopenmpi-dev
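(To confirm the install worked; both tools below should come with the openmpi-bin package:)
Code:
mpirun --version   # report the OpenMPI version
ompi_info | head   # summary of the MPI build configuration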
I would compare:
Code:
./msieve -nc2 -t 32 -v
mpirun -np 2 ./msieve -nc2 1,2 -t 16 -v
mpirun -np 4 ./msieve -nc2 2,2 -t 8 -v
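(For anyone following along: if I have the orientation right, the argument after -nc2 is the MPI grid as rows,cols, and -np must equal their product; a generic sketch with placeholder shell variables:)
Code:
# R x C MPI grid, T threads per MPI process; total threads = R*C*T
R=2; C=2; T=8
mpirun -np $((R*C)) ./msieve -nc2 "$R,$C" -t $T -v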

2021-11-18, 19:11   #282
kruoli

"Oliver"
Sep 2017
Porta Westfalica, DE

34B₁₆ Posts

Thanks for all your input on MPI!

Quote:
 Originally Posted by frmky I would compare ./msieve -nc2 -t 32 -v mpirun -np 2 ./msieve -nc2 1,2 -t 16 -v mpirun -np 4 ./msieve -nc2 2,2 -t 8 -v
Can this be run on the preliminary data, and will the results carry over to the "correct" run? And I guess I should add --bind-to none as charybdis suggested (and also test 2,1?)?

Edit: The timings would be bogus as long as I'm still sieving on the same machine. I will do this when sieving is done; that way it will not delay the sieving, where others are involved.

Quote:
 Originally Posted by VBCurtis I'd run filtering again with TD=100 to see if a matrix builds and how much it shrinks. 87M is big. 17,000 [thread-] hours!
Right now? It should be finished tomorrow my time.

Last fiddled with by kruoli on 2021-11-18 at 19:31 Reason: Additions.

2021-11-18, 19:30   #283
charybdis

Apr 2020

3×199 Posts

Quote:
 Originally Posted by kruoli This can be run on the preliminary data and will give "portable" intel for the "correct" run? And I guess I should add --bind-to none as charybdis suggested (and also 2,1?)?
--bind-to none should only be needed with -np 2; --bind-to socket ought to have the same effect. When there are at least 3 processes, OpenMPI will bind to socket by default. I have no clue why it automatically binds to core with 2 processes.

Can't remember whether 2,1 vs 1,2 makes much of a difference with timings. IIRC you will need to rebuild the matrix if you change the first parameter, so at least if you test 2,1 first you can then test 2,2 with -nc2 "2,2 skip_matbuild=1" and avoid having to build the matrix again.

Relative speeds should be reasonably consistent between the current oversized matrix and the final one, though frmky has far more experience with this than I do.

Last fiddled with by charybdis on 2021-11-18 at 19:31

2021-11-18, 19:52   #284
VBCurtis

"Curtis"
Feb 2005
Riverside, CA

5·1,033 Posts

Quote:
 Originally Posted by kruoli Right now? It should be finished tomorrow my time.
Naw, I forgot how fast we're gathering relations. Default vs TD 100 is a mildly interesting data point for matrix size, but we won't be using either of those matrices so it's not important.

I think doing a filtering run somewhere around 1.23-1.28G raw relations will give us an indication of when to shut down sieving. Sieving is going quickly and the uniques ratio is good, so I doubt more than 1.33G raw relations will be needed.

We agree that testing MPI is not useful while still sieving on the same machine!

2021-11-18, 22:22   #285
frmky

Jul 2003
So Cal

2⁴·3·47 Posts

Quote:
 Originally Posted by charybdis Can't remember whether 2,1 vs 1,2 makes much of a difference with timings. IIRC you will need to rebuild the matrix if you change the first parameter, so at least if you test 2,1 first you can then test 2,2 with -nc2 "2,2 skip_matbuild=1" and avoid having to build the matrix again. Relative speeds should be reasonably consistent between the current oversized matrix and the final one, though frmky has far more experience with this than I do.
My experience is that 1,2 is almost always a little faster than 2,1.

You don't need to rebuild the matrix to change the first parameter. Once you build the matrix with mpi, you can use that matrix to test different parameters, both with and without mpi, using skip_matbuild=1. So, for example, run the tests in this sequence:

Code:
mpirun -np 2 --bind-to none ./msieve_mpi -nc2 1,2 -t 16 -v
mpirun -np 2 --bind-to none ./msieve_mpi -nc2 "2,1 skip_matbuild=1" -t 16 -v
mpirun -np 4 --bind-to none ./msieve_mpi -nc2 "2,2 skip_matbuild=1" -t 8 -v
./msieve_nompi -nc2 skip_matbuild=1 -t 32 -v
You can't restart a run in progress with a different first parameter as the checkpoint file format depends on that value, but you can restart with a different second parameter.

And yes, relative speeds should be consistent across a wide range of matrix sizes.

2021-11-18, 23:39   #286
charybdis

Apr 2020

3·199 Posts

Quote:
 Originally Posted by frmky You don't need to rebuild the matrix to change the first parameter. Once you build the matrix with mpi, you can use that matrix to test different parameters, both with and without mpi, using skip_matbuild=1. ... You can't restart a run in progress with a different first parameter as the checkpoint file format depends on that value, but you can restart with a different second parameter.
Ah whoops, got these mixed up. Thanks for the correction!
