mersenneforum.org > Factoring Projects > Cunningham Tables
2021-11-18, 14:56   #276
kruoli

Unfortunately, I have little experience with MPI. I used it at university, but there it was always set up for me and I only needed to use it.

My plan was to run the LA on my 5950X, which has 64 GB of RAM. If it would help, I could connect that computer to my 1950X via 10-gigabit LAN. I do not have any Infiniband hardware. The motherboard of that system is a bit flaky: I used to run it fully loaded (128 GB, 8 slots), but now I can only run it in dual-channel mode (32 GB). In this state, it is at least stable again. So…
  • …does MPI make sense with my setup?
  • …if not, would it make sense if I got my 1950X up and stable with 64 GB of RAM? I have wanted to replace the mainboard for some time now; since the cheapest boards are still above $250 new, I wanted to look for used boards.
  • …how much of a speedup could be expected?
  • …would having 2×10-gigabit networking (two parallel connections) speed things up further?

Last fiddled with by kruoli on 2021-11-18 at 15:03 Reason: Word order.
2021-11-18, 16:43   #277
charybdis

Quote:
Originally Posted by kruoli View Post
Wed Nov 17 19:18:16 2021 found 300859501 duplicates and 861473338 unique relations

That looks good, I would guess?
Very good! This confirms that A=30 was the right choice.

Quote:
Originally Posted by EdH View Post
Are you going to employ MPI in the Msieve LA? I'm not sure of your hardware setup, but I used MPI across two Xeons for a larger run a while back. It did save some time, but it was difficult to learn the nuances. charybdis was quite helpful with it. Unfortunately, I can't locate the posts about getting my setup to work best.
They were PMs, not posts.

Quote:
Originally Posted by kruoli View Post
Unfortunately, I have little experience with MPI. I used it at university, but there it was always set up for me and I only needed to use it.

My plan was to run the LA on my 5950X, which has 64 GB of RAM. If it would help, I could connect that computer to my 1950X via 10-gigabit LAN. I do not have any Infiniband hardware. The motherboard of that system is a bit flaky: I used to run it fully loaded (128 GB, 8 slots), but now I can only run it in dual-channel mode (32 GB). In this state, it is at least stable again. So…
  • …does MPI make sense with my setup?
  • …if not, would it make sense if I got my 1950X up and stable with 64 GB of RAM? I have wanted to replace the mainboard for some time now; since the cheapest boards are still above $250 new, I wanted to look for used boards.
  • …how much of a speedup could be expected?
  • …would having 2×10-gigabit networking (two parallel connections) speed things up further?
The answer to these is generally "maybe - give it a try". As long as the matrix remains above 32 GB you won't be able to run it on the 1950X, but the eventual matrix ought to fit in 32 GB.

The 5950X has enough threads that it might be worth trying MPI even without the second machine. You can try mpirun --bind-to none -np 2 msieve -nc2 2,1 -t 16 to start out; you'll need --bind-to none when running with 2 processes, as otherwise MPI will bizarrely default to binding each process to a core! For Ed's dual Xeon, the solution ought to have been --bind-to socket, but for some reason this didn't work as it was supposed to.
Experiment with different numbers of threads and MPI processes to see what works best.
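
If in doubt about what OpenMPI is actually doing with the bindings, mpirun's --report-bindings flag prints each process's binding at startup. A quick check might look like this (the -np and -t values here are just placeholders, not a recommendation):
Code:
mpirun --report-bindings --bind-to none -np 2 msieve -nc2 2,1 -t 16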
2021-11-18, 16:58   #278
EdH

Although I got my Infiniband working across two machines, I think that may have been after I did the MPI (openMPI) LA for a large composite.

In my case, I used it to gain some time across the two Xeon processors of my Z620 machine. It did seem to cut some time off the LA. My machines mostly run Ubuntu 20.04 at the moment, and openMPI is easily installed on that OS. (I did discover that the Ubuntu 18.04 openMPI was broken and never fixed, as far as I could tell.)

I just remembered that all my info is actually in some PMs. I will dig out some of it and post it in a little while...
2021-11-18, 17:37   #279
VBCurtis

Quote:
Originally Posted by charybdis View Post
You'll probably want to wait at least until a matrix can be built with target_density=120; I expect VBCurtis will chime in later with his thoughts. Chances are if you tried TD=120 now you would get "too few cycles, matrix probably cannot build".

How is the duplication rate looking?
Using the data in https://mersenneforum.org/showthread.php?t=24054 as a guide, and adding 35% to the relation counts to account for 33/34LP here rather than the 33LP used for all the 16e jobs run on NFS@home, I figured 950M unique relations would be sufficient to get a decent matrix. We're at 860M now, which is a higher uniques ratio than I expected. Yay! We might make 950M uniques with 1.3G raw.
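
A quick sanity check on that ratio, using the duplicate/unique counts from the log line quoted above (and assuming the ratio holds, which it usually won't quite, since the duplicate rate creeps up as sieving continues):
Code:
861,473,338 unique / (861,473,338 + 300,859,501) raw ≈ 74% unique
0.74 × 1.3G raw ≈ 0.96G unique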

I agree that target_density=120 is the minimum for this filtering job. My opinion (not based on enough data, I'm afraid) is that once we have enough relations to build a matrix at TD=124, any further time spent on sieving would mostly be wasted compared to the matrix time those extra relations would save.

I'd run filtering again with TD=100 to see if a matrix builds and how much it shrinks. 87M is big. 17,000 [thread-] hours!

Then I'd gather relations again and filter with TD=120 when we reach 1.25G raw relations.
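
For anyone following along, the target density is passed to msieve's filtering step as an argument string; a sketch of such a run would be something like the following (path and thread count are just examples):
Code:
./msieve -v -nc1 "target_density=120" -t 16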

Last fiddled with by VBCurtis on 2021-11-18 at 17:39
2021-11-18, 17:38   #280
EdH

I see I missed charybdis' post. Sorry 'bout that! But I did find the PMs, and here are some details.

I just looked over the experimentation I did with openMPI for the Msieve LA; the experiments were done with the Msieve benchmark files. Many of the openMPI tests actually added a great deal of time, but with charybdis' help I was able to find a command set that saved some time.

This was all done on a Z620 dual Xeon 6c/12t machine. The first set of tests showed the following:
Code:
mpirun -np 2 msieve -nc2 2,1 -t 6 
ETA 47 h 55 m 
 
mpirun -np 2 msieve -nc2 2,1 -t 12 
ETA 48 h 22 m 
 
mpirun -np 4 msieve -nc2 2,2 -t 3 
ETA 10 h 25 m 
 
mpirun -np 4 msieve -nc2 2,2 -t 6 
ETA 8 h 29 m 
 
msieve -nc2 -t 12 
ETA 10 h 20 m 
 
msieve -nc2 -t 24 
ETA 9 h 27 m
We did a lot of experimenting with options; to make it short, "--bind-to none" gave the best results in the command:
Code:
mpirun --bind-to none -np 2 ./msieve -nc2 2,1 -t 12
ETA  7 h 34 m
That combination turned out to give a real time saving on my machine.

Last fiddled with by EdH on 2021-11-18 at 17:41 Reason: command correction
2021-11-18, 18:56   #281
frmky

Quote:
Originally Posted by kruoli View Post
Unfortunately, I have little experience with MPI. I used it at university, but there it was always set up for me and I only needed to use it.

My plan was to run the LA on my 5950X, which has 64 GB of RAM. If it would help, I could connect that computer to my 1950X via 10-gigabit LAN. I do not have any Infiniband hardware. The motherboard of that system is a bit flaky: I used to run it fully loaded (128 GB, 8 slots), but now I can only run it in dual-channel mode (32 GB). In this state, it is at least stable again. So…
  • …does MPI make sense with my setup?
  • …if not, would it make sense if I got my 1950X up and stable with 64 GB of RAM? I have wanted to replace the mainboard for some time now; since the cheapest boards are still above $250 new, I wanted to look for used boards.
  • …how much of a speedup could be expected?
  • …would having 2×10-gigabit networking (two parallel connections) speed things up further?
I wouldn't try using MPI to run on both the 5950X and the 1950X. The vectors need to be transferred in each iteration; the bandwidth is OK, but the higher Ethernet latency kills performance. It would likely be slower than the 5950X alone.

With the cores divided into chiplets on the 5950X, MPI might help. It's not NUMA, but I would still try it. On Ubuntu, getting a working MPI installed is as simple as
Code:
sudo apt install openmpi-bin openmpi-doc libopenmpi-dev
I would compare
Code:
./msieve -nc2 -t 32 -v
mpirun -np 2 ./msieve -nc2 1,2 -t 16 -v
mpirun -np 4 ./msieve -nc2 2,2 -t 8 -v
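
If you want to see the chiplet and cache layout before picking a process/thread split, hwloc's lstopo gives a quick picture. This is just an optional check, assuming the standard Ubuntu hwloc package:
Code:
sudo apt install hwloc
lstopo --no-io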
2021-11-18, 19:11   #282
kruoli

Thanks for all your input on MPI!

Quote:
Originally Posted by frmky View Post
I would compare
Code:
./msieve -nc2 -t 32 -v
mpirun -np 2 ./msieve -nc2 1,2 -t 16 -v
mpirun -np 4 ./msieve -nc2 2,2 -t 8 -v
Can this be run on the preliminary data, and will it give results that carry over to the "correct" run? And I guess I should add --bind-to none as charybdis suggested (and also try 2,1)?

Edit: This is not meaningful as long as I am still sieving on the same machine. I will do it when sieving is done; that way it will not delay the sieving, in which others are involved.

Quote:
Originally Posted by VBCurtis View Post
I'd run filtering again with TD=100 to see if a matrix builds and how much it shrinks. 87M is big. 17,000 [thread-] hours!
Right now? It should be finished tomorrow my time.

Last fiddled with by kruoli on 2021-11-18 at 19:31 Reason: Additions.
2021-11-18, 19:30   #283
charybdis

Quote:
Originally Posted by kruoli View Post
Can this be run on the preliminary data, and will it give results that carry over to the "correct" run? And I guess I should add --bind-to none as charybdis suggested (and also try 2,1)?
--bind-to none should only be needed with -np 2. --bind-to socket ought to have the same effect. When there are at least 3 processes OpenMPI will bind to socket by default. I have no clue why it automatically binds to core with 2 processes.

Can't remember whether 2,1 vs 1,2 makes much of a difference with timings. IIRC you will need to rebuild the matrix if you change the first parameter, so at least if you test 2,1 first you can then test 2,2 with -nc2 "2,2 skip_matbuild=1" and avoid having to build the matrix again.

Relative speeds should be reasonably consistent between the current oversized matrix and the final one, though frmky has far more experience with this than I do.

Last fiddled with by charybdis on 2021-11-18 at 19:31
2021-11-18, 19:52   #284
VBCurtis

Quote:
Originally Posted by kruoli View Post
Right now? It should be finished tomorrow my time.
Naw, I forgot how fast we're gathering relations. Default vs TD 100 is a mildly interesting data point for matrix size, but we won't be using either of those matrices so it's not important.

I think doing a filtering run somewhere around 1.23-1.28G raw relations will give us an indication of when to shut down sieving. Sieving is going quickly and the uniques ratio is good, so I doubt more than 1.33G raw relations will be needed.

We agree that testing MPI is not useful while still sieving on the same machine!
2021-11-18, 22:22   #285
frmky

Quote:
Originally Posted by charybdis View Post
Can't remember whether 2,1 vs 1,2 makes much of a difference with timings. IIRC you will need to rebuild the matrix if you change the first parameter, so at least if you test 2,1 first you can then test 2,2 with -nc2 "2,2 skip_matbuild=1" and avoid having to build the matrix again.

Relative speeds should be reasonably consistent between the current oversized matrix and the final one, though frmky has far more experience with this than I do.
My experience is that 1,2 is almost always a little faster than 2,1.

You don't need to rebuild the matrix to change the first parameter. Once you build the matrix with MPI, you can use that matrix to test different parameters, both with and without MPI, using skip_matbuild=1. So, for example, run the tests in this sequence:

Code:
mpirun -np 2 --bind-to none ./msieve_mpi -nc2 1,2 -t 16 -v
mpirun -np 2 --bind-to none ./msieve_mpi -nc2 "2,1 skip_matbuild=1" -t 16 -v
mpirun -np 4 --bind-to none ./msieve_mpi -nc2 "2,2 skip_matbuild=1" -t 8 -v
./msieve_nompi -nc2 skip_matbuild=1 -t 32 -v
You can't restart a run in progress with a different first parameter as the checkpoint file format depends on that value, but you can restart with a different second parameter.

And yes, relative speeds should be consistent across a wide range of matrix sizes.
2021-11-18, 23:39   #286
charybdis

Quote:
Originally Posted by frmky View Post
You don't need to rebuild the matrix to change the first parameter. Once you build the matrix with MPI, you can use that matrix to test different parameters, both with and without MPI, using skip_matbuild=1.
...
You can't restart a run in progress with a different first parameter as the checkpoint file format depends on that value, but you can restart with a different second parameter.
Ah whoops, got these mixed up. Thanks for the correction!