mersenneforum.org > Factoring Projects > Msieve

2017-01-03, 18:51   #1
EdH

More openmpi questions...

I am having trouble getting msieve's thread count to carry through to the OpenMPI processes, especially with -t 4. All of the following runs use the same MPI-aware build of msieve, and all observations were taken after waiting for the LA to get a bit past the ETA messages.

If I run msieve with -t 4 without calling it via mpirun, top shows one instance of msieve using ~350% of my quad-core CPU:
Code:
../msieve/msieve -i number.ini -s number.dat -l number.log -nf number.fb -t 4 -nc2
If I then run the same command via mpirun, top shows one instance of msieve running at <=100% and the ETA is correspondingly longer.
Code:
mpirun -np 1 --hostfile ./mpi_hosts111 ../msieve/msieve -i number.ini -s number.dat -l number.log -nf number.fb -t 4 -nc2 1,1
If, still using one machine, I change to 2 MPI processes and -t 2, top shows 2 instances, both at <=100%.
Code:
mpirun -np 2 --hostfile ./mpi_hosts221 ../msieve/msieve -i number.ini -s number.dat -l number.log -nf number.fb -t 2 -nc2 2,1
If I add a second machine, top shows 2 instances on each machine, all at <=150%.
Code:
mpirun -np 4 --hostfile ./mpi_hosts221 ../msieve/msieve -i number.ini -s number.dat -l number.log -nf number.fb -t 2 -nc2 4,1
If I then adjust to 2 MPI processes with -t 4, top changes to one instance on each machine, but at <=100% for both.
Code:
mpirun -np 2 --hostfile ./mpi_hosts111 ../msieve/msieve -i number.ini -s number.dat -l number.log -nf number.fb -t 4 -nc2 2,1
Note that the suffix on each mpi_hosts filename gives the number of slots to use on each of the three machines.

It appears that -t 2 works with multiple machines, but -t 4 will not take effect at all via MPI.

Any thoughts on the above observations are welcome...

2017-01-05, 00:14   #2
jasonp

How big a matrix is this? I would expect to need a matrix size above maybe 2M before MPI gives a speedup, especially with multiple threads. If your machines are still connected with gigabit ethernet, they will spend a lot of time waiting for data transfers.

2017-01-05, 03:12   #3
EdH

I think my matrix is just over 4M, if I'm reading it right. My thinking was that if I could increase threads and reduce MPI processes, I could decrease data transfers, but I might have this backwards. Practice appears to show that -t 2 across three machines is optimal for my setup. Yes, I'm still on gigabit. Adding a third machine does reduce the time, so I took that to mean the first two weren't saturating the gigabit link. But memory bandwidth might be my issue: even though I'm not filling the RAM, there may not be enough bandwidth, perhaps?

If this is helpful:
Code:
mpirun -np 2 --hostfile ./mpi_hosts111 ../msieve/msieve -i number.ini -s number.dat -l number.log -nf number.fb -t 4 -nc2 2,1
gives me one process on each machine with top showing <=100%.

Here are the two logs:
Code:
Wed Jan  4 21:26:34 2017  Msieve v. 1.53 (SVN 993)
Wed Jan  4 21:26:34 2017  random seeds: 99435217 8a405357
Wed Jan  4 21:26:34 2017  MPI process 0 of 2
Wed Jan  4 21:26:34 2017  factoring 820542702287058139583300542461757119495935711084069870517652403589147165539358552360109600961345804476958004926044416408854122278694458926677 (141 digits)
Wed Jan  4 21:26:35 2017  searching for 15-digit factors
Wed Jan  4 21:26:36 2017  commencing number field sieve (141-digit input)
Wed Jan  4 21:26:36 2017  R0: -8068224187348260061731767540
Wed Jan  4 21:26:36 2017  R1: 7392072149387
Wed Jan  4 21:26:36 2017  A0: 20423341607513397403579630437539211
Wed Jan  4 21:26:36 2017  A1: -529896443757128435451449388665
Wed Jan  4 21:26:36 2017  A2: -137820022314972661814868
Wed Jan  4 21:26:36 2017  A3: 11802326769047736
Wed Jan  4 21:26:36 2017  A4: 9623375962
Wed Jan  4 21:26:36 2017  A5: 24
Wed Jan  4 21:26:36 2017  skew 6219102.56, size 7.941e-14, alpha -5.813, combined = 1.417e-11 rroots = 3
Wed Jan  4 21:26:36 2017  
Wed Jan  4 21:26:36 2017  commencing linear algebra
Wed Jan  4 21:26:36 2017  initialized process (0,0) of 2 x 1 grid
Wed Jan  4 21:26:36 2017  read 2124888 cycles
Wed Jan  4 21:26:40 2017  cycles contain 6333818 unique relations
Wed Jan  4 21:27:50 2017  read 6333818 relations
Wed Jan  4 21:27:58 2017  using 20 quadratic characters above 4294917295
Wed Jan  4 21:28:33 2017  building initial matrix
Wed Jan  4 21:30:01 2017  memory use: 851.5 MB
Wed Jan  4 21:30:03 2017  read 2124888 cycles
Wed Jan  4 21:30:04 2017  matrix is 2124709 x 2124888 (638.7 MB) with weight 201591455 (94.87/col)
Wed Jan  4 21:30:04 2017  sparse part has weight 144046076 (67.79/col)
Wed Jan  4 21:30:19 2017  filtering completed in 1 passes
Wed Jan  4 21:30:20 2017  matrix is 2124709 x 2124888 (638.7 MB) with weight 201591455 (94.87/col)
Wed Jan  4 21:30:20 2017  sparse part has weight 144046076 (67.79/col)
Wed Jan  4 21:30:38 2017  matrix starts at (0, 0)
Wed Jan  4 21:30:38 2017  matrix is 1062411 x 2124888 (370.2 MB) with weight 131215183 (61.75/col)
Wed Jan  4 21:30:38 2017  sparse part has weight 73669804 (34.67/col)
Wed Jan  4 21:30:38 2017  saving the first 48 matrix rows for later
Wed Jan  4 21:30:39 2017  matrix includes 64 packed rows
Wed Jan  4 21:30:39 2017  matrix is 1062363 x 2124888 (350.4 MB) with weight 90149514 (42.43/col)
Wed Jan  4 21:30:39 2017  sparse part has weight 70607640 (33.23/col)
Wed Jan  4 21:30:39 2017  using block size 8192 and superblock size 196608 for processor cache size 2048 kB
Wed Jan  4 21:30:44 2017  commencing Lanczos iteration (4 threads)
Wed Jan  4 21:30:44 2017  memory use: 261.8 MB
Wed Jan  4 21:31:07 2017  linear algebra at 0.1%, ETA 8h35m
Wed Jan  4 21:31:15 2017  checkpointing every 250000 dimensions
Code:
Wed Jan  4 21:26:36 2017  commencing linear algebra
Wed Jan  4 21:26:36 2017  initialized process (1,0) of 2 x 1 grid
Wed Jan  4 21:30:38 2017  matrix starts at (1062411, 0)
Wed Jan  4 21:30:38 2017  matrix is 1062298 x 2124888 (333.3 MB) with weight 70376272 (33.12/col)
Wed Jan  4 21:30:38 2017  sparse part has weight 70376272 (33.12/col)
Wed Jan  4 21:30:39 2017  matrix is 1062298 x 2124888 (333.3 MB) with weight 70376272 (33.12/col)
Wed Jan  4 21:30:39 2017  sparse part has weight 70376272 (33.12/col)
Wed Jan  4 21:30:39 2017  using block size 8192 and superblock size 196608 for processor cache size 2048 kB
Wed Jan  4 21:30:44 2017  commencing Lanczos iteration (4 threads)
Wed Jan  4 21:30:44 2017  memory use: 244.7 MB
Here is the switch to 4 processes with -t 2:
Code:
mpirun -np 4 --hostfile ./mpi_hosts221 ../msieve/msieve -i number.ini -s number.dat -l number.log -nf number.fb -t 2 -nc2 4,1
And the first log:
Code:
Wed Jan  4 21:42:40 2017  commencing linear algebra
Wed Jan  4 21:42:40 2017  initialized process (0,0) of 4 x 1 grid
Wed Jan  4 21:42:41 2017  read 2124888 cycles
Wed Jan  4 21:42:45 2017  cycles contain 6333818 unique relations
Wed Jan  4 21:43:56 2017  read 6333818 relations
Wed Jan  4 21:44:04 2017  using 20 quadratic characters above 4294917295
Wed Jan  4 21:44:38 2017  building initial matrix
Wed Jan  4 21:46:04 2017  memory use: 851.5 MB
Wed Jan  4 21:46:06 2017  read 2124888 cycles
Wed Jan  4 21:46:07 2017  matrix is 2124709 x 2124888 (638.7 MB) with weight 201591455 (94.87/col)
Wed Jan  4 21:46:07 2017  sparse part has weight 144046076 (67.79/col)
Wed Jan  4 21:46:22 2017  filtering completed in 1 passes
Wed Jan  4 21:46:23 2017  matrix is 2124709 x 2124888 (638.7 MB) with weight 201591455 (94.87/col)
Wed Jan  4 21:46:23 2017  sparse part has weight 144046076 (67.79/col)
Wed Jan  4 21:46:43 2017  matrix starts at (0, 0)
Wed Jan  4 21:46:44 2017  matrix is 531262 x 2124888 (236.0 MB) with weight 96032795 (45.19/col)
Wed Jan  4 21:46:44 2017  sparse part has weight 38487416 (18.11/col)
Wed Jan  4 21:46:44 2017  saving the first 48 matrix rows for later
Wed Jan  4 21:46:44 2017  matrix includes 64 packed rows
Wed Jan  4 21:46:47 2017  matrix is 531214 x 2124888 (216.2 MB) with weight 54967126 (25.87/col)
Wed Jan  4 21:46:47 2017  sparse part has weight 35425252 (16.67/col)
Wed Jan  4 21:46:47 2017  using block size 8192 and superblock size 196608 for processor cache size 2048 kB
Wed Jan  4 21:46:50 2017  commencing Lanczos iteration (2 threads)
Wed Jan  4 21:46:50 2017  memory use: 146.6 MB
Wed Jan  4 21:47:06 2017  linear algebra at 0.1%, ETA 5h49m
Wed Jan  4 21:47:11 2017  checkpointing every 370000 dimensions
Both machines show two processes at <150% each in top.

The logs do show the appropriate thread counts, but the CPU use just doesn't seem to reflect them.

Thanks for the reply. I'll go back to my studies...

2017-01-05, 13:57   #4
fivemack

Could you post the hosts files?

My suspicion is that mpirun has decided it should bind processes to CPUs, and that you've somehow not told it that some of the hosts have more than one CPU. What does 'taskset -p {process ID}' tell you when a process is running with insufficient CPU usage?

Aha, in a document on the Oxford supercomputer centre website, I found:

Code:
Finally, OpenMPI versions higher than 1.8.0 automatically bind processes to threads. Thus,

export OMPI_MCA_hwloc_base_binding_policy=none
so maybe see if doing that changes what you see happening?
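If I'm remembering the option right, OpenMPI 1.8+ also accepts the same request directly on the mpirun command line as --bind-to none, so something like this (untested here) may be worth a try instead of the export:
Code:
# --bind-to none stops mpirun pinning each rank to a single core
mpirun --bind-to none -np 2 --hostfile ./mpi_hosts111 ../msieve/msieve -i number.ini -s number.dat -l number.log -nf number.fb -t 4 -nc2 2,1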

Supercomputer centres almost always use something like Slurm or Torque for job submission, so I'm having a little trouble pinning down how to get one-job-per-machine without that extra layer.
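That said, I believe plain mpirun can place one process per machine by itself via its mapping options; something like this (again untested) ought to launch one rank on each of the three hosts:
Code:
# ppr:1:node = one process per node; with three hosts this gives a 3x1 grid
mpirun --map-by ppr:1:node --hostfile ./mpi_hosts111 ../msieve/msieve -i number.ini -s number.dat -l number.log -nf number.fb -t 4 -nc2 3,1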

2017-01-05, 18:40   #5
EdH

Thanks, fivemack! This does make a difference. After exporting the policy value on the host and first slave, the msieve threads have increased their CPU usage. The host machine is up to just under 200% and the slave machine is just over 200%. During this run, your taskset query returns:
Code:
pid 8003's current affinity mask: f
I'll clear the policy and see what I get with the machine in the earlier state...

OK, taskset now returns:
Code:
pid 8850's current affinity mask: 1
and top is back to showing <=100% for both msieve processes (mask f allows all four CPUs, while mask 1 pins the process to CPU 0 alone).
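In case it's useful, taskset looks like it can also widen the mask on an already-running process, though I haven't tried that mid-LA:
Code:
# re-allow all four cores (mask 0xf = CPUs 0-3) for the pinned process
taskset -p 0xf 8850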

In answer to your other request, my mpi_hosts files follow this pattern:

mpi_hosts111:
Code:
localhost slots=1

math59@192.168.0.58 slots=1
math59@192.168.0.60 slots=1
mpi_hosts221:
Code:
localhost slots=2

math59@192.168.0.58 slots=2
math59@192.168.0.60 slots=1
They appear to track directly with my grid values.

The host and the first slave are quad core and the second slave is dual core. Also, the host and first slave are maxed out at 4G, while the second slave has only 3G of RAM.

I will probably swap the second slave for a quad core with more RAM, but the current second slave is the same architecture as the other two, which I thought might be an advantage at this point.
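On the chance that mpirun uses the slot counts for its binding decisions, a hostfile declaring the full physical core counts might also be worth a try (hypothetical, not yet tested here):
Code:
localhost slots=4

math59@192.168.0.58 slots=4
math59@192.168.0.60 slots=2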

2017-01-16, 17:22   #6
EdH

Well, I guess it's time to give up on this for a while again. Too much frustration!

No matter what combination I try, I can't save any time over the bare initial machine running four cores. The only advantage the additional machines give me is the ability to handle larger matrices, since they add their memory to the mix and all are maxed at 4GB.

Since my current play area only involves composites of fewer than 150 digits, which take just two to four days to factor, I'll let this slide into the background for a bit.