Old 2017-01-05, 03:12   #3
EdH
 
 
"Ed Hall"

I think my matrix is just over 4M, if I'm reading it right. I was thinking that if I increased threads and reduced MPI processes, I could cut down on data transfers, but I might have this backwards. In practice, -t 2 on three machines appears to be optimal for my setup. Yes, I'm still on Gigabit. Adding a third machine does reduce the time, so I took that to mean the Gigabit link wasn't saturated by the first two. But memory transfer might be my issue: even though I'm not filling the memory, perhaps there isn't enough bandwidth?

If this is helpful:
Code:
mpirun -np 2 --hostfile ./mpi_hosts111 ../msieve/msieve -i number.ini -s number.dat -l number.log -nf number.fb -t 4 -nc2 2,1
gives me one process on each machine with top showing <=100%.

Here are the two logs:
Code:
Wed Jan  4 21:26:34 2017  Msieve v. 1.53 (SVN 993)
Wed Jan  4 21:26:34 2017  random seeds: 99435217 8a405357
Wed Jan  4 21:26:34 2017  MPI process 0 of 2
Wed Jan  4 21:26:34 2017  factoring 820542702287058139583300542461757119495935711084069870517652403589147165539358552360109600961345804476958004926044416408854122278694458926677 (141 digits)
Wed Jan  4 21:26:35 2017  searching for 15-digit factors
Wed Jan  4 21:26:36 2017  commencing number field sieve (141-digit input)
Wed Jan  4 21:26:36 2017  R0: -8068224187348260061731767540
Wed Jan  4 21:26:36 2017  R1: 7392072149387
Wed Jan  4 21:26:36 2017  A0: 20423341607513397403579630437539211
Wed Jan  4 21:26:36 2017  A1: -529896443757128435451449388665
Wed Jan  4 21:26:36 2017  A2: -137820022314972661814868
Wed Jan  4 21:26:36 2017  A3: 11802326769047736
Wed Jan  4 21:26:36 2017  A4: 9623375962
Wed Jan  4 21:26:36 2017  A5: 24
Wed Jan  4 21:26:36 2017  skew 6219102.56, size 7.941e-14, alpha -5.813, combined = 1.417e-11 rroots = 3
Wed Jan  4 21:26:36 2017  
Wed Jan  4 21:26:36 2017  commencing linear algebra
Wed Jan  4 21:26:36 2017  initialized process (0,0) of 2 x 1 grid
Wed Jan  4 21:26:36 2017  read 2124888 cycles
Wed Jan  4 21:26:40 2017  cycles contain 6333818 unique relations
Wed Jan  4 21:27:50 2017  read 6333818 relations
Wed Jan  4 21:27:58 2017  using 20 quadratic characters above 4294917295
Wed Jan  4 21:28:33 2017  building initial matrix
Wed Jan  4 21:30:01 2017  memory use: 851.5 MB
Wed Jan  4 21:30:03 2017  read 2124888 cycles
Wed Jan  4 21:30:04 2017  matrix is 2124709 x 2124888 (638.7 MB) with weight 201591455 (94.87/col)
Wed Jan  4 21:30:04 2017  sparse part has weight 144046076 (67.79/col)
Wed Jan  4 21:30:19 2017  filtering completed in 1 passes
Wed Jan  4 21:30:20 2017  matrix is 2124709 x 2124888 (638.7 MB) with weight 201591455 (94.87/col)
Wed Jan  4 21:30:20 2017  sparse part has weight 144046076 (67.79/col)
Wed Jan  4 21:30:38 2017  matrix starts at (0, 0)
Wed Jan  4 21:30:38 2017  matrix is 1062411 x 2124888 (370.2 MB) with weight 131215183 (61.75/col)
Wed Jan  4 21:30:38 2017  sparse part has weight 73669804 (34.67/col)
Wed Jan  4 21:30:38 2017  saving the first 48 matrix rows for later
Wed Jan  4 21:30:39 2017  matrix includes 64 packed rows
Wed Jan  4 21:30:39 2017  matrix is 1062363 x 2124888 (350.4 MB) with weight 90149514 (42.43/col)
Wed Jan  4 21:30:39 2017  sparse part has weight 70607640 (33.23/col)
Wed Jan  4 21:30:39 2017  using block size 8192 and superblock size 196608 for processor cache size 2048 kB
Wed Jan  4 21:30:44 2017  commencing Lanczos iteration (4 threads)
Wed Jan  4 21:30:44 2017  memory use: 261.8 MB
Wed Jan  4 21:31:07 2017  linear algebra at 0.1%, ETA 8h35m
Wed Jan  4 21:31:15 2017  checkpointing every 250000 dimensions
Code:
Wed Jan  4 21:26:36 2017  commencing linear algebra
Wed Jan  4 21:26:36 2017  initialized process (1,0) of 2 x 1 grid
Wed Jan  4 21:30:38 2017  matrix starts at (1062411, 0)
Wed Jan  4 21:30:38 2017  matrix is 1062298 x 2124888 (333.3 MB) with weight 70376272 (33.12/col)
Wed Jan  4 21:30:38 2017  sparse part has weight 70376272 (33.12/col)
Wed Jan  4 21:30:39 2017  matrix is 1062298 x 2124888 (333.3 MB) with weight 70376272 (33.12/col)
Wed Jan  4 21:30:39 2017  sparse part has weight 70376272 (33.12/col)
Wed Jan  4 21:30:39 2017  using block size 8192 and superblock size 196608 for processor cache size 2048 kB
Wed Jan  4 21:30:44 2017  commencing Lanczos iteration (4 threads)
Wed Jan  4 21:30:44 2017  memory use: 244.7 MB
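As a sanity check on how the matrix was divided between the two processes, here is a minimal Python sketch. It only uses numbers copied from the two logs above; the variable names are mine, and it assumes the "matrix starts at" and first "matrix is" lines of each process describe that process's slice before the packed rows are split off.

```python
# Full matrix after filtering (from the process 0 log).
full_rows = 2124709
full_sparse_weight = 144046076

# Per-process slices as first reported after "matrix starts at ...":
# (starting row, rows, sparse weight)
slices = [
    (0, 1062411, 73669804),        # process (0,0)
    (1062411, 1062298, 70376272),  # process (1,0)
]

# Each slice should begin where the previous one ended.
next_start = 0
for start, rows, _ in slices:
    assert start == next_start, "slices are not contiguous"
    next_start += rows

total_rows = sum(rows for _, rows, _ in slices)
total_sparse = sum(weight for _, _, weight in slices)

print("rows covered:", total_rows, "of", full_rows)
print("sparse weight covered:", total_sparse, "of", full_sparse_weight)
for start, rows, weight in slices:
    share = 100.0 * weight / full_sparse_weight
    print(f"slice at row {start}: {share:.1f}% of sparse weight")
```

The row counts and sparse weights of the two slices add up exactly to the totals for the full matrix, so the sparse part looks evenly balanced between the two machines.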
Here is the run after changing to 4 processes with -t 2:
Code:
mpirun -np 4 --hostfile ./mpi_hosts221 ../msieve/msieve -i number.ini -s number.dat -l number.log -nf number.fb -t 2 -nc2 4,1
And the first log:
Code:
Wed Jan  4 21:42:40 2017  commencing linear algebra
Wed Jan  4 21:42:40 2017  initialized process (0,0) of 4 x 1 grid
Wed Jan  4 21:42:41 2017  read 2124888 cycles
Wed Jan  4 21:42:45 2017  cycles contain 6333818 unique relations
Wed Jan  4 21:43:56 2017  read 6333818 relations
Wed Jan  4 21:44:04 2017  using 20 quadratic characters above 4294917295
Wed Jan  4 21:44:38 2017  building initial matrix
Wed Jan  4 21:46:04 2017  memory use: 851.5 MB
Wed Jan  4 21:46:06 2017  read 2124888 cycles
Wed Jan  4 21:46:07 2017  matrix is 2124709 x 2124888 (638.7 MB) with weight 201591455 (94.87/col)
Wed Jan  4 21:46:07 2017  sparse part has weight 144046076 (67.79/col)
Wed Jan  4 21:46:22 2017  filtering completed in 1 passes
Wed Jan  4 21:46:23 2017  matrix is 2124709 x 2124888 (638.7 MB) with weight 201591455 (94.87/col)
Wed Jan  4 21:46:23 2017  sparse part has weight 144046076 (67.79/col)
Wed Jan  4 21:46:43 2017  matrix starts at (0, 0)
Wed Jan  4 21:46:44 2017  matrix is 531262 x 2124888 (236.0 MB) with weight 96032795 (45.19/col)
Wed Jan  4 21:46:44 2017  sparse part has weight 38487416 (18.11/col)
Wed Jan  4 21:46:44 2017  saving the first 48 matrix rows for later
Wed Jan  4 21:46:44 2017  matrix includes 64 packed rows
Wed Jan  4 21:46:47 2017  matrix is 531214 x 2124888 (216.2 MB) with weight 54967126 (25.87/col)
Wed Jan  4 21:46:47 2017  sparse part has weight 35425252 (16.67/col)
Wed Jan  4 21:46:47 2017  using block size 8192 and superblock size 196608 for processor cache size 2048 kB
Wed Jan  4 21:46:50 2017  commencing Lanczos iteration (2 threads)
Wed Jan  4 21:46:50 2017  memory use: 146.6 MB
Wed Jan  4 21:47:06 2017  linear algebra at 0.1%, ETA 5h49m
Wed Jan  4 21:47:11 2017  checkpointing every 370000 dimensions
In top, both machines show two processes at <150% each.

The logs do show the requested thread counts, but the CPU usage just doesn't seem to match.
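To put the two ETAs side by side, here is a rough Python sketch that pulls the ETA out of the "linear algebra at ..." lines quoted above and compares them. The helper name eta_minutes is mine, not anything from msieve, and the pattern assumes the ETA is always printed as hours and minutes (for example 8h35m), which is all these logs show. Both ETAs were taken at 0.1%, so they are early estimates rather than measured run times.

```python
import re

def eta_minutes(log_line: str) -> int:
    """Return the ETA in minutes from a line like '... ETA 8h35m'.

    Assumes an optional '<n>h' followed by '<n>m'; other formats
    are not handled and raise ValueError.
    """
    match = re.search(r"ETA\s+(?:(\d+)h)?\s*(\d+)m", log_line)
    if match is None:
        raise ValueError(f"no ETA found in: {log_line!r}")
    hours = int(match.group(1) or 0)
    minutes = int(match.group(2))
    return hours * 60 + minutes

# Lines copied from the two process 0 logs in this post.
two_procs_four_threads = "Wed Jan  4 21:31:07 2017  linear algebra at 0.1%, ETA 8h35m"
four_procs_two_threads = "Wed Jan  4 21:47:06 2017  linear algebra at 0.1%, ETA 5h49m"

eta_2x4 = eta_minutes(two_procs_four_threads)
eta_4x2 = eta_minutes(four_procs_two_threads)

print("2 processes, -t 4:", eta_2x4, "minutes")
print("4 processes, -t 2:", eta_4x2, "minutes")
print(f"estimated ratio: {eta_2x4 / eta_4x2:.2f}x")
```

On these two lines that works out to 515 minutes against 349 minutes, so the 4 x 1 grid with -t 2 is estimated to be a little under 1.5 times faster than the 2 x 1 grid with -t 4.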

Thanks for the reply. I'll go back to my studies...

Last fiddled with by EdH on 2017-01-05 at 03:13