I'm very much looking forward to trying 899, on both the Haswell and 48-way Opteron machines - I'll probably have to redo my mpirun script to run 8-way MPI with -t3 with the right CPU bindings, but that's not that hard.
But when I go to [url]http://msieve.svn.sourceforge.net[/url] or do 'svn update' on my checkout, I only get offered the eight-week-old 886. Have I missed some announcement of a move to a different revision control hosting service? |
I was baffled by this earlier today as well. It turned out that SF.net changed the URLs for the repositories, I guess it was done as part of the platform upgrade...
msieve SVN can now be reached at [url]http://svn.code.sf.net/p/msieve/code/[/url] (/trunk, /branches/..., etc.). |
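For anyone with a stale checkout, a fresh checkout from the new location is probably the least error-prone route. The sketch below only prints the command (a dry run); the `checkout_cmd` helper and the target directory name `msieve-trunk` are illustrative assumptions, and only the repository URL comes from the post above.

```shell
#!/bin/bash
# New repository root, as given in the post above.
REPO_ROOT='http://svn.code.sf.net/p/msieve/code'

# Hypothetical helper: print (not run) the checkout command for a sub-path.
# The default target directory "msieve-trunk" is an arbitrary illustrative name.
checkout_cmd() {
    local subpath="$1"
    local target="${2:-msieve-trunk}"
    printf 'svn checkout %s/%s %s\n' "$REPO_ROOT" "$subpath" "$target"
}

checkout_cmd trunk
# prints: svn checkout http://svn.code.sf.net/p/msieve/code/trunk msieve-trunk
```

To actually perform the checkout, run the printed command by hand.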
Yes, they've spent several months automatically upgrading projects to use a new repository structure, in order of project popularity, and making the old repository read-only. GGNFS switched over about two months ago and Msieve switched about two weeks ago.
The speedup is only for the linear algebra. Sieving in Msieve is pretty much a lost cause; it's not even worth my trying to find out why the current code is broken. |
Wonderful, thanks!
|
OK, on a Haswell i7/4770 with 16G of DDR3/1600 memory, the previous version took 90103 seconds = 25:01:43 to solve a 5385216x5385395 matrix from an over-sieved C163 I had lying around, with -t 4.
The new version is giving me an ETA of 12:16 and falling fast - I'll let it run through and tell you in the morning how long it actually took, but this sounds as if you've managed a 50% speed-up on what I thought was already damn good code. |
On the Opteron (eight six-CPU-and-dual-channel-DDR3/1333 nodes), having arranged that all the other jobs are running with taskset 3-47:6,4-47:6,5-47:6, running that matrix with
[code]
mpirun -n 8 ./blackmagic.sh
[/code]
with blackmagic.sh being
[code]
#!/bin/bash
msieve_real='/home/nfsworld/msieve-svn-again-mpi/trunk/msieve -v'
echo $OMPI_COMM_WORLD_RANK
CPUL=$[6*$OMPI_COMM_WORLD_RANK]
CPUR=$[6*$OMPI_COMM_WORLD_RANK+2]
taskset -c $CPUL-$CPUR numactl --cpunodebind $OMPI_COMM_WORLD_RANK -l $msieve_real -t 3 -nc2 8,1
[/code]
(after a very tedious hour or so rebuilding the matrix from the .cyc file, twice because the msieve.dat file on that machine didn't have the free relations in so msieve fell over trying to read the first free relation in a cycle) segfaulted
[code]
[tractor:12330] *** Process received signal ***
[tractor:12330] Signal: Segmentation fault (11)
[tractor:12330] Signal code: Invalid permissions (2)
[tractor:12330] Failing at address: 0x7fab84022000
[tractor:12330] [ 0] /lib/libpthread.so.0(+0xf8f0) [0x7faba522b8f0]
[tractor:12330] [ 1] /home/nfsworld/msieve-svn-again-mpi/trunk/msieve(mul_64xN_Nx64+0x50) [0x446880]
[tractor:12330] [ 2] /home/nfsworld/msieve-svn-again-mpi/trunk/msieve(tmul_64xN_Nx64+0x12d) [0x446b4d]
[tractor:12330] [ 3] /home/nfsworld/msieve-svn-again-mpi/trunk/msieve() [0x43f866]
[tractor:12330] [ 4] /home/nfsworld/msieve-svn-again-mpi/trunk/msieve(block_lanczos+0x3e4) [0x4425d4]
[tractor:12330] [ 5] /home/nfsworld/msieve-svn-again-mpi/trunk/msieve(nfs_solve_linear_system+0x478) [0x435e68]
[tractor:12330] [ 6] /home/nfsworld/msieve-svn-again-mpi/trunk/msieve(factor_gnfs+0x9bc) [0x418e0c]
[tractor:12330] [ 7] /home/nfsworld/msieve-svn-again-mpi/trunk/msieve(msieve_run+0x5a6) [0x409c96]
[tractor:12330] [ 8] /home/nfsworld/msieve-svn-again-mpi/trunk/msieve() [0x407e5a]
[tractor:12330] [ 9] /home/nfsworld/msieve-svn-again-mpi/trunk/msieve(main+0x8e2) [0x408882]
[tractor:12330] [10] /lib/libc.so.6(__libc_start_main+0xfd) [0x7faba4327c4d]
[tractor:12330] [11] /home/nfsworld/msieve-svn-again-mpi/trunk/msieve() [0x407839]
[/code]
I'm 60% of the way through the 26M matrix for 2^947-1, which I think means it's not quite worth restarting even if I see the sort of improvement I saw on Haswell, so timing for really big jobs on this machine will have to wait until the second week in July. |
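To make the binding arithmetic in blackmagic.sh easier to check, here is a small sketch that prints which CPUs each MPI rank would be pinned to. It assumes the layout described in the post (six cores per node, three threads per rank, rank N on node N); the `cpu_range` helper and the two constants are illustrative names, not part of msieve or Open MPI.

```shell
#!/bin/bash
# Assumed layout, taken from the post: 6 cores per NUMA node, 3 threads per rank.
CORES_PER_NODE=6
THREADS_PER_RANK=3

# Hypothetical helper: CPU range for a given MPI rank (same formula as CPUL/CPUR).
cpu_range() {
    local rank="$1"
    local first=$(( CORES_PER_NODE * rank ))
    local last=$(( first + THREADS_PER_RANK - 1 ))
    echo "${first}-${last}"
}

# Print the mapping for the eight ranks started by "mpirun -n 8".
for rank in 0 1 2 3 4 5 6 7; do
    echo "rank ${rank}: node ${rank}, cpus $(cpu_range "$rank")"
done
```

Under these assumptions rank 0 gets CPUs 0-2 and rank 7 gets 42-44, which leaves CPUs 3, 4 and 5 of every node (the `3-47:6,4-47:6,5-47:6` set) free for the other jobs.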
Just to check, is this SVN 900? That one has some MPI fixes in place that avoid crashing unconditionally.
|
Yes, this is SVN 900.
I tried it again without the numactl magic, and it still falls over. I did '-nc2 8,1 skip_matbuild=1' but it built the matrix anyway; did I have to say something like -nc2 "8,1 skip_matbuild=1"? Trying again with -nc2 1,8 - which of course does have to rebuild the matrix - in case aspect ratio is material. |
-nc2 1,8 and no black magic [b]does[/b] start, but the timing looks awful; on the other hand, that appears to be down to conflicts with other things running on the machine: I stopped them and the ETA went from 33 hours to 30 hours in a few minutes.
One last trial, using black magic and with everything else on the machine stopped ... if it works it will be done by the morning. |
Yes, all the NFS arguments have to be in one string, even if they are for different NFS phases (it's not very intuitive).
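The difference comes down to how the shell splits words before the program ever sees them. A minimal sketch, using a stand-in function `count_args` (hypothetical, not part of msieve) that just reports how many arguments it received:

```shell
#!/bin/bash
# Stand-in for a program: report how many arguments it was given.
count_args() { echo "$#"; }

# Unquoted: the shell passes three separate words.
count_args -nc2 8,1 skip_matbuild=1      # prints 3

# Quoted: the flag plus ONE string holding all the NFS arguments.
count_args -nc2 "8,1 skip_matbuild=1"    # prints 2
```

So, going by the reply above, the quoted form `-nc2 "8,1 skip_matbuild=1"` is the one that hands both settings to msieve as a single NFS argument string.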
|
[QUOTE=fivemack;344394]OK, on a Haswell i7/4770 with 16G of DDR3/1600 memory, the previous version took 90103 seconds = 25:01:43 to solve a 5385216x5385395 matrix from an over-sieved C163 I had lying around, with -t 4.
The new version is giving me an ETA of 12:16 and falling fast - I'll let it run through and tell you in the morning how long it actually took[/quote]
The new version took 45135 seconds to solve the matrix on i7/4770; almost exactly half the time! Opteron run is 55% done with ETA 6 hours, so a bit slower than the Haswell despite theoretically much better resources. |
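For anyone wanting to check the timing arithmetic, a short sketch that converts the two reported run times to h:mm:ss and computes their ratio. The `hms` helper is an illustrative name; the two second counts are the ones quoted in the thread.

```shell
#!/bin/bash
# Hypothetical helper: format a number of seconds as h:mm:ss.
hms() {
    local s="$1"
    printf '%d:%02d:%02d\n' $(( s / 3600 )) $(( s % 3600 / 60 )) $(( s % 60 ))
}

hms 90103   # previous version: 25:01:43
hms 45135   # new version:      12:32:15

# Ratio of new to old run time (awk is used only for floating-point division).
awk 'BEGIN { printf "%.3f\n", 45135 / 90103 }'   # 0.501
```

A ratio of about 0.5 means the run time was halved, i.e. roughly a factor-of-two speed-up in the linear algebra on this matrix.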