How big is the matrix? I would not expect a speedup from MPI until the matrix size is above maybe 2M, especially with multiple threads. If your machines are still connected by gigabit Ethernet, they will spend a lot of time waiting for data transfers even then.
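To get a feel for the transfer cost, here is a rough back-of-the-envelope sketch. It assumes "2M" means two million 8-byte (double-precision) elements and that the link delivers its full nominal 1 Gbit/s with no latency or protocol overhead, so real numbers will be worse. The function name `transfer_seconds` and its parameters are made up for illustration.

```python
def transfer_seconds(num_elements, bytes_per_element=8, link_bits_per_second=1e9):
    """Idealized time to send a buffer over a link.

    Ignores latency, protocol overhead and contention, so this is a lower bound.
    """
    total_bits = num_elements * bytes_per_element * 8
    return total_bits / link_bits_per_second


if __name__ == "__main__":
    # Assumption: "2M" read as two million double-precision elements.
    n = 2_000_000
    print(f"{n} elements over 1 Gbit/s:  {transfer_seconds(n):.4f} s")
    # Same buffer over a hypothetical 10 Gbit/s link, for comparison.
    print(f"{n} elements over 10 Gbit/s: {transfer_seconds(n, link_bits_per_second=1e10):.4f} s")
```

If the compute time you save per node by splitting the work is smaller than the transfer time this gives you, adding machines probably won't help.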
