mersenneforum.org  

2010-06-22, 17:10   #12
R.D. Silverman
Nov 2003
2^2·5·373 Posts

Quote:
Originally Posted by jasonp View Post
This is with OpenMPI. I don't have the means to get more than one machine working on this at home, so it doesn't matter a great deal which one I use. I can see the handling of signals varying from one set of MPI middleware to the other; maybe I can even configure mpirun to do what I want. The modified LA does explicitly abort after a checkpoint is written, to force the other instances to shut down. (Yes, when I say 'head node' I mean rank 0, the one that handles all tasks besides the matrix multiply)

Greg, do you see a performance difference using fewer MPI processes but more than one thread per process?

Bob, are you sure your laptop isn't throttling once the memory controller gets pushed hard enough?

No, I am not sure.

2010-06-22, 17:17   #13
R.D. Silverman
Nov 2003
2^2×5×373 Posts

Quote:
Originally Posted by R.D. Silverman View Post
No, I am not sure.
Indeed. How would one find out while the system is up?
I don't know of any tools that would tell me.

2010-06-22, 17:17   #14
frmky
Jul 2003
So Cal
100000111010_2 Posts

Quote:
Originally Posted by jasonp View Post
Now you can restart an MPI run with a different number of processes without having to rebuild the matrix.
Great! That will help with testing. Rebuilding the matrix each time grew tiresome quickly.

Quote:
Originally Posted by jasonp View Post
A problem I noticed on my local machine is that mpirun apparently catches signals, so that executing mpirun and hitting Ctrl-C does not make the LA write a checkpoint. Sending SIGINT directly to the msieve 'head node' process (i.e. the one whose working set is largest :) does seem to work however.
I noticed the same here, but rather than log into the correct compute node, I just got into the habit of waiting for a checkpoint update, then SIGKILLing the head node process (SIGTERM wasn't strong enough).
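
The behaviour described above (a SIGINT delivered straight to the process makes it write a checkpoint and stop) can be illustrated with a minimal Python sketch. This is not msieve's code: the names `run`, `request_stop` and the JSON checkpoint format are hypothetical, and the loop body is only a stand-in for a Lanczos iteration. It assumes a POSIX system, where `os.kill` with `SIGINT` delivers the signal rather than terminating the process, and it must run in the main thread.

```python
import json
import os
import signal
import tempfile

stop_requested = False  # set by the signal handler, polled by the main loop


def request_stop(signum, frame):
    """Signal handler: only record the request; the loop does the real work."""
    global stop_requested
    stop_requested = True


def run(iterations, checkpoint_path, interrupt_at=None):
    """Toy iteration loop that writes a checkpoint and stops after SIGINT.

    interrupt_at (hypothetical, for demonstration) makes the process send
    SIGINT to its own PID, mimicking `kill -INT <pid>` from another shell.
    """
    global stop_requested
    stop_requested = False
    previous = signal.signal(signal.SIGINT, request_stop)
    done = 0
    try:
        for done in range(1, iterations + 1):
            # ... one unit of real work would go here ...
            if done == interrupt_at:
                os.kill(os.getpid(), signal.SIGINT)
            if stop_requested:
                with open(checkpoint_path, "w") as f:
                    json.dump({"iteration": done}, f)
                return done
        return done
    finally:
        if previous is not None:
            signal.signal(signal.SIGINT, previous)  # restore the old handler


if __name__ == "__main__":
    path = os.path.join(tempfile.gettempdir(), "la_checkpoint_demo.json")
    print("stopped at iteration", run(1000, path, interrupt_at=3))
```

The handler only sets a flag, so the checkpoint is written at an iteration boundary rather than from inside the handler. A launcher that intercepts Ctrl-C and does not forward it would never trigger this path, which is consistent with the need to signal the process directly.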

I've started the LA on 11,287+ using 16 MPI processes (2/node by slot) on our cluster. It's a matrix with 9.3M columns and weight 1142 million. It has an ETA of 186 hours. In just over a week, we should have the solution.

2010-06-22, 17:20   #15
frmky
Jul 2003
So Cal
2×3^4×13 Posts

Quote:
Originally Posted by jasonp View Post

Greg, do you see a performance difference using fewer MPI processes but more than one thread per process?
I only tried that once, and it crashed. I'm using OpenMPI, but I'm not sure which version. I had lots of other benchmarking to do, so I didn't try to diagnose it.

Last fiddled with by jasonp on 2010-06-22 at 17:23 Reason: Guess I should try it myself :)

2010-06-22, 17:20   #16
R.D. Silverman
Nov 2003
1D24_16 Posts

Quote:
Originally Posted by R.D. Silverman View Post
Indeed. How would one find out while the system is up?
I don't know of any tools that would tell me.
I do know how to add code to the linear algebra code that will tell me
the current clock rate, but that is rather hard to do while the code
is running.

2010-06-22, 17:22   #17
R.D. Silverman
Nov 2003
2^2×5×373 Posts

Quote:
Originally Posted by frmky View Post

I've started the LA on 11,287+ using 16 MPI processes (2/node by slot) on our cluster. It's a matrix with 9.3M columns and weight 1142 million. It has an ETA of 186 hours. In just over a week, we should have the solution.
Whereas my laptop is taking more than 4 times as long on 7.9M columns
and weight 542M. Note that my code is only single threaded.

2010-06-22, 17:32   #18
henryzz
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
2^3×3×5×7^2 Posts

Quote:
Originally Posted by R.D. Silverman View Post
I do know how to add code to the linear algebra code that will tell me
the current clock rate, but that is rather hard to do while the code
is running.
CPU-Z will show the current clock rate and should indicate any throttling.

2010-06-22, 17:32   #19
Batalov
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
3^6·13 Posts

Quote:
Originally Posted by R.D. Silverman View Post
I do know how to add code to the linear algebra code that will tell me
the current clock rate, but that is rather hard to do while the code
is running.
You may use PC Wizard (also for various temperatures, settings)... maybe TMonitor/perfMonitor is even better -- this is from the same group as CPU-Z.

Last fiddled with by Batalov on 2010-06-22 at 17:37

2010-06-22, 17:57   #20
R.D. Silverman
Nov 2003
2^2·5·373 Posts

Quote:
Originally Posted by Batalov View Post
You may use PC Wizard (also for various temperatures, settings)... maybe TMonitor/perfMonitor is even better -- this is from the same group as CPU-Z.
Much obliged. I installed it. My system is not throttling the clock.
It is running at full speed.

2010-06-22, 20:45   #21
jasonp
Tribal Bullet
Oct 2004
3,541 Posts

Quote:
Originally Posted by R.D. Silverman View Post
Whereas my laptop is taking more than 4 times as long on 7.9M columns
and weight 542M. Note that my code is only single threaded.
It could be that nothing is going wrong; multiplying by twice as many nonzeros and about 17% more matrix columns, then dividing by 4 for the efficiency difference between msieve's LA and the CWI LA code (as reported by Paul a while ago), and dividing by 4 to account for the sqrt(N)-like scaling using 16 MPI processes, we should be seeing a runtime difference of a factor of about 6.5x. The fact that you see only 4x slower may be a compliment :)
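
The estimate above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic as stated in the post: the matrix sizes and weights are the ones quoted in the thread, the two factors of 4 are the post's own estimates rather than measurements, and the function name `expected_slowdown` is made up for illustration.

```python
# Figures quoted in the thread.
cluster_cols, cluster_weight = 9.3e6, 1142e6  # frmky: msieve, 16 MPI processes
laptop_cols, laptop_weight = 7.9e6, 542e6     # R.D. Silverman: CWI code, 1 thread


def expected_slowdown(code_efficiency=4.0, mpi_processes=16):
    """Expected laptop runtime divided by cluster runtime under the post's model."""
    # Work is taken to scale as (number of nonzeros) x (number of columns).
    work_ratio = (cluster_weight / laptop_weight) * (cluster_cols / laptop_cols)
    # sqrt(N)-like scaling: 16 processes are credited with a 4x speedup.
    mpi_speedup = mpi_processes ** 0.5
    return code_efficiency * mpi_speedup / work_ratio


print(f"work ratio (cluster / laptop): "
      f"{(cluster_weight / laptop_weight) * (cluster_cols / laptop_cols):.2f}")
print(f"expected slowdown of the laptop run: {expected_slowdown():.2f}x")
```

With these inputs the work ratio is about 2.5 and the expected slowdown comes out close to the 6.5x quoted in the post.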

Do you get dramatically different runtimes from slightly smaller matrices that do fit in 2GB? Maybe Windows is playing games with PAE to allow a larger VM space once the working set size exceeds 2GB.

2010-06-22, 21:02   #22
R.D. Silverman
Nov 2003
16444_8 Posts

Quote:
Originally Posted by jasonp View Post
It could be that nothing is going wrong; multiplying by twice as many nonzeros and about 17% more matrix columns, then dividing by 4 for the efficiency difference between msieve's LA and the CWI LA code (as reported by Paul a while ago),
I was not aware that the msieve code was 4x faster.
Do you know the cause of the difference?

Maybe I can improve the CWI code.

Quote:
and dividing by 4 to account for the sqrt(N)-like scaling using 16 MPI processes, we should be seeing a runtime difference of a factor of about 6.5x. The fact that you see only 4x slower may be a compliment :)

Do you get dramatically different runtimes from slightly smaller matrices that do fit in 2GB? Maybe Windows is playing games with PAE to allow a larger VM space once the working set size exceeds 2GB.
Yes, I get much faster times for matrices that fit in 2GB. I had a
matrix of 7.1M rows of about the same density that fit in 2GB. It took
only 14 days. Theory would predict that it should take (7.1/7.9)^2 ~ 81% as long as the 7.9M matrix.
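
As a rough cross-check of the quadratic scaling argument, the numbers from the thread can be plugged into a short Python sketch. The helper name `scaled_runtime` is hypothetical. The observed figure is only a lower bound, reading "more than 4 times as long" against the 186-hour ETA quoted earlier in the thread.

```python
def scaled_runtime(days, cols_from, cols_to):
    """Scale a runtime quadratically in the matrix dimension (equal density assumed)."""
    return days * (cols_to / cols_from) ** 2


ratio = (7.1 / 7.9) ** 2                              # fraction quoted in the post
predicted_days = scaled_runtime(14.0, 7.1e6, 7.9e6)   # 7.9M matrix, if nothing else changed
observed_lower_bound_days = 4 * 186 / 24              # "more than 4 times" a 186-hour ETA

print(f"(7.1/7.9)^2 = {ratio:.3f}")
print(f"predicted for 7.9M columns: {predicted_days:.1f} days")
print(f"observed: more than {observed_lower_bound_days:.0f} days")
```

Under these assumptions the larger matrix would be expected to take a little over 17 days, against more than 31 days implied by the earlier posts, a gap that matrix size alone does not account for.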

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.