mersenneforum.org > Factoring Projects > Msieve
Old 2010-03-13, 07:59   #1
wblipp ("William")

Msieve and multiprocessors?

Someone is considering running a large NFS processing job on a university supercomputer that has nodes of quad-core processors running CentOS, a Red Hat Linux derivative. He has been advised that msieve cannot distribute itself across multiple processors, so he cannot use more than one node and should not set the "-t" flag to more than 4.

Is that right? When I use programs on Windows SMP machines that have multiple physical processors with multiple cores per processor, the operating system appears to spread processing among physical processors without "the program distributing itself." I don't know much about CentOS, but I naively expect that "using two or three nodes" would be configured to look like SMP for the user, and "-t 8" or "-t 12" should work.

Am I just too naive?

William
Old 2010-03-13, 11:16   #2
jasonp

Others here have a lot more experience with highly parallel LA runs than I do, but the last time anyone tried large numbers of threads on a highly multi-core machine, it didn't work out too well. I suspect that the scaling of the LA currently depends first on the bisection bandwidth of the machine, and then on the number of bus wires that can be brought to bear.

With sparse matrix-vector multiplies, the ratio of computation to memory access is quite low, so scaling to lots of machines is going to be problematic.
Old 2010-03-13, 11:59   #3
ET_ ("Luigi")

On the other hand, if

1 - the cluster is configured like a Beowulf one, and
2 - it appears to the application program as a bunch of different CPUs, or
3 - the application program is MP aware,

the sieving phase should take advantage of the cluster configuration.

Luigi
Old 2010-03-13, 18:11   #4
jasonp

There are parallel processing libraries that attempt to make multiple computers connected by a 'slow' interconnect appear like a single giant computer; I'm thinking primarily of PVM. But PVM does best on serious big iron, where the abstraction of a single machine isn't so far from the reality of memory latencies to remote nodes.

MPI is better for this type of problem; it forces the programmer to account for where the data begins and ends, and for how it gets there. In the case of NFS linear algebra there's no benefit to be gained from dynamically changing where data resides; it's a regular series of operations whose data placement can be statically determined.

Someday I'll be able to spend the time to make it happen.
Old 2010-03-13, 18:49   #5
frmky

I've got a hacked-up MPI version of gnfs-lasieve that I've occasionally used. It's really the minimum set of changes necessary, and it's quite non-optimal. It simply divides the range of q up evenly among the cpus and sets them to work. This means that some cpus finish earlier than others and sit idle waiting for the rest to finish, even on homogeneous clusters. Somewhat better would be to dispatch an equal number of special-q's to each CPU, but this too only works on homogeneous clusters. Best would be to hand out small ranges for each computer to process, giving out new ones as each finishes. I'll hunt down and post this file later today.
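
As a rough sketch of that even split, in shell (the q-range, cpu count, and the commented-out siever invocation are illustrative placeholders only, not taken from the actual diff):

```shell
# Illustrative only: divide a special-q range [Q0, Q1) evenly among NODES cpus.
Q0=10000000          # hypothetical first special-q
Q1=10400000          # hypothetical end of range (exclusive)
NODES=4
STEP=$(( (Q1 - Q0) / NODES ))
i=0
while [ "$i" -lt "$NODES" ]; do
    lo=$(( Q0 + i * STEP ))
    hi=$(( lo + STEP ))
    # Each cpu would sieve its own subrange, something like:
    #   gnfs-lasieve4I14e -f $lo -c $STEP -o rels.$i job.poly
    echo "cpu $i: q in [$lo, $hi)"
    i=$(( i + 1 ))
done
```

Equal-width ranges don't contain equal numbers of prime special-q's, and the work per q varies anyway, which is why some cpus finish first and sit idle.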

If you can rsh or ssh into the compute nodes from the head node without a password, then it's perhaps easier to modify the for loop in the script to launch the standard gnfs-lasieve binary on each remote host over ssh. Just have it run
ssh remotehost <standard command>
where remotehost changes appropriately on each loop iteration. You'll need to modify the output file for each, then recombine them in the next step. This definitely only works for homogeneous clusters, and it too suffers from the fact that some nodes finish earlier than others, but it's what I most often use.
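
A dry-run sketch of that loop (the host names and the siever command line are placeholders, and the real launch is left commented out so nothing actually runs):

```shell
# Dry-run sketch: launch the standard siever on each compute node over ssh,
# giving each node its own output file so they don't collide.
for host in node1 node2 node3; do
    cmd="gnfs-lasieve4I14e -o rels.$host job.poly"   # placeholder command line
    # ssh "$host" "$cmd" &                           # the real, backgrounded launch
    echo "would run on $host: $cmd"
done
# wait                      # let all the nodes finish...
# cat rels.* > rels.all     # ...then recombine the per-node output files
```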
Old 2010-03-13, 19:22   #6
xilman

Quote:
Originally Posted by frmky
I've got a hacked up MPI version of gnfs-lasieve that I've occasionally used. It's really the minimum set of changes necessary, and it's quite non-optimal...
Many years ago I wrote an almost trivial client/server harness for NFS sievers. You can tell how old it is from its name --- cabald/cabalc --- and it originally ran line sievers. Its most recent application was for sieving RSA-768. I find that it's pretty close to optimal, from a compute-time perspective if not from its intrinsic elegance.

Is anyone interested in it? The source really is almost trivial. Anyone who wants to improve it is welcome to try.

Parallelizing the sieving is very very easy, as is finding polynomials and extracting square roots. It's the linear algebra stages that are distinctly non-trivial to parallelize efficiently.

Paul
Old 2010-03-14, 08:03   #7
frmky

Attached is the diff for adding MPI to the lattice sieve. It's against a really old version of the source, so it probably won't apply cleanly, but there are few enough changes, and enough context, to track them down and apply them manually if necessary. Don't use the -k switch with this, as I didn't modify the name of the fb file (I didn't bother at the time, since I don't use the -k switch), and the processes will end up stomping on each other.
Attached Files
File Type: txt mpidiff.txt (1.5 KB, 104 views)
Old 2010-03-14, 08:50   #8
wblipp

The OP's job has already been sieved. Only the post-processing is being planned for the university's supercomputer. Must he stick to a single quad-core processor? Should he?
Old 2010-03-14, 12:59   #9
10metreh

Quote:
Originally Posted by wblipp
The OP job has already been sieved.
If OP = odd perfect, is this sigma(2801^78)?
Old 2010-03-14, 13:15   #10
debrouxl

wblipp: if the matrix is larger than, say, 10Mx10M, and you want the result ASAP, you'll probably want to run the linear algebra on a 6-core or 8-core computer instead.
Old 2010-03-14, 13:27   #11
jasonp

There isn't any harm in trying msieve with lots of threads; the on-disk storage format is independent of how the matrix is partitioned in a multithreaded run, so you can restart with different values of -t and not lose any work while figuring out the best number of threads. In a batch environment, you can add '-d <num>' so that the binary halts itself after a preset time.
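
In a batch script, that probing might look something like this (the binary path and the -nc2 linear-algebra resume flag are used per the description above; the real invocations are commented out so nothing actually runs):

```shell
# Dry-run sketch: retry the linear algebra with different thread counts,
# each attempt capped by a deadline. Since the checkpoint format is the
# same regardless of -t, no work is lost between attempts.
for t in 4 8 12; do
    # ./msieve -nc2 -t $t -d 60 -v    # real run: LA only, halt after the -d deadline
    echo "trial: -t $t with a preset deadline"
done
```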

Last fiddled with by jasonp on 2010-03-14 at 13:28