mersenneforum.org > Factoring Projects > Msieve
Old 2010-03-14, 14:45   #12
wblipp
"William"
May 2003
New Haven
2·7·13² Posts

Quote:
Originally Posted by 10metreh
If OP = odd perfect, is this sigma(2801^78)?
OP = Originating Post. This is one of the RSALS numbers.
Old 2010-03-14, 19:17   #13
frmky
Jul 2003
So Cal
2×3⁴×13 Posts

Here are the results of measurements I did a short while ago. This was done on an 8-CPU 32-core Opteron system using shared memory. Although the scaling for CADO BW was slightly better than msieve, note that the BW algorithm is much less efficient than BL, so the overall run would take nearly twice as long at each point as msieve's BL.
Attached: LA_Scaling.GIF (linear algebra scaling graph, 8.4 KB)
Old 2010-03-15, 08:40   #14
xilman
Bamboozled!
"π’‰Ίπ’ŒŒπ’‡·π’†·π’€­"
May 2003
Down not across
2A01₁₆ Posts

Quote:
Originally Posted by frmky
Here are the results of measurements I did a short while ago. This was done on an 8-CPU 32-core Opteron system using shared memory. Although the scaling for CADO BW was slightly better than msieve, note that the BW algorithm is much less efficient than BL, so the overall run would take nearly twice as long at each point as msieve's BL.
That's an interesting graph. Perhaps I've misunderstood it, but it seems to imply that once more than a few threads are in use, adding more makes the algorithm slower.

My experience with an MPI implementation of BL running on a cluster with up to 32 processes was that performance scaled roughly as the square root of the number of processes. The machine architecture was different, and maybe that's relevant. The MSR cluster had sixteen dual-CPU machines; each CPU had only one core and no hyperthreading.

Paul
Old 2010-03-15, 11:21   #15
Raman
Noodles
"Mr. Tuch"
Dec 2007
Chennai, India
2351₈ Posts

If the msieve linear algebra can work efficiently using many computers in parallel over a closely coupled network connection, then the immediate next target should be M1061.
I would be glad to contribute some sieving for this number...

I have a pending CUDA exercise that I have to submit next week;
next week is the deadline. I am actually learning CUDA programming
right now, but the instructors are not teaching anything at all; we have to learn
everything on our own. NVIDIA CUDA... not many references available on the web
at all... This exercise is on efficient parallel matrix multiplication.

Versions of msieve with GPU (Graphics Processing Unit) support and MPI libraries
for parallel execution are now being released...
Good luck!

Last fiddled with by Raman on 2010-03-15 at 11:22
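The exercise mentioned above is in CUDA, but the decomposition it teaches, one parallel worker per output row, can be sketched in plain Python with the standard library. The names below are illustrative assumptions, not code from the thread or from any of the tools discussed in it:

```python
from multiprocessing import Pool

def row_times_matrix(args):
    # Compute one output row: dot the row of A with each column of B.
    row, B = args
    return [sum(a * b for a, b in zip(row, col)) for col in zip(*B)]

def parallel_matmul(A, B, workers=4):
    # Distribute the rows of A across worker processes,
    # roughly analogous to assigning CUDA threads to output tiles.
    with Pool(workers) as pool:
        return pool.map(row_times_matrix, [(row, B) for row in A])
```

Each worker handles whole rows, which is the same data decomposition a CUDA kernel would apply at a much finer grain (one thread per output element or tile).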
Old 2010-03-15, 12:28   #16
jasonp
Tribal Bullet
Oct 2004
3,541 Posts

Quote:
Originally Posted by Raman
NVIDIA CUDA... not many references available on the web at all... This exercise is on efficient parallel matrix multiplication.
Nvidia have excellent documentation available for CUDA, and forums.nvidia.com has a pretty good user community; some of its members are quite advanced.

Paul, I suspect that when the computations are all on one machine, the slowdown with many threads reflects the caches getting saturated. The caches on your cluster were probably much larger in aggregate. Also, it was never faster for msieve to find more than 64 dependencies, whereas IIRC the CWI parallel BL code preferred 128 dependencies to 64.
Old 2010-03-15, 16:50   #17
frmky
Jul 2003
So Cal
2106₁₀ Posts

Paul, that's exactly what happens. Beyond 8 threads CADO BW gets no faster, and msieve actually slows down.

I've also informally tried CADO BW on a Beowulf cluster of quad-core Core 2s connected by GigE, and I saw similar behavior. But I seem to recall that in that case speed got significantly worse once I moved from 4 threads to 5 because of communication speed (4 threads use shared memory on one node; 5 threads require communication over GigE), and it never recovered. That cluster is due a memory upgrade soon, so I'll try it again with a bigger matrix when that happens.
Old 2010-03-15, 20:08   #18
Raman
Noodles
"Mr. Tuch"
Dec 2007
Chennai, India
2351₈ Posts

Quote:
Originally Posted by jasonp
Nvidia have excellent documentation available for CUDA, and forums.nvidia.com has a pretty nice user community; some of them are quite advanced.
I already have the documentation, but it cannot all be read.
It is hundreds of pages, too much to read entirely for a small example project.
It is quite rigorous, mostly containing unnecessary text rather than demonstrations with examples.

There are plenty of languages; could I learn them all? I can only learn those that are very popular and more frequently used.
I am quite good at programming in C, C++, Java, and Visual Basic. But can I say "good"? No, in the sense that I can't write programs as efficient as msieve, GGNFS, GMP-ECM, YAFU, etc. Of course, for those, understanding the algorithm is almost as important as the optimizations, since the algorithms get better every day, right? And I cannot even understand the dense code of such projects. Where should I actually start?

There is no limit to learning things in this world. I am quite sure that even the top project developers do not know everything in this world!
Old 2010-03-15, 21:03   #19
xilman
Bamboozled!
"π’‰Ίπ’ŒŒπ’‡·π’†·π’€­"
May 2003
Down not across
10,753 Posts

Quote:
Originally Posted by Raman
There is no limit to learning things in this world. I am quite sure that even the top project developers do not know everything in this world!
You are completely correct.

No-one has been able to understand everything of importance for several hundred years. All you can do is concentrate on the things you think are both important and interesting.

Paul
Old 2010-03-19, 14:09   #20
bdodson
Jun 2005
lehigh.edu
2¹⁰ Posts

Quote:
Originally Posted by wblipp
Someone is considering running a large NFS processing job on a university supercomputer that has nodes of quad-core processors running CentOS, a Red Hat Linux derivative. He has been advised that msieve cannot distribute itself across multiple processors, so he cannot use more than one node and should not set the "-t" flag to more than 4.
...
and
Quote:
Originally Posted by wblipp
...
Must he stick to a single quad-core processor? Should he?
We've been using CentOS here for some time (previously Red Hat),
and I'm just finishing up a 3-day 7.15M² matrix on our fast Xeon SMP,
two dual quad-cores (SunFire X2270; "2x 2.93GHz Intel Xeon X5570 (Nehalem)";
"Connected using 1000Mbps Intel PRO/1000"). I stick to 4 threads (-t 4),
and top shows that the job has accumulated
Code:
2568m 2.5g 1184 R 401.0  5.2  12862:10 msieveH
a total of 12862 cpu minutes, and is getting 400% of the processor.
More precisely, "top -H" shows threads, and
Code:
2568m 2.5g 1184 R 101.7  5.2   2892:31 msieveH
2568m 2.5g 1184 R  99.8  5.2   4012:01 msieveH
2568m 2.5g 1184 R  99.8  5.2   2975:16 msieveH
2568m 2.5g 1184 R  99.8  5.2   2976:37 msieveH
4012 cpu minutes went to the main thread (17489), a bit under
three cpu-days so far, with the three other threads (17567, 17568, 17570)
each having contributed somewhat more than two cpu-days (at 1440 cpu
minutes/day).

Actually, this is a fairly stunning result, with all four threads at c. 100%.
I've gotten 3 + 3×2 = 9 cpu-days from 3 days on the four cores.
(Ah, it started at 12:20 noon and was read at 9:35, so slightly less than 3 days.)

As the other processors are sieving, even perfect scaling to 8 threads,
one main and seven side threads, wouldn't be worthwhile (I'm not in any
particular hurry); but my recollection is that the side threads also
tail off, and the total cpu time goes up. Four are worthwhile, eight
not so much. Just a liberal-arts point of view, from a math person. -bd
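As a cross-check on the bookkeeping above, here is a small sketch (not from the thread) that converts top's TIME+ columns, given as cpu minutes:seconds, into cpu-days:

```python
def cpu_days(times):
    # times: list of top TIME+ strings, "MMMM:SS" (cpu minutes:seconds).
    minutes = sum(int(m) + int(s) / 60.0
                  for m, s in (t.split(":") for t in times))
    return minutes / 1440.0  # 1440 cpu-minutes per cpu-day

# The four threads reported by `top -H` above:
total = cpu_days(["2892:31", "4012:01", "2975:16", "2976:37"])
```

This comes to just under 9 cpu-days, consistent with the 3 + 3×2 = 9 estimate from three wall-clock days on four busy cores.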
Old 2010-04-20, 08:02   #21
Andi47
Oct 2004
Austria
2·17·73 Posts

Quote:
Originally Posted by frmky
Here are the results of measurements I did a short while ago. This was done on an 8-CPU 32-core Opteron system using shared memory. Although the scaling for CADO BW was slightly better than msieve, note that the BW algorithm is much less efficient than BL, so the overall run would take nearly twice as long at each point as msieve's BL.
Has anyone run similar benchmarks on a (hyperthreaded) i7 (i.e. 4 cores plus 4 hyperthreads)? Is it better to run -t 4, -t 6, or -t 8 when I want to minimize wall-clock time?

Off-topic question: Is there any Windows equivalent to the Linux command line "wc -l *.out" for counting relations in a bunch of relation files?
Old 2010-04-20, 09:44   #22
fivemack
(loop (#_fork))
Feb 2006
Cambridge, England
7²×131 Posts

You might want to look at the thread around

http://www.mersenneforum.org/showpos...2&postcount=14

and at http://www.mersenneforum.org/showthread.php?t=12861