mersenneforum.org  

Old 2012-04-25, 01:50   #45
Mathew
 
 
Nov 2009

2×5²×7 Posts

jyb made a lot of modifications to the lasieve source for the Mac. I have been using his sieves for smaller C106-C110 numbers. On one core they are on par with the Windows binaries.
Old 2012-04-25, 06:41   #46
xilman
Bamboozled!
 
 
"𒉺𒌌𒇷𒆷𒀭"
May 2003
Down not across

10,753 Posts

Quote:
Originally Posted by Mathew View Post
jyb made a lot of modifications to the lasieve source for the Mac. I have been using his sieves for smaller C106-C110 numbers. On one core they are on par with the Windows binaries.
Good news. I had immense difficulties even getting the sources to build and gave up after a while. Perhaps I should have another go.

The failure to reallocate error re-appeared. I'll have another go at that today.
Old 2012-04-25, 09:42   #47
xilman
Bamboozled!
 
 
"𒉺𒌌𒇷𒆷𒀭"
May 2003
Down not across

10,753 Posts

Quote:
Originally Posted by xilman View Post
Yup, /etc/login.conf had maximum memory set at 512M. Reset it to 32G and trying again.
Now works

Linear algebra performance sucks on this box because all 8 threads were allocated to a single CPU. Tom could doubtless have told me that and, quite possibly, how to get around it. However, what I actually mean to do is install OpenMPI and then rebuild. I may as well educate myself in the process.

What is nice is that the machine hasn't given me any hassle at all with lock-ups, sporadic reboots, video artefacts or any of the other nasties with which Linsux has been plaguing me for the last several months. I'm becoming ever more of the opinion that their development teams should spend time on reliability issues on hard-pushed hardware, at the cost of slowing down development of new whizzy features if need be.
Old 2012-04-25, 10:58   #48
fivemack
(loop (#_fork))
 
 
Feb 2006
Cambridge, England

7²×131 Posts

Quote:
Originally Posted by xilman View Post
Now works

Linear algebra performance sucks on this box because all 8 threads were allocated to a single CPU. Tom could doubtless have told me that and, quite possibly, how to get around it. However, what I actually mean to do is install OpenMPI and then rebuild. I may as well educate myself in the process.

What is nice is that the machine hasn't given me any hassle at all with lock-ups, sporadic reboots, video artefacts or any of the other nasties with which Linsux has been plaguing me for the last several months. I'm becoming ever more of the opinion that their development teams should spend time on reliability issues on hard-pushed hardware, at the cost of slowing down development of new whizzy features if need be.
Respectfully, I would see that as a reason to use an older Linux - I am still very happy with ubuntu-10.04 - rather than to go to an operating system sufficiently unsuited to the use case that it defaults to binding threads started by a process to the core on which the thread was started: that is certainly an OS issue rather than something in msieve.

I have been running 10.04 with a load average above 60 on a 48-CPU system for a year solid without a single lockup or sporadic reboot. Though I admit that I have a fairly strong server/desktop separation; I log into that machine only by ssh from a Mac, and I'm not sure it's ever entered graphics mode.
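For what it's worth, on a Linux box the placement fivemack describes can be overridden by hand with taskset. A minimal sketch, assuming an 8-core machine and msieve's documented -nc2 (linear algebra) and -t (thread count) switches; the command is echoed rather than executed, so the sketch is safe to run anywhere:

```shell
#!/bin/sh
# Sketch: spread msieve's linear-algebra threads across cores 0-7
# instead of letting them pile onto one CPU. The core list and
# '-t 8' are assumptions about this particular box.
CORES="0-7"
CMD="taskset -c $CORES ./msieve -nc2 -t 8"
# Echoed instead of executed, so nothing is required on this machine:
echo "$CMD"
```

When memory placement rather than CPU placement is the bottleneck on a NUMA box, `numactl --interleave=all` is the analogous knob.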
Old 2012-04-25, 17:30   #49
xilman
Bamboozled!
 
 
"𒉺𒌌𒇷𒆷𒀭"
May 2003
Down not across

10,753 Posts

Quote:
Originally Posted by fivemack View Post
Respectfully, I would see that as a reason to use an older Linux - I am still very happy with ubuntu-10.04 - rather than to go to an operating system sufficiently unsuited to the use case that it defaults to binding threads started by a process to the core on which the thread was started: that is certainly an OS issue rather than something in msieve.

I have been running 10.04 with a load average above 60 on a 48-CPU system for a year solid without a single lockup or sporadic reboot. Though I admit that I have a fairly strong server/desktop separation; I log into that machine only by ssh from a Mac, and I'm not sure it's ever entered graphics mode.
I agree with you in every respect, but for the observation that an older Linux tends to have all the old security holes.

As Miss Piggy might say: paranoid, moi?

Still, I'm having great fun with OpenBSD. The nostalgia alone is worth putting up with the omnalgia incurred by the relatively low profile of that OS. I spent today getting backuppc running on sparc64/OpenBSD. The SunBlade 2500 is rock-solid stable when running a decent server OS, despite being around 8 years old and, from a GMP point of view, better suited as a space heater or a boat anchor. Once I'm certain that backuppc works reliably, the backup service will be migrated from its currently flaky Linsux box.

Part of the x86-Linux flakiness comes from wanting to run CUDA code. The recent Nvidia drivers suck badly, according both to my experience and to others posting on the interweb thingy. Nonetheless, the move to Gnome3 and gnome-shell must take most of the blame.
Old 2012-04-25, 18:06   #50
Batalov
 
 
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

9477₁₀ Posts

I am quite comfortable with OpenSUSE (and SLES at work). Stable and all. There's Xfce and Lxde.

When the desktop is a final nuisance and I'd need all of the memory for a month, I'd Ctrl-F1, su, init 3, and then work from a black window (six or more actually: Ctrl-F1 through Ctrl-F6; plus there's always ssh into it).
Old 2012-04-25, 18:17   #51
Dubslow
Basketry That Evening!
 
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

3·29·83 Posts

I'm quite happy with Ubuntu 11.04 and have no plans at all to upgrade; I just wish I'd had the sense to install 10.04 when I first got the thing.
Old 2012-04-25, 20:13   #52
debrouxl
 
 
Sep 2009

977 Posts

Unity is hated by many power users, indeed. With MATE (continuing the Gnome 2 series) and Cinnamon (strongly tweaking Gnome 3 to resemble what people expect a desktop environment to be), Ubuntu's close derivative Mint has chosen a route that seems to please hundreds of thousands of people.

On the BSD side, PC-BSD is known to be quite a friendly OS.
Old 2012-04-28, 04:46   #53
LaurV
Romulan Interpreter
 
 
Jun 2011
Thailand

7×1,373 Posts

Got msieve 1.50, the GPU (CUDA) version.

First observation: on a GTX 580 the -np1 switch produces lines about 5 times faster than the "standard" (CPU) version. This is good.

Second observation: the occupancy of the GPU is quite low! This is bad! It increases with the size of the composite I want to factor, reaching about 95% for the C204 from the Bernoulli thread under discussion nowadays, but I don't dare attack something like that with my modest knowledge. For a C120-C130, which I am currently factoring, the GPU is busy just a bit over 20%, and I would need 5 copies of msieve running. Is this the solution? Running more factorizations at the same time? What if I only have one composite (like the guy who wants to crack the Battle.net encryption, in the neighbouring thread)? Can I pass msieve some switches to make it use more GPU resources (like more threads, etc., but not the -t switch, which affects the CPU threads)? Note that the CPU is not used at all. So where is the limitation?

Third observation: only the -np1 phase uses the GPU. This is also bad, terribly bad. If I launch "msieve -np", then the GPU works for a chunk of time, like a second or so, then it is idle for another chunk of time, like 3-6 seconds, and it all repeats "da capo al fine". During the "GPU does nothing" period, the screen is flooded with a "too many lines" message (warning, error, whatever). Is this the intended behavior? My guess is that it does the "-np1 part" of the -np switch on the GPU and then the "-np2" and poly-combining part on the CPU. This is quite a waste of resources, or I am using it in a totally wrong way.

Fourth observation: it is not multi-GPU aware. If I have a system with 3 GPUs, and I am factoring in the range of C120-C130, then I have to create 15 copies of msieve (5 for each GPU to max it out, times 3 GPUs) and I have to FOLLOW and KEEP TRACK of all this hellish work. Very inconvenient for an extremely lazy guy like me. And that assumes I have so many numbers to factor at the same time, which most of the time I don't.

Questions: what is the best way to use GPU msieve on a multi-GPU system in each of the following situations?
(1) when I have to factor one small number (120+ digits);
(2) when I have to factor many small numbers (120+ digits);
(3) if I dare to factor a BIG number (like a C165++).

[I did a C165 with yafu/msieve (CPU). (By the way, side question: is it possible to use msieve.gpu with yafu? And if yes, how?)]

Obviously the solution for (2) is easy, as explained above: just launch all of them in parallel till the system blows flames from its nose. However, I would be interested in a "sequential" solution, as it would be easier to follow: I like to see the numbers crunched one by one. But this highly depends on solving (1), that is, having a method to make a single factorization fill (maximize the occupancy of) all the resources (all CPU cores and all GPUs) till the computer blows flames from its nose. If I can do that, then I could tackle (3) too, despite the fact that (1) and (3) are different stories.

Right now, for (3) there seems to be a solution too: running a separate -np1 phase on all GPUs, which maximizes the GPU work, then combining the results of all of them, eventually using the CPU.

But still no solution for (1), which is only using 7% of the resources (about 20% of a single GPU card if I use -np1, and this becomes intermittent, as explained above, if I use -np). What am I missing? This task could theoretically be done 100/7 ≈ 14 times faster, which would translate into about 43 times faster if I could use 3 GPUs combined. And compared with the same task on a single CPU core (which is 5 times slower), that would make the factoring process over 200 times faster.

Or did I get it totally wrong?
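The fourth observation's bookkeeping can be scripted rather than tracked by hand. A hedged sketch: generate one "msieve -np1" command per (GPU, copy) pair, using msieve's -g switch to pick the GPU and -s to give each copy its own save-file base name so the runs don't collide. The composite, file names, and counts are illustrative placeholders, and the commands are only written to a file for review, not launched:

```shell
#!/bin/sh
# Sketch: 3 GPUs at ~20% occupancy each get 5 copies, per the
# estimate above. All names below are illustrative.
NGPU=3
COPIES=5
NUMBER="123...789"   # stand-in for the real composite

rm -f cmds.txt
for g in $(seq 0 $((NGPU - 1))); do
    for c in $(seq 1 $COPIES); do
        # -g picks the GPU; -s keeps each copy's save files separate.
        printf './msieve -g %s -s poly_g%s_c%s -np1 %s\n' \
               "$g" "$g" "$c" "$NUMBER" >> cmds.txt
    done
done
# Review the generated commands before launching them:
cat cmds.txt
```

Each line of cmds.txt could then be launched in the background, which at least reduces the "hell of work" to one script.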

Last fiddled with by LaurV on 2012-04-28 at 04:53
Old 2012-04-28, 11:50   #54
jasonp
Tribal Bullet
 
 
Oct 2004

3,541 Posts

Quote:
Originally Posted by LaurV View Post
Third observation: only the -np1 phase uses the GPU. This is also bad, terribly bad. If I launch "msieve -np", then the GPU works for a chunk of time, like a second or so, then it is idle for another chunk of time, like 3-6 seconds, and it all repeats "da capo al fine". During the "GPU does nothing" period, the screen is flooded with a "too many lines" message (warning, error, whatever). Is this the intended behavior? My guess is that it does the "-np1 part" of the -np switch on the GPU and then the "-np2" and poly-combining part on the CPU. This is quite a waste of resources, or I am using it in a totally wrong way.
Yes, only -np1 uses GPUs. The polynomial selection is not multithreaded at all, and under those constraints a single copy of the library only accesses one GPU at a time (selected with the '-g' switch) and alternates between stage 1 and stage 2. A better architecture would have stage 2 in a separate CPU thread with a queue of results fed by different threads running stage 1 on separate GPUs, but alas that requires relatively serious development time that jrk and I really don't have.

We can probably adjust the tuning so that running stage 1 (only) can utilize more of a hot modern GPU. I use a 2009-era 9800GT for day-to-day work, and have a GTS 450 that will not run stably until my development system gets a larger power supply. jrk has a nicer Fermi card; if you'd like, you can coordinate with him. Stage 2 is enormously more complex than stage 1; porting it to use a GPU is out of the question. Your suggestion is correct: the current best way to run polynomial selection on a GPU is to produce lots of text files of stage 1 output, glue them together, and run stage 2 elsewhere.
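A minimal sketch of that glue-and-run-elsewhere workflow, assuming msieve's usual save-file convention (stage 1 appends hits to a ".m" file next to the save file whose base name -s sets). The per-GPU file names and their contents are stand-ins, and the stage-2 command is echoed rather than executed:

```shell
#!/bin/sh
# Sketch: collect stage-1 hit files from several GPU runs, glue
# them into one, then run stage 2 on the CPU elsewhere.
# Contents below are stand-ins for real stage-1 hits.
echo "stage1-hit-a" > poly_g0.dat.m
echo "stage1-hit-b" > poly_g1.dat.m
echo "stage1-hit-c" > poly_g2.dat.m

# Glue them under the base name stage 2 will look for:
cat poly_g0.dat.m poly_g1.dat.m poly_g2.dat.m > combined.dat.m

# Stage 2 would then be (echoed, not executed here):
STAGE2="./msieve -s combined.dat -np2"
echo "$STAGE2"
```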

Last fiddled with by jasonp on 2012-04-28 at 11:53
Old 2012-04-28, 16:18   #55
chris2be8
 
 
Sep 2009

81E₁₆ Posts

Two thoughts:

On UNIX variants such as Linux, try the trick I suggested in http://mersenneforum.org/showpost.ph...3&postcount=32 to overlap np1 on a GPU and np2 on a CPU. To run several copies, either run them in different directories or pass msieve parameters to change the name of msieve.dat.m.
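A short sketch of those two isolation options. The directory and file names are illustrative, and the msieve commands are echoed rather than executed:

```shell
#!/bin/sh
# Option 1: one working directory per copy, so each copy keeps its
# own msieve.dat.* files.
for c in 1 2 3; do
    mkdir -p run$c
    echo "(cd run$c && ../msieve -np1)"   # echoed, not run
done

# Option 2: one save-file base name per copy in a shared directory,
# via the -s switch.
for c in 1 2 3; do
    echo "./msieve -s copy$c.dat -np1"    # echoed, not run
done
```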

The multithreaded polynomial code I wrote for factMsieve.pl *should* work if you change the call to msieve to use a GPU for stage 1. Look for the msievePolyselect subroutine; note there are two places it calls msieve, one for single-threaded work and one for multiple threads.

Note I don't have a GPU so I can't test either idea.

Chris





Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.