#45

Nov 2009
2×5²×7 Posts
jyb made a lot of modifications to the lasieve source for Mac. I have been using his sieves for smaller C106-C110 numbers. On one core they are on par with the Windows binaries.

#46

Bamboozled!
May 2003
Down not across
10,753 Posts
Quote:
The failure to reallocate error re-appeared. I'll have another go at that today.

#47

Bamboozled!
May 2003
Down not across
10,753 Posts
Linear algebra performance sucks on this box because all 8 threads were allocated to a single CPU. Tom could doubtless have told me that and, quite possibly, how to get around it. However, what I actually mean to do is install OpenMPI and then rebuild. May as well educate myself in the process.

What is nice is that the machine hasn't given me any hassle at all with lock-ups, sporadic reboots, video artefacts or any of the other nasties with which Linsux has been plaguing me for the last several months. I'm becoming ever more of the opinion that their development teams should spend time on reliability issues on hard-pushed hardware, at the cost of slowing down development of new whizzy features if need be.
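While waiting for an OpenMPI rebuild, the all-threads-on-one-CPU problem can often be worked around from outside msieve with `numactl`. A minimal sketch, assuming a NUMA box with the stock `numactl` tool and a local msieve binary; the path, node number and thread count are placeholders, and the function only prints the command line so the pinning can be inspected before running it:

```shell
#!/bin/sh
# Sketch: pin msieve's linear algebra (-nc2) to one NUMA node so all of
# its threads share that node's cores and memory controller.
# MSIEVE, the node number and the thread count are placeholders.
MSIEVE=${MSIEVE:-./msieve}

# Build and print the pinned command line (pipe to sh to actually run it).
la_pinned_cmd() {
    node=$1
    threads=$2
    printf 'numactl --cpunodebind=%s --membind=%s %s -nc2 -t %s\n' \
        "$node" "$node" "$MSIEVE" "$threads"
}

la_pinned_cmd 0 8
```

Binding both CPUs and memory to one node keeps all the LA threads on cores that share a memory controller, which is usually what matters for the matrix step.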

#48

(loop (#_fork))
Feb 2006
Cambridge, England
7²×131 Posts
I have been running 10.04 with a load average above 60 on a 48-CPU system for a year solid without a single lockup or sporadic reboot. Though I admit that I have a fairly strong server/desktop separation; I log into that machine only by ssh from a Mac, and I'm not sure it's ever entered graphics mode.

#49

Bamboozled!
May 2003
Down not across
10,753 Posts
As Miss Piggy might say: paranoid, moi? Still, I'm having great fun with OpenBSD. The nostalgia alone is worth putting up with the omnalgia incurred from the relatively low profile of that OS.

I spent today getting backuppc running on sparc64/OpenBSD. The SunBlade 2500 is rock-solid stable when running a decent server OS, despite being around 8 years old and, from a GMP point of view, better suited as a space heater or a boat anchor. Once I'm certain that backuppc works reliably, the backup service will be migrated from its currently flaky Linsux box.

Part of the x86-Linux flakiness comes from wanting to run CUDA code. The recent Nvidia drivers suck badly, according both to my experience and to others posting on the interweb thingy. Nonetheless, the move to Gnome3 and gnome-shell must take most of the blame.

#50

"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
9477₁₀ Posts
I am quite comfortable with OpenSUSE (and SLES at work). Stable and all. There's Xfce and Lxde.

When the desktop becomes a nuisance and I need all of the memory for a month, I Ctrl-Alt-F1, su, init 3, and then work from a black window (six of them actually, Ctrl-Alt-F1 through Ctrl-Alt-F6; plus there's always ssh into it).

#51

Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
3·29·83 Posts
I'm quite happy with Ubuntu 11.04 and have no plans at all to upgrade; I just wish I'd had the sense to install 10.04 when I first got the thing.

#52

Sep 2009
977 Posts
Unity is hated by many power users, indeed. With MATE (continuing the Gnome 2 series) and Cinnamon (heavily tweaking Gnome 3 to resemble what people expect of a desktop environment), Ubuntu's close derivative Mint has chosen a route that seems to please hundreds of thousands of people.

On the BSD side, PC-BSD is known to be quite a friendly OS.

#53

Romulan Interpreter
Jun 2011
Thailand
7×1,373 Posts
Got msieve 1.50, the GPU (CUDA) version.

First observation: on a GTX 580 the -np1 switch produces lines about 5 times faster than the "standard" (CPU) version. This is good.

Second observation: the occupancy of the GPU is quite low. This is bad. It increases with the size of the composite I want to factor, reaching about 95% for the C204 from the Bernoulli thread under discussion nowadays, but I don't dare to attack something like that with my modest knowledge. For a C120-C130, which I am currently factoring, the GPU is busy just a bit over 20%, and I would need 5 copies of msieve running. Is this the solution, running more factorizations at the same time? What if I only have one composite (like the guy who wants to crack the Battle.net encryption, in the neighborhood thread)? Can I give msieve some switches to use more GPU resources (like more threads, etc., but not the -t switch, which affects the CPU threads)? Note that the CPU is not used at all. So where is the limitation?

Third observation: only the -np1 phase uses the GPU. This is also bad, terribly bad. If I launch "msieve -np", then the GPU works for a chunk of time, like a second or so, then it is idle for another chunk, like 3-6 seconds or so, and it all repeats "da capo al fine". During the "GPU does nothing" period, the screen is flooded with "too many lines" messages (warning, error, whatever). Is this the intended behavior? My guess is that it does the "-np1 part" of the -np switch on the GPU and then the "-np2" and poly-combining part on the CPU. This is quite a waste of resources, or I am using it in a totally wrong way.

Fourth observation: it is not multi-GPU aware. If I have a system with 3 GPUs and am factoring in the range of C120-C130, then I have to create 15 copies of msieve (5 for each GPU to max it out, times 3 GPUs) and I have to follow and keep track of all this hell of work. Very inconvenient for an extremely lazy guy like me. And that assumes I have so many numbers to factor at the same time, which most of the time I don't.

Questions: what is the best way to use msieve's GPU code on a multi-GPU system in each of the following situations? (1) When I have to factor one small number (120+ digits). (2) When I have to factor many small numbers (120+ digits). (3) If I dare to factor a BIG number (like C165++). [I did a C165 with yafu/msieve (CPU). By the way, side question: is it possible to use the GPU msieve with yafu, and if yes, how?]

Obviously the solution for (2) is easy, explained above: just launch all of them in parallel till the system blows flames from its nose. However, I would be interested in a "sequential" solution, as it would be easy to follow: I like to see the numbers crunched one by one. But this highly depends on solving (1), that is, having a method to make a single factorization fill (maximize occupancy of) all the resources, all CPU cores and all GPUs. If I can do that, then I could tackle (3) too, despite the fact that (1) and (3) are different stories. Right now, for (3) there seems to be a solution too, like running a separate -np1 phase on all GPUs, which maximizes the GPU work, then combining the results of all of them, eventually using the CPU. But there is still no solution for (1), which is only using 7% of the resources (about 20% of a single GPU card if I use -np1, and this becomes intermittent, as explained above, if I use -np). What am I missing? This task could theoretically be done 100/7 ≈ 15 times faster, which would translate into 45 times faster if I could use 3 GPUs combined. And compared with the same task on a single CPU core (which is 5 times slower), this would translate into a 225 times faster factoring process. Or did I get it totally wrong?

Last fiddled with by LaurV on 2012-04-28 at 04:53
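For what it's worth, the "15 copies to follow" part can at least be scripted. A hedged sketch of that setup: one working directory per copy so the msieve.dat.* files don't clash, and one GPU id per card. The msieve path and the -g GPU-selection switch are assumptions to verify against your build's usage text; the function only prints the launch lines instead of executing them:

```shell
#!/bin/sh
# Sketch: fan out several -np1 workers per GPU, each in its own
# directory so their msieve.dat.* files don't collide.
# MSIEVE and the -g switch are assumptions; verify with "msieve -h".
# Each copy would still need its own -np1 coefficient range (omitted
# here) so the workers don't duplicate effort.
MSIEVE=${MSIEVE:-/usr/local/bin/msieve}

launch_np1_farm() {
    gpus=$1        # number of cards
    per_gpu=$2     # copies per card
    g=0
    while [ "$g" -lt "$gpus" ]; do
        c=0
        while [ "$c" -lt "$per_gpu" ]; do
            dir="poly_g${g}_c${c}"
            # print instead of exec, so the sketch has no side effects
            printf 'mkdir -p %s && cd %s && %s -g %s -np1 &\n' \
                "$dir" "$dir" "$MSIEVE" "$g"
            c=$((c + 1))
        done
        g=$((g + 1))
    done
}

launch_np1_farm 3 5    # 3 GPUs x 5 copies = 15 launch lines
```

Piping the output to sh (after filling in real ranges) would start the whole farm; a plain `wait` afterwards collects it.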

#54

Tribal Bullet
Oct 2004
3,541 Posts
We can probably adjust the tuning so that running stage 1 (only) can utilize more of a hot modern GPU. I use a 2009-era 9800GT for day-to-day work, and have a GTS450 that will not run stably until my development system gets a larger power supply. jrk has a nicer Fermi card; if you'd like, you can coordinate with him.

Stage 2 is enormously more complex than stage 1; porting it to use a GPU is out of the question. Your suggestion is correct: the current best way to run polynomial selection on a GPU is to produce lots of text files of stage 1 output, glue them together and run stage 2 elsewhere.

Last fiddled with by jasonp on 2012-04-28 at 11:53
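Since stage 1 writes its hits as plain text lines, the glue step is just concatenation. A small sketch, assuming the usual msieve.dat.m naming and one hit file per GPU run; the directory and file names are hypothetical:

```shell
#!/bin/sh
# Sketch: gather per-GPU stage-1 hit files into one msieve.dat.m,
# ready for a CPU-side "msieve -np2" run. File names are hypothetical;
# only the msieve.dat.m convention comes from msieve itself.
glue_stage1() {
    outfile=$1
    shift                  # remaining args: the per-GPU .m files
    cat "$@" > "$outfile"
    wc -l < "$outfile"     # report the number of gathered hits
}

# Usage (not executed here):
#   glue_stage1 msieve.dat.m gpu0/msieve.dat.m gpu1/msieve.dat.m
#   ./msieve -np2 -v       # stage 2 then reads msieve.dat.m
```

Because stage 2 is pure CPU work, the gathered file can be shipped to a different box entirely, exactly as described above.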

#55

Sep 2009
81E₁₆ Posts
Two thoughts:

On UNIX variants such as Linux, try the trick I suggested in http://mersenneforum.org/showpost.ph...3&postcount=32 to overlap np1 on a GPU and np2 on a CPU. To run several copies, either run them in different directories or pass msieve parameters to change the name of msieve.dat.m.

The multithreaded polynomial code I wrote for factMsieve.pl *should* work if you change the call to msieve to use a GPU for stage 1. Look for the msievePolyselect subroutine; note there are two places it calls msieve, one for single-threaded work and one for multiple threads.

Note I don't have a GPU so I can't test either idea.

Chris
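The overlap trick can be sketched as a two-stage pipeline: while the GPU runs np1 on batch N+1 in the background, the CPU runs np2 on batch N. NP1_CMD and NP2_CMD below are stand-ins for the real msieve invocations (with per-batch save-file names so the msieve.dat files don't collide); the defaults are placeholder paths, not tested commands:

```shell
#!/bin/sh
# Sketch of overlapping np1 (GPU) with np2 (CPU). NP1_CMD/NP2_CMD are
# stand-ins; substitute real msieve calls with per-batch save files.
NP1_CMD=${NP1_CMD:-"./msieve -np1 -s"}
NP2_CMD=${NP2_CMD:-"./msieve -np2 -s"}

overlap_batches() {
    batches=$1
    i=1
    while [ "$i" -le "$batches" ]; do
        $NP1_CMD "batch$((i + 1))" &   # GPU: stage 1 for the NEXT batch
        np1_pid=$!
        $NP2_CMD "batch$i"             # CPU: stage 2 for THIS batch
        wait "$np1_pid"                # sync before moving on
        i=$((i + 1))
    done
}

# Usage (not executed here): overlap_batches 10
```

With real commands, the batch boundaries would be the coefficient ranges fed to np1; the same loop generalizes to one pipeline per GPU.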