pinhodecarlos: sparse linear algebra is memory-constrained, and the performance of running two jobs at once is going to be critically dependent on details of memory placement ... I don't know whether the -t4 job is going to allocate on both processors and pull stuff across the (very fast) inter-processor interconnect.
I haven't got the intuition to say anything about this without running experiments, and I haven't got a dual-Xeon to experiment with any more.
[QUOTE=fivemack;308128]pinhodecarlos: sparse linear algebra is memory-constrained, and the performance of running two jobs at once is going to be critically dependent on details of memory placement ... I don't know whether the -t4 job is going to allocate on both processors and pull stuff across the (very fast) inter-processor interconnect.
I haven't got the intuition to say anything about this without running experiments, and I haven't got a dual-Xeon to experiment with any more.[/QUOTE] Dmitry can experiment for us.
[QUOTE=fivemack;308128]I haven't got the intuition to say anything about this without running experiments, and I haven't got a dual-Xeon to experiment with any more.[/QUOTE]That need not be a problem. I have two such and we may be able to come to a mutually beneficial arrangement.
You wouldn't be the first MersenneForum member to have ssh access to one or more of my machines for software development.

Paul
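For anyone wanting to run the experiment: on Linux, numactl can pin each msieve job to one socket and its local memory, so two single-socket jobs never pull data across the inter-processor interconnect. A minimal sketch, assuming a two-socket box; `build_cmd` and the `./msieve` path are hypothetical, while `--cpunodebind`/`--membind` are real numactl flags and `-nc2`/`-t` are msieve's linear-algebra and thread-count options:

```python
# Sketch: build numactl command lines that pin one msieve LA job per NUMA node.
# Assumes a 2-socket machine where nodes 0 and 1 each have local RAM.
# build_cmd() is a hypothetical helper; adjust the msieve path/flags to taste.

def build_cmd(node, threads, msieve="./msieve"):
    """Command line pinning both CPUs and memory to one NUMA node."""
    return [
        "numactl",
        "--cpunodebind=%d" % node,   # run only on this socket's cores
        "--membind=%d" % node,       # allocate only from this socket's RAM
        msieve, "-nc2",              # -nc2 = linear algebra phase
        "-t", str(threads),
    ]

if __name__ == "__main__":
    for node in (0, 1):
        print(" ".join(build_cmd(node, threads=4)))
```

Running one such command per job directory and comparing iteration timings against a single -t8 run spanning both sockets would answer the memory-placement question directly.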
Lionel and post-processing helpers,
Be aware that there will be a call to arms for the RSA project among SETI.USA team members. [quote=Fire$torm] This is a friendly admin notice. I am sending out the team mass email for the Sept. SIMAP challenge and included a call to arms for the RSA project. Hopefully we should see additional members on this project soon. [/quote] Fire$torm is the administrator of the SETI.USA forum. They will go for first place, and I suppose there will be a fight for it with team Sicituradastra. I hope there will be plenty of work available and enough post-processing helpers. Tomorrow I will clean up my computer so it can be another one helping with the post-processing phase.

Carlos

EDIT: I just saw the "RSALS shutting down at the end of August, please migrate to NFS@Home..." thread on the RSALS forum, so I think it would be better for someone to post a message on the SETI.USA forum telling them to migrate to NFS@Home.
Indeed, SETI.USA should flee RSALS, and attach their clients to NFS@Home instead.
At the time of this writing, the RSALS server's disk is effectively overcommitted: it has ~20e9 bytes free, but the results for the WUs in progress will take up more than this. And that's [i]after[/i] I removed the raw relation sets for 389_95_minus1 and 44371_43_minus1. I'm going to let RSALS starve, and fill in more numbers on the NFS@Home side.
You would want to unqueue C154_4788_5053 - it has already been factored by someone.
[QUOTE=jasonp;308027]Note that MPI msieve needs careful tuning when run on a large SMP machine, because it's easy for the OS to not correctly balance the load across all the cores, and easy for the OS to shuffle MPI processes around after they have allocated their memory. frmky has reported that you have to disable even cron processes, fivemack has [url="http://fivemack.livejournal.com/226160.html"]a post[/url] on what he had to do.[/QUOTE]
Memory on our cluster is per node: 64 GB/node, with two 8-core Opteron chips per node. As a practical matter, asking for four nodes seems to work best with our scheduler ("PBS/Torque/Maui" with playfair priority), and I'm using 64 cores on an 8x8 grid. So all of the cores are busy, and I don't see any change over time in the performance. What I do see is occasional restarts (with 12-22hr runs) that get an especially bad distribution of the 64 tasks.

Here's a good distribution
[code]Thu Aug 16 09:57:46 2012 initialized process (0,0) of 8 x 8 grid
Thu Aug 16 09:57:55 2012 matrix starts at (0, 0)
Thu Aug 16 09:57:55 2012 matrix is 4887089 x 4314690 (365.5 MB) with weight 133662597 (30.98/col)
Thu Aug 16 09:57:56 2012 sparse part has weight 52660698 (12.20/col)
...
Thu Aug 16 09:57:59 2012 matrix is 4887041 x 4314690 (318.5 MB) with weight 58899832 (13.65/col)
Thu Aug 16 09:57:59 2012 sparse part has weight 40345266 ( 9.35/col)
Thu Aug 16 09:57:59 2012 using block size 262144 for processor cache size 10240 kB
Thu Aug 16 09:58:01 2012 commencing Lanczos iteration
Thu Aug 16 09:58:01 2012 memory use: 494.4 MB
Thu Aug 16 09:58:16 2012 restarting at iteration 417488 (dim = 26400057)
Thu Aug 16 09:59:08 2012 linear algebra at 67.5%, ETA 120h52m
Thu Aug 16 09:59:24 2012 checkpointing every 110000 dimensions[/code]
for one of the 64 submatrices of our current
[code]Sat Jul 21 12:36:44 2012 matrix is 39095900 x 39096100 (11555.1 MB) with weight 3371381266 (86.23/col)
Sat Jul 21 12:36:44 2012 sparse part has weight 2638144630 (67.48/col)[/code]

So here are the timings at this week's restarts:
[code]mpi00:Wed Aug 8 08:29:51 2012 linear algebra at 44.7%, ETA 205h42m
mpi00:Wed Aug 8 21:11:02 2012 linear algebra at 47.8%, ETA 235h53m
mpi00:Fri Aug 10 09:07:08 2012 linear algebra at 53.0%, ETA 512h18m
mpi00:Fri Aug 10 12:41:58 2012 linear algebra at 53.2%, ETA 369h53m
---
mpi00:Sat Aug 11 13:29:53 2012 linear algebra at 54.6%, ETA 169h37m
mpi00:Mon Aug 13 09:41:02 2012 linear algebra at 57.1%, ETA 159h50m
mpi00:Mon Aug 13 20:07:42 2012 linear algebra at 59.4%, ETA 151h14m
---
mpi00:Wed Aug 15 06:07:29 2012 linear algebra at 65.0%, ETA 130h12m
mpi00:Thu Aug 16 06:08:58 2012 linear algebra at 67.2%, ETA 351h41m
mpi00:Thu Aug 16 09:59:08 2012 linear algebra at 67.5%, ETA 120h52m[/code]

The short ETAs are mostly from when the tasks were scheduled on nodes 14-17, while some of the worst were on nodes 18-21. But the three most recent are all on 14-17 (with node 17 as head node), so this sporadic bad loading doesn't seem to depend on the hardware (these are all ib nodes). I've taken to killing restarts with bad timings, and have so far gotten a good timing from the subsequent restart (the timings for new runs are 12hr-22hr, depending upon scheduling, so the progress in % isn't uniform).

This is our second large matrix (the other was 25M^2), with a binary from when our sysadmins were having binaries run with hydra-mpirun. They're saying I should switch to compiling with openmpi (not sure whether it is 1.6 ...), so I'd be interested to hear what we ought to be watching for with the new binary.

-Bruce (as in Batalov+Dodson)

PS - It is easy to see the difference between good/bad/terrible:
[code]Thu Aug 16 09:59:24 2012 checkpointing every 110000 dimensions (16 restarts)
Wed Aug 8 21:11:21 2012 checkpointing every 90000 dimensions
Sun Aug 5 09:19:55 2012 checkpointing every 80000 dimensions (1 restart, each)
Tue Jul 31 23:31:53 2012 checkpointing every 70000 dimensions (6 restarts)
Sun Jul 22 04:34:32 2012 checkpointing every 60000 dimensions
Fri Aug 10 12:42:30 2012 checkpointing every 50000 dimensions
Fri Aug 10 09:07:58 2012 checkpointing every 40000 dimensions (2 restarts, each)[/code]
These reflect msieve's estimate of the number of Lanczos iterations per hour; the good restarts are doing twice as many Lanczos iterations per hour as the terrible ones.
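For what it's worth, spotting a bad restart early can be automated by parsing the "linear algebra at X%, ETA XXhYYm" lines out of the msieve log. A small sketch, matching the log format shown above; the 300-hour kill threshold is an arbitrary assumption:

```python
import re

# Sketch: flag msieve LA restarts whose reported ETA looks pathological.
# Matches log lines like "linear algebra at 67.5%, ETA 120h52m".
ETA_RE = re.compile(r"linear algebra at ([\d.]+)%, ETA (\d+)h(\d+)m")

def parse_eta(line):
    """Return (percent_done, eta_hours) or None if the line doesn't match."""
    m = ETA_RE.search(line)
    if not m:
        return None
    pct, hours, mins = float(m.group(1)), int(m.group(2)), int(m.group(3))
    return pct, hours + mins / 60.0

def is_bad_restart(line, threshold_hours=300.0):
    """True if the reported ETA exceeds an (arbitrary) kill threshold."""
    parsed = parse_eta(line)
    return parsed is not None and parsed[1] > threshold_hours

if __name__ == "__main__":
    good = "mpi00:Thu Aug 16 09:59:08 2012 linear algebra at 67.5%, ETA 120h52m"
    bad = "mpi00:Thu Aug 16 06:08:58 2012 linear algebra at 67.2%, ETA 351h41m"
    print(is_bad_restart(good), is_bad_restart(bad))  # prints: False True
```

Piping `tail -f` of the log through something like this would let a cron-free watcher kill and resubmit a badly scheduled run automatically.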
1 Attachment(s)
Sieving a GNFS 171 task using 30-bit LPs (due to space constraints; otherwise, I'd obviously have used 31-bit LPs !) with the 14e siever is [i]officially[/i] a bad idea :smile:
The output of remdups4 -v is attached. Summary: [code]Found 91563793 unique, 48348274 duplicate (34.6% of total), and 155195 bad relations.[/code] Getting rid of all of those duplicates and recompressing the result with pbzip2 saves more than 4 GB of disk space...
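As a sanity check on that summary line: the 34.6% is duplicates as a fraction of unique + duplicate relations (bad relations excluded). Redoing the arithmetic from the remdups4 counts:

```python
# Recompute remdups4's duplicate percentage from the counts it reported.
unique = 91563793
duplicate = 48348274
bad = 155195

total_read = unique + duplicate + bad
dup_rate = 100.0 * duplicate / (unique + duplicate)

print("relations read: %d" % total_read)           # prints: relations read: 140067262
print("duplicate rate: %.1f%% of total" % dup_rate)  # prints: duplicate rate: 34.6% of total
```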
C171
One of the problems might be the low starting point for the special-q range. Though it may produce a better yield, one is apt to accumulate more duplicates in the lower ranges. I would think starting around 20-30M might have been a better choice. Of course, I am always subject to being corrected by a higher authority. :-)
[QUOTE=Dubslow;307744]Okay, I guess I'll pitch in and do GW_6_301.[/QUOTE]
Sorry it took so long; I've had hardware issues, and then I spent two days in Windows getting an entire summer's worth of gaming done. :razz: [code]PRP54 = 167234023851315043627492602845770131366956631669151511
PRP182 = 35428216977629308709346384897858722957864896521236633920871150047660354226030990937812843100212728231266275403247901894323027521996535525361051619602035845364097644084622293809080053[/code] That was pretty close with the ECM, though perhaps not an ECM "miss" per se. RSALS [URL="http://boinc.unsads.com/rsals/crunching.php"]reports[/URL] ECM to 3t50.

Edit: I'll do 160969_43_minus1 (it looks a bit closer to done than the one above it).
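Since both cofactors above are only probable primes, anyone can re-check them in a few lines. A plain Miller-Rabin sketch with random witness bases; this is a PRP-style check in the same spirit as msieve's report, not a primality proof:

```python
import random

def is_probable_prime(n, rounds=20):
    """Miller-Rabin probable-prime test with random witnesses."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % p == 0:
            return n == p
    # write n - 1 = d * 2^s with d odd
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True

if __name__ == "__main__":
    prp54 = 167234023851315043627492602845770131366956631669151511
    # Should print True if the reported PRP54 really is a probable prime.
    print(is_probable_prime(prp54))
```

Even on the 182-digit cofactor this runs in well under a second, since the cost is dominated by a handful of modular exponentiations.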
GC_7_280 started
[code]matrix is 10175033 x 10175210 (3065.6 MB) with weight 907103337 (89.15/col)
sparse part has weight 691696737 (67.98/col)
saving the first 48 matrix rows for later
matrix is 10174985 x 10175210 (2932.0 MB) with weight 726345051 (71.38/col)
sparse part has weight 666845488 (65.54/col)
matrix includes 64 packed rows[/code]
Factors should be available in a few days. The LA is taking 5.3 GB of RAM and is running 6 threads on an AMD Phenom II X6 1090T clocked at 3.36 GHz. Too early to say yet how long the LA will take, but I expect 5-8 days.

Paul
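As an aside, the "/col" figures msieve prints are just the total weight divided by the column count, so the log is easy to cross-check:

```python
# Recompute the per-column weights from the matrix log above.
ncols = 10175210  # column count reported by msieve

for label, weight, reported in [
    ("initial matrix", 907103337, 89.15),
    ("initial sparse part", 691696737, 67.98),
    ("pruned matrix", 726345051, 71.38),
    ("pruned sparse part", 666845488, 65.54),
]:
    per_col = weight / ncols
    print("%-20s %.2f/col (log says %.2f)" % (label, per_col, reported))
```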