big job planning
[quote=frmky;224222]NFS@Home has completed 5,448+ by SNFS. A 16.1M matrix was solved using 64 computers (256 cores) in a bit under 41 hours. The log is attached.
[code]
prp65 factor: 23371863775658144623538828456854573496104607906605333794952273409
prp141 factor: 698965867837568984299398033395117263633121275562035049049008890847786442548362928940407820059804253774285034574979465129314235059775249213313
[/code][/quote] Have you any clue how long that would have taken on four cores of one PC? What is the efficiency of running on 64 computers?
I think that would have taken something like 600 hours on my four-core i7 machine, interpolating from matrices of that sort of size that I have run, so about 2500 core-hours, versus the 10500 core-hours that it took on the cluster.
Sounds about right to me. An identically sized matrix (16.1M) is running on a 1055T Phenom right now with an estimated 520 hours (6 threads @ 3640 MHz).
So MPI at that scale is only 25% efficient. Is that the Infiniband or the Gigabit cluster? I'm guessing it must be the Infiniband. Would this method be usable for extending the forum-record GNFS?
@fivemack Would your machines support MPI? Would it be useful to run the LA for a forum-sized factorization over all your PCs, or would it tie them all up for too long at once given your number of machines?
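The 25% figure can be checked against the core-hour estimates quoted above. A quick sketch in Python (the hour counts are the rough estimates from the posts, not measurements):

```python
# Rough parallel-efficiency check, using the estimates quoted above:
#   ~600 hours on a 4-core i7 vs. ~41 hours on 64 quad-core nodes.
single_machine_core_hours = 600 * 4        # ~2400 core-hours
cluster_core_hours = 64 * 4 * 41           # ~10500 core-hours

efficiency = single_machine_core_hours / cluster_core_hours
print(f"efficiency ~ {efficiency:.0%}")    # prints "efficiency ~ 23%"
```

With these round numbers the ratio comes out just under a quarter, which is where the "25% efficient" reading comes from.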
My machines are connected via gigabit through a slow switch; that, and the fact that they are all of different specs (Core2 quad, Phenom quad, i7 quad, dual-Shanghai quad), makes me rather unkeen on using MPI.
If I want to run very large jobs, I would get a quad-MagnyCours machine; it's expensive, but it costs rather less than eight quad-cores connected with Infiniband would, and I would expect it to work rather faster and use significantly less electricity, so it wouldn't require awkward measures for cooling. I rang Tyan this morning and they say the S8812 motherboards will start being reasonably available in September. I would probably have to run MPI internally rather than a single job with -t 32.
[QUOTE=henryzz;224270]So MPI at that scale is only 25% efficient. Is that the Infiniband or the Gigabit cluster? I'm guessing it must be the Infiniband. Would this method be usable for extending the forum-record GNFS?
@fivemack Would your machines support MPI? Would it be useful to run the LA for a forum-sized factorization over all your PCs, or would it tie them all up for too long at once given your number of machines?[/QUOTE] These are Core 2 based computers, so a more appropriate 4-core estimate would be about 850 hours, or 3400 CPU-hours, so the efficiency is closer to 1/3. I previously found scaling on this cluster up to 16 nodes to be about N^0.81, but at 64 nodes it's a bit worse than that: 64^(0.81-1) ≈ 45%.
This is going to be used for record factorizations at NFS@Home. Once 12,254+ finishes in the next couple of weeks, we will start 5,409-, an SNFS286. As far as I am aware, this will be a record factorization using open-source software. Presuming that's successful, we will move up to an SNFS290 for the next one.
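Both numbers in that post can be reproduced directly. A small sketch of the arithmetic (the N^0.81 exponent is frmky's empirical fit for this cluster, not a general rule):

```python
# Empirical fit: solve speedup on N nodes scales as N**0.81 (measured up
# to 16 nodes), so parallel efficiency is N**0.81 / N = N**(0.81 - 1).
N = 64
scaling_efficiency = N ** (0.81 - 1)
print(f"predicted efficiency at {N} nodes: {scaling_efficiency:.0%}")  # 45%

# Measured efficiency from the revised Core 2 estimate:
#   ~850 hours on 4 cores vs. ~41 hours on 64 quad-core nodes.
measured = (850 * 4) / (64 * 4 * 41)
print(f"measured efficiency: {measured:.0%}")  # 32%, i.e. close to 1/3
```

So the measured 1/3 at 64 nodes falls somewhat short of the 45% the 16-node fit would predict, which is the "a bit worse than that" in the post.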
[QUOTE=fivemack;224272]If I want to run very large jobs, I would get a quad-MagnyCours machine; it's expensive, but it costs rather less than eight quad-cores connected with infiniband would, and I would expect it to work rather faster[/QUOTE] I'm not so sure. msieve bottlenecks on accesses to main memory during the matrix multiplies. On dual-quad Core 2 computers with DDR2 memory, I get better speeds if I leave four of the cores idle to relieve the contention. Our K10 Barcelona computer does the same thing, even though it's NUMA: the speed tops out at 16 MPI processes, even though it has 32 cores, and it's still considerably slower than a GigE-connected cluster of eight quad-cores. DDR3 will help, but not much. Eight separate computers have 8x the main memory bandwidth of one. :smile:
Eight separate Phenom machines have exactly the same main memory bandwidth as one quad-MagnyCours: each of the eight quad-core dice in the machine has its own pair of memory controllers connected to its own bank of memory. It's admittedly a bit awkward to have to upgrade memory in sixteens.
[QUOTE=fivemack;224281]each of the eight quad-core dice in the machine has its own pair of memory controllers connected to its own bank of memory.[/QUOTE]
The same is true in the Barcelona, but unless they've significantly improved it, it doesn't work as well as you'd hope. Because it's still a shared memory architecture, there's a lot of chatter. |
[QUOTE=frmky;224275]These are Core 2 based computers, so a more appropriate 4-core estimate would be about 850 hours, or 3400 CPU-hours, so the efficiency is closer to 1/3. I previously found scaling on this cluster up to 16 nodes to be about N^0.81, but at 64 nodes it's a bit worse than that: 64^(0.81-1) ≈ 45%.
This is going to be used for record factorizations at NFS@Home. Once 12,254+ finishes in the next couple of weeks, we will start 5,409-, an SNFS286. As far as I am aware, this will be a record factorization using open-source software. Presuming that's successful, we will move up to an SNFS290 for the next one.[/QUOTE] 2,964+, 2,961+, or 2,961- would be nice candidates. However, I think that claiming a record simply on the basis of being "open source" is a little presumptuous. (No offense intended.) How about going after M1061? That would be a real record.
It's not presumptuous; it's simply a fact. While excellent lattice sieving code has been released, for which we are very appreciative, the postprocessing code has not. Jason has spent an enormous amount of time creating and releasing wonderful postprocessing code, and I believe his achievement should be highlighted. In addition, there is a huge difference, and therefore a new "benchmark," when an average member of the public can go out, spend $50K on computers, download the software from the internet, and do these factorizations on his own in reasonable time.
I do plan to work up to a kilobit, but in small steps: start with 286, then 290, then 295...