mersenneforum.org  

2010-08-06, 08:45   #1
henryzz
Just call me Henry

"David"
Sep 2007
Cambridge (GMT/BST)

2²·1,433 Posts

big job planning

Quote:
Originally Posted by frmky
NFS@Home has completed 5,448+ by SNFS. A 16.1M matrix was solved using 64 computers (256 cores) in a bit under 41 hours. The log is attached.

Code:
prp65 factor: 23371863775658144623538828456854573496104607906605333794952273409
prp141 factor: 698965867837568984299398033395117263633121275562035049049008890847786442548362928940407820059804253774285034574979465129314235059775249213313
Have you any clue how long that would have taken on four cores of one PC? What is the efficiency of running on 64 computers?
2010-08-06, 09:24   #2
fivemack
(loop (#_fork))

Feb 2006
Cambridge, England

6322₁₀ Posts

I think that would have taken something like 600 hours on my four-core i7 machine, interpolating from matrices of that sort of size that I have run, so about 2500 core-hours, versus the 10500 core-hours that it took on the cluster.
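
A quick sanity check of those figures in Python (a minimal sketch; every number is one quoted in this thread):

Code:
# Back-of-the-envelope efficiency check using the figures in this thread.
cluster_cores = 64 * 4        # 64 computers, 256 cores total (frmky's log)
cluster_hours = 41            # wall-clock time on the cluster
i7_cores = 4                  # the four-core i7
i7_hours = 600                # interpolated single-machine estimate

cluster_core_hours = cluster_cores * cluster_hours  # 10496, "about 10500"
i7_core_hours = i7_cores * i7_hours                 # 2400, "about 2500"

# Parallel efficiency of the cluster relative to the single machine.
print(f"{i7_core_hours / cluster_core_hours:.0%}")  # ~23%, i.e. roughly 25%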
2010-08-06, 10:16   #3
Batalov

"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

10001110101000₂ Posts

Sounds about right to me. Exactly the same size matrix (16.1M) is running on a 1055T Phenom right now, with an estimated 520 hours (6 threads @ 3640 MHz).
2010-08-06, 17:08   #4
henryzz
Just call me Henry

"David"
Sep 2007
Cambridge (GMT/BST)

2²·1,433 Posts

So MPI at that scale is only 25% efficient. Is that the InfiniBand or the Gigabit cluster? I am guessing it must be the InfiniBand. Would this method be usable to help extend the forum-record GNFS?
@fivemack: Would your machines support MPI? Would it be useful to run the linear algebra of a forum-sized factorization across all your PCs, or would it tie them all up for too long at once, given your number of machines?
2010-08-06, 17:24   #5
fivemack
(loop (#_fork))

Feb 2006
Cambridge, England

6322₁₀ Posts

My machines are connected via gigabit Ethernet through a slow switch; that, and the fact that they are all of different specs (Core 2 quad, Phenom quad, i7 quad, dual-Shanghai quad), makes me rather unkeen on using MPI.

If I wanted to run very large jobs, I would get a quad-Magny-Cours machine; it's expensive, but it costs rather less than eight quad-cores connected with InfiniBand would, and I would expect it to work rather faster and use significantly less electricity, so it would not require awkward cooling measures. I rang Tyan this morning and they say the S8812 motherboards will start being reasonably available in September.

Probably I would have to run MPI internally rather than a single job with -t 32.
2010-08-06, 17:35   #6
frmky

Jul 2003
So Cal

3²×227 Posts

Quote:
Originally Posted by henryzz
So MPI at that scale is only 25% efficient. Is that the InfiniBand or the Gigabit cluster? I am guessing it must be the InfiniBand. Would this method be usable to help extend the forum-record GNFS?
@fivemack: Would your machines support MPI? Would it be useful to run the linear algebra of a forum-sized factorization across all your PCs, or would it tie them all up for too long at once, given your number of machines?
These are Core 2-based computers, so a more appropriate 4-core estimate would be about 850 hours, or 3,400 CPU-hours, which puts the efficiency closer to 1/3. I previously found scaling on this cluster up to 16 nodes to be about N^0.81; that model would predict an efficiency of 64^(0.81-1) ≈ 45% at 64 nodes, so the observed scaling is a bit worse than that.
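
To spell the scaling model out (a minimal Python sketch; the 0.81 exponent is the one measured on this cluster, everything else follows from it):

Code:
# Empirical scaling model: wall-clock time on N nodes ~ T1 / N**0.81,
# so speedup = N**0.81 and parallel efficiency = N**(0.81 - 1).
alpha = 0.81  # exponent measured up to 16 nodes
for nodes in (1, 4, 16, 64):
    print(f"{nodes:3d} nodes: speedup {nodes**alpha:5.1f}x, "
          f"efficiency {nodes**(alpha - 1):.0%}")
# 64 nodes -> ~45% predicted efficiency; the ~1/3 observed means the
# exponent itself degrades somewhere beyond 16 nodes.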

This is going to be used for record factorizations at NFS@Home. Once 12,254+ finishes in the next couple of weeks, we will start 5,409-, an SNFS286. As far as I am aware, this will be a record factorization using open-source software. Presuming that's successful, we will move up to an SNFS290 for the next one.
2010-08-06, 17:47   #7
frmky

Jul 2003
So Cal

3²·227 Posts

Quote:
Originally Posted by fivemack
If I wanted to run very large jobs, I would get a quad-Magny-Cours machine; it's expensive, but it costs rather less than eight quad-cores connected with InfiniBand would, and I would expect it to work rather faster
I'm not so sure. msieve bottlenecks on accesses to main memory during the matrix multiplies. On dual quad-core Core 2 computers with DDR2 memory, I get better speeds if I leave four of the cores idle to relieve the contention. Our K10 Barcelona computer does the same thing, even though it's NUMA: the speed tops out at 16 MPI processes, even though it has 32 cores, and is still considerably slower than a GigE-connected cluster of eight quad-cores. DDR3 will help, but not much. Eight separate computers have 8x the main-memory bandwidth of one.
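
The contention effect can be pictured with a toy bandwidth-bound model (a sketch, not a measurement; the 16-process knee is the one observed on the Barcelona box):

Code:
# Toy model: once the memory controllers saturate, extra MPI processes
# add contention rather than throughput.
saturation = 16  # knee observed on the 32-core Barcelona machine

def relative_speed(procs):
    # Throughput scales with process count only up to the bandwidth limit.
    return min(procs, saturation)

for procs in (4, 8, 16, 24, 32):
    print(procs, "MPI processes ->", relative_speed(procs))
# Output is flat from 16 onward: the remaining cores are better left idle.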
2010-08-06, 18:11   #8
fivemack
(loop (#_fork))

Feb 2006
Cambridge, England

2·29·109 Posts

Eight separate Phenom machines have exactly the same main-memory bandwidth as one quad-Magny-Cours: each of the eight quad-core dice in the machine has its own pair of memory controllers connected to its own bank of memory. It's admittedly a bit awkward to have to upgrade memory in sixteens.
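
In memory-channel terms (a sketch that assumes the "pair of memory controllers" per die means two channels, matching the dual-channel Phenom boxes):

Code:
# Channel count on each side of the comparison, per the post above.
phenom_machines = 8
channels_per_phenom = 2       # one dual-channel controller pair per box
mc_dies = 8                   # a quad Magny-Cours box carries 8 dies
channels_per_die = 2          # one dual-channel controller pair per die
print(phenom_machines * channels_per_phenom)  # 16 channels across 8 Phenoms
print(mc_dies * channels_per_die)             # 16 channels in the one box
# Equal aggregate bandwidth; also why memory upgrades come in sixteens.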
2010-08-06, 18:22   #9
frmky

Jul 2003
So Cal

3²×227 Posts

Quote:
Originally Posted by fivemack
each of the eight quad-core dice in the machine has its own pair of memory controllers connected to its own bank of memory.
The same is true of the Barcelona, but unless they've significantly improved it, it doesn't work as well as you'd hope. Because it's still a shared-memory architecture, there's a lot of chatter.
2010-08-06, 18:24   #10
R.D. Silverman

Nov 2003

16100₈ Posts

Quote:
Originally Posted by frmky
These are Core 2-based computers, so a more appropriate 4-core estimate would be about 850 hours, or 3,400 CPU-hours, which puts the efficiency closer to 1/3. I previously found scaling on this cluster up to 16 nodes to be about N^0.81; that model would predict an efficiency of 64^(0.81-1) ≈ 45% at 64 nodes, so the observed scaling is a bit worse than that.

This is going to be used for record factorizations at NFS@Home. Once 12,254+ finishes in the next couple of weeks, we will start 5,409-, an SNFS286. As far as I am aware, this will be a record factorization using open-source software. Presuming that's successful, we will move up to an SNFS290 for the next one.
2,964+, 2,961+, or 2,961- would be nice candidates...

However, I think that claiming a record simply on the basis of being "open source" is a little presumptuous. (No offense intended.)

How about going after M1061? That would be a real record.
2010-08-06, 18:48   #11
frmky

Jul 2003
So Cal

3²·227 Posts

It's not presumptuous; it's simply a fact. While excellent lattice-sieving code has been released, for which we are very appreciative, the postprocessing code has not. Jason has spent an enormous amount of time creating and releasing wonderful postprocessing code, and I believe his achievement should be highlighted. In addition, there is a huge difference, and therefore a new "benchmark," when an average member of the public can go out, spend $50K on computers, download the software from the internet, and do these factorizations on his own in reasonable time.

I do plan to work up to a kilobit, but in small steps: start with 286, then 290, then 295...