2018-03-31, 12:00  #1
(loop (#_fork))
Feb 2006
Cambridge, England
13·491 Posts 
Some numbers
This is for the same job (a 22.78M density-146 matrix from a GNFS-187); indeed, resuming from the same point.
Timings on four different reasonably chunky computers. Note how well the single-socket SKL is doing. Haven't tried MPI on the dual-socket machines yet. Speeds are in millions of dimensions per day.
Code:
name       #cores  core  GHz  speed  RAM
oak        20      skl   2.2  2.669  EDDR4 2400 x6
butternut   6      hsw   3.3  1.427  DDR4 2400 x4
pineapple  14      skl   3.1  3.454  DDR4 2666 x4
birch4     16      snb   2    2.003  EDDR3 1066 x4
Last fiddled with by fivemack on 2018-04-09 at 13:32
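To put the speed column in more familiar terms, here is a rough Python sketch that turns each figure into days for a full pass over the 22.78M matrix, plus throughput per core. It assumes the speed stays constant for the whole run and that one run covers exactly 22.78M dimensions; the dictionary layout and function names are mine, only the numbers come from the table.

```python
# Speeds copied from the table above, in millions of matrix dimensions per day.
MACHINES = {
    "oak": {"cores": 20, "speed": 2.669},
    "butternut": {"cores": 6, "speed": 1.427},
    "pineapple": {"cores": 14, "speed": 3.454},
    "birch4": {"cores": 16, "speed": 2.003},
}

MATRIX_DIM_MILLIONS = 22.78  # size of the job being timed


def days_for_full_run(speed, dim=MATRIX_DIM_MILLIONS):
    """Days to cover `dim` million dimensions at `speed` million per day.

    Assumes constant speed from start to finish.
    """
    return dim / speed


def speed_per_core(name):
    """Millions of dimensions per day, divided by the machine's core count."""
    machine = MACHINES[name]
    return machine["speed"] / machine["cores"]


if __name__ == "__main__":
    for name, machine in MACHINES.items():
        days = days_for_full_run(machine["speed"])
        print(f"{name:10s} {days:6.2f} days  "
              f"{speed_per_core(name):.3f} M dims/day/core")
```

Per-core throughput is only a crude yardstick here, since the machines also differ in clock speed and in the number and speed of memory channels.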
2018-04-06, 06:47  #2
Jul 2003
So Cal
2×3×347 Posts 
SKL is great. Put one msieve process per socket and use threads to distribute to the cores.

2018-04-06, 08:59  #3
"Carlos Pinho"
Oct 2011
Milton Keynes, UK
2·7·349 Posts 
Greg,
Looking at the status page for NFS, take for example one of the SNFS 311 jobs. What machine specs would be needed to do the post-processing in two weeks' time? Make me a list please: dual socket, amount of memory, etc.
2018-04-13, 09:39  #4
(loop (#_fork))
Feb 2006
Cambridge, England
13·491 Posts 
Quote:
So you could probably get away with an MPI grid of eight dual-socket 12-core SKL Xeons; call it £40,000. You don't need much memory in each node, but you probably want a fast interconnect: a 12-port 40Gb InfiniBand switch is $3000 and the adapters are $500 per node.
The filtering would fit on a 64GB machine. Using the normal n^2 scaling, my SKL, which is taking 530 hours for a 37M matrix, would take about two months. That's about a £3000 machine (mine was a bit more because it has a GTX1080Ti in it).
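For anyone who wants to redo the arithmetic, a minimal Python sketch of the n^2 scaling and the interconnect cost follows. The 530 hours, 37M, $3000 and $500 figures are from the post; the function names are made up for illustration, and treating "two months" as 61 days is my assumption, so the back-calculated matrix size is only indicative and is not a number stated in the thread.

```python
import math


def scaled_hours(known_hours, known_dim, target_dim):
    """Estimate run time for a matrix of `target_dim`, assuming time ~ n^2."""
    return known_hours * (target_dim / known_dim) ** 2


def dim_for_hours(known_hours, known_dim, target_hours):
    """Invert the n^2 rule: matrix size that would take `target_hours`."""
    return known_dim * math.sqrt(target_hours / known_hours)


def interconnect_cost_usd(nodes, switch_usd=3000, adapter_usd=500):
    """One switch plus one adapter per node, at the prices quoted in the post."""
    return switch_usd + nodes * adapter_usd


if __name__ == "__main__":
    # Doubling the dimension quadruples the time under this rule.
    print(scaled_hours(530, 37, 74), "hours for a matrix twice the size")

    # Assumption: "two months" taken as 61 days.
    two_months_hours = 61 * 24
    print(f"{dim_for_hours(530, 37, two_months_hours):.1f}M matrix "
          "would take about two months on the same machine")

    print(interconnect_cost_usd(8), "USD of InfiniBand for eight nodes")
```

The n^2 rule ignores cache and memory-bandwidth effects, which tend to get worse as the matrix grows, so treat the output as a lower bound rather than a forecast.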

2018-04-13, 09:58  #5
"Carlos Pinho"
Oct 2011
Milton Keynes, UK
1316_{16} Posts 
When we meet again, can we discuss this in more depth please? I'll send an invitation in due course, since I have a heavy workload at the moment and I'm also waiting for the British summer.

2018-04-13, 14:05  #6
Bamboozled!
"𒉺𒌌𒇷𒆷𒀭"
May 2003
Down not across
2^{2}·3·887 Posts 

2018-04-22, 13:37  #7
(loop (#_fork))
Feb 2006
Cambridge, England
1100011101111_{2} Posts 
Some MPI numbers
These are times between 'commencing Lanczos iteration' and 'recovered nontrivial dependencies' for a 1478190 x 1478415 matrix at density 70 (from the C135 from term 5143 of aliquot sequence 219240)
Code:
name     MPI  cores    seconds
oak      1x2  10 x2    1167
oak      1x1  10       1302
oak      1x1  10+10HT  1408
oak      1x1  20       1212
birch1   1x1  16+16HT  1655
birches  1x8  8 x8     1285
birches  2x4  8 x8     937
birches  4x2  8 x8     960
birches  8x1  8 x8     1419
On oak I launch the MPI job with
Code:
mpirun --report-bindings -np 2 --map-by socket:PE=10 --bind-to core /home/nfsworld/msievesvn/MPI/msieve -v -nc2 -t 10
Code:
for u in "1,8" "2,4" "4,2" "8,1"; do mpirun --report-bindings -np 8 --hostfile mpihosts /home/nfsworld/msievesvn/MPI/msieve -v -nc2 $u -t 8; done
Code:
birch@birch1.fivemack.internal slots=2
birch@birch2.fivemack.internal slots=2
birch@birch3.fivemack.internal slots=2
birch@birch4.fivemack.internal slots=2
I don't see CPU occupancy much above 30% for any of the cores on birch, or much above 60% on oak.
Last fiddled with by fivemack on 2018-09-29 at 06:08
2018-04-22, 13:41  #8
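To make the table easier to compare, here is a small Python sketch that picks the fastest configuration per machine and works out speedup and parallel efficiency for the four-node birches runs. The timings are copied from the table; the choice of the single birch1 run (which uses hyperthreads, unlike the cluster runs) as the baseline and the node count of four (read off the hostfile) are my assumptions, so take the efficiency figures as a rough guide only.

```python
# (name, MPI grid, cores, seconds) copied from the timing table above.
TIMINGS = [
    ("oak", "1x2", "10 x2", 1167),
    ("oak", "1x1", "10", 1302),
    ("oak", "1x1", "10+10HT", 1408),
    ("oak", "1x1", "20", 1212),
    ("birch1", "1x1", "16+16HT", 1655),
    ("birches", "1x8", "8 x8", 1285),
    ("birches", "2x4", "8 x8", 937),
    ("birches", "4x2", "8 x8", 960),
    ("birches", "8x1", "8 x8", 1419),
]

BIRCH_NODES = 4  # assumption: the hostfile lists four birch machines


def speedup(baseline_seconds, seconds):
    """How many times faster `seconds` is than the baseline."""
    return baseline_seconds / seconds


def fastest(name):
    """Row with the lowest wall-clock time for the given machine name."""
    rows = [row for row in TIMINGS if row[0] == name]
    return min(rows, key=lambda row: row[3])


if __name__ == "__main__":
    # Assumed baseline: the one-machine birch1 run.
    single_node = fastest("birch1")[3]
    for name, grid, cores, secs in TIMINGS:
        if name == "birches":
            s = speedup(single_node, secs)
            print(f"{grid}: {s:.2f}x over one birch, "
                  f"efficiency {s / BIRCH_NODES:.0%} on {BIRCH_NODES} nodes")
    print("best oak run:", fastest("oak"))
```

On these numbers the near-square grids (2x4 and 4x2) beat the 1x8 and 8x1 strips by a clear margin, which is the main thing the sweep over grid shapes shows.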
(loop (#_fork))
Feb 2006
Cambridge, England
1100011101111_{2} Posts 
Quote:
Last fiddled with by fivemack on 2018-04-22 at 13:41

2018-04-22, 14:38  #9
"Victor de Hollander"
Aug 2011
the Netherlands
2^{3}·3·7^{2} Posts 
So the takeaway from this is? Don't use MPI for LA unless you need results quickly?
