2017-05-23, 03:37 #1
Jan 2013
Best approach for P-1?
My computer cluster is loaded with memory that I'm not taking advantage of. I understand P-1 can use that memory, but I'm unsure where to start in configuring it optimally.
Each node has 4 cores and 32 GB of memory with good memory bandwidth. I'm currently running a single worker for LL/DC. Suggestions?
2017-05-23, 04:38 #2
Sep 2006
Use two workers: one for LL work and the other for P-1. On the machine where I do this, I assigned two cores to each worker. Running just one P-1 worker avoids the problem of two workers competing for large amounts of memory during stage 2.
Jacob 
2017-05-23, 12:32 #3
"Kieren"
Jul 2011
IIRC, my experience was that in Stage 1, P-1 makes fairly good use of 2 cores. Stage 2 seemed to use only 1.5 to 1.7 of the 2 cores. Perhaps this reflects a lack of memory bandwidth on the machines I was using?

2017-05-23, 14:53 #4
Jan 2013
So maybe I should allocate 3 cores to an LL worker and 1 to a P-1 worker?
I'll have to do some benchmarking.
2017-05-23, 17:27 #5
"Kieren"
Jul 2011
6700K @ 4.3 GHz, Kingston dual-rank 8 GB x 2, rated 2666, running @ 3200. A quick search of results.txt suggests the single P-1 core completed an 81.5M assignment in 12-14 hours. The 3-core DC-LL took 57-60 hr, vs. 24-25 hr on all four cores.
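To put those timings in per-day terms, here is a rough Python sketch. It assumes the quoted times are ranges in hours (12-14 h for the one-core P-1, 57-60 h for the 3-core DC-LL, 24-25 h for the 4-core DC-LL) and uses their midpoints; `per_day` and `midpoint` are hypothetical helpers for illustration, not anything from Prime95.

```python
"""Rough throughput comparison for a 4-core box: one DC-LL worker on all
cores versus a 3-core DC-LL worker plus a 1-core P-1 worker."""


def per_day(hours_per_assignment: float) -> float:
    """Assignments finished per day at the given wall-clock hours each."""
    return 24.0 / hours_per_assignment


def midpoint(lo: float, hi: float) -> float:
    """Middle of a quoted time range."""
    return (lo + hi) / 2.0


# Assumption: the timings quoted in the post are ranges in hours.
p1_1core = per_day(midpoint(12, 14))  # 81.5M P-1 on one core
dc_3core = per_day(midpoint(57, 60))  # DC-LL on three cores
dc_4core = per_day(midpoint(24, 25))  # DC-LL on all four cores

# Share of the 4-core DC rate kept after moving one core to P-1,
# versus the 3/4 expected if LL speed scaled linearly with cores.
dc_retained = dc_3core / dc_4core
linear_expectation = 3 / 4

if __name__ == "__main__":
    print(f"4 cores DC-LL : {dc_4core:.2f} DC/day")
    print(f"3+1 split     : {dc_3core:.2f} DC/day + {p1_1core:.2f} P-1/day")
    print(f"DC retained   : {dc_retained:.0%} "
          f"(linear scaling would give {linear_expectation:.0%})")
```

On these midpoints the 3+1 split keeps only about 42% of the four-core DC rate, well under the 75% that linear scaling would predict, so with just two samples the figures are worth rechecking before drawing conclusions.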

2017-05-23, 19:15 #6
Jan 2013
2017-05-23, 23:58 #7
Jul 2011
I should probably verify that on additional exponents. With only two samples, both could have included periods of downtime with no output.
