2016-05-02, 16:42  #1 
"Oliver"
Mar 2005
Germany
11·101 Posts 
P-1 factoring: B1 and B2 vs. multicore scaling
Hi,
just for fun, and because Prime95 now uses multithreading by default, I decided to spend some cycles on P-1 factoring with different settings. My test exponent was 77774863, TFed to 2^{75}.
The system for this test was a single E5-2699 v4 (2.2 GHz base clock, 22 cores) with 4x DDR4-2400 and Turbo mode enabled. Because Turbo was enabled, it is safe to assume that the clock speed differed between the individual tests. Memory was virtually unlimited for the runs; it requires "only" ~30 GiB for E=12 here.
Stage 1 scaling
Not that bad, keeping in mind that bandwidth is limited, that single-core turbo is higher than 22-core turbo (3.6 GHz vs. 2.8 GHz for non-AVX workloads), and that 22-core turbo is limited by TDP, so the clock rate is lower than max turbo.
Stage 2 scaling
Scaling isn't good. Even with WorkerThreads as low as 2, the CPU load is well below 100% (per core/thread) for all helper threads. Power consumption of the CPU was as low as ~65-70 W during stage 2 (WorkerThreads = 22), which is far below the TDP of 145 W. During stage 1 (and LL tests, of course) the power consumption was a solid 145 W.
GCD isn't parallelized at all, but that doesn't really matter until we're going for 100+ cores and good scaling in stages 1 and 2.
I don't know how/if it is possible to make P-1 stage 2 more scalable, but in the short term one should perhaps rethink the B1/B2 parameter selection, as this is just a trade-off between runtime and the chance of finding a factor.
Bonus: Using this page to calculate probabilities and the 22-core timings:
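For anyone who wants to play with the runtime vs. chance trade-off themselves, here is a minimal sketch. All numbers in it are made up for illustration; they are NOT the measured timings or the probabilities from the calculator page. The names (Setting, chance_per_hour, expected_saving_h) are hypothetical helpers, and the figure of two primality tests saved per factor found is an assumption (first test plus double check).

```python
from dataclasses import dataclass


@dataclass
class Setting:
    label: str        # hypothetical B1/B2 choice
    p_factor: float   # chance of finding a factor (e.g. from a probability calculator)
    runtime_h: float  # wall-clock hours for stage 1 + stage 2 + GCD


def chance_per_hour(s: Setting) -> float:
    """Chance of finding a factor per hour of P-1 runtime."""
    return s.p_factor / s.runtime_h


def expected_saving_h(s: Setting, test_h: float, tests_saved: float = 2.0) -> float:
    """Expected hours saved: a found factor avoids `tests_saved` primality tests.

    tests_saved=2.0 is an assumption, not something stated in this thread.
    """
    return s.p_factor * tests_saved * test_h - s.runtime_h


# Made-up example values, NOT measurements.
settings = [
    Setting("small bounds", 0.030, 1.0),
    Setting("medium bounds", 0.045, 2.0),
    Setting("large bounds", 0.055, 4.0),
]
TEST_H = 50.0  # hypothetical wall-clock hours for one primality test

best = max(settings, key=lambda s: expected_saving_h(s, TEST_H))

for s in settings:
    print(f"{s.label:14s} chance/h = {chance_per_hour(s):.4f}  "
          f"expected saving = {expected_saving_h(s, TEST_H):.2f} h")
print("best by expected saving:", best.label)
```

The point is that runtime_h should be the wall-clock time at the thread count actually used. If stage 2 scales poorly, a larger B2 costs relatively more wall-clock time on many cores, which may shift the best choice toward smaller bounds.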
Edit: using mprime 28.9
Oliver
Last fiddled with by TheJudger on 2016-05-02 at 16:47 
2016-05-02, 21:09  #2 
"Mark"
Feb 2003
Sydney
3×191 Posts 
Interesting... Nice analysis, thank you.

