Old 2016-05-02, 16:42   #1
TheJudger
 
"Oliver"
Mar 2005
Germany

P-1 factoring: B1 and B2 vs. multicore scaling

Hi,

just for fun, and because Prime95 now multithreads by default, I decided to spend some cycles on P-1 factoring with different settings. My test exponent was 77774863, TFed to 2^75.
  • WorkerThreads = 1:
    Code:
    Optimal bounds are B1=655000, B2=13100000
    Chance of finding a factor is an estimated 3.29%
    M77774863 stage 1 complete. 1960468 transforms. Time: 19020.686 sec.
    Stage 1 GCD complete. Time: 59.929 sec.
    M77774863 stage 2 complete. 1439216 transforms. Time: 20928.729 sec.
    Stage 2 GCD complete. Time: 59.804 sec.
    M77774863 completed P-1, B1=655000, B2=13100000, E=12, We8: DCC22268
  • WorkerThreads = 2:
    Code:
    Optimal bounds are B1=655000, B2=13100000
    Chance of finding a factor is an estimated 3.29%
    M77774863 stage 1 complete. 1960468 transforms. Time: 10024.989 sec.
    Stage 1 GCD complete. Time: 59.943 sec.
    M77774863 stage 2 complete. 1439216 transforms. Time: 14055.951 sec.
    Stage 2 GCD complete. Time: 59.833 sec.
    M77774863 completed P-1, B1=655000, B2=13100000, E=12, We8: DCC22268
  • WorkerThreads = 4:
    Code:
    Optimal bounds are B1=655000, B2=13100000
    Chance of finding a factor is an estimated 3.29%
    M77774863 stage 1 complete. 1960468 transforms. Time: 5521.954 sec.
    Stage 1 GCD complete. Time: 60.217 sec.
    M77774863 stage 2 complete. 1439216 transforms. Time: 10438.691 sec.
    Stage 2 GCD complete. Time: 59.868 sec.
    M77774863 completed P-1, B1=655000, B2=13100000, E=12, We8: DCC22268
  • WorkerThreads = 8:
    Code:
    Optimal bounds are B1=655000, B2=13100000
    Chance of finding a factor is an estimated 3.29%
    M77774863 stage 1 complete. 1960468 transforms. Time: 3177.091 sec.
    Stage 1 GCD complete. Time: 61.542 sec.
    M77774863 stage 2 complete. 1439216 transforms. Time: 8713.107 sec.
    Stage 2 GCD complete. Time: 59.838 sec.
    M77774863 completed P-1, B1=655000, B2=13100000, E=12, We8: DCC22268
  • WorkerThreads = 22:
    Code:
    Optimal bounds are B1=655000, B2=13100000
    Chance of finding a factor is an estimated 3.29%
    M77774863 stage 1 complete. 1960468 transforms. Time: 1464.181 sec.
    Stage 1 GCD complete. Time: 61.301 sec.
    M77774863 stage 2 complete. 1439216 transforms. Time: 8376.363 sec.
    Stage 2 GCD complete. Time: 60.640 sec.
    M77774863 completed P-1, B1=655000, B2=13100000, E=12, We8: DCC22268

The system for this test was a single E5-2699 v4 (2.2 GHz base clock, 22 cores) with 4x DDR4-2400 and Turbo mode enabled. Because Turbo was enabled, it is safe to assume that the clock speed differed between the individual tests. Memory was virtually unlimited for the runs; this exponent requires "only" ~30 GiB for E=12.

Stage 1 scaling
  • 1c -> 2c: 1.897x
  • 1c -> 4c: 3.445x
  • 1c -> 8c: 5.987x
  • 1c -> 22c: 12.991x

Not that bad, keeping in mind that memory bandwidth is limited, that single-core turbo is higher than 22-core turbo (3.6 GHz vs. 2.8 GHz for non-AVX workloads), and that 22-core turbo is additionally limited by TDP, so the actual clock rate is below the maximum turbo.

Stage 2 scaling
  • 1c -> 2c: 1.489x
  • 1c -> 4c: 2.005x
  • 1c -> 8c: 2.402x
  • 1c -> 22c: 2.499x
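The scaling factors in both lists are simply the 1-thread time divided by the N-thread time. A minimal Python sketch that recomputes them from the logged stage times, and additionally prints the parallel efficiency (speedup divided by thread count, a metric not in the original post):

```python
# Wall-clock times in seconds, copied from the logs above (key = WorkerThreads).
stage1 = {1: 19020.686, 2: 10024.989, 4: 5521.954, 8: 3177.091, 22: 1464.181}
stage2 = {1: 20928.729, 2: 14055.951, 4: 10438.691, 8: 8713.107, 22: 8376.363}


def scaling(times):
    """Return {threads: (speedup, efficiency)} relative to the 1-thread run."""
    base = times[1]
    return {n: (base / t, base / t / n) for n, t in times.items()}


for name, times in (("Stage 1", stage1), ("Stage 2", stage2)):
    print(name)
    for n, (speedup, efficiency) in scaling(times).items():
        if n == 1:
            continue  # the baseline is 1.000x by definition
        print(f"  1c -> {n}c: {speedup:.3f}x (efficiency {efficiency:.1%})")
```

On these numbers, stage 1 keeps roughly 59% efficiency at 22 threads, while stage 2 drops to about 11%.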

Scaling isn't good. Even with WorkerThreads as low as 2, the CPU load is well below 100% (per core/thread) for all helper threads. CPU power consumption was as low as ~65-70 W during stage 2 (WorkerThreads = 22), far below the TDP of 145 W. During stage 1 (and LL tests, of course) power consumption was a solid 145 W.

GCD isn't parallelized at all, but that doesn't really matter until we get to 100+ cores and good scaling in stages 1 and 2.
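To put a number on the GCD remark, here is a rough Python sketch of the share of total runtime spent in the two serial GCD steps. The measured shares use the logged times; the `ideal_gcd_share` helper is purely hypothetical and assumes both stages scale perfectly with core count, which the stage 2 measurements clearly do not.

```python
# (stage 1, stage 1 GCD, stage 2, stage 2 GCD) in seconds, from the logs above.
runs = {
    1: (19020.686, 59.929, 20928.729, 59.804),
    22: (1464.181, 61.301, 8376.363, 60.640),
}


def gcd_share(stage1, gcd1, stage2, gcd2):
    """Fraction of the total runtime spent in the two GCD steps."""
    return (gcd1 + gcd2) / (stage1 + gcd1 + stage2 + gcd2)


def ideal_gcd_share(cores):
    """Hypothetical GCD share if stages 1 and 2 scaled perfectly (assumption)."""
    stage1, gcd1, stage2, gcd2 = runs[1]
    serial = gcd1 + gcd2
    return serial / (serial + (stage1 + stage2) / cores)


for threads, times in runs.items():
    print(f"measured, {threads} threads: {gcd_share(*times):.2%}")
for cores in (22, 100, 200):
    print(f"ideal scaling, {cores} cores: {ideal_gcd_share(cores):.2%}")
```

With the measured times the GCDs account for about 0.3% of the run at 1 thread and about 1.2% at 22 threads; under the idealized assumption they would reach roughly 23% at 100 cores.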

I don't know how, or whether, it is possible to make P-1 stage 2 more scalable, but in the short term one should perhaps rethink B1/B2 parameter selection, as this is just a tradeoff between runtime and the chance of finding a factor.

Bonus:
Using this page to calculate probabilities and the 22 core timings:
  • WorkerThreads = 22, B1 = 655000, B2 = 13100000 (automatically selected)
    3.29% chance for stage 1 + 2, 9962.485s runtime, 0.285 factors per day
  • WorkerThreads = 22, manually selected B1 = 1000000 and B2 = 13100000 - just increased B1 a bit:
    Code:
    P-1 on M77774863 with B1=1000000, B2=13100000
    M77774863 stage 1 complete. 2939928 transforms. Time: 2192.769 sec.
    Stage 1 GCD complete. Time: 62.367 sec.
    M77774863 stage 2 complete. 1408738 transforms. Time: 8190.553 sec.
    Stage 2 GCD complete. Time: 60.791 sec.
    M77774863 completed P-1, B1=1000000, B2=13100000, E=12, We8: DCC4E630
    3.52% chance for stage 1 + 2, 10506.480s runtime, 0.290 factors per day
  • WorkerThreads = 22, manually selected B1 = 2000000 and B2 = 6300000 - increased B1 and lowered B2 for the same overall probability:
    Code:
    P-1 on M77774863 with B1=2000000, B2=6300000
    M77774863 stage 1 complete. 7208160 transforms. Time: 5708.323 sec.
    Stage 1 GCD complete. Time: 60.894 sec.
    M77774863 stage 2 complete. 503802 transforms. Time: 2982.309 sec.
    Stage 2 GCD complete. Time: 60.715 sec.
    M77774863 completed P-1, B1=2000000, B2=6300000, E=6, We8: DC6D65F0
    3.30% chance for stage 1 + 2, 8812.241s runtime, 0.324 factors per day
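The "factors per day" figures follow from the total runtime (both stages plus both GCDs) and the estimated probability: expected factors per day = probability × 86400 / runtime. A small Python sketch that recomputes them from the logged times and the rounded probabilities quoted above; since the probabilities are rounded to two decimals, the last digit of a rate can differ slightly from the figure in the post:

```python
SECONDS_PER_DAY = 86_400


def factors_per_day(probability, times):
    """Return (total runtime in seconds, expected factors found per day)."""
    runtime = sum(times)
    return runtime, probability * SECONDS_PER_DAY / runtime


# label -> (estimated probability, (stage 1, GCD 1, stage 2, GCD 2) in seconds)
runs = {
    "B1=655000,  B2=13100000": (0.0329, (1464.181, 61.301, 8376.363, 60.640)),
    "B1=1000000, B2=13100000": (0.0352, (2192.769, 62.367, 8190.553, 60.791)),
    "B1=2000000, B2=6300000": (0.0330, (5708.323, 60.894, 2982.309, 60.715)),
}

for label, (probability, times) in runs.items():
    runtime, rate = factors_per_day(probability, times)
    print(f"{label}: {runtime:.3f}s runtime, {rate:.3f} factors per day")
```

The third setting comes out ahead because it moves work from the poorly scaling stage 2 into the well-scaling stage 1.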

Edit: using mprime 28.9

Oliver

Last fiddled with by TheJudger on 2016-05-02 at 16:47
Old 2016-05-02, 21:09   #2
markr
 
"Mark"
Feb 2003
Sydney


Interesting... Nice analysis, thank you.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.