2021-11-07, 23:14   #5
nordi
 

I also benchmarked Step 2 on my AMD Ryzen 9 3950X (16 cores, 32 threads), using M1217 with B2=1e13, to answer two questions:
  1. Does it make sense to run Step 2 on every CPU thread?
  2. Does it make sense to run Step 1 and Step 2 in parallel on a physical core, using its two threads?

For question 1, I got:
Step 2 on 16 physical cores (one per core): 357.5 seconds per curve
Step 2 on 32 CPU threads (two per core): 631.5 seconds per curve
relative throughput: 2 * 357.5/631.5 = 113.2%
So running Step 2 on both threads of each core gives 13.2% more throughput.
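
For anyone who wants to plug in their own timings, here is a minimal sketch of the question 1 arithmetic. The function name and parameters are my own choice for illustration; the only assumption is that each core runs two curves at once when both threads are used, so the curve count doubles while each curve takes longer.

```python
def smt_throughput(t_one_per_core: float, t_two_per_core: float,
                   threads_per_core: int = 2) -> float:
    """Throughput with all threads busy, relative to one job per physical core.

    t_one_per_core: seconds per curve with one job per physical core
    t_two_per_core: seconds per curve with one job per CPU thread
    """
    # Each core finishes `threads_per_core` curves in t_two_per_core seconds,
    # versus one curve in t_one_per_core seconds.
    return threads_per_core * t_one_per_core / t_two_per_core


# Timings from the post (seconds per curve, Step 2, M1217, B2=1e13)
ratio = smt_throughput(357.5, 631.5)
print(f"relative throughput: {ratio:.1%}")
```

A result above 100% means using every thread finishes more curves per hour, even though each individual curve takes longer.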


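For question 2, I got:
Step 2 while Step 1 is running: 611.8 seconds
Step 2 without Step 1 running: 357.5 seconds (from question 1)
Step 2 throughput: 357.5/611.8 = 58.4%
Step 1 while Step 2 is running: 599.6 seconds
Step 1 without Step 2 running: 354.0 seconds
Step 1 throughput: 354.0/599.6 = 59.0%
overall throughput: 58.4% + 59.0% = 117.4%
So sharing a physical core between Step 1 and Step 2 gives 17.4% more throughput.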
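
The question 2 numbers can be reproduced the same way. This is only a sketch with names I made up; it assumes the baseline (100%) is a physical core dedicated to a single step, so each step's share is its solo time divided by its time when sharing the core. Note that summing the unrounded ratios lands slightly above 117.4%, because the figure above adds the already rounded percentages.

```python
def shared_core_throughput(step1_alone: float, step1_shared: float,
                           step2_alone: float, step2_shared: float):
    """Return (Step 1 share, Step 2 share, combined) relative to a dedicated core.

    *_alone:  seconds when the step has the physical core to itself
    *_shared: seconds when the other step runs on the sibling thread
    """
    step1_share = step1_alone / step1_shared
    step2_share = step2_alone / step2_shared
    return step1_share, step2_share, step1_share + step2_share


# Timings from the post (seconds)
s1, s2, total = shared_core_throughput(354.0, 599.6, 357.5, 611.8)
print(f"Step 1: {s1:.1%}  Step 2: {s2:.1%}  combined: {total:.2%}")
```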


The additional throughput is smaller than the gain for Step 1, and it comes at the expense of either doubled RAM requirements (case 1) or a longer time during which the RAM is in use (case 2). But if you have enough RAM, it makes sense to use all CPU threads.