2021-07-24, 17:47 #38
drkirkby "David Kirkby" Jan 2021 Althorne, Essex, UK 3·149 Posts

(kriesel: Caution, next post indicates there was an undisclosed error affecting this post.)

I tried what you said, but performance was not that great. Then I tried running one process with 2 workers, with Affinity set as you said, while benchmarking in a second process. I ran the benchmark with 24-26 cores and 2-4 workers.

Code:
[Worker #1 Jul 24 16:57] Timing 5760K FFT, 26 cores, 4 workers. Average times: 8.29, 7.12, 6.24, 5.37 ms. Total throughput: 607.58 iter/sec.

Since 4 does not divide 26, the workers must clearly be running on unequal numbers of cores. The 607.58 iter/sec is almost double what each process achieves (about 331 iter/sec) when running 4 workers on each of two processes, where the processes are not constrained in any way. Here are the results from running two benchmarks at the same time, where nothing is constrained.

Code:
[Worker #1 Jul 24 18:14] Benchmarking multiple workers to measure the impact of memory bandwidth
[Worker #1 Jul 24 18:15] Timing 5760K FFT, 26 cores, 4 workers. Average times: 13.27, 11.03, 13.31, 11.08 ms. Total throughput: 331.31 iter/sec.

and

Code:
[Worker #1 Jul 24 18:15] Timing 5760K FFT, 26 cores, 4 workers. Average times: 12.80, 11.30, 12.78, 11.42 ms. Total throughput: 332.36 iter/sec.

Total throughput is a dismal 331.31+332.36=663.67 iter/sec. One does better running a single process on all 52 cores:

Code:
[Worker #1 Jul 24 18:22] Benchmarking multiple workers to measure the impact of memory bandwidth
[Worker #1 Jul 24 18:22] Timing 5760K FFT, 52 cores, 4 workers. Average times: 3.85, 3.84, 3.86, 3.86 ms. Total throughput: 1038.20 iter/sec.

I suppose the next thing to try is to run two processes, each with 4 workers. I guess 2x6+2x7=26 would be a reasonable split of cores within each process. It would be nice to think I could get a total throughput of 2*607.58 = 1215.16 iter/sec, but somehow I doubt that will happen.

Last fiddled with by kriesel on 2021-07-25 at 00:05 Reason: error indicated in next post
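For anyone wanting to check the totals quoted above, here is a small Python sketch. It assumes "Total throughput" is the sum of each worker's rate, i.e. 1000 divided by that worker's average time in ms. That formula is my assumption, not something the benchmark output states, but it reproduces the logged figures to within about 0.1 iter/sec (the remainder presumably comes from the printed times being rounded). The `throughput` helper and the run labels are illustrative names only.

```python
def throughput(avg_times_ms):
    """Sum of per-worker rates in iter/sec (assumed formula: 1000 / average ms)."""
    return sum(1000.0 / t for t in avg_times_ms)

# Average times copied from the benchmark lines in the post.
# Each entry: (per-worker average times in ms, throughput reported in the log).
runs = {
    "26 cores, 4 workers, 16:57": ([8.29, 7.12, 6.24, 5.37], 607.58),
    "26 cores, 4 workers, 18:15 (first)": ([13.27, 11.03, 13.31, 11.08], 331.31),
    "26 cores, 4 workers, 18:15 (second)": ([12.80, 11.30, 12.78, 11.42], 332.36),
    "52 cores, 4 workers, 18:22": ([3.85, 3.84, 3.86, 3.86], 1038.20),
}

for label, (times, reported) in runs.items():
    print(f"{label}: computed {throughput(times):.2f}, reported {reported:.2f} iter/sec")

# The two combined figures discussed in the post.
two_unconstrained = 331.31 + 332.36   # measured: two processes run at once
hoped_for = 2 * 607.58                # hoped for, not measured
print(f"two unconstrained processes: {two_unconstrained:.2f} iter/sec")
print(f"hoped-for total: {hoped_for:.2f} iter/sec")
```

Only the 663.67 figure is a measured combined result; the 1215.16 figure is simply double the single constrained run and has not been tested.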