Old 2021-11-17, 13:05   #67
tdulcet
 
"Teal Dulcet"
Jun 2018


Quote:
Originally Posted by drkirkby View Post
Why would 3 workers give the most throughput on a dual-socket computer?
I ran the throughput benchmark on a c5.metal instance and got different results. Specifically, two workers were fastest at the higher FFT lengths. Here is the fastest number of workers for each supported FFT length that is benchmarked by default:
  • 6 workers: 2048K, 2100K, 2160K, 2240K, 2304K, 2400K
  • 4 workers: 2520K, 2560K, 2592K, 2688K, 2880K, 2940K, 3000K, 3072K, 3136K, 3200K, 3360K, 3456K, 3600K, 3840K, 3920K, 4200K, 4320K, 4480K, 4800K
  • 3 workers: 4032K
  • 2 workers: 4608K, 4704K, 5040K, 5120K, 5184K, 5376K, 5760K, 6048K, 6144K, 6272K, 6400K, 6720K, 7056K, 7168K, 7200K, 7680K, 8064K
Here are the actual results for one of the FFT lengths used for wavefront first-time tests:
Code:
Timings for 6144K FFT length (48 cores, 1 worker): 1.35 ms. Throughput: 740.08 iter/sec.
Timings for 6144K FFT length (48 cores, 2 workers): 1.16, 1.19 ms. Throughput: 1697.08 iter/sec.
Timings for 6144K FFT length (48 cores, 3 workers): 3.04, 3.07, 1.23 ms. Throughput: 1470.23 iter/sec.
Timings for 6144K FFT length (48 cores, 4 workers): 3.05, 3.02, 3.02, 3.00 ms. Throughput: 1322.79 iter/sec.
Timings for 6144K FFT length (48 cores, 6 workers): 5.47, 5.47, 5.48, 5.39, 5.37, 5.39 ms. Throughput: 1105.26 iter/sec.
Timings for 6144K FFT length (48 cores, 8 workers): 7.56, 7.54, 7.56, 7.55, 7.41, 7.50, 7.46, 7.44 ms. Throughput: 1066.38 iter/sec.
Timings for 6144K FFT length (48 cores, 12 workers): 11.56, 11.61, 11.62, 12.32, 11.57, 11.51, 11.54, 11.55, 11.29, 11.40, 11.43, 11.25 ms. Throughput: 1039.05 iter/sec.
Timings for 6144K FFT length (48 cores, 16 workers): 20.99, 20.72, 20.95, 20.82, 20.78, 21.04, 20.89, 20.78, 14.67, 13.45, 14.54, 14.61, 13.46, 13.71, 13.94, 14.91 ms. Throughput: 949.13 iter/sec.
Timings for 6144K FFT length (48 cores, 24 workers): 57.30, 56.56, 56.51, 56.29, 56.69, 56.99, 56.67, 56.94, 56.71, 56.65, 56.85, 56.70, 26.03, 30.28, 25.78, 27.11, 29.24, 29.75, 27.03, 29.71, 27.35, 28.07, 30.19, 28.15 ms. Throughput: 637.96 iter/sec.
Timings for 6144K FFT length (48 cores, 48 workers): 130.05, 132.14, 128.50, 128.65, 129.51, 129.92, 128.45, 129.71, 128.78, 129.41, 130.18, 128.95, 130.09, 129.10, 130.14, 129.61, 128.04, 130.51, 129.25, 129.42, 129.92, 130.49, 129.53, 131.25, 86.04, 102.15, 87.65, 103.38, 74.75, 91.32, 91.88, 76.62, 75.89, 103.66, 101.44, 101.42, 95.30, 93.57, 79.79, 102.96, 72.71, 95.29, 98.47, 87.29, 100.54, 87.55, 94.32, 102.50 ms. Throughput: 449.43 iter/sec.
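The throughput figures above look like the sum of each worker's iterations per second, i.e. 1000 divided by that worker's ms per iteration. This is inferred from the numbers, not taken from MPrime's documentation. A minimal Python sketch to check it, using values copied from the first four result lines (the function name is hypothetical):

```python
def throughput_from_timings(ms_per_iter):
    """Sum each worker's iterations per second (1000 / ms per iteration)."""
    return sum(1000.0 / ms for ms in ms_per_iter)

# Per-worker timings (ms) and reported throughput (iter/sec),
# copied from the 6144K results above.
results = {
    1: ([1.35], 740.08),
    2: ([1.16, 1.19], 1697.08),
    3: ([3.04, 3.07, 1.23], 1470.23),
    4: ([3.05, 3.02, 3.02, 3.00], 1322.79),
}

for workers, (timings, reported) in results.items():
    estimate = throughput_from_timings(timings)
    print(f"{workers} workers: estimated {estimate:.2f}, reported {reported:.2f} iter/sec")
```

The estimates land within about 1% of the reported values. The small gap is presumably because the printed timings are rounded to two decimals.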
MPrime by default wanted to use 12 workers, but 2 workers are significantly faster: 1697.08 vs. 1039.05 iter/sec at the 6144K FFT length, or roughly 63% more throughput.
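For anyone who wants to repeat the comparison on their own benchmark output, here is a rough Python sketch that picks the fastest worker count and compares it with the default. The throughput values are copied from the 6144K results above; the variable names are only for illustration.

```python
# Throughput (iter/sec) by number of workers, copied from the 6144K results above.
throughput = {
    1: 740.08, 2: 1697.08, 3: 1470.23, 4: 1322.79, 6: 1105.26,
    8: 1066.38, 12: 1039.05, 16: 949.13, 24: 637.96, 48: 449.43,
}
DEFAULT_WORKERS = 12  # the worker count MPrime wanted to use by default in this run

# Pick the worker count with the highest total throughput.
best_workers = max(throughput, key=throughput.get)
speedup = throughput[best_workers] / throughput[DEFAULT_WORKERS]

print(f"Fastest: {best_workers} workers at {throughput[best_workers]:.2f} iter/sec")
print(f"Ratio vs. {DEFAULT_WORKERS} workers: {speedup:.2f}x")
```

This only covers one FFT length. As the list above shows, the best worker count changes with FFT length, so the same check would need to be run per length.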