I'm having trouble deciding between 1 worker and 4 workers in terms of overall efficiency on a particular machine. I ran the 30-minute benchmark (BenchTime=1800) twice and got close (but different) results. Is this the equivalent of a "statistical tie", so that I should just pick a configuration (1, 2, or 4 workers) at random and forget about it?
Code:
Timings for 4096K FFT length (4 cpus, 1 worker): 5.85 ms. Throughput: 170.91 iter/sec.
Timings for 4096K FFT length (4 cpus, 2 workers): 11.81, 11.74 ms. Throughput: 169.85 iter/sec.
Timings for 4096K FFT length (4 cpus, 4 workers): 23.23, 23.36, 23.19, 23.23 ms. Throughput: 172.04 iter/sec.
Code:
Timings for 4096K FFT length (4 cpus, 1 worker): 5.83 ms. Throughput: 171.43 iter/sec.
Timings for 4096K FFT length (4 cpus, 2 workers): 11.57, 11.57 ms. Throughput: 172.89 iter/sec.
Timings for 4096K FFT length (4 cpus, 4 workers): 23.56, 23.44, 23.46, 23.89 ms. Throughput: 169.60 iter/sec.
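One rough way to check the "statistical tie" idea is to compare how far the configurations differ from each other against how far the same configuration differs between the two runs. The Python sketch below does that using only the throughput figures quoted above; the names `runs` and `summarize` are made up for illustration, and with just two runs per configuration this is a sanity check, not a formal significance test.

```python
# Throughput figures (iter/sec) copied from the two 30-minute runs above.
runs = {
    "1 worker": [170.91, 171.43],
    "2 workers": [169.85, 172.89],
    "4 workers": [172.04, 169.60],
}


def summarize(samples):
    """Return (mean, run-to-run range) for one configuration."""
    mean = sum(samples) / len(samples)
    return mean, max(samples) - min(samples)


summary = {name: summarize(vals) for name, vals in runs.items()}

# Gap between the best and worst configuration averages.
means = [mean for mean, _ in summary.values()]
between = max(means) - min(means)

# Largest gap between two runs of the same configuration.
within = max(rng for _, rng in summary.values())

for name, (mean, rng) in summary.items():
    print(f"{name}: mean {mean:.2f} iter/sec, run-to-run range {rng:.2f}")
print(f"between-config gap {between:.2f}, largest run-to-run gap {within:.2f}")
```

On these numbers the configuration averages differ by about 0.55 iter/sec, while repeat runs of the same configuration differ by up to about 3 iter/sec. The difference between configurations is smaller than the noise between runs, which is consistent with treating them as tied on this data.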