KNL throughput benchmark
Attached is the Knights Landing benchmark run I'd asked for. The user in question was awesome to help us out with this.
I was happy to see that it scales really well. There's barely a blip in difference between one worker and 64 workers.
At the 2048K FFT size, a single worker running alone on one core does 42.34 ms/iter. With all 64 workers going, they still manage an average of somewhere around 44.5 ms/iter each, for an aggregate throughput of 1435.16 iter/sec.
Up at the higher end of the FFT sizes (I only requested 2M-5M to keep the data set to a dull roar... sorry, no FFTs sized for 332M+ exponents, but I could ask), the 5120K FFT takes 110.82 ms/iter for a single worker and ~119.5 ms/iter with all 64 going. Total throughput at that size = 534.78 iter/sec.
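For anyone who wants to sanity-check the arithmetic, here is a rough Python sketch of how the aggregate figures relate to the per-worker timings. It assumes all 64 workers run at the same average rate and that aggregate throughput is simply the sum of the per-worker rates. The function names are made up for illustration, and the only inputs are the numbers quoted above.

```python
# Figures quoted in the post:
# (FFT size, solo ms/iter, approx. ms/iter with 64 workers, reported aggregate iter/sec)
RUNS = [
    ("2048K", 42.34, 44.5, 1435.16),
    ("5120K", 110.82, 119.5, 534.78),
]
WORKERS = 64


def aggregate_throughput(workers, ms_per_iter):
    """Total iter/sec if every worker averages ms_per_iter (assumption)."""
    return workers * 1000.0 / ms_per_iter


def implied_ms_per_iter(workers, iter_per_sec):
    """Average per-worker ms/iter implied by a reported aggregate rate."""
    return workers * 1000.0 / iter_per_sec


def slowdown_pct(solo_ms, loaded_ms):
    """Percent increase in ms/iter going from one worker to all workers."""
    return (loaded_ms / solo_ms - 1.0) * 100.0


if __name__ == "__main__":
    for size, solo, loaded, reported in RUNS:
        print(
            f"{size}: estimated {aggregate_throughput(WORKERS, loaded):.1f} iter/sec "
            f"(reported {reported}), implied average "
            f"{implied_ms_per_iter(WORKERS, reported):.2f} ms/iter, "
            f"slowdown {slowdown_pct(solo, loaded):.1f}%"
        )
```

The small gap between the estimated and reported aggregates just reflects that 44.5 and 119.5 are rounded averages. The reported totals imply per-worker averages slightly above those round numbers.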
Attached are the raw results... I was thinking they could be graphed or something to show how each additional running worker affects the throughput of the others, but from my quick glance it's such a gentle curve that it would hopefully (and thankfully) be a boring graph. Only about a 5-8% slowdown from 1 worker to 64 workers... yeah, I'll take that any day.