If you run enough benchmarks, you will "waste" more time than any optimization will ever make up for.
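
The trade-off is simple break-even arithmetic. Here is a minimal sketch, assuming the benchmarking cost and the per-run saving can each be expressed as a single number of hours. The function names and figures are hypothetical, for illustration only.

```python
def net_hours_saved(benchmark_hours: float, saved_per_run: float, runs: int) -> float:
    """Time recovered by an optimization minus the time spent benchmarking it.

    All inputs are hypothetical placeholders. A negative result means the
    benchmarking "wasted" more time than the optimization has recovered so far.
    """
    return runs * saved_per_run - benchmark_hours


def break_even_runs(benchmark_hours: float, saved_per_run: float) -> float:
    """Number of production runs needed before the benchmarking pays for itself."""
    return benchmark_hours / saved_per_run


# Hypothetical: 20 hours of benchmarks to shave 0.5 hours off each run.
print(break_even_runs(20.0, 0.5))      # 40.0
print(net_hours_saved(20.0, 0.5, 10))  # -15.0
```

If the job will only ever run a handful of times, the break-even point may never be reached.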

Personally, we would run four cores on one job so the work gets done faster. Our theory is that the shorter the run, the less likely it is to be hit by a cosmic-ray bit-flip.
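
One way to put numbers on that theory is a rough sketch that assumes bit-flips arrive as a Poisson process with a constant rate. The rate value below is made up. The sketch also assumes the job scales perfectly from 8 hours on one core to 2 hours on four.

```python
import math


def flip_probability(rate_per_hour: float, hours: float) -> float:
    """Chance of at least one bit-flip during a run.

    Assumes flips arrive as a Poisson process with a constant, hypothetical
    rate, so P(at least one flip) = 1 - exp(-rate * time).
    """
    return 1.0 - math.exp(-rate_per_hour * hours)


rate = 1e-4  # hypothetical flips per hour for the hardware the job touches
one_core = flip_probability(rate, 8.0)    # same job on a single core
four_cores = flip_probability(rate, 2.0)  # assumes perfect 4x speedup
print(one_core > four_cores)  # True
```

This only models wall-clock exposure. If using more cores also means exposing more memory or hardware, the effective rate would rise and eat into the benefit.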

