2021-11-17, 12:37   #20
kriesel
Optimizing core count for fastest iteration time of a single task

On a dual-processor-package system with 2 x E5-2697V2 (each of which is 12-core with 2-way hyperthreading, for a total of 2 x 12 x 2 = 48 logical processors), within Ubuntu running atop WSL1 on Win 10 Pro x64, a series of self-tests for a single fft length and varying cpu core counts was run in Mlucas v20.1.1 (Nov 6 tarball). This kind of usage is likely when attempting to complete one testing task as quickly as possible (minimum latency). Examples are OBD or F33 P-1, or confirming a new Mersenne prime discovery. It is very likely not the maximum-throughput case, which is what typical production running would aim for.

The fastest iteration time, ~400 ms/iter at 192M (suitable for OBD P-1), was obtained at 20 cores, which is less than the total physical core count of 24.
Iteration times had limited reproducibility in 100-iteration runs (variability of 10% or worse). Reproducibility was much better with 1000-iteration runs.
The best reproducibility was apparently obtained by running nothing else, with no interactive use and not even top, although I left a gpuowl instance running uninterrupted.
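
One possible way to put a number on run-to-run variability is the spread of repeated timings relative to their mean. This is only a sketch: the function name is hypothetical, the post does not say how its 10% figure was computed, and the sample timings are invented placeholders, not measurements from these tests.

```python
from statistics import mean

def relative_spread(times_ms):
    """Return (max - min) / mean of repeated ms/iter timings, as a fraction."""
    return (max(times_ms) - min(times_ms)) / mean(times_ms)

# Placeholder timings, invented for illustration; NOT data from the post.
short_runs_ms = [400.0, 430.0, 390.0, 445.0]
print(f"relative spread: {relative_spread(short_runs_ms):.1%}")
```

Comparing this figure for repeated 100-iteration and 1000-iteration runs at the same thread count would show whether the longer runs are in fact more reproducible on a given system.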

The thread efficiency = (ms/iter for 1 thread) / (ms/iter * threadcount) varied widely, down to 20.6% at 48 threads. At the fastest iteration time, 20 threads, it was 65.8%. Power-of-two thread counts were in most cases local maxima.
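
As a worked illustration of the efficiency figure, here is a minimal sketch. The function name is hypothetical, and the single-thread time is back-computed from the stated 65.8% and ~400 ms/iter rather than taken from the measurements (those are in the attached pdf).

```python
def thread_efficiency(ms_per_iter_1thread, ms_per_iter, threads):
    """Fraction of ideal linear speedup achieved when using `threads` threads."""
    return ms_per_iter_1thread / (ms_per_iter * threads)

# Illustrative values only: ~400 ms/iter at 20 threads is the approximate
# figure from the post; the single-thread time is implied by the stated
# 65.8% efficiency and is NOT a measured value.
t1_ms = 0.658 * 400.0 * 20  # about 5264 ms/iter, hypothetical
print(f"efficiency at 20 threads: {thread_efficiency(t1_ms, 400.0, 20):.1%}")
print(f"speedup over 1 thread:    {t1_ms / 400.0:.2f}x")
```

An efficiency of 100% would mean n threads run exactly n times faster than one thread; values below that reflect memory bandwidth limits, synchronization overhead, and similar costs.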

The tests were performed by writing and launching a simple sequential shell script that specified the Mlucas command line and output redirection for each run, followed by a rename of mlucas.cfg before the next thread count's run.
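
The script itself is not reproduced here. The following is a rough sketch of how such a sequence of commands could be generated, not the script actually used. The `-fftlen`, `-iters` and `-cpu` options, the FFT length given in K (192M = 196608K), the `0:n-1` core range, the binary path and the log and cfg file names are all assumptions; check them against the Mlucas help output for your version before running anything.

```python
import shlex

def sweep_plan(thread_counts, fftlen_k=196608, iters=1000, binary="./Mlucas"):
    """Return shell command lines for one self-test per thread count.

    ASSUMPTIONS: the flags, binary path and file names are illustrative
    and unverified against the Mlucas version used in the post.
    """
    lines = []
    for n in thread_counts:
        cmd = [binary, "-fftlen", str(fftlen_k), "-iters", str(iters),
               "-cpu", f"0:{n - 1}"]
        # Redirect all output of this run to its own log file.
        lines.append(f"{shlex.join(cmd)} > selftest_{n}t.log 2>&1")
        # Rename the config so each thread count keeps its own timing entry.
        lines.append(f"mv mlucas.cfg mlucas_{n}t.cfg")
    return lines

# Print a plan that could be pasted into a sequential shell script.
print("\n".join(sweep_plan([1, 2, 4, 8, 16, 20, 24, 48])))
```

Generating the lines rather than executing them keeps the sketch a dry run; the output can be reviewed and saved as a shell script once the options have been confirmed.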

Attached Files
File Type: pdf optimizing 192M thread count.pdf (27.8 KB, 39 views)

Last fiddled with by kriesel on 2021-11-17 at 14:46