2022-06-29, 19:30   #18
kriesel
 

Found in prime95's undoc.txt (emphasis mine):
Code:
Most FFT sizes have several implementations.  The program uses throughput benchmark
data to select the fastest FFT implementation.  The program assumes all CPU cores
will be used and all workers will be running FFTs.  This can be overridden
in gwnum.txt:
    BenchCores=x
    BenchWorkers=y
If the program selects the implementation that is fastest with all cores busy, as its documentation states, while you expect it to select the one that is fastest with only a few of the 14 cores busy, that may account for some of the discrepancy between what you expected and what you observed.
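As a minimal sketch of such an override, assuming the case of interest is 4 of the 14 cores running in a single worker, the gwnum.txt lines might look like the following. Both values are illustrative and not taken from this thread. I am also assuming x is a core count and y a worker count, which the undoc.txt excerpt does not spell out.

```
BenchCores=4
BenchWorkers=1
```

Whether the selection actually changes would then depend on throughput benchmark data existing for that cores/workers combination, so treat this as something to test rather than a known fix.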
I'd be interested to see what George's take is on the clm=2 vs. 1 performance and selection.