2020-10-05, 05:32   #6
VBCurtis
 
 
"Curtis"
Feb 2005
Riverside, CA

2×2,311 Posts

Quote:
Originally Posted by paulunderwood
I don't know about 12 core chips running LLR, but generally it makes sense to run 1 instance per chip or chiplet.
My experience, mostly on Haswell-era desktops, is that LLR doesn't benefit much from splitting small FFTs onto multiple threads. 128K per thread seems to be a good cutoff, so for OP's example of a 192K FFT, I doubt running two 2-threaded instances would be faster than four 1-threaded ones.

Once the FFT size reaches 256K, 2-threaded runs work pretty well.
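To make that rule of thumb concrete, here is a rough Python sketch. The function names and the exact rounding are my own assumptions for illustration, not anything LLR does internally; it just applies "about 128K of FFT per thread" and fills the remaining cores with extra instances.

```python
def threads_per_instance(fft_size_k: int, cores: int, k_per_thread: int = 128) -> int:
    """Rule-of-thumb thread count: one thread per ~128K of FFT length.

    The 128K figure is an empirical cutoff from Haswell-era desktops,
    not a hard limit; treat it as a starting point for benchmarking.
    """
    threads = max(1, fft_size_k // k_per_thread)
    return min(threads, cores)


def suggest_layout(fft_size_k: int, cores: int) -> tuple[int, int]:
    """Return (instances, threads_per_instance) as a first guess."""
    threads = threads_per_instance(fft_size_k, cores)
    instances = max(1, cores // threads)
    return instances, threads


if __name__ == "__main__":
    for fft in (192, 256):
        instances, threads = suggest_layout(fft, cores=4)
        print(f"{fft}K FFT on 4 cores: {instances} instance(s) x {threads} thread(s)")
```

On a quad-core this suggests four 1-threaded instances for a 192K FFT and two 2-threaded instances for 256K, which is only a first guess to check against real timings.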

OP: I've run LLR on numbers of this size on prebuilt machines with slow dual-channel memory, and running 3 instances was just about as fast as 4 but generated quite a bit less heat. That is, 3 instances are enough to saturate the memory bandwidth on some quad-core machines. It takes some experimenting with threads-per-process and number of processes to find the sweet spot!
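If it helps with the experimenting, this small Python sketch lists every combination of process count and threads-per-process that fits on a given number of cores. The helper name `candidate_layouts` is hypothetical, and the script only enumerates the combinations to try; it does not launch LLR or measure anything.

```python
def candidate_layouts(cores: int) -> list[tuple[int, int]]:
    """List (instances, threads_per_instance) pairs that use at most `cores` cores.

    Layouts that leave a core idle (e.g. 3 x 1 on a quad-core) are kept on
    purpose, since memory bandwidth can saturate before all cores are busy.
    """
    layouts = []
    for threads in range(1, cores + 1):
        for instances in range(1, cores // threads + 1):
            layouts.append((instances, threads))
    return layouts


if __name__ == "__main__":
    for instances, threads in candidate_layouts(4):
        print(f"{instances} instance(s) x {threads} thread(s)")
```

Time each layout on the same candidates and compare total throughput (tests finished per day across all instances), not the speed of a single instance.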