Old 2021-02-10, 10:13   #5
M344587487
 
 
"Composite as Heck"
Oct 2017

2^2×199 Posts

Quote:
Originally Posted by henryzz
Memory bandwidth is often the issue. More recently some cpus have had enough L3 cache that memory won't be needed. This threadripper is likely to be one of them.
For small FFTs it's definitely true that memory bandwidth is irrelevant. For large FFTs memory bandwidth should come into play, as you probably want to increase the worker count to decrease CCX-to-CCX communication; 16 discrete chunks of L3 is not ideal and needs to be worked around. The larger the FFT, the more likely memory bandwidth comes into play, but as long as you tune to the point that CCX communication, cache contention, memory bandwidth and compute are in check, you should be in the ballpark of optimal.
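
To put rough numbers on "small" versus "large", here is a back-of-the-envelope sketch in Python. It assumes 8 bytes per FFT element (double precision) and 16 MiB per L3 chunk; both figures are my assumptions for illustration, not measured values, and the function names are made up. It also ignores twiddle tables and other auxiliary data, so treat the cutoff as optimistic.

```python
# Rough estimate: does one worker's FFT data fit inside a single L3 chunk?
# Assumptions (not from any benchmark): 8 bytes per element, 16 MiB per chunk.

def fft_working_set_bytes(fft_len, bytes_per_element=8):
    """Approximate size of the FFT data array, ignoring auxiliary tables."""
    return fft_len * bytes_per_element


def fits_in_l3_chunk(fft_len, l3_chunk_bytes, workers_per_chunk=1):
    """True if the combined data of all workers sharing a chunk fits in it."""
    needed = workers_per_chunk * fft_working_set_bytes(fft_len)
    return needed <= l3_chunk_bytes


if __name__ == "__main__":
    L3_CHUNK_BYTES = 16 * 1024 * 1024  # assumed size of one L3 chunk
    for fft_k in (256, 1024, 2048, 5120):
        fft_len = fft_k * 1024
        size_mib = fft_working_set_bytes(fft_len) / (1024 * 1024)
        verdict = "fits" if fits_in_l3_chunk(fft_len, L3_CHUNK_BYTES) else "spills to memory"
        print(f"{fft_k}K FFT: ~{size_mib:.0f} MiB per worker, {verdict}")
```

Once the data no longer fits in one chunk, a worker either has to span several chunks (more CCX-to-CCX traffic) or go out to memory, which is the trade-off being tuned above.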