2022-02-02, 03:21   #32
kriesel
Mar 2017
US midwest

19A116 Posts

Originally Posted by ewmayer
I run 2 instances per card on each of my Radeon VIIs for 2 reasons:

1. It gives a total throughput boost in the 7-10% range;

2. If one job hangs or crashes (infrequent, but it does happen), the hit to total throughput is limited, because the other instance keeps running.

Even if one has a GPU model where running 2 instances is slightly slower in total-throughput terms (say, by no more than 5%), [2] makes it worth doing, IMO.

On the R7 I found a negative benefit from running more than 2 instances.
I think the benefit depends on the GPU model and the work. IIRC, when the two instances run disparate work, the gain may be smaller, or there may even be a loss. (Running very different FFT lengths may give less throughput than a single instance of either length.)

Re the sometimes-two-instances performance penalty: in that case, why not use a shell script so the two instances ALTERNATE, starting one when the other crashes or runs out of queued work? I suggest a short delay between runs and perhaps a maximum loop count. Without either, an A-B loop will inflate the logs with lots of garbage when both instances are out of work, or a driver, lib, or symlink has gone bonkers, or the GPU has got into a must-crash-app state. (Cue the anything-Windows-can-do-Linux-can-do-better chorus...;)
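A rough sketch of that alternating loop, for illustration only. The post suggests a shell script; this uses Python instead, and the function name `alternate`, its parameters, and the example commands are hypothetical, not part of any GIMPS program. It runs one instance at a time, waits a short delay between launches, and stops after a maximum launch count so two empty work queues or a broken driver can't spin forever.

```python
import subprocess
import time
from typing import Callable, Sequence


def alternate(
    commands: Sequence[Sequence[str]],
    delay_s: float = 60.0,
    max_loops: int = 100,
    run: Callable[[Sequence[str]], int] = lambda cmd: subprocess.run(cmd).returncode,
    sleep: Callable[[float], None] = time.sleep,
) -> list[tuple[int, int]]:
    """Launch commands[0], commands[1], commands[0], ... one at a time.

    Each launch blocks until that instance exits (crash, or out of
    queued work). After delay_s seconds the next instance starts.
    The loop ends after max_loops launches, so a dead driver or two
    empty work queues produce a bounded amount of log output.
    Returns a list of (instance index, exit code), one per launch.
    """
    history: list[tuple[int, int]] = []
    for i in range(max_loops):
        idx = i % len(commands)          # A, B, A, B, ...
        code = run(commands[idx])        # blocks until the instance exits
        history.append((idx, code))
        print(f"launch {i + 1}: instance {idx} exited with code {code}")
        if i + 1 < max_loops:            # no pointless wait after the last launch
            sleep(delay_s)
    return history
```

For example, `alternate([["./run_a.sh"], ["./run_b.sh"]], delay_s=60, max_loops=50)` with two hypothetical wrapper scripts, one per instance working directory. Note that a hung (rather than crashed) instance never exits on its own, so something else, such as a watchdog, would still have to kill it before the loop moves on.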

Last fiddled with by kriesel on 2022-02-02 at 03:21