2022-06-28, 19:36   #14
kriesel

The memory situation looks pretty good to me. The log file shows up to 8 GB used in stage 2, so 3 workers at a time could get all they want, and a fourth can get by on somewhat less. Additional workers could be running stage 1 at the same time.
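A quick back-of-the-envelope sketch of that memory budgeting. The 8 GB stage 2 figure is from the log above; the 28 GB total allowance is purely an assumption for illustration, so substitute your own Memory= setting:

```python
# Memory budgeting for P-1 stage 2 workers.
# STAGE2_GB_PER_WORKER comes from the log; MEMORY_ALLOWANCE_GB is an
# assumed placeholder for the mprime memory allowance, not a real value.
STAGE2_GB_PER_WORKER = 8
MEMORY_ALLOWANCE_GB = 28

# How many workers get a full 8 GB share, and what is left for one more.
full_share_workers = MEMORY_ALLOWANCE_GB // STAGE2_GB_PER_WORKER
leftover_gb = MEMORY_ALLOWANCE_GB - full_share_workers * STAGE2_GB_PER_WORKER

print(f"{full_share_workers} workers get {STAGE2_GB_PER_WORKER} GB each; "
      f"one more worker can run stage 2 in {leftover_gb} GB or do stage 1")
```

With these assumed numbers, three workers get a full share and a fourth gets the 4 GB remainder, matching the description above.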

It's not normally necessary to delete results.bench.txt and gwnum.txt.

Mprime defaults to 4 cores per worker because most users are expected to be running DC (~61M exponents) or higher, and 4 cores/worker is reasonably close to maximum total system throughput at that exponent and FFT size and somewhat above.
Deviating considerably from typical usage, as you do (very small exponents) or I do (very large exponents), means the usual near-optimal settings no longer apply.
Systems and processors also vary in what is optimal for their specific design.

Please bite the bullet and benchmark with multiple workers. It's the only way you will get close to the full capability of your system on such small exponents.
I suggest benchmarking 1 & 2 cores/worker, on 14 and 7 workers respectively. Latency of an assignment will go up, but throughput (assignments completed per day) should go up.
After that, you could try whichever cores/worker setting seems faster for your chosen work, varying the number of workers downward from all-cores-occupied. Cache efficiency may be high enough at less than the maximum possible number of workers to give higher throughput with, say, 12 cores working rather than 14.
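The latency-vs-throughput tradeoff above can be sketched with a toy calculation. All of the timings below are made-up placeholders (core scaling is never perfectly linear, which is the whole point), so replace them with your own benchmark results:

```python
# Latency vs. throughput with hypothetical timings: suppose an assignment
# takes 10 h at 4 cores/worker, but imperfect scaling means 30 h (not 40 h)
# at 1 core/worker. Both figures are assumptions, not benchmark data.
hours_per_assignment = {4: 10.0, 1: 30.0}  # cores/worker -> latency in hours
total_cores = 14

for cores, latency in hours_per_assignment.items():
    workers = total_cores // cores          # workers that fit on the CPU
    throughput = workers * 24.0 / latency   # assignments completed per day
    print(f"{cores} cores/worker: {workers} workers, "
          f"latency {latency:.0f} h, throughput {throughput:.1f}/day")
```

With these assumed numbers, per-assignment latency triples going from 4 cores/worker to 1, yet daily throughput still rises because many more assignments are in flight at once.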
You could also consider which configuration maximizes throughput per unit of system power consumption.
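That efficiency comparison is just throughput divided by measured wall power. Every figure below is a placeholder to show the shape of the calculation; you would substitute your own benchmark throughput and wattmeter readings:

```python
# Throughput-per-watt comparison with hypothetical figures.
# assignments_per_day and system_watts are assumed placeholders, not
# measurements from the system discussed above.
configs = {
    "14 workers": {"assignments_per_day": 11.2, "system_watts": 180.0},
    "12 workers": {"assignments_per_day": 10.8, "system_watts": 150.0},
}

for name, c in configs.items():
    efficiency = c["assignments_per_day"] / c["system_watts"]
    print(f"{name}: {efficiency:.4f} assignments/day per watt")
```

In this made-up example the 12-worker configuration loses a little raw throughput but wins on throughput per watt, which is the kind of tradeoff worth checking.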

Last fiddled with by kriesel on 2022-06-28 at 19:39