Old 2021-01-13, 23:25   #28
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2^3×7×127 Posts
Performance variables for gpuowl

At least the following, to varying degrees:

The gpu model has a great deal of effect. Radeon VII or RX6x00 preferred for performance/cost ratio. See https://www.mersenne.ca/cudalucas.php

OS overhead in general: headless or at least low graphical activity on the gpu(s), few services, Linux

Gpuowl version; a fairly recent one, v7.2-53 or v6.11-364 to 380

Gpuowl performance is best with -use ASM where compatible.
I think -use ASM requires support in both OpenCL and the gpu driver.
The Windows Adrenalin driver does not support ASM. https://mersenneforum.org/showpost.p...postcount=1288
The Linux amdgpupro driver (compiler?) does not support ASM. https://mersenneforum.org/showpost.p...postcount=1292
The ROCm driver supports ASM as implemented by Preda in gpuowl.
The ROCm driver is only supported/available on Linux. https://github.com/RadeonOpenCompute/ROCm
ROCm is not supported on Windows, and Windows support does not appear to be a priority. https://github.com/RadeonOpenCompute/ROCm/issues/390
"Failed ASM implies Windows. AMD's Windows OpenCL support for the Radeon VII is awful." Prime95, https://mersenneforum.org/showpost.p...&postcount=282

There's considerable variation in throughput versus fft length https://mersenneforum.org/showpost.p...postcount=2272

There's noticeable variation in performance versus driver version.
Changing ROCm version can change performance,
for better: https://mersenneforum.org/showpost.p...postcount=2043
or for worse: https://mersenneforum.org/showpost.p...postcount=2316
https://mersenneforum.org/showpost.p...postcount=1089
Similar variation or more occurs in Windows driver version performance. I've seen up to 5+% drops for Windows Adrenalin driver "upgrades", or slight improvement.

Other variables affecting performance include ambient temperature, fan curves, and power curve settings. Some users reduce power, and with it throughput, to improve performance/cost ratio. Throughput scales less than linearly with power, so a given power reduction costs proportionally less throughput.
Linux by Preda: https://mersenneforum.org/showpost.p...postcount=1623
Windows 10 by Kriesel: https://mersenneforum.org/showpost.p...postcount=1625
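
To compare power settings on a throughput-per-watt basis, something like the following minimal Python sketch could be used. The function names are hypothetical, and the numbers in the usage section are placeholders for illustration only, not measurements from any gpu; substitute your own readings of board power and iterations per second.

```python
def efficiency_table(measurements):
    """Return (watts, iters_per_sec, iters_per_joule) for each measurement.

    measurements: iterable of (watts, iters_per_sec) pairs recorded by the user.
    """
    rows = []
    for watts, iters_per_sec in measurements:
        if watts <= 0:
            raise ValueError("power must be positive")
        # iterations per second divided by watts = iterations per joule
        rows.append((watts, iters_per_sec, iters_per_sec / watts))
    return rows


def most_efficient(measurements):
    """Return the row with the highest iterations per joule."""
    return max(efficiency_table(measurements), key=lambda row: row[2])


if __name__ == "__main__":
    # Placeholder values only, NOT real measurements.
    sample = [(250.0, 1000.0), (200.0, 900.0), (150.0, 750.0)]
    for watts, ips, ipj in efficiency_table(sample):
        print(f"{watts:6.1f} W  {ips:8.1f} it/s  {ipj:.3f} it/J")
    print("most efficient:", most_efficient(sample))
```

The most efficient setting is not necessarily the best one overall; hardware cost and the value of finishing sooner also enter the performance/cost ratio.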

Also running two instances per gpu, as in https://mersenneforum.org/showpost.p...postcount=1937

Testing alternatives for same or similar fft lengths, and using the fastest that is sufficiently reliable

Overclocking, especially gpu ram, but only up to the point where it's still reliable. Explore those limits running PRP/GEC, which will very reliably and quickly detect errors. Once errors are detected, back off to where the error rate is less than 1/week, and then a little bit lower for stability in LL DC or P-1. There are reports of reliable memory clock rates up to 20% above nominal on some Radeon VIIs. (Hynix HBM2 Vram is overclockable; Samsung is not reliable even at nominal rate, or considerably below.)

Specify a judiciously chosen value of -log (interval) in config.txt, to optimize overhead of GEC versus loss of progress due to GEC detected errors. This will take effect upon the next application start. Savings will depend upon the error rate. See the discussion including Robert Gerbicz' post at https://mersenneforum.org/showthread.php?t=26626
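
The tradeoff can be illustrated with a generic checkpoint-interval model. This is a toy sketch, not a description of how gpuowl actually schedules its checks: it assumes each check costs a fixed number of iteration-equivalents (check_cost), that errors occur independently at a small rate per iteration (error_rate), and that each detected error discards about one interval of work. All names and parameters here are hypothetical.

```python
import math


def expected_overhead(interval, check_cost, error_rate):
    """Approximate fraction of work lost to checking plus redone iterations.

    interval   : iterations between checks
    check_cost : cost of one check, in iteration-equivalents (assumed constant)
    error_rate : errors per iteration (assumed small and independent)
    """
    if interval <= 0 or check_cost < 0 or error_rate < 0:
        raise ValueError("interval must be positive; costs and rates non-negative")
    check_fraction = check_cost / interval   # amortized cost of checking
    redo_fraction = error_rate * interval    # about one interval lost per error
    return check_fraction + redo_fraction


def best_interval(check_cost, error_rate):
    """Interval minimizing expected_overhead under the assumptions above."""
    if check_cost <= 0 or error_rate <= 0:
        raise ValueError("check_cost and error_rate must be positive")
    return math.sqrt(check_cost / error_rate)
```

Under this model the best interval grows as the error rate falls, which matches the point above that the savings depend on the error rate.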

Specify a judiciously chosen value of -block (the small GEC block size) to optimize overhead of GEC versus loss of progress due to GEC detected errors. This will take effect at the beginning of the next PRP exponent (not upon a GpuOwl restart, unless interim files for the exponent are deleted or unreadable). GEC overhead is 0.5% at -block 400 (current default), and 0.2% at -block 1000, per Preda.
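
Both quoted overhead figures are consistent with overhead being roughly 2 divided by the block size (2/400 = 0.5%, 2/1000 = 0.2%). The sketch below uses that inverse relationship as an assumption fitted to those two points only; values for other block sizes are extrapolations, not figures from Preda. The function name is hypothetical.

```python
def gec_overhead_fraction(block_size, constant=2.0):
    """Estimated GEC overhead as a fraction of run time.

    Assumes overhead = constant / block_size, with constant = 2.0 chosen
    only because it reproduces the two quoted figures (an assumption).
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return constant / block_size


if __name__ == "__main__":
    for block in (400, 1000):
        print(f"-block {block}: about {gec_overhead_fraction(block):.1%} overhead")
```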


Top of this reference thread: https://www.mersenneforum.org/showthread.php?t=23391
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2021-07-10 at 15:36 Reason: added -block and -log effect