By way of p.s. to my above post: I finally got around to firing up a 2nd gpuOwl run on my Radeon VII. When I first installed it, Preda told me "based on current data, 1-job running is about the same as 2-job in terms of total throughput per watt", but more recently George suggested that 2-job in fact remains better in those terms. I was unable to use Matt's radeon_setup.sh script to do the manual clock/mem tunings he implements there: even as root I get "Permission denied" whenever I try to touch any of the entries under /sys/class/drm/card0/device. So I just stuck with my current manual sclk = 5 downclock setting, created a 2nd run-subdir under the gpuowl one, and fired up a 2nd job. Both are running @5632K FFT; here are the before and after timings:
Before: Job 1: 753 us/iter
After: Job 1: 1407 us/iter, Job 2: 1407 us/iter

Wattage barely budged (up less than 5%), so total throughput is up 7%, slightly less than that on a per-watt basis. (But definitely better in per-watt terms than what I get from cranking the sclk setting up to 6 for a single run.)
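For anyone wanting to redo this arithmetic with their own timings, here is a small Python sketch (my own illustration, not part of gpuowl). It converts per-iteration times into iterations per second, sums the rates of the concurrent jobs, and compares against the single job. The power ratio is an assumption: the post only says the increase was under 5%, so 1.05 is used as an upper bound.

```python
def throughput_gain(single_us, concurrent_us):
    """Ratio of total iterations/second with several concurrent jobs vs. one job.

    single_us: per-iteration time (microseconds) of the job running alone.
    concurrent_us: per-iteration times (microseconds) of each job run together.
    """
    single_rate = 1e6 / single_us
    concurrent_rate = sum(1e6 / t for t in concurrent_us)
    return concurrent_rate / single_rate


def per_watt_gain(throughput_ratio, power_ratio):
    """Throughput per watt relative to before, given the new/old power ratio."""
    return throughput_ratio / power_ratio


gain = throughput_gain(753, [1407, 1407])
print(f"total throughput: {100 * (gain - 1):.1f}% higher")

# Assumption: the exact wattage increase was not posted ("up less than 5%"),
# so 1.05 is only the stated upper bound, giving a lower bound on the gain.
print(f"per watt at +5% power: {100 * (per_watt_gain(gain, 1.05) - 1):.1f}% higher")
```

With the timings above, the first line prints 7.0%, matching the 7% quoted. The per-watt figure depends on the actual power increase, so plug in a measured ratio if you have one.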
A heads up for people working with GpuOwl's source code, about some build changes in recent commits.
In gpuowl.cl there is a lot of duplication between similar kernels that differ in only a few small ways. One example is carryFused and carryFusedMul, which do almost the same thing, except that carryFusedMul also does a multiplication by 3. Unfortunately there is no good mechanism in OpenCL proper to share the common code between the two kernels without a potential performance hit. Rather than having the code duplicated between the two kernels, I chose to add a simple form of macro expansion, implemented by the Python script tools/expand.py. This script interprets a few special-form comments in gpuowl.cl:
'//{{' : start a named block
'//}}' : end a named block
'//==' : instantiate a block
This scheme allows one to define the body of a kernel once, and then instantiate it multiple times in different contexts.

From a build perspective, the path is now:
gpuowl.cl -> gpuowl-expanded.cl -> gpuowl-wrap.cpp

The file gpuowl-expanded.cl is generated; there is no point in editing it, as the source is still gpuowl.cl. Being generated, gpuowl-expanded.cl does not need to be under source control, but for now I added it as a convenience for people who don't have Python installed or have difficulty executing tools/expand.py for some reason. (In the future, if everybody is fine building with expand.py, I can remove gpuowl-expanded.cl from source control.)
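To illustrate the idea behind this kind of comment-driven expansion, here is a minimal Python sketch. It is NOT the actual tools/expand.py: the argument syntax after '//{{' and '//==' (a block name, then KEY=VALUE text substitutions) and the choice not to emit a block at its definition site are assumptions made for this example.

```python
def expand(lines):
    """Expand named blocks at each instantiation site (illustrative sketch).

    Hypothetical syntax assumed here, not taken from gpuowl:
      //{{ NAME            start recording block NAME (not emitted in place)
      //}}                 stop recording
      //== NAME A=B C=D    emit NAME's body, replacing text A with B, C with D
    Nested blocks and substitution values containing spaces are not supported.
    """
    blocks = {}      # block name -> list of recorded body lines
    out = []
    current = None   # name of the block currently being recorded, if any
    for line in lines:
        s = line.strip()
        if s.startswith("//{{"):
            current = s[4:].strip()
            blocks[current] = []
        elif s.startswith("//}}"):
            current = None
        elif s.startswith("//=="):
            name, *args = s[4:].split()
            subs = [arg.split("=", 1) for arg in args]  # each arg must be KEY=VALUE
            for body in blocks[name]:
                for old, new in subs:
                    body = body.replace(old, new)
                out.append(body)
        elif current is not None:
            blocks[current].append(line)  # inside a definition: record only
        else:
            out.append(line)              # ordinary line: pass through unchanged
    return out


src = [
    "//{{ CARRY",
    "void NAME() { MUL }",
    "//}}",
    "//== CARRY NAME=carryFused MUL=",
    "//== CARRY NAME=carryFusedMul MUL=x*=3;",
]
print("\n".join(expand(src)))
```

Here one block body yields two kernels, mirroring the carryFused / carryFusedMul case in spirit. Since the substitution is plain text replacement, placeholder names should be chosen so they cannot collide with real identifiers.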
[QUOTE=kriesel;539701]Whatever the performance run parameters are, all relevant parameters should be stated along with the timing, so that the timing is not meaningless.[/QUOTE]
+1
ROCm 3.1, sclk 3, mem 1180, FFT 5M: 708us/it. (150W)
ROCm 2.10 sclk 3 mem 1150 FFT 5632K gpuowl v6.11-134-g1e0ce1d 800 us/it 194.00W 93C
I am still a bit scared to upgrade to ROCm 3.1... Well, I tried to upgrade and now I get:
[code]
./gpuowl -device 1
2020-03-16 17:10:53 gpuowl v6.11-197-g3886a11
2020-03-16 17:10:53 Note: not found 'config.txt'
2020-03-16 17:10:53 config: -device 1
2020-03-16 17:10:53 device 1, unique id 'f582388172fd5d41'
2020-03-16 17:10:53 f582388172fd5d41 worktodo.txt line ignored: ""
2020-03-16 17:10:53 f582388172fd5d41 999xxxxx FFT 5632K: Width 256x4, Height 64x4, Middle 11; 17.34 bits/word
Segmentation fault
[/code]
[code]
uname -a
Linux honeypot9 4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u2 (2019-11-11) x86_64 GNU/Linux
[/code]
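Since power draw is being quoted alongside the timings, one way to compare runs is energy per iteration (watts times seconds per iteration). Below is a small Python sketch of that conversion, using the two figures posted above; it is my own illustration. As kriesel pointed out, all parameters matter: these two runs differ in FFT size (5M vs 5632K), ROCm version and memory clock, so this is not a like-for-like comparison.

```python
def joules_per_iteration(us_per_iter, watts):
    """Energy per iteration: power (W = J/s) times time per iteration (s)."""
    return watts * us_per_iter * 1e-6


# Figures as posted in this thread. FFT sizes and ROCm versions differ,
# so the numbers are not directly comparable.
runs = {
    "ROCm 3.1, sclk 3, mem 1180, FFT 5M": (708, 150),
    "ROCm 2.10, sclk 3, mem 1150, FFT 5632K": (800, 194),
}
for label, (us, w) in runs.items():
    print(f"{label}: {joules_per_iteration(us, w):.4f} J/iter")
```

Lower J/iter means better efficiency; for a fairer comparison the runs would need the same FFT length.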
[QUOTE=paulunderwood;539853]ROCm 2.10 sclk 3 mem 1150 FFT 5632K gpuowl v6.11-134-g1e0ce1d 800 us/it 194.00W 93C
I am still a bit scared to upgrade to ROCm 3.1... Well, I tried to upgrade and now I get:
[code]
./gpuowl -device 1
2020-03-16 17:10:53 gpuowl v6.11-197-g3886a11
2020-03-16 17:10:53 Note: not found 'config.txt'
2020-03-16 17:10:53 config: -device 1
2020-03-16 17:10:53 device 1, unique id 'f582388172fd5d41'
2020-03-16 17:10:53 f582388172fd5d41 worktodo.txt line ignored: ""
2020-03-16 17:10:53 f582388172fd5d41 999xxxxx FFT 5632K: Width 256x4, Height 64x4, Middle 11; 17.34 bits/word
Segmentation fault
[/code]
[code]
uname -a
Linux honeypot9 4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u2 (2019-11-11) x86_64 GNU/Linux
[/code][/QUOTE]
I am willing to try a different distro for ROCm 3.1 -- any suggestions?
[QUOTE=preda;539852]ROCm 3.1, sclk 3, mem 1180, FFT 5M: 708us/it. (150W)[/QUOTE]
[QUOTE=paulunderwood;539853]ROCm 2.10 sclk 3 mem 1150 FFT 5632K gpuowl v6.11-134-g1e0ce1d 800 us/it 194.00W 93C[/QUOTE] Are those with 1 worker or 2? Also, what OS distro are you guys running? As I noted, I am not allowed, even as su, to fiddle the mem-clock settings in my ROCm 2.10 setup under Ubuntu 19.10. |
[QUOTE=ewmayer;539872]Are those with 1 worker or 2? Also, what OS distro are you guys running? As I noted, I am not allowed, even as su, to fiddle the mem-clock settings in my ROCm 2.10 setup under Ubuntu 19.10.[/QUOTE]
I wrestled back the ROCm 2.10.0 driver... With 2 gpuowl instances (same settings, sclk 3 etc., but with 5 extra watts and gpuowl v6.11-197-g3886a11-dirty) I am getting ~1475 us/it each. Thanks for prompting me, Ernst -- a great speed-up :tu:
[QUOTE=paulunderwood;539862]I am willing to try a different distro for ROCm 3.1 -- any suggestions?[/QUOTE]
I'm using Ubuntu 19.10 with Linux kernel 5.4.24. I also tried kernels 5.5.x, 5.6.x and they work too. |
[QUOTE=preda;539879]I'm using Ubuntu 19.10 with Linux kernel 5.4.24. I also tried kernels 5.5.x, 5.6.x and they work too.[/QUOTE]
Has anyone gotten ROCm 3.1 to work on Ubuntu 19.04? I've tried 3 times without success.
-O2 or not
[QUOTE=paulunderwood;539876]
With 2 gpuowl instances (with same settings (sclk 3 etc, but with 5 extra Watts and gpuowl v6.11-197-g3886a11-dirty)) I am getting ~1475 us/it each. Thanks for prompting me Ernst -- a great speed-up :tu:[/QUOTE] This was compiled without -O2 in the Makefile. With it, iterations take 1487 us and the power draw is 1 or 2 watts lower.