gpuOwL: an OpenCL program for Mersenne primality testing (https://www.mersenneforum.org/showthread.php?t=22204)

ewmayer 2020-03-14 21:45

As a p.s. to my post above: I finally got around to firing up a 2nd gpuOwl run on my Radeon VII. When I first installed it, Preda told me "based on current data, 1-job running is about the same as 2-job in terms of total throughput per watt", but more recently George suggested that 2-job in fact remains better in those terms. I was unable to use Matt's radeon_setup.sh script to do the manual clock/mem tunings he implements there: even as root I get "Permission denied" whenever I try to touch any of the entries under /sys/class/drm/card0/device. So I just stuck with my current manual sclk = 5 downclock setting, created a 2nd run-subdir under the gpuowl one, and fired up a 2nd job. Both are running @5632K FFT; here are the before and after timings:

Before:
Job 1: 753 us/iter

After:
Job 1: 1407 us/iter
Job 2: 1407 us/iter

Wattage barely budged (up less than 5%), so total throughput is up 7% (two jobs at 1407 us/iter are equivalent to one job at 703.5 us/iter, versus 753 before), and slightly less than that on a per-watt basis. (But definitely better in per-watt terms than I get from cranking the sclk setting up to 6 for a single run.)
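For anyone who wants to redo this arithmetic with their own timings, here is a minimal Python sketch. The function name is made up for illustration, and the 5% power increase is only the upper bound quoted above, not a measured figure.

```python
def aggregate_throughput(us_per_iter, jobs=1):
    """Total iterations per second across all jobs sharing one GPU."""
    return jobs * 1e6 / us_per_iter

before = aggregate_throughput(753)         # one job at 753 us/iter
after = aggregate_throughput(1407, jobs=2) # two jobs at 1407 us/iter each

gain = after / before - 1                  # relative throughput change

# Assumption: power rose by at most 5%, so this is a lower bound
# on the per-watt improvement, not a measurement.
max_power_ratio = 1.05
per_watt_gain_floor = (after / before) / max_power_ratio - 1

print(f"throughput gain: {gain:.1%}")
print(f"per-watt gain, at least: {per_watt_gain_floor:.1%}")
```

With the numbers above, the throughput gain comes out near 7%, and the per-watt gain stays positive even if the power increase were the full 5%.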

preda 2020-03-15 10:39

A heads up for people working with GpuOwl's source code, about some build changes in recent commits.

In gpuowl.cl there is a lot of duplication between similar kernels with a few small changes between them; an example being carryFused and carryFusedMul, which do almost the same thing with the difference that carryFusedMul also does a multiplication-by-3. Unfortunately there is no good mechanism in OpenCL proper to share the common code between the two kernels without a potential performance hit.

Rather than having the code duplicated between the two kernels, I chose to add a simple form of macro expansion, which is implemented by the python script tools/expand.py. This script interprets a few special-form comments in gpuowl.cl:
'//{{' : start a named block
'//}}' : end a named block
'//==' : instantiate a block

This scheme allows defining the body of a kernel once, and then instantiating it multiple times in different contexts.
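To illustrate the idea, here is a minimal Python sketch of such an expander. This is not the code of tools/expand.py. The only things taken from the post are the three comment markers; the block-name syntax, the KEY=VALUE arguments on the '//==' line, and the textual substitution are all assumptions made for the example.

```python
import re

def expand(source: str) -> str:
    """Expand named blocks; a hypothetical stand-in for tools/expand.py."""
    blocks = {}     # block name -> list of recorded body lines
    current = None  # name of the block currently being recorded
    out = []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("//{{"):
            # Start a named block (assumed syntax: "//{{ NAME").
            current = stripped[4:].strip()
            blocks[current] = []
        elif stripped.startswith("//}}"):
            # End the block being recorded.
            current = None
        elif stripped.startswith("//=="):
            # Instantiate (assumed syntax: "//== NAME KEY=VALUE, KEY=VALUE").
            name, *args = re.split(r"[,\s]+", stripped[4:].strip())
            subs = dict(arg.split("=", 1) for arg in args if arg)
            for body_line in blocks[name]:
                for key, value in subs.items():
                    pattern = rf"\b{re.escape(key)}\b"
                    body_line = re.sub(pattern, lambda m: value, body_line)
                out.append(body_line)
        elif current is not None:
            blocks[current].append(line)  # inside a block: record only
        else:
            out.append(line)              # ordinary line: pass through
    return "\n".join(out)

# Made-up input, loosely modelled on the carryFused/carryFusedMul case.
demo = """//{{ CARRY
KERNEL NAME(void) { carry(DO_MUL3); }
//}}
//== CARRY NAME=carryFused, DO_MUL3=0
//== CARRY NAME=carryFusedMul, DO_MUL3=1
"""
expanded = expand(demo)
print(expanded)
```

In this sketch the block body is emitted only where it is instantiated, once per '//==' line, so the two kernels share one definition in the source while the expanded file still contains two complete kernels.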

From a build perspective, the path is now:
gpuowl.cl -> gpuowl-expanded.cl -> gpuowl-wrap.cpp

The file gpuowl-expanded.cl is generated; there is no point in editing it, the source is still gpuowl.cl.

gpuowl-expanded.cl, being generated, does not need to be under source control, but for now I added it as a convenience for people who don't have python installed or have difficulty executing tools/expand.py for some reason. (In the future, if everybody is fine building with expand.py, I can remove gpuowl-expanded.cl from source control.)

LaurV 2020-03-16 02:45

[QUOTE=kriesel;539701]Whatever the performance run parameters are, all relevant parameters should be stated along with the timing, so that the timing is not meaningless.[/QUOTE]
+1

preda 2020-03-16 15:06

ROCm 3.1, sclk 3, mem 1180, FFT 5M: 708us/it. (150W)

paulunderwood 2020-03-16 15:14

ROCm 2.10 sclk 3 mem 1150 FFT 5632K gpuowl v6.11-134-g1e0ce1d 800 us/it 194.00W 93C

I am still a bit scared to upgrade to ROCm 3.1...

Well, I tried to upgrade and now I get:

[code]
./gpuowl -device 1
2020-03-16 17:10:53 gpuowl v6.11-197-g3886a11
2020-03-16 17:10:53 Note: not found 'config.txt'
2020-03-16 17:10:53 config: -device 1
2020-03-16 17:10:53 device 1, unique id 'f582388172fd5d41'
2020-03-16 17:10:53 f582388172fd5d41 worktodo.txt line ignored: ""
2020-03-16 17:10:53 f582388172fd5d41 999xxxxx FFT 5632K: Width 256x4, Height 64x4, Middle 11; 17.34 bits/word
Segmentation fault
[/code]

[code]
uname -a
Linux honeypot9 4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u2 (2019-11-11) x86_64 GNU/Linux
[/code]

paulunderwood 2020-03-16 17:49

[QUOTE=paulunderwood;539853]ROCm 2.10 sclk 3 mem 1150 FFT 5632K gpuowl v6.11-134-g1e0ce1d 800 us/it 194.00W 93C

I am still a bit scared to upgrade to ROCm 3.1...

Well, I tried to upgrade and now I get: [segfault log and uname output snipped][/QUOTE]


I am willing to try a different distro for ROCm 3.1 -- any suggestions?

ewmayer 2020-03-16 19:20

[QUOTE=preda;539852]ROCm 3.1, sclk 3, mem 1180, FFT 5M: 708us/it. (150W)[/QUOTE]

[QUOTE=paulunderwood;539853]ROCm 2.10 sclk 3 mem 1150 FFT 5632K gpuowl v6.11-134-g1e0ce1d 800 us/it 194.00W 93C[/QUOTE]

Are those with 1 worker or 2? Also, what OS distro are you guys running? As I noted, I am not allowed, even as su, to fiddle the mem-clock settings in my ROCm 2.10 setup under Ubuntu 19.10.

paulunderwood 2020-03-16 20:07

[QUOTE=ewmayer;539872]Are those with 1 worker or 2? Also, what OS distro are you guys running? As I noted, I am not allowed, even as su, to fiddle the mem-clock settings in my ROCm 2.10 setup under Ubuntu 19.10.[/QUOTE]

I wrestled back the ROCm 2.10.0 driver...

With 2 gpuowl instances (with same settings (sclk 3 etc, but with 5 extra Watts and gpuowl v6.11-197-g3886a11-dirty)) I am getting ~1475 us/it each. Thanks for prompting me Ernst -- a great speed-up :tu:
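A rough way to put these numbers on a per-watt footing is energy per iteration (watts times effective seconds per iteration). The Python sketch below assumes the single-instance baseline is the 194 W, 800 us/it figure reported earlier and that "5 extra Watts" means 199 W. Note that the two measurements used different gpuowl builds, so this is indicative only, and the function name is made up for illustration.

```python
def joules_per_iter(watts, us_per_iter, jobs=1):
    """Energy per iteration; jobs sharing the GPU divide the effective time."""
    effective_seconds = (us_per_iter / jobs) * 1e-6
    return watts * effective_seconds

# Assumed baseline: one instance, 194 W at 800 us/it.
single = joules_per_iter(194.0, 800)

# Assumed: two instances, 194 + 5 = 199 W at 1475 us/it each.
dual = joules_per_iter(199.0, 1475, jobs=2)

saving = 1 - dual / single
print(f"single: {single:.4f} J/iter, dual: {dual:.4f} J/iter")
print(f"energy saved per iteration: {saving:.1%}")
```

Under those assumptions the two-instance setup uses roughly 5% less energy per iteration, which is consistent with the speed-up reported above.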

preda 2020-03-16 20:46

[QUOTE=paulunderwood;539862]I am willing to try a different distro for ROCm 3.1 -- any suggestions?[/QUOTE]

I'm using Ubuntu 19.10 with Linux kernel 5.4.24. I also tried kernels 5.5.x, 5.6.x and they work too.

Prime95 2020-03-16 22:04

[QUOTE=preda;539879]I'm using Ubuntu 19.10 with Linux kernel 5.4.24. I also tried kernels 5.5.x, 5.6.x and they work too.[/QUOTE]

Has anyone gotten ROCm 3.1 to work on Ubuntu 19.04? I've tried 3 times without success.

paulunderwood 2020-03-17 06:23

-O2 or not
 
[QUOTE=paulunderwood;539876]

With 2 gpuowl instances (with same settings (sclk 3 etc, but with 5 extra Watts and gpuowl v6.11-197-g3886a11-dirty)) I am getting ~1475 us/it each. Thanks for prompting me Ernst -- a great speed-up :tu:[/QUOTE]

This was compiled without -O2 in the Makefile. With -O2, the iterations take 1487 us each and the power usage is 1 or 2 Watts lower.

