[QUOTE=Xebecer;547219]-pool C:\Users\xebec\Desktop\GPUOwl Shared in the config.txt file. I get:
Can't open 'C' (mode 'ab') Exception NSt10filesystem7__cxx1116filesystem_errorE" filesystem error" can't open file" No error [C"\Users\xebec\Desktop\GPUOwl_Shared/] Bye[/QUOTE]
It's pretty messed up: something replaced the ':' character in the error message with the '"' (quote) character. It's also missing the expected "results.txt" at the end. For comparison, this is how I'd expect that error to look:
[QUOTE]2020-06-06 08:06:16 Can't open 'C:\Foo\bar/results.txt' (mode 'ab')
2020-06-06 08:06:16 Exception NSt10filesystem7__cxx1116filesystem_errorE: filesystem error: can't open file: Success [C:\Foo\bar/results.txt][/QUOTE]
Maybe you could attach the full log of gpuowl start-up, which should contain the full config options that it sees. |
[QUOTE=kriesel;547079]Is that so regardless of gpu model? I note Radeon VII gpus have a serial number built in. (Cpuid hwinfo produced this on Windows. RX480 and RX550 do not have such serial numbers.)[/QUOTE]
No, I would expect that the availability of unique_id depends on the GPU model. The Radeon VII has it; others may not. If the file /sys/class/drm/cardN/device/unique_id is there, it's likely to have the id information; otherwise not. |
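For Linux users, a quick way to check which cards expose an id — a minimal sketch assuming the amdgpu sysfs layout described above, with the root directory parameterized for illustration:

```shell
# List GPUs that expose a unique_id under a sysfs-style root (default /sys/class/drm).
# Cards without the file (e.g. RX 480/550) are silently skipped.
print_unique_ids() {
  root="${1:-/sys/class/drm}"
  for f in "$root"/card*/device/unique_id; do
    [ -r "$f" ] || continue
    card=$(basename "$(dirname "$(dirname "$f")")")
    echo "$card: $(cat "$f")"
  done
}
```

Calling `print_unique_ids` with no argument scans the real sysfs tree.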
gpuowl-win v6.11-312-gc69350e failed to build
1 Attachment(s)
Same ambiguous overload error as -310 and -311.
I would try an earlier commit but don't know the proper magic git incantation. |
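For the record, the incantation is just a detached-HEAD checkout of an older commit — a sketch, with the repository path and step count as placeholders:

```shell
# Check out the commit N steps behind HEAD (assumes a linear history).
# 'git checkout <branch>' returns to the tip afterwards.
checkout_back() {
  repo="$1"; steps="$2"
  git -C "$repo" checkout -q "HEAD~$steps"
}
```

For example, `checkout_back ~/gpuowl 3` tries three commits back; `git log --oneline` in the repo shows SHAs that can be checked out directly with `git checkout <sha>`.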
Draft for gpuowl radeon vii tuning
This may become a gpuowl reference thread post later.
#1, get the latest gpuowl version. There have been lots of performance improvements lately, and some added features (LLDC, Jacobi check; PRP proof if you can build it) It's unclear how much performance depends on PCIe width, version, extender type if any, etc. Before starting tuning, document baseline performance and configuration. Do parameter testing while running PRP with GEC, to reliably detect unreliability. Run a gpu monitoring application such as GPU-Z, nvidia-smi or rocm-smi if possible Initially changes can be made rather quickly, and a quick GEC error is quick feedback you've been too aggressive with the settings, but to ensure the gpu is reliable, final selections should be watched for hours or days of error-free operation. Only after days error free is achieved, should any LL or P-1 runs be attempted. Increase memory clock from default (some are able to run as high as 1200Mhz, +20% above nominal) On my setup, I see about 1% performance gain for 5% memory clock increase. (This may be limited because it's in a warm area.) Undervolting. Which voltage(s) to adjust, what are people getting away with relative to what original settings? [URL]https://mersenneforum.org/showpost.php?p=533050&postcount=1630[/URL] is unclear to me Presumably it's what GPU-z calls vddc, the only voltage displayed, and what Radeon software Adrenaline 2020 calls gpu voltage, the only one offered for modification. What's the benefit of undervolting: Allowing higher clocks during thermal limiting or power limiting? Saving on power cost? linux: to manage power requirements, use lower sclk. Windows equivalent: directly adjust gpu clock sclk 5 (highest) 1684 sclk 4 ~1547 Mhz per philf [URL]https://mersenneforum.org/showpost.php?p=534334&postcount=1698[/URL] sclk 3 1373 sclk 2 ? sclk 1? sclk 0? Apparently these vary a bit; preda gave 1520 for sclk 4. In [URL]https://mersenneforum.org/showpost.php?p=533050&postcount=1630[/URL] preda gives an example bash script and describes parameters. 
see also [URL]https://mersenneforum.org/showpost.php?p=533072&postcount=1632[/URL] Fiddle with fan curve? Default on Windows is only 75% fan at 105C hot spot temp (corresponding to ~80C nominal gpu temp) (I see references such as in [URL]https://mersenneforum.org/showpost.php?p=533143&postcount=1642[/URL] by linux users to setting fan well above 100. What are the units of setfan in linux?) AMD Radeon software on Windows also allows to set a power limit up to +20% or down to -20% relative to nominal Name and save a profile with the resulting gpus-specific tuning settings in the Windows Adrenalin software, so it can easily be reloaded after a system start. After other tuning is done, if you have enough similar work, run two instances per gpu, for a bit more throughput at the cost of about double latency. Is that still worthwhile doing with current commits? Same computation type, PRP & PRP, or LLDC & LLDC, same fft length recommended. Ideally they will be a bit out of phase, so that when one instance is writing to disk or communicating between gpu ram and system/cpu ram, the other is utilizing the gpu computing resources. If work is too dissimilar, two instances will have lower combined throughput than one. Try, measure, adjust. See [URL]https://mersenneforum.org/showpost.php?p=532134&postcount=1507[/URL] Linux is supposedly faster than Windows, perhaps due to lower driver overhead. Does anyone have numbers for that on the same hardware? In bitcoin mining multi-gpu setup howtos, they advise turning off various things in the BIOS, as part of the process of preparing the system to support a large number of gpus. Is any of that known to be relevant or irrelevant to gpuowl performance? What else? |
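For the Linux sclk route mentioned above, the knobs live in sysfs — a sketch assuming the amdgpu driver's pp_dpm_sclk interface (root privileges needed on real hardware; the device path is parameterized for illustration):

```shell
# Force a specific sclk DPM level (e.g. level 4 for ~1547 MHz on a Radeon VII).
# The performance level must be set to 'manual' before a level can be pinned.
set_sclk_level() {
  dev="${1:-/sys/class/drm/card0/device}"
  lvl="$2"
  echo manual > "$dev/power_dpm_force_performance_level"
  echo "$lvl" > "$dev/pp_dpm_sclk"
}
```

For example, `set_sclk_level /sys/class/drm/card0/device 4`; afterwards `cat /sys/class/drm/card0/device/pp_dpm_sclk` shows the active level marked with `*`.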
1 Attachment(s)
We have gpuowl working on our (single-PCI slot) W5500. (We are also using it as our main display card.)
We had to change the makefile's "LIBPATH" to a different place: [C]LIBPATH = -L/opt/amdgpu-pro/lib64 -L.[/C]
We have a sample test running. We don't yet know how to decipher the information presented, but at least it works! We are very surprised that gpuowl runs as our normal user.
As far as performance, how does this look?[CODE]2020-06-05 17:13:13 gpuowl v6.11-312-gc69350e-dirty
2020-06-05 17:13:13 Note: not found 'config.txt'
2020-06-05 17:13:13 device 0, unique id ''
2020-06-05 17:13:13 gfx1012-0 77936867 FFT: 4M 1K:8:256 (18.58 bpw)
2020-06-05 17:13:13 gfx1012-0 Expected maximum carry32: 583B0000
2020-06-05 17:13:13 gfx1012-0 OpenCL args "-DEXP=77936867u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=8u -DWEIGHT_STEP=0x1.5621686be7602p+0 -DIWEIGHT_STEP=0x1.7f1af377e822p-1 -DWEIGHT_BIGSTEP=0x1.306fe0a31b715p+0 -DIWEIGHT_BIGSTEP=0x1.ae89f995ad3adp-1 -DPM1=0 -DAMDGPU=1 -DMM_CHAIN=1u -DMM2_CHAIN=1u -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2020-06-05 17:13:16 gfx1012-0 OpenCL compilation in 3.10 s
2020-06-05 17:13:17 gfx1012-0 77936867 OK 0 loaded: blockSize 400, 0000000000000003
2020-06-05 17:13:21 gfx1012-0 77936867 OK 800 0.00%; 2982 us/it; ETA 2d 16:34; 1579c241dc63eca6 (check 1.27s)
2020-06-05 17:23:18 gfx1012-0 77936867 OK 200000 0.26%; 2991 us/it; ETA 2d 16:35; f0b04b45b0855bd2 (check 1.28s)
2020-06-05 17:33:15 gfx1012-0 77936867 OK 400000 0.51%; 2979 us/it; ETA 2d 16:10; c03f94396a5aa29e (check 1.27s)
2020-06-05 17:43:17 gfx1012-0 77936867 OK 600000 0.77%; 3004 us/it; ETA 2d 16:32; b9decd65ca71b629 (check 1.28s)[/CODE]PS - [C]Linux xii 4.18.0-147.8.1.el8_1.x86_64 #1 SMP Thu Apr 9 13:49:54 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux[/C] :mike: |
Some gpuowl Radeon VII tuning data from the trenches
1 Attachment(s)
These are not final tunes, just what gives tolerable error rates for now. The rolloff in GhzD/day at higher fft lengths was surprisingly high, at nearly 2:1. The peak I've achieved here at 5M is noticeably lower than what George posted (510 GhzD/day back in March, with an earlier, slower version, on Linux). |
[QUOTE=Xyzzy;547247]
As far as performance, how does this look?[CODE]2020-06-05 17:13:13 gpuowl v6.11-312-gc69350e-dirty
2020-06-05 17:13:13 Note: not found 'config.txt'
2020-06-05 17:13:13 device 0, unique id ''
2020-06-05 17:13:13 gfx1012-0 77936867 FFT: 4M 1K:8:256 (18.58 bpw)
2020-06-05 17:13:13 gfx1012-0 Expected maximum carry32: 583B0000
2020-06-05 17:13:13 gfx1012-0 OpenCL args "-DEXP=77936867u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=8u -DWEIGHT_STEP=0x1.5621686be7602p+0 -DIWEIGHT_STEP=0x1.7f1af377e822p-1 -DWEIGHT_BIGSTEP=0x1.306fe0a31b715p+0 -DIWEIGHT_BIGSTEP=0x1.ae89f995ad3adp-1 -DPM1=0 -DAMDGPU=1 -DMM_CHAIN=1u -DMM2_CHAIN=1u -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2020-06-05 17:13:16 gfx1012-0 OpenCL compilation in 3.10 s
2020-06-05 17:13:17 gfx1012-0 77936867 OK 0 loaded: blockSize 400, 0000000000000003
2020-06-05 17:13:21 gfx1012-0 77936867 OK 800 0.00%; 2982 us/it; ETA 2d 16:34; 1579c241dc63eca6 (check 1.27s)
2020-06-05 17:23:18 gfx1012-0 77936867 OK 200000 0.26%; 2991 us/it; ETA 2d 16:35; f0b04b45b0855bd2 (check 1.28s)
2020-06-05 17:33:15 gfx1012-0 77936867 OK 400000 0.51%; 2979 us/it; ETA 2d 16:10; c03f94396a5aa29e (check 1.27s)
2020-06-05 17:43:17 gfx1012-0 77936867 OK 600000 0.77%; 3004 us/it; ETA 2d 16:32; b9decd65ca71b629 (check 1.28s)[/CODE]:mike:[/QUOTE]
That's a little slower than my RX480, which with gpuowl-win v6.11-292 runs a 4.5M fft (at 2808 us/it) on Windows 7 in a cramped, warm HP Z600 workstation tower. Unfortunately [url]https://www.mersenne.ca/cudalucas.php[/url] doesn't know anything about a W5500. Maybe you could run and send James a benchmark. |
gpuowl-win v6.11-295 build (may be the last for a while)
2 Attachment(s)
[QUOTE=kriesel;547236]Same ambiguous overload error as -310 and -311.
I would try an earlier commit but don't know the proper magic git incantation.[/QUOTE]
Using git bisect iteratively, per [URL]https://git-scm.com/docs/git-bisect[/URL], it appears the last gpuowl commit without the "ambiguous overload" fatal build issue on MSYS2 for Windows is v6.11-295-gaecf041; v6.11-296-g33e2d8e is bad.
$ git bisect good v6.11-295-gaecf041
[CODE]33e2d8ef73d81c581fc0d0aa161445ddefb03c18 is the first bad commit
commit 33e2d8ef73d81c581fc0d0aa161445ddefb03c18
Author: Mihai Preda <mhpreda@gmail.com>
Date:   Mon May 25 23:39:37 2020 +1000

    In work: proof construction blueprint

 Args.cpp    |  6 +----
 GmpUtil.cpp |  2 +-
 GmpUtil.h   |  2 +-
 Gpu.cpp     | 79 +++++++++++++++++++++++++++++++++++----------------------
 Gpu.h       |  6 +++--
 ProofSet.h  | 84 ++++++++++++++++++++++++++++++++++++++++++++++---------------
 Task.cpp    |  8 +++++-
 main.cpp    |  1 +
 8 files changed, 128 insertions(+), 60 deletions(-)[/CODE] |
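When the failure is a build error, this kind of bisect can also be automated with `git bisect run` — a sketch; it assumes the supplied build command exits nonzero exactly on the broken commits:

```shell
# Bisect between a known-good and known-bad ref using $cmd as the test.
# Prints the first bad commit's SHA, then restores the original HEAD.
bisect_build() {
  repo="$1"; good="$2"; bad="$3"; cmd="$4"
  git -C "$repo" bisect start "$bad" "$good" >/dev/null &&
  git -C "$repo" bisect run sh -c "$cmd" >/dev/null
  git -C "$repo" rev-parse refs/bisect/bad
  git -C "$repo" bisect reset >/dev/null 2>&1
}
```

For this thread's case, something like `bisect_build ~/gpuowl v6.11-295-gaecf041 HEAD 'make'` would re-find the first commit where the build breaks (commit names from the bisect above; adjust to your checkout).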
FWIW, here is how we got gpuowl working on a clean install of CentOS 8.
As root:[CODE]yum update
yum install gmp-devel.x86_64
cd ~
wget https://drivers.amd.com/drivers/linux/amdgpu-pro-19.50-1011208-rhel-8.1.tar.xz
tar Jxvf amdgpu-pro-19.50-1011208-rhel-8.1.tar.xz
cd amdgpu-pro-19.50-1011208-rhel-8.1/
./amdgpu-pro-install -y --opencl=pal,legacy
reboot[/CODE]As a normal user:[CODE]cd ~
git clone https://github.com/preda/gpuowl
cd gpuowl
<<< fix makefile >>>
make[/CODE]We will test CentOS 7 later tonight. :mike: |
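The "<<< fix makefile >>>" step can be scripted; a sketch using the LIBPATH value from the W5500 post earlier in the thread (adjust the path to wherever the driver actually installed its OpenCL libraries):

```shell
# Rewrite the LIBPATH line in gpuowl's Makefile to point at amdgpu-pro's libs.
fix_libpath() {
  makefile="${1:-Makefile}"
  sed -i 's|^LIBPATH *=.*|LIBPATH = -L/opt/amdgpu-pro/lib64 -L.|' "$makefile"
}
```

Run `fix_libpath` from the gpuowl checkout, or pass the Makefile path explicitly.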
gpuowl-win v6.11-313 try
1 Attachment(s)
No joy there either. Thanks for trying. |
[QUOTE=kriesel;547256]No joy there either. Thanks for trying.[/QUOTE]
OK, I finally understand what's happening here: it's the Windows 32-bit long again! Let me explain: GMP provides constructors that take pretty much any kind of int, among them unsigned int and long unsigned int. In my code I'm invoking the constructor with a 64-bit unsigned int, and guess what -- there's no constructor taking that! (Because, well, on Windows both unsigned and long unsigned are 32-bit.) Very helpful. I'll need to work around this silly situation. |