mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GpuOwl (https://www.mersenneforum.org/forumdisplay.php?f=171)
-   -   gpuOwL: an OpenCL program for Mersenne primality testing (https://www.mersenneforum.org/showthread.php?t=22204)

preda 2019-07-20 00:50

Try running it with "-use NO_ASM"
(if it works, you can put that option in config.txt )

The __asm() errors are because you're running with a driver (Adrenalin/windows) that does not support assembly. ROCm/linux works fine with __asm(). Anyway, assembly support is not mandatory, you just need to disable it with -use NO_ASM . I haven't found a way yet (in OpenCL) to automatically detect __asm support.
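
One possible workaround for the missing detection is to probe: attempt the build with assembly enabled and, if it fails, retry with -DNO_ASM=1 (the define that -use NO_ASM adds to the OpenCL args). The following is only a sketch under that assumption, not gpuowl's actual code; `TryBuild` and `pickBuildArgs` are hypothetical names, and the callback stands in for a real clBuildProgram call.

```cpp
#include <functional>
#include <optional>
#include <string>

// Hypothetical callback: returns true when the OpenCL program builds with the
// given compiler arguments (a real program would wrap clBuildProgram here).
using TryBuild = std::function<bool(const std::string& args)>;

// Try the assembly-enabled build first; on failure retry with -DNO_ASM=1.
// Returns the arguments that built, or std::nullopt if both attempts failed.
std::optional<std::string> pickBuildArgs(const std::string& baseArgs,
                                         const TryBuild& tryBuild) {
  if (tryBuild(baseArgs)) { return baseArgs; }
  const std::string noAsm = baseArgs + " -DNO_ASM=1";
  if (tryBuild(noAsm)) { return noAsm; }
  return std::nullopt;
}
```

A probe like this costs one extra compilation on drivers without __asm() support, and it cannot help when the compiler hangs instead of reporting an error.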

[QUOTE=ATH;521950]So I got my Radeon VII, but I'm a bit lost: it has been many years since I had an AMD card, long before GPUs were used for any calculations, and I'm also new to gpuowl.

I installed the newest drivers: Adrenalin 2019 19.7.2. I already had "gpuowl-win7-x64-v6.5-c48d46f.7z" from [URL="https://mersenneforum.org/showpost.php?p=516704&postcount=1171"]post #1171[/URL] on my hard drive from 2 months ago; I think I got it to confirm that OpenCL really worked on my RTX 2080, which it did.

Now when I run it with -device 1 (Radeon VII) it only writes the first few lines but never gets to the "OpenCL compilation in ..." line and it never starts running.



When I use -device 0 it works fine and runs on my RTX 2080.


I tried downloading "gpuowl-win-v6.5-84-g30c0508.7z" from [URL="https://mersenneforum.org/showpost.php?p=521225&postcount=1274"]post #1274[/URL], but it does not start at all on either card:

[CODE]2019-07-20 00:05:56 config: -device 1
2019-07-20 00:05:56 80293033 FFT 4608K: Width 256x4, Height 64x4, Middle 9; 17.02 bits/word
2019-07-20 00:05:56 using short carry kernels
2019-07-20 00:05:56 OpenCL args "-DEXP=80293033u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=9u -DWEIGHT_STEP=0xf.d1f3073e091p-3 -DIWEIGHT_STEP=0x8.17498299a4db8p-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-07-20 00:05:56 OpenCL compilation error -11 (args -DEXP=80293033u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=9u -DWEIGHT_STEP=0xf.d1f3073e091p-3 -DIWEIGHT_STEP=0x8.17498299a4db8p-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -I. -cl-fast-relaxed-math -cl-std=CL2.0)
2019-07-20 00:05:56 C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:197:3: error: implicit declaration of function '__asm' is invalid in C99
X2(u[0], u[2]);
^
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:174:2: note: expanded from macro 'X2'
__asm( "v_add_f64 %0, %1, -%2\n" : "=v" (b.x) : "v" (t.x), "v" (b.x)); \
^
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:197:3: error: expected ')'
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:174:35: note: expanded from macro 'X2'
__asm( "v_add_f64 %0, %1, -%2\n" : "=v" (b.x) : "v" (t.x), "v" (b.x)); \
^
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:197:3: note: to match this '('
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:174:7: note: expanded from macro 'X2'
__asm( "v_add_f64 %0, %1, -%2\n" : "=v" (b.x) : "v" (t.x), "v" (b.x)); \
^
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:197:3: error: expected ')'
X2(u[0], u[2]);
^
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:175:35: note: expanded from macro 'X2'
__asm( "v_add_f64 %0, %1, -%2\n" : "=v" (b.y) : "v" (t.y), "v" (b.y)); \
^
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:197:3: note: to match this '('
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:175:7: note: expanded from macro 'X2'
__asm( "v_add_f64 %0, %1, -%2\n" : "=v" (b.y) : "v" (t.y), "v" (b.y)); \
^
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:198:3: error: expected ')'
X2_mul_t4(u[1], u[3]);
^
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:180:35: note: expanded from macro 'X2_mul_t4'
__asm( "v_add_f64 %0, %1, -%2\n" : "=v" (t.x) : "v" (b.x), "v" (t.x)); \
^
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:198:3: note: to match this '('
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:180:7: note: expanded from macro 'X2_mul_t4'
__asm( "v_add_f64 %0, %1, -%2\n" : "=v" (t.x) : "v" (b.x), "v" (t.x)); \
^
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:198
2019-07-20 00:05:56 Exception 9gpu_error: BUILD_PROGRAM_FAILURE clBuildProgram at clwrap.cpp:215 build
2019-07-20 00:05:56 Bye[/CODE]


Are there any more Windows executables collected somewhere?[/QUOTE]

preda 2019-07-20 00:51

Thank you, this is useful!

[QUOTE=ewmayer;521952]No, the FFT is inherently cyclic-convolutional ... the IBDWT weighting allows us to use a prime-length "bit folding" boundary in conjunction with an underlying polynomial-multiply which most naturally lends itself to a bitness which is highly composite, by way of being a multiple of the transform length.


As George noted, for (mod 2^p+1) you need 2 distinct weightings: the IBDWT one to allow for a prime-length bit-folding, and the standard acyclic-effecting weighting, which for a length-n transform uses the first n complex (2*n)th roots of unity. That needs a complex FFT algorithm; for a length-n real input vector you can use a length-(n/2) complex FFT. Noting that the [j]th and [j+n/2]th acyclic weights (call them 'awt') are related by awt[j+n/2] = I*awt[j], you can see that in this context it makes sense to group pairs of real inputs together not via the usual (x[j],x[j+1])-treated-as-a-complex-datum scheme but rather in (x[j],x[j+n/2]) pairs, since applying the acyclic weights turns those 2 reals into (awt[j]*x[j], I*awt[j]*x[j+n/2]), i.e. we can pull out the shared complex acyclic multiplier awt[j] = exp(2*Pi*I*j/(2*n)) to get a weighted complex input awt[j]*(x[j] + I*x[j+n/2]). This is the so-called "right-angle transform" trick. Crandall & Fagin recapped it (since it wasn't new) in the Fermat-mod section of the same 1994 paper where they introduced the Mersenne-mod IBDWT.[/QUOTE]
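
The pairing described in the quote can be checked numerically. The sketch below is an illustration only, not gpuowl or Mlucas code: `awt` and `rightAnglePack` are hypothetical names, the weights are taken to be the (2*n)th roots of unity exp(2*Pi*I*j/(2*n)), and the IBDWT weighting is left out entirely.

```cpp
#include <cmath>
#include <complex>
#include <vector>

using cd = std::complex<double>;

// Acyclic weight j for a length-n real input: the j-th (2n)-th root of unity,
// i.e. exp(2*pi*I*j / (2*n)) = exp(I*pi*j/n).
cd awt(int j, int n) {
  const double pi = std::acos(-1.0);
  return std::polar(1.0, pi * j / n);
}

// Pack n reals (n assumed even) into n/2 weighted complex values using
// (x[j], x[j+n/2]) pairs: z[j] = awt[j] * (x[j] + I*x[j+n/2]).
std::vector<cd> rightAnglePack(const std::vector<double>& x) {
  const int n = static_cast<int>(x.size());
  std::vector<cd> z(n / 2);
  for (int j = 0; j < n / 2; ++j) {
    z[j] = awt(j, n) * cd(x[j], x[j + n / 2]);
  }
  return z;
}
```

Because awt[j+n/2] = I*awt[j], each packed value equals awt[j]*x[j] + awt[j+n/2]*x[j+n/2], so a single complex multiply applies both weights of the pair.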

ATH 2019-07-20 01:07

Thanks. I assume there is no Windows driver where it works?

Now there are no errors but it does not actually start calculating, the card is not being used at all.

[CODE]
gpuowl-win.exe -device 1 -use NO_ASM
2019-07-20 03:01:43 gpuowl v6.5-84-g30c0508
2019-07-20 03:01:43 Note: no config.txt file found
2019-07-20 03:01:43 config: -device 1 -use NO_ASM
2019-07-20 03:01:43 80293033 FFT 4608K: Width 256x4, Height 64x4, Middle 9; 17.02 bits/word
2019-07-20 03:01:43 using short carry kernels
2019-07-20 03:01:43 OpenCL args "-DEXP=80293033u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=9u -DWEIGHT_STEP=0xf.d1f3073e091p-3 -DIWEIGHT_STEP=0x8.17498299a4db8p-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -DNO_ASM=1 -DNO_ASM=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"[/CODE]


I was afraid I was being too optimistic trying to run Nvidia and AMD cards in the same computer.

Does anyone else have any Windows binaries? Only Kriesel has posted binaries of the latest versions in this thread.

kriesel 2019-07-20 08:51

[QUOTE=preda;521955]Try running it with "-use NO_ASM"
(if it works, you can put that option in config.txt )

The __asm() errors are because you're running with a driver (Adrenalin/windows) that does not support assembly. ROCm/linux works fine with __asm(). Anyway, assembly support is not mandatory, you just need to disable it with -use NO_ASM . I haven't found a way yet (in OpenCL) to automatically detect __asm support.[/QUOTE]FWIW, the example run of gpuowl v6.5-84-g30c0508 in [URL]https://www.mersenneforum.org/showpost.php?p=521225&postcount=1274[/URL] was on an RX480, Win7 x64, Adrenalin 18.10.2 driver with -use ORIG_X2, after the advice of Prime95 to -use FMA_X2 at [URL]https://www.mersenneforum.org/showpost.php?p=517932&postcount=1213[/URL], plus subsequent experimentation for performance [URL]https://www.mersenneforum.org/showpost.php?p=517961&postcount=1217[/URL]
No data on Radeon VII here yet.

Hadn't seen NO_ASM back when I made the -use list at [URL]https://www.mersenneforum.org/showpost.php?p=517999&postcount=1222[/URL] but I see it there in the gpuowl.cl code of v6.5-76-g1ca08e2-dirty

SELROC 2019-07-20 09:22

[QUOTE=preda;521955]Try running it with "-use NO_ASM"
(if it works, you can put that option in config.txt )

The __asm() errors are because you're running with a driver (Adrenalin/windows) that does not support assembly. ROCm/linux works fine with __asm(). Anyway, assembly support is not mandatory, you just need to disable it with -use NO_ASM . I haven't found a way yet (in OpenCL) to automatically detect __asm support.[/QUOTE]


There should be a way to detect which driver is in use; amdgpu-pro doesn't support __asm(), so I set -use NO_ASM.

preda 2019-07-20 11:22

Normally the next log line would be something like:
"OpenCL compilation in 2195 ms"
So it seems that in your case it's stuck at the OpenCL compilation step. I'm sorry, but I don't really know why, and unfortunately I can't reproduce it. (I would be happy to have a fix if the problem is on gpuowl's side.)

[QUOTE=ATH;521958]Thanks. I assume there is no Windows driver where it works?

Now there are no errors but it does not actually start calculating, the card is not being used at all.

[CODE]
gpuowl-win.exe -device 1 -use NO_ASM
2019-07-20 03:01:43 gpuowl v6.5-84-g30c0508
2019-07-20 03:01:43 Note: no config.txt file found
2019-07-20 03:01:43 config: -device 1 -use NO_ASM
2019-07-20 03:01:43 80293033 FFT 4608K: Width 256x4, Height 64x4, Middle 9; 17.02 bits/word
2019-07-20 03:01:43 using short carry kernels
2019-07-20 03:01:43 OpenCL args "-DEXP=80293033u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=9u -DWEIGHT_STEP=0xf.d1f3073e091p-3 -DIWEIGHT_STEP=0x8.17498299a4db8p-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -DNO_ASM=1 -DNO_ASM=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"[/CODE]


I was afraid I was being too optimistic trying to run Nvidia and AMD cards in the same computer.

Does anyone else have any Windows binaries? Only Kriesel has posted binaries of the latest versions in this thread.[/QUOTE]

SELROC 2019-07-20 12:44

[QUOTE=SELROC;521978]there should be a way to detect which driver is in use, amdgpu-pro doesn't support __asm() and I set -use NO_ASM.[/QUOTE]


It is possible with a script: the "vendor" field should change accordingly for Nvidia, and the "configuration: driver=amdgpu latency=0" line should change accordingly as well:


[CODE]# lshw -class video
  *-display
       description: VGA compatible controller
       product: Ellesmere [Radeon RX 470/480]
       vendor: Advanced Micro Devices, Inc. [AMD/ATI]
       physical id: 0
       bus info: pci@0000:01:00.0
       version: e7
       width: 64 bits
       clock: 33MHz
       capabilities: pm pciexpress msi vga_controller bus_master cap_list rom
       configuration: driver=amdgpu latency=0
       resources: iomemory:220-21f iomemory:210-20f irq:126 memory:2200000000-23ffffffff memory:2100000000-21001fffff ioport:e000(size=256) memory:f7e00000-f7e3ffff memory:f7e40000-f7e5ffff
  *-display
       description: VGA compatible controller
       product: Intel Corporation
       vendor: Intel Corporation
       physical id: 2
       bus info: pci@0000:00:02.0
       version: 04
       width: 64 bits
       clock: 33MHz
       capabilities: pciexpress msi pm vga_controller bus_master cap_list rom
       configuration: driver=i915 latency=0
       resources: iomemory:2f0-2ef iomemory:2f0-2ef irq:125 memory:2ffe000000-2ffeffffff memory:2fe0000000-2fefffffff ioport:f000(size=64) memory:c0000-dffff[/CODE]
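
A minimal sketch of such a script's parsing step, assuming the lshw text has already been captured into a string: it collects the driver= value from every "configuration:" line. `parseDrivers` is a hypothetical helper, and the thread does not establish that the kernel driver name alone tells an OpenCL stack with __asm() support apart from one without.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Collect the value of every "driver=<name>" token found on a
// "configuration:" line of `lshw -class video` output.
std::vector<std::string> parseDrivers(const std::string& lshwOutput) {
  std::vector<std::string> drivers;
  std::istringstream lines(lshwOutput);
  std::string line;
  while (std::getline(lines, line)) {
    if (line.find("configuration:") == std::string::npos) { continue; }
    std::istringstream tokens(line);
    std::string token;
    while (tokens >> token) {
      // rfind(prefix, 0) == 0 is a starts-with check; 7 == strlen("driver=").
      if (token.rfind("driver=", 0) == 0) { drivers.push_back(token.substr(7)); }
    }
  }
  return drivers;
}
```

For the output above this would yield the two names amdgpu and i915, in that order.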

kriesel 2019-07-20 14:41

[QUOTE=ATH;521958]
I was afraid I was being too optimistic trying to run Nvidia and AMD card in the same computer.[/QUOTE]
AH!
Divide and conquer.
Use multiple OpenCL test and info utilities and Device Manager to check how functional the AMD OpenCL and driver installation is. Sometimes one will claim all's fine and others will show issues. I've seen one vendor's OpenCL install hose another's. (In that case it was an NVIDIA or AMD SDK install disabling the OpenCL use of the Intel IGP, until the SDKs were removed and the Intel OpenCL reinstalled.)

You could try a temporary complete removal of the NVIDIA driver, followed by removal and reinstall of the AMD driver. Also, a second reboot is sometimes needed after a graphics driver install.

SELROC 2019-07-20 14:41

[QUOTE=preda;521955]Try running it with "-use NO_ASM"
(if it works, you can put that option in config.txt )

The __asm() errors are because you're running with a driver (Adrenalin/windows) that does not support assembly. ROCm/linux works fine with __asm(). Anyway, assembly support is not mandatory, you just need to disable it with -use NO_ASM . I haven't found a way yet (in OpenCL) to automatically detect __asm support.[/QUOTE]


Hi Mihai, I found a C++ library:


[url]https://github.com/ThePhD/infoware[/url]


Example: [url]https://github.com/ThePhD/infoware/blob/master/examples/gpu.cpp[/url]


reactions?

SELROC 2019-07-21 06:42

[QUOTE=preda;521955]Try running it with "-use NO_ASM"
(if it works, you can put that option in config.txt )

The __asm() errors are because you're running with a driver (Adrenalin/windows) that does not support assembly. ROCm/linux works fine with __asm(). Anyway, assembly support is not mandatory, you just need to disable it with -use NO_ASM . I haven't found a way yet (in OpenCL) to automatically detect __asm support.[/QUOTE]


This header file detects the platform:
[url]https://github.com/hendrix2897/platform-detect/blob/master/PlatformDetect.h[/url]
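
For illustration, compile-time platform detection of this kind usually relies on predefined compiler macros. The sketch below is an assumption about the general technique, not the contents of the linked header; `platformName` is a hypothetical function. It identifies only the operating system, so it could at most separate the Windows case from the Linux case, not amdgpu-pro from ROCm.

```cpp
#include <string>

// Compile-time operating-system detection via predefined compiler macros.
std::string platformName() {
#if defined(_WIN32)
  return "windows";
#elif defined(__APPLE__)
  return "macos";
#elif defined(__linux__)
  return "linux";
#else
  return "unknown";
#endif
}
```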

maxzor 2019-07-23 19:37

Hello and thank you for the program.
How much of it depends on CPU performance?
Will it be significantly slower running on a Radeon VII with a Pentium II, i5 2500 or R7 1800X (or 3600)?
I am about to compile it on Linux soon.
I have an 1800X, and have set up a Radeon VII for gpuOwl and an Nvidia 1050 Ti for the lesser stuff; any experience in balancing load between two GPUs is appreciated!

