mersenneforum.org  

2019-07-20, 00:50   #1288
preda
"Mihai Preda"
Apr 2015
3·457 Posts

Try running it with "-use NO_ASM"
(if it works, you can put that option in config.txt )

The __asm() errors are because you're running with a driver (Adrenalin/windows) that does not support assembly. ROCm/linux works fine with __asm(). Anyway, assembly support is not mandatory, you just need to disable it with -use NO_ASM . I haven't found a way yet (in OpenCL) to automatically detect __asm support.
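
Until there is a proper way to query this, one possible workaround is try-and-fall-back: attempt the build with __asm, and if the build log contains the "implicit declaration of function '__asm'" error (as in the log quoted below), retry with -DNO_ASM=1 added, which is what -use NO_ASM becomes in the OpenCL args. A rough Python sketch of just that decision logic follows. This is not what gpuowl does; needs_no_asm and fallback_args are made-up names, and the build log is assumed to be available as a plain string.

```python
import re

# Error text the OpenCL compiler emits when it does not know __asm
# (wording taken from the Adrenalin build log in this thread).
ASM_ERROR = re.compile(r"implicit declaration of function '__asm'")


def needs_no_asm(build_log):
    """Return True if the build log shows that __asm() is unsupported."""
    return ASM_ERROR.search(build_log) is not None


def fallback_args(args, build_log):
    """Return the compiler args to retry with (hypothetical helper).

    Adds -DNO_ASM=1 once if the failed build complained about __asm;
    otherwise returns a copy of the args unchanged.
    """
    if needs_no_asm(build_log) and "-DNO_ASM=1" not in args:
        return args + ["-DNO_ASM=1"]
    return list(args)
```

The cost of this approach would be one wasted compilation on drivers without __asm support.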

Quote:
Originally Posted by ATH View Post
So I got my Radeon VII but I'm a bit lost, it has been many many years since I had an AMD card and it was way before using GPUs for any calculations, and I'm also new to gpuowl.

I installed the newest drivers: Adrenalin 2019 19.7.2. I had "gpuowl-win7-x64-v6.5-c48d46f.7z" from post #1171 on my hard drive already from 2 months ago, I think I got it to confirm that OpenCL really worked on my RTX 2080 which it did.

Now when I run it with -device 1 (Radeon VII) it only writes the first few lines but never gets to the "OpenCL compilation in ..." line and it never starts running.



When I use -device 0 it works fine and runs on my RTX 2080.


I tried downloading the "gpuowl-win-v6.5-84-g30c0508.7z" from post #1274, but it does not start at all on either card:

Code:
2019-07-20 00:05:56 config: -device 1 
2019-07-20 00:05:56 80293033 FFT 4608K: Width 256x4, Height 64x4, Middle 9; 17.02 bits/word
2019-07-20 00:05:56 using short carry kernels
2019-07-20 00:05:56 OpenCL args "-DEXP=80293033u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=9u -DWEIGHT_STEP=0xf.d1f3073e091p-3 -DIWEIGHT_STEP=0x8.17498299a4db8p-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-07-20 00:05:56 OpenCL compilation error -11 (args -DEXP=80293033u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=9u -DWEIGHT_STEP=0xf.d1f3073e091p-3 -DIWEIGHT_STEP=0x8.17498299a4db8p-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4  -I. -cl-fast-relaxed-math -cl-std=CL2.0)
2019-07-20 00:05:56 C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:197:3: error: implicit declaration of function '__asm' is invalid in C99
  X2(u[0], u[2]);
  ^
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:174:2: note: expanded from macro 'X2'
        __asm( "v_add_f64 %0, %1, -%2\n" : "=v" (b.x) : "v" (t.x), "v" (b.x)); \
        ^
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:197:3: error: expected ')'
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:174:35: note: expanded from macro 'X2'
        __asm( "v_add_f64 %0, %1, -%2\n" : "=v" (b.x) : "v" (t.x), "v" (b.x)); \
                                         ^
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:197:3: note: to match this '('
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:174:7: note: expanded from macro 'X2'
        __asm( "v_add_f64 %0, %1, -%2\n" : "=v" (b.x) : "v" (t.x), "v" (b.x)); \
             ^
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:197:3: error: expected ')'
  X2(u[0], u[2]);
  ^
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:175:35: note: expanded from macro 'X2'
        __asm( "v_add_f64 %0, %1, -%2\n" : "=v" (b.y) : "v" (t.y), "v" (b.y)); \
                                         ^
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:197:3: note: to match this '('
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:175:7: note: expanded from macro 'X2'
        __asm( "v_add_f64 %0, %1, -%2\n" : "=v" (b.y) : "v" (t.y), "v" (b.y)); \
             ^
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:198:3: error: expected ')'
  X2_mul_t4(u[1], u[3]);
  ^
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:180:35: note: expanded from macro 'X2_mul_t4'
        __asm( "v_add_f64 %0, %1, -%2\n" : "=v" (t.x) : "v" (b.x), "v" (t.x)); \
                                         ^
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:198:3: note: to match this '('
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:180:7: note: expanded from macro 'X2_mul_t4'
        __asm( "v_add_f64 %0, %1, -%2\n" : "=v" (t.x) : "v" (b.x), "v" (t.x)); \
             ^
C:\Users\ATH\AppData\Local\Temp\\OCL7076T0.cl:198
2019-07-20 00:05:56 Exception 9gpu_error: BUILD_PROGRAM_FAILURE clBuildProgram at clwrap.cpp:215 build
2019-07-20 00:05:56 Bye

Are there any more Windows executables collected somewhere?
2019-07-20, 00:51   #1289
preda
"Mihai Preda"
Apr 2015
10101011011₂ Posts

Thank you, this is useful!

Quote:
Originally Posted by ewmayer View Post
No, the FFT is inherently cyclic-convolutional ... the IBDWT weighting allows us to use a prime-length "bit folding" boundary in conjunction with an underlying polynomial-multiply which most naturally lends itself to a bitness which is highly composite, by way of being a multiple of the transform length.


As George noted, for (mod 2^p+1) you need 2 distinct weightings: the IBDWT one to allow for a prime-length bit-folding, and the standard acyclic-effecting weighting, which for a length-n transform uses the first n complex (2*n)th roots of unity. That needs a complex FFT algorithm; for a length-n real input vector you can use a length-(n/2) complex FFT. Noting that the [j]th and [j+n/2]th acyclic weights (call them 'awt') are related by awt[j+n/2] = I*awt[j], you can see that in this context it makes sense to group pairs of real inputs together not via the usual (x[j],x[j+1])-treated-as-a-complex-datum scheme but rather in (x[j],x[j+n/2]) pairs, since applying the acyclic-weights turns those 2 reals into (awt[j]*x[j],I*awt[j]*x[j+n/2]), i.e. we can pull out the shared complex acyclic-multiplier awt[j] = exp(2*Pi*I*j/(2*n)) to get a weighted complex input awt[j]*(x[j] + I*x[j+n/2]). This is the so-called "right-angle transform" trick. Crandall & Fagin recapped it (since it wasn't new) in the Fermat-mod section of the same 1994 paper where they introduced the Mersenne-mod IBDWT.
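
To make the pairing concrete, here is a small numerical check in Python. It is only a sketch: the function names are invented, a naive O(m^2) DFT stands in for a real FFT, and the IBDWT weighting is left out so that only the acyclic part is shown. It takes awt[j] = exp(2*Pi*I*j/(2*n)), packs (x[j], x[j+n/2]) into one complex value, and compares the result of a length-(n/2) complex transform against a directly computed negacyclic product.

```python
import cmath


def dft(v, sign):
    """Naive O(m^2) DFT; sign=-1 is forward, sign=+1 is the unnormalised inverse."""
    m = len(v)
    return [sum(v[a] * cmath.exp(sign * 2j * cmath.pi * a * k / m) for a in range(m))
            for k in range(m)]


def negacyclic_direct(x, y):
    """Reference: multiply the polynomials x and y modulo t^n + 1."""
    n = len(x)
    z = [0] * n
    for a in range(n):
        for b in range(n):
            if a + b < n:
                z[a + b] += x[a] * y[b]
            else:
                z[a + b - n] -= x[a] * y[b]  # wrap-around picks up a minus sign
    return z


def negacyclic_right_angle(x, y):
    """Same product via a length-(n/2) complex transform (n must be even)."""
    n = len(x)
    m = n // 2
    # First n/2 of the (2n)th roots of unity; awt[j + n/2] would be I * awt[j].
    awt = [cmath.exp(2j * cmath.pi * a / (2 * n)) for a in range(m)]
    # Pair x[j] with x[j + n/2] and apply the shared acyclic weight.
    cx = [awt[a] * complex(x[a], x[a + m]) for a in range(m)]
    cy = [awt[a] * complex(y[a], y[a + m]) for a in range(m)]
    # Cyclic convolution of length n/2 via forward DFT, pointwise product, inverse DFT.
    prod = [p * q for p, q in zip(dft(cx, -1), dft(cy, -1))]
    conv = [v / m for v in dft(prod, +1)]
    # Remove the weight; real parts are z[k], imaginary parts are z[k + n/2].
    out = [conv[a] * awt[a].conjugate() for a in range(m)]
    return [round(v.real) for v in out] + [round(v.imag) for v in out]
```

For x = [1, 2, 3, 4] and y = [5, 6, 7, 8] both routes give [-56, -36, 2, 60].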
2019-07-20, 01:07   #1290
ATH
Einyen
Dec 2003
Denmark
2×1,579 Posts

Thanks. I assume there is no Windows driver where it works?

Now there are no errors, but it does not actually start calculating; the card is not being used at all.

Code:
gpuowl-win.exe -device 1 -use NO_ASM
2019-07-20 03:01:43 gpuowl v6.5-84-g30c0508
2019-07-20 03:01:43 Note: no config.txt file found
2019-07-20 03:01:43 config: -device 1 -use NO_ASM
2019-07-20 03:01:43 80293033 FFT 4608K: Width 256x4, Height 64x4, Middle 9; 17.02 bits/word
2019-07-20 03:01:43 using short carry kernels
2019-07-20 03:01:43 OpenCL args "-DEXP=80293033u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=9u -DWEIGHT_STEP=0xf.d1f3073e091p-3 -DIWEIGHT_STEP=0x8.17498299a4db8p-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -DNO_ASM=1 -DNO_ASM=1  -I. -cl-fast-relaxed-math -cl-std=CL2.0"

I was afraid I was being too optimistic trying to run an Nvidia and an AMD card in the same computer.

Does anyone else have any Windows binaries? Only Kriesel has posted binaries of the latest versions in this thread.

Last fiddled with by ATH on 2019-07-20 at 01:21
2019-07-20, 08:51   #1291
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5419₁₀ Posts

Quote:
Originally Posted by preda View Post
Try running it with "-use NO_ASM"
(if it works, you can put that option in config.txt )

The __asm() errors are because you're running with a driver (Adrenalin/windows) that does not support assembly. ROCm/linux works fine with __asm(). Anyway, assembly support is not mandatory, you just need to disable it with -use NO_ASM . I haven't found a way yet (in OpenCL) to automatically detect __asm support.

FWIW, the example run of gpuowl v6.5-84-g30c0508 in https://www.mersenneforum.org/showpo...postcount=1274 was on an RX480, Win7 x64, Adrenalin 18.10.2 driver with -use ORIG_X2, after the advice of Prime95 to -use FMA_X2 at https://www.mersenneforum.org/showpo...postcount=1213, plus subsequent experimentation for performance at https://www.mersenneforum.org/showpo...postcount=1217.
No data on Radeon VII here yet.

Hadn't seen NO_ASM back when I made the -use list at https://www.mersenneforum.org/showpo...postcount=1222 but I see it there in the gpuowl.cl code of v6.5-76-g1ca08e2-dirty
2019-07-20, 09:22   #1292
SELROC
2,137 Posts

Quote:
Originally Posted by preda View Post
Try running it with "-use NO_ASM"
(if it works, you can put that option in config.txt )

The __asm() errors are because you're running with a driver (Adrenalin/windows) that does not support assembly. ROCm/linux works fine with __asm(). Anyway, assembly support is not mandatory, you just need to disable it with -use NO_ASM . I haven't found a way yet (in OpenCL) to automatically detect __asm support.

There should be a way to detect which driver is in use. amdgpu-pro doesn't support __asm(), so I set -use NO_ASM.
2019-07-20, 11:22   #1293
preda
"Mihai Preda"
Apr 2015
2533₈ Posts

Normally the next log line would be something like:
"OpenCL compilation in 2195 ms"
So it seems that in your case it's stuck at the OpenCL compilation step. I'm sorry, but I don't really know why, and unfortunately I can't repro. (I would be happy to have a fix if the problem is on gpuowl's side.)
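
For anyone scripting this check, a tiny Python sketch that looks for that line in captured gpuowl output. compilation_ms is a hypothetical helper, and the wording of the log line is assumed to match the example above.

```python
import re

# Matches the log line that reports a finished OpenCL build.
COMPILED = re.compile(r"OpenCL compilation in (\d+) ms")


def compilation_ms(log_text):
    """Return the reported compile time in ms, or None if the line is absent."""
    match = COMPILED.search(log_text)
    return int(match.group(1)) if match else None
```

A result of None after the "OpenCL args" line has been printed would point at the compilation step as the place where the run is stuck.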

Quote:
Originally Posted by ATH View Post
Thanks. I assume there is no Windows driver where it works?

Now there are no errors, but it does not actually start calculating; the card is not being used at all.

Code:
gpuowl-win.exe -device 1 -use NO_ASM
2019-07-20 03:01:43 gpuowl v6.5-84-g30c0508
2019-07-20 03:01:43 Note: no config.txt file found
2019-07-20 03:01:43 config: -device 1 -use NO_ASM
2019-07-20 03:01:43 80293033 FFT 4608K: Width 256x4, Height 64x4, Middle 9; 17.02 bits/word
2019-07-20 03:01:43 using short carry kernels
2019-07-20 03:01:43 OpenCL args "-DEXP=80293033u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=9u -DWEIGHT_STEP=0xf.d1f3073e091p-3 -DIWEIGHT_STEP=0x8.17498299a4db8p-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -DNO_ASM=1 -DNO_ASM=1  -I. -cl-fast-relaxed-math -cl-std=CL2.0"

I was afraid I was being too optimistic trying to run an Nvidia and an AMD card in the same computer.

Does anyone else have any Windows binaries? Only Kriesel has posted binaries of the latest versions in this thread.
2019-07-20, 12:44   #1294
SELROC
2³×5²×43 Posts

Quote:
Originally Posted by SELROC View Post
There should be a way to detect which driver is in use. amdgpu-pro doesn't support __asm(), so I set -use NO_ASM.

It is possible with a script: the "vendor" field should change accordingly for Nvidia, and the "configuration: driver=amdgpu latency=0" line should change with the driver in use:


# lshw -class video
*-display
description: VGA compatible controller
product: Ellesmere [Radeon RX 470/480]
vendor: Advanced Micro Devices, Inc. [AMD/ATI]
physical id: 0
bus info: pci@0000:01:00.0
version: e7
width: 64 bits
clock: 33MHz
capabilities: pm pciexpress msi vga_controller bus_master cap_list rom
configuration: driver=amdgpu latency=0
resources: iomemory:220-21f iomemory:210-20f irq:126 memory:2200000000-23ffffffff memory:2100000000-21001fffff ioport:e000(size=256) memory:f7e00000-f7e3ffff memory:f7e40000-f7e5ffff
*-display
description: VGA compatible controller
product: Intel Corporation
vendor: Intel Corporation
physical id: 2
bus info: pci@0000:00:02.0
version: 04
width: 64 bits
clock: 33MHz
capabilities: pciexpress msi pm vga_controller bus_master cap_list rom
configuration: driver=i915 latency=0
resources: iomemory:2f0-2ef iomemory:2f0-2ef irq:125 memory:2ffe000000-2ffeffffff memory:2fe0000000-2fefffffff ioport:f000(size=64) memory:c0000-dffff
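
A minimal Python sketch of such a script, assuming the lshw text has already been captured and looks like the output above. video_drivers is a made-up name. Note that it reports the kernel driver only; whether that field alone separates ROCm from amdgpu-pro is not something this sketch settles.

```python
def video_drivers(lshw_text):
    """Parse `lshw -class video` output into one dict per *-display entry."""
    devices = []
    for raw in lshw_text.splitlines():
        line = raw.strip()
        if line.startswith("*-display"):
            # Start of a new device section.
            devices.append({})
        elif devices and ": " in line:
            key, value = line.split(": ", 1)
            if key == "configuration":
                # e.g. "driver=amdgpu latency=0": pick out the driver name.
                for item in value.split():
                    name, _, setting = item.partition("=")
                    if name == "driver":
                        devices[-1]["driver"] = setting
            elif key in ("product", "vendor"):
                devices[-1][key] = value
    return devices
```

For the listing above this yields two entries, with drivers "amdgpu" and "i915".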
2019-07-20, 14:41   #1295
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,419 Posts

Quote:
Originally Posted by ATH View Post
I was afraid I was being too optimistic trying to run an Nvidia and an AMD card in the same computer.

AH!
Divide and conquer.
Use multiple OpenCL test and info utilities and Device Manager to check how functional the AMD OpenCL and driver installation is. Sometimes one will claim all's fine and others will show issues. I've seen one vendor's OpenCL install hose another's. (In that case it was an NVIDIA or AMD SDK install disabling the OpenCL use of the Intel IGP, until the SDKs were removed and the Intel OpenCL reinstalled.)

You could try a temporary complete removal of the NVIDIA driver followed by removal and reinstall of the AMD driver. Also, sometimes a second reboot is needed after a graphics driver install.
2019-07-20, 14:41   #1296
SELROC
67·107 Posts

Quote:
Originally Posted by preda View Post
Try running it with "-use NO_ASM"
(if it works, you can put that option in config.txt )

The __asm() errors are because you're running with a driver (Adrenalin/windows) that does not support assembly. ROCm/linux works fine with __asm(). Anyway, assembly support is not mandatory, you just need to disable it with -use NO_ASM . I haven't found a way yet (in OpenCL) to automatically detect __asm support.

Hi Mihai, I found a C++ library:


https://github.com/ThePhD/infoware


Example: https://github.com/ThePhD/infoware/b...amples/gpu.cpp


reactions?
2019-07-21, 06:42   #1297
SELROC
5·17·29 Posts

Quote:
Originally Posted by preda View Post
Try running it with "-use NO_ASM"
(if it works, you can put that option in config.txt )

The __asm() errors are because you're running with a driver (Adrenalin/windows) that does not support assembly. ROCm/linux works fine with __asm(). Anyway, assembly support is not mandatory, you just need to disable it with -use NO_ASM . I haven't found a way yet (in OpenCL) to automatically detect __asm support.

This header file detects the platform:
https://github.com/hendrix2897/platf...atformDetect.h
2019-07-23, 19:37   #1298
maxzor
Apr 2017
248 Posts

Hello and thank you for the program.
How much of it depends on CPU performance?
Will it be significantly slower running on a Radeon VII with a Pentium II, i5 2500 or R7 1800X (or 3600)?
I am about to compile on Linux soon.
I have an 1800X, and have set up a Radeon VII for gpuOwl and an Nvidia 1050 Ti for the lesser stuff; any experience in balancing load between two GPUs appreciated!

Last fiddled with by maxzor on 2019-07-23 at 19:52

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.