mersenneforum.org

gpuOwL: an OpenCL program for Mersenne primality testing (https://www.mersenneforum.org/showthread.php?t=22204)

GP2 2019-01-05 22:19

I don't know much about OpenCL, but according to the Wikipedia page, it can be used not just on GPUs but also on other hardware such as FPGAs.

I don't know much about FPGAs either, but they are available for rent by the hour in the AWS cloud, specifically Xilinx Virtex UltraScale+ VU9P FPGAs. A bit pricey, though.

Would it be possible to run gpuOwL on an FPGA? Do FPGAs offer greater flexibility that would enable tuning and better performance than a GPU could achieve?

kriesel 2019-01-05 23:32

[QUOTE=GP2;505068]Would it be possible to run gpuOwL on an FPGA? Do FPGAs offer greater flexibility that would enable tuning and better performance than a GPU could achieve?[/QUOTE]
Maybe. Not all OpenCL implementations are created equal. I tried running an early version of gpuOwl on an Intel IGP, and the results were less than promising. NVIDIA OpenCL was a no-go with gpuOwl v0.5, v1.9, and ~v3.8.

Search eBay for VCU1525: $3500, no refunds, and that is just the board, not the whole development kit.

[URL]https://www.xilinx.com/products/boards-and-kits/vcu1525-a.html#tabAnchor-documentation[/URL]

preda 2019-01-06 01:09

[QUOTE=GP2;505068]Would it be possible to run gpuOwL on an FPGA? Do FPGAs offer greater flexibility that would enable tuning and better performance than a GPU could achieve?[/QUOTE]

While OpenCL *may* be portable to some degree, the performance is not portable (and thus, IMO, the whole point of OpenCL "portability" is moot). I mean that even if it ran on an FPGA, it would probably run extremely slowly before performance tuning.

In practice, I strongly expect GpuOwl not to run on an FPGA at all. It uses LDS (Local Data Share), which is likely not available on an FPGA. It also relies heavily on double-precision floating point (DP FP), which may not be present as specialized hardware sub-elements on the FPGA, and would thus be very expensive and rather slow to implement in plain FPGA logic.

For an FPGA, I think a different design that plays to the FPGA's strengths is needed. Maybe some specialized DP units would help too.

preda 2019-01-08 00:58

[QUOTE=kriesel;504925]
P-1 code that "merely" does P-1 efficiently and reliably on OpenCL is a good thing, an advance from the status quo. Any performance advantage, reusability of the full P-1 residue, etc., is a bonus.
[/QUOTE]

I agree. Unfortunately, my PRP-1 implementation is not an efficient P-1 implementation: if used only for P-1, PRP-1 is wasteful. If, on the other hand, the P-1 is followed by the PRP test, then PRP-1 becomes efficient overall.

In other words, if one wants to do standalone P-1 in OpenCL, a different implementation is needed, which would probably do "classic" P-1 as in mprime and CUDAPm1.
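For reference, a minimal Python sketch of "classic" P-1 stage 1, written from the textbook description rather than from mprime, CUDAPm1, or gpuowl code. It raises 3 to E = 2p times the largest power of each prime not exceeding B1, then takes gcd(x - 1, M_p); the extra factor 2p is included because every factor of M_p has the form 2kp + 1. Stage 2 is omitted, and the function names are hypothetical.

```python
from math import gcd

def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n."""
    if n < 2:
        return []
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            # Clear every multiple of i starting at i*i.
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i, is_p in enumerate(sieve) if is_p]

def pm1_stage1(p, b1, base=3):
    """Classic P-1 stage 1 on M_p = 2^p - 1 (sketch, stage 2 omitted).

    Computes x = base^E mod M_p with E = 2p * prod(q^k <= b1) and returns
    gcd(x - 1, M_p). A result strictly between 1 and M_p is a proper factor.
    """
    n = (1 << p) - 1
    # Factors of M_p are 2kp + 1, so 2p always divides (factor - 1).
    x = pow(base, 2 * p, n)
    for q in primes_up_to(b1):
        qk = q
        while qk * q <= b1:      # largest power of q not exceeding B1
            qk *= q
        x = pow(x, qk, n)
    return gcd(x - 1, n)

print(pm1_stage1(11, 2))         # 23
```

With M_11 = 2047 = 23 * 89 and B1 = 2, E = 44: the order of 3 modulo 23 is 11, which divides 44, while modulo 89 it is 88, so only 23 is separated. With B1 = 10, both 23 - 1 and 89 - 1 divide E, so the gcd is the whole of 2047 and no proper factor comes out.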

clarke 2019-01-08 06:10

gpuowl binary for OpenCL 1.2

Hi kriesel and preda,
Could you compile gpuowl for OpenCL 1.2? It needs to run on Windows 10 1803 x64 with the latest AMD OpenCL 15.7.1/15.11.1 driver for the 5870, which doesn't support OpenCL 2.0.
Unfortunately, gpuowl 5.0-df2bdf2 throws an "aclBinary init failure" error:
[code]
2019-01-08 08:27:48 Exiting because "OpenCL compilation"
2019-01-08 08:27:48 Bye
2019-01-08 08:34:57 gpuowl 5.0-df2bdf2
2019-01-08 08:34:57
2019-01-08 08:34:57 216091 FFT 128K: Width 64x4, Height 64x4; 1.65 bits/word
2019-01-08 08:34:57 using long carry kernels
2019-01-08 08:34:57 Cypress-20x 850-@1:0.0 AMD Radeon HD 5800 Series
2019-01-08 08:34:57 OpenCL compilation error -11 (args -DEXP=216091u -DWIDTH=256u -DSMALL_HEIGHT=256u -DMIDDLE=1u -I. -cl-fast-relaxed-math -cl-std=CL2.0 )
2019-01-08 08:34:57 Error: aclBinary init failure

2019-01-08 08:34:57 Exiting because "OpenCL compilation"
2019-01-08 08:34:57 Bye
2019-01-08 08:53:24 gpuowl 5.0-df2bdf2[/code]
I guess it expects OpenCL 2.0 because of the [b]-cl-std=CL2.0[/b] parameter.
Perhaps the CL_HPP_CL_1_2_DEFAULT_BUILD macro should be defined before including cl2.hpp:
[url]https://github.com/KhronosGroup/OpenCL-CLHPP/issues/27#issuecomment-282794419[/url]
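The FFT size and bits/word in these logs can be reproduced from the compile arguments. A small Python sketch, assuming (from the v1.9 log's "FFT 4M (1024 * 2048 * 2)" and this v5.0 log's 128K for WIDTH=256, SMALL_HEIGHT=256, MIDDLE=1) that the FFT length in words is WIDTH x HEIGHT x MIDDLE x 2; bits/word is then the exponent divided by that length. The function names are hypothetical, and the formula is inferred from the logs, not taken from gpuowl's source.

```python
def fft_len_words(width, height, middle=1):
    """FFT length in words, assumed to be WIDTH * HEIGHT * MIDDLE * 2."""
    return width * height * middle * 2

def bits_per_word(exponent, fft_len):
    """Average number of bits of the p-bit residue carried by each FFT word."""
    return exponent / fft_len

# v5.0 log: -DEXP=216091u -DWIDTH=256u -DSMALL_HEIGHT=256u -DMIDDLE=1u
n = fft_len_words(256, 256, 1)
print(n, f"{bits_per_word(216091, n):.2f}")      # 131072 1.65

# v1.9 log: -DEXP=75002911u -DWIDTH=1024u -DHEIGHT=2048u
n = fft_len_words(1024, 2048)
print(n, f"{bits_per_word(75002911, n):.2f}")    # 4194304 17.88
```

More bits per word means a shorter, faster FFT but larger rounding error in each double-precision convolution, which is why the FFT length is chosen per exponent range.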

preda 2019-01-08 10:33

[QUOTE=clarke;505291]Could you compile gpuowl for OpenCL 1.2? It needs to run on Windows 10 1803 x64 with the latest AMD OpenCL 15.7.1/15.11.1 driver for the 5870, which doesn't support OpenCL 2.0.[/QUOTE]

gpuowl at this point requires OpenCL 2.0. The main reason for this was to get access to the new atomics primitives that became available in 2.0.

It would be possible to go back and support OpenCL 1.2 again, but that would require some work.

Another option would be to get a driver that accepts -cl-std=CL2.0, which is not unusual on AMD at this point (e.g. both ROCm and amdgpu-pro accept -cl-std=CL2.0). I don't know about Windows and your particular GPU.
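Choosing the `-cl-std` option from what the device reports, instead of hard-coding `-cl-std=CL2.0`, could be sketched as below. This is a hypothetical helper, not gpuowl code; it relies only on the format the OpenCL specification gives for the CL_DEVICE_OPENCL_C_VERSION string ("OpenCL C <major>.<minor> <vendor-specific information>"). It only avoids the build-option rejection: kernels that use the OpenCL 2.0 atomics would still fail to compile for a 1.2 device.

```python
import re

def cl_std_option(opencl_c_version: str) -> str:
    """Return a -cl-std build option for a CL_DEVICE_OPENCL_C_VERSION string.

    Hypothetical helper: requests CL2.0 only when the device compiler reports
    OpenCL C 2.0 or newer, falls back to CL1.2, and returns "" (compiler
    default) for anything older or unparseable.
    """
    m = re.match(r"OpenCL C (\d+)\.(\d+)", opencl_c_version)
    if m is None:
        return ""
    version = (int(m.group(1)), int(m.group(2)))
    if version >= (2, 0):
        return "-cl-std=CL2.0"
    if version >= (1, 2):
        return "-cl-std=CL1.2"
    return ""

print(cl_std_option("OpenCL C 1.2 "))   # -cl-std=CL1.2
```

The v1.9 log later in this thread shows a similar fallback in practice: after the CL2.0 build fails, it compiles again without any `-cl-std` option.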

henryzz 2019-01-08 12:17

[QUOTE=preda;505296]It would be possible to go back and support OpenCL 1.2 again, but that would require some work.[/QUOTE]

One advantage of also supporting 1.2 would be NVIDIA support.

M344587487 2019-01-08 13:36

Did gpuowl move to PRP before moving to OpenCL 2.0? Maybe there's an older version of gpuowl that would work.

kriesel 2019-01-08 13:39

A fairly current driver for the RX 480 and RX 550 is AMD Adrenalin 18.10.2, which apparently supports OpenCL 2.0 and some of 2.1. The latest is currently 18.12.3.

For links to posts of Windows executables for various versions of gpuowl, see the bottom of post 4 in [URL]https://www.mersenneforum.org/showthread.php?t=23391[/URL]

If the 5870 in question is an HD 5870, it was introduced 9 years ago. [URL]https://www.techpowerup.com/gpu-specs/radeon-hd-5870.c253[/URL]
It may be worth upgrading the GPU, either on a cost-of-electricity-per-throughput basis or for performance.

As I recall, the switch to PRP was around v0.7. Early versions had limited FFT lengths. The GIMPS wavefront has passed the 4M FFT length and is close to passing 5M (gpuowl v2.0). gpuowl 1.9 had 4M, 8M and 16M. Performance and FFT length choice are better in 3.5-3.8, but I think that is already OpenCL 2.0 territory.

clarke 2019-01-08 23:03

[QUOTE=preda;505296]Another option would be to get a driver that accepts -cl-std=CL2.0, which is not unusual on AMD at this point (e.g. both ROCm and amdgpu-pro accept -cl-std=CL2.0). I don't know about Windows and your particular GPU.[/QUOTE]

Unfortunately, it seems like there is no OpenCL 2.0 for pre-GCN AMD cards.

[QUOTE=kriesel;505300]For links to posts of Windows executables for various versions of gpuowl, see the bottom of post 4 in [URL]https://www.mersenneforum.org/showthread.php?t=23391[/URL]

If the 5870 in question is an HD 5870, it was introduced 9 years ago. It may be worth upgrading the GPU, either on a cost-of-electricity-per-throughput basis or for performance.[/QUOTE]
Agreed, that card is very old and power-hungry ;) but I'm not ready to upgrade it.
Yep, I've tried every Windows version on that page, and only 1.9 starts running with OCL 1.0/1.2. But it hits some error and doesn't move forward:
[code]
gpuOwL v1.9- GPU Mersenne primality checker
AMD Radeon HD 5800 Series 20 @1:0.0, Cypress 850MHz
OpenCL compilation error -11 (args -I. -cl-fast-relaxed-math -cl-std=CL2.0 -DEXP=75002911u -DWIDTH=1024u -DHEIGHT=2048u -DLOG_NWORDS=22u -DFP_DP=1 )
Error: aclBinary init failure

".\gpuowl.cl", line 67: warning: OpenCL extension is now part of core
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
^


OpenCL compilation in 2674 ms, with "-I. -cl-fast-relaxed-math -DEXP=75002911u -DWIDTH=1024u -DHEIGHT=2048u -DLOG_NWORDS=22u -DFP_DP=1 "
PRP-3: FFT 4M (1024 * 2048 * 2) of 75002911 (17.88 bits/word) [2019-01-08 09:52:45]
Starting at iteration 0
OK 0 / 75002911 [ 0.00%], 0.00 ms/it; ETA 0d 00:00; 0000000000000003 [09:53:01]
EE 1000 / 75002911 [ 0.00%], 27.83 ms/it; ETA 24d 03:42; 463de8cc34b3766c [09:53:44]
EE 1000 / 75002911 [ 0.00%], 27.84 ms/it; ETA 24d 03:55; 463de8cc34b3766c [09:54:27] (1 errors)
EE 1000 / 75002911 [ 0.00%], 27.82 ms/it; ETA 24d 03:38; 463de8cc34b3766c [09:55:11] (2 errors)
EE 1000 / 75002911 [ 0.00%], 27.84 ms/it; ETA 24d 04:01; 463de8cc34b3766c [09:55:54] (3 errors)
EE 1000 / 75002911 [ 0.00%], 27.82 ms/it; ETA 24d 03:41; 463de8cc34b3766c [09:56:37] (4 errors)
EE 1000 / 75002911 [ 0.00%], 27.83 ms/it; ETA 24d 03:53; 463de8cc34b3766c [09:57:20] (5 errors)
[/code]
If this was fixed in later gpuowl versions, then I'm out of luck for now.
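The "PRP-3" in these logs is a base-3 Fermat probable-prime test. A minimal Python sketch of the idea (plain big-integer arithmetic, not gpuowl's FFT-based code), assuming the convention the logs suggest: the residue is 3 at iteration 0 (hence `0000000000000003`) and is squared once per iteration, so after p iterations it equals 3^(2^p) mod M_p. If M_p is prime, Fermat's little theorem gives 3^(2^p - 2) = 1 (mod M_p), so the final residue must be 9. Reading the printed hex value as the low 64 bits of the residue is also an assumption.

```python
def res64(x: int) -> str:
    """Low 64 bits of a residue as 16 hex digits (assumed log format)."""
    return format(x & 0xFFFFFFFFFFFFFFFF, "016x")

def prp3(p: int) -> bool:
    """Base-3 Fermat PRP test of M_p = 2^p - 1 by p modular squarings."""
    n = (1 << p) - 1
    x = 3                  # iteration 0
    for _ in range(p):
        x = x * x % n      # one iteration: x <- x^2 mod M_p
    # For prime M_p: 3^(2^p) = 3^(M_p - 1) * 9 = 9 (mod M_p).
    return x == 9 % n      # "9 % n" keeps tiny cases such as p = 3 correct

print(res64(3))            # 0000000000000003
print(prp3(7), prp3(11))   # True False
```

M_7 = 127 is prime, while M_11 = 2047 = 23 * 89 fails the test.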

kriesel 2019-01-08 23:19

[QUOTE=clarke;505337]Unfortunately, it seems like there is no OpenCL 2.0 for pre-GCN AMD cards.


Agreed, that card is very old and power-hungry (; but I'm not ready to upgrade it.
Yep, I've tried each and every Windows-version at that page and only 1.9 starts running with OCL 1.0/1.2. But it catches some error and doesn't move forward:
[code]
gpuOwL v1.9- GPU Mersenne primality checker
AMD Radeon HD 5800 Series 20 @1:0.0, Cypress 850MHz
OpenCL compilation error -11 (args -I. -cl-fast-relaxed-math -cl-std=CL2.0 -DEXP=75002911u -DWIDTH=1024u -DHEIGHT=2048u -DLOG_NWORDS=22u -DFP_DP=1 )
Error: aclBinary init failure

".\gpuowl.cl", line 67: warning: OpenCL extension is now part of core
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
^


OpenCL compilation in 2674 ms, with "-I. -cl-fast-relaxed-math -DEXP=75002911u -DWIDTH=1024u -DHEIGHT=2048u -DLOG_NWORDS=22u -DFP_DP=1 "
PRP-3: FFT 4M (1024 * 2048 * 2) of 75002911 (17.88 bits/word) [2019-01-08 09:52:45]
Starting at iteration 0
OK 0 / 75002911 [ 0.00%], 0.00 ms/it; ETA 0d 00:00; 0000000000000003 [09:53:01]
EE 1000 / 75002911 [ 0.00%], 27.83 ms/it; ETA 24d 03:42; 463de8cc34b3766c [09:53:44]
EE 1000 / 75002911 [ 0.00%], 27.84 ms/it; ETA 24d 03:55; 463de8cc34b3766c [09:54:27] (1 errors)
EE 1000 / 75002911 [ 0.00%], 27.82 ms/it; ETA 24d 03:38; 463de8cc34b3766c [09:55:11] (2 errors)
EE 1000 / 75002911 [ 0.00%], 27.84 ms/it; ETA 24d 04:01; 463de8cc34b3766c [09:55:54] (3 errors)
EE 1000 / 75002911 [ 0.00%], 27.82 ms/it; ETA 24d 03:41; 463de8cc34b3766c [09:56:37] (4 errors)
EE 1000 / 75002911 [ 0.00%], 27.83 ms/it; ETA 24d 03:53; 463de8cc34b3766c [09:57:20] (5 errors)
[/code]If this was fixed at later gpuowl versions, then I'm out of luck for now.[/QUOTE]
It's not broken; your setup is just incompatible. For comparison, here is v1.9 on a Radeon 500 Series card:
[CODE]gpuOwL v1.9- GPU Mersenne primality checker
Radeon 500 Series 8 @f:0.0, gfx804 1203MHz

OpenCL compilation in 2147 ms, with "-I. -cl-fast-relaxed-math -cl-std=CL2.0 -DEXP=76812401u -DWIDTH=1024u -DHEIGHT=2048u -DLOG_NWORDS=22u -DFP_DP=1 "
PRP-3: FFT 4M (1024 * 2048 * 2) of 76812401 (18.31 bits/word) [2018-01-23 12:43:49 Central Standard Time]
Starting at iteration 25373000
OK 25373000 / 76812401 [33.03%], 0.00 ms/it; ETA 0d 00:00; 6d6a6ebc97092826 [12:43:57]
OK 25374000 / 76812401 [33.03%], 11.97 ms/it; ETA 7d 03:05; bb937b8a48c69d60 [12:44:17]
OK 25375000 / 76812401 [33.03%], 11.97 ms/it; ETA 7d 03:04; b81a6f51602c2bd8 [12:44:36]
OK 25380000 / 76812401 [33.04%], 11.96 ms/it; ETA 7d 02:50; 60bcb33b85922094 [12:45:44]
OK 25390000 / 76812401 [33.05%], 12.00 ms/it; ETA 7d 03:26; 516093b7988f8ac4 [12:47:52]
OK 25400000 / 76812401 [33.07%], 12.00 ms/it; ETA 7d 03:23; 5313239afe8bcffe [12:50:00]
OK 25420000 / 76812401 [33.09%], 12.00 ms/it; ETA 7d 03:20; d04bc7fd72b07e36 [12:54:07]
OK 25440000 / 76812401 [33.12%], 12.00 ms/it; ETA 7d 03:15; 679e6f34ac35a983 [/CODE]
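The `OK` and `EE` prefixes appear to mark whether a periodic error check passed. GIMPS PRP implementations commonly use the Gerbicz check for this; the Python sketch below assumes that scheme and is not gpuowl's code. With x_i = 3^(2^i) mod M_p and a checkpoint every L iterations, the running product d = x_0 * x_L * x_2L * ... satisfies d_new = x_0 * d_old^(2^L), so a corrupted block is caught and the test rolls back to the last verified checkpoint. The `fault_iter` parameter is a hypothetical hook that injects one transient error.

```python
def prp3_gerbicz(p, L, fault_iter=None, max_errors=10):
    """Base-3 PRP test of M_p = 2^p - 1 with a Gerbicz-style check every L iterations.

    Returns (is_probable_prime, failed_checks). fault_iter (hypothetical)
    corrupts the residue once, right after that iteration, but only inside
    a checked block, to show a check failing and the rollback that follows.
    """
    n = (1 << p) - 1
    x0 = 3
    # Last verified checkpoint: iteration i, residue x_i, product d = x_0 * x_L * ... * x_i.
    ck_i, ck_x, ck_d = 0, x0, x0
    errors = 0
    pending_fault = fault_iter is not None
    while ck_i + L <= p:
        x = ck_x
        for j in range(1, L + 1):
            x = x * x % n
            if pending_fault and ck_i + j == fault_iter:
                x = (x + 1) % n              # simulated transient hardware error
                pending_fault = False
        d = ck_d * x % n                     # d_new = d_old * x_(i+L)
        if d == x0 * pow(ck_d, 1 << L, n) % n:
            ck_i, ck_x, ck_d = ck_i + L, x, d    # check passed: advance checkpoint
        else:
            errors += 1                      # check failed: discard block, retry it
            if errors > max_errors:
                raise RuntimeError(f"check keeps failing at iteration {ck_i + L}")
    x = ck_x
    for _ in range(p - ck_i):                # final partial block (< L iterations), unchecked here
        x = x * x % n
    return x == 9 % n, errors

print(prp3_gerbicz(7, 2))                    # (True, 0)
print(prp3_gerbicz(7, 2, fault_iter=3))      # (True, 1)
```

A problem that corrupts every attempt would make the check fail at the same point each time; in this sketch that ends in the RuntimeError, and it would be consistent with the repeated `EE 1000` lines from the HD 5870.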


All times are UTC.