mersenneforum.org

gpuOwL: an OpenCL program for Mersenne primality testing (https://www.mersenneforum.org/showthread.php?t=22204)

kriesel 2018-08-15 16:01

NVIDIA & OpenCL GpuOwL
 
Just for kicks, tried the executable from post 581 on an NVIDIA GTX1060 3GB. To probably no one's surprise, it didn't get far.

[CODE]C:\users\ken\documents\gpuowl-cuda-build>openowl-V36-v20180810-W64.exe -h
gpuowl-OpenCL 3.6-v20180810

Command line options:

-user <name> : specify the user name.
-cpu <name> : specify the hardware name.
-time : display kernel profiling information.
-fft <size> : specify FFT size, such as: 5000K, 4M, +2, -1.
-block 100|200|400 : select PRP-check block size. Smaller block is slower but detects errors earlier.
-carry long|short : force carry type. Short carry may be faster, but requires high bits/word.
-list fft : display a list of available FFT configurations.
-device <N> : select a specific device:
0 : GeForce GTX 1060 3GB-9x1771-
1 : Quadro 2000-4x1251-
2 : Quadro 4000-8x 950-

C:\users\ken\documents\gpuowl-cuda-build>set gpuowl$exe=openowl-V36-v20180810-W64.exe

C:\users\ken\documents\gpuowl-cuda-build>openowl-V36-v20180810-W64.exe
gpuowl-OpenCL 3.6-v20180810
FFT 36864K: Width 1024 (256x4), Height 2048 (256x8), Middle 9; 9.96 bits/word
Note: using long carry kernels
GeForce GTX 1060 3GB-9x1771-
OpenCL compilation error -11 (args -DEXP=376059389u -DWIDTH=1024u -DSMALL_HEIGHT=2048u -DMIDDLE=9u -I. -cl-fast-relaxed-math -cl-std=CL2.0 )
In file included from <kernel>:1:
./gpuowl.cl:540:51: error: 'long long' type is not supported
long reduce36(long x) { return (x >> 36) + (x & ((1ull << 36) - 1)); }
^
./gpuowl.cl:582:11: error: must specify '#pragma OPENCL EXTENSION cl_khr_int64_base_atomics: enable' before using this atomic operation
atom_add(&localRes36, reduce36(res36));
^
./gpuowl.cl:585:26: error: must specify '#pragma OPENCL EXTENSION cl_khr_int64_base_atomics: enable' before using this atomic operation
if (me == 0) { atom_add(&out[outPos], localRes36); }
^

terminate called after throwing an instance of 'char const*'

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.[/CODE]
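For readers wondering what the rejected `reduce36` line computes: it folds the bits above position 36 back onto the low 36 bits, which keeps the value unchanged modulo 2^36 - 1. The sketch below is a Python model of that one line only, written to illustrate the arithmetic; the names `MASK36` and `MODULUS` are mine, and nothing here says how the NVIDIA compile errors should be fixed.

```python
# Python model of the OpenCL one-liner from the log above:
#   long reduce36(long x) { return (x >> 36) + (x & ((1ull << 36) - 1)); }
# MASK36 and MODULUS are illustrative names, not taken from gpuowl.cl.

MASK36 = (1 << 36) - 1      # low 36 bits set
MODULUS = (1 << 36) - 1     # folding preserves the value modulo 2^36 - 1


def reduce36(x: int) -> int:
    """Add the high part (x >> 36) to the low 36 bits of x."""
    return (x >> 36) + (x & MASK36)


if __name__ == "__main__":
    for x in (0, 1, MASK36, 1 << 36, 123456789012345678, -1, -(1 << 40)):
        # One fold does not fully reduce, but it keeps the residue class.
        assert reduce36(x) % MODULUS == x % MODULUS
        print(x, "->", reduce36(x))
```

Because 2^36 is congruent to 1 modulo 2^36 - 1, the high part can simply be added to the low part; Python's arithmetic right shift makes the same identity hold for negative inputs.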

henryzz 2018-08-15 18:43

That looks more fixable than previous errors.

preda 2018-08-16 12:42

GpuOwl v3.7 with TF
 
Some basic TF (trial factoring) is now integrated in GpuOwl (OpenCL only initially), in v3.7.

For now the TF part is only known to work with ROCm 1.8.2, as it makes use of some specific extension there.

TF work can be triggered in two ways:
1. with TF lines in worktodo.txt like:
Factor=332298607,76,77
Factor=AID_REMOVED,332298607,76,77

2. with PRP lines in worktodo.txt, in these conditions:
- the PRP line contains a "factored up to bit" value, e.g.:
PRP=AID_REMOVED,1,2,332345953,-1,79,0,3,
(the up-to-bit is 79).

- on the command line, the "-tf 0" is used to indicate "opt-in" to TF.

In this case, the program estimates a "target TF bit level", and attempts to TF up to that bit-level before the PRP.

By default, the "target TF bit level" is 81 for 332M exponents, 80 for 160M exponents and 79 for 80M exponents. This is what is used with "-tf 0". An offset other than 0 can be specified to change the "target" up or down, e.g.: "-tf -1" if 1-bit-less TF is desired.

To use TF, the new OpenCL file "tf.cl" must be next to the executable (where "gpuowl.cl" is now).

A new set of savefiles is created for TF with names ending in ".tf.owl". The TF can be stopped and will resume from savefile on restart.

The output/log also changed in 3.7, to uniformly prefix time and cpu-name to every output line.
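A minimal sketch of how the worktodo.txt line shapes shown above could be read. The function name `parse_worktodo_line` and the dictionary keys are hypothetical and this is not GpuOwl's actual parser. It assumes that a non-numeric first field is an assignment ID and that the PRP fields are k, b, n, c followed by the "factored up to bit" value, as in the example line.

```python
# Illustrative parser for the Factor= and PRP= lines quoted in this post.
# All names here are assumptions made for the example, not GpuOwl code.

def parse_worktodo_line(line: str) -> dict:
    kind, _, rest = line.strip().partition("=")
    fields = rest.split(",")
    # A trailing comma (as in the PRP example) leaves an empty last field.
    while fields and fields[-1] == "":
        fields.pop()
    # Assumption: a non-numeric first field is the assignment ID.
    aid = None
    if fields and not fields[0].lstrip("-").isdigit():
        aid = fields.pop(0)

    if kind == "Factor":
        exponent, bit_from, bit_to = (int(f) for f in fields[:3])
        return {"kind": "Factor", "aid": aid, "exponent": exponent,
                "bit_from": bit_from, "bit_to": bit_to}
    if kind == "PRP":
        # Assumed layout: k, b, n, c, factored-up-to bit, then other fields.
        k, b, n, c, factored_to = (int(f) for f in fields[:5])
        return {"kind": "PRP", "aid": aid, "exponent": n,
                "factored_to": factored_to}
    raise ValueError(f"unrecognized worktodo line: {line!r}")


if __name__ == "__main__":
    for text in ("Factor=332298607,76,77",
                 "Factor=AID_REMOVED,332298607,76,77",
                 "PRP=AID_REMOVED,1,2,332345953,-1,79,0,3,"):
        print(parse_worktodo_line(text))
```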

axn 2018-08-16 13:10

[QUOTE=preda;494005]By default, the "target TF bit level" is 81 for 332M exponents, 80 for 160M exponents and 79 for 80M exponents. This is what is used with "-tf 0".[/QUOTE]
This ... looks wrong. 3 bits per doubling of exponent - that's the rule. So 81, 78, and 75 respectively.

SELROC 2018-08-16 13:17

[QUOTE=preda;494005]Some basic TF (trial factoring) is now integrated in GpuOwl (OpenCL only initially), in v3.7.

For now the TF part is only known to work with ROCm 1.8.2, as it makes use of some specific extension there.

TF work can be triggered in two ways:
1. with TF lines in worktodo.txt like:
Factor=332298607,76,77
Factor=AID_REMOVED,332298607,76,77

2. with PRP lines in worktodo.txt, in these conditions:
- the PRP line contains a "factored up to bit" value, e.g.:
PRP=AID_REMOVED,1,2,332345953,-1,79,0,3,
(the up-to-bit is 79).

- on the command line, the "-tf 0" is used to indicate "opt-in" to TF.

In this case, the program estimates a "target TF bit level", and attempts to TF up to that bit-level before the PRP.

By default, the "target TF bit level" is 81 for 332M exponents, 80 for 160M exponents and 79 for 80M exponents. This is what is used with "-tf 0". An offset other than 0 can be specified to change the "target" up or down, e.g.: "-tf -1" if 1-bit-less TF is desired.

To use TF, the new OpenCL file "tf.cl" must be next to the executable (where "gpuowl.cl" is now).

A new set of savefiles is created for TF with names ending in ".tf.owl". The TF can be stopped and will resume from savefile on restart.

The output/log also changed in 3.7, to uniformly prefix time and cpu-name to every output line.[/QUOTE]


While you fix TF for amdgpu: thanks for fixing mfakto, it now runs well on Debian.

preda 2018-08-16 13:23

[QUOTE=axn;494008]This ... looks wrong. 3 bits per doubling of exponent - that's the rule. So 81, 78, and 75 respectively.[/QUOTE]

Yes, right. I'll fix.

Let me explain why:
A 1-bit reduction in the exponent (i.e. halving it) means that a 1-bit reduction in the TF level produces the same number of Ks (factor candidates, FC). *BUT* a 1-bit reduction in the exponent also means a 2-bit (i.e. four-fold) reduction in PRP time. So 3 bits in total.
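The arithmetic behind the rule can be checked with a short sketch. It relies on the known fact that any factor of 2^p - 1 has the form q = 2kp + 1, so the number of k values in a bit range scales as 1/p. The function names, the 332M/81-bit reference point and the rounding are assumptions made for illustration, not GpuOwl's actual code.

```python
import math

# Illustrative only: k_count and target_bits are hypothetical helpers.

def k_count(exponent: int, bit_from: int, bit_to: int) -> int:
    """Approximate count of k with 2^bit_from <= 2*k*exponent + 1 < 2^bit_to."""
    return (2**bit_to - 2**bit_from) // (2 * exponent)


def target_bits(exponent: int, ref_exponent: int = 332_000_000,
                ref_bits: int = 81) -> int:
    """Rule of thumb: 3 bits of TF per doubling of the exponent."""
    return round(ref_bits + 3 * math.log2(exponent / ref_exponent))


if __name__ == "__main__":
    # Halving the exponent and dropping one bit keeps the candidate count.
    assert k_count(332_000_000, 80, 81) == k_count(166_000_000, 79, 80)
    for p in (332_000_000, 160_000_000, 80_000_000):
        print(p, target_bits(p))
```

With these assumptions the three exponents sizes discussed in the thread come out at 81, 78 and 75 bits.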

preda 2018-08-16 14:00

[QUOTE=SELROC;494009]While you fix TF for amdgpu: thanks for fixing mfakto, it now runs well on Debian.[/QUOTE]

I just had a look at amdgpu-pro. Unfortunately it's not easy to adapt the TF code for it (it's not just a sub; none of the 128-bit arithmetic works there). Thus I think staying with mfakto on amdgpu-pro is an option, until amdgpu adopts the newer compiler from ROCm.

SELROC 2018-08-16 14:04

[QUOTE=preda;494016]I just had a look at amdgpu-pro. Unfortunately it's not easy to adapt the TF code for it (it's not just a sub; none of the 128-bit arithmetic works there). Thus I think staying with mfakto on amdgpu-pro is an option, until amdgpu adopts the newer compiler from ROCm.[/QUOTE]


I don't know what can be done here. Maybe copy code from mfakto ?

preda 2018-08-16 14:15

[QUOTE=SELROC;494018]I don't know what can be done here. Maybe copy code from mfakto ?[/QUOTE]

I think I don't want to do that. mfakto is good as is, in its own way, and I don't intend to duplicate it.

OTOH I tried to get a really neat and tight TF implementation in GpuOwl (tf.cl is now at 400 lines).

SELROC 2018-08-16 14:28

[QUOTE=preda;494019]I think I don't want to do that. mfakto is good as is, in its own way, and I don't intend to duplicate it.

OTOH I tried to get a really neat and tight TF implementation in GpuOwl (tf.cl is now at 400 lines).[/QUOTE]


Ugh, sorry, I didn't intend to hurt you.



Absolutely. I am happy as is and you should be too :-)

Mark Rose 2018-08-16 15:08

May I also suggest supporting the format:

Factor=N/A,332298607,76,77

It's supported by other tools.

