mersenneforum.org

gpuOwL: an OpenCL program for Mersenne primality testing (https://www.mersenneforum.org/showthread.php?t=22204)

preda 2019-05-20 13:30

[QUOTE=SELROC;517236]With the new version there is a remarkable speedup on a 332M exponent!

It went from 4.13 ms/sq to 3.7 ms/sq.

Good!

I did change the FFT, however, from -fft +2 to the default FFT (no -fft argument). -fft +2 now fails to load.[/QUOTE]

Could you please check to see if -fft +2 is fixed now for that exponent, thanks.
(I think I introduced a recent bug in the new fft8 primitive)

SELROC 2019-05-20 13:36

[QUOTE=preda;517254]Could you please check to see if -fft +2 is fixed now for that exponent, thanks.
(I think I introduced a recent bug in the new fft8 primitive)[/QUOTE]


Done. It works now, slower of course (4.16 ms/sq).
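The ms/sq figures translate directly into test duration: a PRP test of 2^p - 1 needs about p squarings, so the total time is roughly p × (ms per squaring). Below is a minimal C++ sketch of that arithmetic, assuming 332,000,000 as a stand-in for the "332M" exponent and ignoring Gerbicz-check, checkpoint and startup overhead. The function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Rough PRP duration in days: about `exponent` squarings at `msPerSq` each.
// Ignores Gerbicz-check, checkpoint and startup overhead.
double prpDays(std::uint64_t exponent, double msPerSq) {
  assert(msPerSq > 0.0);
  return static_cast<double>(exponent) * msPerSq / 1000.0 / 86400.0;
}

// Percent less wall time per squaring when going from beforeMs to afterMs.
double pctLessTime(double beforeMs, double afterMs) {
  assert(beforeMs > 0.0);
  return 100.0 * (beforeMs - afterMs) / beforeMs;
}

// Percent more squarings per second when going from beforeMs to afterMs.
double pctMoreThroughput(double beforeMs, double afterMs) {
  assert(afterMs > 0.0);
  return 100.0 * (beforeMs / afterMs - 1.0);
}
```

Under those assumptions, prpDays(332000000, 4.13) comes to about 15.9 days and prpDays(332000000, 3.7) to about 14.2 days. The change is about 10.4% less time per squaring, or about 11.6% more squarings per second. Applied to the GTX 1070 numbers later in this thread (5.786 vs 5.63 ms/iter), pctMoreThroughput gives about 2.8%.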

kracker 2019-05-22 01:53

Getting an error trying to build gpuowl (the usual, msys2/windows)


[code]
In file included from Gpu.cpp:3:
Gpu.h:70:30: error: static assertion failed: long is 64 bits
static_assert(sizeof(long) == 8, "long is 64 bits");
~~~~~~~~~~~~~^~~~
make: *** [Makefile:32: Gpu.o] Error 1
[/code]

SELROC 2019-05-22 06:38

Managing old checkpoint files
 
This is a bash script to remove old checkpoint files.
Needs one argument: number of days backwards.
Use with caution !


[url]https://github.com/valeriob01/Mersenne-gpu-computing-node/commit/a5190ba4a6d68f41a29a581ab9b888a8231d50b1#diff-beff0752a018bd57a24cb7bf3c9f6dd9[/url]

preda 2019-05-22 11:44

[QUOTE=kracker;517437]Getting an error trying to build gpuowl (the usual, msys2/windows)


[code]
In file included from Gpu.cpp:3:
Gpu.h:70:30: error: static assertion failed: long is 64 bits
static_assert(sizeof(long) == 8, "long is 64 bits");
~~~~~~~~~~~~~^~~~
make: *** [Makefile:32: Gpu.o] Error 1
[/code][/QUOTE]

OK, fixed. Having long == int (a 32-bit long) is still an unusual compiler setup these days.
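For context on the failing assertion: on 64-bit Windows toolchains, including the MinGW-w64 GCC that msys2 provides, `long` is 32 bits (the LLP64 data model). 64-bit Linux uses LP64, where `long` is 64 bits. As a result, static_assert(sizeof(long) == 8) can never pass in the msys2 build. We don't know exactly how the fix was done. One portable approach is to use the fixed-width types from <cstdint> wherever 64 bits are required. The sketch below applies this to the 16-hex-digit res64 values seen in the logs, using hypothetical helper names:

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Fixed-width 64-bit type: 64 bits on LP64 (Linux) and LLP64 (Windows) alike,
// unlike `long`, which is only 32 bits on 64-bit Windows.
using u64 = std::uint64_t;
static_assert(sizeof(u64) == 8, "u64 is 64 bits");

// Hypothetical helper: parse a res64 such as "a8835bb1f12323ed".
// std::stoull returns unsigned long long (at least 64 bits everywhere);
// std::stoul would return a 32-bit unsigned long on Windows and throw out_of_range.
u64 parseRes64(const std::string& hex) {
  assert(!hex.empty() && hex.size() <= 16);
  return static_cast<u64>(std::stoull(hex, nullptr, 16));
}

// Hypothetical helper: format a res64 as 16 lowercase hex digits, zero-padded.
std::string formatRes64(u64 r) {
  char buf[17];
  std::snprintf(buf, sizeof buf, "%016" PRIx64, r);
  return buf;
}
```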

SELROC 2019-05-22 18:18

[QUOTE=SELROC;517449]This is a bash script to remove old checkpoint files.
Needs one argument: number of days backwards.
Use with caution !


[URL]https://github.com/valeriob01/Mersenne-gpu-computing-node/commit/a5190ba4a6d68f41a29a581ab9b888a8231d50b1#diff-beff0752a018bd57a24cb7bf3c9f6dd9[/URL][/QUOTE]


Enhancements and bugfixes: now we remove all files older than the number of days specified in the second argument. The first argument is the target directory.


[URL]https://github.com/valeriob01/Mersenne-gpu-computing-node/blob/master/remove_checkpoints.sh[/URL]




PS: both arguments are mandatory.
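The script itself is bash (linked above). To illustrate the same idea of selecting files by age, here is a hypothetical C++17 counterpart using std::filesystem. It takes a target directory and a number of days, examines only regular files directly inside that directory, and by default only lists what it would delete. The dry-run default is an addition of this sketch, not a feature of the script.

```cpp
#include <cassert>
#include <chrono>
#include <filesystem>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// True if `mtime` lies more than `days` whole days before `now`.
bool olderThanDays(fs::file_time_type mtime, fs::file_time_type now, int days) {
  assert(days >= 0);
  return mtime < now - std::chrono::hours(24 * days);
}

// Hypothetical counterpart of remove_checkpoints.sh. Collects regular files
// directly inside `dir` (no recursion) older than `days` days. With dryRun
// (the default) it just returns them; otherwise it deletes them and returns
// only the ones actually removed.
std::vector<fs::path> removeOlderThan(const fs::path& dir, int days, bool dryRun = true) {
  const auto now = fs::file_time_type::clock::now();
  std::vector<fs::path> old;
  for (const auto& entry : fs::directory_iterator(dir)) {
    if (entry.is_regular_file() && olderThanDays(entry.last_write_time(), now, days)) {
      old.push_back(entry.path());
    }
  }
  if (dryRun) return old;

  std::vector<fs::path> removed;
  for (const auto& p : old) {
    std::error_code ec;
    if (fs::remove(p, ec)) removed.push_back(p);  // skip files that could not be removed
  }
  return removed;
}
```

A preview call might look like removeOlderThan("/path/to/work", 7), followed by the same call with dryRun = false to actually delete; the path is a placeholder. As with the script, point it only at a directory you are sure about, because it does not filter by file name or extension.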

kriesel 2019-05-23 19:49

gpuowl v6.5-c48d46f head-to-head with CUDALucas 2.06beta (May 5 2017) on an NVIDIA GTX 1070
 
Good news: it runs on Win7 x64 with a GTX 1070, on a wavefront-exponent PRP.
[CODE]>gpuowl-win -h
2019-05-23 13:34:54 gpuowl v6.5-c48d46f

Command line options:

-dir <folder> : specify work directory (containing worktodo.txt, results.txt, config.txt, gpuowl.log)
-user <name> : specify the user name.
-cpu <name> : specify the hardware name.
-time : display kernel profiling information.
-fft <size> : specify FFT size, such as: 5000K, 4M, +2, -1.
-block <value> : PRP GEC block size. Default 1000. Smaller block is slower but detects errors sooner.
-log <step> : log every <step> iterations, default 20000. Multiple of 10000.
-carry long|short : force carry type. Short carry may be faster, but requires high bits/word.
-B1 : P-1 B1 bound, default 500000
-B2 : P-1 B2 bound, default B1 * 30
-rB2 : ratio of B2 to B1. Default 30, used only if B2 is not explicitly set
-prp <exponent> : run a single PRP test and exit, ignoring worktodo.txt
-pm1 <exponent> : run a single P-1 test and exit, ignoring worktodo.txt
-results <file> : name of results file, default 'results.txt'
-device <N> : select a specific device:
0 : GeForce GTX 1070-15x1708-
1 : Quadro 2000-4x1251-
2 : GeForce GTX 1050 Ti-6x1468-

FFT Configurations:
FFT 8K [ 0.01M - 0.18M] 64-64
FFT 32K [ 0.05M - 0.68M] 64-256 256-64
FFT 48K [ 0.07M - 1.01M] 64-64-6
FFT 64K [ 0.10M - 1.34M] 64-512 512-64
FFT 72K [ 0.11M - 1.50M] 64-64-9
FFT 80K [ 0.12M - 1.66M] 64-64-10
FFT 128K [ 0.20M - 2.63M] 1K-64 64-1K 256-256
FFT 192K [ 0.29M - 3.91M] 64-256-6 256-64-6
FFT 256K [ 0.39M - 5.18M] 64-2K 256-512 512-256 2K-64
FFT 288K [ 0.44M - 5.81M] 64-256-9 256-64-9
FFT 320K [ 0.49M - 6.44M] 64-256-10 256-64-10
FFT 384K [ 0.59M - 7.69M] 64-512-6 512-64-6
FFT 512K [ 0.79M - 10.18M] 1K-256 256-1K 512-512 4K-64
FFT 576K [ 0.88M - 11.42M] 64-512-9 512-64-9
FFT 640K [ 0.98M - 12.66M] 64-512-10 512-64-10
FFT 768K [ 1.18M - 15.12M] 1K-64-6 64-1K-6 256-256-6
FFT 1M [ 1.57M - 20.02M] 1K-512 256-2K 512-1K 2K-256
FFT 1152K [ 1.77M - 22.45M] 1K-64-9 64-1K-9 256-256-9
FFT 1280K [ 1.97M - 24.88M] 1K-64-10 64-1K-10 256-256-10
FFT 1536K [ 2.36M - 29.72M] 64-2K-6 256-512-6 512-256-6 2K-64-6
FFT 2M [ 3.15M - 39.34M] 1K-1K 512-2K 2K-512 4K-256
FFT 2304K [ 3.54M - 44.13M] 64-2K-9 256-512-9 512-256-9 2K-64-9
FFT 2560K [ 3.93M - 48.90M] 64-2K-10 256-512-10 512-256-10 2K-64-10
FFT 3M [ 4.72M - 58.41M] 1K-256-6 256-1K-6 512-512-6 4K-64-6
FFT 4M [ 6.29M - 77.30M] 1K-2K 2K-1K 4K-512
FFT 4608K [ 7.08M - 86.70M] 1K-256-9 256-1K-9 512-512-9 4K-64-9
FFT 5M [ 7.86M - 96.07M] 1K-256-10 256-1K-10 512-512-10 4K-64-10
FFT 6M [ 9.44M - 114.74M] 1K-512-6 256-2K-6 512-1K-6 2K-256-6
FFT 8M [ 12.58M - 151.83M] 2K-2K 4K-1K
FFT 9M [ 14.16M - 170.28M] 1K-512-9 256-2K-9 512-1K-9 2K-256-9
FFT 10M [ 15.73M - 188.68M] 1K-512-10 256-2K-10 512-1K-10 2K-256-10
FFT 12M [ 18.87M - 225.32M] 1K-1K-6 512-2K-6 2K-512-6 4K-256-6
FFT 16M [ 25.17M - 298.13M] 4K-2K
FFT 18M [ 28.31M - 334.34M] 1K-1K-9 512-2K-9 2K-512-9 4K-256-9
FFT 20M [ 31.46M - 370.44M] 1K-1K-10 512-2K-10 2K-512-10 4K-256-10
FFT 24M [ 37.75M - 442.34M] 1K-2K-6 2K-1K-6 4K-512-6
FFT 36M [ 56.62M - 656.22M] 1K-2K-9 2K-1K-9 4K-512-9
FFT 40M [ 62.91M - 727.03M] 1K-2K-10 2K-1K-10 4K-512-10
FFT 48M [ 75.50M - 868.07M] 2K-2K-6 4K-1K-6
FFT 72M [113.25M - 1287.53M] 2K-2K-9 4K-1K-9
FFT 80M [125.83M - 1426.38M] 2K-2K-10 4K-1K-10
FFT 96M [150.99M - 1702.92M] 4K-2K-6
FFT 144M [226.49M - 2525.23M] 4K-2K-9
FFT 160M [251.66M - 2797.39M] 4K-2K-10
2019-05-23 13:34:54 Exiting because "help"
2019-05-23 13:34:54 Bye

>gpuowl-win
2019-05-23 13:36:03 gpuowl v6.5-c48d46f
2019-05-23 13:36:03 Note: no config.txt file found
2019-05-23 13:36:03 85389763 FFT 4608K: Width 256x4, Height 64x4, Middle 9; 18.10 bits/word
2019-05-23 13:36:03 using short carry kernels
2019-05-23 13:36:05

2019-05-23 13:36:05 OpenCL compilation in 1840 ms, with "-DEXP=85389763u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=9u -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-05-23 13:36:06 85389763.owl not found, starting from the beginning.
2019-05-23 13:36:30 85389763 OK 2000 0.00%; 5.55 ms/sq; ETA 5d 11:32; 13fdc384f649745f (check 5.82s)
2019-05-23 13:38:13 85389763 20000 0.02%; 5.74 ms/sq; ETA 5d 16:14; dc66a97eaafc3e4d
2019-05-23 13:40:11 85389763 40000 0.05%; 5.90 ms/sq; ETA 5d 19:50; ff5be2560bfd9c09
2019-05-23 13:42:09 85389763 60000 0.07%; 5.87 ms/sq; ETA 5d 19:11; 81b3341edfd7a610
2019-05-23 13:44:06 85389763 80000 0.09%; 5.89 ms/sq; ETA 5d 19:33; 181394a870cfcf3b
2019-05-23 13:44:12 Stopping, please wait..
2019-05-23 13:44:18 85389763 OK 81000 0.09%; 5.88 ms/sq; ETA 5d 19:22; a8835bb1f12323ed (check 6.05s)
2019-05-23 13:44:18 Exiting because "stop requested"
2019-05-23 13:44:18 Bye
Terminate batch job (Y/N)? n
[/CODE]But the time to beat is ~5.64 ms/iter at 4608K, delivered by CUDALucas on the same GPU in April. Let's try some variations in gpuowl.[CODE]>gpuowl-win -device 0 -carry long
2019-05-23 13:44:22 gpuowl v6.5-c48d46f
2019-05-23 13:44:22 Note: no config.txt file found
2019-05-23 13:44:22 config: -device 0 -carry long
2019-05-23 13:44:22 85389763 FFT 4608K: Width 256x4, Height 64x4, Middle 9; 18.10 bits/word
2019-05-23 13:44:22 using long carry kernels
2019-05-23 13:44:23

2019-05-23 13:44:23 OpenCL compilation in 31 ms, with "-DEXP=85389763u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=9u -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-05-23 13:44:23 85389763.owl loaded: k 81000, block 1000, res64 a8835bb1f12323ed
2019-05-23 13:44:49 85389763 OK 83000 0.10%; 6.15 ms/sq; ETA 6d 01:49; a3eb982ed7fa14bc (check 6.40s)
2019-05-23 13:46:36 85389763 100000 0.12%; 6.27 ms/sq; ETA 6d 04:34; d543b380d35e0511
2019-05-23 13:48:41 85389763 120000 0.14%; 6.24 ms/sq; ETA 6d 03:52; ab2fa867f9ec0f95
2019-05-23 13:50:46 85389763 140000 0.16%; 6.26 ms/sq; ETA 6d 04:12; f6315071c1d3da26
2019-05-23 13:50:52 Stopping, please wait..
2019-05-23 13:50:59 85389763 OK 141000 0.17%; 6.26 ms/sq; ETA 6d 04:07; a7bae12ec4a6302e (check 6.40s)
2019-05-23 13:50:59 Exiting because "stop requested"
2019-05-23 13:50:59 Bye
Terminate batch job (Y/N)? n

>gpuowl-win -device 0 -carry short -fft +1
2019-05-23 13:51:04 gpuowl v6.5-c48d46f
2019-05-23 13:51:04 Note: no config.txt file found
2019-05-23 13:51:04 config: -device 0 -carry short -fft +1
2019-05-23 13:51:04 85389763 FFT 4608K: Width 64x4, Height 256x4, Middle 9; 18.10 bits/word
2019-05-23 13:51:04 using short carry kernels
2019-05-23 13:51:06

2019-05-23 13:51:06 OpenCL compilation in 1794 ms, with "-DEXP=85389763u -DWIDTH=256u -DSMALL_HEIGHT=1024u -DMIDDLE=9u -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-05-23 13:51:06 85389763.owl loaded: k 141000, block 1000, res64 a7bae12ec4a6302e
2019-05-23 13:51:31 85389763 OK 143000 0.17%; 5.72 ms/sq; ETA 5d 15:34; 7294fbceb5f99113 (check 6.02s)
2019-05-23 13:53:11 85389763 160000 0.19%; 5.88 ms/sq; ETA 5d 19:16; 545347192c50295f
2019-05-23 13:55:08 85389763 180000 0.21%; 5.88 ms/sq; ETA 5d 19:12; f74830bf9143037a
2019-05-23 13:57:06 85389763 200000 0.23%; 5.88 ms/sq; ETA 5d 19:06; 0f5ddafcc6d3a01d
2019-05-23 13:57:18 Stopping, please wait..
2019-05-23 13:57:24 85389763 OK 202000 0.24%; 5.87 ms/sq; ETA 5d 18:58; 9e2cc389e8016958 (check 6.08s)
2019-05-23 13:57:24 Exiting because "stop requested"
2019-05-23 13:57:24 Bye
Terminate batch job (Y/N)? y

>gpuowl-win -device 0 -carry short -fft +2
2019-05-23 13:57:27 gpuowl v6.5-c48d46f
2019-05-23 13:57:27 Note: no config.txt file found
2019-05-23 13:57:27 config: -device 0 -carry short -fft +2
2019-05-23 13:57:27 85389763 FFT 4608K: Width 64x8, Height 64x8, Middle 9; 18.10 bits/word
2019-05-23 13:57:27 using short carry kernels
2019-05-23 13:57:30

2019-05-23 13:57:30 OpenCL compilation in 2464 ms, with "-DEXP=85389763u -DWIDTH=512u -DSMALL_HEIGHT=512u -DMIDDLE=9u -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-05-23 13:57:31 85389763.owl loaded: k 202000, block 1000, res64 9e2cc389e8016958
2019-05-23 13:57:54 85389763 OK 204000 0.24%; 5.51 ms/sq; ETA 5d 10:29; 644b67dc40432b6f (check 5.77s)
2019-05-23 13:59:25 85389763 220000 0.26%; 5.63 ms/sq; ETA 5d 13:18; d4b0d2a05763b3be
2019-05-23 14:01:17 85389763 240000 0.28%; 5.63 ms/sq; ETA 5d 13:10; 17982e053950f51d
2019-05-23 14:03:10 85389763 260000 0.30%; 5.63 ms/sq; ETA 5d 13:11; 875dfe35f006aa26
2019-05-23 14:05:02 85389763 280000 0.33%; 5.63 ms/sq; ETA 5d 13:08; e0054e499c4e2534
2019-05-23 14:05:14 Stopping, please wait..
2019-05-23 14:05:20 85389763 OK 282000 0.33%; 5.63 ms/sq; ETA 5d 13:07; 527f9626965b7b86 (check 5.91s)
2019-05-23 14:05:20 Exiting because "stop requested"
2019-05-23 14:05:20 Bye
Terminate batch job (Y/N)? y
[/CODE]That last one is competitive: 5.63 vs ~5.64 ms/iter. On to -fft +3, the last choice.[CODE]
>gpuowl-win -device 0 -carry short -fft +3
2019-05-23 14:05:23 gpuowl v6.5-c48d46f
2019-05-23 14:05:23 Note: no config.txt file found
2019-05-23 14:05:23 config: -device 0 -carry short -fft +3
2019-05-23 14:05:23 85389763 FFT 4608K: Width 512x8, Height 8x8, Middle 9; 18.10 bits/word
2019-05-23 14:05:23 using short carry kernels
2019-05-23 14:05:26

2019-05-23 14:05:26 OpenCL compilation in 2355 ms, with "-DEXP=85389763u -DWIDTH=4096u -DSMALL_HEIGHT=64u -DMIDDLE=9u -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-05-23 14:05:27 85389763.owl loaded: k 282000, block 1000, res64 527f9626965b7b86
2019-05-23 14:05:27 Exception 9gpu_error: MEM_OBJECT_ALLOCATION_FAILURE tailFused at clwrap.cpp:284 run
2019-05-23 14:05:27 Bye[/CODE]Oops, 4608K with -fft +3 failed to run, so I can't tell whether it would be faster.
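The FFT lines in these logs are self-consistent in a way that simple arithmetic can check. The following is inferred from these logs only, not from the gpuowl source:

- The FFT length in words appears to equal 2 × WIDTH × SMALL_HEIGHT × MIDDLE, consistent with two words per complex element.
- -fft +0 through +3 appear to select the 4608K variants in the order the FFT table lists them (1K-256-9, 256-1K-9, 512-512-9, 4K-64-9).
- bits/word is the exponent divided by the FFT length.

A small C++ check of these relationships, with made-up names:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

struct FftShape {
  std::uint32_t width;        // -DWIDTH in the OpenCL compile line
  std::uint32_t smallHeight;  // -DSMALL_HEIGHT
  std::uint32_t middle;       // -DMIDDLE
};

// FFT length in words, as inferred from the logs (two words per complex element).
std::uint64_t fftWords(const FftShape& s) {
  return 2ull * s.width * s.smallHeight * s.middle;
}

// Average exponent bits per FFT word, formatted to 2 decimals like the log's "bits/word".
std::string bitsPerWord(std::uint64_t exponent, std::uint64_t words) {
  assert(words > 0);
  char buf[32];
  std::snprintf(buf, sizeof buf, "%.2f",
                static_cast<double>(exponent) / static_cast<double>(words));
  return buf;
}

// The four 4608K variants in table order; the logs suggest -fft +N picks index N.
const FftShape k4608K[] = {{1024, 256, 9}, {256, 1024, 9}, {512, 512, 9}, {4096, 64, 9}};
```

All four variants give 4,718,592 words (4608K). 85389763 / 4718592 prints as 18.10, matching the log. The 5120K run in the next post (WIDTH 1024, SMALL_HEIGHT 256, MIDDLE 10) gives 17.46 for 91538501.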
Now timing CUDALucas (already tuned) on the same GPU, in the same thermal situation:
[CODE]Continuing M85299667 @ iteration 17902 with fft length 4608K, 0.02% done

| Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done |
| May 23 14:20:46 | M85299667 50000 0x1dd17e29d1464496 | 4608K 0.29102 5.7498 184.55s | 5:16:10:11 0.05% |
| May 23 14:25:35 | M85299667 100000 0x1f5024fc1ad5626b | 4608K 0.28125 5.7873 289.36s | 5:16:31:41 0.11% |
| May 23 14:30:25 | M85299667 150000 0x35e5c02539884305 | 4608K 0.30469 5.7849 289.24s | 5:16:34:31 0.17% |
[/CODE]Tuned CUDALucas on the same GTX 1070, in the current thermal environment, runs at ~5.786 ms/iteration. Power draw as reported by nvidia-smi was 120 W for CUDALucas vs 114 W for gpuowl. So the best result was gpuowl with -fft +2: ~2.8% faster per iteration than CUDALucas, with 5% less power draw. On top of that, the reliability of the Gerbicz check should save roughly another 2% in time and energy by avoiding some triple checks.
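The Gerbicz error check mentioned above (and the -block option in the help text) rests on a simple identity. Let u_i = 3^(2^i) mod N, and keep a running product d of the residues at every block boundary, starting with d = u_0 = 3. After one more block of L squarings, the new product must equal 3 · d_old^(2^L) mod N.

Below is a toy C++ sketch of that idea. It assumes a tiny modulus N = 2^31 - 1 so that plain 64-bit arithmetic suffices. For clarity it redoes the L squarings at every block, which doubles the work; practical implementations check much less often to keep the overhead small. This is an illustration of the principle, not gpuowl's code.

```cpp
#include <cassert>
#include <cstdint>

using u64 = std::uint64_t;

// Toy modulus M31 = 2^31 - 1, small enough that products fit in 64 bits.
constexpr u64 M = (u64{1} << 31) - 1;

u64 mulMod(u64 a, u64 b) { return (a * b) % M; }
u64 sq(u64 a) { return mulMod(a, a); }

// Performs `iters` squarings of x starting from u_0 = 3, with a Gerbicz-style
// check every `block` squarings. If flipAt > 0, bit 0 of x is flipped right after
// squaring number flipAt to simulate a hardware error. Returns false on a failed check.
bool prpWithGerbicz(u64 iters, u64 block, u64 flipAt = 0) {
  assert(block > 0);
  u64 x = 3;  // u_0
  u64 d = 3;  // product of u at the block boundaries seen so far
  for (u64 i = 1; i <= iters; ++i) {
    x = sq(x);                // x = u_i (unless an error was injected)
    if (i == flipAt) x ^= 1;  // simulated error
    if (i % block == 0) {
      const u64 dOld = d;
      d = mulMod(d, x);       // include the new boundary residue
      u64 y = dOld;           // independently compute 3 * dOld^(2^block)
      for (u64 k = 0; k < block; ++k) y = sq(y);
      if (mulMod(3, y) != d) return false;  // mismatch: error detected
    }
  }
  return true;
}
```

A clean run passes every check. Flipping a bit exactly at a block boundary (e.g. prpWithGerbicz(1000, 100, 500)) is always caught at that boundary, because the stored residue then differs from the true one.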

kriesel 2019-05-24 03:34

[QUOTE=GP2;512503]You can find some PRPs that need DC from the following users:

Warning: not all are type 4. There's no way to filter by residue type, although you can click on the "Residue Type" column header to sort by it.

Also, I think gpuOwL only produces residues with shift count zero, so the double check will also have shift count zero, and some might insist that it's not a proper double check unless it's with a different shift count.
[LIST][*][URL="https://www.mersenne.org/report_prp/?exp_lo=82000000&exp_hi=999999999&exp_date=&end_date=&user_only=1&user_id=Mihai+Preda&exdchk=1&dispdate=1&B1="]Mihai Preda[/URL][*][URL="https://www.mersenne.org/report_prp/?exp_lo=82000000&exp_hi=999999999&exp_date=&end_date=&user_only=1&user_id=Kriesel&exdchk=1&dispdate=1&exbad=1&exfactor=1&B1="]Kriesel[/URL][*][URL="https://www.mersenne.org/report_prp/?exp_lo=82000000&exp_hi=999999999&exp_date=&end_date=&user_only=1&user_id=kwe5ykdf&exdchk=1&dispdate=1&exbad=1&exfactor=1&B1="]kwe5ykdf[/URL][*][URL="https://www.mersenne.org/report_prp/?exp_lo=82000000&exp_hi=999999999&exp_date=&end_date=&user_only=1&user_id=tServo&exdchk=1&dispdate=1&exbad=1&exfactor=1&B1="]tServo[/URL][*][URL="https://www.mersenne.org/report_prp/?exp_lo=82000000&exp_hi=999999999&exp_date=&end_date=&user_only=1&user_id=Franklin+Webber&exdchk=1&dispdate=1&exbad=1&exfactor=1&B1="]Franklin Webber[/URL][*][URL="https://www.mersenne.org/report_prp/?exp_lo=82000000&exp_hi=999999999&exp_date=&end_date=&user_only=1&user_id=Xebecer&exdchk=1&dispdate=1&exbad=1&exfactor=1&B1="]Xebecer[/URL][*][URL="https://www.mersenne.org/report_prp/?exp_lo=82000000&exp_hi=999999999&exp_date=&end_date=&user_only=1&user_id=xx005fs&exdchk=1&dispdate=1&exbad=1&exfactor=1&B1="]xx005fs[/URL][*][URL="https://www.mersenne.org/report_prp/?exp_lo=82000000&exp_hi=999999999&exp_date=&end_date=&user_only=1&user_id=SEL-ROC&exdchk=1&dispdate=1&exbad=1&exfactor=1&B1="]SEL-ROC[/URL][*][URL="https://www.mersenne.org/report_prp/?exp_lo=82000000&exp_hi=999999999&exp_date=&end_date=&user_only=1&user_id=kracker&exdchk=1&dispdate=1&exbad=1&exfactor=1&B1="]kracker[/URL][/LIST][/QUOTE]
If you're contemplating a PRP DC via gpuowl, stay away from my results reported as "Manual" rather than under a bird-species computer name. My Manual PRP results up to this point are from gpuowl, so they have zero offset and are not suitable for a DC with gpuowl, which would use zero offset again. The birds' PRP results are from prime95, so they're fair game.

SELROC 2019-05-24 07:55

[QUOTE=kriesel;517597]If you're contemplating a PRP DC via gpuowl, stay away from my results reported as "Manual" rather than under a bird-species computer name. My Manual PRP results up to this point are from gpuowl, so they have zero offset and are not suitable for a DC with gpuowl, which would use zero offset again. The birds' PRP results are from prime95, so they're fair game.[/QUOTE]


I think I already said that a DC needs a different program.
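The zero-offset concern can be illustrated with a toy model. Modulo M = 2^p - 1, multiplying by 2^s is just a rotation of the p-bit value (because 2^p ≡ 1 mod M). A program can therefore keep the residue stored as x · 2^s. Each squaring doubles the shift mod p, and rotating back at the end recovers the same residue. Two runs with different starting shifts push different data through the arithmetic yet must agree on the final residue; this is why some consider a same-shift double check less independent.

Below is a C++ sketch with a tiny p (at most 31), so products fit in 64 bits. It illustrates the principle only; it is not Prime95's or gpuowl's actual code.

```cpp
#include <cassert>
#include <cstdint>

using u64 = std::uint64_t;

// Multiply x by 2^s modulo 2^p - 1: a left rotation of the p-bit value.
// Expects 0 <= x < 2^p - 1; the result stays in that range.
u64 rotl(u64 x, unsigned s, unsigned p) {
  const u64 mask = (u64{1} << p) - 1;
  s %= p;
  if (s == 0) return x;
  return ((x << s) | (x >> (p - s))) & mask;
}

// Multiply by 2^-s modulo 2^p - 1: a right rotation.
u64 rotr(u64 x, unsigned s, unsigned p) { return rotl(x, (p - s % p) % p, p); }

u64 sqMod(u64 x, unsigned p) {
  const u64 m = (u64{1} << p) - 1;
  return (x * x) % m;  // x < 2^31, so x * x fits in 64 bits
}

// Squares x0 `iters` times while storing the value as x * 2^shift (mod 2^p - 1).
// Each squaring doubles the shift mod p. Returns the unshifted residue.
u64 squareWithShift(u64 x0, unsigned shift0, unsigned p, unsigned iters) {
  assert(p >= 2 && p <= 31);
  unsigned s = shift0 % p;
  u64 y = rotl(x0, s, p);
  for (unsigned i = 0; i < iters; ++i) {
    y = sqMod(y, p);
    s = (2 * s) % p;
  }
  return rotr(y, s, p);
}
```

For example, squareWithShift(3, 7, 31, 100) and squareWithShift(3, 0, 31, 100) return the same residue even though the intermediate values differ.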

kriesel 2019-05-24 17:10

P-1 fast fail on GTX1070
 
Two attempts to run a P-1 with specified B1 and B2 on an NVIDIA GTX 1070 GPU (Win 7 x64), which had previously run several PRP test conditions successfully, both failed immediately.[CODE]>gpuowl-win -device 0 -carry long -fft +0
2019-05-24 12:01:46 gpuowl v6.5-c48d46f
2019-05-24 12:01:46 Note: no config.txt file found
2019-05-24 12:01:46 config: -device 0 -carry long -fft +0
2019-05-24 12:01:46 91538501 FFT 5120K: Width 256x4, Height 64x4, Middle 10; 17.46 bits/word
2019-05-24 12:01:46 using long carry kernels
2019-05-24 12:01:48

2019-05-24 12:01:48 OpenCL compilation in 1856 ms, with "-DEXP=91538501u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=10u -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-05-24 12:01:50 Exception 9gpu_error: INVALID_VALUE clGetDeviceInfo(id, what, bufSize, buf, NULL) at clwrap.cpp:98 getInfo
2019-05-24 12:01:50 Bye

>gpuowl-win -device 0
2019-05-24 12:03:09 gpuowl v6.5-c48d46f
2019-05-24 12:03:09 Note: no config.txt file found
2019-05-24 12:03:09 config: -device 0
2019-05-24 12:03:09 91538501 FFT 5120K: Width 256x4, Height 64x4, Middle 10; 17.46 bits/word
2019-05-24 12:03:09 using short carry kernels
2019-05-24 12:03:10

2019-05-24 12:03:10 OpenCL compilation in 15 ms, with "-DEXP=91538501u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=10u -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-05-24 12:03:11 Exception 9gpu_error: INVALID_VALUE clGetDeviceInfo(id, what, bufSize, buf, NULL) at clwrap.cpp:98 getInfo
2019-05-24 12:03:11 Bye[/CODE]
It seems like there ought to be a ", " or something similar between "Exception 9" and the rest of the error message.
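The "9" is most likely not a missing separator. With GCC and Clang (including MinGW builds), std::type_info::name() returns the Itanium-ABI mangled name, in which a class name is prefixed by its length, so gpu_error becomes 9gpu_error. MSVC's name() is already human-readable. If gpuowl prints typeid(e).name() (an assumption; we haven't checked the source), demangling the name would give a cleaner message. Here is a sketch using the ABI's demangler, with a stand-in exception type:

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <memory>
#include <string>
#include <typeinfo>

// Hypothetical stand-in for gpuowl's exception type, for illustration only.
struct gpu_error {};

// Turns an Itanium-ABI mangled name (GCC, Clang, MinGW) into readable form,
// falling back to the input if demangling fails.
std::string demangle(const char* mangled) {
  int status = 0;
  std::unique_ptr<char, void (*)(void*)> out(
      abi::__cxa_demangle(mangled, nullptr, nullptr, &status), std::free);
  return (status == 0 && out) ? std::string(out.get()) : std::string(mangled);
}
```

With GCC, typeid(gpu_error).name() is "9gpu_error", and demangle() turns it into "gpu_error".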

kriesel 2019-05-24 17:20

Quadro 2000 / OpenCL v1.1 apparently not enough
 
gpuowl-win v6.5-c48d46f does not like the Quadro 2000 (CUDA compute capability 2.1, OpenCL v1.1 as indicated in GPU-Z).
The same immediate crash while compiling the OpenCL kernel happens with -carry short and with -fft +1, +2 and +3.
The GTX 10xx cards I've tried do run PRP, and GPU-Z indicates OpenCL v1.2 for them.
[CODE]>gpuowl-win -device 1 -carry long -fft +0
2019-05-23 19:14:42 gpuowl v6.5-c48d46f
2019-05-23 19:14:42 Note: no config.txt file found
2019-05-23 19:14:42 config: -device 1 -carry long -fft +0
2019-05-23 19:14:42 85389763 FFT 4608K: Width 256x4, Height 64x4, Middle 9; 18.10 bits/word
2019-05-23 19:14:42 using long carry kernels
2019-05-23 19:14:42 OpenCL compilation error -11 (args -DEXP=85389763u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=9u -I. -cl-fast-relaxed-math -cl-std=CL2.0)
2019-05-23 19:14:42 <kernel>:778:44: error: use of undeclared identifier 'memory_scope_device'
work_group_barrier(CLK_GLOBAL_MEM_FENCE, memory_scope_device);
^
<kernel>:787:44: error: use of undeclared identifier 'memory_scope_device'
work_group_barrier(CLK_GLOBAL_MEM_FENCE, memory_scope_device);
^
<kernel>:825:5: warning: implicit declaration of function 'atomic_store_explicit' is invalid in C99
atomic_store_explicit((atomic_uint *) &ready[gr], 1, memory_order_release, memory_scope_device);
^
<kernel>:825:28: error: use of undeclared identifier 'atomic_uint'
atomic_store_explicit((atomic_uint *) &ready[gr], 1, memory_order_release, memory_scope_device);
^
<kernel>:825:41: error: expected expression
atomic_store_explicit((atomic_uint *) &ready[gr], 1, memory_order_release, memory_scope_device);
^
<kernel>:832:12: warning: implicit declaration of function 'atomic_load_explicit' is invalid in C99
while(!atomic_load_explicit((atomic_uint *) &ready[gr - 1], memory_order_acquire, memory_scope_device));
^
<kernel>:832:34: error: use of undeclared identifier 'atomic_uint'
while(!atomic_load_explicit((atomic_uint *) &ready[gr - 1], memory_order_acquire, memory_scope_device));
^
<kernel>:832:47: error: expected expression
while(!atomic_load_explicit((atomic_uint *) &ready[gr - 1], memory_order_acquire, memory_scope_device));
^
<kernel>:881:28: error: use of undeclared identifier 'atomic_uint'
atomic_store_explicit((atomic_uint *) &ready[gr], 1, memory_order_release, memory_scope_device);
^
<kernel>:881:41: error: expected expression
atomic_store_explicit((atomic_uint *) &ready[gr], 1, memory_order_release, memory_scope_device);
^
<kernel>:888:34: error: use of undeclared identifier 'atomic_ui
2019-05-23 19:14:42 Exception 9gpu_error: BUILD_PROGRAM_FAILURE clBuildProgram at clwrap.cpp:220 build
2019-05-23 19:14:42 Bye[/CODE]
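The errors above come from OpenCL C 2.0 features that an OpenCL 1.1 device compiler does not provide: atomic_uint, atomic_store_explicit/atomic_load_explicit with memory_scope_device, and work_group_barrier with a scope argument. A program can query a device's OpenCL C version with clGetDeviceInfo(..., CL_DEVICE_OPENCL_C_VERSION, ...). Per the OpenCL specification, this returns a string of the form "OpenCL C <major>.<minor> <vendor-specific information>".

Below is a minimal C++ parser for that string, kept free of OpenCL headers so it compiles anywhere. The GTX 10xx cards in this thread report 1.2 yet built and ran the kernels, so treat the result as a diagnostic hint rather than a hard gate.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

struct ClCVersion {
  int major = 0;
  int minor = 0;
};

// Parses "OpenCL C <major>.<minor> <vendor info>"; returns false for any other format.
bool parseOpenClCVersion(const std::string& s, ClCVersion& out) {
  ClCVersion v;
  if (std::sscanf(s.c_str(), "OpenCL C %d.%d", &v.major, &v.minor) != 2) return false;
  out = v;
  return true;
}

// True if the device claims OpenCL C 2.0 or later (the level -cl-std=CL2.0 asks for).
bool claimsOpenClC20(const ClCVersion& v) {
  assert(v.major >= 0 && v.minor >= 0);
  return v.major >= 2;
}
```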


All times are UTC.
