mersenneforum.org

gpuowL: an OpenCL program for Mersenne primality testing (https://www.mersenneforum.org/showthread.php?t=22204)

preda 2020-04-11 09:18

[QUOTE=kracker;542292]Tried to submit an LL result, got "Did not understand 1 lines."

{"exponent":"54907981", "worktype":"LL", "status":"C", "program":{"name":"gpuowl", "version":"v6.11-252-gaf403e2"}, "timestamp":"2020-04-10 14:05:02 UTC", "user":"kracker", "computer":"core", "aid":"xxxxxxxxxx", "fft-length":3145728, "res64":"xxxxxxxxxxxxx", "offset":0}[/QUOTE]

Please replace "offset" with "shift-count" and re-submit the result -- it should be accepted after this change.

This same change has been committed to gpuowl, so this should be fixed after a re-checkout.
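
For results already written with the old key, the manual fix can be sketched in Python as below. This is only an illustration, not gpuowl's own code: the function name `fix_result_line` is hypothetical, and the sample line is abbreviated from the quoted result.

```python
import json

def fix_result_line(line: str) -> str:
    """Rename the "offset" key to "shift-count" in one JSON result line."""
    result = json.loads(line)
    # Rebuild the dict so the renamed key keeps its original position.
    fixed = {("shift-count" if k == "offset" else k): v for k, v in result.items()}
    # Separators chosen to mimic the spacing of the original result line.
    return json.dumps(fixed, separators=(", ", ":"))

sample = '{"exponent":"54907981", "worktype":"LL", "status":"C", "fft-length":3145728, "offset":0}'
print(fix_result_line(sample))
```

Every other field is left untouched, so the residue and timestamp are submitted exactly as gpuowl wrote them.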

kriesel 2020-04-11 14:03

[QUOTE=kriesel;542296]Latest commit build, build log, help output, etc.[/QUOTE]
v6.11-255 on Win7 x64 with an RX550 did not like the default FFT at all. The "+1" style syntax is apparently gone, and if it is used, gpuowl fails in an interesting way. A quick read of the help output set it right, and it is now on its way using the second FFT specification listed for that FFT length.
[CODE]C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-255-g81fa7c3>title gpuowl-v6.11-255-g81fa7c3/rx550

C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-255-g81fa7c3>gpuowl-win
2020-04-10 12:09:43 gpuowl v6.11-255-g81fa7c3
2020-04-10 12:09:43 config: -device 1 -user kriesel -cpu condorella/rx550 -yield -maxAlloc 3600 -use NO_ASM
2020-04-10 12:09:43 device 1, unique id ''
2020-04-10 12:09:43 condorella/rx550 94741139 FFT: 5M 1K:10:256 (18.07 bpw)
2020-04-10 12:09:43 condorella/rx550 Expected maximum carry32: 461E0000
2020-04-10 12:09:46 condorella/rx550 OpenCL args "-DEXP=94741139u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=10u -DWEIGHT_STEP=0xf.3cd1fc0411148p-3 -DIWEIGHT_STEP=0x8.66790bf53aca8p-4 -DWEIGHT_BIGSTEP=0x9.837f0518db8a8p-3 -DIWEIGHT_BIGSTEP=0xd.744fccad69d68p-4 -DPM1=0 -DAMDGPU=1 -DNO_ASM=1 -cl-fast-relaxed-math -cl-std=CL2.0 "
2020-04-10 12:09:53 condorella/rx550 OpenCL compilation in 6.96 s
2020-04-10 12:10:09 condorella/rx550 94741139 EE 0 loaded: blockSize 400, 0000000000000000 (expected 0000000000000003)
2020-04-10 12:10:09 condorella/rx550 Exiting because "error on load"
2020-04-10 12:10:09 condorella/rx550 Bye
C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-255-g81fa7c3>g611

C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-255-g81fa7c3>title gpuowl-v6.11-255-g81fa7c3/rx550

C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-255-g81fa7c3>gpuowl-win
2020-04-10 12:10:51 gpuowl v6.11-255-g81fa7c3
2020-04-10 12:10:51 config: -device 1 -user kriesel -cpu condorella/rx550 -yield -maxAlloc 3600 -use NO_ASM -fft +1
2020-04-10 12:10:51 device 1, unique id ''
2020-04-10 12:10:51 condorella/rx550 94741139 FFT: 128K 256:1:256 (722.82 bpw)
2020-04-10 12:10:51 condorella/rx550 FFT size too small for exponent (722.82 bits/word).
2020-04-10 12:10:51 condorella/rx550 Exiting because "FFT size too small"
2020-04-10 12:10:51 condorella/rx550 Bye
C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-255-g81fa7c3>g611

C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-255-g81fa7c3>title gpuowl-v6.11-255-g81fa7c3/rx550

C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-255-g81fa7c3>gpuowl-win
2020-04-10 12:12:45 gpuowl v6.11-255-g81fa7c3
2020-04-10 12:12:45 config: -device 1 -user kriesel -cpu condorella/rx550 -yield -maxAlloc 3600 -use NO_ASM -fft 1K:5:512
2020-04-10 12:12:45 device 1, unique id ''
2020-04-10 12:12:45 condorella/rx550 94741139 FFT: 5M 1K:5:512 (18.07 bpw)
2020-04-10 12:12:45 condorella/rx550 Expected maximum carry32: 461E0000
2020-04-10 12:12:47 condorella/rx550 OpenCL args "-DEXP=94741139u -DWIDTH=1024u -DSMALL_HEIGHT=512u -DMIDDLE=5u -DWEIGHT_STEP=0xf.3cd1fc0411148p-3 -DIWEIGHT_STEP=0x8.66790bf53aca8p-4 -DWEIGHT_BIGSTEP=0x9.837f0518db8a8p-3 -DIWEIGHT_BIGSTEP=0xd.744fccad69d68p-4 -DPM1=0 -DAMDGPU=1 -DNO_ASM=1 -cl-fast-relaxed-math -cl-std=CL2.0 "
2020-04-10 12:12:55 condorella/rx550 OpenCL compilation in 8.18 s
2020-04-10 12:13:02 condorella/rx550 94741139 OK 0 loaded: blockSize 400, 0000000000000003
2020-04-10 12:13:19 condorella/rx550 94741139 OK 800 0.00%; 14229 us/it; ETA 15d 14:28; 738c4e015132f834 (check 5.86s)
2020-04-10 13:00:54 condorella/rx550 94741139 OK 200000 0.21%; 14317 us/it; ETA 15d 15:59; e0463c77c58b0105 (check 5.87s)
2020-04-10 13:48:40 condorella/rx550 94741139 OK 400000 0.42%; 14319 us/it; ETA 15d 15:14; 5b1fe09cbecb5e40 (check 5.89s)
2020-04-10 14:36:27 condorella/rx550 94741139 OK 600000 0.63%; 14321 us/it; ETA 15d 14:29; 5f62cf32c024e1a2 (check 5.87s)
2020-04-10 15:24:15 condorella/rx550 94741139 OK 800000 0.84%; 14322 us/it; ETA 15d 13:44; 3dd122479d7dde25 (check 5.88s)
2020-04-10 16:12:02 condorella/rx550 94741139 OK 1000000 1.06%; 14319 us/it; ETA 15d 12:52; e44ae2f6c9046662 (check 5.87s)
2020-04-10 16:59:49 condorella/rx550 94741139 OK 1200000 1.27%; 14320 us/it; ETA 15d 12:06; b3a0108ad221f8fd (check 5.88s)
2020-04-10 17:47:36 condorella/rx550 94741139 OK 1400000 1.48%; 14319 us/it; ETA 15d 11:17; 6077a7f20c7ee45c (check 5.88s)
2020-04-10 17:49:53 condorella/rx550 Stopping, please wait..
2020-04-10 17:50:05 condorella/rx550 94741139 OK 1410000 1.49%; 14328 us/it; ETA 15d 11:28; e02e0d0dca18d9f5 (check 5.87s)
2020-04-10 17:50:05 condorella/rx550 Exiting because "stop requested"
2020-04-10 17:50:05 condorella/rx550 Bye[/CODE]
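
The FFT lines in the log above are consistent with a simple relation. This is inferred from the logged numbers, not taken from gpuowl's source: a spec `W:M:H` corresponds to an FFT length of 2 x W x M x H words, and bits per word (bpw) is the exponent divided by that length. A hedged Python sketch, with hypothetical function names:

```python
def fft_words(spec: str) -> int:
    """FFT length in words for a 'width:middle:height' spec such as '1K:10:256'."""
    def parse(field: str) -> int:
        # A trailing 'K' means a multiple of 1024, as in '1K'.
        return int(field[:-1]) * 1024 if field.upper().endswith("K") else int(field)
    width, middle, height = (parse(f) for f in spec.split(":"))
    # Assumed relation; it reproduces the 5M and 128K sizes shown in the log.
    return 2 * width * middle * height

def bits_per_word(exponent: int, spec: str) -> float:
    """Average number of exponent bits carried by each FFT word."""
    return exponent / fft_words(spec)

for spec in ("1K:10:256", "1K:5:512", "256:1:256"):
    print(spec, fft_words(spec), f"{bits_per_word(94741139, spec):.2f} bpw")
```

Both 5M specs give 18.07 bpw for 94741139, while the 256:1:256 spec that gpuowl selected when given `-fft +1` gives 722.82 bpw, which is why that run exits with "FFT size too small".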

LaurV 2020-04-11 14:26

[QUOTE=kriesel;542296]Latest commit build, build log, help output, etc.[/QUOTE]
Could you (or kracker) please rebuild with the last change from preda, and repost?

(I am not yet able to build gpuowl. I mean, I haven't tried yet, but I will run a few tests with it as long as it can do LL.)

ATH 2020-04-11 14:43

[QUOTE=preda;542348]Please replace "offset" with "shift-count" and re-submit the result -- it should be accepted after this change.

This same change has been committed to gpuowl, so this should be fixed after a re-checkout.[/QUOTE]

Thanks, that worked. 2 successful double checks from gpuowl:
[M]83174053[/M]
[M]83180563[/M]

kriesel 2020-04-11 16:04

Win7 x64 build of gpuowl v6.11-257
 
2 Attachment(s)
Latest available commit as of ~12 minutes before this post. Usual shower of warnings in the build log; help output included; no testing performed. Enjoy, and please report any issues here.

kracker 2020-04-11 17:07

1 Attachment(s)
Just now, I made the very stupid mistake of not checking a few DC residues before submitting a batch... :sorry:
I can redo them - or whatever is best.
Nvidia P100 in colab.
gpuowl v6.11-252-gaf403e2
OUT_SIZEX=16,IN_SIZEX=8,IN_SPACING=8
[code]
51509873
51491101
51491059
51490883
51490843
51491267
51491119
51509257
51490799
51490723
51490343
51490339
51508747
58650941
51488837
51491983
51491773
51491731
[/code]

preda 2020-04-11 21:15

It seems the problem is associated with the setup
[QUOTE]
OUT_SIZEX=16,IN_SIZEX=8,IN_SPACING=8
[/QUOTE]

Did this setup work for another exponent?

One way to check whether the FFT is broken is to run a few PRP iterations before starting the LL, e.g.
./gpuowl -prp 51509873

kracker 2020-04-12 02:17

With the previous settings I'm getting an immediate EE... It seems to work with no -use arguments.
[code]
/content/drive/My Drive/gpuowl-colab
2020-04-12 02:08:53 gpuowl v6.11-252-gaf403e2
2020-04-12 02:08:53 config: -user kracker -cpu pce
2020-04-12 02:08:53 config: -ll 51509873
2020-04-12 02:08:53 device 0, unique id ''
2020-04-12 02:08:53 pce 51509873 FFT: 2.75M 256:11:512 (17.86 bpw)
2020-04-12 02:08:53 pce Expected maximum carry32: 2B810000
2020-04-12 02:08:54 pce OpenCL args "-DEXP=51509873u -DWIDTH=256u -DSMALL_HEIGHT=512u -DMIDDLE=11u -DWEIGHT_STEP=0x1.19794ea80bcb4p+0 -DIWEIGHT_STEP=0x1.d1a9c3958d155p-1 -DWEIGHT_BIGSTEP=0x1.ae89f995ad3adp+0 -DIWEIGHT_BIGSTEP=0x1.306fe0a31b715p-1 -DPM1=0 -cl-fast-relaxed-math -cl-std=CL2.0 "
2020-04-12 02:08:57 pce

2020-04-12 02:08:57 pce OpenCL compilation in 2.80 s
2020-04-12 02:08:57 pce 51509873 LL 0 loaded: 0000000000000004
2020-04-12 02:09:48 pce 51509873 LL 100000 0.19%; 509 us/it; ETA 0d 07:16; d4bf953f17f5dd56
2020-04-12 02:10:15 pce Stopping, please wait..
2020-04-12 02:10:15 pce 51509873 LL 154000 0.30%; 510 us/it; ETA 0d 07:17; be98350bc1fe8687
2020-04-12 02:10:15 pce Exiting because "stop requested"
2020-04-12 02:10:15 pce Bye
[/code]

[code]

/content/drive/My Drive/gpuowl-colab
2020-04-12 02:12:19 gpuowl v6.11-252-gaf403e2
2020-04-12 02:12:19 config: -user kracker -cpu pce
2020-04-12 02:12:19 config: -use OUT_SIZEX=16,IN_SIZEX=8,IN_SPACING=8 -ll 51509873
2020-04-12 02:12:19 device 0, unique id ''
2020-04-12 02:12:19 pce 51509873 FFT: 2.75M 256:11:512 (17.86 bpw)
2020-04-12 02:12:19 pce Expected maximum carry32: 2B810000
2020-04-12 02:12:19 pce OpenCL args "-DEXP=51509873u -DWIDTH=256u -DSMALL_HEIGHT=512u -DMIDDLE=11u -DWEIGHT_STEP=0x1.19794ea80bcb4p+0 -DIWEIGHT_STEP=0x1.d1a9c3958d155p-1 -DWEIGHT_BIGSTEP=0x1.ae89f995ad3adp+0 -DIWEIGHT_BIGSTEP=0x1.306fe0a31b715p-1 -DPM1=0 -DIN_SIZEX=8 -DIN_SPACING=8 -DOUT_SIZEX=16 -cl-fast-relaxed-math -cl-std=CL2.0 "
2020-04-12 02:12:19 pce

2020-04-12 02:12:19 pce OpenCL compilation in 0.01 s
2020-04-12 02:12:19 pce 51509873 LL 0 loaded: 0000000000000004
2020-04-12 02:13:09 pce 51509873 LL 100000 0.19%; 496 us/it; ETA 0d 07:05; a2891146b3ded4b9
2020-04-12 02:13:16 pce Stopping, please wait..
2020-04-12 02:13:17 pce 51509873 LL 115000 0.22%; 502 us/it; ETA 0d 07:10; 42848d9cb649a731
2020-04-12 02:13:17 pce Exiting because "stop requested"
2020-04-12 02:13:17 pce Bye
[/code]
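
The two runs above log different residues at iteration 100000, so at least one of them is wrong. A comparison like this can be automated with a small helper. The sketch below is hypothetical and not part of gpuowl; its regular expression assumes the progress-line layout seen in these logs.

```python
import re

# Matches e.g. "... 51509873 LL 100000 0.19%; 509 us/it; ETA 0d 07:16; d4bf953f17f5dd56"
PROGRESS = re.compile(r"(\d+) (LL|OK|EE)\s+(\d+)\s+[\d.]+%;.*?;\s*([0-9a-f]{16})\b")

def residues(log_text: str) -> dict:
    """Map (exponent, iteration) -> res64 for every progress line found."""
    found = {}
    for line in log_text.splitlines():
        m = PROGRESS.search(line)
        if m:
            found[(int(m.group(1)), int(m.group(3)))] = m.group(4)
    return found

def mismatches(log_a: str, log_b: str) -> list:
    """(exponent, iteration) pairs present in both logs whose residues differ."""
    a, b = residues(log_a), residues(log_b)
    return sorted(k for k in a.keys() & b.keys() if a[k] != b[k])

default_run = "2020-04-12 02:09:48 pce 51509873 LL 100000 0.19%; 509 us/it; ETA 0d 07:16; d4bf953f17f5dd56"
tuned_run = "2020-04-12 02:13:09 pce 51509873 LL 100000 0.19%; 496 us/it; ETA 0d 07:05; a2891146b3ded4b9"
print(mismatches(default_run, tuned_run))
```

A mismatch only says that the runs disagree; it does not say which one is correct.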

ATH 2020-04-13 22:44

I created a script to test the speed of a bunch of combinations of the OUT_WG,OUT_SIZEX,OUT_SPACING,IN_WG,IN_SIZEX,IN_SPACING variables for the LL test.

It seems that for the LL test nothing blocks combinations that will not work; instead, the residue gets zeroed. For example, these:

[CODE]./gpuowlLL -ll 95000011 -iters 30000 -log 10000 -use CARRY32,ORIG_SLOWTRIG,OUT_WG=256,OUT_SIZEX=4,OUT_SPACING=128,IN_WG=64,IN_SIZEX=128,IN_SPACING=4

./gpuowlLL -ll 95000011 -iters 30000 -log 10000 -use CARRY32,ORIG_SLOWTRIG,OUT_WG=256,OUT_SIZEX=4,OUT_SPACING=128,IN_WG=64,IN_SIZEX=128,IN_SPACING=128

./gpuowlLL -ll 95000011 -iters 30000 -log 10000 -use CARRY32,ORIG_SLOWTRIG,OUT_WG=64,OUT_SIZEX=128,OUT_SPACING=8,IN_WG=64,IN_SIZEX=128,IN_SPACING=64

./gpuowlLL -ll 95000011 -iters 30000 -log 10000 -use CARRY32,ORIG_SLOWTRIG,OUT_WG=64,OUT_SIZEX=128,OUT_SPACING=128,IN_WG=64,IN_SIZEX=128,IN_SPACING=64

Output:

2020-04-13 22:32:34 Tesla P100-PCIE-16GB-0 OpenCL compilation in 2.22 s
2020-04-13 22:32:34 Tesla P100-PCIE-16GB-0 95000011 LL 0 loaded: 0000000000000004
2020-04-13 22:32:41 Tesla P100-PCIE-16GB-0 95000011 LL 10000 0.01%; 641 us/it; ETA 0d 16:54; fffffffffffffffd
2020-04-13 22:32:43 Tesla P100-PCIE-16GB-0 Stopping, please wait..
2020-04-13 22:32:43 Tesla P100-PCIE-16GB-0 95000011 LL 14000 0.01%; 657 us/it; ETA 0d 17:20; fffffffffffffffd

[/CODE]

preda 2020-04-13 23:17

LL is "naked": it has no error check at all. Please try/tune combinations on PRP, which will help detect the invalid ones. Use a combination for LL only after validating it with PRP.
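
One hedged way to apply this in a tuning script is to generate a PRP command for every candidate `-use` combination first, and to time LL only with the combinations whose PRP run reports no errors. The sketch below only builds command strings. It assumes that `-prp`, `-iters`, `-log` and `-use` can be combined as in the commands quoted in this thread; the function name and the value lists are illustrative, not recommendations.

```python
from itertools import product

def prp_commands(exponent, out_sizex, in_sizex, in_spacing, binary="./gpuowl"):
    """Yield one PRP sanity-check command per -use combination."""
    for osx, isx, isp in product(out_sizex, in_sizex, in_spacing):
        use = f"OUT_SIZEX={osx},IN_SIZEX={isx},IN_SPACING={isp}"
        # -prp instead of -ll: per the post above, PRP helps detect invalid
        # combinations, while LL has no error check at all.
        yield f"{binary} -prp {exponent} -iters 30000 -log 10000 -use {use}"

for cmd in prp_commands(95000011, [4, 16], [8, 128], [4, 8]):
    print(cmd)
```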

kriesel 2020-04-14 03:18

[QUOTE=preda;542577]LL is "naked", no error check at all. Please try/tune combinations on PRP, which will help detect the invalid ones. Only after validation with PRP use any combination for LL.[/QUOTE]
Yikes, that means the LL side of gpuowl will be less reliable than CUDALucas v2.06, which checks for known bad residues that have been seen to occur (0x0000000000000000, 0x0000000000000002, 0xffffffff80000000, 0xfffffffffffffffd) and for excessive roundoff error. Gpuowl checks bits/word.

A failed memory copy could give 0; the +-2 values come from the residue getting zeroed, followed by the subtraction of 2 and then the squaring; the 33-bits-set value 0xffffffff80000000 comes from using far too short an FFT length, as was seen in both cllucas 1.02 and CUDALucas v2.03.
[URL]https://mersenneforum.org/showpost.php?p=355661&postcount=232[/URL]
[URL]https://mersenneforum.org/showpost.php?p=386081&postcount=299[/URL]
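
To illustrate where the -2 and +2 residues come from, here is a small Python sketch. It is an illustration only, not CUDALucas or gpuowl code; the function names are hypothetical and the exponent 127 is chosen just to keep the numbers small. It applies the Lucas-Lehmer step to a zeroed residue and checks the result against the known-bad list above.

```python
# Residues listed above as known-bad in CUDALucas v2.06.
KNOWN_BAD_RES64 = {
    0x0000000000000000,
    0x0000000000000002,
    0xFFFFFFFF80000000,
    0xFFFFFFFFFFFFFFFD,
}

def is_suspect(res64_hex: str) -> bool:
    """True if a logged 64-bit residue matches a known-bad pattern."""
    return int(res64_hex, 16) in KNOWN_BAD_RES64

def ll_step(s: int, p: int) -> int:
    """One Lucas-Lehmer iteration: s -> s^2 - 2 (mod 2^p - 1)."""
    return (s * s - 2) % ((1 << p) - 1)

def res64(s: int) -> str:
    """Low 64 bits of the residue, formatted as in the logs."""
    return f"{s & 0xFFFFFFFFFFFFFFFF:016x}"

p = 127                      # small exponent, for illustration only
s = 0                        # residue wiped to zero, e.g. by a failed memory copy
after_one = ll_step(s, p)    # 0^2 - 2 = -2 (mod 2^p - 1)
after_two = ll_step(after_one, p)  # (-2)^2 - 2 = 2, which then stays at 2
print(res64(after_one), is_suspect(res64(after_one)))
print(res64(after_two), is_suspect(res64(after_two)))
```

The first step from zero produces the low 64 bits fffffffffffffffd, the same value seen in the LL output posted earlier in the thread, and the next step produces 2.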


All times are UTC.