gpuOwL: an OpenCL program for Mersenne primality testing (https://www.mersenneforum.org/showthread.php?t=22204)

preda 2019-05-24 21:27

OK, there are a couple of things.
- Exception 9gpu_error: "9gpu_error" is the typeid name of the exception class; the 9 is the number of characters that follow it (that's the compiler's internal representation of names). Confusing, I guess; I should fix it.

- clGetDeviceInfo: I'm using this to get the amount of GPU RAM, but through an AMD extension that is not supported on Nvidia, so the call fails there. I need to stop relying on it to enable P-1 on Nvidia.
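
To illustrate the naming point: GCC-style type names prefix an identifier with its length, so "9gpu_error" is just gpu_error with a 9 in front. Below is a minimal Python sketch, assuming only the plain length-plus-identifier form; the helper name decode_simple_mangled is hypothetical and not part of gpuowl. It shows how such a name could be made readable before it is logged:

```python
import re


def decode_simple_mangled(name: str) -> str:
    """Decode a simple length-prefixed type name such as '9gpu_error'.

    Only the plain <length><identifier> form is handled. Nested or
    templated names need a real demangler and are returned unchanged.
    """
    match = re.fullmatch(r"(\d+)(.+)", name)
    if match is None:
        return name  # no length prefix: nothing to strip
    length, identifier = int(match.group(1)), match.group(2)
    if length != len(identifier):
        return name  # prefix does not match the identifier length
    return identifier


# Rebuild the log line from the quoted output with a readable class name.
raw_type = "9gpu_error"
message = "INVALID_VALUE clGetDeviceInfo(id, what, bufSize, buf, NULL)"
print(f"Exception {decode_simple_mangled(raw_type)}: {message}")
```

Names that do not fit the simple pattern are passed through untouched, so the sketch never makes a log line worse than it already is.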


[QUOTE=kriesel;517643]Two attempts to run a P-1 with specified B1, B2 on a NVIDIA GTX 1070 gpu (on Win 7 x64) that previously had successfully run several PRP test conditions, failed immediately.[CODE]>gpuowl-win -device 0 -carry long -fft +0
2019-05-24 12:01:46 gpuowl v6.5-c48d46f
2019-05-24 12:01:46 Note: no config.txt file found
2019-05-24 12:01:46 config: -device 0 -carry long -fft +0
2019-05-24 12:01:46 91538501 FFT 5120K: Width 256x4, Height 64x4, Middle 10; 17.46 bits/word
2019-05-24 12:01:46 using long carry kernels
2019-05-24 12:01:48

2019-05-24 12:01:48 OpenCL compilation in 1856 ms, with "-DEXP=91538501u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=10u -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-05-24 12:01:50 Exception 9gpu_error: INVALID_VALUE clGetDeviceInfo(id, what, bufSize, buf, NULL) at clwrap.cpp:98 getInfo
2019-05-24 12:01:50 Bye

>gpuowl-win -device 0
2019-05-24 12:03:09 gpuowl v6.5-c48d46f
2019-05-24 12:03:09 Note: no config.txt file found
2019-05-24 12:03:09 config: -device 0
2019-05-24 12:03:09 91538501 FFT 5120K: Width 256x4, Height 64x4, Middle 10; 17.46 bits/word
2019-05-24 12:03:09 using short carry kernels
2019-05-24 12:03:10

2019-05-24 12:03:10 OpenCL compilation in 15 ms, with "-DEXP=91538501u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=10u -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-05-24 12:03:11 Exception 9gpu_error: INVALID_VALUE clGetDeviceInfo(id, what, bufSize, buf, NULL) at clwrap.cpp:98 getInfo
2019-05-24 12:03:11 Bye[/CODE]
Seems like there ought to be ", " or something similar between "Exception 9" and the rest of the error message.[/QUOTE]

kriesel 2019-05-24 21:40

Quadro 4000 fails like Quadro 2000 on Win
 
[QUOTE=kriesel;517645]gpuowl-win v6.5-c48d46f does not like the Quadro 2000's CUDA compute capability 2.1 and OpenCL v1.1, as indicated in GPU-Z.
The same prompt crash while compiling the OpenCL kernel happens with -carry short and with -fft +1, +2, +3.
[/QUOTE]Same error messages on the Quadro 4000.

kriesel 2019-05-25 17:06

Win 10, NVIDIA driver 353.30, two gpus fail to launch
 
1 Attachment(s)
For gpuowl-win v6.5-c48d46f compiled on msys2/mingw hosted on Win7 and run on a Win 10 x64 system with Quadro K4000 and Tesla C2075 gpus, the program pops up the attached instead of running, regardless of which gpu or -fft option is tried. I will try to reproduce this later on a different Win10 system with a more recent driver and gpu.

kriesel 2019-05-25 17:21

gpuowl v6.5-c48d46f 4608k -fft +3 error on load on AMD gpus
 
gpuowl-win compiled on msys2/mingw hosted on Win7 x64 and run on a different Win7 x64 system with RX550 or RX480 gpus gives "error on load" from a save file that has no issues elsewhere or previously. -fft +3 seems to have a problem.[CODE]>gpuowl-win -device 1 -carry short -fft +3
2019-05-24 23:17:49 gpuowl v6.5-c48d46f
2019-05-24 23:17:49 Note: no config.txt file found
2019-05-24 23:17:49 config: -device 1 -carry short -fft +3
2019-05-24 23:17:49 85389763 FFT 4608K: Width 512x8, Height 8x8, Middle 9; 18.10 bits/word
2019-05-24 23:17:49 using short carry kernels
2019-05-24 23:17:54 OpenCL compilation in 4160 ms, with "-DEXP=85389763u -DWIDTH=4096u -DSMALL_HEIGHT=64u -DMIDDLE=9u -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-05-24 23:17:55 85389763.owl loaded: k 4142000, block 1000, res64 23ae503b5c710f22
2019-05-24 23:18:34 85389763 EE loaded: 4142000, blockSize 1000, 7fe7dffd07fffdff (expected 23ae503b5c710f22)
2019-05-24 23:18:34 Exiting because "error on load"
2019-05-24 23:18:34 Bye[/CODE]

preda 2019-05-25 21:38

This has been discussed previously: an FFT config with Height 64 (here 8x8, i.e. SMALL_HEIGHT=64) combined with a middle step is invalid. This was fixed recently. If you are using an old version without the fix, simply don't use such configs.

[QUOTE=kriesel;517749]gpuowl-win compiled on msys2/mingw hosted on Win7 x64, run on different Win7-x64 system, gpus RX550 or RX480, error on load, from a save file that has no issues elsewhere or previously. -fft +3 seems to have a problem.[/QUOTE]

kriesel 2019-05-26 18:54

gpuowl-win V6.5-c48d46f NVIDIA and AMD timings plus CUDALucas
 
See [URL]https://www.mersenneforum.org/showpost.php?p=517837&postcount=14[/URL] for some relative timing data for differing -fft option choices in gpuowl, and comparison runs in CUDALucas where possible, for 4608K and 18432K fft lengths, on several different gpu models, new and old. The best timings from gpuowl beat CUDALucas slightly in all cases. (CUDALucas had itself been thoroughly tuned for fft length and threads.) It's a bit apples and oranges, since CUDALucas is doing LL without a Jacobi check, while gpuowl is doing PRP with the Gerbicz check. The comparison is on the basis of ms/iter or ms/sq, which omits both the effect of the ~2% chance of LL error for CUDALucas and the ~0.3% observed GEC check time of gpuowl.
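
As a rough illustration of how the two omitted effects could be folded into a comparison, here is a small Python sketch. It assumes a deliberately simple model that is not from the timing post: an erroneous LL run is detected and repeated in full, with independent errors, and the GEC cost is a flat fraction of run time. The function names and the sample timings are hypothetical.

```python
def ll_effective_ms(ms_per_iter: float, error_rate: float = 0.02) -> float:
    """Expected ms/iter per correct LL result.

    Assumption: a bad run is detected and rerun from scratch, so the
    expected number of runs is 1 / (1 - error_rate).
    """
    return ms_per_iter / (1.0 - error_rate)


def prp_effective_ms(ms_per_sq: float, gec_overhead: float = 0.003) -> float:
    """ms/sq including the observed ~0.3% Gerbicz check time."""
    return ms_per_sq * (1.0 + gec_overhead)


# Hypothetical raw timings, for illustration only.
raw_ll, raw_prp = 4.20, 4.11
print(f"LL  effective: {ll_effective_ms(raw_ll):.3f} ms/iter")
print(f"PRP effective: {prp_effective_ms(raw_prp):.3f} ms/sq")
```

Under this model both corrections are small, and the LL correction is the larger of the two, so it would not overturn a result where gpuowl is already ahead on raw ms/sq.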

kriesel 2019-05-26 21:03

Feature request: P-1 save and resume
 
On an AMD RX480, P-1 for p~91M takes ~1.5 hours in stage 1 and presumably a similar time in stage 2, so ~3 hours per exponent; for p~332M, stage 1 takes 18 hours, so presumably 1.5 days for both stages on one exponent. In both cases the bounds are similar to what gpu72 advises or CUDAPm1 selects. In a test on an RX480 with gpuowl-win v6.5-c48d46f, no save file was made when a P-1 run on p~332M was halted after 2.5 hours; the restart began from scratch. The RX550 is likely to be about 3.8 times slower, judging by PRP run time ratios, so ~12 hours for a 91M P-1, ~5.7 days for a 332M P-1, ~25 days for a 664M P-1, and ~57 days for a 996M P-1. These are long runs to go without save files.

The file extension could be something like .opm, to distinguish it from a PRP save file.
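
The RX550 estimates above are plain ratio scaling of the RX480 figures. Here is a minimal Python sketch of that arithmetic, using only numbers from this post; the function names are hypothetical, and the power-law exponent is merely what the quoted 332M, 664M and 996M figures imply, not a measured value.

```python
import math

RX550_SLOWDOWN = 3.8  # RX550 vs RX480, judged from PRP run time ratios


def scale_hours(rx480_hours: float, slowdown: float = RX550_SLOWDOWN) -> float:
    """Estimate RX550 run time from a measured or estimated RX480 run time."""
    return rx480_hours * slowdown


def implied_exponent(time_ratio: float, p_ratio: float) -> float:
    """Exponent k such that run time scales as p**k between two estimates."""
    return math.log(time_ratio) / math.log(p_ratio)


# Both-stage RX480 estimates from the post: ~3 h at 91M, ~1.5 days at 332M.
for label, hours in {"91M": 3.0, "332M": 36.0}.items():
    est = scale_hours(hours)
    print(f"{label}: {est:.1f} h = {est / 24:.1f} d on RX550")

# Scaling implied by the RX550 estimates for 332M, 664M and 996M.
print(f"332M -> 664M: p^{implied_exponent(25 / 5.7, 2):.2f}")
print(f"332M -> 996M: p^{implied_exponent(57 / 5.7, 3):.2f}")
```

The larger-exponent estimates correspond to run time growing a little faster than the square of the exponent, which is why a resumable save file matters more as p increases.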

kriesel 2019-05-26 21:20

Internal timing data from gpuowl v6.5 on NVIDIA GTX 1080 Ti
 
[CODE]>gpuowl-win -device 0 [B]-time[/B] -carry short -fft +2
2019-05-26 16:08:05 gpuowl v6.5-c48d46f
2019-05-26 16:08:05 Note: no config.txt file found
2019-05-26 16:08:05 config: -device 0 -time -carry short -fft +2
2019-05-26 16:08:05 85389763 FFT 4608K: Width 64x8, Height 64x8, Middle 9; 18.10 bits/word
2019-05-26 16:08:05 using short carry kernels
2019-05-26 16:08:06

2019-05-26 16:08:06 OpenCL compilation in 234 ms, with "-DEXP=85389763u -DWIDTH=512u -DSMALL_HEIGHT=512u -DMIDDLE=9u -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-05-26 16:08:07 85389763.owl loaded: k 1000, block 1000, res64 0c0a755239390788
2019-05-26 16:08:25 85389763 OK 3000 0.00%; 4.11 ms/sq; ETA 4d 01:29; 38414fc63219f5e9 (check 4.48s)
2019-05-26 16:08:25 27.80% tailFused : 1170 us/call x 3999 calls
2019-05-26 16:08:25 27.62% carryFused : 1202 us/call x 3869 calls
2019-05-26 16:08:25 12.05% fftMiddleIn : 476 us/call x 4259 calls
2019-05-26 16:08:25 11.31% fftMiddleOut : 461 us/call x 4129 calls
2019-05-26 16:08:25 8.90% transposeW : 352 us/call x 4259 calls
2019-05-26 16:08:25 7.60% transposeH : 310 us/call x 4129 calls
2019-05-26 16:08:25 1.39% fftP : 600 us/call x 390 calls
2019-05-26 16:08:25 1.20% carryA : 786 us/call x 258 calls
2019-05-26 16:08:25 1.02% fftH : 440 us/call x 390 calls
2019-05-26 16:08:25 0.56% fftW : 360 us/call x 260 calls
2019-05-26 16:08:25 0.37% multiply : 480 us/call x 130 calls
2019-05-26 16:08:25 0.19% carryB : 120 us/call x 260 calls
2019-05-26 16:08:25
2019-05-26 16:09:35 85389763 20000 0.02%; 4.11 ms/sq; ETA 4d 01:28; dc66a97eaafc3e4d
2019-05-26 16:09:35 31.58% tailFused : 1210 us/call x 17000 calls
2019-05-26 16:09:35 28.39% carryFused : 1089 us/call x 16983 calls
2019-05-26 16:09:35 10.92% fftMiddleOut : 418 us/call x 17017 calls
2019-05-26 16:09:35 10.17% transposeW : 389 us/call x 17034 calls
2019-05-26 16:09:35 10.01% fftMiddleIn : 383 us/call x 17034 calls
2019-05-26 16:09:35 8.83% transposeH : 338 us/call x 17017 calls
2019-05-26 16:09:35 0.05% fftP : 612 us/call x 51 calls
2019-05-26 16:09:35 0.02% fftW : 459 us/call x 34 calls
2019-05-26 16:09:35 0.02% fftH : 306 us/call x 51 calls
2019-05-26 16:09:35
2019-05-26 16:10:57 85389763 40000 0.05%; 4.11 ms/sq; ETA 4d 01:25; ff5be2560bfd9c09
2019-05-26 16:10:57 31.99% tailFused : 1239 us/call x 20000 calls
2019-05-26 16:10:57 27.56% carryFused : 1068 us/call x 19980 calls
2019-05-26 16:10:57 10.52% transposeW : 406 us/call x 20040 calls
2019-05-26 16:10:57 10.09% transposeH : 390 us/call x 20020 calls
2019-05-26 16:10:57 9.87% fftMiddleIn : 381 us/call x 20040 calls
2019-05-26 16:10:57 9.77% fftMiddleOut : 378 us/call x 20020 calls
2019-05-26 16:10:57 0.08% fftW : 1560 us/call x 40 calls
2019-05-26 16:10:57 0.04% fftP : 520 us/call x 60 calls
2019-05-26 16:10:57 0.04% fftH : 520 us/call x 60 calls
2019-05-26 16:10:57 0.02% carryA : 390 us/call x 40 calls
2019-05-26 16:10:57 0.02% multiply : 780 us/call x 20 calls
2019-05-26 16:10:57
2019-05-26 16:12:19 85389763 60000 0.07%; 4.11 ms/sq; ETA 4d 01:21; 81b3341edfd7a610
2019-05-26 16:12:19 31.56% tailFused : 1228 us/call x 20000 calls
2019-05-26 16:12:19 28.05% carryFused : 1093 us/call x 19980 calls
2019-05-26 16:12:19 10.34% fftMiddleIn : 402 us/call x 20040 calls
2019-05-26 16:12:19 10.18% fftMiddleOut : 396 us/call x 20020 calls
2019-05-26 16:12:19 9.90% transposeH : 385 us/call x 20020 calls
2019-05-26 16:12:19 9.86% transposeW : 383 us/call x 20040 calls
2019-05-26 16:12:19 0.08% fftW : 1560 us/call x 40 calls
2019-05-26 16:12:19 0.04% fftH : 520 us/call x 60 calls
2019-05-26 16:12:19
2019-05-26 16:13:41 85389763 80000 0.09%; 4.11 ms/sq; ETA 4d 01:26; 181394a870cfcf3b
2019-05-26 16:13:41 32.03% tailFused : 1250 us/call x 20000 calls
2019-05-26 16:13:41 27.73% carryFused : 1083 us/call x 19980 calls
2019-05-26 16:13:41 10.16% fftMiddleIn : 395 us/call x 20040 calls
2019-05-26 16:13:41 10.00% fftMiddleOut : 390 us/call x 20020 calls
2019-05-26 16:13:41 10.00% transposeW : 389 us/call x 20040 calls
2019-05-26 16:13:41 9.90% transposeH : 386 us/call x 20020 calls
2019-05-26 16:13:41 0.06% fftP : 780 us/call x 60 calls
2019-05-26 16:13:41 0.04% fftH : 520 us/call x 60 calls
2019-05-26 16:13:41 0.04% carryA : 780 us/call x 40 calls
2019-05-26 16:13:41 0.02% fftW : 390 us/call x 40 calls
2019-05-26 16:13:41 0.02% carryB : 390 us/call x 40 calls
2019-05-26 16:13:41
2019-05-26 16:13:45 Stopping, please wait..
2019-05-26 16:13:50 85389763 OK 81000 0.09%; 4.23 ms/sq; ETA 4d 04:10; a8835bb1f12323ed (check 4.48s)
2019-05-26 16:13:50 29.96% carryFused : 1156 us/call x 1998 calls
2019-05-26 16:13:50 28.54% tailFused : 1100 us/call x 2000 calls
2019-05-26 16:13:50 12.55% fftMiddleOut : 483 us/call x 2001 calls
2019-05-26 16:13:50 11.13% fftMiddleIn : 429 us/call x 2002 calls
2019-05-26 16:13:50 10.32% transposeW : 397 us/call x 2002 calls
2019-05-26 16:13:50 7.29% transposeH : 281 us/call x 2001 calls
2019-05-26 16:13:50 0.20% fftW : 5200 us/call x 3 calls
2019-05-26 16:13:50
2019-05-26 16:13:50 Exiting because "stop requested"
2019-05-26 16:13:50 Bye[/CODE]
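
For anyone who wants to tabulate -time output like the above, here is a minimal Python sketch that parses the per-kernel lines and sums the kernel time per iteration. The regular expression is inferred from the lines in this log only, and the function names are hypothetical; the sample is the block logged at 16:09:35, which covers iterations 3000 to 20000.

```python
import re

# Matches e.g. "31.58% tailFused : 1210 us/call x 17000 calls"
PROFILE_LINE = re.compile(
    r"(?P<pct>\d+\.\d+)%\s+(?P<kernel>\w+)\s*:\s*"
    r"(?P<us>\d+) us/call x (?P<calls>\d+) calls"
)


def parse_profile(lines):
    """Return (kernel, percent, us_per_call, calls) for each profile line."""
    rows = []
    for line in lines:
        m = PROFILE_LINE.search(line)
        if m:
            rows.append(
                (m["kernel"], float(m["pct"]), int(m["us"]), int(m["calls"]))
            )
    return rows


def kernel_ms_per_iteration(rows, iterations):
    """Total timed kernel time divided by the iterations in the interval."""
    total_us = sum(us * calls for _, _, us, calls in rows)
    return total_us / iterations / 1000.0


sample = """\
2019-05-26 16:09:35 31.58% tailFused : 1210 us/call x 17000 calls
2019-05-26 16:09:35 28.39% carryFused : 1089 us/call x 16983 calls
2019-05-26 16:09:35 10.92% fftMiddleOut : 418 us/call x 17017 calls
2019-05-26 16:09:35 10.17% transposeW : 389 us/call x 17034 calls
2019-05-26 16:09:35 10.01% fftMiddleIn : 383 us/call x 17034 calls
2019-05-26 16:09:35 8.83% transposeH : 338 us/call x 17017 calls
2019-05-26 16:09:35 0.05% fftP : 612 us/call x 51 calls
2019-05-26 16:09:35 0.02% fftW : 459 us/call x 34 calls
2019-05-26 16:09:35 0.02% fftH : 306 us/call x 51 calls
""".splitlines()

rows = parse_profile(sample)
print(f"{kernel_ms_per_iteration(rows, 20000 - 3000):.2f} ms of kernel time per iteration")
```

For this block the timed kernels add up to about 3.83 ms per iteration, somewhat below the 4.11 ms/sq wall-clock figure reported on the progress line.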

kriesel 2019-05-26 23:24

P-1 attempt
 
P-1 attempt on 8GB RX480. No stage 1 gcd output at console or in log file; stage 2 terminated because of memory shortage.
[CODE]>gpuowl-win -device 0 -carry short -fft +0 -time
2019-05-26 14:39:18 gpuowl v6.5-c48d46f
2019-05-26 14:39:18 Note: no config.txt file found
2019-05-26 14:39:18 config: -device 0 -carry short -fft +0 -time
2019-05-26 14:39:18 worktodo.txt: ";B1=2735000,B2=67691250;PFactor=0,1,2,332419523,-1,81,2" ignored
2019-05-26 14:39:18 91538501 FFT 5120K: Width 256x4, Height 64x4, Middle 10; 17.46 bits/word
2019-05-26 14:39:18 using short carry kernels
2019-05-26 14:39:26 OpenCL compilation in 2848 ms, with "-DEXP=91538501u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=10u -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-05-26 14:39:27 91538501 P-1 [B]GPU RAM fits 182 stage2 buffers @ 40.0 MB each[/B]
2019-05-26 14:39:27 91538501 P-1 using 180 stage2 buffers (16 rounds)
2019-05-26 14:39:27 P-1 (B1=790000, B2=16590000, D=30030): primes 1003360, expanded 1023056, doubles 169797 (left 670317), singles 663766, total 833563 (83%)
2019-05-26 14:39:27 91538501 P-1 stage2: 527 blocks starting at block 26 (833563 selected)
2019-05-26 14:39:27 91538501 P-1 starting stage1
2019-05-26 14:40:16 91538501 10000 0.88%; 4.91 ms/sq; ETA 0d 01:33; a0649352e5eb83b6
2019-05-26 14:41:05 91538501 20000 1.75%; 4.91 ms/sq; ETA 0d 01:32; f58e75b92aa7f8fa
2019-05-26 14:41:54 91538501 30000 2.63%; 4.92 ms/sq; ETA 0d 01:31; 51873513619a0eb0
2019-05-26 14:42:44 91538501 40000 3.51%; 4.91 ms/sq; ETA 0d 01:30; b23444d0fb60071d
...
2019-05-26 16:08:44 91538501 1090000 95.64%; 4.91 ms/sq; ETA 0d 00:04; 895281c5e7df9ff4
2019-05-26 16:09:33 91538501 1100000 96.52%; 4.91 ms/sq; ETA 0d 00:03; c1f7c20a6ceaa6ff
2019-05-26 16:10:22 91538501 1110000 97.39%; 4.91 ms/sq; ETA 0d 00:02; 0fd657a862204e3c
2019-05-26 16:11:11 91538501 1120000 98.27%; 4.91 ms/sq; ETA 0d 00:02; 84f6956e7f57aab6
2019-05-26 16:12:00 91538501 1130000 99.15%; 4.91 ms/sq; ETA 0d 00:01; 2930c5f3238a743d
2019-05-26 16:12:48 P-1 stage2 [B]too little memory 6894 MB for 180 buffers of 41943040 b[/B]
2019-05-26 16:14:15 Exiting because "P-1 not enough memory"
2019-05-26 16:14:15 Bye
[/CODE]
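
The memory figures in the log can be cross-checked with a little arithmetic. Here is a minimal Python sketch, assuming that each stage2 buffer holds one 8-byte value per FFT word (which reproduces the logged 41943040 b for a 5120K FFT) and that "MB" in the log means 2^20 bytes; the function names are hypothetical and this is not gpuowl's actual allocation logic.

```python
BYTES_PER_WORD = 8        # assumption: one 64-bit value per FFT word
MIB = 1024 * 1024         # assumption: the log's "MB" is 2**20 bytes


def stage2_buffer_bytes(fft_k_words: int) -> int:
    """Size of one stage2 buffer for an FFT of fft_k_words * 1024 words."""
    return fft_k_words * 1024 * BYTES_PER_WORD


def buffers_that_fit(available_mib: int, buffer_bytes: int) -> int:
    """How many whole buffers fit in the given amount of memory."""
    return (available_mib * MIB) // buffer_bytes


buf = stage2_buffer_bytes(5120)           # 5120K FFT from the log
requested = 180                           # "using 180 stage2 buffers"
available_mib = 6894                      # "too little memory 6894 MB"

print(f"buffer size: {buf} bytes ({buf / MIB:.1f} MB)")
print(f"requested:   {requested * buf // MIB} MB for {requested} buffers")
print(f"would fit:   {buffers_that_fit(available_mib, buf)} buffers in {available_mib} MB")
```

Under these assumptions the 180 buffers need 7200 MB, while 6894 MB holds only 172 of them, which matches the "too little memory" message at the start of stage 2.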

preda 2019-05-27 12:33

I plan to rework the P-1 stage-2 memory allocation when I get a chance, probably in the following weeks.

[QUOTE=kriesel;517853]P-1 attempt on 8GB RX480. No stage 1 gcd output at console or in log file; stage 2 terminated because of memory shortage.[/QUOTE]

kriesel 2019-05-27 14:55

[QUOTE=preda;517884]I plan to rework the P-1 stage-2 memory allocation when I get a chance, probably in the following weeks.[/QUOTE]
Thanks for the response/update.

Since a lowly 1GB Quadro 2000 can perform P-1 factoring in both stages in CUDAPm1 up to p~177,000,000, or a 2GB Quadro 4000 up to ~337,000,000, or a GTX 1060 3GB up to ~432,000,000, it was quite a surprise to me that 91.5M did not work to completion in gpuowl v6.5 P-1 on an 8GB RX480. (There's some data on CUDAPm1 limits vs gpu model & ram at [URL]https://www.mersenneforum.org/showpost.php?p=489365&postcount=7[/URL])

I retried the gpuowl P-1 run without the -time option or fft specification, and got the same result as previously. Toward the end it seemed to be saturating a cpu core, perhaps with the stage 1 gcd computation, but there was no output.

Is the -time option only applicable to PRP, not P-1, in gpuowl?


All times are UTC.