mersenneforum.org

gpuOwL: an OpenCL program for Mersenne primality testing (https://www.mersenneforum.org/showthread.php?t=22204)

SELROC 2018-07-14 12:20

[QUOTE=preda;491786]On 20M it may be worth doing a bit higher exponents, 332M, which reach into "100M digits" domain. You can get such exponents from the "manual assignments" page, "first time 100M digits PRP".[/QUOTE]

I have just got one 332M exponent from "100M digits". I am going to start it tomorrow, when the 85M exponent that is currently running completes.

kriesel 2018-07-14 12:25

AMD gpu vram usage on linux
 
[QUOTE=preda;491787]I don't have a good solution myself. If you use ROCm, it may be an idea to submit a feature request to rocm-smi. I think some information about allocated GPU RAM can be gleaned from clinfo.
That's why my memory info is "theoretical", not reported from the GPU.[/QUOTE]

Have you looked into [URL]https://github.com/marazmista/radeon-profile[/URL]? It's a bit graphical, which won't make SELROC smile, but maybe it could be modified to text-only without too much trouble. The screenshot shows GPU VRAM usage.

SELROC 2018-07-14 12:59

[QUOTE=kriesel;491790]Have you looked into [URL]https://github.com/marazmista/radeon-profile[/URL]? It's a bit graphical, which won't make SELROC smile, but maybe it could be modified to text-only without too much trouble. The screenshot shows GPU VRAM usage.[/QUOTE]

It is a nice tool for monitoring your GPU while you play a game, but as you say, it would need to be converted to text-only.
For performance I use a text-only console. This avoids the many graphical-interface processes that get in the way when trying to keep timings as low as possible for computing. With a text-only console the system scheduler has less to do (I don't have an exact count, but the last time I checked on another Debian machine with a graphical desktop there were approx. 30 processes for the GNOME interface). Note that the graphical interface can also trigger disk swapping.

kriesel 2018-07-14 14:16

V3.3 update?
 
[QUOTE=kracker;483209]As requested... instructions on how to compile on windows (I use msys2.. and also there are probably better ways to do it but it's just how I did it)

1) Download, install and follow the instructions for updating MSYS2 here: [URL]https://www.msys2.org/[/URL]
2) Download and install the AMD APP SDK for Windows (make sure you use the 64-bit version): [URL]https://developer.amd.com/amd-accelerated-parallel-processing-app-sdk/[/URL]
3) Copy the contents of C:\Program Files (x86)\AMD APP SDK\3.0\lib\x86_64 to C:\msys64\mingw64\lib, and the contents of C:\Program Files (x86)\AMD APP SDK\3.0\include to C:\msys64\mingw64\include
4) Install gcc (pacman -S mingw-w64-x86_64-gcc)
5) Download the gpuowl sources and drop them somewhere (/home/username/ is probably easiest)
6) Run MSYS2 from mingw64.exe and cd to the directory you extracted the sources to
7) Compile by:
g++ -c gpuowl.cpp
g++ -o gpuowl.exe gpuowl.o -lOpenCL -static
strip gpuowl.exe[/QUOTE]
That worked great for v2.0. Thanks again for that. I tried again recently with V3.3 (starting from step 5) and ran into errors.
So I updated the msys64 installation with pacman -Syu until everything was up to date, and tried again. I then looked at the gpuowl makefile and extrapolated from it (for openOwL):
[CODE]g++ -O2 -DREV=\"bc4a29f\" -Wall -Werror -std=c++14 OpenGpu.cpp Gpu.cpp common.cpp gpuowl.cpp -o openowl -lOpenCL -L/c/Windows/System32[/CODE]Still errors. Could you update 7) for V3.3 please?

Haven't tried it yet, but I extrapolate for cudaowl to: [CODE]nvcc -O2 -DREV=\"bc4a29f\" -o cudaowl CudaGpu.cu Gpu.cpp common.cpp gpuowl.cpp -lcufft[/CODE] (Don't have nvcc installed on a system with msys2 yet.)
And lastly, fftbench: [CODE]nvcc -O2 -o fftbench fftbench.cu -lcufft[/CODE]

preda 2018-07-14 14:30

[QUOTE=kriesel;491799]
[CODE]g++ -O2 -DREV=\"bc4a29f\" -Wall -Werror -std=c++14 OpenGpu.cpp Gpu.cpp common.cpp gpuowl.cpp -o openowl -lOpenCL -L/c/Windows/System32[/CODE]Still errors. Could you update 7) for V3.3 please?[/QUOTE]
What are the errors?

kriesel 2018-07-14 14:37

[QUOTE=preda;491800]What are the errors?[/QUOTE]
See the attachment at [URL]http://www.mersenneforum.org/showpost.php?p=491788&postcount=495[/URL]. If that's not readable enough, let me know, and I'll duplicate it and PM you text capture.

preda 2018-07-14 14:51

[QUOTE=kriesel;491801]See the attachment at [URL]http://www.mersenneforum.org/showpost.php?p=491788&postcount=495[/URL]. If that's not readable enough, let me know, and I'll duplicate it and PM you text capture.[/QUOTE]
Sorry I missed your initial message with the errors. Please try removing the "-Werror" from the compilation command, and see if the executable works.


I don't yet know of a proper fix for that particular error (the "%llx" format).

kriesel 2018-07-14 17:14

[QUOTE=preda;491802]Sorry I missed your initial message with the errors. Please try removing the "-Werror" from the compilation command, and see if the executable works.

I don't yet know of a proper fix for that particular error (the "%llx" format).[/QUOTE]

No problem, and thanks for responding. CUDALucas etc. have conditional compilation directives to handle such things as format specifier differences between platforms. Or perhaps use I64 instead of ll? [URL]https://stackoverflow.com/questions/14071713/what-is-wrong-with-printfllx[/URL]
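
One portable alternative to per-platform directives is the standard `PRIx64` macro from `<cinttypes>`, which expands to whatever length modifier the platform's printf expects for a 64-bit value. The sketch below is only an illustration of that idea, not the fix used in gpuowl; the helper name `hex64` is hypothetical and does not come from the gpuowl sources.

```cpp
#include <cinttypes>  // PRIx64
#include <cstdint>    // std::uint64_t
#include <cstdio>     // std::snprintf
#include <string>

// Hypothetical helper (not part of gpuowl): format a 64-bit residue as
// 16 zero-padded hex digits without hard-coding "%llx" or "%I64x".
std::string hex64(std::uint64_t value) {
    char buf[17];  // 16 hex digits + terminating NUL
    // PRIx64 expands to the correct conversion for uint64_t on this platform.
    std::snprintf(buf, sizeof buf, "%016" PRIx64, value);
    return std::string(buf);
}
```

Calling `hex64(0x9d54586b81a581c5ULL)` would give the same kind of 16-digit residue string that appears in the gpuowl log lines, and it should compile cleanly with `-Wall -Werror` under both Linux g++ and MinGW, assuming a C++11 or later standard library.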

As far as getting a compile goes, it's gone from bad to worse, perhaps because of my system update attempt after the first errors.
[CODE]
$ g++ -O2 -DREV=\"bc4a29f\" -Wall -std=c++14 OpenGpu.cpp Gpu.cpp common.cpp gpuowl.cpp -o openowl -lOpenCL -L/c/Windows/System32
bash: g++: command not found

ken@condorella MSYS ~/gpuowl-compile/v3.3
$ pacman -S mingw-w64-x86_64-gcc
warning: mingw-w64-x86_64-gcc-7.3.0-2 is up to date -- reinstalling
resolving dependencies...
looking for conflicting packages...

Packages (1) mingw-w64-x86_64-gcc-7.3.0-2

Total Installed Size: 114.36 MiB
Net Upgrade Size: 0.00 MiB

:: Proceed with installation? [Y/n] y
(1/1) checking keys in keyring [############################################] 100%
(1/1) checking package integrity [############################################] 100%
(1/1) loading package files [############################################] 100%
(1/1) checking for file conflicts [############################################] 100%
(1/1) checking available disk space [############################################] 100%
:: Processing package changes...
(1/1) reinstalling mingw-w64-x86_64-gcc [############################################] 100%

ken@condorella MSYS ~/gpuowl-compile/v3.3
$ g++ -O2 -DREV=\"bc4a29f\" -Wall -std=c++14 OpenGpu.cpp Gpu.cpp common.cpp gpuowl.cpp -o openowl -lOpenCL -L/c/Windows/System32
bash: g++: command not found
[/CODE]There may be an uninstall/reinstall cycle in its future.

kriesel 2018-07-14 17:51

rx550 too slow for 8m fft in V1.9 leading to Windows TDRs, app hangs?
 
[CODE]gpuOwL v1.9- GPU Mersenne primality checker
Radeon 550 Series 8 @3:0.0, gfx804 1203MHz

OpenCL compilation in 2737 ms, with "-I. -cl-fast-relaxed-math -cl-std=CL2.0 -DEXP=152500021u -DWIDTH=2048u -DHEIGHT=2048u -DLOG_NWORDS=23u -DFP_DP=1
"
PRP-3: FFT 8M (2048 * 2048 * 2) of 152500021 (18.18 bits/word) [2018-07-14 09:21:30 Central Daylight Time]
Starting at iteration 93210000
OK 93210000 / 152500021 [61.12%], 0.00 ms/it; ETA 0d 00:00; 9d54586b81a581c5 [09:21:46]
OK 93211000 / 152500021 [61.12%], 22.97 ms/it [22.93, 23.01] CV 0.2%, check 14.89s; ETA 15d 18:18; 5a622f58dc7fe7fb [09:22:24]
OK 93215000 / 152500021 [61.12%], 22.96 ms/it [22.94, 23.05] CV 0.2%, check 14.92s; ETA 15d 18:06; ca6b293f0f5296f9 [09:24:11]
OK 93220000 / 152500021 [61.13%], 23.15 ms/it [22.96, 24.49] CV 2.0%, check 14.81s; ETA 15d 21:12; 6e4143aeec191d29 [09:26:22]
9500 / 10000, 23.15 ms/it
[/CODE](no further progress in 2.5 hours)
Perhaps the RX550 is slow enough on the 8M FFT to trigger Windows TDR (Timeout Detection and Recovery)?
The process is hung up tight and does not respond to CTRL-C.
The Windows system log shows a TDR event at 9:30am.
Disabling and re-enabling the driver in Windows Device Manager does not always restore function to GPU-Z monitoring, to the gpuowl instance, or to a newly started gpuowl instance attempting to use the same GPU. Sometimes a system restart is required. This GPU drives the monitor, which is rarely used. The other GPU, an RX480, is meanwhile happily chugging along uninterrupted.
At one point this system had 4 GPUs in it. The other two RX550s stopped even spinning their fans, one by one. That configuration required 3 PCIe extenders, due to PCIe slot placement and the double-slot width of the GPU cards. Now the system has no extenders installed.
Registry adjustments for the TDR issue are already in place.

After a device disable/reenable and application restart:[CODE]
gpuOwL v1.9- GPU Mersenne primality checker
Radeon 550 Series 8 @3:0.0, gfx804 1203MHz

OpenCL compilation in 2901 ms, with "-I. -cl-fast-relaxed-math -cl-std=CL2.0 -DEXP=152500021u -DWIDTH=2048u -DHEIGHT=2048u -DLOG_NWORDS=23u -DFP_DP=1
"
PRP-3: FFT 8M (2048 * 2048 * 2) of 152500021 (18.18 bits/word) [2018-07-14 11:55:57 Central Daylight Time]
Starting at iteration 93220000
OK 93220000 / 152500021 [61.13%], 0.00 ms/it; ETA 0d 00:00; 6e4143aeec191d29 [11:56:13]
OK 93221000 / 152500021 [61.13%], 22.99 ms/it [22.96, 23.02] CV 0.2%, check 14.87s; ETA 15d 18:37; c8c2ad99e709dbb0 [11:56:51]
OK 93225000 / 152500021 [61.13%], 22.99 ms/it [22.96, 23.06] CV 0.1%, check 14.80s; ETA 15d 18:32; c70e07d6d9222f21 [11:58:38]
OK 93230000 / 152500021 [61.13%], 23.11 ms/it [22.99, 23.96] CV 1.3%, check 15.04s; ETA 15d 20:31; 3769b4d0be8481f2 [12:00:49]
9000 / 10000, 23.06 ms/it
[/CODE]and another TDR event at 12:05 stops the show again; only about 4 minutes of productive progress makes it into the checkpoint files per restart. As I recall, and as the gpuowl log confirms, this was not a problem yesterday. The issue started this morning, when the system was restarted after being shut down overnight because of a thunderstorm. The storm didn't even affect my clocks, and this system is UPS-powered. Weird.

preda 2018-07-15 05:52

moar FFT
 
I added a factor-9 step, and now there's a larger selection of FFT sizes:
[CODE]
   FFT   maxExp     W     H  M
  0.5M    10.3M   512   512  1
  1.0M    20.3M  1024   512  1
  2.0M    39.8M  2048   512  1
  2.0M    39.8M   512  2048  1
  2.5M    49.4M   512   512  5
  4.0M    78.0M  1024  2048  1
  4.0M    78.0M  4096   512  1
  4.5M    87.5M   512   512  9
  5.0M    96.9M  1024   512  5
  8.0M   153.0M  2048  2048  1
  9.0M   171.6M  1024   512  9
 10.0M   190.0M   512  2048  5
 10.0M   190.0M  2048   512  5
 16.0M   300.0M  4096  2048  1
 18.0M   336.3M  2048   512  9
 18.0M   336.3M   512  2048  9
 20.0M   372.5M  4096   512  5
 20.0M   372.5M  1024  2048  5
 36.0M   659.0M  1024  2048  9
 36.0M   659.0M  4096   512  9
 40.0M   730.0M  2048  2048  5
 72.0M  1290.9M  2048  2048  9
 80.0M  1429.8M  4096  2048  5
144.0M  2527.5M  4096  2048  9
[/CODE] Now it's a bit easier to validate openowl on small known primes (e.g. M(1398269) in 6 minutes). For fun, it can also do things like 1-billion exponents at 39 ms/it.

(I have not tested every FFT size thoroughly, so bugs may still be hiding.)
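
For readers checking the table by hand: every row is consistent with an FFT length of W * H * M * 2 words, the same shape the log prints as "FFT 8M (2048 * 2048 * 2)", and the logged "bits/word" figure matches the exponent divided by that length. The sketch below only restates that arithmetic. The relation is inferred from the table and log rather than taken from the gpuowl sources, and the function names `fftWords` and `bitsPerWord` are made up for illustration.

```cpp
#include <cstdint>

// Assumed relation, inferred from the table rows: words = W * H * M * 2.
constexpr std::uint64_t fftWords(std::uint64_t W, std::uint64_t H, std::uint64_t M) {
    return W * H * M * 2;
}

// Average number of bits of the exponent carried by each FFT word.
constexpr double bitsPerWord(std::uint64_t exponent, std::uint64_t words) {
    return static_cast<double>(exponent) / static_cast<double>(words);
}

// Spot checks against table rows (1M = 1024 * 1024 words).
static_assert(fftWords(2048, 2048, 1) == 8u * 1024u * 1024u, "8.0M row");
static_assert(fftWords(512, 512, 9) * 2 == 9u * 1024u * 1024u, "4.5M row");
static_assert(fftWords(4096, 2048, 9) == 144u * 1024u * 1024u, "144.0M row");
```

With these definitions, `bitsPerWord(152500021, fftWords(2048, 2048, 1))` comes out at roughly 18.18, which agrees with the "18.18 bits/word" reported for exponent 152500021 in the v1.9 log earlier in the thread.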

SELROC 2018-07-15 06:28

[QUOTE=preda;491835]I added a factor-9 step, and now there's a larger selection of FFT sizes:
[CODE]
   FFT   maxExp     W     H  M
  0.5M    10.3M   512   512  1
  1.0M    20.3M  1024   512  1
  2.0M    39.8M  2048   512  1
  2.0M    39.8M   512  2048  1
  2.5M    49.4M   512   512  5
  4.0M    78.0M  1024  2048  1
  4.0M    78.0M  4096   512  1
  4.5M    87.5M   512   512  9
  5.0M    96.9M  1024   512  5
  8.0M   153.0M  2048  2048  1
  9.0M   171.6M  1024   512  9
 10.0M   190.0M   512  2048  5
 10.0M   190.0M  2048   512  5
 16.0M   300.0M  4096  2048  1
 18.0M   336.3M  2048   512  9
 18.0M   336.3M   512  2048  9
 20.0M   372.5M  4096   512  5
 20.0M   372.5M  1024  2048  5
 36.0M   659.0M  1024  2048  9
 36.0M   659.0M  4096   512  9
 40.0M   730.0M  2048  2048  5
 72.0M  1290.9M  2048  2048  9
 80.0M  1429.8M  4096  2048  5
144.0M  2527.5M  4096  2048  9
[/CODE] Now it's a bit easier to validate openowl on small known primes (e.g. M(1398269) in 6 minutes). For fun, it can also do things like 1-billion exponents at 39 ms/it.

(As I have not tested every FFT size precisely, bugs may be hiding around.)[/QUOTE]


At first glance this is a huge performance improvement.


All times are UTC.