[QUOTE=preda;491786]On 20M it may be worth doing a bit higher exponents, 332M, which reach into "100M digits" domain. You can get such exponents from the "manual assignments" page, "first time 100M digits PRP".[/QUOTE]
I have just got one 332M exponent from "100M digits"; I am going to start it tomorrow when the 85M exponent currently running completes. |
AMD gpu vram usage on linux
[QUOTE=preda;491787]I don't have a good solution myself. If you use ROCm, it may be an idea to submit a feature request to rocm-smi. I think some information about allocated GPU RAM can be gleaned from clinfo.
That's why my memory info is "theoretical", not reported from the GPU.[/QUOTE]
Have you looked into [URL]https://github.com/marazmista/radeon-profile[/URL]? It's a bit graphical, which won't make SELROC smile, but maybe it could be modified to text-only without too much trouble. The screenshot shows gpu vram usage. |
[QUOTE=kriesel;491790]Have you looked into [URL]https://github.com/marazmista/radeon-profile[/URL]? It's a bit graphical, which won't make SELROC smile, but maybe it could be modified to text-only without too much trouble. The screenshot shows gpu vram usage.[/QUOTE]
It is a nice tool to monitor your GPU while you play some game; as you say, it would need to be converted to text-only. For performance I use a text-only console, which avoids a lot of graphical interface processes that get in the way when trying to keep the timing as low as possible for computing purposes. With a text-only console the system scheduler has less to do (I don't have an exact count, but last time I checked on another Debian machine running GNOME there were approx. 30 graphical interface processes). Note that the graphical interface can also trigger disk swapping. |
V3.3 update?
[QUOTE=kracker;483209]As requested... instructions on how to compile on windows (I use msys2.. and also there are probably better ways to do it but it's just how I did it)
1) Download, install and follow the instructions for updating MSYS2 here: [URL]https://www.msys2.org/[/URL]
2) Download and install AMD APP SDK (make sure you use the 64-bit version) for Windows: [URL]https://developer.amd.com/amd-accelerated-parallel-processing-app-sdk/[/URL]
3) Copy the contents of C:\Program Files (x86)\AMD APP SDK\3.0\lib\x86_64 to C:\msys64\mingw64\lib and C:\Program Files (x86)\AMD APP SDK\3.0\include to C:\msys64\mingw64\include
4) Install gcc (pacman -S mingw-w64-x86_64-gcc)
5) Download gpuowl sources and drop them somewhere (to /home/username/ is probably easiest)
6) Run MSYS2 from mingw64.exe and cd to the directory you extracted the source to
7) Compile by:
g++ -c gpuowl.cpp
g++ -o gpuowl.exe gpuowl.o -lOpenCL -static
strip gpuowl.exe[/QUOTE]
That worked great for v2.0. Thanks again for that. I tried again recently with V3.3 (starting from step 5) and ran into errors. So I updated the msys64 installation with pacman -Syu until all was up to date, and tried again. I looked at the gpuowl makefile and extrapolated from it (for openOwL):
[CODE]g++ -O2 -DREV=\"bc4a29f\" -Wall -Werror -std=c++14 OpenGpu.cpp Gpu.cpp common.cpp gpuowl.cpp -o openowl -lOpenCL -L/c/Windows/System32[/CODE]
Still errors. Could you update 7) for V3.3 please?
I haven't tried it yet, but I extrapolate for cudaowl to:
[CODE]nvcc -O2 -DREV=\"bc4a29f\" -o cudaowl CudaGpu.cu Gpu.cpp common.cpp gpuowl.cpp -lcufft[/CODE]
(I don't have nvcc installed on a system with msys2 yet.) And lastly, fftbench:
[CODE]nvcc -O2 -o fftbench fftbench.cu -lcufft[/CODE] |
[QUOTE=kriesel;491799]
[CODE]g++ -O2 -DREV=\"bc4a29f\" -Wall -Werror -std=c++14 OpenGpu.cpp Gpu.cpp common.cpp gpuowl.cpp -o openowl -lOpenCL -L/c/Windows/System32[/CODE]Still errors. Could you update 7) for V3.3 please?[/QUOTE] What are the errors? |
[QUOTE=preda;491800]What are the errors?[/QUOTE]
See the attachment at [URL]http://www.mersenneforum.org/showpost.php?p=491788&postcount=495[/URL]. If that's not readable enough, let me know, and I'll duplicate it and PM you text capture. |
[QUOTE=kriesel;491801]See the attachment at [URL]http://www.mersenneforum.org/showpost.php?p=491788&postcount=495[/URL]. If that's not readable enough, let me know, and I'll duplicate it and PM you text capture.[/QUOTE]
Sorry I missed your initial message with the errors. Please try removing the "-Werror" from the compilation command, and see if the executable works. I don't yet know a proper fix for that particular error ("%llx" format). |
[QUOTE=preda;491802]Sorry I missed your initial message with the errors. Please try removing the "-Werror" from the compilation command, and see if the executable works.
I don't know yet a proper fix for that particular error ("%llx" format).[/QUOTE]
No problem, and thanks for responding. CUDALucas etc. have conditional compilation directives to handle such things as format specifier differences between platforms. Or perhaps I64 instead of ll? [URL]https://stackoverflow.com/questions/14071713/what-is-wrong-with-printfllx[/URL]
As far as getting a compile, it's gone from bad to worse, perhaps because of my system update attempt after the first errors.
[CODE]
$ g++ -O2 -DREV=\"bc4a29f\" -Wall -std=c++14 OpenGpu.cpp Gpu.cpp common.cpp gpuowl.cpp -o openowl -lOpenCL -L/c/Windows/System32
bash: g++: command not found

ken@condorella MSYS ~/gpuowl-compile/v3.3
$ pacman -S mingw-w64-x86_64-gcc
warning: mingw-w64-x86_64-gcc-7.3.0-2 is up to date -- reinstalling
resolving dependencies...
looking for conflicting packages...

Packages (1) mingw-w64-x86_64-gcc-7.3.0-2

Total Installed Size: 114.36 MiB
Net Upgrade Size: 0.00 MiB

:: Proceed with installation? [Y/n] y
(1/1) checking keys in keyring [############################################] 100%
(1/1) checking package integrity [############################################] 100%
(1/1) loading package files [############################################] 100%
(1/1) checking for file conflicts [############################################] 100%
(1/1) checking available disk space [############################################] 100%
:: Processing package changes...
(1/1) reinstalling mingw-w64-x86_64-gcc [############################################] 100%

ken@condorella MSYS ~/gpuowl-compile/v3.3
$ g++ -O2 -DREV=\"bc4a29f\" -Wall -std=c++14 OpenGpu.cpp Gpu.cpp common.cpp gpuowl.cpp -o openowl -lOpenCL -L/c/Windows/System32
bash: g++: command not found
[/CODE]
There may be an uninstall/reinstall cycle in its future. |
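On the "%llx" question: one portable option may be the standard PRIx64 macro from <cinttypes>, which expands to whatever length modifier the platform's printf expects for a 64-bit value. The following is only a minimal sketch; hex64 is a hypothetical helper name, not something from the gpuowl sources, and it has not been tried on the MSYS2 setup discussed above.

```cpp
#include <cinttypes>  // PRIx64
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper (not from gpuowl): format a 64-bit residue as
// 16 lowercase hex digits. PRIx64 supplies the platform's own length
// modifier, so no hand-written "%llx" or "%I64x" is needed.
std::string hex64(std::uint64_t value) {
    char buf[17];  // 16 hex digits + terminating NUL
    std::snprintf(buf, sizeof buf, "%016" PRIx64, value);
    return std::string(buf);
}
```

For example, hex64(0x9d54586b81a581c5ULL) yields the same 16-digit form as the residues printed in the gpuowl logs. Whether this silences the specific -Wall warning seen here is an assumption that would need checking with that compiler.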
rx550 too slow for 8m fft in V1.9 leading to Windows TDRs, app hangs?
[CODE]
gpuOwL v1.9- GPU Mersenne primality checker
Radeon 550 Series 8 @3:0.0, gfx804 1203MHz
OpenCL compilation in 2737 ms, with "-I. -cl-fast-relaxed-math -cl-std=CL2.0 -DEXP=152500021u -DWIDTH=2048u -DHEIGHT=2048u -DLOG_NWORDS=23u -DFP_DP=1 "
PRP-3: FFT 8M (2048 * 2048 * 2) of 152500021 (18.18 bits/word) [2018-07-14 09:21:30 Central Daylight Time]
Starting at iteration 93210000
OK 93210000 / 152500021 [61.12%], 0.00 ms/it; ETA 0d 00:00; 9d54586b81a581c5 [09:21:46]
OK 93211000 / 152500021 [61.12%], 22.97 ms/it [22.93, 23.01] CV 0.2%, check 14.89s; ETA 15d 18:18; 5a622f58dc7fe7fb [09:22:24]
OK 93215000 / 152500021 [61.12%], 22.96 ms/it [22.94, 23.05] CV 0.2%, check 14.92s; ETA 15d 18:06; ca6b293f0f5296f9 [09:24:11]
OK 93220000 / 152500021 [61.13%], 23.15 ms/it [22.96, 24.49] CV 2.0%, check 14.81s; ETA 15d 21:12; 6e4143aeec191d29 [09:26:22]
9500 / 10000, 23.15 ms/it
[/CODE]
(no further progress in 2.5 hours)
Perhaps the RX550 is too slow on 8M fft for the Windows TDR problem? The process is hung up tight and does not respond to CTRL-C. The Windows system log shows a TDR event at 9:30am. Disabling and re-enabling the driver in Windows Device Manager does not always restore function to GPU-Z monitoring, the gpuowl instance, or a newly started gpuowl instance attempting to use the same gpu. Sometimes a system restart is required.
This gpu drives the monitor that's rarely used. The other gpu, an RX480, is happily chugging along meanwhile uninterrupted. At one point this system had 4 gpus in it. The other two RX550s one by one stopped even spinning their fans. That configuration required use of 3 pcie extenders, due to pcie slot placement and gpu card double-slot width. Now the system has no extenders installed. Registry adjustments for the TDR issue are already in place.
After a device disable/reenable and application restart:
[CODE]
gpuOwL v1.9- GPU Mersenne primality checker
Radeon 550 Series 8 @3:0.0, gfx804 1203MHz
OpenCL compilation in 2901 ms, with "-I. -cl-fast-relaxed-math -cl-std=CL2.0 -DEXP=152500021u -DWIDTH=2048u -DHEIGHT=2048u -DLOG_NWORDS=23u -DFP_DP=1 "
PRP-3: FFT 8M (2048 * 2048 * 2) of 152500021 (18.18 bits/word) [2018-07-14 11:55:57 Central Daylight Time]
Starting at iteration 93220000
OK 93220000 / 152500021 [61.13%], 0.00 ms/it; ETA 0d 00:00; 6e4143aeec191d29 [11:56:13]
OK 93221000 / 152500021 [61.13%], 22.99 ms/it [22.96, 23.02] CV 0.2%, check 14.87s; ETA 15d 18:37; c8c2ad99e709dbb0 [11:56:51]
OK 93225000 / 152500021 [61.13%], 22.99 ms/it [22.96, 23.06] CV 0.1%, check 14.80s; ETA 15d 18:32; c70e07d6d9222f21 [11:58:38]
OK 93230000 / 152500021 [61.13%], 23.11 ms/it [22.99, 23.96] CV 1.3%, check 15.04s; ETA 15d 20:31; 3769b4d0be8481f2 [12:00:49]
9000 / 10000, 23.06 ms/it
[/CODE]
and another TDR event at 12:05 stops the show again; about 4 minutes of productive progress makes it into the checkpoint files per restart. Yesterday this was not a problem, as I recall and the gpuowl log confirms. The issue started this morning with a system restart after the system was shut down overnight due to a thunderstorm, which didn't even affect my clocks, and this system is UPS-powered. Weird. |
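The ETA figures in these logs look consistent with (exponent - current iteration) multiplied by the reported ms/it. Here is a minimal sketch of that arithmetic. The name formatEta and the rounding to the nearest minute are assumptions for illustration, not taken from the gpuowl source.

```cpp
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper (not gpuowl's code): estimate the remaining run time
// from the current iteration, the exponent (total iterations) and the
// measured milliseconds per iteration, formatted like the log's "ETA" field.
// Rounding to the nearest minute is an assumption.
std::string formatEta(std::uint64_t iteration, std::uint64_t exponent, double msPerIt) {
    const double remainingMs = static_cast<double>(exponent - iteration) * msPerIt;
    const long totalMinutes = std::lround(remainingMs / 60000.0);
    const int days = static_cast<int>(totalMinutes / (24 * 60));
    const int hours = static_cast<int>((totalMinutes / 60) % 24);
    const int minutes = static_cast<int>(totalMinutes % 60);
    char buf[32];
    std::snprintf(buf, sizeof buf, "%dd %02d:%02d", days, hours, minutes);
    return std::string(buf);
}
```

With the values from the 09:26:22 line, formatEta(93220000, 152500021, 23.15) gives "15d 21:12", matching the log. Other lines can differ by a few minutes because the logged ms/it is itself rounded to two decimals.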
moar FFT
I added a factor-9 step, and now there's a larger selection of FFT sizes:
[CODE]
   FFT    maxExp     W     H  M
  0.5M     10.3M   512   512  1
  1.0M     20.3M  1024   512  1
  2.0M     39.8M  2048   512  1
  2.0M     39.8M   512  2048  1
  2.5M     49.4M   512   512  5
  4.0M     78.0M  1024  2048  1
  4.0M     78.0M  4096   512  1
  4.5M     87.5M   512   512  9
  5.0M     96.9M  1024   512  5
  8.0M    153.0M  2048  2048  1
  9.0M    171.6M  1024   512  9
 10.0M    190.0M   512  2048  5
 10.0M    190.0M  2048   512  5
 16.0M    300.0M  4096  2048  1
 18.0M    336.3M  2048   512  9
 18.0M    336.3M   512  2048  9
 20.0M    372.5M  4096   512  5
 20.0M    372.5M  1024  2048  5
 36.0M    659.0M  1024  2048  9
 36.0M    659.0M  4096   512  9
 40.0M    730.0M  2048  2048  5
 72.0M   1290.9M  2048  2048  9
 80.0M   1429.8M  4096  2048  5
144.0M   2527.5M  4096  2048  9
[/CODE]
Now it's a bit easier to validate openowl on small known primes (e.g. M(1398269) in 6 minutes). For fun, it can also do things like 1 billion exponents at 39 ms/it. (As I have not tested every FFT size precisely, bugs may be hiding around.) |
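Reading the table, every row's FFT size appears to equal W * H * M * 2 words, which also matches the "FFT 8M (2048 * 2048 * 2)" line in the v1.9 log earlier in the thread. That relation is inferred from the numbers, not confirmed from the source. A small sketch under that assumption, with hypothetical helper names:

```cpp
#include <cstdint>

// Hypothetical helpers (not from gpuowl). FFT length in words, assuming
// size = W * H * M * 2 as inferred from the table above.
constexpr std::uint64_t fftWords(std::uint64_t w, std::uint64_t h, std::uint64_t m) {
    return w * h * m * 2;
}

// Average number of exponent bits carried per FFT word.
constexpr double bitsPerWord(std::uint64_t exponent, std::uint64_t words) {
    return static_cast<double>(exponent) / static_cast<double>(words);
}
```

For instance, fftWords(2048, 2048, 1) is 8388608 (8M), and bitsPerWord(152500021, 8388608) comes out at about 18.18, the figure shown in the v1.9 log. Dividing maxExp by the FFT size in the same way gives a rough idea of how many bits per word each row allows.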
[QUOTE=preda;491835]I added a factor-9 step, and now there's a larger selection of FFT sizes:
[CODE]
   FFT    maxExp     W     H  M
  0.5M     10.3M   512   512  1
  1.0M     20.3M  1024   512  1
  2.0M     39.8M  2048   512  1
  2.0M     39.8M   512  2048  1
  2.5M     49.4M   512   512  5
  4.0M     78.0M  1024  2048  1
  4.0M     78.0M  4096   512  1
  4.5M     87.5M   512   512  9
  5.0M     96.9M  1024   512  5
  8.0M    153.0M  2048  2048  1
  9.0M    171.6M  1024   512  9
 10.0M    190.0M   512  2048  5
 10.0M    190.0M  2048   512  5
 16.0M    300.0M  4096  2048  1
 18.0M    336.3M  2048   512  9
 18.0M    336.3M   512  2048  9
 20.0M    372.5M  4096   512  5
 20.0M    372.5M  1024  2048  5
 36.0M    659.0M  1024  2048  9
 36.0M    659.0M  4096   512  9
 40.0M    730.0M  2048  2048  5
 72.0M   1290.9M  2048  2048  9
 80.0M   1429.8M  4096  2048  5
144.0M   2527.5M  4096  2048  9
[/CODE]
Now it's a bit easier to validate openowl on small known primes (e.g. M(1398269) in 6 minutes). For fun, it can also do things like 1 billion exponents at 39 ms/it. (As I have not tested every FFT size precisely, bugs may be hiding around.)[/QUOTE]
At first glance this is a huge performance improvement. |