mersenneforum.org  

Old 2018-07-14, 12:20   #496
SELROC
 

19·191 Posts

Quote:
Originally Posted by preda View Post
On 20M it may be worth doing a bit higher exponents, 332M, which reach into "100M digits" domain. You can get such exponents from the "manual assignments" page, "first time 100M digits PRP".
I have just got one 332M exponent from the "100M digits" assignments; I am going to start it tomorrow, when the 85M exponent that is currently running completes.
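
For reference, the reason exponents near 332M land in the "100M digits" range is simple arithmetic: M(p) = 2^p - 1 has floor(p * log10(2)) + 1 decimal digits. A minimal sketch of that calculation (the function name is only illustrative, and 332192831 is used purely as a sample exponent in that range):

```python
import math

def mersenne_digits(p: int) -> int:
    """Decimal digits of 2**p - 1 (same count as 2**p, which is never a power of 10)."""
    return math.floor(p * math.log10(2)) + 1

if __name__ == "__main__":
    # Compare an 85M exponent with a sample exponent from the 332M range.
    for p in (85_000_000, 332_192_831):
        print(p, mersenne_digits(p))
```

The floating-point product is accurate enough here because the fractional part is nowhere near an integer boundary for these inputs; an exact check would need big-integer arithmetic.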
Old 2018-07-14, 12:25   #497
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

1010100111101₂ Posts
AMD gpu vram usage on linux

Quote:
Originally Posted by preda View Post
I don't have a good solution myself. If you use ROCm, it may be an idea to submit a feature request to rocm-smi. I think some information about allocated GPU RAM can be gleaned from clinfo.
That's why my memory info is "theoretical", not reported from the GPU.
Have you looked into https://github.com/marazmista/radeon-profile? It's a bit graphical which won't make SELROC smile, but maybe it could be modified to text-only without too much trouble. The screenshot shows gpu vram usage.
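
A text-only alternative might be to read the counters that the amdgpu kernel driver exposes through sysfs. The sketch below assumes a reasonably recent amdgpu driver that provides mem_info_vram_used and mem_info_vram_total under /sys/class/drm/cardN/device (older kernels, or the radeon driver, may not have them); the card0 path and the function name are only examples.

```python
from pathlib import Path

def vram_usage(device_dir: str = "/sys/class/drm/card0/device"):
    """Return (used_bytes, total_bytes) from amdgpu sysfs, or None if unavailable."""
    base = Path(device_dir)
    try:
        used = int((base / "mem_info_vram_used").read_text().strip())
        total = int((base / "mem_info_vram_total").read_text().strip())
    except (OSError, ValueError):
        # Assumption not met: the driver does not expose these counters.
        return None
    return used, total

if __name__ == "__main__":
    usage = vram_usage()
    if usage is None:
        print("VRAM counters not available on this system")
    else:
        used, total = usage
        print(f"VRAM: {used / 2**20:.0f} MiB used of {total / 2**20:.0f} MiB")
```

This needs no graphical session, so it could be polled from a plain console alongside gpuowl.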
Old 2018-07-14, 12:59   #498
SELROC
 

FD8₁₆ Posts

Quote:
Originally Posted by kriesel View Post
Have you looked into https://github.com/marazmista/radeon-profile? It's a bit graphical which won't make SELROC smile, but maybe it could be modified to text-only without too much trouble. The screenshot shows gpu vram usage.
It is a nice tool for monitoring your GPU while you play a game; as you say, it would need to be converted to text-only.
For performance I use a text-only console. This avoids a lot of graphical interface processes that get in the way when trying to keep iteration timings as low as possible for computing. With a text-only console the system scheduler has less to do (I don't have an exact count, but the last time I checked on another Debian machine with a graphical desktop there were approximately 30 processes belonging to the GNOME interface). Note that the graphical interface can also trigger disk swapping.
Old 2018-07-14, 14:16   #499
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5,437 Posts
V3.3 update?

Quote:
Originally Posted by kracker View Post
As requested... instructions on how to compile on windows (I use msys2.. and also there are probably better ways to do it but it's just how I did it)

1) Download, install and follow the instructions for updating MSYS2 here: https://www.msys2.org/
2) Download and install AMD APP SDK(make sure you use the 64bit version) for Windows: https://developer.amd.com/amd-accele...ssing-app-sdk/
3) Copy the contents of C:\Program Files (x86)\AMD APP SDK\3.0\lib\x86_64 to C:\msys64\mingw64\lib and C:\Program Files (x86)\AMD APP SDK\3.0\include to C:\msys64\mingw64\include
4) Install gcc (pacman -S mingw-w64-x86_64-gcc)
5) Download gpuowl sources and drop them somewhere(to /home/username/ is probably easiest)
6) Run MSYS2 from mingw64.exe and cd to the directory you extracted the source to
7) Compile by:
g++ -c gpuowl.cpp
g++ -o gpuowl.exe gpuowl.o -lOpenCL -static
strip gpuowl.exe
That worked great for v2.0. Thanks again for that. I tried again recently with V3.3 (starting from step 5) and ran into errors.
So I updated the msys64 installation with pacman -Syu until everything was up to date, and tried again. I then looked at the gpuowl makefile and extrapolated from it (for openOwL):
Code:
g++ -O2 -DREV=\"bc4a29f\" -Wall -Werror -std=c++14 OpenGpu.cpp Gpu.cpp common.cpp gpuowl.cpp -o openowl -lOpenCL -L/c/Windows/System32
Still errors. Could you update 7) for V3.3 please?

Haven't tried it yet, but I extrapolate for cudaowl to:
Code:
nvcc -O2 -DREV=\"bc4a29f\" -o cudaowl CudaGpu.cu Gpu.cpp common.cpp gpuowl.cpp -lcufft
(Don't have nvcc installed on a system with msys2 yet.)
And lastly, fftbench:
Code:
nvcc -O2 -o fftbench fftbench.cu -lcufft

Last fiddled with by kriesel on 2018-07-14 at 14:41
Old 2018-07-14, 14:30   #500
preda
 
 
"Mihai Preda"
Apr 2015

3·457 Posts

Quote:
Originally Posted by kriesel View Post
Code:
g++ -O2 -DREV=\"bc4a29f\" -Wall -Werror -std=c++14 OpenGpu.cpp Gpu.cpp common.cpp gpuowl.cpp -o openowl -lOpenCL -L/c/Windows/System32
Still errors. Could you update 7) for V3.3 please?
What are the errors?
Old 2018-07-14, 14:37   #501
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5437₁₀ Posts

Quote:
Originally Posted by preda View Post
What are the errors?
See the attachment at http://www.mersenneforum.org/showpos...&postcount=495. If that's not readable enough, let me know, and I'll duplicate it and PM you text capture.
Old 2018-07-14, 14:51   #502
preda
 
 
"Mihai Preda"
Apr 2015

3×457 Posts

Quote:
Originally Posted by kriesel View Post
See the attachment at http://www.mersenneforum.org/showpos...&postcount=495. If that's not readable enough, let me know, and I'll duplicate it and PM you text capture.
Sorry, I missed your initial message with the errors. Please try removing "-Werror" from the compilation command, and see if the executable works.

I don't yet know of a proper fix for that particular error (the "%llx" format).
Old 2018-07-14, 17:14   #503
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5,437 Posts

Quote:
Originally Posted by preda View Post
Sorry I missed your initial message with the errors. Please try removing the "-Werror" from the compilation command, and see if the executable works.

I don't know yet a proper fix for that particular error ("%llx" format).
No problem, and thanks for responding. CUDALucas and similar programs use conditional compilation directives to handle things such as format-specifier differences between platforms. Or perhaps I64 instead of ll? https://stackoverflow.com/questions/...with-printfllx

As far as getting a compile goes, it's gone from bad to worse, perhaps because of my system update attempt after the first errors.
Code:
$ g++ -O2 -DREV=\"bc4a29f\" -Wall -std=c++14 OpenGpu.cpp Gpu.cpp common.cpp gpuowl.cpp -o openowl -lOpenCL -L/c/Windows/System32
bash: g++: command not found

ken@condorella MSYS ~/gpuowl-compile/v3.3
$ pacman -S mingw-w64-x86_64-gcc
warning: mingw-w64-x86_64-gcc-7.3.0-2 is up to date -- reinstalling
resolving dependencies...
looking for conflicting packages...

Packages (1) mingw-w64-x86_64-gcc-7.3.0-2

Total Installed Size:  114.36 MiB
Net Upgrade Size:        0.00 MiB

:: Proceed with installation? [Y/n] y
(1/1) checking keys in keyring                                                  [############################################] 100%
(1/1) checking package integrity                                                [############################################] 100%
(1/1) loading package files                                                     [############################################] 100%
(1/1) checking for file conflicts                                               [############################################] 100%
(1/1) checking available disk space                                             [############################################] 100%
:: Processing package changes...
(1/1) reinstalling mingw-w64-x86_64-gcc                                         [############################################] 100%

ken@condorella MSYS ~/gpuowl-compile/v3.3
$ g++ -O2 -DREV=\"bc4a29f\" -Wall -std=c++14 OpenGpu.cpp Gpu.cpp common.cpp gpuowl.cpp -o openowl -lOpenCL -L/c/Windows/System32
bash: g++: command not found
There may be an uninstall/reinstall cycle in its future.
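
One thing that may be worth ruling out before reinstalling: the prompt in the capture reads MSYS, not MINGW64. In MSYS2 the mingw-w64-x86_64-gcc package installs its compiler under /mingw64/bin, which is normally on PATH only in a shell started from mingw64.exe (MSYSTEM=MINGW64), as in step 6 of kracker's instructions. A rough sketch of that check, with a hypothetical helper name:

```python
import os
import shutil

def gxx_hint(msystem, gxx_path):
    """Suggest a next step from the MSYSTEM value and the result of looking up g++ on PATH."""
    if gxx_path:
        return f"g++ found at {gxx_path}"
    if msystem != "MINGW64":
        # The MSYS shell does not normally put /mingw64/bin on PATH.
        return (f"g++ not on PATH and MSYSTEM is {msystem!r}: "
                "try starting the shell with mingw64.exe")
    return "g++ not on PATH even in a MINGW64 shell: check that /mingw64/bin exists"

if __name__ == "__main__":
    print(gxx_hint(os.environ.get("MSYSTEM"), shutil.which("g++")))
```

This is only a guess based on the prompt text; if the same error appears in a MINGW64 shell, the cause lies elsewhere.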

Last fiddled with by kriesel on 2018-07-14 at 17:39
Old 2018-07-14, 17:51   #504
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5,437 Posts
rx550 too slow for 8m fft in V1.9 leading to Windows TDRs, app hangs?

Code:
gpuOwL v1.9- GPU Mersenne primality checker
Radeon 550 Series 8 @3:0.0, gfx804 1203MHz

OpenCL compilation in 2737 ms, with "-I. -cl-fast-relaxed-math -cl-std=CL2.0  -DEXP=152500021u -DWIDTH=2048u -DHEIGHT=2048u -DLOG_NWORDS=23u -DFP_DP=1
 "
PRP-3: FFT 8M (2048 * 2048 * 2) of 152500021 (18.18 bits/word) [2018-07-14 09:21:30 Central Daylight Time]
Starting at iteration 93210000
OK 93210000 / 152500021 [61.12%], 0.00 ms/it; ETA 0d 00:00; 9d54586b81a581c5 [09:21:46]
OK 93211000 / 152500021 [61.12%], 22.97 ms/it [22.93, 23.01] CV 0.2%, check 14.89s; ETA 15d 18:18; 5a622f58dc7fe7fb [09:22:24]
OK 93215000 / 152500021 [61.12%], 22.96 ms/it [22.94, 23.05] CV 0.2%, check 14.92s; ETA 15d 18:06; ca6b293f0f5296f9 [09:24:11]
OK 93220000 / 152500021 [61.13%], 23.15 ms/it [22.96, 24.49] CV 2.0%, check 14.81s; ETA 15d 21:12; 6e4143aeec191d29 [09:26:22]
  9500 / 10000, 23.15 ms/it
(no further progress in 2.5 hours)
Perhaps the RX550 is slow enough on the 8M fft to trigger the Windows TDR problem?
The process is hung up tight and does not respond to CTRL-C.
The Windows system log shows a TDR event at 9:30am.
Disabling and re-enabling the driver in Windows Device Manager does not always restore function to GPU-Z monitoring, to the gpuowl instance, or to a newly started gpuowl instance attempting to use the same gpu. Sometimes a system restart is required. This gpu drives the monitor, which is rarely used. The other gpu, an RX480, is happily chugging along meanwhile, uninterrupted.
At one point this system had 4 gpus in it. The other two RX550s, one by one, stopped even spinning their fans. That configuration required the use of 3 pcie extenders, due to pcie slot placement and the double-slot width of the gpu cards. Now the system has no extenders installed.
Registry adjustments for the TDR issue are already in place.

After a device disable/reenable and application restart:
Code:
gpuOwL v1.9- GPU Mersenne primality checker
Radeon 550 Series 8 @3:0.0, gfx804 1203MHz

OpenCL compilation in 2901 ms, with "-I. -cl-fast-relaxed-math -cl-std=CL2.0  -DEXP=152500021u -DWIDTH=2048u -DHEIGHT=2048u -DLOG_NWORDS=23u -DFP_DP=1
 "
PRP-3: FFT 8M (2048 * 2048 * 2) of 152500021 (18.18 bits/word) [2018-07-14 11:55:57 Central Daylight Time]
Starting at iteration 93220000
OK 93220000 / 152500021 [61.13%], 0.00 ms/it; ETA 0d 00:00; 6e4143aeec191d29 [11:56:13]
OK 93221000 / 152500021 [61.13%], 22.99 ms/it [22.96, 23.02] CV 0.2%, check 14.87s; ETA 15d 18:37; c8c2ad99e709dbb0 [11:56:51]
OK 93225000 / 152500021 [61.13%], 22.99 ms/it [22.96, 23.06] CV 0.1%, check 14.80s; ETA 15d 18:32; c70e07d6d9222f21 [11:58:38]
OK 93230000 / 152500021 [61.13%], 23.11 ms/it [22.99, 23.96] CV 1.3%, check 15.04s; ETA 15d 20:31; 3769b4d0be8481f2 [12:00:49]
  9000 / 10000, 23.06 ms/it
and another TDR event at 12:05 stops the show again, so only about 4 minutes of productive progress makes it into the checkpoint files per restart. Yesterday this was not a problem, as I recall and as the gpuowl log confirms. The issue started this morning with a system restart, after the system was down overnight because of a thunderstorm. The storm didn't even affect my clocks, and this system is UPS-powered. Weird.
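
The figures in these logs can be cross-checked with a little arithmetic: the ETA is roughly the remaining iterations times ms/it, bits/word is the exponent divided by the FFT length in words, and 10000 iterations at about 23 ms/it take a little under four minutes, which fits the roughly 4 minutes of saved progress per restart. A small sketch (the helper names are illustrative, not taken from gpuowl, and the program's own ETA may include overheads this ignores):

```python
def eta(exponent, iteration, ms_per_it):
    """Return (days, hours, minutes) remaining, truncating leftover seconds."""
    seconds = int((exponent - iteration) * ms_per_it / 1000.0)
    days, rem = divmod(seconds, 86400)
    hours, rem = divmod(rem, 3600)
    return days, hours, rem // 60

def bits_per_word(exponent, width, height):
    """Average bits stored per FFT word for a width * height * 2 word transform."""
    return exponent / (width * height * 2)

if __name__ == "__main__":
    d, h, m = eta(152500021, 93211000, 22.97)
    print(f"ETA about {d}d {h:02d}:{m:02d}")
    print(f"{bits_per_word(152500021, 2048, 2048):.2f} bits/word")
    print(f"{10000 * 23.0 / 1000 / 60:.1f} minutes per 10000 iterations")
```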
Old 2018-07-15, 05:52   #505
preda
 
 
"Mihai Preda"
Apr 2015

3×457 Posts
moar FFT

I added a factor-9 step, and now there's a larger selection of FFT sizes:
Code:
   FFT  maxExp    W    H M
  0.5M   10.3M  512  512 1
  1.0M   20.3M 1024  512 1
  2.0M   39.8M 2048  512 1
  2.0M   39.8M  512 2048 1
  2.5M   49.4M  512  512 5
  4.0M   78.0M 1024 2048 1
  4.0M   78.0M 4096  512 1
  4.5M   87.5M  512  512 9
  5.0M   96.9M 1024  512 5
  8.0M  153.0M 2048 2048 1
  9.0M  171.6M 1024  512 9
 10.0M  190.0M  512 2048 5
 10.0M  190.0M 2048  512 5
 16.0M  300.0M 4096 2048 1
 18.0M  336.3M 2048  512 9
 18.0M  336.3M  512 2048 9
 20.0M  372.5M 4096  512 5
 20.0M  372.5M 1024 2048 5
 36.0M  659.0M 1024 2048 9
 36.0M  659.0M 4096  512 9
 40.0M  730.0M 2048 2048 5
 72.0M 1290.9M 2048 2048 9
 80.0M 1429.8M 4096 2048 5
144.0M 2527.5M 4096 2048 9
Now it's a bit easier to validate openowl on small known primes (e.g. M(1398269) in 6 minutes). For fun, it can also do things like 1-billion exponents at 39 ms/it.

(As I have not tested every FFT size thoroughly, bugs may be lurking.)
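
Reading the table: the FFT size appears to be W * H * M * 2 words (the v1.9 log earlier in the thread prints "FFT 8M (2048 * 2048 * 2)" for W = H = 2048), and maxExp is the largest exponent listed for that size. The sketch below copies a subset of rows from the table and picks the smallest listed size that covers a given exponent. The selection function is only an illustration, not gpuowl's code, and it assumes the "M" in maxExp means 10^6.

```python
# (W, H, M, maxExp in millions), a subset of rows copied from the table above
ROWS = [
    (512, 512, 1, 10.3),
    (1024, 512, 1, 20.3),
    (2048, 512, 1, 39.8),
    (512, 512, 5, 49.4),
    (1024, 2048, 1, 78.0),
    (512, 512, 9, 87.5),
    (1024, 512, 5, 96.9),
    (2048, 2048, 1, 153.0),
    (1024, 512, 9, 171.6),
    (512, 2048, 5, 190.0),
    (4096, 2048, 1, 300.0),
    (2048, 512, 9, 336.3),
]

def fft_words(w, h, m):
    """FFT length in words, assuming size = W * H * M * 2."""
    return w * h * m * 2

def smallest_fft(exponent):
    """Return (W, H, M) of the smallest listed size covering the exponent, or None."""
    for w, h, m, max_exp in sorted(ROWS, key=lambda r: fft_words(*r[:3])):
        if exponent <= max_exp * 1e6:  # assumption: "M" means 10**6
            return w, h, m
    return None

if __name__ == "__main__":
    for p in (1398269, 152500021, 332192831):
        w, h, m = smallest_fft(p)
        print(p, f"{fft_words(w, h, m) / 2**20:.1f}M", (w, h, m))
```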

Last fiddled with by preda on 2018-07-15 at 05:52
Old 2018-07-15, 06:28   #506
SELROC
 

3,011 Posts

Quote:
Originally Posted by preda View Post
I added a factor-9 step, and now there's a larger selection of FFT sizes:
Code:
   FFT  maxExp    W    H M
  0.5M   10.3M  512  512 1
  1.0M   20.3M 1024  512 1
  2.0M   39.8M 2048  512 1
  2.0M   39.8M  512 2048 1
  2.5M   49.4M  512  512 5
  4.0M   78.0M 1024 2048 1
  4.0M   78.0M 4096  512 1
  4.5M   87.5M  512  512 9
  5.0M   96.9M 1024  512 5
  8.0M  153.0M 2048 2048 1
  9.0M  171.6M 1024  512 9
 10.0M  190.0M  512 2048 5
 10.0M  190.0M 2048  512 5
 16.0M  300.0M 4096 2048 1
 18.0M  336.3M 2048  512 9
 18.0M  336.3M  512 2048 9
 20.0M  372.5M 4096  512 5
 20.0M  372.5M 1024 2048 5
 36.0M  659.0M 1024 2048 9
 36.0M  659.0M 4096  512 9
 40.0M  730.0M 2048 2048 5
 72.0M 1290.9M 2048 2048 9
 80.0M 1429.8M 4096 2048 5
144.0M 2527.5M 4096 2048 9
Now it's a bit easier to validate openowl on small known primes (e.g. M(1398269) in 6 minutes). For fun, it can also do things like 1-billion exponents at 39 ms/it.

(As I have not tested every FFT size thoroughly, bugs may be lurking.)

At first glance this is a huge performance improvement.

All times are UTC.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.