mersenneforum.org  

Old 2019-04-23, 18:28   #1112
SELROC
 


Quote:
Originally Posted by M344587487 View Post
Some top-end platinum PSUs have enough connectors for 6 cards with fully populated eight-pins and powered risers; there comes a point where you might as well go bold.



GEC performance when data transfer is limited sounds like an interesting thing to test; how much of an impact does it have? Would reducing the GEC frequency with the -blocks flag to mitigate this be detrimental to error checking in ways other than just taking longer before an error is detected?

I have tested gpuowl with an RX580 on a riser and in a 16x slot; the difference in GEC timing was about 2 seconds with the old software. I have not redone the test with the new software.


That is normal, as the GEC moves data back and forth, so the transfer rate counts.

Last fiddled with by SELROC on 2019-04-23 at 18:30
Old 2019-04-23, 18:39   #1113
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest


Quote:
Originally Posted by M344587487 View Post
My Radeon VII card seems to work fine with ROCm on a powered riser. rocm-smi says "unknown" instead of the PCIe speed, but gpuowl ran happily for half an hour before I dismantled the setup.
How about a full primality-test duration? In my experience (Windows, RX550, other differences), AMD GPUs and gpuowl can take hours to weeks to show issues.

Last fiddled with by kriesel on 2019-04-23 at 18:42
Old 2019-04-23, 19:00   #1114
M344587487
 
 
"Composite as Heck"
Oct 2017


Right now I can't install the riser, but I will test it when possible.
Old 2019-04-24, 12:24   #1115
preda
 
 
"Mihai Preda"
Apr 2015


Quote:
Originally Posted by M344587487 View Post
GEC performance when data transfer is limited sounds like an interesting thing to test; how much of an impact does it have? Would reducing the GEC frequency with the -blocks flag to mitigate this be detrimental to error checking in ways other than just taking longer before an error is detected?
I'm considering changing the default block size to 1000 (from the current 400), which would mean a check every 1M iterations. This is because the Radeon VII is so fast and rather reliable that a default of 400 seems unnecessarily low. Of course, the user can still specify lower values such as 400, 200, or 100 if they suspect something or simply want frequent feedback.

In general GpuOwl does little transfer over the PCIe bus, so putting the card in a less-than-16x slot should have a tiny impact. Indeed, the check becomes a bit slower, but it's tiny anyway.
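To make the block-size tradeoff concrete, here is a toy sketch of the Gerbicz error check (GEC) idea for repeated squarings. This is a hedged illustration, not gpuowl's actual implementation; the function name and parameters are invented for this example. The idea: accumulate a product d of the residues at block boundaries, and at each check verify that advancing the previously saved accumulator by one block of squarings, times the starting value, reproduces d. An error anywhere in the main sequence since the last verified save breaks this identity.

```python
# Toy sketch of the Gerbicz error check; not gpuowl's actual code.
def squarings_with_gec(x0, n_iters, block, check_blocks, N):
    """Compute x0^(2^n_iters) mod N, verifying consistency every
    check_blocks blocks of `block` squarings each."""
    x = x0 % N
    d = x                 # running product of block-boundary residues
    blocks_done = 0
    for i in range(1, n_iters + 1):
        x = x * x % N                       # one squaring of the main sequence
        if i % block == 0:
            blocks_done += 1
            if blocks_done % check_blocks == 0:
                d_saved = d                 # accumulator before this boundary
                d = d * x % N
                # Independent second path: one block-step applied to d_saved
                # (i.e. `block` squarings) times x0 must reproduce d exactly.
                if d != pow(d_saved, 1 << block, N) * x0 % N:
                    raise RuntimeError("GEC mismatch: roll back to last save")
            else:
                d = d * x % N
    return x
```

On my reading of the thread, block size 1000 with checks every 1000 blocks corresponds to "a check every 1M iterations"; each check costs roughly `block` extra squarings, which is why a larger block makes each individual check slower but cheaper per iteration.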
Old 2019-04-24, 12:58   #1116
SELROC
 


Quote:
Originally Posted by preda View Post
I'm considering changing the default block size to 1000 (from the current 400), which would mean a check every 1M iterations. This is because the Radeon VII is so fast and rather reliable that a default of 400 seems unnecessarily low. Of course, the user can still specify lower values such as 400, 200, or 100 if they suspect something or simply want frequent feedback.

In general GpuOwl does little transfer over the PCIe bus, so putting the card in a less-than-16x slot should have a tiny impact. Indeed, the check becomes a bit slower, but it's tiny anyway.

In fact I get occasional errors with the Radeon VII; the RX580 never showed an error.
The last error on the Radeon VII was just an EE with a normal residue; it is hard to decipher what is going on, as I can't be watching all the time. But gpuowl recovered happily, so I only lost the last 400K iterations.
These errors are so occasional that doubling the block size, or setting it to 1000, has little impact overall. So it is fine if you do it.
Old 2019-04-25, 14:56   #1117
SELROC
 


Quote:
Originally Posted by preda View Post
Warning for ROCm users: refrain from upgrading to the recently-released ROCm 2.3; there is a 5% performance degradation. https://github.com/RadeonOpenCompute/ROCm/issues/766

Good news!


https://github.com/RadeonOpenCompute...ment-486592049
Old 2019-04-25, 17:01   #1118
kracker
 
 
"Mr. Meeseeks"
Jan 2012
California, USA


Tried to use the Makefile for the first time under MSYS2/Windows... getting this:
Code:
i5-4670k@DESKTOP-H3R152O MINGW64 ~/gpuowl-master-t/gpuowl
$ make
echo \"`git describe --long --dirty --always`\" > version.inc
echo Version: `cat version.inc`
Version: "v6.5-24-g984cfc4"
g++ -Wall -O2 -std=c++17 -Wall Pm1Plan.cpp GmpUtil.cpp Worktodo.cpp common.cpp main.cpp Gpu.cpp clwrap.cpp Task.cpp checkpoint.cpp timeutil.cpp Args.cpp state.cpp Signal.cpp FFTConfig.cpp -o gpuowl -lOpenCL -lgmp -lstdc++fs -pthread -L/opt/rocm/opencl/lib/x86_64 -L/opt/amdgpu-pro/lib/x86_64-linux-gnu -L/c/Windows/System32 -L.
d000046.o:(.idata$5+0x0): multiple definition of `__imp___C_specific_handler'
d000043.o:(.idata$5+0x0): first defined here
C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/lib/../lib/crt2.o: In function `pre_c_init':
E:/mingwbuild/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtexe.c:146: undefined reference to `__p__fmode'
C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/lib/../lib/crt2.o: In function `__tmainCRTStartup':
E:/mingwbuild/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtexe.c:290: undefined reference to `_set_invalid_parameter_handler'
E:/mingwbuild/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtexe.c:299: undefined reference to `__p__acmdln'
C:\msys64\tmp\ccyfHvwr.o:common.cpp:(.text+0x53c): undefined reference to `__imp___acrt_iob_func'
C:\msys64\tmp\ccrwc8MT.o:Args.cpp:(.text+0x29): undefined reference to `__imp___acrt_iob_func'
C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/lib/../lib/libmingw32.a(lib64_libmingw32_a-merr.o): In function `_matherr':
E:/mingwbuild/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/merr.c:46: undefined reference to `__acrt_iob_func'
C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/lib/../lib/libmingw32.a(lib64_libmingw32_a-pseudo-reloc.o): In function `__report_error':
E:/mingwbuild/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/pseudo-reloc.c:149: undefined reference to `__acrt_iob_func'
E:/mingwbuild/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/pseudo-reloc.c:150: undefined reference to `__acrt_iob_func'
C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/lib/../lib/libmingwex.a(lib64_libmingwex_a-mingw_vfprintf.o): In function `__mingw_vfprintf':
E:/mingwbuild/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/stdio/mingw_vfprintf.c:53: undefined reference to `_lock_file'
E:/mingwbuild/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/stdio/mingw_vfprintf.c:55: undefined reference to `_unlock_file'
collect2.exe: error: ld returned 1 exit status
make: *** [Makefile:14: gpuowl] Error 1
However I can compile it with no problems "manually".
Old 2019-04-25, 20:30   #1119
preda
 
 
"Mihai Preda"
Apr 2015


Quote:
Originally Posted by kracker View Post
Tried to use the makefile for the first time under MSYS2/Windows... getting this: [..]
However I can compile it with no problems "manually".
What is the difference when you compile manually? Do you add or remove some flags?
Old 2019-04-25, 23:29   #1120
kracker
 
 
"Mr. Meeseeks"
Jan 2012
California, USA


Quote:
Originally Posted by preda View Post
What is the difference when you compile manually? Do you add or remove some flags?

This works with no errors/warnings (after generating version.inc):

Code:
cd gpuowl
g++ -Wall -std=c++17 -c Pm1Plan.cpp
g++ -Wall -std=c++17 -c GmpUtil.cpp
g++ -Wall -std=c++17 -c Worktodo.cpp
g++ -Wall -std=c++17 -c common.cpp
g++ -Wall -std=c++17 -c main.cpp
g++ -Wall -std=c++17 -c Gpu.cpp
g++ -Wall -std=c++17 -c clwrap.cpp
g++ -Wall -std=c++17 -c Task.cpp
g++ -Wall -std=c++17 -c checkpoint.cpp
g++ -Wall -std=c++17 -c timeutil.cpp
g++ -Wall -std=c++17 -c Args.cpp
g++ -Wall -std=c++17 -c state.cpp
g++ -Wall -std=c++17 -c Signal.cpp
g++ -Wall -std=c++17 -c FFTConfig.cpp
g++ -o gpuowl.exe -static -std=c++17 Pm1Plan.o GmpUtil.o Worktodo.o common.o main.o Gpu.o clwrap.o Task.o checkpoint.o timeutil.o Args.o state.o Signal.o FFTConfig.o -lgmp -lstdc++fs /c/Windows/System32/OpenCL.dll
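For what it's worth, the link errors in the Makefile build look consistent with the linker picking up conflicting import stubs via the -L/c/Windows/System32 search path, which the manual line avoids by linking OpenCL.dll directly (and by linking -static). A hedged sketch of a conditional override follows; the variable names and the MSYSTEM guard are assumptions for illustration, not gpuowl's actual Makefile (MSYS2 does set MSYSTEM=MINGW64 in a MinGW64 shell).

```make
# Hypothetical MSYS2-specific override; not from gpuowl's Makefile.
# Drops the -L/c/Windows/System32 search path (suspected of pulling in
# conflicting CRT import stubs) and links OpenCL.dll directly, as in
# kracker's working manual build.
ifeq ($(MSYSTEM),MINGW64)
  LIBPATH :=
  LIBS := -static -lgmp -lstdc++fs /c/Windows/System32/OpenCL.dll
else
  LIBPATH := -L/opt/rocm/opencl/lib/x86_64 -L/opt/amdgpu-pro/lib/x86_64-linux-gnu -L.
  LIBS := -lOpenCL -lgmp -lstdc++fs -pthread
endif
```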
Old 2019-05-02, 04:12   #1121
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest


Quote:
Originally Posted by preda View Post
I added an initial CUDA backend to gpuOwl. I expect this to be rough, buggy and not-optimized yet, but it's a start.
...
- cudaOwl has a rich choice of FFT sizes (unlike openOwl). FFT selection is controlled with the "-fft" argument, allowing you to specify hard sizes such as 4096K or 4M, or delta steps from the "default" size for the exponent, such as +1 or -1.

A few nice things:
- it's possible to switch the savefile between CUDA/OpenCL in midflight.
- it's possible to change the FFT size in midflight.

Not so nice:
the performance on GTX 1080 is disappointing. 5.9ms/it at the PRP wavefront, 4480K FFT. (thus I don't think it's such a good idea to do PRP or LL on Nvidia yet. Probably TF is a better fit for the 32bit-oriented hardware).
Not sure why 5.9, but "not-optimized yet" probably covers it. CUDALucas 2.06 does LL at 4.37ms/it on my GTX1080 at a slightly higher fft length, but probably didn't reach that performance level all at once (and lacks even the Jacobi check). CUDAowl reaching 74% of that from the start with GEC is not a bad effort at all.

Code:
|   Date     Time    |   Test Num     Iter        Residue        |    FFT   Error     ms/It     Time  |       ETA      Done   |
|  Dec 13  04:22:13  |  M82599421   2000000  0x191f30c8ee1b9fe0  |  4608K  0.10938   4.3673  436.73s  |   4:01:26:41   2.42%  |
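As a rough cross-check of the log line above, the percent-done and remaining time follow from the exponent and the ms/it figure (LL does about p iterations; reading the ETA column as d:hh:mm:ss is my assumption):

```python
# Cross-check of the CUDALucas log line; numbers taken from the table above.
p = 82599421          # exponent under test, roughly one squaring per iteration
done = 2_000_000      # iterations completed
ms_per_it = 4.3673

percent_done = 100 * done / p
remaining_s = (p - done) * ms_per_it / 1000
days = remaining_s / 86400
print(f"{percent_done:.2f}% done, ~{days:.2f} days to go")
# prints: 2.42% done, ~4.07 days to go
```

That lands in the same ballpark as the logged ETA of 4:01:26:41 (the logged value likely uses a recent-rate average rather than the lifetime ms/it).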
Old 2019-05-05, 15:48   #1122
SELROC
 


Quote:
Originally Posted by SELROC View Post
In fact I get occasional errors with the Radeon VII; the RX580 never showed an error.
The last error on the Radeon VII was just an EE with a normal residue; it is hard to decipher what is going on, as I can't be watching all the time. But gpuowl recovered happily, so I only lost the last 400K iterations.
These errors are so occasional that doubling the block size, or setting it to 1000, has little impact overall. So it is fine if you do it.



Experimenting with -block sizes for a 332M exponent:


1. the GEC time with block 400K is ~2.11 sec.
2. the GEC time with block 1000K is ~4.25 sec.


So the GEC time varies with the block size.