[QUOTE=M344587487;514489]Some top end platinum psus have enough connectors for 6 cards with fully populated eight pins and powered risers, there comes a point where you might as well go bold.
GEC performance as data transfer is limited sounds like an interesting thing to test, how much of an impact does it have? Would reducing the GEC frequency to mitigate this with the -blocks flag be detrimental to error checking in ways other than just taking longer before an error is detected?[/QUOTE]
I have tested gpuowl with an RX580 on a riser and in a 16x slot; the difference in GEC timing was about 2 seconds with the old software. I have not redone the test with the new software. A difference is to be expected, since the GEC moves data back and forth, so the transfer rate counts. |
[QUOTE=M344587487;514473]My Radeon VII card seems to work fine with ROCm on a powered riser. rocm-smi says unknown instead of the pcie speed but gpuowl ran happily for half an hour before I dismantled the setup.[/QUOTE]
How about a full primality-test duration? In my experience (Windows, RX550, other differences) AMD gpus and gpuowl can take hours to weeks to show issues. |
Right now I can't install the riser but will test it when possible.
|
[QUOTE=M344587487;514489]
GEC performance as data transfer is limited sounds like an interesting thing to test, how much of an impact does it have? Would reducing the GEC frequency to mitigate this with the -blocks flag be detrimental to error checking in ways other than just taking longer before an error is detected?[/QUOTE]
I'm considering changing the default block size to 1000 (from the current 400), which would mean a check done every 1M iterations. This is because the Radeon VII is so fast and rather reliable that a default of 400 seems unnecessarily low. Of course the user can still specify lower values such as 400, 200, or 100 if they suspect something or simply want more frequent feedback.
In general GpuOwl does little transfer over PCIe, so putting the card in a less-than-16x slot should have a tiny impact. Indeed the check becomes a bit slower, but the effect is tiny anyway. |
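For readers unfamiliar with the trade-off being discussed, here is a toy-scale sketch of a Gerbicz-style error check on a PRP squaring chain. It is the textbook form of the check, not GpuOwl's code: the function name, the parameters `block`, `check_every` and `flip_at`, and the rollback policy are all hypothetical, and the sketch does not reproduce how GpuOwl maps its block-size option to a check interval.

```python
def squarings_with_gerbicz(p, iters, block, check_every, flip_at=None):
    """Compute 3**(2**iters) mod (2**p - 1) with a Gerbicz-style check.

    Hypothetical toy model, not GpuOwl's implementation.
    Returns (residue, number_of_failed_checks).
    """
    assert check_every % block == 0 and iters % check_every == 0
    n = (1 << p) - 1
    x = 3              # x_i = 3**(2**i) mod n, starting at x_0
    d = 3              # running product of x at block boundaries (includes x_0)
    d_prev = d
    good = (0, x, d)   # last verified (iteration, x, d)
    failed = 0
    i = 0
    while i < iters:
        x = x * x % n
        i += 1
        if i == flip_at:
            x ^= 1          # simulate a one-off hardware error
            flip_at = None  # transient: it does not recur after a rollback
        if i % block == 0:
            d_prev, d = d, d * x % n
        if i % check_every == 0:
            # Identity being tested: d_prev**(2**block) * 3 == d (mod n).
            # Evaluating it costs `block` extra squarings.
            t = d_prev
            for _ in range(block):
                t = t * t % n
            if t * 3 % n == d:
                good = (i, x, d)      # checkpoint the verified state
            else:
                failed += 1
                i, x, d = good        # roll back to the last verified state
    return x, failed


if __name__ == "__main__":
    clean = squarings_with_gerbicz(127, 100, block=5, check_every=25)
    faulty = squarings_with_gerbicz(127, 100, block=5, check_every=25, flip_at=37)
    print(clean[1], faulty[1], clean[0] == faulty[0])
```

In this model each check costs `block` squarings on top of the regular work, and an error costs at most `check_every` iterations of redone work. A larger block therefore makes every individual check longer, while a longer check interval spreads that cost over more iterations and delays detection.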
[QUOTE=preda;514538]I'm considering changing the default block size to 1000 (from the current 400), which would mean a check done every 1M iterations. This is because the Radeon VII is so fast and rather reliable that a default of 400 seems unnecessarily low. Of course the user can still specify lower values such as 400, 200, or 100 if they suspect something or simply want more frequent feedback.
In general GpuOwl does little transfer over PCIe, so putting the card in a less-than-16x slot should have a tiny impact. Indeed the check becomes a bit slower, but the effect is tiny anyway.[/QUOTE]
In fact I do see *occasional* errors with the Radeon VII; the RX580 never showed an error. The last error that occurred on the Radeon VII was just an EE with a normal residue. It is hard to decipher what is going on, as I can't be watching all the time. But gpuowl recovered happily, so I only lost the last 400K iterations. Those errors are so occasional that doubling the block size or setting it to 1000 has little impact overall. So it is good if you do it. |
[QUOTE=preda;513683]Warning for ROCm users: refrain from upgrading to recently-released ROCm 2.3, there is a 5% perf degradation. [URL]https://github.com/RadeonOpenCompute/ROCm/issues/766[/URL][/QUOTE]
Good news ! [url]https://github.com/RadeonOpenCompute/ROCm/issues/766#issuecomment-486592049[/url] |
Tried to use the makefile for the first time under MSYS2/Windows... getting this:
[code]
i5-4670k@DESKTOP-H3R152O MINGW64 ~/gpuowl-master-t/gpuowl
$ make
echo \"`git describe --long --dirty --always`\" > version.inc
echo Version: `cat version.inc`
Version: "v6.5-24-g984cfc4"
g++ -Wall -O2 -std=c++17 -Wall Pm1Plan.cpp GmpUtil.cpp Worktodo.cpp common.cpp main.cpp Gpu.cpp clwrap.cpp Task.cpp checkpoint.cpp timeutil.cpp Args.cpp state.cpp Signal.cpp FFTConfig.cpp -o gpuowl -lOpenCL -lgmp -lstdc++fs -pthread -L/opt/rocm/opencl/lib/x86_64 -L/opt/amdgpu-pro/lib/x86_64-linux-gnu -L/c/Windows/System32 -L.
d000046.o:(.idata$5+0x0): multiple definition of `__imp___C_specific_handler'
d000043.o:(.idata$5+0x0): first defined here
C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/lib/../lib/crt2.o: In function `pre_c_init':
E:/mingwbuild/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtexe.c:146: undefined reference to `__p__fmode'
C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/lib/../lib/crt2.o: In function `__tmainCRTStartup':
E:/mingwbuild/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtexe.c:290: undefined reference to `_set_invalid_parameter_handler'
E:/mingwbuild/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtexe.c:299: undefined reference to `__p__acmdln'
C:\msys64\tmp\ccyfHvwr.o:common.cpp:(.text+0x53c): undefined reference to `__imp___acrt_iob_func'
C:\msys64\tmp\ccrwc8MT.o:Args.cpp:(.text+0x29): undefined reference to `__imp___acrt_iob_func'
C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/lib/../lib/libmingw32.a(lib64_libmingw32_a-merr.o): In function `_matherr':
E:/mingwbuild/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/merr.c:46: undefined reference to `__acrt_iob_func'
C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/lib/../lib/libmingw32.a(lib64_libmingw32_a-pseudo-reloc.o): In function `__report_error':
E:/mingwbuild/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/pseudo-reloc.c:149: undefined reference to `__acrt_iob_func'
E:/mingwbuild/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/pseudo-reloc.c:150: undefined reference to `__acrt_iob_func'
C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/lib/../lib/libmingwex.a(lib64_libmingwex_a-mingw_vfprintf.o): In function `__mingw_vfprintf':
E:/mingwbuild/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/stdio/mingw_vfprintf.c:53: undefined reference to `_lock_file'
E:/mingwbuild/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/stdio/mingw_vfprintf.c:55: undefined reference to `_unlock_file'
collect2.exe: error: ld returned 1 exit status
make: *** [Makefile:14: gpuowl] Error 1
[/code]
However I can compile it with no problems "manually". |
[QUOTE=kracker;514682]Tried to use the makefile for the first time under MSYS2/Windows... getting this: [..]
However I can compile it with no problems "manually".[/QUOTE] What is the difference when you compile manually? Do you add or remove some flags? |
[QUOTE=preda;514711]What is the difference when you compile manually? Do you add or remove some flags?[/QUOTE]
This works with no errors/warnings (after generating version.inc)
[code]
cd gpuowl
g++ -Wall -std=c++17 -c Pm1Plan.cpp
g++ -Wall -std=c++17 -c GmpUtil.cpp
g++ -Wall -std=c++17 -c Worktodo.cpp
g++ -Wall -std=c++17 -c common.cpp
g++ -Wall -std=c++17 -c main.cpp
g++ -Wall -std=c++17 -c Gpu.cpp
g++ -Wall -std=c++17 -c clwrap.cpp
g++ -Wall -std=c++17 -c Task.cpp
g++ -Wall -std=c++17 -c checkpoint.cpp
g++ -Wall -std=c++17 -c timeutil.cpp
g++ -Wall -std=c++17 -c Args.cpp
g++ -Wall -std=c++17 -c state.cpp
g++ -Wall -std=c++17 -c Signal.cpp
g++ -Wall -std=c++17 -c FFTConfig.cpp
g++ -o gpuowl.exe -static -std=c++17 Pm1Plan.o GmpUtil.o Worktodo.o common.o main.o Gpu.o clwrap.o Task.o checkpoint.o timeutil.o Args.o state.o Signal.o FFTConfig.o -lgmp -lstdc++fs /c/Windows/System32/OpenCL.dll
[/code] |
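To answer the "add or remove some flags?" question mechanically, the two command lines posted above can be diffed option by option. The sketch below does only that; the helper `options` is a hypothetical name, the file lists are shortened because only the options matter, and nothing here establishes which difference actually causes the linker errors.

```python
import shlex

# Command lines as posted in the thread (source/object lists shortened).
MAKEFILE_CMD = (
    "g++ -Wall -O2 -std=c++17 -Wall Pm1Plan.cpp GmpUtil.cpp -o gpuowl "
    "-lOpenCL -lgmp -lstdc++fs -pthread -L/opt/rocm/opencl/lib/x86_64 "
    "-L/opt/amdgpu-pro/lib/x86_64-linux-gnu -L/c/Windows/System32 -L."
)
MANUAL_COMPILE = "g++ -Wall -std=c++17 -c Pm1Plan.cpp"
MANUAL_LINK = (
    "g++ -o gpuowl.exe -static -std=c++17 Pm1Plan.o GmpUtil.o "
    "-lgmp -lstdc++fs /c/Windows/System32/OpenCL.dll"
)


def options(cmd):
    """Return the set of options in a g++ command, ignoring inputs and -o <file>."""
    opts = set()
    skip_next = False
    for tok in shlex.split(cmd)[1:]:      # [1:] drops the compiler name
        if skip_next:
            skip_next = False
            continue
        if tok == "-o":
            skip_next = True              # the output name is not an option
            continue
        if tok.endswith((".cpp", ".o")):
            continue                      # source and object files
        opts.add(tok)
    return opts


makefile_opts = options(MAKEFILE_CMD)
manual_opts = options(MANUAL_COMPILE) | options(MANUAL_LINK)

makefile_only = sorted(makefile_opts - manual_opts)
manual_only = sorted(manual_opts - makefile_opts)

if __name__ == "__main__":
    print("only in Makefile build:", makefile_only)
    print("only in manual build:  ", manual_only)
```

The Makefile build adds `-O2`, `-pthread`, `-lOpenCL` and four `-L` search paths (including `-L/c/Windows/System32`), while the manual build adds `-static` and links `OpenCL.dll` by its full path. The System32 search path is a plausible candidate for the conflicting CRT symbols, but that is a guess rather than something confirmed in the thread.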
[QUOTE=preda;490474]I added an initial CUDA backend to gpuOwl. I expect this to be rough, buggy and not-optimized yet, but it's a start.
...
- cudaOwl has a rich choice of FFT sizes (unlike openOwl). FFT selection is controlled with the "-fft" argument, allowing to specify hard sizes such as 4096K or 4M, or delta steps from the "default" size for the exponent, such as +1 or -1.
A few nice things:
- it's possible to switch the savefile between CUDA/OpenCL in midflight.
- it's possible to change the FFT size in midflight.
Not so nice: the performance on GTX 1080 is disappointing. 5.9ms/it at the PRP wavefront, 4480K FFT. (thus I don't think it's such a good idea to do PRP or LL on Nvidia yet. Probably TF is a better fit for the 32bit-oriented hardware).[/QUOTE]
Not sure why 5.9, but "not-optimized yet" probably covers it. CUDALucas 2.06 does LL at 4.37ms/it on my GTX1080 at a slightly higher fft length, but probably didn't reach that performance level all at once (and lacks even the Jacobi check). CUDAowl reaching 74% of that from the start with GEC is not a bad effort at all.
[CODE]
| Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done |
| Dec 13 04:22:13 | M82599421 2000000 0x191f30c8ee1b9fe0 | 4608K 0.10938 4.3673 436.73s | 4:01:26:41 2.42% |
[/CODE] |
[QUOTE=SELROC;514542]In fact I do see *occasional* errors with the Radeon VII; the RX580 never showed an error.
The last error that occurred on the Radeon VII was just an EE with a normal residue. It is hard to decipher what is going on, as I can't be watching all the time. But gpuowl recovered happily, so I only lost the last 400K iterations. Those errors are so occasional that doubling the block size or setting it to 1000 has little impact overall. So it is good if you do it.[/QUOTE]
Experimenting with -block sizes for a 332M exponent:
1. the GEC time with block 400K is about 2.11 sec.
2. the GEC time with block 1000K is about 4.25 sec.
The GEC time varies with block size. |
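One way to read these two timings is to spread each check over the iterations it covers. The sketch below assumes that one check runs per block × 1000 iterations, which is what the "400K" and "1M" figures in this thread suggest but is not confirmed here; `ITERS_PER_BLOCK_UNIT` and `amortized_us` are hypothetical names.

```python
# GEC times reported above for a 332M exponent: block size -> seconds per check.
GEC_SECONDS = {400: 2.11, 1000: 4.25}

# Assumption (not confirmed in the thread): one check per block * 1000 iterations.
ITERS_PER_BLOCK_UNIT = 1000


def amortized_us(block, gec_seconds):
    """Check time spread over the iterations it covers, in microseconds per iteration."""
    return gec_seconds / (block * ITERS_PER_BLOCK_UNIT) * 1e6


per_iter = {block: amortized_us(block, secs) for block, secs in GEC_SECONDS.items()}
time_ratio = GEC_SECONDS[1000] / GEC_SECONDS[400]   # growth in time per check
block_ratio = 1000 / 400                            # growth in block size

if __name__ == "__main__":
    for block, us in per_iter.items():
        print(f"block {block}: {us:.3f} us of check time per iteration")
    print(f"check time grew {time_ratio:.2f}x for a {block_ratio:.1f}x larger block")
```

Under that assumption, the check time per iteration drops from roughly 5.3 µs to 4.25 µs when going from 400 to 1000, because each check takes about twice as long but covers 2.5 times as many iterations.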