#2256
"Mihai Preda"
Apr 2015
5×17² Posts
Quote:
g++ --version
#2257
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2⁴·461 Posts
Quote:
Code:
$ g++ --version
g++.exe (Rev2, Built by MSYS2 project) 9.2.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Code:
$ g++ -v
Using built-in specs.
COLLECT_GCC=C:\msys64\mingw64\bin\g++.exe
COLLECT_LTO_WRAPPER=C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/9.2.0/lto-wrapper.exe
Target: x86_64-w64-mingw32
Configured with: ../gcc-9.2.0/configure --prefix=/mingw64 --with-local-prefix=/mingw64/local --build=x86_64-w64-mingw32 --host=x86_64-w64-mingw32 --target=x86_64-w64-mingw32 --with-native-system-header-dir=/mingw64/x86_64-w64-mingw32/include --libexecdir=/mingw64/lib --enable-bootstrap --with-arch=x86-64 --with-tune=generic --enable-languages=c,lto,c++,fortran,ada,objc,obj-c++ --enable-shared --enable-static --enable-libatomic --enable-threads=posix --enable-graphite --enable-fully-dynamic-string --enable-libstdcxx-filesystem-ts=yes --enable-libstdcxx-time=yes --disable-libstdcxx-pch --disable-libstdcxx-debug --disable-isl-version-check --enable-lto --enable-libgomp --disable-multilib --enable-checking=release --disable-rpath --disable-win32-registry --disable-nls --disable-werror --disable-symvers --enable-plugin --with-libiconv --with-system-zlib --with-gmp=/mingw64 --with-mpfr=/mingw64 --with-mpc=/mingw64 --with-isl=/mingw64 --with-pkgversion='Rev2, Built by MSYS2 project' --with-bugurl=https://sourceforge.net/projects/msys2 --with-gnu-as --with-gnu-ld
Thread model: posix
gcc version 9.2.0 (Rev2, Built by MSYS2 project)
Last fiddled with by kriesel on 2020-06-03 at 00:40 Reason: added build retry outcome
#2258
"Mihai Preda"
Apr 2015
5×17² Posts
I tried ROCm 3.5 and see a performance hit of more than 5% vs. ROCm 3.3.
I opened this issue: https://github.com/RadeonOpenCompute/ROCm/issues/1124
Feel free to +1 the issue if you think it prevents you from using ROCm 3.5. Also, if you do try ROCm 3.5, please add details on that issue. The timing can be easily obtained by running with the "-time" command line argument to see per-kernel timing info.
I personally moved back to 3.3 already.
I was under the impression (I heard it from the ROCm team) that they use gpuowl in their internal regression tools. It's mystifying then to see this regression.
Last fiddled with by preda on 2020-06-03 at 00:59
#2259
Sep 2002
Database er0rr
2⁴×281 Posts
I managed to get rocm-3.5.0 running after a short battle with Debian Buster, which involved an upgrade, getting some held-back packages installed, an "autoclean", another upgrade, a dist-upgrade, a dpkg -i --force-all, a reboot, linking the libopencl shared object, and recompiling gpuowl.
With 2 instances at 5.5M FFT and sclk at 3, I think it is slower at 1517 µs/it, but with the RAM overclocked to 1200 it is 1423 µs/it. I just checked: a SLOWDOWN from 1440 µs/it to 1517 µs/it.
Edit: Changed the -L option to 3.5.0 in the Makefile and timings went from 1423 µs/it to 1418 µs/it. That helped!
Another edit: With sclk at 4 I am now getting 1316 µs/it.
Last fiddled with by paulunderwood on 2020-06-03 at 11:40
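For anyone comparing their own numbers, the percentages behind these timings are easy to check. A minimal sketch in Python, using only the µs/it figures quoted in the post above; the function name `pct_change` is made up for this example and is not part of gpuowl, and the row labels assume that nothing else changed between measurements.

```python
def pct_change(before_us: float, after_us: float) -> float:
    """Relative change in per-iteration time, in percent (positive means slower)."""
    return (after_us - before_us) / before_us * 100.0


# (label, before µs/it, after µs/it), taken from the post above.
# The labels are this sketch's reading of the post, not measured configurations.
timings = [
    ("after moving to rocm-3.5.0", 1440, 1517),
    ("RAM overclocked to 1200", 1517, 1423),
    ("-L option set to 3.5.0 in Makefile", 1423, 1418),
]

for label, before, after in timings:
    print(f"{label}: {pct_change(before, after):+.1f}%")
```

The first row comes out a little above 5% slower, which is in line with the "more than 5%" hit reported for ROCm 3.5 earlier in the thread.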
#2260
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2⁴×461 Posts
Quote:
#2261
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
1110011010000₂ Posts
Upgraded to Windows 10. This allows GPU-Z to function correctly and display clock rates, which is badly broken in Windows 7 with remote desktop.
Then on to installing the AMD Radeon software to monitor and tweak things. First try was the automatic RAM overclock, which went to 1200 MHz, clearly too high based on the following. There is an untapped opportunity for gpuowl to detect known-bad residues at each console output step.
Code:
2020-06-03 13:37:23 gpuowl v6.11-292-gecab9ae
2020-06-03 13:37:23 config: -user kriesel -cpu asr2/radeonvii2 -d 2 -use NO_ASM -maxAlloc 15000
2020-06-03 13:37:23 device 2, unique id ''
2020-06-03 13:37:23 asr2/radeonvii2 160708577 FFT: 9M 1K:9:512 (17.03 bpw)
2020-06-03 13:37:23 asr2/radeonvii2 Expected maximum carry32: 2F4C0000
2020-06-03 13:37:25 asr2/radeonvii2 OpenCL args "-DEXP=160708577u -DWIDTH=1024u -DSMALL_HEIGHT=512u -DMIDDLE=9u -DWEIGHT_STEP=0xf.adab9b1c15ad8p-3 -DIWEIGHT_STEP=0x8.2a025c1f5ebcp-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -DPM1=0 -DAMDGPU=1 -DNO_ASM=1 -cl-fast-relaxed-math -cl-std=CL2.0 "
2020-06-03 13:37:35 asr2/radeonvii2 OpenCL compilation in 10.37 s
2020-06-03 13:37:36 asr2/radeonvii2 160708577 LL 138500000 loaded: 77e2fa0b44bfb8f1
2020-06-03 13:39:57 asr2/radeonvii2 160708577 LL 138600000 86.24%; 1406 us/it; ETA 0d 08:38; dcde0783ad061fc5
2020-06-03 13:42:10 asr2/radeonvii2 160708577 LL 138700000 86.31%; 1336 us/it; ETA 0d 08:10; 0000000000000002
2020-06-03 13:44:22 asr2/radeonvii2 160708577 LL 138800000 86.37%; 1315 us/it; ETA 0d 08:00; 0000000000000002
2020-06-03 13:46:33 asr2/radeonvii2 160708577 LL 138900000 86.43%; 1315 us/it; ETA 0d 07:58; 0000000000000002
2020-06-03 13:48:50 asr2/radeonvii2 160708577 LL 139000000 86.49%; 1369 us/it; ETA 0d 08:15; 0000000000000002
2020-06-03 13:49:29 asr2/radeonvii2 Stopping, please wait..
2020-06-03 13:49:30 asr2/radeonvii2 160708577 LL 139028000 86.51%; 1410 us/it; ETA 0d 08:29; 0000000000000002
2020-06-03 13:49:30 asr2/radeonvii2 waiting for the Jacobi check to finish..
2020-06-03 13:49:30 asr2/radeonvii2 160708577 EE 139000000 (jacobi == 0)
2020-06-03 13:49:30 asr2/radeonvii2 Exiting because "stop requested"
2020-06-03 13:49:30 asr2/radeonvii2 Bye
The mappings of devices are strange and inconsistent. In gpuowl, it's d0, d1, d2. In the AMD Radeon software, it's gpu1, gpu2, gpu3, and changing gpu1 got me d2's memory clock changing.
The one LL instance was what I was trying to avoid. Per some online forums, the device numbering is not linear across a motherboard; x16 PCIe slots go before x1 slots. GPU-Z list order does not match gpuowl list order either; the AMD Radeon software's gpu1 (first) is gpuowl's d2 (last) but GPU-Z's second in the list of 3. And this is with all 3 running on adjacent x1 PCIe slots! Windows Device Manager may be different yet again. And oddly, while the HD4600 is now -d 3 in the gpuowl list, under Windows 7 it was device 0.
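The known-bad-residue check suggested above could be prototyped outside gpuowl by scanning its log. What follows is a minimal sketch in Python, assuming progress lines keep the format shown in the log above. The names `find_bad_residues`, `SUSPECT_RES64` and `PROGRESS_RE` are made up for this example and are not part of gpuowl, and flagging only the residues 0 and 2 is an assumption of the sketch.

```python
import re

# Once the LL value reaches 2 it stays there (2*2 - 2 = 2), and 0 reaches 2 two
# iterations later, so either residue means the run is broken.
# Limiting the check to these two values is an assumption of this sketch.
SUSPECT_RES64 = {"0000000000000000", "0000000000000002"}

# Matches progress lines such as
# "... 160708577 LL 138700000 86.31%; 1336 us/it; ETA 0d 08:10; 0000000000000002"
PROGRESS_RE = re.compile(
    r"\b(?P<exponent>\d+) LL\s+(?P<iteration>\d+)\s+[\d.]+%;.*;\s*(?P<res64>[0-9a-f]{16})\s*$"
)


def find_bad_residues(lines):
    """Return (exponent, iteration, res64) for every progress line with a suspect residue."""
    hits = []
    for line in lines:
        match = PROGRESS_RE.search(line)
        if match and match.group("res64") in SUSPECT_RES64:
            hits.append(
                (int(match.group("exponent")), int(match.group("iteration")), match.group("res64"))
            )
    return hits


# Two lines copied from the log above: the last good one and the first bad one.
sample = [
    "2020-06-03 13:39:57 asr2/radeonvii2 160708577 LL 138600000 86.24%; 1406 us/it; ETA 0d 08:38; dcde0783ad061fc5",
    "2020-06-03 13:42:10 asr2/radeonvii2 160708577 LL 138700000 86.31%; 1336 us/it; ETA 0d 08:10; 0000000000000002",
]

for exponent, iteration, res64 in find_bad_residues(sample):
    print(f"M{exponent}: suspect residue {res64} at iteration {iteration}")
```

Run against the full log, a scan like this would have flagged the problem at iteration 138700000, several progress steps before the run was stopped by hand.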
#2262
Jun 2019
Ipswich, MA
C₁₆ Posts
Almighty Mathlords, I beseech thee:
I am running four RX 5700 XT GPUs, with a 2990WX, all watercooled. gpuOwl version 6.11 is running well. However, I have two questions:
1. Should one choose to use a shared worktodo/results directory ( -pool ), is there a particular format for the results.txt file? I get an error message that it can't read the file.
2. How exactly does one determine the ideal FFT length?
For the purpose of this discussion, one may assume (but not conclude) that I am a structural engineer and an attorney, not a programmer, and haven't programmed anything since 1992.
Thanks
#2263
∂²ω=0
Sep 2002
República de California
5×2,351 Posts
See here for the random-walk-based heuristic I use in my Mlucas code ... that gives results which more or less (typically within ~1%, exponent-wise) match the internal tables used by other known-to-be-high-accuracy codes such as Prime95 and gpuOwl. Here is a small C function implementing the same heuristic:
Code:
#include <math.h>
#include <stdint.h>
typedef uint32_t uint32;	/* stand-ins for the Mlucas integer typedefs */
typedef uint64_t uint64;

/*
For a given FFT length, estimate maximum exponent that can be tested.

This implements formula (8) in the F24 paper (Math Comp. 72 (243), pp.1555-1572,
December 2002) in order to estimate the maximum average wordsize for a given FFT length.
For roughly IEEE64-compliant arithmetic, an asymptotic constant of 0.6 (log2(C) in the
paper, which recommends something around unity) seems to fit the observed data best.
*/
uint64 given_N_get_maxP(uint32 N)
{
	const double Bmant = 53;	/* mantissa bits of an IEEE64 double */
	const double AsympConst = 0.6;
	const double ln2inv = 1.0/log(2.0);
	double ln_N, lnln_N, l2_N, lnl2_N, l2l2_N, lnlnln_N, l2lnln_N;
	double Wbits, maxExp2;

	ln_N     = log(1.0*N);
	lnln_N   = log(ln_N);
	l2_N     = ln2inv*ln_N;
	lnl2_N   = log(l2_N);
	l2l2_N   = ln2inv*lnl2_N;
	lnlnln_N = log(lnln_N);
	l2lnln_N = ln2inv*lnlnln_N;

	/* Wbits = estimated maximum average bits per FFT word */
	Wbits = 0.5*( Bmant - AsympConst - 0.5*(l2_N + l2l2_N) - 1.5*(l2lnln_N) );
	maxExp2 = Wbits*N;
/*	fprintf(stderr,"N = %8u K maxP = %10u\n", N>>10, (uint32)maxExp2);	*/
	return (uint64)maxExp2;
}
Code:
maxp(N, AsympConst) = { \
	Bmant = 53.; ln2inv = 1.0/log(2.0); \
	ln_N = log(1.0*N); lnln_N = log(ln_N); l2_N = ln2inv*ln_N; lnl2_N = log(l2_N); l2l2_N = ln2inv*lnl2_N; lnlnln_N = log(lnln_N); l2lnln_N = ln2inv*lnlnln_N; \
	Wbits = 0.5*( Bmant - AsympConst - 0.5*(l2_N + l2l2_N) - 1.5*(l2lnln_N) ); \
	return(Wbits*N); \
}
And here a *nix bc function (invoke bc in floating-point mode, as 'bc -l'):
Code:
define maxp(bmant, n, asympconst) {
	auto ln2inv, ln_n, lnln_n, l2_n, lnl2_n, l2l2_n, lnlnln_n, l2lnln_n, wbits;
	ln2inv = 1.0/l(2.0);
	ln_n = l(1.0*n); lnln_n = l(ln_n); l2_n = ln2inv*ln_n; lnl2_n = l(l2_n); l2l2_n = ln2inv*lnl2_n; lnlnln_n = l(lnln_n); l2lnln_n = ln2inv*lnlnln_n;
	wbits = 0.5*( bmant - asympconst - 0.5*(l2_n + l2l2_n) - 1.5*(l2lnln_n) );
	return(wbits*n);
}
Have fun!
Last fiddled with by ewmayer on 2020-06-04 at 01:10
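To turn the heuristic above into an answer to "which FFT length do I need?", one can evaluate it for a list of candidate lengths and take the smallest one that covers the exponent. The following is a rough sketch in Python: `max_exponent` is a direct port of the functions above, while `smallest_fft` and the candidate list are illustrative assumptions of this example. gpuowl picks its FFT sizes from its own internal tables, so treat the result as an estimate only.

```python
import math
from typing import Iterable, Optional


def max_exponent(fft_len: int, asymp_const: float = 0.6, mantissa_bits: float = 53.0) -> int:
    """Port of given_N_get_maxP(): estimated largest exponent testable at fft_len words."""
    l2_n = math.log2(fft_len)                             # log2(N)
    l2l2_n = math.log2(l2_n)                              # log2(log2(N))
    l2lnln_n = math.log2(math.log(math.log(fft_len)))     # log2(ln(ln(N)))
    wbits = 0.5 * (mantissa_bits - asymp_const - 0.5 * (l2_n + l2l2_n) - 1.5 * l2lnln_n)
    return int(wbits * fft_len)


def smallest_fft(exponent: int, candidates: Iterable[int]) -> Optional[int]:
    """Smallest candidate FFT length whose estimated limit covers the exponent, else None."""
    for fft_len in sorted(candidates):
        if max_exponent(fft_len) >= exponent:
            return fft_len
    return None


# Hypothetical candidate lengths of 8M, 9M and 10M words (1M = 2**20).
candidates = [k << 20 for k in (8, 9, 10)]
choice = smallest_fft(160708577, candidates)
print("no candidate is large enough" if choice is None else f"{choice >> 20}M")
```

For the exponent 160708577 from the log earlier in the thread this selects 9M, the same length gpuowl reported there. The estimate for 8M falls short of that exponent, which is why the next size up is needed.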
#2264
"Mihai Preda"
Apr 2015
5×17² Posts
Quote:
The error message may indicate that the pool folder does not exist. To be sure, you can post the error message next time.
Quote:
#2265
Jun 2019
Ipswich, MA
1100₂ Posts
Quote:
-pool C:\Users\xebec\Desktop\GPUOwl Shared
in the config.txt file. I get:
Can't open 'C' (mode 'ab')
Exception NSt10filesystem7__cxx1116filesystem_errorE" filesystem error" can't open file" No error [C"\Users\xebec\Desktop\GPUOwl_Shared/]
Bye
Last fiddled with by paulunderwood on 2020-06-05 at 13:14 Reason: fixed quote
#2266
Sep 2002
Database er0rr
1000110010000₂ Posts
In the config you have "GPUOwl Shared" but the error has an underscore: "GPUOwl_Shared". There is also an extraneous "/".
Last fiddled with by paulunderwood on 2020-06-05 at 13:22
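Before digging into how the path is written in config.txt, it can help to confirm independently that the folder exists under exactly that name and that a file inside it can be opened for appending (the error above mentions mode 'ab'). Below is a small sketch in Python; `check_pool_dir` is a made-up helper, and `results.txt` as the shared file name is an assumption taken from the question earlier in the thread. It says nothing about how gpuowl itself parses the -pool argument.

```python
import os


def check_pool_dir(pool_dir: str, filename: str = "results.txt") -> str:
    """Return "ok", or a short description of why the shared file cannot be opened."""
    if not os.path.isdir(pool_dir):
        return f"not a directory: {pool_dir!r}"
    target = os.path.join(pool_dir, filename)
    try:
        # Append-binary mode, as in the error message; this creates the file if missing.
        with open(target, "ab"):
            pass
    except OSError as exc:
        return f"cannot open {target!r}: {exc}"
    return "ok"


# The two spellings seen in the posts above: with a space and with an underscore.
for candidate in (r"C:\Users\xebec\Desktop\GPUOwl Shared",
                  r"C:\Users\xebec\Desktop\GPUOwl_Shared"):
    print(candidate, "->", check_pool_dir(candidate))
```

Whichever spelling reports "ok" is the one that actually exists on disk, and that is the name the -pool line has to match.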
Thread | Thread Starter | Forum | Replies | Last Post |
mfakto: an OpenCL program for Mersenne prefactoring | Bdot | GPU Computing | 1719 | 2023-01-16 15:51 |
GPUOWL AMD Windows OpenCL issues | xx005fs | GpuOwl | 0 | 2019-07-26 21:37 |
Testing an expression for primality | 1260 | Software | 17 | 2015-08-28 01:35 |
Testing Mersenne cofactors for primality? | CRGreathouse | Computer Science & Computational Number Theory | 18 | 2013-06-08 19:12 |
Primality-testing program with multiple types of moduli (PFGW-related) | Unregistered | Information & Answers | 4 | 2006-10-04 22:38 |