[QUOTE=kriesel;547044]See the error about ambiguous overload
[CODE]ProofSet.h:196:77: error: ambiguous overload for 'operator*' (operand types are '__gnu_cxx::__alloc_traits<std::allocator<__gmp_expr<__mpz_struct [1], __mpz_struct [1]> >, __gmp_expr<__mpz_struct [1], __mpz_struct [1]> >::value_type' {aka '__gmp_expr<__mpz_struct [1], __mpz_struct [1]>'} and 'std::array<long long unsigned int, 4>::value_type' {aka 'long long unsigned int'})
  196 |   for (int i = 0; i < (1 << (p - 1)); ++i) { hashes.push_back(hashes[i] * hash[0]); }[/CODE][/QUOTE]Attempted a fix. It's a bit in the dark; you should report at least the compiler version:
g++ --version
g++ version info etc from msys2 attempts
[QUOTE=preda;547050]Attempted a fix. It's a bit in the dark, you should report at least the compiler version
g++ --version[/QUOTE]Yeah, it's some trick to try to write for a different environment you don't have. Feel free to build that into the makefile. If I alter the makefile, will it come back flagged dirty? I'm a little surprised the version was not included in the build log, which is a capture of the console output when the makefile runs. Will try another build.
[CODE]$ g++ --version
g++.exe (Rev2, Built by MSYS2 project) 9.2.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.[/CODE][CODE]$ g++ -v
Using built-in specs.
COLLECT_GCC=C:\msys64\mingw64\bin\g++.exe
COLLECT_LTO_WRAPPER=C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/9.2.0/lto-wrapper.exe
Target: x86_64-w64-mingw32
Configured with: ../gcc-9.2.0/configure --prefix=/mingw64 --with-local-prefix=/mingw64/local --build=x86_64-w64-mingw32 --host=x86_64-w64-mingw32 --target=x86_64-w64-mingw32 --with-native-system-header-dir=/mingw64/x86_64-w64-mingw32/include --libexecdir=/mingw64/lib --enable-bootstrap --with-arch=x86-64 --with-tune=generic --enable-languages=c,lto,c++,fortran,ada,objc,obj-c++ --enable-shared --enable-static --enable-libatomic --enable-threads=posix --enable-graphite --enable-fully-dynamic-string --enable-libstdcxx-filesystem-ts=yes --enable-libstdcxx-time=yes --disable-libstdcxx-pch --disable-libstdcxx-debug --disable-isl-version-check --enable-lto --enable-libgomp --disable-multilib --enable-checking=release --disable-rpath --disable-win32-registry --disable-nls --disable-werror --disable-symvers --enable-plugin --with-libiconv --with-system-zlib --with-gmp=/mingw64 --with-mpfr=/mingw64 --with-mpc=/mingw64 --with-isl=/mingw64 --with-pkgversion='Rev2, Built by MSYS2 project' --with-bugurl=https://sourceforge.net/projects/msys2 --with-gnu-as --with-gnu-ld
Thread model: posix
gcc version 9.2.0 (Rev2, Built by MSYS2 project)[/CODE]
edit: v6.11-311-gfa76bd9, same problem.
ROCm 3.5
I tried ROCm 3.5; I see more than a 5% performance hit vs. ROCm 3.3.
I opened this issue: [url]https://github.com/RadeonOpenCompute/ROCm/issues/1124[/url] Feel free to +1 the issue if you think it prevents you from using ROCm 3.5. Also, if you do try ROCm 3.5, please add details on that issue. The timing can easily be obtained by running with the "-time" command-line argument to see per-kernel timing info. I personally moved back to 3.3 already. I was under the impression, from what I heard from the ROCm team, that they use gpuowl in their internal regression tools. It's mystifying, then, to see this regression.
I managed to get rocm-3.5.0 running after a short battle with Debian Buster, which involved an upgrade, getting some held-back packages installed, an "autoclean", another upgrade, a dist-upgrade, a dpkg -i --force-all, a reboot, linking the libopencl shared object, and recompiling gpuowl. :boxer:
Two instances at 5.5M FFT with sclk at 3: it is slower at 1517 µs/it, but overclocking the RAM to 1200 gives 1423 µs/it. I just checked: a SLOWDOWN from 1440 µs/it to 1517 µs/it :rant: Edit: changed the -L option to 3.5.0 in the Makefile and timings went from 1423 µs/it to 1418 µs/it. That helped! :whistle: Another edit: with sclk at 4 I am now getting 1316 µs/it.
[QUOTE=preda;533571]ROCm exposes a per-GPU unique_id, e.g.:
[CODE]cat /sys/class/drm/card0/device/unique_id
3044212172dc768c[/CODE]This id is a property of the GPU itself and does not depend on the system or PCIe slot, so moving a GPU to a different slot, or to a different system, preserves the UID. I added a way to specify the GPU to run on by using this unique id:
./gpuowl -uid 3044212172dc768c
This can be used instead of -device (-d), which specifies the device by position in the list of devices. The advantage is that the identity of the GPU is preserved when swapping PCIe slots. Combining -uid with -cpu allows associating a stable symbolic name with an actual GPU. I also added a few small Python scripts (ROCm) under the tools/ directory in the source code:
- monitor.py : prints general information about all the ROCm GPUs found
- device.py : given a UID, prints the device serial id
The last script, device.py, can be used in user power-play scripts that set parameters of GPUs (e.g. memory frequency, undervolting, fan etc.) to identify GPUs by UID instead of serial id, achieving correct GPU identification.[/QUOTE]Is that so regardless of GPU model? I note Radeon VII GPUs have a serial number built in. (Cpuid hwinfo produced this on Windows. RX480 and RX550 do not have such serial numbers.)
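The quoted tools essentially just read that sysfs file. A minimal Python sketch of the lookup (the path layout /sys/class/drm/card*/device/unique_id is taken from the quote; the demo runs against a fake tree in a temp directory so it works on any machine):

```python
import os
import tempfile

def read_unique_ids(drm_root):
    """Map each cardN under drm_root to the contents of its device/unique_id."""
    ids = {}
    for entry in sorted(os.listdir(drm_root)):
        uid_file = os.path.join(drm_root, entry, "device", "unique_id")
        if entry.startswith("card") and os.path.isfile(uid_file):
            with open(uid_file) as f:
                ids[entry] = f.read().strip()
    return ids

# Demo: build a fake sysfs tree (on a ROCm box you would pass /sys/class/drm)
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "card0", "device"))
with open(os.path.join(root, "card0", "device", "unique_id"), "w") as f:
    f.write("3044212172dc768c\n")

ids = read_unique_ids(root)
print(ids)  # {'card0': '3044212172dc768c'}
```

monitor.py and device.py presumably report more than this (serial ids, clocks); the sketch only shows the UID lookup itself.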
First RAM overclock experiment
Upgraded to Windows 10. This allows GPU-Z to function correctly and display clock rates, something badly broken in Windows 7 with remote desktop.
Then on to installing AMD Radeon software to monitor and tweak things. First try was the automatic RAM overclock, which went to 1200 MHz, clearly too high based on the following. There is an untapped opportunity for gpuowl to detect known-bad residues at each console output step. [CODE]2020-06-03 13:37:23 gpuowl v6.11-292-gecab9ae
2020-06-03 13:37:23 config: -user kriesel -cpu asr2/radeonvii2 -d 2 -use NO_ASM -maxAlloc 15000
2020-06-03 13:37:23 device 2, unique id ''
2020-06-03 13:37:23 asr2/radeonvii2 160708577 FFT: 9M 1K:9:512 (17.03 bpw)
2020-06-03 13:37:23 asr2/radeonvii2 Expected maximum carry32: 2F4C0000
2020-06-03 13:37:25 asr2/radeonvii2 OpenCL args "-DEXP=160708577u -DWIDTH=1024u -DSMALL_HEIGHT=512u -DMIDDLE=9u -DWEIGHT_STEP=0xf.adab9b1c15ad8p-3 -DIWEIGHT_STEP=0x8.2a025c1f5ebcp-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -DPM1=0 -DAMDGPU=1 -DNO_ASM=1 -cl-fast-relaxed-math -cl-std=CL2.0 "
2020-06-03 13:37:35 asr2/radeonvii2 OpenCL compilation in 10.37 s
2020-06-03 13:37:36 asr2/radeonvii2 160708577 LL 138500000 loaded: 77e2fa0b44bfb8f1
2020-06-03 13:39:57 asr2/radeonvii2 160708577 LL 138600000 86.24%; 1406 us/it; ETA 0d 08:38; dcde0783ad061fc5
2020-06-03 13:42:10 asr2/radeonvii2 160708577 LL 138700000 86.31%; 1336 us/it; ETA 0d 08:10; 0000000000000002
2020-06-03 13:44:22 asr2/radeonvii2 160708577 LL 138800000 86.37%; 1315 us/it; ETA 0d 08:00; 0000000000000002
2020-06-03 13:46:33 asr2/radeonvii2 160708577 LL 138900000 86.43%; 1315 us/it; ETA 0d 07:58; 0000000000000002
2020-06-03 13:48:50 asr2/radeonvii2 160708577 LL 139000000 86.49%; 1369 us/it; ETA 0d 08:15; 0000000000000002
2020-06-03 13:49:29 asr2/radeonvii2 Stopping, please wait..
2020-06-03 13:49:30 asr2/radeonvii2 160708577 LL 139028000 86.51%; 1410 us/it; ETA 0d 08:29; 0000000000000002
2020-06-03 13:49:30 asr2/radeonvii2 waiting for the Jacobi check to finish..
2020-06-03 13:49:30 asr2/radeonvii2 160708577 EE 139000000 (jacobi == 0)
2020-06-03 13:49:30 asr2/radeonvii2 Exiting because "stop requested"
2020-06-03 13:49:30 asr2/radeonvii2 Bye[/CODE]Manual RAM clock tuning while watching Jacobi checks and res64 seems to indicate that up to 1150 MHz is OK. The mappings of devices are strange and inconsistent. In gpuowl, it's d0, d1, d2. In AMD Radeon software, it's gpu1, gpu2, gpu3, and gpu1 got me d2's memory clock changing. The one LL instance was what I was trying to avoid. Per some online forums, the device numbering is not linear across a motherboard; x16 PCIe slots go before x1 slots. GPU-Z list order does not match gpuowl list order either; AMD Radeon software's gpu1 (first) is gpuowl's d2 (last) but GPU-Z's second in the list of 3. And this is with all 3 running on adjacent x1 PCIe slots! Windows Device Manager may be yet different. And oddly, while the HD4600 is now -d 3 in the gpuowl list, under Windows 7 it was device 0.
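The suggested known-bad-residue check is easy to sketch: once an LL residue chain collapses, the printed res64 gets stuck on a trivial value such as the 0000000000000002 visible in the log above. A hedged Python sketch (the helper name and the exact blacklist are my assumptions, based on commonly reported stuck values, not gpuowl code):

```python
# Hypothetical sanity check: flag res64 values that indicate a corrupted
# LL run (the residue chain collapses to a trivial fixed point).
# This blacklist is an assumption, not an official list.
SUSPECT_RES64 = {
    0x0000000000000000,
    0x0000000000000002,
    0xFFFFFFFFFFFFFFFE,
    0xFFFFFFFFFFFFFFFF,
}

def is_suspect_res64(res64_hex):
    """Return True if the printed res64 (hex string) is a known-bad value."""
    return int(res64_hex, 16) in SUSPECT_RES64

# From the log above: a healthy residue vs. a stuck residue
print(is_suspect_res64("dcde0783ad061fc5"))  # False
print(is_suspect_res64("0000000000000002"))  # True
```

A check like this at each console output step would have flagged the bad run about ten minutes earlier than the Jacobi check did.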
Questions about shared folder and FFT length
Almighty Mathlords, I beseech thee:
I am running four RX 5700 XT GPUs with a 2990WX, all watercooled. gpuOwl version 6.11 is running well. However, I have two questions: 1. Should one choose to use a shared worktodo/results directory (-pool), is there a particular format for the results.txt file? I get an error message that it can't read the file. 2. How exactly does one determine the ideal FFT length? For the purpose of this discussion, one may assume (but not conclude) that I am a structural engineer and an attorney, not a programmer, and haven't programmed anything since 1992. Thanks
[QUOTE=Xebecer;547102]2. How exactly does one determine the ideal FFT length?[/QUOTE]
See [url=http://mersenneforum.org/mayer/F24.pdf]here[/url] for the random-walk-based heuristic I use in my Mlucas code ... that gives results which more or less - typically within ~1%, exponent-wise - match the internal tables used by other known-to-be-high-accuracy codes such as Prime95 and gpuOwl. Here is a small C function implementing the same heuristic:
[code]/* For a given FFT length, estimate maximum exponent that can be tested.
This implements formula (8) in the F24 paper (Math Comp. 72 (243), pp.1555-1572,
December 2002) in order to estimate the maximum average wordsize for a given FFT
length. For roughly IEEE64-compliant arithmetic, an asymptotic constant of 0.6
(log2(C) in the paper, which recommends something around unity) seems to fit the
observed data best. */
uint64 given_N_get_maxP(uint32 N)
{
	const double Bmant = 53;
	const double AsympConst = 0.6;
	const double ln2inv = 1.0/log(2.0);
	double ln_N, lnln_N, l2_N, lnl2_N, l2l2_N, lnlnln_N, l2lnln_N;
	double Wbits, maxExp2;

	ln_N     = log(1.0*N);
	lnln_N   = log(ln_N);
	l2_N     = ln2inv*ln_N;
	lnl2_N   = log(l2_N);
	l2l2_N   = ln2inv*lnl2_N;
	lnlnln_N = log(lnln_N);
	l2lnln_N = ln2inv*lnlnln_N;

	Wbits = 0.5*( Bmant - AsympConst - 0.5*(l2_N + l2l2_N) - 1.5*(l2lnln_N) );
	maxExp2 = Wbits*N;
	/* fprintf(stderr,"N = %8u K maxP = %10u\n", N>>10, (uint32)maxExp2); */
	return (uint64)maxExp2;
}[/code]Alternatively, here is a simple PARI-GP script to return maxExp for any desired N [enter FFT length in #doubles]:
[code]maxp(N, AsympConst) =
{
	Bmant = 53.; ln2inv = 1.0/log(2.0);
	ln_N = log(1.0*N); lnln_N = log(ln_N);
	l2_N = ln2inv*ln_N; lnl2_N = log(l2_N); l2l2_N = ln2inv*lnl2_N;
	lnlnln_N = log(lnln_N); l2lnln_N = ln2inv*lnlnln_N;
	Wbits = 0.5*( Bmant - AsympConst - 0.5*(l2_N + l2l2_N) - 1.5*(l2lnln_N) );
	return(Wbits*N);
}[/code]Ex: maxp(2240<<10, 0.4) = 43235170
And here is a *nix bc function (invoke bc in floating-point mode, as 'bc -l'):
[code]define maxp(bmant, n, asympconst) {
	auto ln2inv, ln_n, lnln_n, l2_n, lnl2_n, l2l2_n, lnlnln_n, l2lnln_n, wbits;
	ln2inv = 1.0/l(2.0);
	ln_n = l(1.0*n); lnln_n = l(ln_n);
	l2_n = ln2inv*ln_n; lnl2_n = l(l2_n); l2l2_n = ln2inv*lnl2_n;
	lnlnln_n = l(lnln_n); l2lnln_n = ln2inv*lnlnln_n;
	wbits = 0.5*( bmant - asympconst - 0.5*(l2_n + l2l2_n) - 1.5*(l2lnln_n) );
	return(wbits*n);
}[/code]Ex: maxp(2240*2^10, 0.4) = 43235170.58240592323366420480
Have fun!
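For anyone without a C compiler or bc handy, the same formula transcribes directly to Python (same constants and structure as the versions above; this is just a transcription, not new math):

```python
from math import log

def maxp(n, asymp_const=0.6, bmant=53.0):
    """Max exponent testable at FFT length n (in doubles); formula (8), F24 paper."""
    ln2inv = 1.0 / log(2.0)
    ln_n = log(n)
    l2_n = ln2inv * ln_n                 # log2(N)
    l2l2_n = ln2inv * log(l2_n)          # log2(log2(N))
    l2lnln_n = ln2inv * log(log(ln_n))   # log2(ln(ln(N)))
    wbits = 0.5 * (bmant - asymp_const - 0.5 * (l2_n + l2l2_n) - 1.5 * l2lnln_n)
    return wbits * n

print(int(maxp(2240 << 10, 0.4)))  # 43235170, matching the bc example
```

The asymptotic constant is the tunable part: 0.6 is Ernst's conservative fit, and the examples above use 0.4 for a looser bound.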
[QUOTE=Xebecer;547102]Almighty Mathlords, I beseech thee:
I am running four RX 5700 XT GPUs with a 2990WX, all watercooled. gpuOwl version 6.11 is running well. However, I have two questions: 1. Should one choose to use a shared worktodo/results directory (-pool), is there a particular format for the results.txt file? I get an error message that it can't read the file. [/QUOTE] If you run multiple instances of gpuowl, -pool can be useful; I use -pool myself for my runs. You need to create a directory that you pass to -pool (the directory must exist). Put work into pool/worktodo.txt (primenet.py can do that), then start gpuowl with -pool <dir>. The error message may indicate that the pool folder does not exist. To be sure, you can post the error message next time. [QUOTE] 2. How exactly does one determine the ideal FFT length? Thanks[/QUOTE] The default should be good. No need to tinker with the FFT length unless it's not running (getting 3 errors in a row and exiting).
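The setup preda describes can be sketched in a few lines (the paths are illustrative and gpuowl itself is not invoked; a temp directory stands in for a real location):

```python
from pathlib import Path
import tempfile

# Sketch of the -pool setup: the shared directory must exist before gpuowl
# starts, and work goes into <pool>/worktodo.txt; results are appended to
# <pool>/results.txt by the running instances.
pool = Path(tempfile.mkdtemp()) / "pool"
pool.mkdir(parents=True, exist_ok=True)  # create the directory first
(pool / "worktodo.txt").touch()          # work lines go here (e.g. via primenet.py)

ready = pool.is_dir() and (pool / "worktodo.txt").exists()
print(ready)  # True

# Each instance would then be started with: ./gpuowl -pool <that directory>
```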
[QUOTE=preda;547109]The error message may indicate that the pool folder does not exist. To be sure, you can post the error message next time.[/QUOTE]Created the shared folder, put in a results.txt and worktodo.txt, and used -pool C:\Users\xebec\Desktop\GPUOwl Shared in the config.txt file. I get:
[CODE]Can't open 'C' (mode 'ab')
Exception NSt10filesystem7__cxx1116filesystem_errorE" filesystem error" can't open file" No error [C"\Users\xebec\Desktop\GPUOwl_Shared/]
Bye[/CODE]
[QUOTE=Xebecer;547219]-pool C:\Users\xebec\Desktop\GPUOwl Shared in the config.txt file. I get:
Can't open 'C' (mode 'ab')
Exception NSt10filesystem7__cxx1116filesystem_errorE" filesystem error" can't open file" No error [C"\Users\xebec\Desktop\GPUOwl_Shared/]
Bye[/QUOTE]In the config you have "GPUOwl Shared" but the error has an underscore: "GPUOwl_Shared". There is also an extraneous "/".
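The error may also suggest the path is being chopped at the space: a whitespace-tokenized option parser would see "GPUOwl" and "Shared" as separate arguments. A small Python illustration (whether gpuowl's config.txt parser honors quoting is an assumption on my part; renaming the folder without spaces, e.g. with an underscore, sidesteps the question entirely):

```python
# Simulate naive whitespace tokenization of a config.txt line containing a
# path with a space (the path is the one from the post above).
line = r"-pool C:\Users\xebec\Desktop\GPUOwl Shared"
tokens = line.split()
print(tokens)      # ['-pool', 'C:\\Users\\xebec\\Desktop\\GPUOwl', 'Shared']

# The -pool option receives only the part before the space:
print(tokens[1])   # C:\Users\xebec\Desktop\GPUOwl
```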