[QUOTE=preda;546262]E.g. the FFT bounds could be set such that the probability of "at least one" roundoff-overflow over the 100M iterations of a test to be under 0.001, which would mean that I expect one roundoff problem in 1000 tests, and that's it.[/QUOTE]
That is what he was calling "*drastically* lower". This way, lots of tests that won't cause an error in the "aggressive" version would run a lot slower with the larger FFT. Then you would have to find a compromise between aggressiveness (total speed or output, including re-doing a few failed cases two or more times) and "safety" (your version, where there are no errors and no "re-do"s, but many of the tests run slower). In fact, [U][B]that[/B][/U] will impose the limit, and not the 0.001% or whatever. And guess what: to find that compromise, all the discussion has to start again from scratch. This is what we have been doing for about 20 years now... moving the limits a bit up, then a bit down, when tweaking the software and new ideas pop up, or when somebody finds boundary-related errors, or when somebody sees that some tests are too slow and would run faster (and still correctly) with a slightly lower FFT.
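The arithmetic behind that kind of bound can be sketched quickly (an illustrative independence model of my own, not anything from gpuowl itself — real roundoff events are not perfectly independent):

```python
# Sketch: if each squaring independently causes a fatal roundoff with
# probability p, the chance of at least one failure over n iterations is
# 1 - (1-p)^n. Inverting that shows how tiny the per-iteration bound must
# be for the "0.001 over 100M iterations" target discussed above.

def per_iteration_bound(target_failure: float, iterations: int) -> float:
    """Max per-iteration error probability for a given whole-test failure rate."""
    return 1.0 - (1.0 - target_failure) ** (1.0 / iterations)

p = per_iteration_bound(0.001, 100_000_000)
print(p)  # on the order of 1e-11 per iteration
```

Which is exactly why the FFT-size discussion is so sensitive: a tiny change in per-iteration error rate moves the whole-test failure rate by orders of magnitude.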
[QUOTE=kriesel;546224]Current versions gpuowl report number of GEC error detections; some earlier versions did not.[/QUOTE]
I just checked, even though the gpuowl version I use on colab has "[B]errors":{"gerbicz":0}[/B]" in the json result line, primenet still says only "PRP Unverified" without the (Reliable), so I guess it is just a manual results turnin issue. [M]91385527[/M]
[QUOTE=ATH;546269]I just checked, even though the gpuowl version I use on colab has "[B]errors":{"gerbicz":0}[/B]" in the json result line, primenet still says only "PRP Unverified" without the (Reliable), so I guess it is just a manual results turnin issue.
[M]91385527[/M][/QUOTE] Good catch - I'll follow up with James.
CUDALucas and gpuowl performance compared on the same GPUs
CUDALucas v2.06 May 5 2017 version
[CODE]Device GeForce GTX 1060 3GB   Compatibility 6.1   clockRate (MHz) 1771   memClockRate (MHz) 4004
fft (K)   max exp     ms/iter
4608      85111207     8.5188
4800      88579669     9.5860
5184      95507747     9.6766
5376      98967641    10.8774
Gpuowl v6.11-288: 9074 us/iter at 5M FFT on a 92M exponent; gpuowl 6.64% faster

Device GeForce GTX 1080 Ti   Compatibility 6.1   clockRate (MHz) 1620   memClockRate (MHz) 5505
fft (K)   max exp     ms/iter
4608      85111207     3.2221
5184      95507747     3.5816
5292      97454309     4.0694
Gpuowl v6.11-292: 3523 us/iter at 5M FFT on a 95.5M exponent; gpuowl 1.66% faster[/CODE]
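For what it's worth, the "% faster" figures can be reproduced from the table (assuming the comparison is CUDALucas time over gpuowl time at the nearest FFT size, 5184K vs. 5M):

```python
# Sketch of how the "% faster" numbers above appear to be computed:
# extra time CUDALucas needs relative to gpuowl, in percent.
def percent_faster(cl_us_per_iter: float, gp_us_per_iter: float) -> float:
    return (cl_us_per_iter / gp_us_per_iter - 1.0) * 100.0

print(round(percent_faster(9676.6, 9074.0), 2))  # GTX 1060: ~6.64
print(round(percent_faster(3581.6, 3523.0), 2))  # GTX 1080 Ti: ~1.66
```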
Asus Vega 64 ROG STRIX 8GB
[CODE]2020-05-31 15:06:36 gpuowl v6.11-292-gecab9ae
2020-05-31 15:06:36 Note: not found 'config.txt'
2020-05-31 15:06:36 device 0, unique id ''
2020-05-31 15:06:36 gfx900-0 109906999 FFT: 6M 1K:12:256 (17.47 bpw)
2020-05-31 15:06:36 gfx900-0 Expected maximum carry32: 332C0000
2020-05-31 15:06:37 gfx900-0 OpenCL args "-DEXP=109906999u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=12u -DWEIGHT_STEP=0xb.8eb5f7b291c3p-3 -DIWEIGHT_STEP=0xb.13395a481b1p-4 -DWEIGHT_BIGSTEP=0x9.837f0518db8a8p-3 -DIWEIGHT_BIGSTEP=0xd.744fccad69d68p-4 -DPM1=0 -DAMDGPU=1 -cl-fast-relaxed-math -cl-std=CL2.0 "
2020-05-31 15:06:37 gfx900-0 ASM compilation failed, retrying compilation using NO_ASM
2020-05-31 15:06:43 gfx900-0 OpenCL compilation in 6.15 s
2020-05-31 15:06:44 gfx900-0 109906999 OK 59400800 loaded: blockSize 400, 7eadb31603ceef39
2020-05-31 15:06:46 gfx900-0 109906999 OK 59401600 54.05%; 1861 us/it; ETA 1d 02:07; 6ec5bdc61a1975cf (check 0.91s)
2020-05-31 15:13:01 gfx900-0 109906999 OK 59600000 54.23%; 1883 us/it; ETA 1d 02:19; e590d978d773482d (check 0.91s)
2020-05-31 15:19:20 gfx900-0 109906999 OK 59800000 54.41%; 1888 us/it; ETA 1d 02:17; 486333e0c90108ac (check 0.91s)
2020-05-31 15:25:38 gfx900-0 109906999 OK 60000000 54.59%; 1889 us/it; ETA 1d 02:11; 19bf8fb66b07c7e6 (check 0.91s)
2020-05-31 15:31:57 gfx900-0 109906999 OK 60200000 54.77%; 1889 us/it; ETA 1d 02:05; 26eec4f053b2dbf3 (check 0.91s)[/CODE]
[CODE]2020-05-31 17:18:05 gpuowl v6.11-288-g20c4213
2020-05-31 17:18:05 Note: not found 'config.txt'
2020-05-31 17:18:05 device 0, unique id ''
2020-05-31 17:18:05 gfx900-0 94607437 FFT: 5M 1K:10:256 (18.04 bpw)
2020-05-31 17:18:05 gfx900-0 Expected maximum carry32: 44E30000
2020-05-31 17:18:06 gfx900-0 OpenCL args "-DEXP=94607437u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=10u -DWEIGHT_STEP=0xf.8262bb7326f28p-3 -DIWEIGHT_STEP=0x8.40cb53a4a1fd8p-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -DPM1=0 -DAMDGPU=1 -cl-fast-relaxed-math -cl-std=CL2.0 "
2020-05-31 17:18:06 gfx900-0 ASM compilation failed, retrying compilation using NO_ASM
2020-05-31 17:18:12 gfx900-0 OpenCL compilation in 6.11 s
2020-05-31 17:18:13 gfx900-0 94607437 OK 0 loaded: blockSize 400, 0000000000000003
2020-05-31 17:18:15 gfx900-0 94607437 OK 800 0.00%; 1500 us/it; ETA 1d 15:26; 74fe0996bd85d39f (check 0.73s)
2020-05-31 17:23:19 gfx900-0 94607437 OK 200000 0.21%; 1524 us/it; ETA 1d 15:57; ba383754cbdc7083 (check 0.74s)
2020-05-31 17:23:34 gfx900-0 Stopping, please wait..
2020-05-31 17:23:36 gfx900-0 94607437 OK 210400 0.22%; 1527 us/it; ETA 1d 16:02; 58b86283258646ee (check 0.75s)
2020-05-31 17:23:36 gfx900-0 Exiting because "stop requested"
2020-05-31 17:23:36 gfx900-0 Bye[/CODE]
Could someone please help me set up gpuOwl? I have been trying for hours to compile it. I am using Ubuntu 18.04.4 and I could only find ROCm 3.3 and not the suggested version 1.7. I have an R270 GPU.
[QUOTE=Cheetahgod;546921]Could someone please help me set up gpuOwl. I have been trying for hours to compile it. I am using ubuntu 18.0.4.4 and I could only find Rocm 3.3 and not the suggest version 1.7. I have r270 gpu.[/QUOTE]
ROCm 3.3 is what you want; it's what everybody is using. What compilation problem do you hit?
PRP-Proof first steps
You may have seen the thread about VDF and PRP-proof: [url]https://mersenneforum.org/showthread.php?p=546880[/url]
I'm ATM working on a proof of concept (or "reference implementation") of PRP-proof in gpuowl. It's still early and in flux; bugs may be present more than usual, and it's experimental and still subject to change. I do not recommend investing huge work in generating proofs for large exponents just to find this work lost because, e.g., the proof file format changed.

Anyway, this is what's new:
- a new command line argument, -proof <power>, indicates to generate a proof of the given power after the completion of a PRP test (power is a small integer around 9).

I'll go into a bit more detail about what exactly happens with -proof. The whole concept of PRP-Proof only applies to a PRP test. In order to be able to generate the proof at the end of the PRP test, a full set of residues must be saved at regular intervals as the PRP test progresses. These are saved in the folder <exponent>/proof/. The number of these residues is 2^N (N being the proof power). They take up a significant amount of disk space.

Let's say you have a PRP test already started, half-way (without -proof). If you want to generate a proof for this test, it's too late because the needed residues for the first half of the test are not there. You have two options: give up on generating a proof for this exponent (recommended), or restart the exponent from the beginning with -proof to store the residues. If you simply pass the command line argument -proof to such an ongoing PRP test, you'll see a warning about the missing residues and no proof will be generated. But the next PRP test, which will see the -proof from the beginning, will be able to generate the proof.

If you save the required residues for e.g. power 9, you can generate any proof power up to 9 from them, e.g. by running the PRP initially with -proof 9. After it completes, run the same exponent with -proof 8 -- the PRP will complete immediately and a new proof (of power 8) will be generated.
Once you have a proof file, which is saved with extension ".proof" in the folder <exponent>, you can verify it by using the command line argument -verify, passing either the name of the proof file or the name of the exponent folder.

That's all for now. If anybody wants to experiment with these things, I'd recommend starting with small exponents first that don't take a lot of time, to understand what's happening without feeling sorry for lost work. Happy to answer questions about PRP-Proof.
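As a rough mental model of which iterations get saved (my own sketch of "2^N residues at regular intervals"; gpuowl's exact checkpoint placement may differ):

```python
# Sketch (assumption): a power-N proof needs 2^N residues spaced evenly
# across the E iterations of the PRP test.
def proof_checkpoints(exponent: int, power: int):
    step = exponent >> power           # roughly E / 2^power
    return [k * step for k in range(1, (1 << power) + 1)]

pts = proof_checkpoints(1000, 3)       # toy "exponent": 8 checkpoints
print(pts)                             # [125, 250, ..., 1000]
```

This also makes clear why starting -proof mid-test is too late: the early checkpoints in that list have already gone by unsaved.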
2[SUP]9[/SUP] full size residues is a lot of disk space to commit to one exponent.
How low a power is adequate? If -proof n is included in config.txt for PRP work, does gpuowl properly ignore it and function correctly when the work is LLDC or P-1?
[QUOTE=Cheetahgod;546921]Could someone please help me set up gpuOwl. I have been trying for hours to compile it. I am using ubuntu 18.0.4.4 and I could only find Rocm 3.3 and not the suggest version 1.7. I have r270 gpu.[/QUOTE]
Try this as a starting point: [url]https://mersenneforum.org/showpost.php?p=511655&postcount=76[/url] also do a "sudo apt install libncurses5"
[QUOTE=Prime95;546943]Try this as a starting point: [url]https://mersenneforum.org/showpost.php?p=511655&postcount=76[/url]
also do a "sudo apt install libncurses5"[/QUOTE] I find a full list of the minimal-package-set-needed to be helpful - here is my own working setup recipe. I have done it under Ubuntu 19.10; hopefully it works the same under v18*. I need gcc/gdb for my Mlucas builds, not sure if either is needed for running gpuOwl. The [...] in the apt install line is not literal, it just means "install all of these, one at a time":

o Install Ubuntu 19.04 Disco which comes with kernel 5.0.x
o sudo apt update
o sudo apt install [gcc|gdb|libgmp-dev|git|ssh|openssh-server|clinfo|libncurses5]
o Edit /etc/default/grub to add amdgpu.ppfeaturemask=0xffffffff to GRUB_CMDLINE_LINUX_DEFAULT
o sudo update-grub
o sudo apt install libnuma-dev
o wget -qO - [url]http://repo.radeon.com/rocm/apt/debian/rocm.gpg.key[/url] | sudo apt-key add -
o echo 'deb [arch=amd64] [url]http://repo.radeon.com/rocm/apt/debian/[/url] xenial main' | sudo tee /etc/apt/sources.list.d/rocm.list
o sudo apt update && sudo apt install rocm-dev
o echo 'SUBSYSTEM=="kfd", KERNEL=="kfd", TAG+="uaccess", GROUP="video"' | sudo tee /etc/udev/rules.d/70-kfd.rules
o reboot
o git clone [url]https://github.com/preda/gpuowl[/url] && cd gpuowl && make

['clone' only on initial DL - subsequent updates can use 'git pull' from within the existing gpuowl dir: cd ~/gpuowl && git pull [url]https://github.com/preda/gpuowl[/url] && make]
Just tried Ubuntu 19.04. Forced me to upgrade to 19.10, which upon reboot gives me a kernel panic. Going back to Ubuntu 18.04.
[CODE]./gpuowl
2020-06-01 14:30:39 gpuowl v6.11-309-gca2d00b
2020-06-01 14:30:39 Note: not found 'config.txt'
2020-06-01 14:30:39 device 0, unique id ''[/CODE]What do I do from here? Using Ubuntu 19.10.
[QUOTE=Cheetahgod;546960]./gpuowl
2020-06-01 14:30:39 gpuowl v6.11-309-gca2d00b 2020-06-01 14:30:39 Note: not found 'config.txt' 2020-06-01 14:30:39 device 0, unique id '' what do I do from here? using ubuntu 19.10[/QUOTE] I get that stuff whenever I fire up the program, but it should be followed by a line showing the exponent of the current run, FFT length, etc. Sounds like you haven't yet run the tools/primenet.py script to create/update your worktodo.txt file. One '.' at the start here if you are running from the same dir as the executable, '..' if from within a run-subdir underneath it, as I do for my 2-runs-per-card setup on my Radeon VIIs: '../tools/primenet.py -u [your primenet uid] -p [primenet pwd]'. Then cat the contents of the resulting worktodo.txt to see if they are as expected, and if so, restart the program. (The program has a multi-task set of flags, but I find it easier to just create 2 dirs run0 and run1 under the main dir, cd into each to init/update the worktodo, and 'nohup sudo ../gpuowl [-d [optional device id]] &'.)
[QUOTE=Cheetahgod;546960]./gpuowl
2020-06-01 14:30:39 gpuowl v6.11-309-gca2d00b 2020-06-01 14:30:39 Note: not found 'config.txt' 2020-06-01 14:30:39 device 0, unique id '' what do I do from here? using ubuntu 19.10[/QUOTE] Do you have an exponent in the worktodo.txt file? E.g.: [CODE]DoubleCheck=B9FA5FA90A509BAC97C206CECF58B325,54957319,74,1[/CODE]Compilation should start automatically.
[QUOTE=kriesel;546940]2[SUP]9[/SUP] full size residues is a lot of disk space to commit to one exponent.
How low a power is adequate? [/QUOTE] For each one-step-down of the power, the temporary disk space needed to store the residues is halved, the effort to generate the proof is [more than] halved, and the effort to verify the proof is doubled.

For power==9, the verification effort is 1/512 of the full PRP test. For power==7, the verification cost is 1/128 of the full PRP test. Thus I'd say power==7 is the smallest that's practical. For 100M exponents (the current wavefront) I think power 8 or 9 would be appropriate. For much higher exponents, e.g. 320M, I would consider power 10.

One more thing to consider here is that in practice the proof would be generated once but verified at least 2 times (or more), so it's probably worth tipping the balance towards putting more work into the proof generation than into the verification.

[QUOTE] If -proof n is included in config.txt for prp work, does gpuowl properly ignore it and function correctly when the work is LLDC or P-1?[/QUOTE] Yes, that's the idea.
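To make the trade-off concrete, a small back-of-the-envelope sketch (my own model of the numbers in this post, not gpuowl code; the ~12.5 MB per residue assumes a 100M-bit exponent):

```python
# Rough cost model (assumptions): a power-p proof stores 2^p residues on
# disk, and verification costs roughly prp_iters / 2^p squarings.
def proof_costs(residue_mb: float, power: int, prp_iters: int):
    residues = 1 << power
    disk_mb = residues * residue_mb
    verify_iters = prp_iters // residues
    return residues, disk_mb, verify_iters

# e.g. a 100M-bit residue is ~12.5 MB on disk
print(proof_costs(12.5, 9, 100_000_000))  # (512, 6400.0, 195312)
print(proof_costs(12.5, 7, 100_000_000))  # (128, 1600.0, 781250)
```

So power 9 at the current wavefront is on the order of 6-7 GB of temporary disk per exponent, which matches the "a lot of disk space" concern above.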
I put an exponent in worktodo.txt but it hangs when I run the command. How do I check if OpenCL is working?
[QUOTE=Cheetahgod;546988]I put a exponent in worktodo.txt but it hangs when I run the command. How do I check if opencl is working?[/QUOTE]
Please be more explicit about your errors by posting, in code tags, the last few lines before it hangs. Did you precede your command with sudo? I.e., [C]sudo ./gpuowl[/C].
[QUOTE=Cheetahgod;546988]How do I check if opencl is working?[/QUOTE]
Try running clinfo
The PRP Proof spec is now here:
[url]https://github.com/preda/gpuowl/wiki/PRProof-File-Spec[/url] and gpuowl should now be conforming to that spec.
It seems like I have a problem with the rocm installation.
[QUOTE=Cheetahgod;547000]It seems like I have a problem with the rocm installation.[/QUOTE]
What does [C]uname -a[/C] report? And [C]ls -d /opt/roc*[/C]? And what happens when you run (in the gpuowl directory) [C]sudo ./gpuowl[/C]?
gpuowl-win v6.11-310 failed to build
1 Attachment(s)
See the error about ambiguous overload
[CODE]ProofSet.h:196:77: error: ambiguous overload for 'operator*' (operand types are '__gnu_cxx::__alloc_traits<std::allocator<__gmp_expr<__mpz_struct [1], __mpz_struct [1]> >, __gmp_expr<__mpz_struct [1], __mpz_struct [1]> >::value_type' {aka '__gmp_expr<__mpz_struct [1], __mpz_struct [1]>'} and 'std::array<long long unsigned int, 4>::value_type' {aka 'long long unsigned int'}) 196 | for (int i = 0; i < (1 << (p - 1)); ++i) { hashes.push_back(hashes[i] * hash[0]); } [/CODE] |
[QUOTE=kriesel;547044]See the error about ambiguous overload
[CODE]ProofSet.h:196:77: error: ambiguous overload for 'operator*' (operand types are '__gnu_cxx::__alloc_traits<std::allocator<__gmp_expr<__mpz_struct [1], __mpz_struct [1]> >, __gmp_expr<__mpz_struct [1], __mpz_struct [1]> >::value_type' {aka '__gmp_expr<__mpz_struct [1], __mpz_struct [1]>'} and 'std::array<long long unsigned int, 4>::value_type' {aka 'long long unsigned int'}) 196 | for (int i = 0; i < (1 << (p - 1)); ++i) { hashes.push_back(hashes[i] * hash[0]); } [/CODE][/QUOTE] Attempted a fix. It's a bit of a shot in the dark; you should report at least the compiler version: g++ --version
g++ version info etc from msys2 attempts
1 Attachment(s)
[QUOTE=preda;547050]Attempted a fix. It's a bit in the dark, you should report at least the compiler version
g++ --version[/QUOTE]Yeah, it's some trick to try to write for a different environment you don't have. Feel free to build that into the makefile. If I alter the makefile, will it come back flagged dirty? I'm a little surprised to see the version was not included in the build log, which is a capture of the console output when the makefile runs. Will try another build.

[CODE]$ g++ --version
g++.exe (Rev2, Built by MSYS2 project) 9.2.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
[/CODE][CODE]$ g++ -v
Using built-in specs.
COLLECT_GCC=C:\msys64\mingw64\bin\g++.exe
COLLECT_LTO_WRAPPER=C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/9.2.0/lto-wrapper.exe
Target: x86_64-w64-mingw32
Configured with: ../gcc-9.2.0/configure --prefix=/mingw64 --with-local-prefix=/mingw64/local --build=x86_64-w64-mingw32 --host=x86_64-w64-mingw32 --target=x86_64-w64-mingw32 --with-native-system-header-dir=/mingw64/x86_64-w64-mingw32/include --libexecdir=/mingw64/lib --enable-bootstrap --with-arch=x86-64 --with-tune=generic --enable-languages=c,lto,c++,fortran,ada,objc,obj-c++ --enable-shared --enable-static --enable-libatomic --enable-threads=posix --enable-graphite --enable-fully-dynamic-string --enable-libstdcxx-filesystem-ts=yes --enable-libstdcxx-time=yes --disable-libstdcxx-pch --disable-libstdcxx-debug --disable-isl-version-check --enable-lto --enable-libgomp --disable-multilib --enable-checking=release --disable-rpath --disable-win32-registry --disable-nls --disable-werror --disable-symvers --enable-plugin --with-libiconv --with-system-zlib --with-gmp=/mingw64 --with-mpfr=/mingw64 --with-mpc=/mingw64 --with-isl=/mingw64 --with-pkgversion='Rev2, Built by MSYS2 project' --with-bugurl=https://sourceforge.net/projects/msys2 --with-gnu-as --with-gnu-ld
Thread model: posix
gcc version 9.2.0 (Rev2, Built by MSYS2 project)[/CODE]

edit: v6.11-311-gfa76bd9 same problem.
ROCm 3.5
I tried ROCm 3.5 and see a more than 5% performance hit vs. ROCm 3.3.

I opened this issue: [url]https://github.com/RadeonOpenCompute/ROCm/issues/1124[/url] Feel free to +1 the issue if you think it prevents you from using ROCm 3.5. Also, if you do try ROCm 3.5, please add details on that issue. The timing can easily be obtained by running with the "-time" command line argument to see per-kernel timing info.

I personally moved back to 3.3 already. I was under the impression (I heard it from the ROCm team) that they use gpuowl in their internal regression tools. It's mystifying then to see this regression.
I managed to get rocm-3.5.0 running after a short battle with Debian Buster, which involved an upgrade, getting some held back packages installed, an "autoclean", another upgrade, a dist-upgrade, a dpkg -i --force-all, a reboot, linking the libopencl shared object, recompiling gpuowl. :boxer:
2 instances @5.5M FFT and sclk at 3. I think it is slower at 1517 µs/it, but overclocking the RAM to 1200 it is 1423 µs/it. I just checked: a SLOW DOWN from 1440 µs/it to 1517 µs/it :rant:

Edit: Changed the -L option to 3.5.0 in the Makefile and timings went from 1423 µs/it to 1418 µs/it. That helped! :whistle:

Another edit: With sclk at 4 I am now getting 1316 µs/it.
1 Attachment(s)
[QUOTE=preda;533571]ROCm exposes a per-GPU unique_id, e.g.:
[CODE]cat /sys/class/drm/card0/device/unique_id
3044212172dc768c
[/CODE]This id is a property of the GPU itself, and does not depend on the system or PCIe slot. So changing a GPU to a different slot, or to a different system, preserves the UID.

I added a way to specify the GPU to run on by using this unique id:

./gpuowl -uid 3044212172dc768c

This can be used instead of -device (-d), which specifies the device by position in the list of devices. The advantage is that the identity of the GPU is preserved when swapping the PCIe slots. Combining -uid with -cpu allows associating a stable symbolic name to an actual GPU.

I also added a few small python scripts (ROCm) under the tools/ directory in the source code:
- monitor.py : prints general information about all the ROCm GPUs found
- device.py : given a UID, prints the device serial id

The last script, device.py, can be used in user power-play scripts that set parameters of GPUs (e.g. memory frequency, undervolting, fan etc), to identify GPUs by UID instead of serial-id to achieve a correct GPU identification.[/QUOTE] Is that so regardless of gpu model? I note Radeon VII gpus have a serial number built in. (Cpuid hwinfo produced this on Windows. RX480 and RX550 do not have such serial numbers.)
First ram overclock experiment
Upgraded to Windows 10. This allows GPU-Z to function correctly and display clock rates, which is badly broken in Windows 7 with remote desktop.
Then on to installing AMD Radeon software to monitor and tweak things. First try was the automatic RAM overclock, which went to 1200 MHz, clearly too high based on the following. There is an untapped opportunity for gpuowl to detect known-bad residues at each console output step.

[CODE]2020-06-03 13:37:23 gpuowl v6.11-292-gecab9ae
2020-06-03 13:37:23 config: -user kriesel -cpu asr2/radeonvii2 -d 2 -use NO_ASM -maxAlloc 15000
2020-06-03 13:37:23 device 2, unique id ''
2020-06-03 13:37:23 asr2/radeonvii2 160708577 FFT: 9M 1K:9:512 (17.03 bpw)
2020-06-03 13:37:23 asr2/radeonvii2 Expected maximum carry32: 2F4C0000
2020-06-03 13:37:25 asr2/radeonvii2 OpenCL args "-DEXP=160708577u -DWIDTH=1024u -DSMALL_HEIGHT=512u -DMIDDLE=9u -DWEIGHT_STEP=0xf.adab9b1c15ad8p-3 -DIWEIGHT_STEP=0x8.2a025c1f5ebcp-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -DPM1=0 -DAMDGPU=1 -DNO_ASM=1 -cl-fast-relaxed-math -cl-std=CL2.0 "
2020-06-03 13:37:35 asr2/radeonvii2 OpenCL compilation in 10.37 s
2020-06-03 13:37:36 asr2/radeonvii2 160708577 LL 138500000 loaded: 77e2fa0b44bfb8f1
2020-06-03 13:39:57 asr2/radeonvii2 160708577 LL 138600000 86.24%; 1406 us/it; ETA 0d 08:38; dcde0783ad061fc5
2020-06-03 13:42:10 asr2/radeonvii2 160708577 LL 138700000 86.31%; 1336 us/it; ETA 0d 08:10; 0000000000000002
2020-06-03 13:44:22 asr2/radeonvii2 160708577 LL 138800000 86.37%; 1315 us/it; ETA 0d 08:00; 0000000000000002
2020-06-03 13:46:33 asr2/radeonvii2 160708577 LL 138900000 86.43%; 1315 us/it; ETA 0d 07:58; 0000000000000002
2020-06-03 13:48:50 asr2/radeonvii2 160708577 LL 139000000 86.49%; 1369 us/it; ETA 0d 08:15; 0000000000000002
2020-06-03 13:49:29 asr2/radeonvii2 Stopping, please wait..
2020-06-03 13:49:30 asr2/radeonvii2 160708577 LL 139028000 86.51%; 1410 us/it; ETA 0d 08:29; 0000000000000002
2020-06-03 13:49:30 asr2/radeonvii2 waiting for the Jacobi check to finish..
2020-06-03 13:49:30 asr2/radeonvii2 160708577 EE 139000000 (jacobi == 0)
2020-06-03 13:49:30 asr2/radeonvii2 Exiting because "stop requested"
2020-06-03 13:49:30 asr2/radeonvii2 Bye[/CODE]Manual RAM clock tuning while watching Jacobi checks and res64 seems to indicate that up to 1150 MHz is OK.

The mappings of devices are strange and inconsistent. In gpuowl, it's d0, d1, d2. In AMD Radeon software, it's gpu1, gpu2, gpu3, and gpu1 got me d2's memory clock changing. The one LL instance was what I was trying to avoid. Per some online forums, the device numbering is not linear across a motherboard; x16 PCIe slots go before x1 slots. GPU-Z list order does not match gpuowl list order either; AMD Radeon software's gpu1 (first) is gpuowl's d2 (last), but GPU-Z's second in the list of 3. And this is with all 3 running on adjacent x1 PCIe slots! Windows Device Manager may be yet different. And oddly, while the HD4600 is now -d 3 in the gpuowl list, under Windows 7 it was device 0.
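That known-bad-residue check could be as simple as the following sketch of the suggestion (the sentinel set is my assumption — 0x0 and 0x2 are the classic corrupt-LL values seen in the log above; this is not gpuowl code):

```python
# Sketch: flag interim res64 values that almost always indicate hardware
# (e.g. overclock) corruption rather than real math. Sentinel set assumed.
SUSPECT_RES64 = {0x0, 0x2, 0xFFFFFFFFFFFFFFFF, 0xFFFFFFFFFFFFFFFD}

def looks_corrupt(res64_hex: str) -> bool:
    return int(res64_hex, 16) in SUSPECT_RES64

print(looks_corrupt("0000000000000002"))  # True  -- the bad residues above
print(looks_corrupt("dcde0783ad061fc5"))  # False -- a normal-looking residue
```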
Questions about shared folder, and FFT length
Almighty Mathlords, I beseech thee:
I am running four RX 5700 XT GPUs, with a 2990WX, all watercooled. gpuOwl version 6.11 is running well. However, I have two questions: 1. Should one choose to use a shared worktodo/results directory ( -pool ), is there a particular format for the results.txt file? I get an error message that it can't read the file. 2. How exactly does one determine the ideal FFT length? For the purpose of this discussion, one may assume (but not conclude) that I am a structural engineer and an attorney, not a programmer, and haven't programmed anything since 1992. Thanks
[QUOTE=Xebecer;547102]2. How exactly does one determine the ideal FFT length?[/QUOTE]
See [url=http://mersenneforum.org/mayer/F24.pdf]here[/url] for the random-walk-based heuristic I use in my Mlucas code ... that gives results which more or less - typically within ~1%, exponent-wise - match the internal tables used by other known-to-be-high-accuracy codes such as Prime95 and gpuOwl. Here is a small C function implementing the same heuristic:

[code]/* For a given FFT length, estimate the maximum exponent that can be tested.
   This implements formula (8) in the F24 paper (Math Comp. 72 (243),
   pp. 1555-1572, December 2002) in order to estimate the maximum average
   wordsize for a given FFT length. For roughly IEEE64-compliant arithmetic,
   an asymptotic constant of 0.6 (log2(C) in the paper, which recommends
   something around unity) seems to fit the observed data best. */
uint64 given_N_get_maxP(uint32 N)
{
	const double Bmant = 53;
	const double AsympConst = 0.6;
	const double ln2inv = 1.0/log(2.0);
	double ln_N, lnln_N, l2_N, lnl2_N, l2l2_N, lnlnln_N, l2lnln_N;
	double Wbits, maxExp2;
	ln_N     = log(1.0*N);
	lnln_N   = log(ln_N);
	l2_N     = ln2inv*ln_N;
	lnl2_N   = log(l2_N);
	l2l2_N   = ln2inv*lnl2_N;
	lnlnln_N = log(lnln_N);
	l2lnln_N = ln2inv*lnlnln_N;
	Wbits = 0.5*( Bmant - AsympConst - 0.5*(l2_N + l2l2_N) - 1.5*(l2lnln_N) );
	maxExp2 = Wbits*N;
/*	fprintf(stderr,"N = %8u K  maxP = %10u\n", N>>10, (uint32)maxExp2); */
	return (uint64)maxExp2;
}[/code]

Alternatively, here is a simple PARI-GP 'script' to return maxExp for any desired N [enter FFT length in #doubles]:

[code]maxp(N, AsympConst) = { \
	Bmant = 53.; ln2inv = 1.0/log(2.0); \
	ln_N = log(1.0*N); lnln_N = log(ln_N); l2_N = ln2inv*ln_N; lnl2_N = log(l2_N); l2l2_N = ln2inv*lnl2_N; lnlnln_N = log(lnln_N); l2lnln_N = ln2inv*lnlnln_N; \
	Wbits = 0.5*( Bmant - AsympConst - 0.5*(l2_N + l2l2_N) - 1.5*(l2lnln_N) ); \
	return(Wbits*N); \
}[/code]

Ex: maxp(2240<<10, 0.4) = 43235170

And here a *nix bc function (invoke bc in floating-point mode, as 'bc -l'):

[code]define maxp(bmant, n, asympconst) {
	auto ln2inv, ln_n, lnln_n, l2_n, lnl2_n, l2l2_n, lnlnln_n, l2lnln_n, wbits;
	ln2inv = 1.0/l(2.0);
	ln_n = l(1.0*n); lnln_n = l(ln_n); l2_n = ln2inv*ln_n; lnl2_n = l(l2_n); l2l2_n = ln2inv*lnl2_n; lnlnln_n = l(lnln_n); l2lnln_n = ln2inv*lnlnln_n;
	wbits = 0.5*( bmant - asympconst - 0.5*(l2_n + l2l2_n) - 1.5*(l2lnln_n) );
	return(wbits*n);
}[/code]

Ex: maxp(2240*2^10, 0.4) = 43235170.58240592323366420480

Have fun!
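And, under the same assumptions, a Python transcription of the C function for anyone who wants to play with the numbers without compiling anything (a direct port, n in doubles):

```python
import math

# Direct port of given_N_get_maxP() above; n is the FFT length in doubles,
# asymp_const ~0.4-0.6 per the discussion in the post.
def maxp(n: int, asymp_const: float = 0.6, bmant: float = 53.0) -> float:
    ln2inv = 1.0 / math.log(2.0)
    ln_n = math.log(n)
    l2_n = ln2inv * ln_n
    l2l2_n = ln2inv * math.log(l2_n)
    l2lnln_n = ln2inv * math.log(math.log(ln_n))
    wbits = 0.5 * (bmant - asymp_const - 0.5 * (l2_n + l2l2_n) - 1.5 * l2lnln_n)
    return wbits * n

print(maxp(2240 << 10, 0.4))  # ~43235170.58, matching the PARI example
```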
[QUOTE=Xebecer;547102]Almighty Mathlords, I beseech thee:
I am running four RX 5700 XT GPUs, with a 2990WX, all watercooled. gpuOwl version 6.11 is running well. However, I have two questions: 1. Should one choose to use a shared worktodo/results directory ( -pool ), is there a particular format for the results.txt file? I get an error message that it can't read the file. [/QUOTE] If you run multiple instances of gpuowl, -pool can be useful. I do use -pool myself for my runs. You need to create a directory that you pass to -pool (the directory must exist). You have to put work into pool/worktodo.txt (primenet.py can do that). Next start gpuowl with -pool <dir> The error message may indicate that the pool folder does not exit. To be sure, you can post the error message next time. [QUOTE] 2. How exactly does one determine the ideal FFT length? Thanks[/QUOTE] Should be good by default. No need to tinker with FFT length unless it's not running (getting 3 errors in chain and exit). |
[QUOTE=preda;547109]The error message may indicate that the pool folder does not exit. To be sure, you can post the error message next time.
[/quote] Created shared folder, put in a results.txt and worktodo.txt, used -pool C:\Users\xebec\Desktop\GPUOwl Shared in the config.txt file. I get: Can't open 'C' (mode 'ab') Exception NSt10filesystem7__cxx1116filesystem_errorE" filesystem error" can't open file" No error [C"\Users\xebec\Desktop\GPUOwl_Shared/] Bye
[QUOTE=Xebecer;547219]-pool C:\Users\xebec\Desktop\GPUOwl Shared in the config.txt file. I get:
Can't open 'C' (mode 'ab') Exception NSt10filesystem7__cxx1116filesystem_errorE" filesystem error" can't open file" No error [C"\Users\xebec\Desktop\GPUOwl_Shared/] Bye[/QUOTE] In the config you have "GPUOwl Shared" but the error has an underscore: "GPUOwl_Shared". Also an extraneous "/"
[QUOTE=Xebecer;547219]-pool C:\Users\xebec\Desktop\GPUOwl Shared in the config.txt file. I get:
Can't open 'C' (mode 'ab') Exception NSt10filesystem7__cxx1116filesystem_errorE" filesystem error" can't open file" No error [C"\Users\xebec\Desktop\GPUOwl_Shared/] Bye[/QUOTE] It's pretty messed-up: something replaced the ':' character in the error message with the " (quote) character. It's also missing an expected "results.txt" at the end. For comparison, this is how I'd expect that error to look: [QUOTE] 2020-06-06 08:06:16 Can't open 'C:\Foo\bar/results.txt' (mode 'ab') 2020-06-06 08:06:16 Exception NSt10filesystem7__cxx1116filesystem_errorE: filesystem error: can't open file: Success [C:\Foo\bar/results.txt] [/QUOTE] Maybe you could attach the full log of gpuowl start-up, which should contain information about the full config options that it sees.
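One hedged workaround worth trying (my own guess, not from the posts above): the error "Can't open 'C'" looks like the -pool argument is being split at the space in "GPUOwl Shared". Renaming the folder so the path contains no spaces sidesteps the question of whether gpuowl's config.txt parser honors quotes at all. The config.txt line would then read:

```
-pool C:\Users\xebec\Desktop\GPUOwl_Shared
```

(Path shown only as an example following the one in the post; use your actual renamed folder.)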
[QUOTE=kriesel;547079]Is that so regardless of gpu model? I note Radeon VII gpus have a serial number built in. (Cpuid hwinfo produced this on Windows. RX480 and RX550 do not have such serial numbers.)[/QUOTE]
No, I would expect that the availability of unique_id depends on the GPU model. RadeonVII has it, others may not have it. If the file /sys/class/drm/cardN/device/unique_id is there it's likely to have the id information, otherwise not. |
gpuowl-win v6.11-312-gc69350e failed to build
1 Attachment(s)
Same ambiguous overload error as -310 and -311.
I would try an earlier commit, but don't know the proper magic git incantation.
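For reference, that incantation is just a checkout of an older SHA. A minimal sketch, illustrated on a throwaway repo so it is self-contained (in the gpuowl tree you would substitute a real SHA from 'git log --oneline' and then run make; 'git checkout master' returns to the tip afterwards):

```shell
# Illustrative only: build a two-commit repo, then roll back to the first
# commit the same way you would in the gpuowl tree ('git checkout <sha>').
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.name=t -c user.email=t@t commit -q --allow-empty -m "older"
old=$(git rev-parse HEAD)
git -c user.name=t -c user.email=t@t commit -q --allow-empty -m "newer"
git checkout -q "$old"   # detached HEAD at the earlier commit; build from here
```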
Draft for gpuowl radeon vii tuning
This may become a gpuowl reference thread post later.
#1, get the latest gpuowl version. There have been lots of performance improvements lately, and some added features (LLDC, Jacobi check; PRP proof if you can build it).

It's unclear how much performance depends on PCIe width, version, extender type if any, etc.

Before starting tuning, document baseline performance and configuration. Do parameter testing while running PRP with GEC, to reliably detect unreliability. Run a gpu monitoring application such as GPU-Z, nvidia-smi or rocm-smi if possible. Initially changes can be made rather quickly, and a quick GEC error is fast feedback that you've been too aggressive with the settings, but to ensure the gpu is reliable, final selections should be watched for hours or days of error-free operation. Only after days error-free should any LL or P-1 runs be attempted.

Increase memory clock from default (some are able to run as high as 1200 MHz, +20% above nominal). On my setup, I see about 1% performance gain for a 5% memory clock increase. (This may be limited because it's in a warm area.)

Undervolting. Which voltage(s) to adjust, and what are people getting away with relative to the original settings? [URL]https://mersenneforum.org/showpost.php?p=533050&postcount=1630[/URL] is unclear to me. Presumably it's what GPU-Z calls vddc, the only voltage displayed, and what Radeon Software Adrenalin 2020 calls gpu voltage, the only one offered for modification. What's the benefit of undervolting: allowing higher clocks during thermal or power limiting? Saving on power cost?

Linux: to manage power requirements, use a lower sclk. Windows equivalent: directly adjust gpu clock.
sclk 5 (highest) 1684
sclk 4 ~1547 MHz per philf [URL]https://mersenneforum.org/showpost.php?p=534334&postcount=1698[/URL]
sclk 3 1373
sclk 2 ?
sclk 1 ?
sclk 0 ?
Apparently these vary a bit; preda gave 1520 for sclk 4. In [URL]https://mersenneforum.org/showpost.php?p=533050&postcount=1630[/URL] preda gives an example bash script and describes parameters. See also [URL]https://mersenneforum.org/showpost.php?p=533072&postcount=1632[/URL]

Fiddle with the fan curve? Default on Windows is only 75% fan at 105C hot-spot temp (corresponding to ~80C nominal gpu temp). (I see references, such as in [URL]https://mersenneforum.org/showpost.php?p=533143&postcount=1642[/URL], by linux users to setting fan well above 100. What are the units of setfan in linux?)

AMD Radeon software on Windows also allows setting a power limit up to +20% or down to -20% relative to nominal.

Name and save a profile with the resulting gpu-specific tuning settings in the Windows Adrenalin software, so it can easily be reloaded after a system start.

After other tuning is done, if you have enough similar work, run two instances per gpu, for a bit more throughput at the cost of about double latency. Is that still worthwhile with current commits? Same computation type (PRP & PRP, or LLDC & LLDC) and same fft length recommended. Ideally they will be a bit out of phase, so that when one instance is writing to disk or communicating between gpu ram and system/cpu ram, the other is utilizing the gpu computing resources. If work is too dissimilar, two instances will have lower combined throughput than one. Try, measure, adjust. See [URL]https://mersenneforum.org/showpost.php?p=532134&postcount=1507[/URL]

Linux is supposedly faster than Windows, perhaps due to lower driver overhead. Does anyone have numbers for that on the same hardware?

In bitcoin mining multi-gpu setup howtos, they advise turning off various things in the BIOS as part of preparing the system to support a large number of gpus. Is any of that known to be relevant or irrelevant to gpuowl performance?

What else?
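On the Linux sclk/fan questions above, the usual rocm-smi invocations look roughly like the following (a sketch; flag spellings should be double-checked against 'rocm-smi --help' on your ROCm version, and note the amdgpu fan interface takes a PWM value in the 0-255 range, not a percentage — which would explain linux users "setting fan well above 100"):

```
# Assumed rocm-smi flags -- verify on your ROCm version before relying on them.
rocm-smi -d 0 --setsclk 4    # cap shader clock at DPM level 4 (~1520-1547 MHz reported above)
rocm-smi -d 0 --setfan 200   # fan as PWM 0-255, i.e. ~78% duty cycle
rocm-smi -d 0 --showclocks   # confirm the resulting clocks
```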
1 Attachment(s)
We have gpuowl working on our (single-PCI slot) W5500. (We are also using it as our main display card.)
We had to change the makefile's "LIBPATH" to a different place: [C]LIBPATH = -L/opt/amdgpu-pro/lib64 -L.[/C] We have a sample test running. We don't yet know how to decipher the information presented, but at least it works! We are very surprised that gpuowl runs as our normal user. As far as performance, how does this look?[CODE]2020-06-05 17:13:13 gpuowl v6.11-312-gc69350e-dirty 2020-06-05 17:13:13 Note: not found 'config.txt' 2020-06-05 17:13:13 device 0, unique id '' 2020-06-05 17:13:13 gfx1012-0 77936867 FFT: 4M 1K:8:256 (18.58 bpw) 2020-06-05 17:13:13 gfx1012-0 Expected maximum carry32: 583B0000 2020-06-05 17:13:13 gfx1012-0 OpenCL args "-DEXP=77936867u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=8u -DWEIGHT_STEP=0x1.5621686be7602p+0 -DIWEIGHT_STEP=0x1.7f1af377e822p-1 -DWEIGHT_BIGSTEP=0x1.306fe0a31b715p+0 -DIWEIGHT_BIGSTEP=0x1.ae89f995ad3adp-1 -DPM1=0 -DAMDGPU=1 -DMM_CHAIN=1u -DMM2_CHAIN=1u -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only " 2020-06-05 17:13:16 gfx1012-0 OpenCL compilation in 3.10 s 2020-06-05 17:13:17 gfx1012-0 77936867 OK 0 loaded: blockSize 400, 0000000000000003 2020-06-05 17:13:21 gfx1012-0 77936867 OK 800 0.00%; 2982 us/it; ETA 2d 16:34; 1579c241dc63eca6 (check 1.27s) 2020-06-05 17:23:18 gfx1012-0 77936867 OK 200000 0.26%; 2991 us/it; ETA 2d 16:35; f0b04b45b0855bd2 (check 1.28s) 2020-06-05 17:33:15 gfx1012-0 77936867 OK 400000 0.51%; 2979 us/it; ETA 2d 16:10; c03f94396a5aa29e (check 1.27s) 2020-06-05 17:43:17 gfx1012-0 77936867 OK 600000 0.77%; 3004 us/it; ETA 2d 16:32; b9decd65ca71b629 (check 1.28s)[/CODE]PS - [C]Linux xii 4.18.0-147.8.1.el8_1.x86_64 #1 SMP Thu Apr 9 13:49:54 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux[/C] :mike: |
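The reported ETA in the log above can be cross-checked: a PRP test of M(p) takes roughly p squarings, so p times the per-iteration time should reproduce it.

```python
# Sanity check of the W5500 log: exponent and us/it taken from the output.
p, us_per_it = 77936867, 2982
days = p * us_per_it / 1e6 / 86400   # microseconds -> seconds -> days
print(f"{days:.2f} days")            # ≈ 2.69 days, matching "ETA 2d 16:34"
```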
Some gpuowl Radeon VII tuning data from the trenches
1 Attachment(s)
These are not final tunes, just what gives tolerable error rates for now. The rolloff in GhzD/day at higher fft lengths was surprisingly high, nearly 2:1. The peak I've achieved here at 5M is noticeably lower than what George posted (510 GhzD/day back in March, with an earlier, slower version, on Linux).
|
[QUOTE=Xyzzy;547247]As far as performance, how does this look?[/QUOTE] That's a little slower (2982 us/it) than my RX480, which with gpuowl-win v6.11-292 runs a 4.5M fft at 2808 us/it on Windows 7 in a cramped warm HP Z600 workstation tower. Unfortunately [url]https://www.mersenne.ca/cudalucas.php[/url] doesn't know anything about a W5500. Maybe you could run and send James a benchmark. |
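The per-iteration times in this exchange can be put side by side. Note the FFT sizes differ (W5500 at 4M, RX480 at 4.5M), so the RX480 is doing about 12% more work per iteration as well; the simple ratio below understates its advantage:

```python
# Iteration times from the two posts (different FFT lengths, so this is
# a rough comparison, not a normalized throughput figure):
w5500_us = 2982   # W5500, 4M FFT, from Xyzzy's log
rx480_us = 2808   # RX480, 4.5M FFT, from kriesel's reply
print(f"W5500 per-iteration time is {w5500_us / rx480_us - 1:.1%} higher")
```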
gpuowl-win v6.11-295 build (may be the last for a while)
2 Attachment(s)
[QUOTE=kriesel;547236]Same ambiguous overload error as -310 and -311.
I would try an earlier commit but don't know the proper magic git incantation.[/QUOTE] Using git bisect iteratively, per [URL]https://git-scm.com/docs/git-bisect[/URL] it appears the last gpuowl commit without the "ambiguous overload" fatal build issue on MSYS2 for Windows is v6.11-295-gaecf041, v6.11-296-g33e2d8e bad. $ git bisect good v6.11-295-gaecf041 [CODE]33e2d8ef73d81c581fc0d0aa161445ddefb03c18 is the first bad commit commit 33e2d8ef73d81c581fc0d0aa161445ddefb03c18 Author: Mihai Preda <mhpreda@gmail.com> Date: Mon May 25 23:39:37 2020 +1000 In work: proof construction blueprint Args.cpp | 6 +---- GmpUtil.cpp | 2 +- GmpUtil.h | 2 +- Gpu.cpp | 79 +++++++++++++++++++++++++++++++++++---------------------- Gpu.h | 6 +++-- ProofSet.h | 84 ++++++++++++++++++++++++++++++++++++++++++++++--------------- Task.cpp | 8 +++++- main.cpp | 1 + 8 files changed, 128 insertions(+), 60 deletions(-)[/CODE] |
FWIW, here is how we got gpuowl working on a clean install of Centos 8.
As root:[CODE]yum update yum install gmp-devel.x86_64 cd ~ wget https://drivers.amd.com/drivers/linux/amdgpu-pro-19.50-1011208-rhel-8.1.tar.xz tar Jxvf amdgpu-pro-19.50-1011208-rhel-8.1.tar.xz cd amdgpu-pro-19.50-1011208-rhel-8.1/ ./amdgpu-pro-install -y --opencl=pal,legacy reboot[/CODE]As a normal user:[CODE]cd ~ git clone https://github.com/preda/gpuowl cd gpuowl <<< fix makefile >>> make[/CODE]We will test Centos 7 later tonight. :mike: |
gpuowl-win v6.11-313 try
1 Attachment(s)
No joy there either. Thanks for trying.
|
[QUOTE=kriesel;547256]No joy there either. Thanks for trying.[/QUOTE]
OK, I finally understand what's happening here: it's the Windows 32-bit long again! Let me explain: GMP provides constructors that take pretty much any kind of int, among others unsigned int and long unsigned int. In my code I'm invoking the constructor with a 64-bit unsigned int, and guess what -- there's no constructor taking that! (because on Windows, both unsigned and long unsigned are 32-bit). Very helpful. I'll need to work around this silly situation. |
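The root cause preda describes is visible even from Python via ctypes: the C `long` type is 64-bit on LP64 Linux but 32-bit on LLP64 Windows, so on Windows neither of GMP's unsigned int / unsigned long constructors exactly matches a 64-bit argument.

```python
# Data-model check: C "long" width differs between Linux (LP64) and
# Windows (LLP64), which is why the mpz_class construction from a
# 64-bit int is ambiguous only in the MSYS2/Windows build.
import ctypes
import sys

print("sizeof(long) =", ctypes.sizeof(ctypes.c_long))
if sys.platform == "win32":
    assert ctypes.sizeof(ctypes.c_long) == 4   # LLP64: long is 32-bit
else:
    assert ctypes.sizeof(ctypes.c_long) == 8   # LP64: long is 64-bit
```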
[QUOTE=Xyzzy;547254]We will test Centos 7 later tonight.[/QUOTE]Centos 7 needs a newer compiler.
[C]g++: error: unrecognized command line option ‘-std=c++17’[/C] |
[QUOTE=kriesel;547256]No joy there either. Thanks for trying.[/QUOTE]
Please try the new commit, I hope it's fixed this time. |
gpuowl-win: v6.11-314-gde84f41 not yet there
1 Attachment(s)
[QUOTE=preda;547271]Please try the new commit, I hope it's fixed this time.[/QUOTE]Now it's line 199, same error message; see the attached build log file.
|
OK, here's a bug - last night noticed my wall wattmeter on the new 3 x R7 build was running ~250W below its normal 750-800W range ... checked status of the 3 cards, saw that device 1 was basically idling, despite 2 active gpuowl processes running on it:
[code]GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% 0 69.0c 162.0W 1373Mhz 1151Mhz 63.92% manual 250.0W 4% 100% 1 36.0c 64.0W 1547Mhz 351Mhz 40.0% manual 250.0W 99% 100% 2 76.0c 163.0W 1373Mhz 1151Mhz 62.75% manual 250.0W 4% 100% [/code] I'd seen similar behavior before on my older Haswell-system R7, there a reboot typically solved the problem. (I've never gotten the ROCm --gpureset -d [device id] option to work ... it just hangs.) But this one persisted through multiple reboot/restart-jobs attempts. Had a look at the tail ends of the 2 logfiles for the jobs on this GPU, that revealed the problem: [code]2020-06-05 22:20:47 95b2786172df888e 105809299 P2 using 253 buffers of 44.0 MB each 2020-06-05 22:21:27 95b2786172df888e 105809299 P1 GCD: no factor 2020-06-05 22:20:31 95b2786172df888e 105809351 P2 using blocks [33 - 999] to cover 1476003 primes 2020-06-05 22:20:32 95b2786172df888e 105809351 P2 using 342 buffers of 44.0 MB each[/code] Job1's p-1 assignment has just begun stage 2 of the factoring attempt, while firing up stage 2 it also does the stage 1 GCD, that has just completed and reports 'no factor'. Job2 started its own p-1 stage 2 a few minutes before. Not sure why the 2 jobs have ended up using a different number of buffers (chunks of memory storing small even powers of the stage 1 residue - generally the more of these we can fit in memory the faster stage 2 will be, but there are diminishing returns once we get above ~100 such precomputed powers), but those buffers reveal the bug. let's see how much memory they represent: (253+342)*44 = 26180, 26GB of memory, nearly double the 16GB HBM available on the R7. I'm not sure what ROCm does in such circumstances - does it start swapping to regular RAM, or to disk? - but in any event the result is clear, processing on the card basically comes to a halt. If I hadn't noticed the problem shortly before going to bed, I suspect that card would have idled all night. 
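The memory arithmetic from the two log excerpts can be checked directly against the card's capacity:

```python
# Combined P-1 stage 2 buffer memory for the two jobs on one Radeon VII,
# using the buffer counts and 44.0 MB buffer size reported in the logs.
BUF_MB = 44.0
HBM_MB = 16 * 1024          # Radeon VII: 16 GB HBM2
job1_bufs, job2_bufs = 253, 342
total_mb = (job1_bufs + job2_bufs) * BUF_MB
print(f"{total_mb:.0f} MB combined vs {HBM_MB} MB on the card")
# 26180 MB, ~26 GB: nearly double the HBM, so the excess spills off-GPU.
assert total_mb > HBM_MB
```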
To test the hypothesis I killed one of the two p-1 jobs, voila, the wall wattage immediately shot back up into the normal range, the ROCm display showed the temp and MCLK settings rising back to normal-under-load [code]GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% 0 73.0c 158.0W 1373Mhz 1151Mhz 62.75% manual 250.0W 4% 100% 1 61.0c 192.0W 1547Mhz 1151Mhz 38.82% manual 250.0W 96% 100% 2 77.0c 159.0W 1373Mhz 1151Mhz 61.96% manual 250.0W 4% 100% [/code] So that VRAM = 99% in the first display was in fact saying "VRAM is maxed out, we're going on strike". The default stage 2 memalloc of 96% of 16GB is so close to the limit that even a modest additional memalloc puts us over the limit. Oddly, I'm near-100% sure that I've had such occurrences of two p-1 jobs running on the same card, both in stage 2, before on my Haswell R7 - surprised to only have hit this issue now. The fact that job 1 above, which started stage 2 after job2 did, only alloc'ed 253 buffers indicates to me that some kind of how-much-HBM-is-available check is being done at runtime, but somehow we still ended up over 16GB. Perhaps if two p-1 stage 2s running on the same card start nearly at the same time, could that cause the mem-available computation to be fubar? No, looking at the logs of the 2 jobs, job2 started its stage 2 a full 15 minutes before job1. Mihai, George, do you have any sense of the stage 2 performance hit from cutting the memalloc to half the current "use all available HBM"? That would mean ~170 buffers for exponents in the above range. |
[QUOTE=ewmayer;547307]OK, here's a bug - last night noticed my wall wattmeter on the new 3 x R7 build was running ~250W below its normal 750-800W range ... checked status of the 3 cards, saw that device 1 was basically idling, despite 2 active gpuowl processes running on it ... Mihai, George, do you have any sense of the stage 2 performance hit from cutting the memalloc to half the current "use all available HBM"? That would mean ~170 buffers for exponents in the above range.[/QUOTE] In OpenCL there's no way to get the "free/available memory on the GPU" (there is a mechanism to get the *total* memory, but that's not useful when that total is shared between an unknown number of actors). That lack comes from a philosophical choice OpenCL made to "abstract away" the hardware as much as possible, including the amount of memory available. 
It acts like this: you want to allocate memory, the GPU doesn't have any anymore, fine we're going to allocate it on the host behind your back and lie to you that the alloc succeeded. The difference you see in the second process is because, after a while of allocating on the host, ROCm finally decided that too much is too much and reported a first alloc failure. Anyway, by default P-1 will assume it runs in single-process and attempt to allocate "all it can" for itself, but no more than 16GB. When running 2-process you're supposed to use the -maxAlloc to indicate how much memory *you* allocate to each process. (e.g.: -maxAlloc 7500 for about 7.5G limit). BTW, the processes were not idling, they were just running extremely slowly because the buffers were on host memory instead of GPU (because that's a wise ROCm choice, oh yes). |
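A small helper makes preda's advice concrete: split the card's memory across the instances and leave some headroom. The 6% headroom figure here is an illustrative assumption, not a gpuowl default; preda's worked example for 2 processes on 16 GB was `-maxAlloc 7500`.

```python
# Hypothetical helper: pick a per-process -maxAlloc value (in MB) for N
# gpuowl instances sharing one card, leaving headroom for the driver.
def max_alloc_mb(gpu_mem_gb=16, instances=2, headroom=0.06):
    usable_mb = gpu_mem_gb * 1024 * (1 - headroom)
    return int(usable_mb // instances)

print(max_alloc_mb())   # 7700, in the ballpark of the suggested -maxAlloc 7500
```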
[QUOTE=preda;547310]In OpenCL there's no way to get the "free/available memory on the GPU" (there is a mecanism to get the *total* memory, but that's not useful when that total is shared between an unknown number of actors). That lack comes from a philosophical choice OpenCL made to "abstract away" the hardware as much as possible, including the amount of memory available. It acts like this: you want to allocate memory, the GPU doesn't have any anymore, fine we're going to allocate it on the host behind your back and lie to you that the alloc succeeded. The difference you see in the second process is because, after a while of allocating on the host, ROCm finally decided that too much is too much and reported a first alloc failure.[/quote]
But for gpuowl, based on tests on a variety of GPUs, we know that 2 tasks is the maximum which makes sense from a performance perspective, correct or not? [quote]Anyway, by default P-1 will assume it runs in single-process and attempt to allocate "all it can" for itself, but no more than 16GB. When running 2-process you're supposed to use the -maxAlloc to indicate how much memory *you* allocate to each process. (e.g.: -maxAlloc 7500 for about 7.5G limit). BTW, the processes were not idling, they were just running extremely slowly because the buffers were on host memory instead of GPU (because that's a wise ROCm choice, oh yes).[/QUOTE] Thanks, I've added that to my setup scripts. BTW, once you get beyond the 100x-slowdown range, the difference between 'idling' and 'running very slowly' becomes more or less philosophical. :) |
Does gpuowl run P-1 automatically or do you have to tell it to?
If you get "Smallest available first-time PRP tests" will that work have p-1 done before it is issued to you? |
[QUOTE=Xyzzy;547317]Does gpuowl run P-1 automatically or do you have to tell it to?
If you get "Smallest available first-time PRP tests" will that work have p-1 done before it is issued to you?[/QUOTE] The later versions of gpuowl are auto-geared for P-1. The server issues the right code if P-1 is required. |
Some thoughts:
ROCm doesn't support the 5500/5600/5700 "Navi" GPUs, so we are forced to use the amdgpu-pro drivers. Hopefully amdgpu-pro will continue to be supported by gpuowl. Is there a stable branch of gpuowl for people like us who want to be certain that things work properly? (We also don't try to tune gpuowl. We just use whatever the defaults are.) We haven't noticed any "kworker" hijacks yet. Is this something rare? Our GPU does not have a unique device ID. We just read this entire thread from start to finish. It is full of great information. Summarizing it all would be a monumental task! :mike: |
[QUOTE=preda;547233]No, I would expect that the availability of unique_id depends on the GPU model. RadeonVII has it, others may not have it. If the file /sys/class/drm/cardN/device/unique_id is there it's likely to have the id information, otherwise not.[/QUOTE]It turns out that some NVIDIA gpus also have queryable serial numbers and unique ids. Per [URL]http://on-demand.gputechconf.com/gtc/2012/presentations/S0238-Tesla-Cluster-Monitoring-Management-Apis.pdf[/URL] nvidia-smi (command-line) and NVML (interface to C and Python) can access these. Possibly it's available through OpenCL too. nvidia-smi serial number availability applies to all Tesla models and most Quadro models (4000 and above yes; 2000 no), but not the popular GTX or RTX lines.
|
[QUOTE=ewmayer;547307]OK, here's a bug - last night noticed my wall wattmeter on the new 3 x R7 build was running ~250W below its normal 750-800W range ... checked status of the 3 cards, saw that device 1 was basically idling, despite 2 active gpuowl processes running on it[/QUOTE]I have a Radeon VII on Windows 10 that has decided after a hang, kill process, and launch new process, to run at 570Mhz, which is below the nominal minimum. It seems to be doing ok there, in an odd sort of ultra-power-saving mode. Indicated power consumption in GPU-Z is 61W on that gpu.
[CODE]2020-06-08 11:38:18 asr2/radeonvii 96495283 OK 36200000 37.51%; 1534 us/it; ETA 1d 01:42; 9b8147bf22397183 (check 0.89s) 2020-06-08 11:43:26 asr2/radeonvii 96495283 OK 36400000 37.72%; 1534 us/it; ETA 1d 01:36; 50d84f8ddd0a49b5 (check 0.88s)[/CODE]Compare to timings prior to the hang at over triple the watts:[CODE]2020-06-08 00:44:19 asr2/radeonvii 96495283 OK 16200000 16.79%; 733 us/it; ETA 0d 16:21; 3535cc0cfc329a0a (check 0.61s) 2020-06-08 00:46:46 asr2/radeonvii 96495283 OK 16400000 17.00%; 732 us/it; ETA 0d 16:17; d829cbbe1e03c698 (check 0.56s) [/CODE] |
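Comparing the two log excerpts on an energy-per-iteration basis is interesting. The 61 W figure is the GPU-Z reading given for the stuck-at-570MHz state; the 190 W for the normal state is an assumption for illustration ("over triple the watts" is all the post says):

```python
# Joules per iteration in the two states (power for the normal state is
# an assumed 190 W, slightly over triple the throttled 61 W reading):
slow_j = 1534e-6 * 61    # J/it at the stuck 570 MHz clock
fast_j = 733e-6 * 190    # J/it at normal clocks (assumed power)
print(f"{slow_j:.3f} J/it throttled vs {fast_j:.3f} J/it normal")
# Under these assumptions the accidental low-power mode uses less energy
# per iteration, though at about 2.1x the wall-clock time.
assert slow_j < fast_j
```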
We are trying to compile in Centos 7.
We have python3 and gcc 7, 8 and 9 installed. All versions of g++ fail here:[CODE]$ make ./tools/expand.py < gpuowl.cl > gpuowl-expanded.cl cat head.txt gpuowl-expanded.cl tail.txt > gpuowl-wrap.cpp echo \"`git describe --long --dirty --always`\" > version.new diff -q -N version.new version.inc >/dev/null || mv version.new version.inc echo Version: `cat version.inc` Version: "v6.11-318-g3109989-dirty" g++ -MT Pm1Plan.o -MMD -MP -MF .d/Pm1Plan.Td -Wall -O2 -std=c++17 -c -o Pm1Plan.o Pm1Plan.cpp g++ -MT GmpUtil.o -MMD -MP -MF .d/GmpUtil.Td -Wall -O2 -std=c++17 -c -o GmpUtil.o GmpUtil.cpp GmpUtil.cpp: In function ‘std::string GCD(u32, const std::vector<unsigned int>&, u32)’: GmpUtil.cpp:51:25: error: ‘gcd’ was not declared in this scope mpz_class resultGcd = gcd((mpz_class{1} << exp) - 1, w - sub); ^~~ GmpUtil.cpp:51:25: note: suggested alternative: ‘gcvt’ mpz_class resultGcd = gcd((mpz_class{1} << exp) - 1, w - sub); ^~~ gcvt make: *** [Makefile:30: GmpUtil.o] Error 1[/CODE][CODE]$ cat GmpUtil.cpp // Copyright (C) Mihai Preda. #include "GmpUtil.h" #include <gmp.h> #include <cmath> #include <cassert> using namespace std; namespace { mpz_class mpz(const vector<u32>& words) { mpz_class b{}; mpz_import(b.get_mpz_t(), words.size(), -1 /*order: LSWord first*/, sizeof(u32), 0 /*endianess: native*/, 0 /*nails*/, words.data()); return b; } mpz_class primorial(u32 p) { mpz_class b{}; mpz_primorial_ui(b.get_mpz_t(), p); return b; } mpz_class powerSmooth(u32 exp, u32 B1) { mpz_class a{exp}; a *= 256; // boost 2s. 
for (int k = log2(B1); k >= 1; --k) { a *= primorial(pow(B1, 1.0 / k)); } return a; } u32 sizeBits(mpz_class a) { return mpz_sizeinbase(a.get_mpz_t(), 2); } } vector<bool> bitsMSB(const mpz_class& a) { vector<bool> bits; int nBits = sizeBits(a); bits.reserve(nBits); for (int i = nBits - 1; i >= 0; --i) { bits.push_back(mpz_tstbit(a.get_mpz_t(), i)); } assert(int(bits.size()) == nBits); return bits; } // return GCD(bits - sub, 2^exp - 1) as a decimal string if GCD!=1, or empty string otherwise. std::string GCD(u32 exp, const std::vector<u32>& words, u32 sub) { mpz_class w = mpz(words); if (w == 0 || w == sub) { throw std::domain_error("GCD invalid input"); } mpz_class resultGcd = gcd((mpz_class{1} << exp) - 1, w - sub); return (resultGcd == 1) ? ""s : resultGcd.get_str(); } // MSB: Most Significant Bit first (at index 0). vector<bool> powerSmoothMSB(u32 exp, u32 B1) { return bitsMSB(powerSmooth(exp, B1)); } int jacobi(u32 exp, const std::vector<u32>& words) { mpz_class w = mpz(words) - 2; mpz_class m = (mpz_class{1} << exp) - 1; return mpz_jacobi(w.get_mpz_t(), m.get_mpz_t()); } [/CODE]:help: Edit: We learned how to checkout older versions of gpuowl. This happens in the other versions (last month or so) we tried as well. |
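For reference, the failing GCD routine in GmpUtil.cpp computes gcd(2^exp - 1, residue - sub) and reports it as a decimal string when it exceeds 1. A Python sketch of the same logic (math.gcd standing in for GMP, so only practical for tiny exponents):

```python
# Python equivalent of GmpUtil.cpp's GCD(exp, words, sub): a result > 1
# means the residue shares a factor with the Mersenne number.
from math import gcd

def pm1_gcd(exp, residue, sub=0):
    if residue == 0 or residue == sub:
        raise ValueError("GCD invalid input")   # mirrors the C++ domain_error
    g = gcd((1 << exp) - 1, residue - sub)
    return "" if g == 1 else str(g)

# Worked example: 2^11 - 1 = 2047 = 23 * 89. A residue sharing the
# factor 23 is detected; a coprime residue is not.
print(pm1_gcd(11, 23 * 5))   # "23"
print(pm1_gcd(11, 3))        # "" (no factor revealed)
```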
[QUOTE=kriesel;547460]I have a Radeon VII on Windows 10 that has decided after a hang, kill process, and launch new process, to run at 570Mhz, which is below the nominal minimum. It seems to be doing ok there, in an odd sort of ultra-power-saving mode. Indicated power consumption in GPU-Z is 61W on that gpu.[/QUOTE]
Do you have a similar rocm-smi cli under Windows as we Linuxers have? How do you manually fiddle the sclk setting under Windows? Because it sounds like your sclk setting simply got reset to a low level. |
1 Attachment(s)
[QUOTE=ewmayer;547489]Do you have a similar rocm-smi cli under Windows as we Linuxers have? How do you manually fiddle the sclk setting under Windows? Because it sounds like your sclk setting simply got reset to a low level.[/QUOTE]Rocm is linux specific, so no. Windows seems to be the neglected stepchild there. [URL]https://github.com/RadeonOpenCompute/ROCm/issues/18[/URL]
(And the string "windo" is not present in [url]https://github.com/RadeonOpenCompute/ROCm;[/url] there's a list of linux supported only.) Compared to NVIDIA supporting nvidia-smi on linux and Windows, for pro gpus and gamers' and low end, it's a definite negative for AMD. There is a graphical interactive tool that is available in the AMD Radeon software package that includes the Adrenalin 2020 driver 20.4.2.(GPU-Z identifies it as Driver version 26.20.15029.27017 (Adrenalin 20.4.2) DCH / Win10 64 May 15 2020) As far as I can tell, if I want to treat one gpu differently, and I do, I must tune each separately. It appears that takes different save files. Those are in xml, in which I'm pretty illiterate. There's no such thing as a sclk setting as far as I can tell. I can dial gpu or memory clocks up by individual percents over limited ranges, or switch modes and dial their limits by kilohertz increments. This tool says that slowww gpu should be running at 808 to ~1670 Mhz. GPU-Z and it reports it at 570, as does HWInfo, and the slower gpuowl timings on the same exponent confirm it. I think it will take a system restart or power cycle to clear it up. Windows Device Manager disable and reenable after carefully identifying it by PCI bus number did not clear it. That would even be slow for the HD4600 IGP, which shows at 600Mhz at idle. There is no "570" string in the xml file (see code section below). There are min, mid, and max values in Mhz, all larger than 800. (I checked other files too.) I added the annotation in green font. This interface difference is why when linux users talk sclk 2 through sclk5, it does not translate without a conversion table. And I see different posters give different Mhz values for the same sclk value in a thread or two here. There's also something called Wattman which I've used, also graphical, but that is not installed on this system. I think it came with a much older driver package, different source. 
[CODE]<?xml version="1.0" encoding="UTF-8"?> <GPU DevID="66AF" RevID="C1"> <PPW Value="1"/> <FEATURE ID="100" Enabled="0"> <STATES> <STATE ID="0" Enabled="False" Value="0"/> </STATES> </FEATURE> <FEATURE ID="101" Enabled="3"> <STATES> <STATE ID="0" Enabled="True" Value="0"/> </STATES> </FEATURE> <FEATURE ID="4" Enabled="True"> <STATES> <STATE ID="0" Enabled="False" Value="808"/> [COLOR=green]minimum gpu clock[/COLOR] <STATE ID="1" Enabled="False" Value="1240"/> [COLOR=Green]midpoint adjusted from maximum being lowered[/COLOR] <STATE ID="2" Enabled="False" Value="1672"/> [COLOR=green]lowered maximum[/COLOR] <STATE ID="3" Enabled="False" Value="808"/> <STATE ID="4" Enabled="False" Value="1672"/> [COLOR=green]lowered maximum[/COLOR] </STATES> </FEATURE> <FEATURE ID="12" Enabled="False"> <STATES> <STATE ID="0" Enabled="False" Value="712"/> <STATE ID="1" Enabled="False" Value="797"/> <STATE ID="2" Enabled="False" Value="1023"/> </STATES> </FEATURE> <FEATURE ID="5" Enabled="True"> <STATES> <STATE ID="0" Enabled="False" Value="1122"/>[COLOR=Green] vram clock[/COLOR] </STATES> </FEATURE> <FEATURE ID="9" Enabled="False"> <STATES> <STATE ID="0" Enabled="False" Value="0"/> </STATES> </FEATURE> <FEATURE ID="8" Enabled="False"> <STATES> <STATE ID="0" Enabled="True" Value="0"/> <STATE ID="1" Enabled="True" Value="0"/> <STATE ID="2" Enabled="True" Value="-14"/> [COLOR=green]power limit modified[/COLOR] </STATES> </FEATURE> <FEATURE ID="17" Enabled="False"> <STATES> <STATE ID="0" Enabled="True" Value="0"/> </STATES> </FEATURE> <FEATURE ID="19" Enabled="False"> <STATES> <STATE ID="0" Enabled="True" Value="0"/> </STATES> </FEATURE> <FEATURE ID="20" Enabled="False"> <STATES> <STATE ID="0" Enabled="True" Value="0"/> </STATES> </FEATURE> <FEATURE ID="21" Enabled="False"> <STATES> <STATE ID="0" Enabled="True" Value="0"/> </STATES> </FEATURE> <FEATURE ID="22" Enabled="True"> <STATES> <STATE ID="0" Enabled="False" Value="30"/> <STATE ID="1" Enabled="False" Value="7"/> <STATE ID="2" 
Enabled="False" Value="50"/> <STATE ID="3" Enabled="False" Value="8"/> <STATE ID="4" Enabled="False" Value="72"/> <STATE ID="5" Enabled="False" Value="34"/> <STATE ID="6" Enabled="False" Value="89"/> <STATE ID="7" Enabled="False" Value="71"/> <STATE ID="8" Enabled="False" Value="100"/> <STATE ID="9" Enabled="False" Value="100"/> </STATES> </FEATURE> </GPU> [/CODE] |
[QUOTE=Xyzzy;547488]We are trying to compile in Centos 7.
[/QUOTE] Probably you need a newer version of GMP. Could you please find the file gmpxx.h and grep it for "gcd", e.g. [CODE] $ grep gcd /usr/include/gmpxx.h struct __gmp_gcd_function { mpz_gcd(z, w, v); } { mpz_gcd_ui(z, w, l); } { __GMPXX_TMPZ_D; mpz_gcd (z, w, temp); } __GMP_DEFINE_BINARY_FUNCTION(gcd, __gmp_gcd_function) [/CODE] |
[code]$ ls -l gmp*
-rw-r--r--. 1 root root 2289 Aug 2 2017 gmp.h -rw-r--r--. 1 root root 2473 Aug 2 2017 gmp-mparam.h -rw-r--r--. 1 root root 11524 Aug 2 2017 gmp-mparam-x86_64.h -rw-r--r--. 1 root root 83249 Aug 2 2017 gmp-x86_64.h -rw-r--r--. 1 root root 113143 Aug 2 2017 gmpxx.h[/code][code]$ grep gcd gmp* gmp-x86_64.h:#define mpz_gcd __gmpz_gcd gmp-x86_64.h:__GMP_DECLSPEC void mpz_gcd (mpz_ptr, mpz_srcptr, mpz_srcptr); gmp-x86_64.h:#define mpz_gcd_ui __gmpz_gcd_ui gmp-x86_64.h:__GMP_DECLSPEC unsigned long int mpz_gcd_ui (mpz_ptr, mpz_srcptr, unsigned long int); gmp-x86_64.h:#define mpz_gcdext __gmpz_gcdext gmp-x86_64.h:__GMP_DECLSPEC void mpz_gcdext (mpz_ptr, mpz_ptr, mpz_ptr, mpz_srcptr, mpz_srcptr); gmp-x86_64.h:#define mpn_gcd __MPN(gcd) gmp-x86_64.h:__GMP_DECLSPEC mp_size_t mpn_gcd (mp_ptr, mp_ptr, mp_size_t, mp_ptr, mp_size_t); gmp-x86_64.h:#define mpn_gcd_1 __MPN(gcd_1) gmp-x86_64.h:__GMP_DECLSPEC mp_limb_t mpn_gcd_1 (mp_srcptr, mp_size_t, mp_limb_t) __GMP_ATTRIBUTE_PURE; gmp-x86_64.h:#define mpn_gcdext_1 __MPN(gcdext_1) gmp-x86_64.h:__GMP_DECLSPEC mp_limb_t mpn_gcdext_1 (mp_limb_signed_t *, mp_limb_signed_t *, mp_limb_t, mp_limb_t); gmp-x86_64.h:#define mpn_gcdext __MPN(gcdext) gmp-x86_64.h:__GMP_DECLSPEC mp_size_t mpn_gcdext (mp_ptr, mp_ptr, mp_size_t *, mp_ptr, mp_size_t, mp_ptr, mp_size_t);[/code][code]$ yum info gmp-devel.x86_64 Loaded plugins: fastestmirror, langpacks Loading mirror speeds from cached hostfile * base: mirrors.advancedhosters.com * extras: mirror.steadfastnet.com * updates: mirror.dal10.us.leaseweb.net Installed Packages Name : gmp-devel Arch : x86_64 Epoch : 1 Version : 6.0.0 Release : 15.el7 Size : 340 k Repo : installed From repo : base Summary : Development tools for the GNU MP arbitrary precision library URL : http://gmplib.org/ License : LGPLv3+ or GPLv2+ Description : The libraries, header files and documentation for using the GNU MP : arbitrary precision library in applications. 
: : If you want to develop applications which will use the GNU MP library, : you'll need to install the gmp-devel package. You'll also need to : install the gmp package.[/code] |
[QUOTE=Xyzzy;547513]Version : 6.0.0
[/QUOTE] Yes, can you please install GMP 6.1 or 6.2? It seems gcd() was added to the c++ wrapper after 6.0 (a hypothesis). |
We had to install python3 and gcc 8.3.1 already. It took several hours today for us to learn how to install the newer compiler toolchain. We are going to call it quits on Centos 7. It isn't worth our effort or your time since we have a version running on Centos 8 just fine.
Thanks for the help! PS - Instructions for gcc 8 on Centos 7: [URL]https://stackoverflow.com/questions/55345373/how-to-install-gcc-g-8-on-centos[/URL] |
[QUOTE=kriesel;547508] or switch modes and dial their limits by [COLOR=Red]kilo[/COLOR]hertz increments. [/QUOTE]Oops no, make that megahertz increments.
A warm restart (shutdown -r) handled the anomalous stuck 570 Mhz situation. For now at least. |
George, Mihai, how much percentage performance improvement is there from the FMA to MUL change in [URL="https://github.com/preda/gpuowl/commit/c336704c220e16ff246d177d7be5908ed1d445db?"]https://github.com/preda/gpuowl/commit/c336704c220e16ff246d177d7be5908ed1d445db[/URL]?
Does "May make MAX_ACCURACY ever so marginally faster" mean the optimization gains are coming to a conclusion? |
[QUOTE=kriesel;547540]George, Mihai, how much percentage performance improvement is there from the FMA to MUL change in [URL="https://github.com/preda/gpuowl/commit/c336704c220e16ff246d177d7be5908ed1d445db?"]https://github.com/preda/gpuowl/commit/c336704c220e16ff246d177d7be5908ed1d445db[/URL]?
Does "May make MAX_ACCURACY ever so marginally faster" mean the optimization gains are coming to a conclusion?[/QUOTE] Should be no faster on Radeon VII. Yes, we are running out of ideas to make the code faster. |
[QUOTE=Prime95;547544]Yes, we are running out of ideas to make the code faster.[/QUOTE]It's been a good run. So now it's up to the engineers to create faster hardware again and more of it.
I've hit a bad spot in a 9M fft LL DC run. This had repeated for hours, so I reran it with -use STATS. I don't see anything in the stats indicating trouble, but there it is in Jacobi=1 instead of -1. Bits/word looks ok to me at <17. [CODE]2020-06-09 20:58:08 gpuowl v6.11-292-gecab9ae 2020-06-09 20:58:08 config: -user kriesel -cpu asr2/radeonvii2 -d 2 -use NO_ASM,STATS -maxAlloc 15000 2020-06-09 20:58:08 device 2, unique id '' 2020-06-09 20:58:08 asr2/radeonvii2 159805579 FFT: 9M 1K:9:512 (16.93 bpw) 2020-06-09 20:58:08 asr2/radeonvii2 Expected maximum carry32: 2C430000 2020-06-09 20:58:11 asr2/radeonvii2 OpenCL args "-DEXP=159805579u -DWIDTH=1024u -DSMALL_HEIGHT=512u -DMIDDLE=9u -DWEIGHT_STEP=0x8.60730821aeaf8p-3 -DIWEIGHT_STEP=0xf.47c6f52dba228p-4 -DWEIGHT_BIGSTEP=0x9.837f0518db8a8p-3 -DIWEIGHT_BIGSTEP=0xd.744fccad69d68p-4 -DPM1=0 -DAMDGPU=1 -DNO_ASM=1 -DSTATS=1 -cl-fast-relaxed-math -cl-std=CL2.0 " 2020-06-09 20:58:24 asr2/radeonvii2 OpenCL compilation in 13.20 s 2020-06-09 20:58:25 asr2/radeonvii2 159805579 LL 98500000 loaded: 70acc69859d13f28 2020-06-09 21:00:54 asr2/radeonvii2 Roundoff: N=100000, mean 0.057751, SD 0.003232, CV 0.055957, max 0.084695, z 136.9 (pErr 0.000000%) 2020-06-09 21:00:54 asr2/radeonvii2 159805579 LL 98600000 61.70%; 1489 us/it; ETA 1d 01:19; a2af2a7b425c7d75 2020-06-09 21:03:23 asr2/radeonvii2 Roundoff: N=100000, mean 0.057745, SD 0.003224, CV 0.055838, max 0.085250, z 137.2 (pErr 0.000000%) 2020-06-09 21:03:23 asr2/radeonvii2 159805579 LL 98700000 61.76%; 1488 us/it; ETA 1d 01:15; d05666c89b157db8 2020-06-09 21:05:51 asr2/radeonvii2 Roundoff: N=100000, mean 0.057751, SD 0.003235, CV 0.056023, max 0.091412, z 136.7 (pErr 0.000000%) 2020-06-09 21:05:52 asr2/radeonvii2 159805579 LL 98800000 61.83%; 1487 us/it; ETA 1d 01:12; b16756b2ece2dc45 2020-06-09 21:08:20 asr2/radeonvii2 Roundoff: N=100000, mean 0.057747, SD 0.003231, CV 0.055950, max 0.085328, z 136.9 (pErr 0.000000%) 2020-06-09 21:08:20 asr2/radeonvii2 159805579 LL 98900000 61.89%; 
1486 us/it; ETA 1d 01:09; 0ffbd0ea6349483d 2020-06-09 21:10:48 asr2/radeonvii2 Roundoff: N=100000, mean 0.057737, SD 0.003227, CV 0.055895, max 0.082131, z 137.0 (pErr 0.000000%) 2020-06-09 21:10:49 asr2/radeonvii2 159805579 LL 99000000 61.95%; 1486 us/it; ETA 1d 01:06; 7b7cb65b21554200 2020-06-09 21:13:17 asr2/radeonvii2 Roundoff: N=100000, mean 0.057735, SD 0.003235, CV 0.056027, max 0.084226, z 136.7 (pErr 0.000000%) 2020-06-09 21:13:17 asr2/radeonvii2 159805579 LL 99100000 62.01%; 1485 us/it; ETA 1d 01:03; 82f79a16d2dbdb5e 2020-06-09 21:15:46 asr2/radeonvii2 Roundoff: N=100000, mean 0.057739, SD 0.003235, CV 0.056027, max 0.087179, z 136.7 (pErr 0.000000%) 2020-06-09 21:15:46 asr2/radeonvii2 159805579 LL 99200000 62.08%; 1486 us/it; ETA 1d 01:01; 6e572a98c0492314 2020-06-09 21:15:46 asr2/radeonvii2 159805579 EE 99000000 ([COLOR=red]jacobi == 1[/COLOR]) 2020-06-09 21:15:47 asr2/radeonvii2 159805579 LL 98500000 loaded: 70acc69859d13f28 2020-06-09 21:18:15 asr2/radeonvii2 Roundoff: N=100000, mean 0.057751, SD 0.003232, CV 0.055957, max 0.084695, z 136.9 (pErr 0.000000%) 2020-06-09 21:18:15 asr2/radeonvii2 159805579 LL 98600000 61.70%; 1486 us/it; ETA 1d 01:16; a2af2a7b425c7d75 2020-06-09 21:20:43 asr2/radeonvii2 Roundoff: N=100000, mean 0.057745, SD 0.003224, CV 0.055838, max 0.085250, z 137.2 (pErr 0.000000%) 2020-06-09 21:20:44 asr2/radeonvii2 159805579 LL 98700000 61.76%; 1486 us/it; ETA 1d 01:13; d05666c89b157db8 2020-06-09 21:23:12 asr2/radeonvii2 Roundoff: N=100000, mean 0.057751, SD 0.003235, CV 0.056023, max 0.091412, z 136.7 (pErr 0.000000%) 2020-06-09 21:23:12 asr2/radeonvii2 159805579 LL 98800000 61.83%; 1487 us/it; ETA 1d 01:12; b16756b2ece2dc45 2020-06-09 21:25:41 asr2/radeonvii2 Roundoff: N=100000, mean 0.057747, SD 0.003231, CV 0.055950, max 0.085328, z 136.9 (pErr 0.000000%) 2020-06-09 21:25:41 asr2/radeonvii2 159805579 LL 98900000 61.89%; 1486 us/it; ETA 1d 01:09; 0ffbd0ea6349483d 2020-06-09 21:28:09 asr2/radeonvii2 Roundoff: N=100000, mean 
0.057737, SD 0.003227, CV 0.055895, max 0.082131, z 137.0 (pErr 0.000000%) 2020-06-09 21:28:10 asr2/radeonvii2 159805579 LL 99000000 61.95%; 1486 us/it; ETA 1d 01:06; 7b7cb65b21554200 2020-06-09 21:30:38 asr2/radeonvii2 Roundoff: N=100000, mean 0.057735, SD 0.003235, CV 0.056027, max 0.084226, z 136.7 (pErr 0.000000%) 2020-06-09 21:30:38 asr2/radeonvii2 159805579 LL 99100000 62.01%; 1486 us/it; ETA 1d 01:04; 82f79a16d2dbdb5e 2020-06-09 21:30:47 asr2/radeonvii2 Stopping, please wait.. 2020-06-09 21:30:47 asr2/radeonvii2 Roundoff: N=6000, mean 0.057697, SD 0.003223, CV 0.055866, max 0.076871, z 137.2 (pErr 0.000000%) 2020-06-09 21:30:48 asr2/radeonvii2 159805579 LL 99106000 62.02%; 1551 us/it; ETA 1d 02:09; 8a382a6d30513954 2020-06-09 21:30:48 asr2/radeonvii2 waiting for the Jacobi check to finish.. 2020-06-09 21:31:21 asr2/radeonvii2 159805579 EE 99000000 ([COLOR=Red]jacobi == 1[/COLOR]) 2020-06-09 21:31:21 asr2/radeonvii2 Exiting because "stop requested" 2020-06-09 21:31:21 asr2/radeonvii2 Bye[/CODE]Running a 5M PRP on the same hardware and conditions (clock rates & temperatures) does not produce GEC errors. Tried the other 9M and the first 10M fft with same Jacobi=1 results. |
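As a quick sanity check on the "(16.93 bpw)" figure in the log above: bits per word is just the exponent divided by the FFT length in words (a 9M FFT is 9·2^20 words). This is a back-of-the-envelope check on the posted numbers, not gpuowl's code:

```python
# Hypothetical sanity check: bits per FFT word for the run in the log above.
exponent = 159_805_579
fft_words = 9 * 2**20          # "9M" FFT length in words
bpw = exponent / fft_words     # bits of the Mersenne number carried per word
print(f"{bpw:.2f} bits/word")  # matches the "(16.93 bpw)" in the log
```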
[QUOTE=kriesel;547594]It's been a good run. So now it's up to the engineers to create faster hardware again and more of it.
I've hit a bad spot in a 9M fft LL DC run. This had repeated for hours, so I reran it with -use STATS. I don't see anything in the stats indicating trouble, but there it is in Jacobi=1 instead of -1. Bits/word looks ok to me at <17. [/QUOTE] Maybe the savefile is bad (i.e. has a wrong Jacobi somehow) and keeps getting preserved? Did you try exiting in the middle, thus forcing an earlier Jacobi check, to see if that also fails? If you have any earlier savefile, make a copy before it's gone. |
[QUOTE=preda;547595]Maybe the savefile is bad (i.e. has a wrong Jacobi somehow) which gets preserved? did you try exiting in the middle, thus forcing an earlier Jacobi check, to see if that also returns fails?
If you have any earlier savefile make a copy before it's gone.[/QUOTE]Gpuowl seems to be doing a good job of preserving the last Jacobi-ok save files for that exponent. They're now nearly a day old and have survived dozens of cycles of bad-Jacobi results. The log file indicates correct Jacobi for the 98500000 iteration save. That's what it starts from, over and over, whether I allow it to go past 99M (the 500k Jacobi check interval) or interrupt it. The 98600000 to 99100000 iteration res64s are reproducible from one cycle to the next, including runs with different fft choices. Will try to reproduce from the -old file. There's something else going on also. Sometimes a gpuowl process will hang. I haven't kept notes, but my recollection is that occurs only on a subset of the gpus. This has bad effects on other processes that access gpus on the same system. Other gpuowl processes after Ctrl-C to terminate say Bye, but do not exit cleanly to the command prompt. The AMD Radeon Software hangs after a stuck gpu is selected. GPU-Z if it's running also hangs regardless of which Radeon VII is being displayed. Even Task Manager hangs rather than exiting. This is in Windows 10 1909. Once that situation develops, it seems anything that accesses any Radeon VII gpu hangs rather than exiting. Once I left the system be for hours after it developed. Gpu applications recorded log entries of about an hour longer iterating time than normal between updates, then returned to normal rates (~2.5 minute update interval). Occasionally there is a Windows stop 116, which some search results say is a video driver timeout related crash. I have a subjective impression that the system is less stable when using remote desktop, which I use a LOT. I'm contemplating moving to Windows 10 2004 or driver 20.5.1 in response. I'm currently running the recommended 20.4.2. 
[URL]https://www.amd.com/en/support/graphics/amd-radeon-2nd-generation-vega/amd-radeon-2nd-generation-vega/amd-radeon-vii[/URL] Longer term plan is to install Ubuntu on a different HD on this hardware for a head to head comparison. |
Ken:
Having read this thread from start to finish recently, it is very apparent that your dogged dependence on Windows is causing you a lot of trouble and work. This thread feels dominated by these issues which makes it incredibly difficult to find useful information for the 99% of us that use Linux. We think that you should create a separate thread for Windows+gpuowl problems.

To be blunt, your refusal to learn something new (Linux) is irrational. Linux costs no money to deploy. Linux is simpler to manage. Linux drivers for AMD cards are dramatically better. Remote administration via the command line is ridiculously simple. The developer of gpuowl chose to use the Linux toolchain and environment for a very good reason! You have countless extremely-verbose posts complaining about blizzard-like compile-time errors, obscure compiler incompatibilities, driver failures, remote viewer problems, GPU-Z not displaying right, hung processes and all sorts of bizarre things.

In fact, we think your approach is rude and it reeks of entitlement. The developer has a finite amount of time to work on this project, from which he earns no money. This thread is the primary communication medium for his project. You expect him to read through your posts and possibly suggest ideas to help, knowing that his time is extremely limited and that your usage is highly unconventional. Rather than changing your workflow to accommodate him, you expect him to change his workflow to accommodate you.

You are obviously a smart man and certainly much smarter than we are. If we can run Linux successfully it will be trivial for you to figure out. In the end, you will be more productive. And a lot happier!

Please note that we are not attacking you as a person. We are questioning your behavior. We can only hope that if we step out of line in any area of our life that you all would be kind enough to point that out to us. What makes us strong in the end is working together for the common good. :mike: |
[QUOTE=kriesel;547613]Once I left the system be for hours after it developed. Gpu applications recorded log entries of about an hour longer iterating time than normal between updates, then returned to normal rates (~2.5 minute update interval).[/QUOTE]
A restart from the -old.ll file got it through. That file was the save before the hour-long stall. [CODE]2020-06-09 11:15:48 asr2/radeonvii2 159805579 OK 97500000 (jacobi == -1) 2020-06-09 11:18:10 asr2/radeonvii2 159805579 LL 97800000 61.20%; 1424 us/it; ETA 1d 00:31; fe20ad63fcfd3b9d 2020-06-09 11:20:32 asr2/radeonvii2 159805579 LL 97900000 61.26%; 1422 us/it; ETA 1d 00:27; cc49d2676782a244 2020-06-09 11:22:54 asr2/radeonvii2 159805579 LL 98000000 61.32%; 1422 us/it; ETA 1d 00:25; b9fd54ac7a269b6b 2020-06-09 11:25:17 asr2/radeonvii2 159805579 LL 98100000 61.39%; 1421 us/it; ETA 1d 00:21; b02b5beb8b4333cb 2020-06-09 12:27:57 asr2/radeonvii2 159805579 LL 98200000 61.45%; [COLOR=Red][B]37603[/B][/COLOR] us/it; ETA 26d 19:29; 9717c57f52232474 2020-06-09 12:27:57 asr2/radeonvii2 159805579 OK 98000000 (jacobi == -1) 2020-06-09 12:30:20 asr2/radeonvii2 159805579 LL 98300000 61.51%; 1428 us/it; ETA 1d 00:23; 9443427265a9205d 2020-06-09 12:32:42 asr2/radeonvii2 159805579 LL 98400000 61.57%; 1423 us/it; ETA 1d 00:16; f9d0aed9378ab348 2020-06-09 12:35:04 asr2/radeonvii2 159805579 LL 98500000 61.64%; 1422 us/it; ETA 1d 00:13; 70acc69859d13f28 2020-06-09 12:37:26 asr2/radeonvii2 159805579 LL 98600000 61.70%; 1422 us/it; ETA 1d 00:10; 614b0458ec0a61c5 2020-06-09 12:39:48 asr2/radeonvii2 159805579 LL 98700000 61.76%; 1422 us/it; ETA 1d 00:08; 465b5dcca306e59f 2020-06-09 12:39:48 asr2/radeonvii2 159805579 OK 98500000 (jacobi == -1) 2020-06-09 12:42:11 asr2/radeonvii2 159805579 LL 98800000 61.83%; 1423 us/it; ETA 1d 00:07; 5b900a5be77a3a4e 2020-06-09 12:44:33 asr2/radeonvii2 159805579 LL 98900000 61.89%; 1422 us/it; ETA 1d 00:03; a6a1a86307e43912 2020-06-09 12:46:55 asr2/radeonvii2 159805579 LL 99000000 61.95%; 1422 us/it; ETA 1d 00:01; 82e1ab78258c33aa 2020-06-09 12:49:17 asr2/radeonvii2 159805579 LL 99100000 62.01%; 1421 us/it; ETA 0d 23:58; 0021e9dd957648d6 2020-06-09 12:51:39 asr2/radeonvii2 159805579 LL 99200000 62.08%; 1422 us/it; ETA 0d 23:56; e8b31d3feae8b5d3 2020-06-09 
12:51:39 asr2/radeonvii2 159805579 EE 99000000 (jacobi == 1) ... restart from -old.ll 2020-06-10 09:22:57 config: -user kriesel -cpu asr2/radeonvii2 -d 2 -use NO_ASM -maxAlloc 15000 2020-06-10 09:22:57 device 2, unique id '' 2020-06-10 09:22:57 asr2/radeonvii2 159805579 FFT: 9M 1K:9:512 (16.93 bpw) 2020-06-10 09:22:57 asr2/radeonvii2 Expected maximum carry32: 2C430000 2020-06-10 10:13:47 config: -user kriesel -cpu asr2/radeonvii2 -d 2 -use NO_ASM -maxAlloc 15000 2020-06-10 10:13:47 device 2, unique id '' 2020-06-10 10:13:47 asr2/radeonvii2 159805579 FFT: 9M 1K:9:512 (16.93 bpw) 2020-06-10 10:13:47 asr2/radeonvii2 Expected maximum carry32: 2C430000 2020-06-10 10:13:50 asr2/radeonvii2 OpenCL args "-DEXP=159805579u -DWIDTH=1024u -DSMALL_HEIGHT=512u -DMIDDLE=9u -DWEIGHT_STEP=0x8.60730821aeaf8p-3 -DIWEIGHT_STEP=0xf.47c6f52dba228p-4 -DWEIGHT_BIGSTEP=0x9.837f0518db8a8p-3 -DIWEIGHT_BIGSTEP=0xd.744fccad69d68p-4 -DPM1=0 -DAMDGPU=1 -DNO_ASM=1 -cl-fast-relaxed-math -cl-std=CL2.0 " 2020-06-10 10:14:00 asr2/radeonvii2 OpenCL compilation in 9.91 s 2020-06-10 10:14:01 asr2/radeonvii2 159805579 LL 98000000 loaded: b9fd54ac7a269b6b 2020-06-10 10:16:25 asr2/radeonvii2 159805579 LL 98100000 61.39%; 1432 us/it; ETA 1d 00:33; b02b5beb8b4333cb 2020-06-10 10:18:48 asr2/radeonvii2 159805579 LL 98200000 61.45%; 1430 us/it; ETA 1d 00:28; 4445edc0fe724ead 2020-06-10 10:21:10 asr2/radeonvii2 159805579 LL 98300000 61.51%; 1428 us/it; ETA 1d 00:24; bee498ff8b570757 2020-06-10 10:23:33 asr2/radeonvii2 159805579 LL 98400000 61.57%; 1428 us/it; ETA 1d 00:21; 4c059943f2b79342 2020-06-10 10:25:56 asr2/radeonvii2 159805579 LL 98500000 61.64%; 1427 us/it; ETA 1d 00:18; d25596f6693a56b8 2020-06-10 10:28:19 asr2/radeonvii2 159805579 LL 98600000 61.70%; 1427 us/it; ETA 1d 00:15; 050b480356e38d20 2020-06-10 10:30:41 asr2/radeonvii2 159805579 LL 98700000 61.76%; 1427 us/it; ETA 1d 00:13; c14411b25adef25a 2020-06-10 10:30:41 asr2/radeonvii2 159805579 OK 98500000 (jacobi == -1) 2020-06-10 10:33:04 
asr2/radeonvii2 159805579 LL 98800000 61.83%; 1429 us/it; ETA 1d 00:13; 143c9b9af02178b7 2020-06-10 10:35:27 asr2/radeonvii2 159805579 LL 98900000 61.89%; 1427 us/it; ETA 1d 00:08; 42b5d402613d85a6 2020-06-10 10:37:50 asr2/radeonvii2 159805579 LL 99000000 61.95%; 1427 us/it; ETA 1d 00:06; e94b9826e8127fcb 2020-06-10 10:40:12 asr2/radeonvii2 159805579 LL 99100000 62.01%; 1427 us/it; ETA 1d 00:03; 5959abbdfdeb1ce5 2020-06-10 10:42:35 asr2/radeonvii2 159805579 LL 99200000 62.08%; 1427 us/it; ETA 1d 00:02; cdb14c3fdfc6efb3 2020-06-10 10:42:35 asr2/radeonvii2 159805579 OK 99000000 (jacobi == -1) 2020-06-10 10:44:58 asr2/radeonvii2 159805579 LL 99300000 62.14%; 1430 us/it; ETA 1d 00:02; c4b27f268b46fbb9 2020-06-10 10:47:21 asr2/radeonvii2 159805579 LL 99400000 62.20%; 1428 us/it; ETA 0d 23:57; 3c64f72339788f86 2020-06-10 10:49:43 asr2/radeonvii2 159805579 LL 99500000 62.26%; 1428 us/it; ETA 0d 23:55; ba60a838de80ef50 2020-06-10 10:52:06 asr2/radeonvii2 159805579 LL 99600000 62.33%; 1427 us/it; ETA 0d 23:52; b867fe97fc92271e 2020-06-10 10:54:29 asr2/radeonvii2 159805579 LL 99700000 62.39%; 1428 us/it; ETA 0d 23:50; 5c82c2dec2aa9484 2020-06-10 10:54:29 asr2/radeonvii2 159805579 OK 99500000 (jacobi == -1) ...[/CODE] |
[QUOTE=Xyzzy;547627]Ken:
Having read this thread from start to finish recently, it is very apparent that your dogged dependence on Windows is causing you a lot of trouble and work. This thread feels dominated by these issues which makes it incredibly difficult to find useful information for the 99% of us that use Linux. We think that you should create a separate thread for Windows+gpuowl problems. To be blunt, your refusal to learn something new (Linux) is irrational. Linux costs no money to deploy. Linux is simpler to manage. Linux drivers for AMD cards are dramatically better. Remote administration via the command line is ridiculously simple. The developer of gpuowl chose to use the Linux toolchain and environment for a very good reason! You have countless extremely-verbose posts complaining about blizzard-like compile-time errors, obscure compiler incompatibilities, driver failures, remote viewer problems, GPU-Z not displaying right, hung processes and all sorts of bizarre things. In fact, we think your approach is rude and it reeks of entitlement. The developer has a finite amount of time to work on this project, from which he earns no money. This thread is the primary communication medium for his project. You expect him to read through your posts and possibly suggest ideas to help, knowing that his time is extremely limited and that your usage is highly unconventional. Rather than changing your workflow to accommodate him, you expect him to change his workflow to accommodate you. You are obviously a smart man and certainly much smarter than we are. If we can run Linux successfully it will be trivial for you to figure out. In the end, you will be more productive. And a lot happier! Please note that we are not attacking you as a person. We are questioning your behavior. We can only hope that if we step out of line in any area of our life that you all would be kind enough to point that out to us. What makes us strong in the end is working together for the common good. 
:mike:[/QUOTE]Your post quoted above occurs to me as intemperate and unjustified, containing unsupported assumptions and conclusions. A PM may have been more appropriate. But hey, you launched and run the forum and certainly are entitled to an opinion.

I spend several hours daily working to forward the project. My tests, purchases, posts and blog are intended to assist, not annoy. Questions I answer free the excellent programmers to focus on code. Problem reports by various people help the program improve. Mihai with the help of others has expanded it to support NVIDIA as well as AMD, on Windows as well as Linux (and macOS too, I think). Kracker and I post Windows builds periodically as a convenience to others. When I do, I post the build log as a separate attachment so they can see how it was created. If it fails to build, again, the build log is an attachment, so that if Mihai chooses to try to fix the problem, the full information is there to refer to, but it doesn't clutter the thread with lengthy inline text.

I'm skeptical of the 99% Linux usage of gpuowl. Could you point me to poll results? Linux is clearly popular with some people, and operates a lot of important systems worldwide, but some users don't have the option of switching to it, or switching fully, for various reasons: a business owns the hardware and they haven't administrative access or permission to alter the system, or the significant other would not accept it on a home system that's shared.

I've taken many runs at learning Linux, beginning with Slackware in 1996 and Red Hat in 1999, then Debian (multiple versions), Ubuntu 18 and 19, and WSL, but it has always occurred to me as analogous to being dropped into a foreign country blindfolded with a mission but no map, dictionary, translator, or knowledge of where to find the tools. Any attempt to do anything nontrivial degenerates into lengthy recursive web searches for what to do, with what, to what, how, etc. 
It is for me at least an order of magnitude slower to do almost anything in Linux that I can do easily in Windows. No doubt for some it's the reverse: Linux is familiar and Windows is not. I'm at an age where what I call the "teflon neuron" effect holds; new learning occurs slowly, and evaporates quickly. (Hence writing so much down, and a huge blog.) The trivial cost of a Windows license (~$2/license key online) is essentially free in comparison. Nevertheless: [QUOTE=kriesel;547613] Longer term plan is to install Ubuntu on a different HD on this hardware for a head to head comparison.[/QUOTE]

Ernst, an accomplished programmer and Linux user, is having trouble getting gpuowl to run on his NUC. Linux is not a panacea; no OS is. George Woltman develops for and releases mprime/prime95 on many platforms, including Windows, for good reason. That's where many users and potential users are. He's also stated that gpuowl on Radeon VII has such good performance that it is no longer worthwhile to buy and feed CPUs with electricity to do primality tests.

I don't think it reasonable to expect all Windows GIMPS participants to switch to Linux, nor all GPU owners who'd like to run gpuowl. Per Madpoo a while back, all GIMPS Mersenne prime discoveries to date have been made on Windows. There are a lot of gaming PCs out there with good GPUs, and a lot of them have Windows installed. Gpuowl is too good to leave as a niche program (one OS, one GPU vendor). 
Windows interest in gpuowl was immediate and Mihai's early response was supportive: [url]https://www.mersenneforum.org/showpost.php?p=457059&postcount=4[/url] Perhaps the gpuowl program has become successful and important enough to justify its own subforum, with perhaps a managed modest sized number of threads, something like: [LIST][*]Code development (informal discussion thread)[*]Announcements (Mihai or perhaps George post here, such as for a new version or significant feature addition, locked thread suggested)[*]Windows-specific (builds get posted here; OS-specific bug reports; open for peer support solving problems) or maybe that's two threads[*]Linux-specific (maybe builds get posted here; OS-specific bug reports if any; open for peer support solving problems) or maybe that's two threads. Rarely, there are requests on the forum for linux builds to download.[*]Bug reports (that seem generic, not OS-specific; open for peer support solving problems)[*]Feature requests[*]More?[/LIST]Since gpuowl is still primarily Mihai's baby, he probably should have a say in its subforum's design if it goes that way. What he would find useful and efficient is a consideration. The preceding should not be mistaken for volunteering to run such a subforum, nor expecting Mihai to. |
[QUOTE=kriesel;547642]Your post quoted above occurs to me as intemperate and unjustified, containing unsupported assumptions and conclusions. A PM may have been more appropriate. But hey, you launched and run the forum and certainly are entitled to an opinion.
.[/QUOTE] Don't worry, Linux only had a market share of 1.71% in March 2020, while Windows had a 77.1% share. Your gpuowl builds for Windows are very helpful for me, at least I haven't had a kernel panic under Windows 10 Pro till now. |
[QUOTE=preda;547515]Yes, can you please install GMP 6.1 or 6.2? It seems gcd() was added to the c++ wrapper after 6.0 (a hypothesis).[/QUOTE]GMP 6.1.2 works.
|
[QUOTE=moebius;547645]Don't worry, Linux only had a market share of 1.71% in March 2020, while Windows had a 77.1% share. Your gpuowl builds for Windows are very helpful for me, at least I haven't had a kernel panic under Windows 10 Pro till now.[/QUOTE]
The relative 'market shares' among gpuowl users (both current and want-to-be) are what matter. Perhaps we need separate "gpuowl linux build issues" and "gpuowl windows build issues" threads specifically for build & run issues on those two OS flavors? This current thread could remain for general program development. It seems most of the recent thread content has been dominated by one specific build issue after another. |
[QUOTE=preda;547310]Anyway, by default P-1 will assume it runs in single-process and attempt to allocate "all it can" for itself, but no more than 16GB. When running 2-process you're supposed to use the -maxAlloc to indicate how much memory *you* allocate to each process. (e.g.: -maxAlloc 7500 for about 7.5G limit).
BTW, the processes were not idling, they were just running extremely slowly because the buffers were on host memory instead of GPU (because that's a wise ROCm choice, oh yes).[/QUOTE]Our card has 8GB. It "hung" last night. We will now use [C]maxAlloc=4096[/C] or something like that. We think that the default memory allocation is a bit too aggressive. "HUNG" SPEED[CODE]2020-06-11 03:09:35 gfx1012-0 108928049 P1 1442134 100.00%; 4394 us/it; ETA 0d 00:00; a570b7758edb17d9 2020-06-11 03:09:35 gfx1012-0 108928049 P2 using blocks [33 - 999] to cover 1476003 primes 2020-06-11 03:09:35 gfx1012-0 108928049 P2 using 146 buffers of 48.0 MB each 2020-06-11 03:10:10 gfx1012-0 108928049 P1 GCD: no factor 2020-06-11 03:58:02 gfx1012-0 108928049 P2 146/2880: 75657 primes; setup 33.47 s, 37.973 ms/prime 2020-06-11 04:46:00 gfx1012-0 108928049 P2 292/2880: 74852 primes; setup 33.56 s, 37.993 ms/prime 2020-06-11 05:34:02 gfx1012-0 108928049 P2 438/2880: 74939 primes; setup 33.41 s, 38.008 ms/prime 2020-06-11 06:22:01 gfx1012-0 108928049 P2 584/2880: 74616 primes; setup 33.58 s, 38.134 ms/prime 2020-06-11 07:11:03 gfx1012-0 108928049 P2 730/2880: 74939 primes; setup 33.73 s, 38.810 ms/prime 2020-06-11 07:59:17 gfx1012-0 108928049 P2 876/2880: 74852 primes; setup 33.55 s, 38.210 ms/prime 2020-06-11 08:47:13 gfx1012-0 108928049 P2 1022/2880: 74774 primes; setup 33.84 s, 38.012 ms/prime 2020-06-11 09:35:15 gfx1012-0 108928049 P2 1168/2880: 74947 primes; setup 33.49 s, 38.011 ms/prime 2020-06-11 10:23:37 gfx1012-0 108928049 P2 1314/2880: 74700 primes; setup 33.41 s, 38.398 ms/prime 2020-06-11 11:11:58 gfx1012-0 108928049 P2 1460/2880: 74931 primes; setup 33.49 s, 38.271 ms/prime[/CODE]REGULAR SPEED[CODE]2020-06-11 12:22:36 gfx1012-0 108928049 P2 using blocks [33 - 999] to cover 1476003 primes 2020-06-11 12:22:36 gfx1012-0 108928049 P2 using 67 buffers of 48.0 MB each 2020-06-11 12:25:24 gfx1012-0 108928049 P2 1527/2880: 34330 primes; setup 1.64 s, 4.812 ms/prime 2020-06-11 12:28:09 
gfx1012-0 108928049 P2 1594/2880: 34382 primes; setup 1.63 s, 4.751 ms/prime 2020-06-11 12:30:53 gfx1012-0 108928049 P2 1661/2880: 34222 primes; setup 1.67 s, 4.752 ms/prime 2020-06-11 12:33:38 gfx1012-0 108928049 P2 1728/2880: 34427 primes; setup 1.68 s, 4.754 ms/prime 2020-06-11 12:36:23 gfx1012-0 108928049 P2 1795/2880: 34351 primes; setup 1.63 s, 4.752 ms/prime 2020-06-11 12:39:07 gfx1012-0 108928049 P2 1862/2880: 34346 primes; setup 1.66 s, 4.738 ms/prime 2020-06-11 12:41:52 gfx1012-0 108928049 P2 1929/2880: 34477 primes; setup 1.68 s, 4.737 ms/prime 2020-06-11 12:44:36 gfx1012-0 108928049 P2 1996/2880: 34187 primes; setup 1.59 s, 4.738 ms/prime 2020-06-11 12:47:20 gfx1012-0 108928049 P2 2063/2880: 34247 primes; setup 1.61 s, 4.738 ms/prime 2020-06-11 12:50:04 gfx1012-0 108928049 P2 2130/2880: 34186 primes; setup 1.69 s, 4.741 ms/prime 2020-06-11 12:52:48 gfx1012-0 108928049 P2 2197/2880: 34376 primes; setup 1.64 s, 4.740 ms/prime 2020-06-11 12:55:33 gfx1012-0 108928049 P2 2264/2880: 34310 primes; setup 1.66 s, 4.740 ms/prime 2020-06-11 12:58:16 gfx1012-0 108928049 P2 2331/2880: 34008 primes; setup 1.64 s, 4.742 ms/prime 2020-06-11 13:01:00 gfx1012-0 108928049 P2 2398/2880: 34314 primes; setup 1.64 s, 4.742 ms/prime 2020-06-11 13:03:43 gfx1012-0 108928049 P2 2465/2880: 34126 primes; setup 1.61 s, 4.739 ms/prime 2020-06-11 13:06:27 gfx1012-0 108928049 P2 2532/2880: 34252 primes; setup 1.68 s, 4.738 ms/prime 2020-06-11 13:09:11 gfx1012-0 108928049 P2 2599/2880: 34253 primes; setup 1.66 s, 4.741 ms/prime 2020-06-11 13:11:55 gfx1012-0 108928049 P2 2666/2880: 34316 primes; setup 1.64 s, 4.739 ms/prime 2020-06-11 13:14:40 gfx1012-0 108928049 P2 2733/2880: 34431 primes; setup 1.64 s, 4.738 ms/prime 2020-06-11 13:17:24 gfx1012-0 108928049 P2 2800/2880: 34264 primes; setup 1.66 s, 4.743 ms/prime 2020-06-11 13:20:09 gfx1012-0 108928049 P2 2867/2880: 34349 primes; setup 1.58 s, 4.738 ms/prime[/CODE]PS - We'd like to speed up the spinner. 
What source file should we poke around in? |
@Mike: Hmm, 146*48/8192 = 0.855, the default run mode was using less than 90% of the card memory. I wonder if different cards have significantly different "system memory" usages by their respective card-management software.
You noted that you were only running 1 program instance per card, so -maxAlloc 4096 should be fine, but if you were running 2 instances, it looks like you'd need to cut that down to around 3000 per job. It might be useful to find out more precisely where your card's max-mem-to-use lies: move the next PFactor line in your worktodo to the top, save, kill the run, and restart with some high value of -maxAlloc; wait until it hits stage 2 to see if it's slow, and if so, kill and restart with a slightly lower -maxAlloc value until the speed comes back to the normal non-swap level. |
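The memory arithmetic being discussed can be sketched in a few lines (the 48 MB per stage-2 buffer and the 8 GB card size are taken from the posted logs; note gpuowl also allocates memory outside the stage-2 buffers, which is presumably why the actual run ended up with 67 buffers rather than the upper bound computed here):

```python
# Illustrative arithmetic from the posted logs, not gpuowl's actual allocator.
buf_mb = 48                  # size of one P-1 stage 2 buffer (from the log)
card_mb = 8 * 1024           # 8 GB card

# Default run: 146 buffers -> fraction of card memory used (~0.855, as in the post)
frac_used = 146 * buf_mb / card_mb

# Upper bound on whole stage-2 buffers fitting under -maxAlloc 4096
max_bufs = 4096 // buf_mb
print(frac_used, max_bufs)
```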
[QUOTE=Xyzzy;547731]PS - We'd like to speed up the spinner. What source file should we poke around in?[/QUOTE]
Gpu.cpp returns stuff for [c]grep spin[/c], but I can't see how to speed it up in that file. |
[QUOTE=ewmayer;547734]Might be useful to find out more precisely where your card's max-mem-to-use lies - you could move the next PFactor line in your worktodo to make it the topmost line, save, kill run and restart with some high value of maxAlloc, wait 'til it hits stage 2 to see if it's slow, if so kill and restart with a slightly lower -maxAlloc value until you see the speed come back to normal non-swap level.[/QUOTE]We need to find a gpu-top program!
We'll probably just do the P-1 work on the CPU. Is there any chance George can add a work-type to Primenet to P-1 test first-run PRP tests before they are issued for PRP testing? :mike: |
[QUOTE=Xyzzy;547747]We'll probably just do the P-1 work on the CPU.
Is there any chance George can add a work-type to Primenet to P-1 test first-run PRP tests before they are issued for PRP testing?[/QUOTE] I thought you said you wanted this new setup to run in set-it-and-forget-it mode ... lacking the Primenet worktype option you mention (which would indeed be useful), running P-1 on the CPU means manually editing all your gpuowl worktodo-file entries to have a trailing '0' before the program gets around to auto-splitting them into PFactor=... and PRP=...,0 pairs, or moving all the PFactor entries resulting from such auto-splitting into your mprime worktodo file. Even with 'just' 4GB of card memory allocated for your runs, the resulting P-1 should be more or less as efficient as it can be. Once the number of stage 2 buffers gets above the 50-100 range, there is really negligible performance gain from using more buffers. But I would be interested in a comparison of P-1 runtime on your CPU vs GPU, both using similar exponents (= same FFT lengths) and stage bounds. Do you have any recent P-1 work captured in your mprime logfile on that system? |
[QUOTE=Xyzzy;547747]We need to find a gpu-top program!
:mike:[/QUOTE] [URL="https://awesomedetect.com/how-to-monitor-amd-ati-or-radeon-gpu-usage-in-linux/"]sudo radeontop[/URL] works for me. |
Somewhere between the Linux kernel, ROCm 3.5.0 and gpuOwl, the creeping kworker CPU-hogger problem is gone; there is no need to stop/start gpuOwl to get rid of it.
|
[QUOTE=paulunderwood;547847]Somewhere between the Linux kernel, ROCm 3.5.0 and gpuOwl there is no longer the creeping kworker CPU-hogger problem; There is no need to stop/start gpuOwl to get rid of it.[/QUOTE]
It's the amdgpu driver (part of the Linux kernel) that has the fix. Most likely ROCm 3.5 has nothing to do with it, as it was fixed for me on ROCm 3.3 by updating the kernel. How do you see the performance of ROCm 3.5 compared to 3.3? |
[QUOTE=preda;547855]It's the amdgpu (that is part of the Linux kernel) that has the fix. Most likely ROCm 3.5 has nothing to do with it, as it was fixed for me on ROCm 3.3 by updating the kernel.
How do you see the performance of ROCm 3.5 compared to 3.3?[/QUOTE] It is [I]slower[/I] with ROCm 3.5 :down: But I have compensated by overclocking the RAM. And that is another thing -- I get fewer GEC errors with ROCm 3.5 with the memory at 1200 -- in fact only one error in the last 6 tests. I am also adjusting sclk (3 or 4) and fans daily, depending on the ambient temperature. I never let the junction temperature go over 95C. 2 instances at 5.5M FFT: sclk 3: 1423 µs/it sclk 4: 1317 µs/it |
[QUOTE=Xyzzy;547747]Is there any chance George can add a work-type to Primenet to P-1 test first-run PRP tests before they are issued for PRP testing?[/QUOTE]Isn't the work type in mprime/Prime95 enough? If not, Chris has a solution.
|
strange error
[CODE]2020-06-13 21:47:51 f582388172fd5d41 104975743 OK 104400000 99.45%; 1321 us/it; ETA 0d 00:13; 70d06a8a5f7db9ce (check 0.67s)
2020-06-13 21:51:17 f582388172fd5d41 104975743 OK 104600000 99.64%; 1025 us/it; ETA 0d 00:06; 21631a5aea1b4537 (check 0.41s) 2020-06-13 21:53:34 f582388172fd5d41 104975743 OK 104800000 99.83%; 684 us/it; ETA 0d 00:02; 35fc341e287f3108 (check 0.42s) 2020-06-13 21:55:34 f582388172fd5d41 CC 104975743 / 104975743, fd81bec8b0e1a661 2020-06-13 21:55:35 f582388172fd5d41 104975743 EE 104976000 100.00%; 684 us/it; ETA 0d 00:00; 5c9491730bed16cb (check 0.37s) 2020-06-13 21:55:35 f582388172fd5d41 104975743 OK 104800000 loaded: blockSize 400, 35fc341e287f3108 2020-06-13 21:56:44 f582388172fd5d41 104975743 OK 104900000 99.93%; 684 us/it; ETA 0d 00:01; 12a1da4dffe7c5d0 (check 0.45s) 1 errors 2020-06-13 21:57:36 f582388172fd5d41 CC 104975743 / 104975743, fd81bec8b0e1a661 [/CODE] I had just stopped instance 2 to let instance 1 catch up so I could have them synchronized. There appears to be no error because the res64s match. What gives? |
[QUOTE=paulunderwood;547905][CODE]2020-06-13 21:47:51 f582388172fd5d41 104975743 OK 104400000 99.45%; 1321 us/it; ETA 0d 00:13; 70d06a8a5f7db9ce (check 0.67s)
2020-06-13 21:51:17 f582388172fd5d41 104975743 OK 104600000 99.64%; 1025 us/it; ETA 0d 00:06; 21631a5aea1b4537 (check 0.41s) 2020-06-13 21:53:34 f582388172fd5d41 104975743 OK 104800000 99.83%; 684 us/it; ETA 0d 00:02; 35fc341e287f3108 (check 0.42s) 2020-06-13 21:55:34 f582388172fd5d41 CC 104975743 / 104975743, fd81bec8b0e1a661 2020-06-13 21:55:35 f582388172fd5d41 104975743 EE 104976000 100.00%; 684 us/it; ETA 0d 00:00; 5c9491730bed16cb (check 0.37s) 2020-06-13 21:55:35 f582388172fd5d41 104975743 OK 104800000 loaded: blockSize 400, 35fc341e287f3108 2020-06-13 21:56:44 f582388172fd5d41 104975743 OK 104900000 99.93%; 684 us/it; ETA 0d 00:01; 12a1da4dffe7c5d0 (check 0.45s) 1 errors 2020-06-13 21:57:36 f582388172fd5d41 CC 104975743 / 104975743, fd81bec8b0e1a661 [/CODE] I had just stopped instance 2 to let instance 1 catch so I could have them synchronized. There appears to be no error because the res64s match. What gives?[/QUOTE] I'm not sure how you could not notice that, but similar things happen to me too sometimes. After detecting the error, gpuOwl goes back to last correct residue and starts again, with doubled error-check frequency (instead of every 200,000 iterations it's 100,000, and if there is another error, 50,000 and so on... You get it.) Because it went back, it's valid now, and that's the good or rather best thing about GEC. (I think Jacobi does a similar thing, but I don't remember seeing it on my computer, perhaps due to low detection rate.) |
We set up a work pool directory today for our two cards. It works great!
Do we have to stop gpuowl to add work to the worktodo.txt file? We tried using a worktodo.add file but nothing has happened yet. (For now we are manually filling up the pool. Maybe later we will get the python thingie running.) :mike: |
[QUOTE=Xyzzy;547909]We set up a work pool directory today for our two cards. It works great!
Do we have to stop gpuowl to add work to the worktodo.txt file? We tried using a worktodo.add file but nothing has happened yet.[/QUOTE] Are you running 1 job per card or 2? As I noted in an edit to my how-to-under-linux thread, even if your particular card gives no better total throughput running 2 jobs, it makes sense to do so as "crash protection insurance" - one of the 2 jobs I run on my Haswell-system's R7 coredumped the other night, if that had been the only instance, I would've lost ~10 hours crunching. You can fiddle worktodo during the run - as long as your current assignment isn't just about to finish (or to complete a p-1 try) and you don't change line 1 of the file, that is safe. |
Last night's before-going-to-bed run check showed that one of the 2 jobs on card2 of my 3-R7 system had aborted due to repeated errors:
[code]2020-06-13 17:46:50 412688e172fd62d9 105965933 OK 87200000 82.29%; 1390 us/it; ETA 0d 07:15; 3bd02b5e45382bcc (check 0.84s) 2020-06-13 17:51:28 412688e172fd62d9 105965933 EE 87400000 82.48%; 1386 us/it; ETA 0d 07:09; 083b13bb609c0724 (check 0.79s) 2020-06-13 17:51:29 412688e172fd62d9 105965933 OK 87200000 loaded: blockSize 400, 3bd02b5e45382bcc 2020-06-13 17:53:49 412688e172fd62d9 105965933 OK 87300000 82.38%; 1389 us/it; ETA 0d 07:12; 2761f6451aecc6dc (check 0.79s) 1 errors 2020-06-13 17:56:08 412688e172fd62d9 105965933 EE 87400000 82.48%; 1386 us/it; ETA 0d 07:09; 083b13bb609c0724 (check 0.74s) 1 errors 2020-06-13 17:56:09 412688e172fd62d9 105965933 OK 87300000 loaded: blockSize 400, 2761f6451aecc6dc 2020-06-13 17:57:19 412688e172fd62d9 105965933 OK 87350000 82.43%; 1388 us/it; ETA 0d 07:11; 4d1b42bc1dbcf1e1 (check 0.75s) 2 errors 2020-06-13 17:58:29 412688e172fd62d9 105965933 EE 87400000 82.48%; 1392 us/it; ETA 0d 07:11; 083b13bb609c0724 (check 0.79s) 2 errors 2020-06-13 17:58:30 412688e172fd62d9 105965933 OK 87350000 loaded: blockSize 400, 4d1b42bc1dbcf1e1 2020-06-13 17:59:40 412688e172fd62d9 105965933 EE 87400000 82.48%; 1389 us/it; ETA 0d 07:10; 083b13bb609c0724 (check 0.77s) 3 errors 2020-06-13 17:59:41 412688e172fd62d9 105965933 OK 87350000 loaded: blockSize 400, 4d1b42bc1dbcf1e1 2020-06-13 18:00:51 412688e172fd62d9 105965933 EE 87400000 82.48%; 1385 us/it; ETA 0d 07:09; 083b13bb609c0724 (check 0.81s) 4 errors 2020-06-13 18:00:51 412688e172fd62d9 3 sequential errors, will stop. 2020-06-13 18:00:51 412688e172fd62d9 Exiting because "too many errors" 2020-06-13 18:00:51 412688e172fd62d9 Bye[/code] Attempted restart of run hit same issue ... possibly the GEC residue got corrupted somehow? Card has been running at [linux, ROCm] sclk=3, mclk=1150MHz very stably for last month, temps were fine, and other job running on same card suffered no issues. 
By way of temporary workaround, I moved the above assignment to the bottom of the worktodo file and restarted; no issues with the next assignment. Should I try copying 105965933-old.owl to 105965933.owl and restarting? |
[QUOTE=ewmayer;547992]Last night's before-going-to-bed run check showed that one of the 2 jobs on card2 of my 3-R7 system had aborted due to repeated errors:
Should I try copying 105965933-old.owl to 105965933.owl and restarting?[/QUOTE] Please try building from the latest source. A week or so ago I checked in changes that target a 0.5% chance of roundoff failure during the very top end of the FFT where all accuracy options are turned on. It also targets a 0.1% probability of error if only some or none of the accuracy options are turned on. The previous version was not as good at keeping the failure probability that low. P.S. We've talked about adding a command line option that lets you be more aggressive or more conservative. Not yet implemented. |
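To make the arithmetic behind such targets concrete - this is my own sketch of the standard independent-iterations model, not George's actual tuning code - if one squaring produces an excessive roundoff with probability q, a test of n iterations fails with probability 1 - (1 - q)^n, and a target failure rate pins down the largest tolerable per-iteration q:

```python
# Probability model sketch: per-iteration ROE probability q, n
# iterations assumed independent. expm1/log1p keep full precision
# when q is tiny (here on the order of 1e-11).
import math

def per_test_failure_prob(q_per_iter, n_iters):
    """P(at least one ROE over n_iters iterations)."""
    return -math.expm1(n_iters * math.log1p(-q_per_iter))

def max_per_iter_prob(target, n_iters):
    """Largest per-iteration q keeping P(at least one ROE) <= target."""
    return -math.expm1(math.log1p(-target) / n_iters)
```

For a 0.5% per-test target over a ~100M-iteration PRP, the per-iteration ROE probability must be held around 5e-11, which is why tiny shifts in the FFT breakover points matter so much.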
[QUOTE=Prime95;547997]Please try building from the latest source. A week or so ago I checked in changes that target a 0.5% chance of roundoff failure during the very top end of the FFT where all accuracy options are turned on.[/QUOTE]
Ah yes, I see the expo in question is very near the upper limit of what we can expect to run at 5.5M FFT - I had run multiple PRPs of expos in that range, some even larger, without issue, so hearing it was likely due to an ROE was unexpected. Of course the vagueness of that 'EE' error code emitted by the program does not help in that regard. Retry with the new build (and worktodo refiddled to restore the problematic assignment to top) looks good, but on the "mysteries of the ROCm code management engine" front: previously the 2 jobs were getting ~1365 us/iter each, but now job1 (still using the older build) is running at 1710 us/iter whereas new-build-using job2 is at 1160 us/iter. Total throughput is more or less the same, though. Oh, here is the list of all the expos I've run in the 105M+ range on that system - the problematic one is starred; you can see there are multiple larger ones which caused no problem using the same build: [code]105615283 105712007 105809183 105809189 105809299 105809351 105810461 105810857 105810857 105810979 105813713 105813853 105813859 105813941 105815543 105815581 105815627 105840467 105843821 105844853 105846053 105892097 105892109 105892159 105892211 105892307 105892399 105892459 105892693 105893233 105893321 105894211 105894319 105894329 105894947 105895001 105895157 105895297 105897229 105897839 105900511 105900539 105900703 105900797 105904387 105904529 105904739 105949913 105950011 105956441 105959179 105963503 105964751 105965887 105965933* 105969967 105973331 105980201 105981719 105981937 105984089 105984521 105987407 105987709 105989249 105991979 105992911 105995317 105997069 105997247 105998617[/code] This would appear to confirm my long-running stance that it is unreasonable to expect ROEs to behave in perfectly monotone fashion with exponent at a given FFT size, so we should set our breakover points as aggressively as reasonably possible, but build in mechanisms to gracefully handle such high-ROE cases which (as determined by the GEC) are not due to data
corruption. Various internal flags to ratchet up the convolution floating-point accuracy at a given FFT length are the obvious first line of defense, but a simple "if said flags are already at their most-accurate settings, and we still hit a dangerously high ROE, just switch to the next-larger FFT length and complete the current run at that" is the obvious last line of defense in my view. (I do get the "using such dodges disincentivizes us coders from working to our utmost to rein in ROEs" moral-hazard aspect of the issue, though.) [b]Edit:[/b] Oh - you mention an as-yet-unimplemented CL flag ... you know what would be really useful? Enhanced error reporting that informs the user whether that 'EE' was due to simple GEC failure or dangerous-ROE-detected. Say I still hit an ROE-related failure using the current build, and there is no new build promising more accuracy. If I could discern from the log that ROEs were behind the abort, I could simply restart and force a slightly higher FFT length, either for the rest of the run or just the next few checkpoints. In the case I hit, it got through ~80% of the run before hitting the error, based on the above all-exponents-in-this-range data said error was clearly an outlier, so simply running for a few minutes ~6M FFT, then reverting to default would have worked around the problem. I'll probably make that my SOP going forward since I have a lot of familiarity with max-expo/FFT-length numbers, but your average user will have no clue. |
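The two-tier fallback proposed above (accuracy flags first, larger FFT only as a last resort) amounts to a small state machine. A minimal sketch; the ladder values and flag count are purely illustrative, not gpuowl's real configuration:

```python
# Hypothetical escalation policy after a high-ROE event: first ratchet
# up the accuracy options at the current FFT length, then move to the
# next-larger FFT length, then give up.

FFT_LENGTHS = [5_000_000, 5_500_000, 6_000_000]   # illustrative ladder
ACCURACY_LEVELS = 3                               # illustrative flag count

def escalate(fft_idx, accuracy):
    """Return the next (fft_idx, accuracy) pair after a high-ROE event."""
    if accuracy + 1 < ACCURACY_LEVELS:
        return fft_idx, accuracy + 1      # first line of defense
    if fft_idx + 1 < len(FFT_LENGTHS):
        return fft_idx + 1, 0             # last line: larger FFT
    raise RuntimeError("no safer configuration left")
```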
[QUOTE=ewmayer;547999]This would appear to confirm my long-running stance that it is unreasonable to expect ROEs to behave in perfectly monotone fashion with exponent at a given FFT size, so we should set our breakover points as aggressively as reasonably possible, but build in mechanisms to gracefully handle such high-ROE cases which (as determined by the GEC) are not due to data corruption. Various internal flags to ratchet up the convolution floating-point accuracy at a given FFT length are the obvious first line of defense, but a simple "if said flags are already at their most-accurate settings, and we still hit a dangerously high ROE, just switch to the next-larger FFT length and complete the current run at that" is the obvious last line of defense in my view.
(I do get the "using such dodges disincentivizes us coders from working to our utmost to rein in ROEs" moral-hazard aspect of the issue, though.)[/QUOTE] The probability of an excessive roundoff in p iterations or less is what matters. If the probability of an ROE in 10p iterations is low, our chances of completion are pretty good. I did a lengthy study long ago, on gpuowl v1.9's -fft M61 -size 4M, of what the exponent limit could be. See the attachment and description at [URL]https://www.mersenneforum.org/showpost.php?p=498231&postcount=8[/URL]. If you'd like some end-user/tester participation in finding or testing the limits for various fft lengths, please share. CUDALucas reports any fft length changes it makes due to RO values in its console output, either increases due to high RO values, or decreases after a stretch of low RO values at the increased fft length. Gpuowl could report its changes somewhere too. Then that could be gathered and fed back into program updates. We users trust you to push the exponent limit per fft length within reason. No miracles expected. |
[QUOTE=ewmayer;547999]you know what would be really useful? Enhanced error reporting that informs the user whether that 'EE' was due to simple GEC failure or dangerous-ROE-detected.[/QUOTE]
gpuowl does not know if the error is ROE related -- all it knows is that GEC failed. You can try running with "-use STATS" to see what the ROE is once an EE occurs. Unlike CPUs, determining ROE seems a bit expensive, so gpuowl never turns that on by default. I do not know if you can switch to a 6M length from a 5.5M savefile and vice versa. Mihai needs to weigh in. |
@George:
pretty sure savefiles are FFT-length-independent ... I played with some force-lower-than-default-FFT-length tries a couple months ago for some expos just above the then-5M/5.5M threshold. At least one made it to iter 1M before hitting repeatable roundoff errors, and I was able to resume at the default 5.5M FFT length w/o issues. Re. -use STATS, useful to know, but again all the onus is on the user. How about the following simple scheme? If a run hits a GEC failure, the program automatically enables -use STATS for the retry-from-last-good-GEC-checkpoint interval. Then, 1 of 2 things happens: 1. Retry fails repeatably: in this case the program aborts, but the user has some hopefully-useful ROE data to go on, to see if resuming at the next-larger FFT length is likely to be useful. 2. Retry succeeds: in this case the program automatically shuts off -use STATS, and all that has been lost is a tiny bit of runtime running the retry interval in slower ROE-data-gathering mode. |
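The scheme in the post above could look roughly like this - entirely hypothetical, nothing here reflects gpuowl's real internals; `run_interval` stands in for re-running one checkpoint interval with or without ROE statistics:

```python
# Hypothetical sketch of the auto "-use STATS" retry idea: after a GEC
# failure, redo the interval with ROE statistics collection enabled.
# If it keeps failing, hand the ROE data to the user; if it succeeds,
# quietly return to fast (no-stats) mode.

def retry_with_stats(run_interval, max_retries=3):
    """run_interval(collect_stats) -> (ok, roe_stats)."""
    roe = None
    for _ in range(max_retries):
        ok, roe = run_interval(collect_stats=True)
        if ok:
            return True, None    # stats mode can be switched off again
    return False, roe            # repeatable failure: report ROE data
```

The appeal is exactly as described: in the common case (outcome 2) the only cost is a slightly slower retry interval, while in the rare case (outcome 1) the user finally gets data distinguishing ROE from other GEC causes.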
[QUOTE=Prime95;547997]P.S. We've talked about adding a command line option that lets you be more aggressive or more conservative.[/QUOTE]Please add this!
:mike: |