gpuOwl Windows setup for Radeon VII (https://www.mersenneforum.org/showthread.php?t=24938)

kriesel 2019-11-17 20:47

[QUOTE=Prime95;530850]I think P-1 uses the same code. If I'm right, P-1 should work too.[/QUOTE]The first test I ran on your new version was the 24M P-1 test, which completed both stages with no logged errors and found a known factor, as it should. Bigger P-1 tests will follow. [CODE]gpuowl-win -user kriesel -cpu roa/radeonvii -use FMA_X2 -device 1 -maxAlloc 15000[/CODE]Interestingly, that uses over 16000 MB of GPU RAM.

Prime95 2019-11-17 20:51

Rebuilt with proper version info. Please use this build so that the proper version info is reported to PrimeNet.

[url]https://www.dropbox.com/s/7g01ossm3bu28kx/gpuowl-win.exe.zip?dl=0[/url]

xx005fs 2019-11-17 21:29

[QUOTE=Prime95;530852]How do you do that? Using Wattman all I can see to change is the memory clock.[/QUOTE]

I use AMDMemoryTweak, which I got from GitHub ([url]https://github.com/Eliovp/amdmemorytweak[/url]). There are a lot of variables in the memory timing table. I think it would help to search for Radeon VII XMR mining timings, since I don't own one and can't test whether timings similar to those for Vega 64 apply. The program can also tweak typical Wattman settings such as core clk, mem clk, core vid, mem vid and such.

Prime95 2019-11-18 03:32

Performance note: I've found that -use ORIG_X2 performs better than -use FMA_X2. For me it is 963 us vs 975 us.

This makes sense, as the alternate X2 implementations were written to get around a ROCm bug that generates poor code. Windows does not use ROCm to compile OpenCL.

kriesel 2019-11-18 04:32

[QUOTE=kriesel;530859]Bigger P-1 tests will follow.[/QUOTE]50M found expected factor; 100M ran normally, 150M stage 2 gcd finished after some hiccups and restarts. 200M though:[CODE]...
2019-11-17 21:23:01 200001103 P1 1880000 82.47%; 2254 us/sq; ETA 0d 00:15; 4ac74359de6d14b6
2019-11-17 21:23:24 200001103 P1 1890000 82.91%; 2253 us/sq; ETA 0d 00:15; ced92ca6e7f121ed
2019-11-17 21:23:46 200001103 P1 1900000 83.35%; 2189 us/sq; ETA 0d 00:14; 0000000000000000
2019-11-17 21:24:08 200001103 P1 1910000 83.79%; 2161 us/sq; ETA 0d 00:13; 0000000000000000
...[/CODE]The run continues iterating with all-zero residues. Zero and one are long-known error-condition res64 values for CUDAPm1 (see bug and wish list item # 15), as are certain values for CUDALucas (item # 4) and other applications. Apparently gpuowl does not check for them or recover from that condition.
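A minimal sketch of the kind of check described above, written as an external Python script rather than as part of gpuowl. It scans stage 1 progress lines in the format shown in the log and flags any res64 equal to zero or one. The regular expression, the function name `suspicious_lines`, and the set of suspect values are assumptions made for illustration; this is not how gpuowl itself handles residues.

```python
import re

# Matches stage 1 progress lines such as:
# 2019-11-17 21:23:46 200001103 P1 1900000 83.35%; 2189 us/sq; ETA 0d 00:14; 0000000000000000
PROGRESS = re.compile(
    r"^\S+ \S+ (?P<exponent>\d+) P1\s+(?P<iteration>\d+) .*; (?P<res64>[0-9a-f]{16})\s*$"
)

# Assumption: treat res64 values 0 and 1 as error indicators,
# following the CUDAPm1 experience mentioned in the post.
SUSPECT_VALUES = {0, 1}


def suspicious_lines(log_lines):
    """Return (exponent, iteration, res64) for every progress line whose res64 is suspect."""
    hits = []
    for line in log_lines:
        match = PROGRESS.match(line)
        if match and int(match.group("res64"), 16) in SUSPECT_VALUES:
            hits.append(
                (
                    int(match.group("exponent")),
                    int(match.group("iteration")),
                    match.group("res64"),
                )
            )
    return hits


sample = [
    "2019-11-17 21:23:24 200001103 P1 1890000 82.91%; 2253 us/sq; ETA 0d 00:15; ced92ca6e7f121ed",
    "2019-11-17 21:23:46 200001103 P1 1900000 83.35%; 2189 us/sq; ETA 0d 00:14; 0000000000000000",
    "2019-11-17 21:24:08 200001103 P1 1910000 83.79%; 2161 us/sq; ETA 0d 00:13; 0000000000000000",
]

for exponent, iteration, res64 in suspicious_lines(sample):
    print(f"{exponent}: suspect res64 {res64} at iteration {iteration}")
```

Run against a saved log, a script like this would only report the problem after the fact; stopping the run at the first hit would still have to be done by hand.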

kriesel 2019-11-18 12:13

The zero res64 is concealed in stage 2 because of the output format.
[CODE]2019-11-17 23:44:22 200001103 P2 2754/2880: setup 2263 ms; 3810 us/prime, 89616 primes
2019-11-17 23:47:27 200001103 P2 2880/2880: setup 1978 ms; 2482 us/prime, 73759 primes
2019-11-17 23:47:37 waiting for background GCDs..
2019-11-17 23:47:37 300001133 FFT 18432K: Width 256x4, Height 256x4, Middle 9; 15.89 bits/word
2019-11-17 23:47:37 OpenCL args "-DEXP=300001133u -DWIDTH=1024u -DSMALL_HEIGHT=1024u -DMIDDLE=9u -DWEIGHT_STEP=0x8.9b2caa0a7102p-3 -DIWEIGHT_STEP=0xe.df8276d3383b8p-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -DFMA_X2=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
Assertion failed: mpz_cmp_ui(b, 0), file GmpUtil.cpp, line 25
Sun 11/17/2019 23:47:42.06 C:\Users\ken\Documents\gpuowl-gw-patch>[/CODE]The GMP code asserts on the zero value. Gpuowl then removes the worktodo entry as if it had been completed, although it had not.

kriesel 2019-11-18 13:37

Not reproducible
 
The 200M zero res64 was not reproduced in a second run of the same exponent and bounds (same worktodo line). At least, not in stage one, which is complete through the gcd.

kriesel 2019-11-18 18:11

300M P-1 switched to and from and back to zero residues
 
[CODE]2019-11-18 08:48:38 300001133 FFT 18432K: Width 256x4, Height 256x4, Middle 9; 15.89 bits/word
2019-11-18 08:48:38 OpenCL args "-DEXP=300001133u -DWIDTH=1024u -DSMALL_HEIGHT=1024u -DMIDDLE=9u -DWEIGHT_STEP=0x8.9b2caa0a7102p-3 -DIWEIGHT_STEP=0xe.df8276d3383b8p-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -DFMA_X2=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-11-18 08:48:43 OpenCL compilation in 4759 ms
2019-11-18 08:48:49 300001133 P1 B1=2410000, B2=55430000; 3476394 bits; starting at 0
2019-11-18 08:49:29 300001133 P1 10000 0.29%; 3915 us/sq; ETA 0d 03:46; ff6497ab4324ddc7
2019-11-18 08:50:08 300001133 P1 20000 0.58%; 3975 us/sq; ETA 0d 03:49; 6152873e4a9521be
2019-11-18 08:50:48 300001133 P1 30000 0.86%; 3978 us/sq; ETA 0d 03:49; 1b1e5e0242383bc3
2019-11-18 08:51:28 300001133 P1 40000 1.15%; 3975 us/sq; ETA 0d 03:48; 310b1a288109077b
2019-11-18 08:52:08 300001133 P1 50000 1.44%; 3977 us/sq; ETA 0d 03:47; 730bb98f460ccb45
2019-11-18 08:52:22 200001103 P2 GCD: no factor
2019-11-18 08:52:22 {"exponent":"200001103", "worktype":"PM1", "status":"NF", "program":{"name":"gpuowl", "version":""}, ...
2019-11-18 08:52:48 300001133 P1 60000 1.73%; 3982 us/sq; ETA 0d 03:47; 6f784693e276fb51
2019-11-18 08:53:28 300001133 P1 70000 2.01%; 3984 us/sq; ETA 0d 03:46; e5688d0931ac386f
2019-11-18 08:54:08 300001133 P1 80000 2.30%; 4048 us/sq; ETA 0d 03:49; ac6f4afd8e652ee9
2019-11-18 08:54:48 300001133 P1 90000 2.59%; 3974 us/sq; ETA 0d 03:44; f39c63b52c69dd19
2019-11-18 08:55:28 300001133 P1 100000 2.88%; 3967 us/sq; ETA 0d 03:43; 2587d5fe7390721b
2019-11-18 08:56:08 300001133 P1 110000 3.16%; 3960 us/sq; ETA 0d 03:42; b74a4b143645fb43
2019-11-18 08:56:46 300001133 P1 120000 3.45%; 3814 us/sq; ETA 0d 03:33; 0000000000000000
2019-11-18 08:57:24 300001133 P1 130000 3.74%; 3804 us/sq; ETA 0d 03:32; 0000000000000000
...
2019-11-18 10:17:25 300001133 P1 1390000 39.98%; 3791 us/sq; ETA 0d 02:12; 0000000000000000
2019-11-18 10:18:03 300001133 P1 1400000 40.27%; 3788 us/sq; ETA 0d 02:11; 0000000000000000
2019-11-18 10:18:41 300001133 P1 1410000 40.56%; 3790 us/sq; ETA 0d 02:11; 0000000000000000
2019-11-18 10:19:19 300001133 P1 1420000 40.85%; 3871 us/sq; ETA 0d 02:13; 3d48fe12c99912e8
2019-11-18 10:19:59 300001133 P1 1430000 41.13%; 3973 us/sq; ETA 0d 02:16; cc3d64ce658c4ac6
2019-11-18 10:20:39 300001133 P1 1440000 41.42%; 3964 us/sq; ETA 0d 02:15; 938dc157b422bbb7
...
2019-11-18 11:46:59 300001133 P1 2740000 78.82%; 3963 us/sq; ETA 0d 00:49; fba2a25560522bf2
2019-11-18 11:47:39 300001133 P1 2750000 79.10%; 3963 us/sq; ETA 0d 00:48; 7dbd115477419dfa
2019-11-18 11:48:18 300001133 P1 2760000 79.39%; 3969 us/sq; ETA 0d 00:47; bddd2324c54bb16a
2019-11-18 11:48:57 300001133 P1 2770000 79.68%; 3842 us/sq; ETA 0d 00:45; 0000000000000000
2019-11-18 11:49:36 300001133 P1 2780000 79.97%; 3854 us/sq; ETA 0d 00:45; 0000000000000000
2019-11-18 11:50:14 300001133 P1 2790000 80.26%; 3793 us/sq; ETA 0d 00:43; 0000000000000000
...
2019-11-18 11:58:28 300001133 P1 2920000 84.00%; 3787 us/sq; ETA 0d 00:35; 0000000000000000
2019-11-18 11:59:06 300001133 P1 2930000 84.28%; 3790 us/sq; ETA 0d 00:35; 0000000000000000
2019-11-18 11:59:39 Stopping, please wait..
2019-11-18 11:59:40 Exiting because "stop requested"
2019-11-18 11:59:40 waiting for background GCDs..
2019-11-18 11:59:40 Bye[/CODE]Run abandoned, retrying from start, 3 hours lost.
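The FFT line at the top of this log can be checked by hand. A short sketch, assuming the FFT size in words is width times height times middle times two (two real words per complex element), and that bits/word is the exponent divided by that size. The helper name `fft_words` is hypothetical; the inputs are the WIDTH, SMALL_HEIGHT and MIDDLE values from the OpenCL args line.

```python
def fft_words(width: int, height: int, middle: int) -> int:
    # Assumption: each complex FFT element holds two real words.
    return width * height * middle * 2


exponent = 300001133                # from -DEXP=300001133u
words = fft_words(1024, 1024, 9)    # from -DWIDTH, -DSMALL_HEIGHT, -DMIDDLE

print(words // 1024)                # FFT size in K words
print(round(exponent / words, 2))   # average bits stored per word
```

Under these assumptions the two values agree with the "FFT 18432K" and "15.89 bits/word" figures reported in the log.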

preda 2019-11-18 21:47

Ken, are you overclocking the GPU or the memory? It seems to be having memory errors, for whatever reason. These errors can't be reliably detected during P-1 (although all-zero is a special case that could be detected, yes). You should also look into why these errors are happening at all: there's no protection in P-1, so you should have a reliable setup before starting P-1.

[QUOTE=kriesel;530903][CODE]...[/CODE]Run abandoned, retrying from start, 3 hours lost.[/QUOTE]

kriesel 2019-11-18 22:25

[QUOTE=preda;530924]Ken, are you overclocking the GPU or the memory? it seems it has, for whatever reason, memory errors. These errors can't be reliably detected during P-1 (although the all-zero is a special case that could be detected, yes). You should also look into the cause of these errors happening at all -- as there's no protection in P-1, thus you should have a reliable setup before starting P-1.[/QUOTE]Thanks for your reply.

I initially had fatal issues with gpuowl with strictly stock settings on the Radeon VII. After apparently getting it sorted out, but still having occasional GEC or Jacobi or roundoff errors, I cautiously increased memory clock 2%. After George posted his gpu settings a couple of times, I increased the memory clock to +8%. The three memory clock conditions don't seem different as far as error rate. I'll return to stock clock, or try underclock, shortly. I'm running out of things to try to improve reliability. (Open to suggestions.) A rerun of 300M P-1 stage 1 today has managed to avoid any 0x00 res64 outputs. Same settings, same worktodo, same everything, different outcome. I've taken to manually saving copies of the P-1 checkpoints.

Right now, I have 300M P-1 stage 2 running. Wattman, which is where I would try to adjust the memory clock, comes up as an unresponsive blank white rectangle. It also did this during the 200M P-1.

Adding detection of error conditions such as res64 0x00 or 0x01 in gpuowl P-1 would be a very good thing, as would periodic saving of known-good checkpoints. Showing res64 in stage 2 output would also provide an indication if things are going wrong there. Right now there's no indication.
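Until something like that exists in the program, the manual checkpoint copies mentioned above could be scripted. A minimal sketch, assuming the checkpoint is a single file that can be copied while gpuowl is running; the function name `backup_checkpoint` and any paths passed to it are hypothetical, since the thread does not name the checkpoint files.

```python
import shutil
import time
from pathlib import Path


def backup_checkpoint(checkpoint: Path, backup_dir: Path) -> Path:
    """Copy a checkpoint file into backup_dir under a timestamped name."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"{checkpoint.stem}-{stamp}{checkpoint.suffix}"
    # copy2 also preserves the modification time, which helps when
    # matching a backup to the corresponding lines in the log.
    shutil.copy2(checkpoint, target)
    return target
```

A copy is only worth keeping if the log shows a plausible res64 at that point, so a script like this is best paired with a look at the most recent progress line before each backup.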

Prime95 2019-11-19 04:43

[QUOTE=kriesel;530926]After George posted his gpu settings a couple of times, I increased the memory clock to +8%. The three memory clock conditions don't seem different as far as error rate.[/QUOTE]

I did have an XFX card where I never found a stable memory setting. All the ASRock cards are stable (defined as a month without GEC errors) at memory settings between +16% and +20%. I'm comfortable with these aggressive settings because of the GEC check. If I were doing P-1 or an LL test, I would dial back somewhat.

P-1 may be more stressful on memory than PRP, especially in stage 2, where lots of memory is used.

