#45 |
P90 years forever!
Aug 2002
Yeehaw, FL
2·53·71 Posts |
Thanks all, I can now build an executable. -static was the key.
Debugging has been difficult. The Windows driver seems to have a problem with atomic operations or global memory fences. |
#46 |
"Composite as Heck"
Oct 2017
2×11×37 Posts |
I've done a bit of research and it seems GPU passthrough is not easily available on a Windows host. It is available on a Linux host in several ways, and no doubt on BSD; it is available on VMware's ESXi through VMware's vSphere, which again is not Windows. It is available on modern Hyper-V with Windows Server but not Windows 10. Shame.
#47 |
P90 years forever!
Aug 2002
Yeehaw, FL
2×53×71 Posts |
Please try this version without using long carry. It seems to work for me. I'll forward the source changes to Mihai for his approval.
#48 | |
"Eric"
Jan 2018
USA
2²·53 Posts |
Quote:
#49 | |
P90 years forever!
Aug 2002
Yeehaw, FL
2·53·71 Posts |
Quote:
This was built using the latest source with one change to gpuowl.cl. Apparently I did something non-standard (I copied sources from somewhere rather than using git clone) as the .exe I uploaded does not have version info. Fixed gpuowl.cl: https://www.dropbox.com/s/bin8vkcthu...gpuowl.cl?dl=0 Awaiting Mihai's review. |
#50 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
31·173 Posts |
#51 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
31×173 Posts |
On my XFX Radeon VII on Win10 Pro, multiple gpuowl versions succeed in running primality checks, with detected errors, but P-1 stage 2 kills v6.11-9 early.
I haven't tried George's patched version yet. gpuowl V0.5 runs with errors but produces a matching LL DC res64. Code:
03340000 / 50240549 [6.65%], ms/iter: 1.571, ETA: 0d 20:28; f8cd01b74ea441df error 4.57764e-005 (max 4.57764e-005)
03360000 / 50240549 [6.69%], ms/iter: 1.575, ETA: 0d 20:31; 3b0cf8e4e4ac5ba3 error 0.5 (max 0.5)
Error is too large; retrying
03360000 / 50240549 [6.69%], ms/iter: 1.564, ETA: 0d 20:22; 6f99d09d989db017 error 4.19617e-005 (max 4.57764e-005)
10020000 / 50240549 [19.94%], ms/iter: 1.551, ETA: 0d 17:19; 737cbb7601f4f31c error 0.5 (max 0.5)
Error is too large; retrying
10020000 / 50240549 [19.94%], ms/iter: 1.566, ETA: 0d 17:30; 934860171a3ce275 error 4.19617e-005 (max 4.57764e-005)
12780000 / 50240549 [25.44%], ms/iter: 1.582, ETA: 0d 16:28; 72b7c7384b821d8a error 0.5 (max 0.5)
Error is too large; retrying
12780000 / 50240549 [25.44%], ms/iter: 1.582, ETA: 0d 16:28; 76f11a6325918f85 error 4.19617e-005 (max 4.57764e-005)
12940000 / 50240549 [25.76%], ms/iter: 1.571, ETA: 0d 16:17; 3142cfaa94ace2c4 error 4.19617e-005 (max 4.57764e-005)
12960000 / 50240549 [25.80%], ms/iter: 1.568, ETA: 0d 16:14; a25bf07ea45c4185 error 0.5 (max 0.5)
Error is too large; retrying
12960000 / 50240549 [25.80%], ms/iter: 1.586, ETA: 0d 16:25; 9084b83ebd4baf0a error 4.19617e-005 (max 4.57764e-005)
23920000 / 50240549 [47.61%], ms/iter: 1.569, ETA: 0d 11:28; 97723077c12a402b error 0.5 (max 0.5)
Error is too large; retrying
23920000 / 50240549 [47.61%], ms/iter: 1.585, ETA: 0d 11:35; 670902199a743007 error 4.57764e-005 (max 4.57764e-005)
24100000 / 50240549 [47.97%], ms/iter: 1.580, ETA: 0d 11:28; 440e09ad44656de6 error 0.5 (max 0.5)
Error is too large; retrying
24100000 / 50240549 [47.97%], ms/iter: 1.578, ETA: 0d 11:27; b83eca8f811eba37 error 4.19617e-005 (max 4.57764e-005)
24340000 / 50240549 [48.45%], ms/iter: 1.566, ETA: 0d 11:16; d926485817437b11 error 0.5 (max 0.5)
Error is too large; retrying
24340000 / 50240549 [48.45%], ms/iter: 1.584, ETA: 0d 11:24; d058b40d338a65ff error 4.19617e-005 (max 4.57764e-005)
29040000 / 50240549 [57.80%], ms/iter: 1.578, ETA: 0d 09:17; edbfb2989f739a15 error 0.5 (max 0.5)
Error is too large; retrying
29040000 / 50240549 [57.80%], ms/iter: 1.571, ETA: 0d 09:15; eef8fe6ae92878a3 error 4.19617e-005 (max 4.57764e-005)
31160000 / 50240549 [62.02%], ms/iter: 1.581, ETA: 0d 08:23; 4da16390f5451420 error 0.5 (max 0.5)
Error is too large; retrying
31160000 / 50240549 [62.02%], ms/iter: 1.578, ETA: 0d 08:22; 633691483196cbc1 error 4.19617e-005 (max 4.57764e-005)
36560000 / 50240549 [72.77%], ms/iter: 1.582, ETA: 0d 06:01; 1e2f4254fd266b67 error 0.5 (max 0.5)
Error is too large; retrying
36560000 / 50240549 [72.77%], ms/iter: 1.574, ETA: 0d 05:59; c6254b151b53ab62 error 4.19617e-005 (max 4.95911e-005)
39980000 / 50240549 [79.58%], ms/iter: 2.381, ETA: 0d 06:47; fffffffffffffffe error 0.5 (max 0.5)
Error is too large; retrying
39980000 / 50240549 [79.58%], ms/iter: 2.405, ETA: 0d 06:51; e66878a4d76f79cc error 3.8147e-005 (max 4.95911e-005)
43760000 / 50240549 [87.10%], ms/iter: 1.611, ETA: 0d 02:54; 84d54e2f7e4abae7 error 0.5 (max 0.5)
Error is too large; retrying
43760000 / 50240549 [87.10%], ms/iter: 1.609, ETA: 0d 02:54; 68c1b96e06cdea1a error 4.19617e-005 (max 4.57764e-005)
48200000 / 50240549 [95.94%], ms/iter: 1.538, ETA: 0d 00:52; 8c672fa7465df027 error 4.19617e-005 (max 4.57764e-005)
48220000 / 50240549 [95.98%], ms/iter: 1.533, ETA: 0d 00:52; fffffffffffffffe error 0.382529 (max 0.382529)
Error jump by 835546.38%, doing a consistency check.
48220000 / 50240549 [95.98%], ms/iter: 1.531, ETA: 0d 00:52; 619709852a967226 error 4.19617e-005 (max 4.57764e-005)
Consistency check FAILED, stopping.
48200000 / 50240549 [95.94%], ms/iter: 1.538, ETA: 0d 00:52; 8c672fa7465df027 error 4.19617e-005 (max 4.57764e-005)
48220000 / 50240549 [95.98%], ms/iter: 1.533, ETA: 0d 00:52; fffffffffffffffe error 0.382529 (max 0.382529)
Error jump by 835546.38%, doing a consistency check.
48220000 / 50240549 [95.98%], ms/iter: 1.531, ETA: 0d 00:52; 619709852a967226 error 4.19617e-005 (max 4.57764e-005)
Consistency check FAILED, stopping.

2 consistency check res64 matched. https://www.mersenne.org/report_expo...exp_hi=&full=1

V0.6 LL DC: I tried a strategic triple check in gpuowl V0.6, which does LL with a Jacobi check on AMD GPUs at a 4M FFT, so it is good for LL DC up to ~77M, or several years yet. A Radeon VII can knock these out in under a day. The result I got apparently confirms Ernst's result on 55473541. (The server refused it, or rather did not understand it, presumably because its result output included ,AID: 0 for a TC I could not get an assignment for from the server. James H has been notified and has already responded.) Code:
02140000 / 55473541 [3.86%], ms/iter: 2.920, ETA: 1d 19:15; 15d065d8c8729d15 roundoff 0.000244141 (max 0.000274658)
02160000 / 55473541 6b207006b08df182 Retry : roundoff 0.5 is too large
02160000 / 55473541 [3.89%], ms/iter: 5.851, ETA: 3d 14:39; 0f92dd48c1784517 roundoff 0.000244141 (max 0.000274658)
02880000 / 55473541 [5.19%], ms/iter: 2.634, ETA: 1d 14:29; 8b67007254d501ef roundoff 0.000244141 (max 0.000274658)
02900000 / 55473541 205b4d16b929d367 Retry : roundoff 0.5 is too large
02900000 / 55473541 [5.23%], ms/iter: 5.285, ETA: 3d 05:11; 623d19d986f3dd05 roundoff 0.000244141 (max 0.000274658)
20380000 / 55473541 [36.74%], ms/iter: 0.873, ETA: 0d 08:31; 0139edea50787064 roundoff 0.000244141 (max 0.000274658)
20400000 / 55473541 0000000000000002 Retry : loop
20400000 / 55473541 [36.77%], ms/iter: 1.775, ETA: 0d 17:17; 28ee13a6ef2ccaf4 roundoff 0.000244141 (max 0.000274658)

Caught and corrected two GEC errors, completed correctly a rerun of PRP3 on 82589933. Code:
V6.11-9 82589933 PRP3 errors on Radeon VII
2019-11-16 01:45:13 82589933 55850000 67.62%; 941 us/sq; ETA 0d 06:59; 0a348af3703b3a13
2019-11-16 01:46:00 82589933 55900000 67.68%; 939 us/sq; ETA 0d 06:58; 0000000000000000
2019-11-16 01:46:46 82589933 55950000 67.74%; 933 us/sq; ETA 0d 06:54; 0000000000000000
2019-11-16 01:47:34 82589933 EE 56000000 67.80%; 934 us/sq; ETA 0d 06:54; 0000000000000000 (check 0.77s)
2019-11-16 01:48:22 82589933 55800000 67.56%; 953 us/sq; ETA 0d 07:06; 8f077051d6fdb58f
2019-11-16 01:49:09 82589933 55850000 67.62%; 941 us/sq; ETA 0d 07:00; 0a348af3703b3a13
2019-11-16 03:44:21 82589933 OK 60500000 73.25%; 1955 us/sq; ETA 0d 12:00; 023994d6bd9e58a8 (check 1.27s) 1 errors
2019-11-16 03:45:59 82589933 60550000 73.31%; 1953 us/sq; ETA 0d 11:57; 88169fb1f9b4b21c
2019-11-16 03:47:36 82589933 60600000 73.37%; 1943 us/sq; ETA 0d 11:52; f2993d91a0316a39
2019-11-16 03:49:13 82589933 60650000 73.44%; 1935 us/sq; ETA 0d 11:48; ab213f1673967b49
2019-11-16 03:50:50 82589933 60700000 73.50%; 1950 us/sq; ETA 0d 11:52; 443b9385d91ecbc4
2019-11-16 03:52:29 82589933 EE 60750000 73.56%; 1954 us/sq; ETA 0d 11:51; 7d72d5ceb8bc4d90 (check 1.29s) 1 errors
2019-11-16 03:54:08 82589933 60550000 73.31%; 1978 us/sq; ETA 0d 12:07; 88169fb1f9b4b21c
2019-11-16 03:55:46 82589933 60600000 73.37%; 1945 us/sq; ETA 0d 11:53; f2993d91a0316a39
2019-11-16 03:57:23 82589933 60650000 73.44%; 1942 us/sq; ETA 0d 11:50; ab213f1673967b49
2019-11-16 03:59:00 82589933 60700000 73.50%; 1953 us/sq; ETA 0d 11:52; 443b9385d91ecbc4
2019-11-16 04:00:39 82589933 OK 60750000 73.56%; 1950 us/sq; ETA 0d 11:50; d3ebe76a016e73a5 (check 1.32s) 2 errors

Code:
gpuowl v6.11-9 P-1 stage 2 fatal error
2019-11-16 17:44:37 100003037 P1 1190000 99.36%; 3871 us/sq; ETA 0d 00:00; 46553feda7b08932
2019-11-16 17:45:07 100003037 P1 1197722 100.00%; 3893 us/sq; ETA 0d 00:00; 36368ba430e41255
2019-11-16 17:45:07 P-1 (B1=830000, B2=17430000, D=30030): primes 1050980, expanded 1071560, doubles 177259 (left 703338), singles 696462, total 873721 (83%)
2019-11-16 17:45:07 100003037 P2 using blocks [28 - 580] to cover 873721 primes
2019-11-16 17:45:08 100003037 P2 using 344 buffers of 44.0 MB each
(crash; restart)
2019-11-16 20:29:43 Note: no config.txt file found
2019-11-16 20:29:43 config: -user kriesel -cpu roa/radeonvii -use FMA_X2 -device 1 -carry long
2019-11-16 20:29:43 100003037 FFT 5632K: Width 256x4, Height 64x4, Middle 11; 17.34 bits/word
2019-11-16 20:29:43 using long carry kernels
2019-11-16 20:29:43 OpenCL args "-DEXP=100003037u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=11u -DWEIGHT_STEP=0xc.a3e01ed682068p-3 -DIWEIGHT_STEP=0xa.20606be35c478p-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -DFMA_X2=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-11-16 20:29:49 OpenCL compilation in 5216 ms
2019-11-16 20:29:51 100003037 P1 B1=830000, B2=17430000; 1197722 bits; starting at 1197721
2019-11-16 20:29:51 100003037 P1 1197722 100.00%; 12091 us/sq; ETA 0d 00:00; 36368ba430e41255
2019-11-16 20:29:51 P-1 (B1=830000, B2=17430000, D=30030): primes 1050980, expanded 1071560, doubles 177259 (left 703338), singles 696462, total 873721 (83%)
2019-11-16 20:29:51 100003037 P2 using blocks [28 - 580] to cover 873721 primes
2019-11-16 20:29:52 100003037 P2 using 345 buffers of 44.0 MB each
(crash again)
2019-11-17 00:29:45 Note: no config.txt file found
2019-11-17 00:29:45 config: -user kriesel -cpu roa/radeonvii -use FMA_X2 -device 1 -carry long -maxAlloc 15000
2019-11-17 00:29:45 100003037 FFT 5632K: Width 256x4, Height 64x4, Middle 11; 17.34 bits/word
2019-11-17 00:29:45 using long carry kernels
2019-11-17 00:29:46 OpenCL args "-DEXP=100003037u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=11u -DWEIGHT_STEP=0xc.a3e01ed682068p-3 -DIWEIGHT_STEP=0xa.20606be35c478p-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -DFMA_X2=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-11-17 00:29:51 OpenCL compilation in 5326 ms
2019-11-17 00:29:53 100003037 P1 B1=830000, B2=17430000; 1197722 bits; starting at 1197721
2019-11-17 00:29:53 100003037 P1 1197722 100.00%; 10500 us/sq; ETA 0d 00:00; 36368ba430e41255
2019-11-17 00:29:54 P-1 (B1=830000, B2=17430000, D=30030): primes 1050980, expanded 1071560, doubles 177259 (left 703338), singles 696462, total 873721 (83%)
2019-11-17 00:29:54 100003037 P2 using blocks [28 - 580] to cover 873721 primes
2019-11-17 00:29:54 100003037 P2 using 324 buffers of 44.0 MB each
(crash)
2019-11-17 07:47:50 Note: no config.txt file found
2019-11-17 07:47:50 config: -user kriesel -cpu roa/radeonvii -use FMA_X2 -device 1 -carry long -maxAlloc 8000
2019-11-17 07:47:50 100003037 FFT 5632K: Width 256x4, Height 64x4, Middle 11; 17.34 bits/word
2019-11-17 07:47:50 using long carry kernels
2019-11-17 07:47:50 OpenCL args "-DEXP=100003037u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=11u -DWEIGHT_STEP=0xc.a3e01ed682068p-3 -DIWEIGHT_STEP=0xa.20606be35c478p-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -DFMA_X2=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-11-17 07:47:56 OpenCL compilation in 5375 ms
2019-11-17 07:47:57 100003037 P1 B1=830000, B2=17430000; 1197722 bits; starting at 1197721
2019-11-17 07:47:58 100003037 P1 1197722 100.00%; 10455 us/sq; ETA 0d 00:00; 36368ba430e41255
2019-11-17 07:47:58 P-1 (B1=830000, B2=17430000, D=30030): primes 1050980, expanded 1071560, doubles 177259 (left 703338), singles 696462, total 873721 (83%)
2019-11-17 07:47:58 100003037 P2 using blocks [28 - 580] to cover 873721 primes
2019-11-17 07:47:58 100003037 P2 using 165 buffers of 44.0 MB each
(crash; 2 Windows TDR events, id 4101 in Windows system event log 07:48:09 and 07:48:34) |
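For anyone cross-checking the P-1 log in the post above, here is a rough Python sketch of the arithmetic behind a few of its figures. It is a reader's sanity check, not gpuowl's own code. The helper names are hypothetical, and it assumes that "5632K" means 5632 × 1024 FFT words, that each word is stored as an 8-byte double, and that "MB" in the log means 2^20 bytes. Under those assumptions the numbers reproduce the log's "17.34 bits/word" and "44.0 MB" per buffer.

```python
# Sanity-check figures quoted in the gpuowl P-1 log.
# Assumptions (not stated in the thread): "K" = 1024 FFT words,
# each word is an 8-byte double, and "MB" = 2**20 bytes.

def bits_per_word(exponent: int, fft_k: int) -> float:
    """Average number of exponent bits carried by each FFT word."""
    return exponent / (fft_k * 1024)

def buffer_mib(fft_k: int, bytes_per_word: int = 8) -> float:
    """Size in MiB of one buffer holding a full FFT-length array."""
    return fft_k * 1024 * bytes_per_word / 2**20

def stage2_total_mib(buffers: int, fft_k: int) -> float:
    """Total memory in MiB for the given number of stage 2 buffers."""
    return buffers * buffer_mib(fft_k)

if __name__ == "__main__":
    print(f"{bits_per_word(100003037, 5632):.2f} bits/word")
    print(f"{buffer_mib(5632):.1f} MiB per buffer")
    # Buffer counts taken from the four runs in the log.
    for n in (344, 345, 324, 165):
        print(f"{n} buffers -> {stage2_total_mib(n, 5632):.0f} MiB")
```

By this estimate the first two runs request roughly 15.1 GB of buffers, while the runs with -maxAlloc 15000 and -maxAlloc 8000 come in just under those limits (14256 and 7260 MiB). All four runs crashed regardless, so the total allocation alone does not appear to explain the failure.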
#52 |
P90 years forever!
Aug 2002
Yeehaw, FL
1D66₁₆ Posts |
#53 | |
"Eric"
Jan 2018
USA
2²×53 Posts |
Quote:
Another thing worth experimenting with on Windows is tweaking the HBM2 timings. I have found significant uplifts going from looser to tighter timings in mining workloads such as XMR. I am trying to figure out a better timing table for PRP workloads that is hopefully stable. So far I have just applied my XMR mining timings, and I observe a 4-5% increase in throughput, similar to the gain from switching from long carry to short carry. Last fiddled with by xx005fs on 2019-11-17 at 19:10 |
#54 | |
P90 years forever!
Aug 2002
Yeehaw, FL
2·53·71 Posts |
Quote:
#55 | |
Oct 2019
2×7 Posts |
Quote:
After I used your executable my new setup works great! I wanted to say thanks before I forgot. |