[QUOTE=LaurV;545985]My bad wording. Sorry. I meant 2 instances, each in its own card.[/QUOTE]I thought that was clear. And now it's clearer.
The 2 on one card I wrote of is about performance tweaking. CUDALucas and LL will be with us for a while yet, not only because of user inertia, but because some gpus cannot run gpuowl but can run CUDALucas: namely, any NVIDIA gpu that does not support a sufficient subset of OpenCL. Some gpus, such as old Teslas and Quadros, also have stronger DP performance relative to their 32-bit integer performance than others. |
Jacobi returns, and Middle15 appears
Win7 x64 build of gpuowl v6.11-288-g20c4213 attached.
I didn't notice until now, but sometime between 6.11-219 and 6.11-255, the ffts longer than 96M (up to 192M) were dropped. |
Mihai, with all (and great) respect, I'm trying to understand your resistance to supporting shifted residues. Please let me know if any of the following are incorrect:
1. You previously supported shift, at least for LL, perhaps also for PRP (and if you only had it for LL, note that it's even more trivial to support for PRP, because one does not need to do the per-iteration "at which bit location does the -2 get injected?" computation LL needs, one only needs to update the shift count via doubling-mod-p);
2. You noted that supporting shift did not adversely affect performance.
So why not simply reactivate the old shift-supporting code segments? Has the main codebase changed so much in the meanwhile that doing so would be a major pain?
Knowing that 2 tests, whether using the same or different codes, run at the same FFT length are using fundamentally different residue-word data is a big deal, confidence-in-result-wise. "Gerbicz check is totally foolproof" is [a] a surmise, based on limited run numbers with independent-program DCs for cross-checking, and [b] is implementation-dependent, in similar fashion to "a mathematically foolproof cryptography scheme can be nullified by a flawed software implementation".
If it's relatively trivial to support, why not do so? Restore the hopefully-modest amounts of code needed to support it, and then move on. Shift-related code is simple enough that once deployed, it doesn't need further attention. The next time I expect to need to revisit my shift-related code in Mlucas will be if and when a major vendor puts out a 1024-or-more-bit SIMD architecture, at which point I'll need to make some modest changes to the LL-test carry routines to inject the per-iteration -2 into the proper one of the 16 doubles in the corresponding carry-step SIMD vector-of-residue-words datum. |
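For readers following the shift discussion, here is a toy sketch of the bookkeeping described in the post above. It is not code from Mlucas or gpuowl: the function name `ll_residues` is hypothetical, and it uses plain Python big integers rather than FFT residue words. The stored value is the true residue times 2^s mod M_p; each squaring doubles the shift mod p, and the LL "-2" is then injected at the shifted bit position.

```python
def ll_residues(p, shift=0):
    """Return the true (unshifted) LL residue after each iteration,
    while internally carrying a residue shifted left by `s` bits.
    Toy illustration only; `ll_residues` is a hypothetical name."""
    m = (1 << p) - 1              # Mersenne number M_p = 2^p - 1
    s = shift % p                 # current shift count
    x = (4 << s) % m              # shifted starting value 4 * 2^s
    out = []
    for _ in range(p - 2):
        x = (x * x) % m           # squaring turns 2^s into 2^(2s)
        s = (2 * s) % p           # so the shift count doubles mod p
        x = (x - (2 << s)) % m    # LL only: subtract 2 at the shifted position
        # Undo the shift for reporting: 2^p = 1 (mod M_p), so 2^-s = 2^(p-s).
        out.append((x << ((p - s) % p)) % m)
    return out
```

Under these assumptions, runs with different starting shifts carry different internal data but report identical residues, e.g. `ll_residues(13, 0) == ll_residues(13, 5)`. For PRP the subtraction line would simply be absent, which is the "only doubling-mod-p" case mentioned above.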
Reviewing my notes and some old posts, I see Mihai implemented shift for LL, back at gpuowl v0.3, indicating a 0.5% overhead then, and dropped it by v0.6, when he first implemented Jacobi check with a stated overhead of 0.2%. (See links at [URL]https://www.mersenneforum.org/showpost.php?p=489083&postcount=7[/URL])
If those values are still about current, I would be fine with such 0.7% overhead or a bit more to gain some reliability through differing shift and some error detection combined in the same build. People like LaurV who run dual runs probably would be too. I think some P-1 error checking at 2% overhead, perhaps more, would be ok. For LL, perhaps it could be made optional, such as by -use no_offset,no_jacobi, with offset and jacobi check on by default. It would be good if those only affected LL worktodo lines and did not prevent gpuowl from running mixed work in the same worktodo file; LL followed by P-1 and PRP etc. (Providing no_GEC to PRP is unappealing.) Some of us would be happy to benchmark the reliability options if/when available. Way back when gpuowl was announced, Mihai closed [URL]https://www.mersenneforum.org/showpost.php?p=457032&postcount=1[/URL] with[CODE]I'll be looking for problems to fix. I hope you'll enjoy! cheers, Mihai[/CODE]Yes, immensely. It seems to me a lively 3 years and great progress to date. Looking forward to what comes next. |
[QUOTE=kriesel;545974]Two gpuowl instances running on the same gpu reportedly helps total testing throughput. But not always. Test what you run. (terribly disparate instances detail... 40.94% of solo throughput.
It's probably best to run same computation type, same fft size, or perhaps very similar size.[/QUOTE] One vs two instances on a gpu, gpuowl-win v6.11-278, 9M PRP/9M PRP, radeonvii gpu, i7-4790 cpu
1a 1467.4 us/it, 681.48 iter/sec
1b 1465.6 us/it, 682.31 iter/sec
average 681.9 iter/sec
2 together 2884.4 us/it + 2876.8 us/it, 346.69 iter/sec + 347.61 iter/sec = 694.30 iter/sec
2 instances / gpu yielded 1.01819 times the speed of a single instance; 1.82% faster. Equivalent to 217.3 GHzD/day. Seems low. Maybe thermally throttling? |
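The arithmetic behind the comparison above can be reproduced with a few lines of Python. This is only a sketch using the timings quoted in the post; the variable and function names are made up for illustration.

```python
def iters_per_sec(us_per_iter):
    """Convert microseconds per iteration to iterations per second."""
    return 1e6 / us_per_iter

solo_us = [1467.4, 1465.6]   # two separate single-instance runs (1a, 1b)
pair_us = [2884.4, 2876.8]   # the two instances running together

# Average of the solo runs versus the combined rate of the pair.
solo_rate = sum(iters_per_sec(t) for t in solo_us) / len(solo_us)
pair_rate = sum(iters_per_sec(t) for t in pair_us)
gain = pair_rate / solo_rate

print(f"solo {solo_rate:.1f} it/s, pair {pair_rate:.2f} it/s, ratio {gain:.5f}")
```

The ratio comes out at roughly 1.018, matching the 1.82% figure reported.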
Gpuowl-win v6.11-288 LL Jacobi test
First trial of GpuOwL v6.11-288 LL with Jacobi check, continuation of an existing run.
Happy/relieved to see the initial Jacobi check pass, although this gpu has successfully completed PRP wavefront exponents with no GEC errors.
[CODE]
2020-05-20 18:27:57 gpuowl v6.11-288-g20c4213
2020-05-20 18:27:57 config: -user kriesel -cpu asr2/radeonvii2 -d 2 -use NO_ASM -maxAlloc 15000
2020-05-20 18:27:57 device 2, unique id ''
2020-05-20 18:27:57 asr2/radeonvii2 154155713 FFT: 8M 1K:8:512 (18.38 bpw)
2020-05-20 18:27:57 asr2/radeonvii2 Expected maximum carry32: 70B40000
2020-05-20 18:27:59 asr2/radeonvii2 OpenCL args "-DEXP=154155713u -DWIDTH=1024u -DSMALL_HEIGHT=512u -DMIDDLE=8u -DWEIGHT_STEP=0xc.528658a63b438p-3 -DIWEIGHT_STEP=0xa.633af6ee9bb58p-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -DPM1=0 -DAMDGPU=1 -DCARRY64=1 -DMM_CHAIN=2u -DMM2_CHAIN=3u -DNO_ASM=1 -cl-fast-relaxed-math -cl-std=CL2.0 "
2020-05-20 18:28:09 asr2/radeonvii2 OpenCL compilation in 10.16 s
2020-05-20 18:28:10 asr2/radeonvii2 154155713 LL 60507000 loaded: 223f2e70d392347e
2020-05-20 18:30:29 asr2/radeonvii2 154155713 LL 60600000 39.31%; 1492 us/it; ETA 1d 14:47; 70cdcc47cb7d66f7
2020-05-20 18:32:48 asr2/radeonvii2 154155713 LL 60700000 39.38%; 1397 us/it; ETA 1d 12:16; fa7b795be968fc04
2020-05-20 18:35:08 asr2/radeonvii2 154155713 LL 60800000 39.44%; 1397 us/it; ETA 1d 12:13; 3d74bd4d1e787fca
2020-05-20 18:37:28 asr2/radeonvii2 154155713 LL 60900000 39.51%; 1396 us/it; ETA 1d 12:10; 771738771fa1b18f
2020-05-20 18:39:47 asr2/radeonvii2 154155713 LL 61000000 39.57%; 1396 us/it; ETA 1d 12:08; 663deec0ffbc5691
2020-05-20 18:42:07 asr2/radeonvii2 154155713 LL 61100000 39.64%; 1396 us/it; ETA 1d 12:05; 8b7abb01cae7c971
2020-05-20 18:42:07 asr2/radeonvii2 154155713 OK 61000000 (jacobi == -1)
2020-05-20 18:44:27 asr2/radeonvii2 154155713 LL 61200000 39.70%; 1398 us/it; ETA 1d 12:06; 1d4fad0f5f424457
...
2020-05-21 04:42:25 asr2/radeonvii2 154155713 LL 86900000 56.37%; 1395 us/it; ETA 1d 02:04; b558d286cd9d5735
2020-05-21 04:44:45 asr2/radeonvii2 154155713 LL 87000000 56.44%; 1395 us/it; ETA 1d 02:01; 62c4e005f393521f
2020-05-21 04:47:04 asr2/radeonvii2 154155713 LL 87100000 56.50%; 1394 us/it; ETA 1d 01:58; d31bd8c703cd82e2
2020-05-21 04:49:23 asr2/radeonvii2 154155713 LL 87200000 56.57%; 1395 us/it; ETA 1d 01:57; 4ebde4106568146f
2020-05-21 04:49:23 asr2/radeonvii2 154155713 OK 87000000 (jacobi == -1)
2020-05-21 04:51:43 asr2/radeonvii2 154155713 LL 87300000 56.63%; 1396 us/it; ETA 1d 01:55; deb4718bd97c793e
[/CODE]
until it hit a repeatable Jacobi error between 87M and 88M:
[CODE]
2020-05-21 05:07:59 asr2/radeonvii2 154155713 LL 88000000 57.09%; 1395 us/it; ETA 1d 01:38; 8a332b7fbfafa5f7
2020-05-21 05:10:19 asr2/radeonvii2 154155713 LL 88100000 57.15%; 1394 us/it; ETA 1d 01:35; 76a8832229ab7d36
2020-05-21 05:12:38 asr2/radeonvii2 154155713 LL 88200000 57.21%; 1395 us/it; ETA 1d 01:33; 687d9f07557de7fc
2020-05-21 05:12:38 asr2/radeonvii2 154155713 EE 88000000 (jacobi == 1)
2020-05-21 05:12:39 asr2/radeonvii2 154155713 LL 87000000 loaded: 62c4e005f393521f
2020-05-21 05:14:58 asr2/radeonvii2 154155713 LL 87100000 56.50%; 1394 us/it; ETA 1d 01:58; d31bd8c703cd82e2
2020-05-21 05:17:18 asr2/radeonvii2 154155713 LL 87200000 56.57%; 1394 us/it; ETA 1d 01:56; 4ebde4106568146f
2020-05-21 05:19:37 asr2/radeonvii2 154155713 LL 87300000 56.63%; 1394 us/it; ETA 1d 01:54; deb4718bd97c793e
2020-05-21 05:21:56 asr2/radeonvii2 154155713 LL 87400000 56.70%; 1395 us/it; ETA 1d 01:52; 78c200210067f63a
2020-05-21 05:24:16 asr2/radeonvii2 154155713 LL 87500000 56.76%; 1395 us/it; ETA 1d 01:49; e2e6daf06bc84732
2020-05-21 05:26:35 asr2/radeonvii2 154155713 LL 87600000 56.83%; 1394 us/it; ETA 1d 01:47; 421b198d26a5456f
2020-05-21 05:28:55 asr2/radeonvii2 154155713 LL 87700000 56.89%; 1395 us/it; ETA 1d 01:45; e6f1cee228a832b9
2020-05-21 05:31:14 asr2/radeonvii2 154155713 LL 87800000 56.96%; 1395 us/it; ETA 1d 01:42; 8f22ca3882842947
2020-05-21 05:33:34 asr2/radeonvii2 154155713 LL 87900000 57.02%; 1395 us/it; ETA 1d 01:40; 956c469d245e11f3
2020-05-21 05:35:53 asr2/radeonvii2 154155713 LL 88000000 57.09%; 1395 us/it; ETA 1d 01:38; 8a332b7fbfafa5f7
2020-05-21 05:38:13 asr2/radeonvii2 154155713 LL 88100000 57.15%; 1394 us/it; ETA 1d 01:35; 76a8832229ab7d36
2020-05-21 05:40:32 asr2/radeonvii2 154155713 LL 88200000 57.21%; 1395 us/it; ETA 1d 01:33; 687d9f07557de7fc
2020-05-21 05:40:32 asr2/radeonvii2 154155713 EE 88000000 (jacobi == 1)
2020-05-21 05:40:32 asr2/radeonvii2 154155713 LL 87000000 loaded: 62c4e005f393521f
2020-05-21 05:42:52 asr2/radeonvii2 154155713 LL 87100000 56.50%; 1394 us/it; ETA 1d 01:58; d31bd8c703cd82e2
2020-05-21 05:45:11 asr2/radeonvii2 154155713 LL 87200000 56.57%; 1395 us/it; ETA 1d 01:56; 4ebde4106568146f
2020-05-21 05:47:31 asr2/radeonvii2 154155713 LL 87300000 56.63%; 1394 us/it; ETA 1d 01:54; deb4718bd97c793e
2020-05-21 05:49:50 asr2/radeonvii2 154155713 LL 87400000 56.70%; 1395 us/it; ETA 1d 01:52; 78c200210067f63a
2020-05-21 05:52:10 asr2/radeonvii2 154155713 LL 87500000 56.76%; 1395 us/it; ETA 1d 01:49; e2e6daf06bc84732
2020-05-21 05:54:29 asr2/radeonvii2 154155713 LL 87600000 56.83%; 1395 us/it; ETA 1d 01:47; 421b198d26a5456f
2020-05-21 05:56:49 asr2/radeonvii2 154155713 LL 87700000 56.89%; 1395 us/it; ETA 1d 01:45; e6f1cee228a832b9
2020-05-21 05:59:08 asr2/radeonvii2 154155713 LL 87800000 56.96%; 1395 us/it; ETA 1d 01:43; 8f22ca3882842947
2020-05-21 06:01:28 asr2/radeonvii2 154155713 LL 87900000 57.02%; 1395 us/it; ETA 1d 01:40; 956c469d245e11f3
2020-05-21 06:03:47 asr2/radeonvii2 154155713 LL 88000000 57.09%; 1394 us/it; ETA 1d 01:37; 8a332b7fbfafa5f7
2020-05-21 06:06:06 asr2/radeonvii2 154155713 LL 88100000 57.15%; 1394 us/it; ETA 1d 01:35; 76a8832229ab7d36
2020-05-21 06:08:26 asr2/radeonvii2 154155713 LL 88200000 57.21%; 1395 us/it; ETA 1d 01:33; 687d9f07557de7fc
2020-05-21 06:08:26 asr2/radeonvii2 154155713 EE 88000000 (jacobi == 1)
2020-05-21 06:08:26 asr2/radeonvii2 154155713 LL 87000000 loaded: 62c4e005f393521f
2020-05-21 06:10:46 asr2/radeonvii2 154155713 LL 87100000 56.50%; 1394 us/it; ETA 1d 01:58; d31bd8c703cd82e2
2020-05-21 06:13:05 asr2/radeonvii2 154155713 LL 87200000 56.57%; 1395 us/it; ETA 1d 01:56; 4ebde4106568146f
2020-05-21 06:15:25 asr2/radeonvii2 154155713 LL 87300000 56.63%; 1395 us/it; ETA 1d 01:54; deb4718bd97c793e
2020-05-21 06:17:44 asr2/radeonvii2 154155713 LL 87400000 56.70%; 1394 us/it; ETA 1d 01:51; 78c200210067f63a
2020-05-21 06:20:04 asr2/radeonvii2 154155713 LL 87500000 56.76%; 1395 us/it; ETA 1d 01:49; e2e6daf06bc84732
2020-05-21 06:22:23 asr2/radeonvii2 154155713 LL 87600000 56.83%; 1395 us/it; ETA 1d 01:47; 421b198d26a5456f
2020-05-21 06:24:43 asr2/radeonvii2 154155713 LL 87700000 56.89%; 1395 us/it; ETA 1d 01:45; e6f1cee228a832b9
2020-05-21 06:27:02 asr2/radeonvii2 154155713 LL 87800000 56.96%; 1395 us/it; ETA 1d 01:42; 8f22ca3882842947
2020-05-21 06:29:22 asr2/radeonvii2 154155713 LL 87900000 57.02%; 1395 us/it; ETA 1d 01:40; 956c469d245e11f3
2020-05-21 06:31:41 asr2/radeonvii2 154155713 LL 88000000 57.09%; 1394 us/it; ETA 1d 01:37; 8a332b7fbfafa5f7
2020-05-21 06:34:00 asr2/radeonvii2 154155713 LL 88100000 57.15%; 1394 us/it; ETA 1d 01:35; 76a8832229ab7d36
2020-05-21 06:34:00 asr2/radeonvii2 154155713 EE 88000000 (jacobi == 1)
2020-05-21 06:34:01 asr2/radeonvii2 154155713 LL 87000000 loaded: 62c4e005f393521f
2020-05-21 06:36:20 asr2/radeonvii2 154155713 LL 87100000 56.50%; 1395 us/it; ETA 1d 01:59; d31bd8c703cd82e2
2020-05-21 06:38:40 asr2/radeonvii2 154155713 LL 87200000 56.57%; 1395 us/it; ETA 1d 01:56; 4ebde4106568146f
2020-05-21 06:40:59 asr2/radeonvii2 154155713 LL 87300000 56.63%; 1394 us/it; ETA 1d 01:54; deb4718bd97c793e
2020-05-21 06:43:19 asr2/radeonvii2 154155713 LL 87400000 56.70%; 1395 us/it; ETA 1d 01:52; 78c200210067f63a
2020-05-21 06:45:38 asr2/radeonvii2 154155713 LL 87500000 56.76%; 1395 us/it; ETA 1d 01:50; e2e6daf06bc84732
2020-05-21 06:47:58 asr2/radeonvii2 154155713 LL 87600000 56.83%; 1395 us/it; ETA 1d 01:47; 421b198d26a5456f
2020-05-21 06:50:17 asr2/radeonvii2 154155713 LL 87700000 56.89%; 1395 us/it; ETA 1d 01:45; e6f1cee228a832b9
2020-05-21 06:52:37 asr2/radeonvii2 154155713 LL 87800000 56.96%; 1396 us/it; ETA 1d 01:43; 8f22ca3882842947
2020-05-21 06:54:56 asr2/radeonvii2 154155713 LL 87900000 57.02%; 1395 us/it; ETA 1d 01:41; 956c469d245e11f3
2020-05-21 06:57:16 asr2/radeonvii2 154155713 LL 88000000 57.09%; 1395 us/it; ETA 1d 01:38; 8a332b7fbfafa5f7
2020-05-21 06:59:35 asr2/radeonvii2 154155713 LL 88100000 57.15%; 1395 us/it; ETA 1d 01:35; 76a8832229ab7d36
2020-05-21 06:59:35 asr2/radeonvii2 154155713 EE 88000000 (jacobi == 1)
2020-05-21 06:59:36 asr2/radeonvii2 154155713 LL 87000000 loaded: 62c4e005f393521f
2020-05-21 06:59:43 asr2/radeonvii2 Stopping, please wait..
2020-05-21 06:59:43 asr2/radeonvii2 154155713 LL 87005000 56.44%; 1457 us/it; ETA 1d 03:11; b349261c8446786d
2020-05-21 06:59:43 asr2/radeonvii2 waiting for the Jacobi check to finish..
2020-05-21 07:02:09 asr2/radeonvii2 154155713 OK 87005000 (jacobi == -1)
2020-05-21 07:02:09 asr2/radeonvii2 Exiting because "stop requested"
2020-05-21 07:02:09 asr2/radeonvii2 Bye
[/CODE]
Reducing the Jacobi interval after a fail or two in a row might be helpful. This particular cpu/gpu combination would be limited to about 200K or above for gpu time > cpu Jacobi check time so that there's at most one Jacobi check pending at a time in a continuing run.
[CODE]
2020-05-21 11:12:22 asr2/radeonvii2 OpenCL compilation in 8.66 s
2020-05-21 11:12:23 asr2/radeonvii2 154155713 LL 87503000 loaded: b5116bfd98e830ad
2020-05-21 11:14:38 asr2/radeonvii2 154155713 LL 87600000 56.83%; 1395 us/it; ETA 1d 01:47; 421b198d26a5456f
2020-05-21 11:16:58 asr2/radeonvii2 154155713 LL 87700000 56.89%; 1396 us/it; ETA 1d 01:46; e6f1cee228a832b9
2020-05-21 11:17:04 asr2/radeonvii2 Stopping, please wait..
2020-05-21 11:17:04 asr2/radeonvii2 154155713 LL 87704000 56.89%; 1477 us/it; ETA 1d 03:16; 79d3368b77502fd3
2020-05-21 11:17:04 asr2/radeonvii2 waiting for the Jacobi check to finish..
2020-05-21 11:19:24 asr2/radeonvii2 154155713 OK 87704000 (jacobi == -1)
[/CODE]
Manually starting and stopping after 100K or 200K intervals, then smaller intervals after 87.9M, I coaxed it further past 87M. Jacobi checks were ok to 87.973M; not at 88M.
[CODE]
2020-05-21 12:00:12 gpuowl v6.11-288-g20c4213
2020-05-21 12:00:12 config: -user kriesel -cpu asr2/radeonvii2 -d 2 -use NO_ASM -maxAlloc 15000
2020-05-21 12:00:12 device 2, unique id ''
2020-05-21 12:00:12 asr2/radeonvii2 154155713 FFT: 8M 1K:8:512 (18.38 bpw)
2020-05-21 12:00:12 asr2/radeonvii2 Expected maximum carry32: 70B40000
2020-05-21 12:00:14 asr2/radeonvii2 OpenCL args "-DEXP=154155713u -DWIDTH=1024u -DSMALL_HEIGHT=512u -DMIDDLE=8u -DWEIGHT_STEP=0xc.528658a63b438p-3 -DIWEIGHT_STEP=0xa.633af6ee9bb58p-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -DPM1=0 -DAMDGPU=1 -DCARRY64=1 -DMM_CHAIN=2u -DMM2_CHAIN=3u -DNO_ASM=1 -cl-fast-relaxed-math -cl-std=CL2.0 "
2020-05-21 12:00:25 asr2/radeonvii2 OpenCL compilation in 11.03 s
2020-05-21 12:00:26 asr2/radeonvii2 154155713 LL 87952000 loaded: 64a16470a62c62a6
2020-05-21 12:00:55 asr2/radeonvii2 Stopping, please wait..
2020-05-21 12:00:56 asr2/radeonvii2 154155713 LL 87973000 57.07%; 1406 us/it; ETA 1d 01:51; 6cdf9822b4a86ada
2020-05-21 12:00:56 asr2/radeonvii2 waiting for the Jacobi check to finish..
2020-05-21 12:03:22 asr2/radeonvii2 154155713 OK 87973000 (jacobi == -1)
2020-05-21 12:03:22 asr2/radeonvii2 Exiting because "stop requested"
[/CODE]
I saved a copy of the ll file at 87.952M iterations. The above was with the default fft selected by the program, 8M 1K:8:512, which seemed ambitious at 18.38 bits/word.
Trying different fft was mostly foiled by the following -fft specs not being accepted, called too small, 0K or 128K, even though the first two were copied and pasted from the help output:
-fft 1K:4:1K
-fft 512:8:1K
-fft +1
[CODE]
2020-05-21 12:03:45 gpuowl v6.11-288-g20c4213
2020-05-21 12:03:45 config: -user kriesel -cpu asr2/radeonvii2 -d 2 -use NO_ASM -maxAlloc 15000 -fft 1K:4:1K
2020-05-21 12:03:45 device 2, unique id ''
2020-05-21 12:03:45 asr2/radeonvii2 154155713 FFT: 0M 1K:4K:1K (inf bpw)
2020-05-21 12:03:45 asr2/radeonvii2 FFT size too small for exponent (inf bits/word).
2020-05-21 12:03:45 asr2/radeonvii2 Exiting because "FFT size too small"
2020-05-21 12:03:45 asr2/radeonvii2 Bye
2020-05-21 12:04:54 gpuowl v6.11-288-g20c4213
2020-05-21 12:04:54 config: -user kriesel -cpu asr2/radeonvii2 -d 2 -use NO_ASM -maxAlloc 15000 -fft +1
2020-05-21 12:04:54 device 2, unique id ''
2020-05-21 12:04:54 asr2/radeonvii2 154155713 FFT: 128K 256:1:256 (1176.11 bpw)
2020-05-21 12:04:54 asr2/radeonvii2 FFT size too small for exponent (1176.11 bits/word).
2020-05-21 12:04:54 asr2/radeonvii2 Exiting because "FFT size too small"
2020-05-21 12:04:54 asr2/radeonvii2 Bye
2020-05-21 12:08:51 gpuowl v6.11-288-g20c4213
2020-05-21 12:08:51 config: -user kriesel -cpu asr2/radeonvii2 -d 2 -use NO_ASM -maxAlloc 15000 -fft 512:8:1K
2020-05-21 12:08:51 device 2, unique id ''
2020-05-21 12:08:51 asr2/radeonvii2 154155713 FFT: 0M 512:8K:1K (inf bpw)
2020-05-21 12:08:51 asr2/radeonvii2 FFT size too small for exponent (inf bits/word).
2020-05-21 12:08:51 asr2/radeonvii2 Exiting because "FFT size too small"
2020-05-21 12:08:51 asr2/radeonvii2 Bye
[/CODE]
-fft 4K:4:256 was accepted and ran. This got it through 88M with a correct Jacobi and a different res64 8d2dd3005435c15e. I'm continuing the exponent with that last fft choice, to check speed and luck. If that doesn't work I'll try 9M next.
Edit: That didn't take long.
[CODE]
2020-05-21 12:39:37 gpuowl v6.11-288-g20c4213
2020-05-21 12:39:37 config: -user kriesel -cpu asr2/radeonvii2 -d 2 -use NO_ASM -maxAlloc 15000 -fft 4K:4:256
2020-05-21 12:39:37 device 2, unique id ''
2020-05-21 12:39:37 asr2/radeonvii2 154155713 FFT: 8M 4K:4:256 (18.38 bpw)
2020-05-21 12:39:37 asr2/radeonvii2 Expected maximum carry32: 70B40000
2020-05-21 12:39:39 asr2/radeonvii2 OpenCL args "-DEXP=154155713u -DWIDTH=4096u -DSMALL_HEIGHT=256u -DMIDDLE=4u -DWEIGHT_STEP=0xc.528658a63b438p-3 -DIWEIGHT_STEP=0xa.633af6ee9bb58p-4 -DWEIGHT_BIGSTEP=0xe.ac0c6e7dd2438p-3 -DIWEIGHT_BIGSTEP=0x8.b95c1e3ea8bd8p-4 -DPM1=0 -DAMDGPU=1 -DCARRY64=1 -DMM_CHAIN=1u -DMM2_CHAIN=2u -DNO_ASM=1 -cl-fast-relaxed-math -cl-std=CL2.0 "
2020-05-21 12:39:46 asr2/radeonvii2 OpenCL compilation in 6.65 s
2020-05-21 12:39:47 asr2/radeonvii2 154155713 LL 88000000 loaded: 8d2dd3005435c15e
2020-05-21 12:42:32 asr2/radeonvii2 154155713 LL 88100000 57.15%; 1654 us/it; ETA 1d 06:21; a1378029fe3d2180
2020-05-21 12:45:18 asr2/radeonvii2 154155713 LL 88200000 57.21%; 1655 us/it; ETA 1d 06:19; cdbb9de9a9320c46
2020-05-21 12:47:03 asr2/radeonvii2 Stopping, please wait..
2020-05-21 12:47:04 asr2/radeonvii2 154155713 LL 88264000 57.26%; 1656 us/it; ETA 1d 06:19; 30218bc01cbfa191
2020-05-21 12:47:04 asr2/radeonvii2 waiting for the Jacobi check to finish..
2020-05-21 12:49:25 asr2/radeonvii2 154155713 EE 88264000 (jacobi == 1)
[/CODE]
Trying 9M; different res64 at 88.1M and onward, and only ~3% slower than the default 8M fft spec. |
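As a rough aid for reading the FFT lines in the logs above, the sketch below computes the FFT size and bits per word from a `width:middle:height` spec. The relation size = 2 x width x middle x height is an inference from the three spec/size pairs shown in the logs (1K:8:512 and 4K:4:256 both reported as 8M, 256:1:256 as 128K), not something taken from the gpuowl source, and the function names are hypothetical.

```python
def parse_dim(token):
    """Parse one field of an FFT spec, accepting K/M suffixes (e.g. '1K', '512')."""
    token = token.strip().upper()
    if token.endswith("K"):
        return int(token[:-1]) * 1024
    if token.endswith("M"):
        return int(token[:-1]) * 1024 * 1024
    return int(token)

def fft_size(spec):
    """FFT length in words for a 'width:middle:height' spec.
    Assumes size = 2 * width * middle * height, inferred from the logs."""
    width, middle, height = (parse_dim(t) for t in spec.split(":"))
    return 2 * width * middle * height

def bits_per_word(exponent, size):
    """Average number of exponent bits carried per FFT word."""
    return exponent / size

for spec in ("1K:8:512", "4K:4:256", "256:1:256"):
    size = fft_size(spec)
    print(spec, size // 1024, "K words,", round(bits_per_word(154155713, size), 2), "bpw")
```

With these assumptions the 8M specs give 18.38 bpw and 256:1:256 gives 1176.11 bpw, matching the log output; a 9M FFT (9 x 2^20 words) would put the same exponent at about 16.3 bpw.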
Ken,
Judging from these flags -DMM_CHAIN=2u -DMM2_CHAIN=3u the exponent is near the upper limits of that FFT length. Our gpuowl defaults may be too aggressive. Assuming you saved the problematic files: Can you try getting past the bad iteration with "ULTRA_TRIG=1"? Can you get past with "MM_CHAIN=3"? |
[QUOTE=kriesel;545970]I agree that PRP first test with the excellent GEC is preferable over LL with Jacobi check and its 50% error detection probability or LL without Jacobi as in CUDALucas and most gpuowl LL-supporting versions.
But the realities are that a great deal of LL is still being done by various programs. [/QUOTE] I checked today May 21st around 06:00 UTC, here are the LL and PRP numbers:
[CODE]
              LL    PRP  LL-DC PRP-DC
 50000000    735      0  19033      0
 51000000    226      0  19389      0
 52000000   2685      0  16974      0
 53000000   7196      0  12482      0
 54000000  13839      0   5648      0
 55000000  15560      0   4097      0
 56000000  16364      0   3419      0
 57000000  17324      0   2347      0
 58000000  18143      0   1645      0
 59000000  19212      0    504      0
 60000000  19259      0    410      1
 61000000  19165      0    433      0
 62000000  19083      0    434      0
 63000000  19328      0    396      0
 64000000  19138      0    402      0
 65000000  19075      0    404      0
 66000000  19163      0    330      0
 67000000  18767      0    627      0
 68000000  18998      0    570      0
 69000000  18584      0    744      0
 70000000  18870      0    622      2
 71000000  18975      0    423      0
 72000000  18633      0    661      0
 73000000  18528      0    985      0
 74000000  18535      0    789      1
 75000000  18695      0    771     91
 76000000  18499      0    545    269
 77000000  18303      3    655    233
 78000000  18461      3    540    184
 79000000  18374     12    584    206
 80000000  18531     90    576    207
 81000000  18532     49    697    257
 82000000  18326    153    508    311
 83000000  18184    646    415     95
 84000000  17352   1395    376     17
 85000000  16062   2496    474     48
 86000000  14688   3901    439     55
 87000000  13157   5706    324     26
 88000000  11390   7422    318     23
 89000000   9821   9002    262     34
 90000000   5765  12929    223     20
 91000000   8306   9079    104     26
 92000000   2032  16016      3      6
 93000000   4834  13058      4     12
 94000000   1576  16983      3     35
 95000000    890  17133      2      2
 96000000    246   1486      8      8
 97000000    244    176      3      0
 98000000   2081   4042     19      1
 99000000   2446   4488     17      2
100000000    505    842     16      1
101000000    495    506      2      1
102000000   1003    684      8      2
103000000    766    701      6      1
104000000    221    127      1      0
105000000    395    243      2      1
106000000    963    473      2      1
107000000    707    462      2      1
108000000     11      4      1      1
109000000     16      1      1      0
110000000     15      1      1      0
[/CODE] |
[QUOTE=Prime95;546145]Ken,
Judging from these flags -DMM_CHAIN=2u -DMM2_CHAIN=3u the exponent is near the upper limits of that FFT length. Our gpuowl defaults may be too aggressive. Assuming you saved the problematic files: Can you try getting past the bad iteration with "ULTRA_TRIG=1"? Can you get past with "MM_CHAIN=3"?[/QUOTE] Will do. I also have an interesting case at 94.95M 5M fft on an RX550, 18.11 bits/word, that has been problematic in both v6.11-257 and -288. |
[QUOTE=kriesel;546132]
Trying different fft was mostly foiled by the following -fft specs not being accepted, called too small, 0K or 128K, even though the first two were copied and pasted from the help output: -fft 1K:4:1K -fft 512:8:1K -fft +1[/QUOTE] Ken, thanks for reporting this FFT-spec parsing bug; it should be fixed now. -fft +1 does not work anymore -- there is no format for "relative to the default" anymore. So -fft +1 is the same as -fft 1, which is an invalid FFT size. I plan to reduce the default Jacobi step to 500K. It's safe to set it manually to a smaller value; it won't start a new background check if there is one ongoing. |
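For readers unfamiliar with the check behind the "jacobi == -1" / "jacobi == 1" log lines, here is a small Python sketch. It is not gpuowl's implementation (which runs the check on the cpu against the full residue); the function names are hypothetical and the exponent is tiny. It relies on the standard fact that for an LL sequence s_0 = 4, s_{i+1} = s_i^2 - 2 mod M_p, the Jacobi symbol of (s_i - 2) over M_p stays at -1 for i >= 1, since s_{i+1} - 2 = (s_i - 2)(s_i + 2) and s_i + 2 is a square.

```python
def jacobi(a, n):
    """Jacobi symbol (a|n) for odd n > 0, via the binary reciprocity algorithm."""
    a %= n
    result = 1
    while a:
        while a % 2 == 0:             # pull out factors of 2
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a                   # quadratic reciprocity swap
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0

def ll_jacobi_trace(p):
    """Jacobi symbol of (s_i - 2) over M_p after each LL iteration (toy sizes only)."""
    m = (1 << p) - 1
    s = 4
    trace = []
    for _ in range(p - 2):
        s = (s * s - 2) % m
        trace.append(jacobi(s - 2, m))
    return trace

print(ll_jacobi_trace(13))
```

A corrupted residue lands on +1 or -1 about equally often, which is where the roughly 50% detection probability mentioned in this thread comes from; a passing check therefore does not prove the residue is good.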
[QUOTE=ATH;546150]I checked today May 21st around 06:00 UTC, here are the LL and PRP numbers:[/QUOTE]So, totals,
[CODE]
677277  130312  101680    2181
83.86%  16.14%  97.90%   2.10%
    LL     PRP   LL-DC  PRP-DC
[/CODE]
Not enough PRP-DC for statistics yet. |
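The percentages in the totals above pair LL with PRP (first-time tests) and LL-DC with PRP-DC (double checks). A minimal sketch of that arithmetic, using only the totals quoted in the post (the names are illustrative):

```python
totals = {"LL": 677277, "PRP": 130312, "LL-DC": 101680, "PRP-DC": 2181}

def share(part, other):
    """Percentage of `part` within the pair (part + other)."""
    return 100.0 * part / (part + other)

first_time = (share(totals["LL"], totals["PRP"]), share(totals["PRP"], totals["LL"]))
double_check = (share(totals["LL-DC"], totals["PRP-DC"]), share(totals["PRP-DC"], totals["LL-DC"]))

print("first-time LL/PRP: %.2f%% / %.2f%%" % first_time)
print("double-check LL/PRP: %.2f%% / %.2f%%" % double_check)
```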