mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GpuOwl (https://www.mersenneforum.org/forumdisplay.php?f=171)
-   -   gpuOwL: an OpenCL program for Mersenne primality testing (https://www.mersenneforum.org/showthread.php?t=22204)

kriesel 2020-05-20 17:53

[QUOTE=LaurV;545985]My bad wording. Sorry. I meant 2 instances, each in its own card.[/QUOTE]I thought that was clear. And now it's clearer.
The 2 on one card I wrote of is about performance tweaking.

CUDALucas and LL will be with us for a while yet, not only because of user inertia, but also because some gpus cannot run gpuowl yet can run CUDALucas: namely, any NVIDIA gpu that does not support a sufficient subset of OpenCL. Some gpus, such as old Teslas and Quadros, also have stronger DP performance relative to their 32-bit integer performance than others.

kriesel 2020-05-20 18:04

Jacobi returns, and Middle15 appears
 
Win7 x64 build of gpuowl v6.11-288-g20c4213 attached.
I didn't notice until now, but sometime between 6.11-219 and 6.11-255, the ffts longer than 96M (up to 192M) were dropped.

ewmayer 2020-05-20 19:45

Mihai, with all (and great) respect, I'm trying to understand your resistance to supporting shifted residues. Please let me know if any of the following are incorrect:

1. You previously supported shift, at least for LL, perhaps also for PRP (and if you only had it for LL, note that it's even more trivial to support for PRP, because one does not need to do the per-iteration "at which bit location does the -2 get injected?" computation LL needs, one only needs to update the shift count via doubling-mod-p);

2. You noted that supporting shift did not adversely affect performance.

So why not simply reactivate the old shift-supporting code segments? Has the main codebase changed so much in the meanwhile that doing so would be a major pain?

Knowing that 2 tests, whether using the same or different codes, run at the same FFT length are using fundamentally different residue-word data is a big deal, confidence-in-result-wise. "Gerbicz check is totally foolproof" is [a] a surmise, based on limited run numbers with independent-program DCs for cross-checking, and [b] is implementation-dependent, in similar fashion to "a mathematically foolproof cryptography scheme can be nullified by a flawed software implementation".

If it's relatively trivial to support, why not do so? Restore the hopefully-modest amounts of code needed to support it, and then move on. Shift-related code is simple enough that once deployed, it doesn't need further attention. The next time I expect to need to revisit my shift-related code in Mlucas will be if and when a major vendor puts out a 1024-or-more-bit SIMD architecture, at which point I'll need to make some modest changes to the LL-test carry routines to inject the per-iteration -2 into the proper one of the 16 doubles in the corresponding carry-step SIMD vector-of-residue-words datum.
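
The shift bookkeeping described in point 1 is small enough to sketch. The Python below is a toy illustration on tiny exponents, not Mlucas or gpuowl code; the names `ll_plain`, `ll_shifted`, `squarings_shifted` and `unshift` are hypothetical. It assumes the usual convention that a shifted residue stores x·2^s mod M_p, so each squaring doubles the shift count mod p, and LL must subtract 2·2^s rather than 2.

```python
def ll_plain(p):
    """Unshifted Lucas-Lehmer: s0 = 4, then s -> s^2 - 2 (mod 2^p - 1), p - 2 times."""
    m = (1 << p) - 1
    x = 4
    for _ in range(p - 2):
        x = (x * x - 2) % m
    return x

def ll_shifted(p, shift):
    """Same LL run carried on a residue rotated left by `shift` bits."""
    m = (1 << p) - 1
    s = shift % p
    x = (4 << s) % m
    for _ in range(p - 2):
        s = (2 * s) % p               # squaring doubles the shift count mod p
        x = (x * x - (2 << s)) % m    # the -2 is injected at the shifted bit position
    return x, s

def squarings_shifted(p, shift, iters, base=3):
    """Repeated squaring only (as in a PRP test): no per-iteration injection is needed."""
    m = (1 << p) - 1
    s = shift % p
    x = (base << s) % m
    for _ in range(iters):
        s = (2 * s) % p               # the only shift bookkeeping required
        x = (x * x) % m
    return x, s

def unshift(x, s, p):
    """Remove the shift: 2^p = 1 (mod 2^p - 1), so 2^(p - s) is the inverse of 2^s."""
    m = (1 << p) - 1
    return (x * pow(2, p - s, m)) % m

if __name__ == "__main__":
    p = 13
    for shift in (0, 1, 5, 12):
        x, s = ll_shifted(p, shift)
        print(shift, s, unshift(x, s, p) == ll_plain(p))
```

Two runs started with different shifts carry different residue words at every iteration, yet agree once unshifted, which is the cross-check value argued for above.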

kriesel 2020-05-20 20:39

Reviewing my notes and some old posts, I see Mihai implemented shift for LL, back at gpuowl v0.3, indicating a 0.5% overhead then, and dropped it by v0.6, when he first implemented the Jacobi check with a stated overhead of 0.2%. (See links at [URL]https://www.mersenneforum.org/showpost.php?p=489083&postcount=7[/URL])

If those values are still roughly current, I would be fine with such a 0.7% overhead, or a bit more, to gain some reliability through differing shift and some error detection combined in the same build. People like LaurV who run dual runs probably would be too. I think some P-1 error checking at 2% overhead, perhaps more, would be ok.

For LL, perhaps it could be made optional, such as by -use no_offset,no_jacobi, with offset and jacobi check on by default. It would be good if those only affected LL worktodo lines and did not prevent gpuowl from running mixed work in the same worktodo file; LL followed by P-1 and PRP etc. (Providing no_GEC to PRP is unappealing.) Some of us would be happy to benchmark the reliability options if/when available.

Way back when gpuowl was announced, Mihai closed [URL]https://www.mersenneforum.org/showpost.php?p=457032&postcount=1[/URL] with[CODE]I'll be looking for problems to fix. I hope you'll enjoy!
cheers,
Mihai[/CODE]Yes, immensely. It seems to me a lively 3 years and great progress to date. Looking forward to what comes next.

kriesel 2020-05-20 21:52

[QUOTE=kriesel;545974]Two gpuowl instances running on the same gpu reportedly helps total testing throughput. But not always. Test what you run. (terribly disparate instances detail... 40.94% of solo throughput.)
It's probably best to run same computation type, same fft size, or perhaps very similar size.[/QUOTE]
One vs two instances on a gpu, gpuowl-win v6.11-278, 9M PRP/9M PRP, radeonvii gpu, i7-4790 cpu

1a 1467.4 us/it, 681.48 iter/sec
1b 1465.6 us/it, 682.31 iter/sec
average 681.9 iter/sec
2 together 2884.4 us/it + 2876.8 us/it, 346.69 iter/sec + 347.61 iter/sec = 694.30 iter/sec.
2 instances / gpu yielded 1.01819 times the speed of a single instance; 1.82% faster.
Equivalent to 217.3 GHzD/day. Seems low. Maybe thermally throttling?
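
The arithmetic behind that comparison can be rechecked with a few lines of Python. This is only a sketch using the timings quoted above; the function names are made up for illustration.

```python
def iters_per_sec(us_per_iter):
    """Convert microseconds per iteration to iterations per second."""
    return 1e6 / us_per_iter

def two_instance_gain(solo_us, paired_us):
    """Return (average solo rate, combined paired rate, paired/solo ratio)."""
    solo = sum(iters_per_sec(t) for t in solo_us) / len(solo_us)
    paired = sum(iters_per_sec(t) for t in paired_us)
    return solo, paired, paired / solo

if __name__ == "__main__":
    # Solo runs 1a and 1b, then the two instances run together.
    solo, paired, ratio = two_instance_gain([1467.4, 1465.6], [2884.4, 2876.8])
    print(f"solo {solo:.1f} it/s, paired {paired:.2f} it/s, ratio {ratio:.5f}")
```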

kriesel 2020-05-21 17:40

Gpuowl-win v6.11-288 LL Jacobi test
 
First trial of GpuOwL v6.11-288 LL with Jacobi check, continuation of an existing run.
Happy/relieved to see the initial Jacobi check pass, although this gpu has successfully completed PRP wavefront exponents with no GEC errors.
[CODE]2020-05-20 18:27:57 gpuowl v6.11-288-g20c4213
2020-05-20 18:27:57 config: -user kriesel -cpu asr2/radeonvii2 -d 2 -use NO_ASM -maxAlloc 15000
2020-05-20 18:27:57 device 2, unique id ''
2020-05-20 18:27:57 asr2/radeonvii2 154155713 FFT: 8M 1K:8:512 (18.38 bpw)
2020-05-20 18:27:57 asr2/radeonvii2 Expected maximum carry32: 70B40000
2020-05-20 18:27:59 asr2/radeonvii2 OpenCL args "-DEXP=154155713u -DWIDTH=1024u -DSMALL_HEIGHT=512u -DMIDDLE=8u -DWEIGHT_STEP=0xc.528658a63b438p-3 -DIWEIGHT_STEP=0xa.633af6ee9bb58p-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -DPM1=0 -DAMDGPU=1 -DCARRY64=1 -DMM_CHAIN=2u -DMM2_CHAIN=3u -DNO_ASM=1 -cl-fast-relaxed-math -cl-std=CL2.0 "
2020-05-20 18:28:09 asr2/radeonvii2 OpenCL compilation in 10.16 s
2020-05-20 18:28:10 asr2/radeonvii2 154155713 LL 60507000 loaded: 223f2e70d392347e
2020-05-20 18:30:29 asr2/radeonvii2 154155713 LL 60600000 39.31%; 1492 us/it; ETA 1d 14:47; 70cdcc47cb7d66f7
2020-05-20 18:32:48 asr2/radeonvii2 154155713 LL 60700000 39.38%; 1397 us/it; ETA 1d 12:16; fa7b795be968fc04
2020-05-20 18:35:08 asr2/radeonvii2 154155713 LL 60800000 39.44%; 1397 us/it; ETA 1d 12:13; 3d74bd4d1e787fca
2020-05-20 18:37:28 asr2/radeonvii2 154155713 LL 60900000 39.51%; 1396 us/it; ETA 1d 12:10; 771738771fa1b18f
2020-05-20 18:39:47 asr2/radeonvii2 154155713 LL 61000000 39.57%; 1396 us/it; ETA 1d 12:08; 663deec0ffbc5691
2020-05-20 18:42:07 asr2/radeonvii2 154155713 LL 61100000 39.64%; 1396 us/it; ETA 1d 12:05; 8b7abb01cae7c971
2020-05-20 18:42:07 asr2/radeonvii2 154155713 OK 61000000 (jacobi == -1)
2020-05-20 18:44:27 asr2/radeonvii2 154155713 LL 61200000 39.70%; 1398 us/it; ETA 1d 12:06; 1d4fad0f5f424457
...
2020-05-21 04:42:25 asr2/radeonvii2 154155713 LL 86900000 56.37%; 1395 us/it; ETA 1d 02:04; b558d286cd9d5735
2020-05-21 04:44:45 asr2/radeonvii2 154155713 LL 87000000 56.44%; 1395 us/it; ETA 1d 02:01; 62c4e005f393521f
2020-05-21 04:47:04 asr2/radeonvii2 154155713 LL 87100000 56.50%; 1394 us/it; ETA 1d 01:58; d31bd8c703cd82e2
2020-05-21 04:49:23 asr2/radeonvii2 154155713 LL 87200000 56.57%; 1395 us/it; ETA 1d 01:57; 4ebde4106568146f
2020-05-21 04:49:23 asr2/radeonvii2 154155713 OK 87000000 (jacobi == -1)
2020-05-21 04:51:43 asr2/radeonvii2 154155713 LL 87300000 56.63%; 1396 us/it; ETA 1d 01:55; deb4718bd97c793e[/CODE]until it hit a repeatable Jacobi error between 87M and 88M:
[CODE]2020-05-21 05:07:59 asr2/radeonvii2 154155713 LL 88000000 57.09%; 1395 us/it; ETA 1d 01:38; 8a332b7fbfafa5f7
2020-05-21 05:10:19 asr2/radeonvii2 154155713 LL 88100000 57.15%; 1394 us/it; ETA 1d 01:35; 76a8832229ab7d36
2020-05-21 05:12:38 asr2/radeonvii2 154155713 LL 88200000 57.21%; 1395 us/it; ETA 1d 01:33; 687d9f07557de7fc
2020-05-21 05:12:38 asr2/radeonvii2 154155713 EE 88000000 (jacobi == 1)
2020-05-21 05:12:39 asr2/radeonvii2 154155713 LL 87000000 loaded: 62c4e005f393521f
2020-05-21 05:14:58 asr2/radeonvii2 154155713 LL 87100000 56.50%; 1394 us/it; ETA 1d 01:58; d31bd8c703cd82e2
2020-05-21 05:17:18 asr2/radeonvii2 154155713 LL 87200000 56.57%; 1394 us/it; ETA 1d 01:56; 4ebde4106568146f
2020-05-21 05:19:37 asr2/radeonvii2 154155713 LL 87300000 56.63%; 1394 us/it; ETA 1d 01:54; deb4718bd97c793e
2020-05-21 05:21:56 asr2/radeonvii2 154155713 LL 87400000 56.70%; 1395 us/it; ETA 1d 01:52; 78c200210067f63a
2020-05-21 05:24:16 asr2/radeonvii2 154155713 LL 87500000 56.76%; 1395 us/it; ETA 1d 01:49; e2e6daf06bc84732
2020-05-21 05:26:35 asr2/radeonvii2 154155713 LL 87600000 56.83%; 1394 us/it; ETA 1d 01:47; 421b198d26a5456f
2020-05-21 05:28:55 asr2/radeonvii2 154155713 LL 87700000 56.89%; 1395 us/it; ETA 1d 01:45; e6f1cee228a832b9
2020-05-21 05:31:14 asr2/radeonvii2 154155713 LL 87800000 56.96%; 1395 us/it; ETA 1d 01:42; 8f22ca3882842947
2020-05-21 05:33:34 asr2/radeonvii2 154155713 LL 87900000 57.02%; 1395 us/it; ETA 1d 01:40; 956c469d245e11f3
2020-05-21 05:35:53 asr2/radeonvii2 154155713 LL 88000000 57.09%; 1395 us/it; ETA 1d 01:38; 8a332b7fbfafa5f7
2020-05-21 05:38:13 asr2/radeonvii2 154155713 LL 88100000 57.15%; 1394 us/it; ETA 1d 01:35; 76a8832229ab7d36
2020-05-21 05:40:32 asr2/radeonvii2 154155713 LL 88200000 57.21%; 1395 us/it; ETA 1d 01:33; 687d9f07557de7fc
2020-05-21 05:40:32 asr2/radeonvii2 154155713 EE 88000000 (jacobi == 1)
2020-05-21 05:40:32 asr2/radeonvii2 154155713 LL 87000000 loaded: 62c4e005f393521f
2020-05-21 05:42:52 asr2/radeonvii2 154155713 LL 87100000 56.50%; 1394 us/it; ETA 1d 01:58; d31bd8c703cd82e2
2020-05-21 05:45:11 asr2/radeonvii2 154155713 LL 87200000 56.57%; 1395 us/it; ETA 1d 01:56; 4ebde4106568146f
2020-05-21 05:47:31 asr2/radeonvii2 154155713 LL 87300000 56.63%; 1394 us/it; ETA 1d 01:54; deb4718bd97c793e
2020-05-21 05:49:50 asr2/radeonvii2 154155713 LL 87400000 56.70%; 1395 us/it; ETA 1d 01:52; 78c200210067f63a
2020-05-21 05:52:10 asr2/radeonvii2 154155713 LL 87500000 56.76%; 1395 us/it; ETA 1d 01:49; e2e6daf06bc84732
2020-05-21 05:54:29 asr2/radeonvii2 154155713 LL 87600000 56.83%; 1395 us/it; ETA 1d 01:47; 421b198d26a5456f
2020-05-21 05:56:49 asr2/radeonvii2 154155713 LL 87700000 56.89%; 1395 us/it; ETA 1d 01:45; e6f1cee228a832b9
2020-05-21 05:59:08 asr2/radeonvii2 154155713 LL 87800000 56.96%; 1395 us/it; ETA 1d 01:43; 8f22ca3882842947
2020-05-21 06:01:28 asr2/radeonvii2 154155713 LL 87900000 57.02%; 1395 us/it; ETA 1d 01:40; 956c469d245e11f3
2020-05-21 06:03:47 asr2/radeonvii2 154155713 LL 88000000 57.09%; 1394 us/it; ETA 1d 01:37; 8a332b7fbfafa5f7
2020-05-21 06:06:06 asr2/radeonvii2 154155713 LL 88100000 57.15%; 1394 us/it; ETA 1d 01:35; 76a8832229ab7d36
2020-05-21 06:08:26 asr2/radeonvii2 154155713 LL 88200000 57.21%; 1395 us/it; ETA 1d 01:33; 687d9f07557de7fc
2020-05-21 06:08:26 asr2/radeonvii2 154155713 EE 88000000 (jacobi == 1)
2020-05-21 06:08:26 asr2/radeonvii2 154155713 LL 87000000 loaded: 62c4e005f393521f
2020-05-21 06:10:46 asr2/radeonvii2 154155713 LL 87100000 56.50%; 1394 us/it; ETA 1d 01:58; d31bd8c703cd82e2
2020-05-21 06:13:05 asr2/radeonvii2 154155713 LL 87200000 56.57%; 1395 us/it; ETA 1d 01:56; 4ebde4106568146f
2020-05-21 06:15:25 asr2/radeonvii2 154155713 LL 87300000 56.63%; 1395 us/it; ETA 1d 01:54; deb4718bd97c793e
2020-05-21 06:17:44 asr2/radeonvii2 154155713 LL 87400000 56.70%; 1394 us/it; ETA 1d 01:51; 78c200210067f63a
2020-05-21 06:20:04 asr2/radeonvii2 154155713 LL 87500000 56.76%; 1395 us/it; ETA 1d 01:49; e2e6daf06bc84732
2020-05-21 06:22:23 asr2/radeonvii2 154155713 LL 87600000 56.83%; 1395 us/it; ETA 1d 01:47; 421b198d26a5456f
2020-05-21 06:24:43 asr2/radeonvii2 154155713 LL 87700000 56.89%; 1395 us/it; ETA 1d 01:45; e6f1cee228a832b9
2020-05-21 06:27:02 asr2/radeonvii2 154155713 LL 87800000 56.96%; 1395 us/it; ETA 1d 01:42; 8f22ca3882842947
2020-05-21 06:29:22 asr2/radeonvii2 154155713 LL 87900000 57.02%; 1395 us/it; ETA 1d 01:40; 956c469d245e11f3
2020-05-21 06:31:41 asr2/radeonvii2 154155713 LL 88000000 57.09%; 1394 us/it; ETA 1d 01:37; 8a332b7fbfafa5f7
2020-05-21 06:34:00 asr2/radeonvii2 154155713 LL 88100000 57.15%; 1394 us/it; ETA 1d 01:35; 76a8832229ab7d36
2020-05-21 06:34:00 asr2/radeonvii2 154155713 EE 88000000 (jacobi == 1)
2020-05-21 06:34:01 asr2/radeonvii2 154155713 LL 87000000 loaded: 62c4e005f393521f
2020-05-21 06:36:20 asr2/radeonvii2 154155713 LL 87100000 56.50%; 1395 us/it; ETA 1d 01:59; d31bd8c703cd82e2
2020-05-21 06:38:40 asr2/radeonvii2 154155713 LL 87200000 56.57%; 1395 us/it; ETA 1d 01:56; 4ebde4106568146f
2020-05-21 06:40:59 asr2/radeonvii2 154155713 LL 87300000 56.63%; 1394 us/it; ETA 1d 01:54; deb4718bd97c793e
2020-05-21 06:43:19 asr2/radeonvii2 154155713 LL 87400000 56.70%; 1395 us/it; ETA 1d 01:52; 78c200210067f63a
2020-05-21 06:45:38 asr2/radeonvii2 154155713 LL 87500000 56.76%; 1395 us/it; ETA 1d 01:50; e2e6daf06bc84732
2020-05-21 06:47:58 asr2/radeonvii2 154155713 LL 87600000 56.83%; 1395 us/it; ETA 1d 01:47; 421b198d26a5456f
2020-05-21 06:50:17 asr2/radeonvii2 154155713 LL 87700000 56.89%; 1395 us/it; ETA 1d 01:45; e6f1cee228a832b9
2020-05-21 06:52:37 asr2/radeonvii2 154155713 LL 87800000 56.96%; 1396 us/it; ETA 1d 01:43; 8f22ca3882842947
2020-05-21 06:54:56 asr2/radeonvii2 154155713 LL 87900000 57.02%; 1395 us/it; ETA 1d 01:41; 956c469d245e11f3
2020-05-21 06:57:16 asr2/radeonvii2 154155713 LL 88000000 57.09%; 1395 us/it; ETA 1d 01:38; 8a332b7fbfafa5f7
2020-05-21 06:59:35 asr2/radeonvii2 154155713 LL 88100000 57.15%; 1395 us/it; ETA 1d 01:35; 76a8832229ab7d36
2020-05-21 06:59:35 asr2/radeonvii2 154155713 EE 88000000 (jacobi == 1)
2020-05-21 06:59:36 asr2/radeonvii2 154155713 LL 87000000 loaded: 62c4e005f393521f
2020-05-21 06:59:43 asr2/radeonvii2 Stopping, please wait..
2020-05-21 06:59:43 asr2/radeonvii2 154155713 LL 87005000 56.44%; 1457 us/it; ETA 1d 03:11; b349261c8446786d
2020-05-21 06:59:43 asr2/radeonvii2 waiting for the Jacobi check to finish..
2020-05-21 07:02:09 asr2/radeonvii2 154155713 OK 87005000 (jacobi == -1)
2020-05-21 07:02:09 asr2/radeonvii2 Exiting because "stop requested"
2020-05-21 07:02:09 asr2/radeonvii2 Bye[/CODE]Reducing the Jacobi interval after a fail or two in a row might be helpful. On this particular cpu/gpu combination the interval would need to be about 200K iterations or more, so that the gpu time per interval exceeds the cpu Jacobi check time and there's at most one Jacobi check pending at a time in a continuing run. [CODE]2020-05-21 11:12:22 asr2/radeonvii2 OpenCL compilation in 8.66 s
2020-05-21 11:12:23 asr2/radeonvii2 154155713 LL 87503000 loaded: b5116bfd98e830ad
2020-05-21 11:14:38 asr2/radeonvii2 154155713 LL 87600000 56.83%; 1395 us/it; ETA 1d 01:47; 421b198d26a5456f
2020-05-21 11:16:58 asr2/radeonvii2 154155713 LL 87700000 56.89%; 1396 us/it; ETA 1d 01:46; e6f1cee228a832b9
2020-05-21 11:17:04 asr2/radeonvii2 Stopping, please wait..
2020-05-21 11:17:04 asr2/radeonvii2 154155713 LL 87704000 56.89%; 1477 us/it; ETA 1d 03:16; 79d3368b77502fd3
2020-05-21 11:17:04 asr2/radeonvii2 waiting for the Jacobi check to finish..
2020-05-21 11:19:24 asr2/radeonvii2 154155713 OK 87704000 (jacobi == -1)[/CODE]Manually starting and stopping after 100K or 200K intervals, then smaller intervals after 87.9M, I coaxed it further past 87M. Jacobi checks were ok to 87.973M; not at 88M.[CODE]2020-05-21 12:00:12 gpuowl v6.11-288-g20c4213
2020-05-21 12:00:12 config: -user kriesel -cpu asr2/radeonvii2 -d 2 -use NO_ASM -maxAlloc 15000
2020-05-21 12:00:12 device 2, unique id ''
2020-05-21 12:00:12 asr2/radeonvii2 154155713 FFT: 8M 1K:8:512 (18.38 bpw)
2020-05-21 12:00:12 asr2/radeonvii2 Expected maximum carry32: 70B40000
2020-05-21 12:00:14 asr2/radeonvii2 OpenCL args "-DEXP=154155713u -DWIDTH=1024u -DSMALL_HEIGHT=512u -DMIDDLE=8u -DWEIGHT_STEP=0xc.528658a63b438p-3 -DIWEIGHT_STEP=0xa.633af6ee9bb58p-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -DPM1=0 -DAMDGPU=1 -DCARRY64=1 -DMM_CHAIN=2u -DMM2_CHAIN=3u -DNO_ASM=1 -cl-fast-relaxed-math -cl-std=CL2.0 "
2020-05-21 12:00:25 asr2/radeonvii2 OpenCL compilation in 11.03 s
2020-05-21 12:00:26 asr2/radeonvii2 154155713 LL 87952000 loaded: 64a16470a62c62a6
2020-05-21 12:00:55 asr2/radeonvii2 Stopping, please wait..
2020-05-21 12:00:56 asr2/radeonvii2 154155713 LL 87973000 57.07%; 1406 us/it; ETA 1d 01:51; 6cdf9822b4a86ada
2020-05-21 12:00:56 asr2/radeonvii2 waiting for the Jacobi check to finish..
2020-05-21 12:03:22 asr2/radeonvii2 154155713 OK 87973000 (jacobi == -1)
2020-05-21 12:03:22 asr2/radeonvii2 Exiting because "stop requested"[/CODE] I saved a copy of the ll file at 87.952M iterations.
The above was with the default fft selected by the program, 8M 1K:8:512, which seemed ambitious at 18.38 bits/word.
Trying a different fft was mostly foiled: the following -fft specs were rejected as too small (parsed as 0M or 128K), even though the first two were copied and pasted from the help output:
-fft 1K:4:1K
-fft 512:8:1K
-fft +1[CODE]
2020-05-21 12:03:45 gpuowl v6.11-288-g20c4213
2020-05-21 12:03:45 config: -user kriesel -cpu asr2/radeonvii2 -d 2 -use NO_ASM -maxAlloc 15000 -fft 1K:4:1K
2020-05-21 12:03:45 device 2, unique id ''
2020-05-21 12:03:45 asr2/radeonvii2 154155713 FFT: 0M 1K:4K:1K (inf bpw)
2020-05-21 12:03:45 asr2/radeonvii2 FFT size too small for exponent (inf bits/word).
2020-05-21 12:03:45 asr2/radeonvii2 Exiting because "FFT size too small"
2020-05-21 12:03:45 asr2/radeonvii2 Bye

2020-05-21 12:04:54 gpuowl v6.11-288-g20c4213
2020-05-21 12:04:54 config: -user kriesel -cpu asr2/radeonvii2 -d 2 -use NO_ASM -maxAlloc 15000 -fft +1
2020-05-21 12:04:54 device 2, unique id ''
2020-05-21 12:04:54 asr2/radeonvii2 154155713 FFT: 128K 256:1:256 (1176.11 bpw)
2020-05-21 12:04:54 asr2/radeonvii2 FFT size too small for exponent (1176.11 bits/word).
2020-05-21 12:04:54 asr2/radeonvii2 Exiting because "FFT size too small"
2020-05-21 12:04:54 asr2/radeonvii2 Bye

2020-05-21 12:08:51 gpuowl v6.11-288-g20c4213
2020-05-21 12:08:51 config: -user kriesel -cpu asr2/radeonvii2 -d 2 -use NO_ASM -maxAlloc 15000 -fft 512:8:1K
2020-05-21 12:08:51 device 2, unique id ''
2020-05-21 12:08:51 asr2/radeonvii2 154155713 FFT: 0M 512:8K:1K (inf bpw)
2020-05-21 12:08:51 asr2/radeonvii2 FFT size too small for exponent (inf bits/word).
2020-05-21 12:08:51 asr2/radeonvii2 Exiting because "FFT size too small"
2020-05-21 12:08:51 asr2/radeonvii2 Bye[/CODE] -fft 4K:4:256 was accepted and ran. This got it through 88M with a correct Jacobi and a different res64 8d2dd3005435c15e.
I'm continuing the exponent with that last fft choice, to check speed and luck. If that doesn't work I'll try 9M next.
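
For reference, the sizes and bits/word figures in the logs above are consistent with a simple reading of the width:middle:height spec. The sketch below is not gpuowl's parser; `parse_dim`, `fft_words` and `bits_per_word` are hypothetical names, and the factor of 2 is an assumption inferred from the log lines (1K:8:512 and 4K:4:256 both reported as 8M, 256:1:256 as 128K).

```python
def parse_dim(token):
    """Parse one dimension such as '512', '1K' or '4K' (K = 1024, M = 1024 * 1024)."""
    token = token.strip().upper()
    if token.endswith("K"):
        return int(token[:-1]) * 1024
    if token.endswith("M"):
        return int(token[:-1]) * 1024 * 1024
    return int(token)

def fft_words(spec):
    """Word count for a 'width:middle:height' spec, assuming size = 2 * W * M * H."""
    width, middle, height = (parse_dim(t) for t in spec.split(":"))
    return 2 * width * middle * height

def bits_per_word(exponent, spec):
    """Average number of exponent bits carried per FFT word."""
    return exponent / fft_words(spec)

if __name__ == "__main__":
    for spec in ("1K:8:512", "4K:4:256", "1K:4:1K", "512:8:1K", "256:1:256"):
        print(spec, fft_words(spec) // 1024, "K words,",
              round(bits_per_word(154155713, spec), 2), "bpw")
```

Under that reading, 1K:4:1K and 512:8:1K are also 8M specs at 18.38 bits/word, so being rejected as too small points at the parsing rather than at the specs themselves.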

Edit: That didn't take long. [CODE]2020-05-21 12:39:37 gpuowl v6.11-288-g20c4213
2020-05-21 12:39:37 config: -user kriesel -cpu asr2/radeonvii2 -d 2 -use NO_ASM -maxAlloc 15000 -fft 4K:4:256
2020-05-21 12:39:37 device 2, unique id ''
2020-05-21 12:39:37 asr2/radeonvii2 154155713 FFT: 8M 4K:4:256 (18.38 bpw)
2020-05-21 12:39:37 asr2/radeonvii2 Expected maximum carry32: 70B40000
2020-05-21 12:39:39 asr2/radeonvii2 OpenCL args "-DEXP=154155713u -DWIDTH=4096u -DSMALL_HEIGHT=256u -DMIDDLE=4u -DWEIGHT_STEP=0xc.528658a63b438p-3 -DIWEIGHT_STEP=0xa.633af6ee9bb58p-4 -DWEIGHT_BIGSTEP=0xe.ac0c6e7dd2438p-3 -DIWEIGHT_BIGSTEP=0x8.b95c1e3ea8bd8p-4 -DPM1=0 -DAMDGPU=1 -DCARRY64=1 -DMM_CHAIN=1u -DMM2_CHAIN=2u -DNO_ASM=1 -cl-fast-relaxed-math -cl-std=CL2.0 "
2020-05-21 12:39:46 asr2/radeonvii2 OpenCL compilation in 6.65 s
2020-05-21 12:39:47 asr2/radeonvii2 154155713 LL 88000000 loaded: 8d2dd3005435c15e
2020-05-21 12:42:32 asr2/radeonvii2 154155713 LL 88100000 57.15%; 1654 us/it; ETA 1d 06:21; a1378029fe3d2180
2020-05-21 12:45:18 asr2/radeonvii2 154155713 LL 88200000 57.21%; 1655 us/it; ETA 1d 06:19; cdbb9de9a9320c46
2020-05-21 12:47:03 asr2/radeonvii2 Stopping, please wait..
2020-05-21 12:47:04 asr2/radeonvii2 154155713 LL 88264000 57.26%; 1656 us/it; ETA 1d 06:19; 30218bc01cbfa191
2020-05-21 12:47:04 asr2/radeonvii2 waiting for the Jacobi check to finish..
2020-05-21 12:49:25 asr2/radeonvii2 154155713 EE 88264000 (jacobi == 1)[/CODE]Trying 9M; different res64 at 88.1M and onward, and only ~3% slower than the default 8M fft spec.
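
For anyone unfamiliar with what the OK (jacobi == -1) and EE (jacobi == 1) lines are testing: in an error-free LL run started from s0 = 4, the Jacobi symbol of s_n - 2 modulo M_p is -1 for every n >= 1. The sketch below demonstrates that property on tiny exponents. It is a textbook illustration, not gpuowl's code, and it assumes the quantity checked is s_n - 2; the function names are hypothetical.

```python
def jacobi(a, n):
    """Jacobi symbol (a | n) for odd n > 0, by the standard binary algorithm."""
    a %= n
    result = 1
    while a:
        while a % 2 == 0:                  # pull out factors of 2
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a                        # quadratic reciprocity swap
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0

def ll_jacobi_symbols(p):
    """Jacobi symbols (s_n - 2 | M_p) for n = 1 .. p-2 of an error-free LL run."""
    m = (1 << p) - 1
    s = 4
    symbols = []
    for _ in range(p - 2):
        s = (s * s - 2) % m
        symbols.append(jacobi(s - 2, m))
    return symbols

if __name__ == "__main__":
    print(ll_jacobi_symbols(13))           # every entry is -1 for a correct run
```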

Prime95 2020-05-21 19:18

Ken,

Judging from these flags -DMM_CHAIN=2u -DMM2_CHAIN=3u the exponent is near the upper limits of that FFT length. Our gpuowl defaults may be too aggressive.

Assuming you saved the problematic files:
Can you try getting past the bad iteration with "ULTRA_TRIG=1"?
Can you get past with "MM_CHAIN=3"?

ATH 2020-05-21 20:14

[QUOTE=kriesel;545970]I agree that PRP first test with the excellent GEC is preferable over LL with Jacobi check and its 50% error detection probability or LL without Jacobi as in CUDALucas and most gpuowl LL-supporting versions.
But the realities are that a great deal of LL is still being done by various programs.
[/QUOTE]

I checked today May 21st around 06:00 UTC, here are the LL and PRP numbers:


[CODE]Exponent      LL    PRP  LL-DC  PRP-DC
50000000     735      0  19033       0
51000000     226      0  19389       0
52000000    2685      0  16974       0
53000000    7196      0  12482       0
54000000   13839      0   5648       0
55000000   15560      0   4097       0
56000000   16364      0   3419       0
57000000   17324      0   2347       0
58000000   18143      0   1645       0
59000000   19212      0    504       0
60000000   19259      0    410       1
61000000   19165      0    433       0
62000000   19083      0    434       0
63000000   19328      0    396       0
64000000   19138      0    402       0
65000000   19075      0    404       0
66000000   19163      0    330       0
67000000   18767      0    627       0
68000000   18998      0    570       0
69000000   18584      0    744       0
70000000   18870      0    622       2
71000000   18975      0    423       0
72000000   18633      0    661       0
73000000   18528      0    985       0
74000000   18535      0    789       1
75000000   18695      0    771      91
76000000   18499      0    545     269
77000000   18303      3    655     233
78000000   18461      3    540     184
79000000   18374     12    584     206
80000000   18531     90    576     207
81000000   18532     49    697     257
82000000   18326    153    508     311
83000000   18184    646    415      95
84000000   17352   1395    376      17
85000000   16062   2496    474      48
86000000   14688   3901    439      55
87000000   13157   5706    324      26
88000000   11390   7422    318      23
89000000    9821   9002    262      34
90000000    5765  12929    223      20
91000000    8306   9079    104      26
92000000    2032  16016      3       6
93000000    4834  13058      4      12
94000000    1576  16983      3      35
95000000     890  17133      2       2
96000000     246   1486      8       8
97000000     244    176      3       0
98000000    2081   4042     19       1
99000000    2446   4488     17       2
100000000    505    842     16       1
101000000    495    506      2       1
102000000   1003    684      8       2
103000000    766    701      6       1
104000000    221    127      1       0
105000000    395    243      2       1
106000000    963    473      2       1
107000000    707    462      2       1
108000000     11      4      1       1
109000000     16      1      1       0
110000000     15      1      1       0
[/CODE]

kriesel 2020-05-21 21:02

[QUOTE=Prime95;546145]Ken,

Judging from these flags -DMM_CHAIN=2u -DMM2_CHAIN=3u the exponent is near the upper limits of that FFT length. Our gpuowl defaults may be too aggressive.

Assuming you saved the problematic files:
Can you try getting past the bad iteration with "ULTRA_TRIG=1"?
Can you get past with "MM_CHAIN=3"?[/QUOTE]
Will do. I also have an interesting case at 94.95M 5M fft on an RX550, 18.11 bits/word, that has been problematic in both v6.11-257 and -288.

preda 2020-05-21 23:22

[QUOTE=kriesel;546132]
Trying different fft was mostly foiled by the following -fft specs not being accepted, called too small, 0K or 128K, even though the first two were copied and pasted from the help output:
-fft 1K:4:1K
-fft 512:8:1K
-fft +1[/QUOTE]

Ken, thanks for reporting this FFT-spec parsing bug; it should be fixed now.

-fft +1 no longer works: there is no "relative to the default" format anymore. So -fft +1 is the same as -fft 1, which is an invalid FFT size.

I plan to reduce the default Jacobi step to 500K. It's safe to set it manually to a smaller value; it won't start a new background check if there is one ongoing.
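
As a rough guide to how small the step can usefully go, one can compare the CPU time of one Jacobi check with the GPU time per interval. The sketch below uses the figures from Ken's log (about 146 s from "waiting for the Jacobi check to finish" at 06:59:43 to the result at 07:02:09, and about 1395 us/it). The function name and the rounding to a 100K granularity are assumptions for illustration, not gpuowl behaviour.

```python
import math

def min_jacobi_interval(check_seconds, us_per_iter, granularity=100_000):
    """Smallest interval (a multiple of `granularity` iterations) whose GPU time
    is at least as long as one CPU Jacobi check, so checks never pile up."""
    iters_needed = check_seconds / (us_per_iter * 1e-6)
    return math.ceil(iters_needed / granularity) * granularity

def gpu_seconds(interval_iters, us_per_iter):
    """GPU wall time spent on one interval."""
    return interval_iters * us_per_iter * 1e-6

if __name__ == "__main__":
    print(min_jacobi_interval(146, 1395))   # in line with the "about 200K" estimate
    print(gpu_seconds(500_000, 1395))       # GPU seconds per interval at a 500K step
```

With these numbers a single check costs roughly 105K iterations of GPU time, so a 500K step leaves ample headroom on this machine.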

kriesel 2020-05-22 03:03

[QUOTE=ATH;546150]I checked today May 21st around 06:00 UTC, here are the LL and PRP numbers:[/QUOTE]So, totals,
[CODE]     LL      PRP    LL-DC  PRP-DC
 677277   130312   101680    2181
 83.86%   16.14%   97.90%   2.10%[/CODE]Not enough PRP-DC for statistics yet.
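
The percentages are each type's share within its pair (LL vs PRP, LL-DC vs PRP-DC). A short Python sketch, with a made-up helper name, to reproduce them from the totals:

```python
def pair_shares(a, b):
    """Percentage share of each of two counts within their sum."""
    total = a + b
    return 100.0 * a / total, 100.0 * b / total

if __name__ == "__main__":
    ll, prp = pair_shares(677277, 130312)        # LL vs PRP
    ll_dc, prp_dc = pair_shares(101680, 2181)    # LL-DC vs PRP-DC
    print(f"LL {ll:.2f}%  PRP {prp:.2f}%  LL-DC {ll_dc:.2f}%  PRP-DC {prp_dc:.2f}%")
```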


All times are UTC.