gpuOwL: an OpenCL program for Mersenne primality testing (https://www.mersenneforum.org/showthread.php?t=22204)

preda 2018-10-22 15:26

[QUOTE=SELROC;498468]Large number test:


[CODE]2018-10-22 10:56:04 3 gpuowl 4.6--mod
2018-10-22 10:56:04 3 FFT 73728K: Width 2048 (256x8), Height 2048 (256x8), Middle 9; 13.25 bits/word
2018-10-22 10:56:04 3 Note: using long carry kernels
2018-10-22 10:56:04 3 Ellesmere-36x1360-@6:0.0 Radeon RX 580 Series
2018-10-22 10:56:05 3 OpenCL compilation in 1020 ms, with "-DEXP=1000000001u -DWIDTH=2048u -DSMALL_HEIGHT=2048u -DMIDDLE=9u -I. -cl-fast-relaxed-math -cl-std=CL2.0 "
2018-10-22 10:56:10 3 PRP M(1000000001), FFT 73728K, 13.25 bits/word, B1 0
2018-10-22 10:56:51 3 OK loaded: 0/1000000001, B1 0, blockSize 400, 0000000000000003 (expected 0000000000000003)
2018-10-22 10:56:51 3 Selected 0 P-1 trial points
2018-10-22 10:58:21 3 OK 800/1000000001 [ 0.00%], 70.93 ms/it [70.87, 70.99]; ETA 820d 22:05; b70bd0429f585b7f (check 33.15s)
2018-10-22 11:06:52 3 Stopping, please wait..
2018-10-22 11:07:25 3 OK 8000/1000000001 [ 0.00%], 71.02 ms/it [70.96, 71.70]; ETA 822d 00:15; e8cbaa94ad3015eb (check 33.45s)
2018-10-22 11:07:25 3 Starting GCD over 0 points
2018-10-22 11:07:28 3 Waiting for GCD to finish..
2018-10-22 11:07:28 3 Exiting because "stop requested"
2018-10-22 11:07:28 3 Bye[/CODE][/QUOTE]

Interesting information. The performance doesn't look great, though :)
Admittedly I haven't looked into tuning for that exponent size yet; hopefully it can be improved a bit. But realistically, is anybody testing such exponents?
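For a sense of scale, the ETA figures in the log above can be roughly reproduced from the per-iteration time alone. This is a minimal sketch, assuming the ETA is simply the remaining iterations multiplied by the average ms/it; `eta_days` is a hypothetical helper written for illustration, not a gpuowl function.

```python
def eta_days(exponent, done, ms_per_it):
    """Estimate remaining days, assuming one iteration per exponent bit
    and a constant per-iteration time in milliseconds."""
    remaining = exponent - done          # iterations still to run
    seconds = remaining * ms_per_it / 1000.0
    return seconds / 86400.0             # 86400 seconds per day


if __name__ == "__main__":
    # Values taken from the RX 580 log line at iteration 800.
    print(f"{eta_days(1000000001, 800, 70.93):.2f} days")
```

With the values from the 800-iteration line this comes out near 821 days, in line with the "ETA 820d 22:05" the program printed; the small gap presumably comes from rounding of the displayed ms/it.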

SELROC 2018-10-22 15:51

[QUOTE=preda;498502]Interesting information. The performance doesn't look great, though :)
Admittedly I haven't looked into tuning for that exponent size yet; hopefully it can be improved a bit. But realistically, is anybody testing such exponents?[/QUOTE]


I don't know; I test them specifically to see gpuowl's performance.
Basically I have to redo the test on each version.


Right now I am searching for the exponent range that triggers the too-small FFT selection.
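When hunting for the range where the FFT is too small, the "bits/word" figure in the logs is a handy yardstick. A minimal sketch, assuming bits/word is just the exponent divided by the FFT length in words (with K = 1024); `bits_per_word` is a hypothetical helper, and the formula is inferred from the log lines in this thread rather than taken from the gpuowl source.

```python
def bits_per_word(exponent, fft_k):
    """Average number of exponent bits stored per FFT word.

    fft_k is the FFT length in K words (K = 1024), as printed in the log,
    e.g. 4608 for "FFT 4608K".
    """
    return exponent / (fft_k * 1024)


if __name__ == "__main__":
    # (exponent, FFT size in K) pairs copied from logs in this thread.
    for exponent, fft_k in [(86700001, 4608), (1000000001, 73728)]:
        print(f"M({exponent}) at {fft_k}K: {bits_per_word(exponent, fft_k):.2f} bits/word")
```

The two pairs above give 18.37 and 13.25, matching the values the program printed for those runs. The higher the figure, the less headroom is left for rounding error, which is consistent with the errors showing up near the top of an FFT size's range.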

ET_ 2018-10-22 16:15

[QUOTE=preda;498499]I could add a test for the primality of the exponent.. but is that really needed?
I mean, usually the user would obtain the exponent from some source, and then it's prime. Otherwise if somebody just wants to do a test (with a random non-prime value), why not? I think there are some asserts that would trigger on an even exponent.[/QUOTE]

I agree with you that users need some awareness of what the program does.

My doubt derives from the statement "Let Mp = 2^p − 1 be the Mersenne number to test with p an odd [COLOR="Red"]prime[/COLOR]." As you test whether L(p-1) is congruent to 0 mod Mp, your mod operation may have different values if Mp is not prime, and the LL test might show a wrong result.

But again, I'm not proficient in coding the LL algorithm, and may be wrong as well.
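For anyone who wants to experiment with small exponents, here is a minimal sketch of the textbook Lucas-Lehmer test in plain Python. It uses the common indexing s_0 = 4, s_{i+1} = s_i^2 − 2 (mod Mp), with Mp declared prime when s_{p−2} is 0; this is an illustration only, not how gpuowl works (the logs in this thread show it running PRP tests with FFT multiplication), and `lucas_lehmer` is a hypothetical name.

```python
def lucas_lehmer(p):
    """Textbook Lucas-Lehmer test for M_p = 2**p - 1, for odd p >= 3.

    Returns True when the final term s_(p-2) is 0 modulo M_p.
    """
    m = (1 << p) - 1      # the Mersenne number 2**p - 1
    s = 4                 # starting value s_0
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


if __name__ == "__main__":
    # Try every odd exponent from 3 to 31, prime or not.
    for p in range(3, 32, 2):
        print(p, lucas_lehmer(p))
```

Running it over the odd exponents from 3 to 31 returns True only for 3, 5, 7, 13, 17, 19 and 31, so in this small range the composite exponents (9, 15, 21, 25, 27) are simply reported as not prime rather than producing a false positive.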

SELROC 2018-10-22 16:23

[QUOTE=SELROC;498503]I don't know; I test them specifically to see gpuowl's performance.
Basically I have to redo the test on each version.


Right now I am searching for the exponent range that triggers the too-small FFT selection.[/QUOTE]




Long log:


2018-10-22 17:39:59 3 gpuowl 4.6--mod
2018-10-22 17:39:59 3 FFT 4608K: Width 512 (64x8), Height 512 (64x8), Middle 9; 18.37 bits/word
2018-10-22 17:39:59 3 Note: using short carry kernels
2018-10-22 17:39:59 3 Ellesmere-36x1360-@6:0.0 Radeon RX 580 Series
2018-10-22 17:40:00 3 OpenCL compilation in 853 ms, with "-DEXP=86700001u -DWIDTH=512u -DSMALL_HEIGHT=512u -DMIDDLE=9u -I. -cl-fast-relaxed-math -cl-std=CL2.0 "
2018-10-22 17:40:00 3 PRP M(86700001), FFT 4608K, 18.37 bits/word, B1 0
2018-10-22 17:40:02 3 OK loaded: 0/86700001, B1 0, blockSize 400, 0000000000000003 (expected 0000000000000003)
2018-10-22 17:40:02 3 Selected 0 P-1 trial points
2018-10-22 17:40:08 3 OK 800/86700001 [ 0.00%], 4.10 ms/it [4.09, 4.10]; ETA 4d 02:39; 16460e4280156eaa (check 2.10s)
2018-10-22 17:40:45 3 10000/86700001 [ 0.01%], 4.11 ms/it [4.10, 4.14]; ETA 4d 02:53; 82759ed0ecd6fc95
2018-10-22 17:41:27 3 20000/86700001 [ 0.02%], 4.11 ms/it [4.10, 4.14]; ETA 4d 02:52; d5075945f4b2b972
2018-10-22 17:42:08 3 30000/86700001 [ 0.03%], 4.11 ms/it [4.10, 4.16]; ETA 4d 02:55; 41a8bb8b01bf4633
2018-10-22 17:42:49 3 40000/86700001 [ 0.05%], 4.11 ms/it [4.10, 4.15]; ETA 4d 02:50; ee0734f60d7b1db3
2018-10-22 17:43:30 3 50000/86700001 [ 0.06%], 4.11 ms/it [4.10, 4.14]; ETA 4d 02:49; f6a660fe2b783ce6
2018-10-22 17:44:11 3 60000/86700001 [ 0.07%], 4.11 ms/it [4.10, 4.14]; ETA 4d 02:49; 4274d66af0efb8f4
2018-10-22 17:44:52 3 70000/86700001 [ 0.08%], 4.11 ms/it [4.10, 4.14]; ETA 4d 02:48; 750a4f4758744c06
2018-10-22 17:45:33 3 80000/86700001 [ 0.09%], 4.10 ms/it [4.10, 4.12]; ETA 4d 02:46; c711dd998fe2f029
2018-10-22 17:46:14 3 90000/86700001 [ 0.10%], 4.11 ms/it [4.10, 4.13]; ETA 4d 02:48; 6a0a2c5b2353d680
2018-10-22 17:46:55 3 100000/86700001 [ 0.12%], 4.11 ms/it [4.10, 4.13]; ETA 4d 02:45; 8cdc89078250d440
2018-10-22 17:47:36 3 110000/86700001 [ 0.13%], 4.11 ms/it [4.10, 4.14]; ETA 4d 02:49; 17bab91f2ac31b34
2018-10-22 17:48:17 3 120000/86700001 [ 0.14%], 4.10 ms/it [4.10, 4.12]; ETA 4d 02:42; 4b09dcf722c21e76
2018-10-22 17:48:58 3 130000/86700001 [ 0.15%], 4.11 ms/it [4.10, 4.14]; ETA 4d 02:44; 9a5730be57db997f
2018-10-22 17:49:39 3 140000/86700001 [ 0.16%], 4.11 ms/it [4.10, 4.14]; ETA 4d 02:45; 1f4140f6971a3fc8
2018-10-22 17:50:20 3 150000/86700001 [ 0.17%], 4.11 ms/it [4.10, 4.14]; ETA 4d 02:43; db1014b1d4121f4c
2018-10-22 17:51:04 3 OK 160000/86700001 [ 0.18%], 4.11 ms/it [4.10, 4.14]; ETA 4d 02:43; a6afed696697b5c4 (check 2.10s)
2018-10-22 17:51:45 3 170000/86700001 [ 0.20%], 4.11 ms/it [4.10, 4.14]; ETA 4d 02:42; 9bfc2eae15121eeb
2018-10-22 17:52:26 3 180000/86700001 [ 0.21%], 4.11 ms/it [4.10, 4.14]; ETA 4d 02:42; fac14bff3c6daa15
2018-10-22 17:53:07 3 190000/86700001 [ 0.22%], 4.11 ms/it [4.10, 4.14]; ETA 4d 02:43; bed36466316f8442
2018-10-22 17:53:48 3 200000/86700001 [ 0.23%], 4.11 ms/it [4.10, 4.13]; ETA 4d 02:40; b332138d2e641277
2018-10-22 17:54:29 3 210000/86700001 [ 0.24%], 4.11 ms/it [4.10, 4.14]; ETA 4d 02:41; c74057dc770ba4b7
2018-10-22 17:55:10 3 220000/86700001 [ 0.25%], 4.11 ms/it [4.10, 4.18]; ETA 4d 02:44; c03b5a734225ab6c
2018-10-22 17:55:51 3 230000/86700001 [ 0.27%], 4.11 ms/it [4.10, 4.13]; ETA 4d 02:39; 46160146db97217f
2018-10-22 17:56:32 3 240000/86700001 [ 0.28%], 4.11 ms/it [4.10, 4.15]; ETA 4d 02:42; 001f3c4bc2e3107e
2018-10-22 17:57:13 3 250000/86700001 [ 0.29%], 4.11 ms/it [4.10, 4.13]; ETA 4d 02:39; 3d01711945270baf
2018-10-22 17:57:55 3 260000/86700001 [ 0.30%], 4.11 ms/it [4.10, 4.14]; ETA 4d 02:39; 9bc0aa6cb0bfac3b
2018-10-22 17:58:36 3 270000/86700001 [ 0.31%], 4.11 ms/it [4.10, 4.13]; ETA 4d 02:38; ce0b50a6de9e3293
2018-10-22 17:59:17 3 280000/86700001 [ 0.32%], 4.11 ms/it [4.10, 4.14]; ETA 4d 02:38; b99f827733e4ec3d
2018-10-22 17:59:58 3 290000/86700001 [ 0.33%], 4.11 ms/it [4.10, 4.14]; ETA 4d 02:37; 9827be481aba3805
2018-10-22 18:00:39 3 300000/86700001 [ 0.35%], 4.11 ms/it [4.10, 4.13]; ETA 4d 02:36; f33b6414738221e7
2018-10-22 18:01:20 3 310000/86700001 [ 0.36%], 4.11 ms/it [4.10, 4.14]; ETA 4d 02:37; 5f6c14ec328abdee
2018-10-22 18:02:03 3 OK 320000/86700001 [ 0.37%], 4.11 ms/it [4.10, 4.13]; ETA 4d 02:36; 2bbfb6c8f24890d2 (check 2.10s)
2018-10-22 18:02:44 3 330000/86700001 [ 0.38%], 4.10 ms/it [4.09, 4.11]; ETA 4d 02:17; 8d1b60211b94ac90
2018-10-22 18:03:25 3 340000/86700001 [ 0.39%], 4.10 ms/it [4.09, 4.12]; ETA 4d 02:20; 05590b87a9ddfafa
2018-10-22 18:04:06 3 350000/86700001 [ 0.40%], 4.10 ms/it [4.09, 4.13]; ETA 4d 02:22; 1f3a102e8771564d
2018-10-22 18:04:47 3 360000/86700001 [ 0.42%], 4.10 ms/it [4.10, 4.12]; ETA 4d 02:20; 9f0890f84be5d70e
2018-10-22 18:05:28 3 370000/86700001 [ 0.43%], 4.10 ms/it [4.10, 4.13]; ETA 4d 02:20; bf5bb13efebafbae
2018-10-22 18:06:09 3 380000/86700001 [ 0.44%], 4.10 ms/it [4.10, 4.13]; ETA 4d 02:19; eadd1ee5a31802f5
2018-10-22 18:06:50 3 390000/86700001 [ 0.45%], 4.10 ms/it [4.10, 4.12]; ETA 4d 02:17; d398a7819315a916
2018-10-22 18:07:31 3 400000/86700001 [ 0.46%], 4.10 ms/it [4.10, 4.13]; ETA 4d 02:20; b8450226874b2ef6
2018-10-22 18:08:12 3 410000/86700001 [ 0.47%], 4.10 ms/it [4.10, 4.12]; ETA 4d 02:16; 4c3ccab2d6bfa17a
2018-10-22 18:08:53 3 420000/86700001 [ 0.48%], 4.10 ms/it [4.10, 4.12]; ETA 4d 02:15; eba5f14fdcd6f03d
2018-10-22 18:09:34 3 430000/86700001 [ 0.50%], 4.10 ms/it [4.10, 4.13]; ETA 4d 02:17; 3f8852cd4e1ce0ae
2018-10-22 18:10:15 3 440000/86700001 [ 0.51%], 4.10 ms/it [4.10, 4.12]; ETA 4d 02:13; 7bee67893af3109e
2018-10-22 18:10:56 3 450000/86700001 [ 0.52%], 4.10 ms/it [4.10, 4.13]; ETA 4d 02:15; 07dc22de8bfb9598
2018-10-22 18:11:37 3 460000/86700001 [ 0.53%], 4.10 ms/it [4.09, 4.12]; ETA 4d 02:11; a54bc95b4c65ba7a
2018-10-22 18:12:18 3 470000/86700001 [ 0.54%], 4.10 ms/it [4.10, 4.12]; ETA 4d 02:11; 6499a4e33d5d4c17
2018-10-22 18:13:02 3 EE 480000/86700001 [ 0.55%], 4.10 ms/it [4.09, 4.13]; ETA 4d 02:13; 836d28346a966447 (check 2.08s)
2018-10-22 18:13:04 3 OK loaded: 320000/86700001, B1 0, blockSize 400, 2bbfb6c8f24890d2 (expected 2bbfb6c8f24890d2)
2018-10-22 18:13:45 3 330000/86700001 [ 0.38%], 4.30 ms/it [4.10, 9.14]; ETA 4d 07:15; 8d1b60211b94ac90
2018-10-22 18:14:26 3 340000/86700001 [ 0.39%], 4.10 ms/it [4.10, 4.14]; ETA 4d 02:26; 05590b87a9ddfafa
2018-10-22 18:15:07 3 350000/86700001 [ 0.40%], 4.10 ms/it [4.09, 4.14]; ETA 4d 02:26; 1f3a102e8771564d
2018-10-22 18:15:48 3 360000/86700001 [ 0.42%], 4.10 ms/it [4.10, 4.14]; ETA 4d 02:27; 9f0890f84be5d70e
2018-10-22 18:16:29 3 370000/86700001 [ 0.43%], 4.10 ms/it [4.10, 4.14]; ETA 4d 02:26; bf5bb13efebafbae
2018-10-22 18:17:10 3 380000/86700001 [ 0.44%], 4.10 ms/it [4.10, 4.14]; ETA 4d 02:24; eadd1ee5a31802f5
2018-10-22 18:17:51 3 390000/86700001 [ 0.45%], 4.10 ms/it [4.10, 4.13]; ETA 4d 02:22; d398a7819315a916
2018-10-22 18:18:32 3 400000/86700001 [ 0.46%], 4.10 ms/it [4.10, 4.14]; ETA 4d 02:20; b8450226874b2ef6
2018-10-22 18:19:06 3 Stopping, please wait..
2018-10-22 18:19:08 3 EE 408400/86700001 [ 0.47%], 4.10 ms/it [4.10, 4.13]; ETA 4d 02:22; b25cad7eb46d0cf1 (check 2.08s)
2018-10-22 18:19:10 3 OK loaded: 320000/86700001, B1 0, blockSize 400, 2bbfb6c8f24890d2 (expected 2bbfb6c8f24890d2)
2018-10-22 18:19:10 3 Exiting because "stop requested"
2018-10-22 18:19:10 3 Bye




Exponents even smaller than 86700001 also produce EE, just later in the run.
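To spot these events in a long log without reading every line, a small parser helps. This is a minimal sketch, assuming the line layout seen in the log above ("OK"/"EE" before the iteration count on checked lines, and "OK loaded:" when a checkpoint is reloaded); `check_events` and the regular expressions are hypothetical and only tested against the format shown in this thread.

```python
import re

# A checked progress line: " OK 320000/86700001 [" or " EE 480000/86700001 ["
CHECK = re.compile(r" (OK|EE) (\d+)/\d+ \[")
# A checkpoint reload line: " OK loaded: 320000/86700001, ..."
LOADED = re.compile(r" OK loaded: (\d+)/\d+")


def check_events(log_text):
    """Return (events, reloads): events is a list of (status, iteration)
    for checked lines, reloads is a list of iterations reloaded from disk."""
    events, reloads = [], []
    for line in log_text.splitlines():
        m = CHECK.search(line)
        if m:
            events.append((m.group(1), int(m.group(2))))
            continue
        m = LOADED.search(line)
        if m:
            reloads.append(int(m.group(1)))
    return events, reloads


# Three lines copied from the log above.
SAMPLE = """\
2018-10-22 18:02:03 3 OK 320000/86700001 [ 0.37%], 4.11 ms/it [4.10, 4.13]; ETA 4d 02:36; 2bbfb6c8f24890d2 (check 2.10s)
2018-10-22 18:13:02 3 EE 480000/86700001 [ 0.55%], 4.10 ms/it [4.09, 4.13]; ETA 4d 02:13; 836d28346a966447 (check 2.08s)
2018-10-22 18:13:04 3 OK loaded: 320000/86700001, B1 0, blockSize 400, 2bbfb6c8f24890d2 (expected 2bbfb6c8f24890d2)
"""

if __name__ == "__main__":
    print(check_events(SAMPLE))
```

On the sample it reports an OK at 320000, an EE at 480000 and a reload of 320000, which matches what the log shows: after the failed check the run goes back to the last checkpoint that passed.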

kriesel 2018-10-22 17:33

[QUOTE=preda;498500]I'm removing the ghz-day/day display, probably that zero was an intermediary step.[/QUOTE]
I like having the GD/d display, provided it's reasonably accurate. [CODE]2018-10-20 20:41:05 condorella-rx480 PRP M(658000139), FFT 36864K, 17.43 bits/word, 0 GHz-day [/CODE]Zero for the total work of a PRP test was clearly not accurate, and it propagated into the rate shown in the interim outputs. I can't remember whether such an anomaly had been seen and reported before.
I noticed in some v3.8 trials on varying exponents and FFT lengths that GHz-D/day was going up with FFT length; I thought it would decline. The following are from gpuowl-OpenCL 3.8-91c52fa on an RX480:
[CODE]
FFT length  exponent   ms/it  GHz-D/day  days projected
4608K       83871797    3.85  72.4         3.74
8192K       152000249   7.03  71.7        12.4
16384K      299000059  14.23  76.0        49.3
18432K      335000377  16.3   78.8        63.2
36864K      658000139  34.2    0.0       261.
73728K      not tested, estimated > 1070 days (2.94 years)[/CODE]
SELROC's 71 ms/it for 72M FFT on an RX580 seems not far off the mark.
Regarding the error I hit compiling gpuowl v4.6 in mingw64/msys2 for Windows x64: I will download a later commit and retry. Thanks for the revision.

SELROC 2018-10-22 17:48

[QUOTE=preda;498500]I'm removing the ghz-day/day display, probably that zero was an intermediary step.[/QUOTE]


I have not encountered that problem yet.


Please put back the ghz display, it is useful for comparisons.

kriesel 2018-10-22 18:49

[QUOTE=SELROC;498520]I have not encountered that problem yet.

Please put back the ghz display, it is useful for comparisons.[/QUOTE]

I concur, it is useful. Zero values can be ignored (or suppressed if Preda prefers).
It appears to show zero only at higher exponents or FFT lengths; up to 335M (18M FFT) it was fine.

[CODE]C:\msys64\home\ken>openowl-V38-91c52fa-W64.exe -user kriesel -cpu condorella-rx480 -device 0
2018-10-22 13:10:01 condorella-rx480 gpuowl-OpenCL 3.8-91c52fa
2018-10-22 13:10:01 condorella-rx480 FFT 73728K: Width 2048 (256x8), Height 2048 (256x8), Middle 9; 13.25 bits/word
2018-10-22 13:10:01 condorella-rx480 Note: using long carry kernels
2018-10-22 13:10:04 condorella-rx480 Ellesmere-36x1266-@28:0.0 Radeon (TM) RX 480 Graphics
2018-10-22 13:10:09 condorella-rx480 OpenCL compilation in 5033 ms, with "-DEXP=999999937u -DWIDTH=2048u -DSMALL_HEIGHT=2048u -DMIDDLE=9u -I. -cl-fast-relaxed-math -cl-std=CL2.0 "
2018-10-22 13:10:21 condorella-rx480 PRP M(999999937), FFT 73728K, 13.25 bits/word, 0 GHz-day
2018-10-22 13:11:35 condorella-rx480 OK loaded: 0/999999937, blockSize 400, 0000000000000003
2018-10-22 13:12:04 condorella-rx480 OK initial check: 0000000000000003
2018-10-22 13:13:40 condorella-rx480 OK 800/999999937 [ 0.00%], 72.55 ms/it [72.42, 72.69] (0.0 GHz-day/day); ETA 839d 17:27; c3c8e02da339fdfa (check 38.43s) (saved)
2018-10-22 13:24:52 condorella-rx480 10000/999999937 [ 0.00%], 72.95 ms/it [72.78, 74.27] (0.0 GHz-day/day); ETA 844d 07:10; a30a0c45e9fb828c
2018-10-22 13:26:19 condorella-rx480 Stopping, please wait..
2018-10-22 13:26:58 condorella-rx480 OK 11200/999999937 [ 0.00%], 72.81 ms/it [72.79, 72.82] (0.0 GHz-day/day); ETA 842d 17:00; a379021250b8cb64 (check 38.66s) (saved)
2018-10-22 13:26:58 condorella-rx480 Bye[/CODE][CODE]C:\msys64\home\ken>openowl-V38-91c52fa-W64.exe -user kriesel -cpu condorella-rx480 -device 0
2018-10-22 13:28:17 condorella-rx480 gpuowl-OpenCL 3.8-91c52fa
2018-10-22 13:28:17 condorella-rx480 FFT 40960K: Width 2048 (256x8), Height 2048 (256x8), Middle 5; 17.36 bits/word
2018-10-22 13:28:17 condorella-rx480 Note: using long carry kernels
2018-10-22 13:28:19 condorella-rx480 Ellesmere-36x1266-@28:0.0 Radeon (TM) RX 480 Graphics
2018-10-22 13:28:24 condorella-rx480 OpenCL compilation in 5013 ms, with "-DEXP=728000017u -DWIDTH=2048u -DSMALL_HEIGHT=2048u -DMIDDLE=5u -I. -cl-fast-relaxed-math -cl-std=CL2.0 "
2018-10-22 13:28:30 condorella-rx480 PRP M(728000017), FFT 40960K, 17.36 bits/word, 0 GHz-day
2018-10-22 13:29:12 condorella-rx480 OK loaded: 0/728000017, blockSize 400, 0000000000000003
2018-10-22 13:29:29 condorella-rx480 OK initial check: 0000000000000003
2018-10-22 13:30:24 condorella-rx480 OK 800/728000017 [ 0.00%], 41.19 ms/it [41.10, 41.27] (0.0 GHz-day/day); ETA 347d 01:17; b381fc5ed49bc70d (check 21.83s) (saved)
2018-10-22 13:36:45 condorella-rx480 10000/728000017 [ 0.00%], 41.38 ms/it [41.31, 41.60] (0.0 GHz-day/day); ETA 348d 16:15; fc3543bd6f05c1bd
2018-10-22 13:37:01 condorella-rx480 Stopping, please wait..
2018-10-22 13:37:23 condorella-rx480 OK 10400/728000017 [ 0.00%], 41.47 ms/it [41.47, 41.47] (0.0 GHz-day/day); ETA 349d 11:03; 1204e3403d7dbe8d (check 22.00s) (saved)
2018-10-22 13:37:23 condorella-rx480 Bye[/CODE][CODE]C:\msys64\home\ken>openowl-V38-91c52fa-W64.exe -user kriesel -cpu condorella-rx480 -device 0
2018-10-22 13:39:50 condorella-rx480 gpuowl-OpenCL 3.8-91c52fa
2018-10-22 13:39:50 condorella-rx480 FFT 20480K: Width 1024 (256x4), Height 2048 (256x8), Middle 5; 17.69 bits/word
2018-10-22 13:39:50 condorella-rx480 Note: using short carry kernels
2018-10-22 13:39:52 condorella-rx480 Ellesmere-36x1266-@28:0.0 Radeon (TM) RX 480 Graphics
2018-10-22 13:39:56 condorella-rx480 OpenCL compilation in 4591 ms, with "-DEXP=371000039u -DWIDTH=1024u -DSMALL_HEIGHT=2048u -DMIDDLE=5u -I. -cl-fast-relaxed-math -cl-std=CL2.0 "
2018-10-22 13:40:00 condorella-rx480 PRP M(371000039), FFT 20480K, 17.69 bits/word, 6400 GHz-day
2018-10-22 13:40:20 condorella-rx480 OK loaded: 0/371000039, blockSize 400, 0000000000000003
2018-10-22 13:40:28 condorella-rx480 OK initial check: 0000000000000003
2018-10-22 13:40:53 condorella-rx480 OK 800/371000039 [ 0.00%], 18.80 ms/it [18.75, 18.85] (79.3 GHz-day/day); ETA 80d 17:27; e2ac5ec2a6819689 (check 10.16s) (saved)
2018-10-22 13:43:47 condorella-rx480 10000/371000039 [ 0.00%], 18.92 ms/it [18.82, 20.53] (78.8 GHz-day/day); ETA 81d 06:03; edb6fd2abeb3f16c
2018-10-22 13:45:41 condorella-rx480 Stopping, please wait..
2018-10-22 13:45:51 condorella-rx480 OK 16000/371000039 [ 0.00%], 18.96 ms/it [18.82, 20.52] (78.6 GHz-day/day); ETA 81d 10:04; 9c0d2afc3567b445 (check 10.14s) (saved)
2018-10-22 13:45:51 condorella-rx480 Bye[/CODE]The zero work/PRP seems to develop above 20M fft length, which was the maximum length in V3.3.

preda 2018-10-22 20:24

[QUOTE=SELROC;498505]Long log:
[..]
2018-10-22 18:12:18 3 470000/86700001 [ 0.54%], 4.10 ms/it [4.10, 4.12]; ETA 4d 02:11; 6499a4e33d5d4c17
2018-10-22 18:13:02 3 EE 480000/86700001 [ 0.55%], 4.10 ms/it [4.09, 4.13]; ETA 4d 02:13; 836d28346a966447 (check 2.08s)
2018-10-22 18:13:04 3 OK loaded: 320000/86700001, B1 0, blockSize 400, 2bbfb6c8f24890d2 (expected 2bbfb6c8f24890d2)
2018-10-22 18:13:45 3 330000/86700001 [ 0.38%], 4.30 ms/it [4.10, 9.14]; ETA 4d 07:15; 8d1b60211b94ac90
2018-10-22 18:14:26 3 340000/86700001 [ 0.39%], 4.10 ms/it [4.10, 4.14]; ETA 4d 02:26; 05590b87a9ddfafa
2018-10-22 18:15:07 3 350000/86700001 [ 0.40%], 4.10 ms/it [4.09, 4.14]; ETA 4d 02:26; 1f3a102e8771564d
2018-10-22 18:15:48 3 360000/86700001 [ 0.42%], 4.10 ms/it [4.10, 4.14]; ETA 4d 02:27; 9f0890f84be5d70e
2018-10-22 18:16:29 3 370000/86700001 [ 0.43%], 4.10 ms/it [4.10, 4.14]; ETA 4d 02:26; bf5bb13efebafbae
2018-10-22 18:17:10 3 380000/86700001 [ 0.44%], 4.10 ms/it [4.10, 4.14]; ETA 4d 02:24; eadd1ee5a31802f5
2018-10-22 18:17:51 3 390000/86700001 [ 0.45%], 4.10 ms/it [4.10, 4.13]; ETA 4d 02:22; d398a7819315a916
2018-10-22 18:18:32 3 400000/86700001 [ 0.46%], 4.10 ms/it [4.10, 4.14]; ETA 4d 02:20; b8450226874b2ef6
2018-10-22 18:19:06 3 Stopping, please wait..
2018-10-22 18:19:08 3 EE 408400/86700001 [ 0.47%], 4.10 ms/it [4.10, 4.13]; ETA 4d 02:22; b25cad7eb46d0cf1 (check 2.08s)
2018-10-22 18:19:10 3 OK loaded: 320000/86700001, B1 0, blockSize 400, 2bbfb6c8f24890d2 (expected 2bbfb6c8f24890d2)
2018-10-22 18:19:10 3 Exiting because "stop requested"
2018-10-22 18:19:10 3 Bye


Exponents even smaller than 86700001 also produce EE, just later in the run.[/QUOTE]

Thanks for the investigation. Interesting: I don't see the same behavior (see below). A driver difference or something else, e.g. different precision in the sin/cos implementation on the GPU, may be the reason. Still, a slight relaxation of the FFT size limits may be warranted.

2018-10-22 21:59:05 vega0 gpuowl 4.6-9b7ff7b-mod
2018-10-22 21:59:05 vega0 FFT 4608K: Width 512 (64x8), Height 512 (64x8), Middle 9; 18.37 bits/word
2018-10-22 21:59:05 vega0 Note: using short carry kernels
2018-10-22 21:59:07 vega0 gfx900-64x1630-@67:0.0 Vega [Radeon RX Vega]
2018-10-22 21:59:10 vega0 OpenCL compilation in 3774 ms, with "-DEXP=86700001u -DWIDTH=512u -DSMALL_HEIGHT=512u -DMIDDLE=9u -I. -cl-fast-relaxed-math -cl-std=CL2.0 "
2018-10-22 21:59:11 vega0 PRP M(86700001), FFT 4608K, 18.37 bits/word, B1 0
2018-10-22 21:59:12 vega0 OK loaded: 0/86700001, B1 0, blockSize 400, 0000000000000003 (expected 0000000000000003)
2018-10-22 21:59:12 vega0 Selected 0 P-1 trial points
2018-10-22 21:59:15 vega0 OK 800/86700001 [ 0.00%], 2.15 ms/it [2.15, 2.16]; ETA 2d 03:54; 16460e4280156eaa (check 1.11s)
2018-10-22 21:59:35 vega0 10000/86700001 [ 0.01%], 2.16 ms/it [2.16, 2.16]; ETA 2d 04:02; 82759ed0ecd6fc95
2018-10-22 21:59:57 vega0 20000/86700001 [ 0.02%], 2.16 ms/it [2.16, 2.16]; ETA 2d 04:04; d5075945f4b2b972
2018-10-22 22:00:18 vega0 30000/86700001 [ 0.03%], 2.16 ms/it [2.16, 2.16]; ETA 2d 04:04; 41a8bb8b01bf4633
2018-10-22 22:00:40 vega0 40000/86700001 [ 0.05%], 2.16 ms/it [2.16, 2.16]; ETA 2d 04:04; ee0734f60d7b1db3
2018-10-22 22:01:01 vega0 50000/86700001 [ 0.06%], 2.16 ms/it [2.16, 2.16]; ETA 2d 04:03; f6a660fe2b783ce6
2018-10-22 22:01:23 vega0 60000/86700001 [ 0.07%], 2.16 ms/it [2.16, 2.16]; ETA 2d 04:04; 4274d66af0efb8f4
2018-10-22 22:01:45 vega0 70000/86700001 [ 0.08%], 2.16 ms/it [2.16, 2.16]; ETA 2d 04:03; 750a4f4758744c06
2018-10-22 22:02:06 vega0 80000/86700001 [ 0.09%], 2.16 ms/it [2.16, 2.16]; ETA 2d 04:03; c711dd998fe2f029
2018-10-22 22:02:28 vega0 90000/86700001 [ 0.10%], 2.16 ms/it [2.16, 2.16]; ETA 2d 04:02; 6a0a2c5b2353d680
2018-10-22 22:02:50 vega0 100000/86700001 [ 0.12%], 2.16 ms/it [2.16, 2.17]; ETA 2d 04:02; 8cdc89078250d440
2018-10-22 22:03:11 vega0 110000/86700001 [ 0.13%], 2.16 ms/it [2.16, 2.16]; ETA 2d 04:01; 17bab91f2ac31b34
2018-10-22 22:03:33 vega0 120000/86700001 [ 0.14%], 2.16 ms/it [2.16, 2.16]; ETA 2d 04:01; 4b09dcf722c21e76
2018-10-22 22:03:55 vega0 130000/86700001 [ 0.15%], 2.16 ms/it [2.16, 2.16]; ETA 2d 04:02; 9a5730be57db997f
2018-10-22 22:04:16 vega0 140000/86700001 [ 0.16%], 2.16 ms/it [2.16, 2.16]; ETA 2d 04:01; 1f4140f6971a3fc8
2018-10-22 22:04:38 vega0 150000/86700001 [ 0.17%], 2.16 ms/it [2.16, 2.16]; ETA 2d 04:01; db1014b1d4121f4c
2018-10-22 22:05:01 vega0 OK 160000/86700001 [ 0.18%], 2.16 ms/it [2.16, 2.16]; ETA 2d 04:01; a6afed696697b5c4 (check 1.21s)
2018-10-22 22:05:22 vega0 170000/86700001 [ 0.20%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:59; 9bfc2eae15121eeb
2018-10-22 22:05:44 vega0 180000/86700001 [ 0.21%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:59; fac14bff3c6daa15
2018-10-22 22:06:06 vega0 190000/86700001 [ 0.22%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:59; bed36466316f8442
2018-10-22 22:06:27 vega0 200000/86700001 [ 0.23%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:59; b332138d2e641277
2018-10-22 22:06:49 vega0 210000/86700001 [ 0.24%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:59; c74057dc770ba4b7
2018-10-22 22:07:11 vega0 220000/86700001 [ 0.25%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:58; c03b5a734225ab6c
2018-10-22 22:07:32 vega0 230000/86700001 [ 0.27%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:57; 46160146db97217f
2018-10-22 22:07:54 vega0 240000/86700001 [ 0.28%], 2.16 ms/it [2.16, 2.17]; ETA 2d 03:58; 001f3c4bc2e3107e
2018-10-22 22:08:16 vega0 250000/86700001 [ 0.29%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:56; 3d01711945270baf
2018-10-22 22:08:37 vega0 260000/86700001 [ 0.30%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:57; 9bc0aa6cb0bfac3b
2018-10-22 22:08:59 vega0 270000/86700001 [ 0.31%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:56; ce0b50a6de9e3293
2018-10-22 22:09:21 vega0 280000/86700001 [ 0.32%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:55; b99f827733e4ec3d
2018-10-22 22:09:42 vega0 290000/86700001 [ 0.33%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:55; 9827be481aba3805
2018-10-22 22:10:04 vega0 300000/86700001 [ 0.35%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:55; f33b6414738221e7
2018-10-22 22:10:25 vega0 310000/86700001 [ 0.36%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:55; 5f6c14ec328abdee
2018-10-22 22:10:48 vega0 OK 320000/86700001 [ 0.37%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:54; 2bbfb6c8f24890d2 (check 1.11s)
2018-10-22 22:11:10 vega0 330000/86700001 [ 0.38%], 2.16 ms/it [2.15, 2.16]; ETA 2d 03:53; 8d1b60211b94ac90
2018-10-22 22:11:32 vega0 340000/86700001 [ 0.39%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:53; 05590b87a9ddfafa
2018-10-22 22:11:53 vega0 350000/86700001 [ 0.40%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:53; 1f3a102e8771564d
2018-10-22 22:12:15 vega0 360000/86700001 [ 0.42%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:53; 9f0890f84be5d70e
2018-10-22 22:12:36 vega0 370000/86700001 [ 0.43%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:53; bf5bb13efebafbae
2018-10-22 22:12:58 vega0 380000/86700001 [ 0.44%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:52; eadd1ee5a31802f5
2018-10-22 22:13:20 vega0 390000/86700001 [ 0.45%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:52; d398a7819315a916
2018-10-22 22:13:41 vega0 400000/86700001 [ 0.46%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:51; b8450226874b2ef6
2018-10-22 22:14:03 vega0 410000/86700001 [ 0.47%], 2.16 ms/it [2.16, 2.17]; ETA 2d 03:51; 75d4ceed1067a7f4
2018-10-22 22:14:25 vega0 420000/86700001 [ 0.48%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:50; 8cf582cfb8566ad5
2018-10-22 22:14:46 vega0 430000/86700001 [ 0.50%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:50; ca456c850b7640d5
2018-10-22 22:15:08 vega0 440000/86700001 [ 0.51%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:50; bb6a2381b36d8890
2018-10-22 22:15:30 vega0 450000/86700001 [ 0.52%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:49; f5f83bb8aefe41de
2018-10-22 22:15:51 vega0 460000/86700001 [ 0.53%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:49; 9d36032d12489b80
2018-10-22 22:16:13 vega0 470000/86700001 [ 0.54%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:49; 399fdd68854c4d92
2018-10-22 22:16:36 vega0 OK 480000/86700001 [ 0.55%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:48; ceac25548f3e21f7 (check 1.10s)
2018-10-22 22:16:57 vega0 490000/86700001 [ 0.57%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:47; 28371927276164f3
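Comparing the residues of two runs of the same exponent narrows down where an error crept in. A minimal sketch, assuming the progress-line layout seen in these logs (iteration/exponent, then a trailing 16-hex-digit residue); `residues` and `first_divergence` are hypothetical helpers, and the sample lines are copied from the RX 580 and Vega logs in this thread.

```python
import re

# Progress line: "... 400000/86700001 [ 0.46%], ...; b8450226874b2ef6"
PROGRESS = re.compile(r"(\d+)/\d+ \[.*; ([0-9a-f]{16})")


def residues(log_text):
    """Map iteration -> residue, keeping the first residue seen per iteration."""
    found = {}
    for line in log_text.splitlines():
        m = PROGRESS.search(line)
        if m:
            found.setdefault(int(m.group(1)), m.group(2))
    return found


def first_divergence(log_a, log_b):
    """Return the lowest iteration present in both logs whose residues differ,
    or None if every shared iteration agrees."""
    a, b = residues(log_a), residues(log_b)
    for iteration in sorted(a.keys() & b.keys()):
        if a[iteration] != b[iteration]:
            return iteration
    return None


RX580 = """\
2018-10-22 18:07:31 3 400000/86700001 [ 0.46%], 4.10 ms/it [4.10, 4.13]; ETA 4d 02:20; b8450226874b2ef6
2018-10-22 18:08:12 3 410000/86700001 [ 0.47%], 4.10 ms/it [4.10, 4.12]; ETA 4d 02:16; 4c3ccab2d6bfa17a
"""
VEGA = """\
2018-10-22 22:13:41 vega0 400000/86700001 [ 0.46%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:51; b8450226874b2ef6
2018-10-22 22:14:03 vega0 410000/86700001 [ 0.47%], 2.16 ms/it [2.16, 2.17]; ETA 2d 03:51; 75d4ceed1067a7f4
"""

if __name__ == "__main__":
    print(first_divergence(RX580, VEGA))
```

On these lines the two cards agree at 400000 and first differ at 410000, which fits the RX 580 run later failing its check at 480000 while the Vega run passed.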

kriesel 2018-10-22 20:57

144m fft timing on RX480
 
Just for laughs, and to check the ghzd/day behavior, while I did a little yard work:

[CODE]C:\msys64\home\ken>openowl-V38-91c52fa-W64.exe -user kriesel -cpu condorella-rx480 -device 0
2018-10-22 14:14:26 condorella-rx480 gpuowl-OpenCL 3.8-91c52fa
2018-10-22 14:14:26 condorella-rx480 FFT 147456K: Width 4096 (512x8), Height 2048 (256x8), Middle 9; 9.93 bits/word
2018-10-22 14:14:26 condorella-rx480 Note: using long carry kernels
2018-10-22 14:14:27 condorella-rx480 Ellesmere-36x1266-@28:0.0 Radeon (TM) RX 480 Graphics
2018-10-22 14:14:32 condorella-rx480 OpenCL compilation in 4854 ms, with "-DEXP=1500000041u -DWIDTH=4096u -DSMALL_HEIGHT=2048u -DMIDDLE=9u -I. -cl-fast-relaxed-math -cl-std=CL2.0 "
2018-10-22 14:14:56 condorella-rx480 PRP M(1500000041), FFT 147456K, 9.93 bits/word, 0 GHz-day
2018-10-22 14:17:40 condorella-rx480 OK loaded: 0/1500000041, blockSize 400, 0000000000000003
2018-10-22 14:18:47 condorella-rx480 OK initial check: 0000000000000003
2018-10-22 14:22:27 condorella-rx480 OK 800/1500000041 [ 0.00%], 171.48 ms/it [162.68, 180.27] (0.0 GHz-day/day); ETA 2977d 00:58; 6e5cf1719717835b (check 82.44s) (saved)
2018-10-22 14:46:39 condorella-rx480 10000/1500000041 [ 0.00%], 157.80 ms/it [156.84, 163.21] (0.0 GHz-day/day); ETA 2739d 12:15; 4fc4ebb3728095f7
2018-10-22 15:12:53 condorella-rx480 20000/1500000041 [ 0.00%], 157.48 ms/it [156.66, 165.88] (0.0 GHz-day/day); ETA 2733d 22:53; 0b25accbcd18fb0c
2018-10-22 15:39:12 condorella-rx480 30000/1500000041 [ 0.00%], 157.88 ms/it [156.91, 164.99] (0.0 GHz-day/day); ETA 2740d 23:39; 8b85d889c0dc246b
2018-10-22 15:50:48 condorella-rx480 Stopping, please wait..
2018-10-22 15:52:12 condorella-rx480 OK 34400/1500000041 [ 0.00%], 158.29 ms/it [156.78, 165.15] (0.0 GHz-day/day); ETA 2748d 01:10; a8e7126d135ca84a (check 83.59s) (saved)
2018-10-22 15:52:13 condorella-rx480 Bye[/CODE](7.5 YEARS to complete)

preda 2018-10-22 21:09

[QUOTE=SELROC;498493]Ok, there are a couple of bugs that have endured various versions:


1. FFT selection, sometimes selects FFT size too small for the exponent.
2. GpuOwl output with -h does not show program version. If we want to know which version is the executable, we must necessarily start a computation only to see the version number.[/QUOTE]

Both should be fixed now. If you encounter other exponents with errors caused by a too-small default FFT size, please report them and I'll look into tuning the default further.

preda 2018-10-22 21:15

[QUOTE=kriesel;498526]I concur, it is useful. Zero values can be ignored (or suppressed if Preda prefers).
It appears to show zero only at higher exponents or FFT lengths; up to 335M (18M FFT) it was fine.
[/QUOTE]

OK, I'll look into adding it back. (It uses a table of "effort per FFT size" that I imported from James, and maybe that table does not contain very large sizes; I haven't checked, though.)

