[QUOTE=SELROC;498468]Large number test:
[CODE]2018-10-22 10:56:04 3 gpuowl 4.6--mod
2018-10-22 10:56:04 3 FFT 73728K: Width 2048 (256x8), Height 2048 (256x8), Middle 9; 13.25 bits/word
2018-10-22 10:56:04 3 Note: using long carry kernels
2018-10-22 10:56:04 3 Ellesmere-36x1360-@6:0.0 Radeon RX 580 Series
2018-10-22 10:56:05 3 OpenCL compilation in 1020 ms, with "-DEXP=1000000001u -DWIDTH=2048u -DSMALL_HEIGHT=2048u -DMIDDLE=9u -I. -cl-fast-relaxed-math -cl-std=CL2.0 "
2018-10-22 10:56:10 3 PRP M(1000000001), FFT 73728K, 13.25 bits/word, B1 0
2018-10-22 10:56:51 3 OK loaded: 0/1000000001, B1 0, blockSize 400, 0000000000000003 (expected 0000000000000003)
2018-10-22 10:56:51 3 Selected 0 P-1 trial points
2018-10-22 10:58:21 3 OK 800/1000000001 [ 0.00%], 70.93 ms/it [70.87, 70.99]; ETA 820d 22:05; b70bd0429f585b7f (check 33.15s)
2018-10-22 11:06:52 3 Stopping, please wait..
2018-10-22 11:07:25 3 OK 8000/1000000001 [ 0.00%], 71.02 ms/it [70.96, 71.70]; ETA 822d 00:15; e8cbaa94ad3015eb (check 33.45s)
2018-10-22 11:07:25 3 Starting GCD over 0 points
2018-10-22 11:07:28 3 Waiting for GCD to finish..
2018-10-22 11:07:28 3 Exiting because "stop requested"
2018-10-22 11:07:28 3 Bye[/CODE][/QUOTE]
Interesting information. Doesn't seem great performance though :)
It's true I didn't even look into tuning that exponent size (yet); hopefully could be improved a bit. But realistically, is anybody testing such exponents..? |
[QUOTE=preda;498502]Interesting information. Doesn't seem great performance though :)
It's true I didn't even look into tuning that exponent size (yet); hopefully could be improved a bit. But realistically, is anybody testing such exponents..?[/QUOTE] I don't know, I test them expressly to see gpuowl performance. Basically I have to redo the test on each version. Right now I am searching which exponent range triggers the too small FFT selection. |
[QUOTE=preda;498499]I could add a test for the primality of the exponent.. but is that really needed?
I mean, usually the user would obtain the exponent from some source, and then it's prime. Otherwise if somebody just wants to do a test (with a random non-prime value), why not? I think there are some asserts that would trigger on an even exponent.[/QUOTE]
I agree with you that the user needs some awareness of what the program expects. My doubt derives from the statement "Let Mp = 2^p − 1 be the Mersenne number to test with p an odd [COLOR="Red"]prime[/COLOR]." Since you test whether L(p-1) is congruent to 0 mod Mp, the mod operation may give different values if Mp is not prime, and the LL test might show a wrong result. But again, I'm not proficient in coding the LL algorithm, and may be wrong as well. |
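The doubt above can be spot-checked for small exponents. The following is a minimal sketch, assuming the textbook Lucas-Lehmer recurrence (start at 4, square, subtract 2, reduce mod Mp, for p − 2 steps). The names `lucas_lehmer` and `is_prime` are illustrative only; this is not gpuowl's code, and the logs in this thread show gpuowl running PRP tests rather than LL.

```python
def lucas_lehmer(p: int) -> bool:
    """Return True if the Lucas-Lehmer residue for exponent p (p >= 3) is zero."""
    m = (1 << p) - 1          # Mp = 2^p - 1
    s = 4
    for _ in range(p - 2):    # p - 2 squaring steps
        s = (s * s - 2) % m
    return s == 0


def is_prime(n: int) -> bool:
    """Plain trial division; only meant for the small values used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


# Compare the LL verdict with the true primality of Mp,
# for prime and composite exponents alike.
results = []
for p in range(3, 20):
    results.append((p, is_prime(p), lucas_lehmer(p), is_prime((1 << p) - 1)))

for p, p_is_prime, ll_zero, mp_is_prime in results:
    print(p, "prime exp" if p_is_prime else "composite exp", ll_zero, mp_is_prime)
```

In this small range the LL verdict agrees with the real primality of Mp for every exponent, including the composite ones (where Mp is always composite, since 2^a − 1 divides 2^(ab) − 1). This is a spot check, not a proof.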
[QUOTE=SELROC;498503]I don't know, I test them expressly to see gpuowl performance.
Basically I have to redo the test on each version. Right now I am searching which exponent range triggers the too small FFT selection.[/QUOTE]
Long log:
2018-10-22 17:39:59 3 gpuowl 4.6--mod
2018-10-22 17:39:59 3 FFT 4608K: Width 512 (64x8), Height 512 (64x8), Middle 9; 18.37 bits/word
2018-10-22 17:39:59 3 Note: using short carry kernels
2018-10-22 17:39:59 3 Ellesmere-36x1360-@6:0.0 Radeon RX 580 Series
2018-10-22 17:40:00 3 OpenCL compilation in 853 ms, with "-DEXP=86700001u -DWIDTH=512u -DSMALL_HEIGHT=512u -DMIDDLE=9u -I. -cl-fast-relaxed-math -cl-std=CL2.0 "
2018-10-22 17:40:00 3 PRP M(86700001), FFT 4608K, 18.37 bits/word, B1 0
2018-10-22 17:40:02 3 OK loaded: 0/86700001, B1 0, blockSize 400, 0000000000000003 (expected 0000000000000003)
2018-10-22 17:40:02 3 Selected 0 P-1 trial points
2018-10-22 17:40:08 3 OK 800/86700001 [ 0.00%], 4.10 ms/it [4.09, 4.10]; ETA 4d 02:39; 16460e4280156eaa (check 2.10s)
2018-10-22 17:40:45 3 10000/86700001 [ 0.01%], 4.11 ms/it [4.10, 4.14]; ETA 4d 02:53; 82759ed0ecd6fc95
2018-10-22 17:41:27 3 20000/86700001 [ 0.02%], 4.11 ms/it [4.10, 4.14]; ETA 4d 02:52; d5075945f4b2b972
2018-10-22 17:42:08 3 30000/86700001 [ 0.03%], 4.11 ms/it [4.10, 4.16]; ETA 4d 02:55; 41a8bb8b01bf4633
2018-10-22 17:42:49 3 40000/86700001 [ 0.05%], 4.11 ms/it [4.10, 4.15]; ETA 4d 02:50; ee0734f60d7b1db3
2018-10-22 17:43:30 3 50000/86700001 [ 0.06%], 4.11 ms/it [4.10, 4.14]; ETA 4d 02:49; f6a660fe2b783ce6
2018-10-22 17:44:11 3 60000/86700001 [ 0.07%], 4.11 ms/it [4.10, 4.14]; ETA 4d 02:49; 4274d66af0efb8f4
2018-10-22 17:44:52 3 70000/86700001 [ 0.08%], 4.11 ms/it [4.10, 4.14]; ETA 4d 02:48; 750a4f4758744c06
2018-10-22 17:45:33 3 80000/86700001 [ 0.09%], 4.10 ms/it [4.10, 4.12]; ETA 4d 02:46; c711dd998fe2f029
2018-10-22 17:46:14 3 90000/86700001 [ 0.10%], 4.11 ms/it [4.10, 4.13]; ETA 4d 02:48; 6a0a2c5b2353d680
2018-10-22 17:46:55 3 100000/86700001 [ 0.12%], 4.11 ms/it [4.10, 4.13]; ETA 4d 02:45; 8cdc89078250d440
2018-10-22 17:47:36 3 110000/86700001 [ 0.13%], 4.11 ms/it [4.10, 4.14]; ETA 4d 02:49; 17bab91f2ac31b34
2018-10-22 17:48:17 3 120000/86700001 [ 0.14%], 4.10 ms/it [4.10, 4.12]; ETA 4d 02:42; 4b09dcf722c21e76
2018-10-22 17:48:58 3 130000/86700001 [ 0.15%], 4.11 ms/it [4.10, 4.14]; ETA 4d 02:44; 9a5730be57db997f
2018-10-22 17:49:39 3 140000/86700001 [ 0.16%], 4.11 ms/it [4.10, 4.14]; ETA 4d 02:45; 1f4140f6971a3fc8
2018-10-22 17:50:20 3 150000/86700001 [ 0.17%], 4.11 ms/it [4.10, 4.14]; ETA 4d 02:43; db1014b1d4121f4c
2018-10-22 17:51:04 3 OK 160000/86700001 [ 0.18%], 4.11 ms/it [4.10, 4.14]; ETA 4d 02:43; a6afed696697b5c4 (check 2.10s)
2018-10-22 17:51:45 3 170000/86700001 [ 0.20%], 4.11 ms/it [4.10, 4.14]; ETA 4d 02:42; 9bfc2eae15121eeb
2018-10-22 17:52:26 3 180000/86700001 [ 0.21%], 4.11 ms/it [4.10, 4.14]; ETA 4d 02:42; fac14bff3c6daa15
2018-10-22 17:53:07 3 190000/86700001 [ 0.22%], 4.11 ms/it [4.10, 4.14]; ETA 4d 02:43; bed36466316f8442
2018-10-22 17:53:48 3 200000/86700001 [ 0.23%], 4.11 ms/it [4.10, 4.13]; ETA 4d 02:40; b332138d2e641277
2018-10-22 17:54:29 3 210000/86700001 [ 0.24%], 4.11 ms/it [4.10, 4.14]; ETA 4d 02:41; c74057dc770ba4b7
2018-10-22 17:55:10 3 220000/86700001 [ 0.25%], 4.11 ms/it [4.10, 4.18]; ETA 4d 02:44; c03b5a734225ab6c
2018-10-22 17:55:51 3 230000/86700001 [ 0.27%], 4.11 ms/it [4.10, 4.13]; ETA 4d 02:39; 46160146db97217f
2018-10-22 17:56:32 3 240000/86700001 [ 0.28%], 4.11 ms/it [4.10, 4.15]; ETA 4d 02:42; 001f3c4bc2e3107e
2018-10-22 17:57:13 3 250000/86700001 [ 0.29%], 4.11 ms/it [4.10, 4.13]; ETA 4d 02:39; 3d01711945270baf
2018-10-22 17:57:55 3 260000/86700001 [ 0.30%], 4.11 ms/it [4.10, 4.14]; ETA 4d 02:39; 9bc0aa6cb0bfac3b
2018-10-22 17:58:36 3 270000/86700001 [ 0.31%], 4.11 ms/it [4.10, 4.13]; ETA 4d 02:38; ce0b50a6de9e3293
2018-10-22 17:59:17 3 280000/86700001 [ 0.32%], 4.11 ms/it [4.10, 4.14]; ETA 4d 02:38; b99f827733e4ec3d
2018-10-22 17:59:58 3 290000/86700001 [ 0.33%], 4.11 ms/it [4.10, 4.14]; ETA 4d 02:37; 9827be481aba3805
2018-10-22 18:00:39 3 300000/86700001 [ 0.35%], 4.11 ms/it [4.10, 4.13]; ETA 4d 02:36; f33b6414738221e7
2018-10-22 18:01:20 3 310000/86700001 [ 0.36%], 4.11 ms/it [4.10, 4.14]; ETA 4d 02:37; 5f6c14ec328abdee
2018-10-22 18:02:03 3 OK 320000/86700001 [ 0.37%], 4.11 ms/it [4.10, 4.13]; ETA 4d 02:36; 2bbfb6c8f24890d2 (check 2.10s)
2018-10-22 18:02:44 3 330000/86700001 [ 0.38%], 4.10 ms/it [4.09, 4.11]; ETA 4d 02:17; 8d1b60211b94ac90
2018-10-22 18:03:25 3 340000/86700001 [ 0.39%], 4.10 ms/it [4.09, 4.12]; ETA 4d 02:20; 05590b87a9ddfafa
2018-10-22 18:04:06 3 350000/86700001 [ 0.40%], 4.10 ms/it [4.09, 4.13]; ETA 4d 02:22; 1f3a102e8771564d
2018-10-22 18:04:47 3 360000/86700001 [ 0.42%], 4.10 ms/it [4.10, 4.12]; ETA 4d 02:20; 9f0890f84be5d70e
2018-10-22 18:05:28 3 370000/86700001 [ 0.43%], 4.10 ms/it [4.10, 4.13]; ETA 4d 02:20; bf5bb13efebafbae
2018-10-22 18:06:09 3 380000/86700001 [ 0.44%], 4.10 ms/it [4.10, 4.13]; ETA 4d 02:19; eadd1ee5a31802f5
2018-10-22 18:06:50 3 390000/86700001 [ 0.45%], 4.10 ms/it [4.10, 4.12]; ETA 4d 02:17; d398a7819315a916
2018-10-22 18:07:31 3 400000/86700001 [ 0.46%], 4.10 ms/it [4.10, 4.13]; ETA 4d 02:20; b8450226874b2ef6
2018-10-22 18:08:12 3 410000/86700001 [ 0.47%], 4.10 ms/it [4.10, 4.12]; ETA 4d 02:16; 4c3ccab2d6bfa17a
2018-10-22 18:08:53 3 420000/86700001 [ 0.48%], 4.10 ms/it [4.10, 4.12]; ETA 4d 02:15; eba5f14fdcd6f03d
2018-10-22 18:09:34 3 430000/86700001 [ 0.50%], 4.10 ms/it [4.10, 4.13]; ETA 4d 02:17; 3f8852cd4e1ce0ae
2018-10-22 18:10:15 3 440000/86700001 [ 0.51%], 4.10 ms/it [4.10, 4.12]; ETA 4d 02:13; 7bee67893af3109e
2018-10-22 18:10:56 3 450000/86700001 [ 0.52%], 4.10 ms/it [4.10, 4.13]; ETA 4d 02:15; 07dc22de8bfb9598
2018-10-22 18:11:37 3 460000/86700001 [ 0.53%], 4.10 ms/it [4.09, 4.12]; ETA 4d 02:11; a54bc95b4c65ba7a
2018-10-22 18:12:18 3 470000/86700001 [ 0.54%], 4.10 ms/it [4.10, 4.12]; ETA 4d 02:11; 6499a4e33d5d4c17
2018-10-22 18:13:02 3 EE 480000/86700001 [ 0.55%], 4.10 ms/it [4.09, 4.13]; ETA 4d 02:13; 836d28346a966447 (check 2.08s)
2018-10-22 18:13:04 3 OK loaded: 320000/86700001, B1 0, blockSize 400, 2bbfb6c8f24890d2 (expected 2bbfb6c8f24890d2)
2018-10-22 18:13:45 3 330000/86700001 [ 0.38%], 4.30 ms/it [4.10, 9.14]; ETA 4d 07:15; 8d1b60211b94ac90
2018-10-22 18:14:26 3 340000/86700001 [ 0.39%], 4.10 ms/it [4.10, 4.14]; ETA 4d 02:26; 05590b87a9ddfafa
2018-10-22 18:15:07 3 350000/86700001 [ 0.40%], 4.10 ms/it [4.09, 4.14]; ETA 4d 02:26; 1f3a102e8771564d
2018-10-22 18:15:48 3 360000/86700001 [ 0.42%], 4.10 ms/it [4.10, 4.14]; ETA 4d 02:27; 9f0890f84be5d70e
2018-10-22 18:16:29 3 370000/86700001 [ 0.43%], 4.10 ms/it [4.10, 4.14]; ETA 4d 02:26; bf5bb13efebafbae
2018-10-22 18:17:10 3 380000/86700001 [ 0.44%], 4.10 ms/it [4.10, 4.14]; ETA 4d 02:24; eadd1ee5a31802f5
2018-10-22 18:17:51 3 390000/86700001 [ 0.45%], 4.10 ms/it [4.10, 4.13]; ETA 4d 02:22; d398a7819315a916
2018-10-22 18:18:32 3 400000/86700001 [ 0.46%], 4.10 ms/it [4.10, 4.14]; ETA 4d 02:20; b8450226874b2ef6
2018-10-22 18:19:06 3 Stopping, please wait..
2018-10-22 18:19:08 3 EE 408400/86700001 [ 0.47%], 4.10 ms/it [4.10, 4.13]; ETA 4d 02:22; b25cad7eb46d0cf1 (check 2.08s)
2018-10-22 18:19:10 3 OK loaded: 320000/86700001, B1 0, blockSize 400, 2bbfb6c8f24890d2 (expected 2bbfb6c8f24890d2)
2018-10-22 18:19:10 3 Exiting because "stop requested"
2018-10-22 18:19:10 3 Bye
Even smaller than 86700001 produces EE, but later. |
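For reference, the "bits/word" figure in these logs appears to be the exponent divided by the number of FFT words (the FFT size in K times 1024). A minimal sketch under that assumption; `bits_per_word` is an illustrative helper, not a gpuowl function:

```python
def bits_per_word(exponent: int, fft_k: int) -> float:
    """Average number of exponent bits packed into each FFT word.

    fft_k is the FFT size as printed in the log, in units of K (1024 words).
    """
    return exponent / (fft_k * 1024)


# Values taken from the logs in this thread.
print(f"{bits_per_word(86700001, 4608):.2f}")     # log shows 18.37
print(f"{bits_per_word(1000000001, 73728):.2f}")  # log shows 13.25
```

The run that hit EE sits at 18.37 bits/word, the highest density of any run logged in this thread, which is at least consistent with the suspicion that the selected FFT is too small for that exponent. The thread does not state the actual threshold.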
[QUOTE=preda;498500]I'm removing the ghz-day/day display, probably that zero was an intermediary step.[/QUOTE]
I like having the GD/d display, providing it's reasonably accurate.
[CODE]2018-10-20 20:41:05 condorella-rx480 PRP M(658000139), FFT 36864K, 17.43 bits/word, 0 GHz-day
[/CODE]Zero for the total work of a PRP test clearly was not accurate. That propagated into the rate determined in interim outputs. Couldn't remember if such an anomaly had been seen and reported before.
I note in some v3.8 trials on varying exponents & fft lengths that GHzD/day was going up with fft length. I thought it would decline. Following are gpuowl-OpenCL 3.8-91c52fa on an RX480
[CODE]FFT length  exponent   ms/it  Ghz-D/day  days projected
4608K       83871797   3.85   72.4       3.74
8192K       152000249  7.03   71.7       12.4
16384K      299000059  14.23  76.0       49.3
18432K      335000377  16.3   78.8       63.2
36864K      658000139  34.2   0.0        261.
73728K      not tested, estimated > 1070 days (2.94 years)[/CODE]Selroc's 71 ms/it for 72M FFT on an RX580 seems not far off the mark.
Re the error I hit compiling gpuowl v4.6 in mingw64/msys2 for Windows x64, I will download a later commit and retry later, and thanks for the revision. |
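The "days projected" column above can be reproduced from the other two columns. A small sketch, assuming a PRP test needs roughly one iteration per bit of the exponent; `projected_days` is an illustrative name, not something from gpuowl:

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 ms


def projected_days(exponent: int, ms_per_it: float) -> float:
    """Estimated run time in days, assuming about `exponent` iterations in total."""
    return exponent * ms_per_it / MS_PER_DAY


# (FFT length, exponent, ms/it) rows copied from the table above.
rows = [
    ("4608K", 83871797, 3.85),
    ("8192K", 152000249, 7.03),
    ("16384K", 299000059, 14.23),
    ("18432K", 335000377, 16.3),
]
for fft, exponent, ms in rows:
    print(f"{fft:>7}  {projected_days(exponent, ms):6.2f} days")
```

The results land on the table's values to within rounding, which suggests the projection was made the same way.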
[QUOTE=preda;498500]I'm removing the ghz-day/day display, probably that zero was an intermediary step.[/QUOTE]
I have not encountered that problem yet. Please put back the ghz display, it is useful for comparisons. |
[QUOTE=SELROC;498520]I have not encountered that problem yet.
Please put back the ghz display, it is useful for comparisons.[/QUOTE]
I concur, it is useful. Zero values can be ignored (or suppressed if Preda prefers). It appears to only show zero at higher exponents or fft lengths; up to 335M (18M fft) was fine.
[CODE]C:\msys64\home\ken>openowl-V38-91c52fa-W64.exe -user kriesel -cpu condorella-rx480 -device 0
2018-10-22 13:10:01 condorella-rx480 gpuowl-OpenCL 3.8-91c52fa
2018-10-22 13:10:01 condorella-rx480 FFT 73728K: Width 2048 (256x8), Height 2048 (256x8), Middle 9; 13.25 bits/word
2018-10-22 13:10:01 condorella-rx480 Note: using long carry kernels
2018-10-22 13:10:04 condorella-rx480 Ellesmere-36x1266-@28:0.0 Radeon (TM) RX 480 Graphics
2018-10-22 13:10:09 condorella-rx480 OpenCL compilation in 5033 ms, with "-DEXP=999999937u -DWIDTH=2048u -DSMALL_HEIGHT=2048u -DMIDDLE=9u -I. -cl-fast-relaxed-math -cl-std=CL2.0 "
2018-10-22 13:10:21 condorella-rx480 PRP M(999999937), FFT 73728K, 13.25 bits/word, 0 GHz-day
2018-10-22 13:11:35 condorella-rx480 OK loaded: 0/999999937, blockSize 400, 0000000000000003
2018-10-22 13:12:04 condorella-rx480 OK initial check: 0000000000000003
2018-10-22 13:13:40 condorella-rx480 OK 800/999999937 [ 0.00%], 72.55 ms/it [72.42, 72.69] (0.0 GHz-day/day); ETA 839d 17:27; c3c8e02da339fdfa (check 38.43 s) (saved)
2018-10-22 13:24:52 condorella-rx480 10000/999999937 [ 0.00%], 72.95 ms/it [72.78, 74.27] (0.0 GHz-day/day); ETA 844d 07:10; a30a0c45e9fb828c
2018-10-22 13:26:19 condorella-rx480 Stopping, please wait..
2018-10-22 13:26:58 condorella-rx480 OK 11200/999999937 [ 0.00%], 72.81 ms/it [72.79, 72.82] (0.0 GHz-day/day); ETA 842d 17:00; a379021250b8cb64 (check 38.66 s) (saved)
2018-10-22 13:26:58 condorella-rx480 Bye[/CODE][CODE]C:\msys64\home\ken>openowl-V38-91c52fa-W64.exe -user kriesel -cpu condorella-rx480 -device 0
2018-10-22 13:28:17 condorella-rx480 gpuowl-OpenCL 3.8-91c52fa
2018-10-22 13:28:17 condorella-rx480 FFT 40960K: Width 2048 (256x8), Height 2048 (256x8), Middle 5; 17.36 bits/word
2018-10-22 13:28:17 condorella-rx480 Note: using long carry kernels
2018-10-22 13:28:19 condorella-rx480 Ellesmere-36x1266-@28:0.0 Radeon (TM) RX 480 Graphics
2018-10-22 13:28:24 condorella-rx480 OpenCL compilation in 5013 ms, with "-DEXP=728000017u -DWIDTH=2048u -DSMALL_HEIGHT=2048u -DMIDDLE=5u -I. -cl-fast-relaxed-math -cl-std=CL2.0 "
2018-10-22 13:28:30 condorella-rx480 PRP M(728000017), FFT 40960K, 17.36 bits/word, 0 GHz-day
2018-10-22 13:29:12 condorella-rx480 OK loaded: 0/728000017, blockSize 400, 0000000000000003
2018-10-22 13:29:29 condorella-rx480 OK initial check: 0000000000000003
2018-10-22 13:30:24 condorella-rx480 OK 800/728000017 [ 0.00%], 41.19 ms/it [41.10, 41.27] (0.0 GHz-day/day); ETA 347d 01:17; b381fc5ed49bc70d (check 21.83 s) (saved)
2018-10-22 13:36:45 condorella-rx480 10000/728000017 [ 0.00%], 41.38 ms/it [41.31, 41.60] (0.0 GHz-day/day); ETA 348d 16:15; fc3543bd6f05c1bd
2018-10-22 13:37:01 condorella-rx480 Stopping, please wait..
2018-10-22 13:37:23 condorella-rx480 OK 10400/728000017 [ 0.00%], 41.47 ms/it [41.47, 41.47] (0.0 GHz-day/day); ETA 349d 11:03; 1204e3403d7dbe8d (check 22.00 s) (saved)
2018-10-22 13:37:23 condorella-rx480 Bye[/CODE][CODE]C:\msys64\home\ken>openowl-V38-91c52fa-W64.exe -user kriesel -cpu condorella-rx480 -device 0
2018-10-22 13:39:50 condorella-rx480 gpuowl-OpenCL 3.8-91c52fa
2018-10-22 13:39:50 condorella-rx480 FFT 20480K: Width 1024 (256x4), Height 2048 (256x8), Middle 5; 17.69 bits/word
2018-10-22 13:39:50 condorella-rx480 Note: using short carry kernels
2018-10-22 13:39:52 condorella-rx480 Ellesmere-36x1266-@28:0.0 Radeon (TM) RX 480 Graphics
2018-10-22 13:39:56 condorella-rx480 OpenCL compilation in 4591 ms, with "-DEXP=371000039u -DWIDTH=1024u -DSMALL_HEIGHT=2048u -DMIDDLE=5u -I. -cl-fast-relaxed-math -cl-std=CL2.0 "
2018-10-22 13:40:00 condorella-rx480 PRP M(371000039), FFT 20480K, 17.69 bits/word, 6400 GHz-day
2018-10-22 13:40:20 condorella-rx480 OK loaded: 0/371000039, blockSize 400, 0000000000000003
2018-10-22 13:40:28 condorella-rx480 OK initial check: 0000000000000003
2018-10-22 13:40:53 condorella-rx480 OK 800/371000039 [ 0.00%], 18.80 ms/it [18.75, 18.85] (79.3 GHz-day/day); ETA 80d 17:27; e2ac5ec2a6819689 (check 10.16 s) (saved)
2018-10-22 13:43:47 condorella-rx480 10000/371000039 [ 0.00%], 18.92 ms/it [18.82, 20.53] (78.8 GHz-day/day); ETA 81d 06:03; edb6fd2abeb3f16c
2018-10-22 13:45:41 condorella-rx480 Stopping, please wait..
2018-10-22 13:45:51 condorella-rx480 OK 16000/371000039 [ 0.00%], 18.96 ms/it [18.82, 20.52] (78.6 GHz-day/day); ETA 81d 10:04; 9c0d2afc3567b445 (check 10.14 s) (saved)
2018-10-22 13:45:51 condorella-rx480 Bye[/CODE]The zero work/PRP seems to develop above 20M fft length, which was the maximum length in V3.3. |
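The GHz-day/day figures in the last log look like the total credit for the test divided by its projected duration, which would also explain why a total of 0 GHz-day yields a rate of 0.0. A sketch under that assumption; `ghz_days_per_day` is an illustrative name and this is not taken from gpuowl's source:

```python
MS_PER_DAY = 86_400_000


def ghz_days_per_day(total_ghz_days: float, exponent: int, ms_per_it: float) -> float:
    """Throughput, assuming the test takes about `exponent` iterations."""
    days = exponent * ms_per_it / MS_PER_DAY
    return total_ghz_days / days


# M(371000039): the log reports 6400 GHz-day total and 18.80, 18.92, 18.96 ms/it.
for ms in (18.80, 18.92, 18.96):
    print(round(ghz_days_per_day(6400, 371000039, ms), 1))

# M(728000017): the log reports 0 GHz-day total, so the rate collapses to zero.
print(round(ghz_days_per_day(0, 728000017, 41.19), 1))
```

For the 20480K run this gives 79.3, 78.8 and 78.6, matching the three interim lines in the log.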
[QUOTE=SELROC;498505]Long log:
[..]
2018-10-22 18:12:18 3 470000/86700001 [ 0.54%], 4.10 ms/it [4.10, 4.12]; ETA 4d 02:11; 6499a4e33d5d4c17
2018-10-22 18:13:02 3 EE 480000/86700001 [ 0.55%], 4.10 ms/it [4.09, 4.13]; ETA 4d 02:13; 836d28346a966447 (check 2.08s)
2018-10-22 18:13:04 3 OK loaded: 320000/86700001, B1 0, blockSize 400, 2bbfb6c8f24890d2 (expected 2bbfb6c8f24890d2)
2018-10-22 18:13:45 3 330000/86700001 [ 0.38%], 4.30 ms/it [4.10, 9.14]; ETA 4d 07:15; 8d1b60211b94ac90
2018-10-22 18:14:26 3 340000/86700001 [ 0.39%], 4.10 ms/it [4.10, 4.14]; ETA 4d 02:26; 05590b87a9ddfafa
2018-10-22 18:15:07 3 350000/86700001 [ 0.40%], 4.10 ms/it [4.09, 4.14]; ETA 4d 02:26; 1f3a102e8771564d
2018-10-22 18:15:48 3 360000/86700001 [ 0.42%], 4.10 ms/it [4.10, 4.14]; ETA 4d 02:27; 9f0890f84be5d70e
2018-10-22 18:16:29 3 370000/86700001 [ 0.43%], 4.10 ms/it [4.10, 4.14]; ETA 4d 02:26; bf5bb13efebafbae
2018-10-22 18:17:10 3 380000/86700001 [ 0.44%], 4.10 ms/it [4.10, 4.14]; ETA 4d 02:24; eadd1ee5a31802f5
2018-10-22 18:17:51 3 390000/86700001 [ 0.45%], 4.10 ms/it [4.10, 4.13]; ETA 4d 02:22; d398a7819315a916
2018-10-22 18:18:32 3 400000/86700001 [ 0.46%], 4.10 ms/it [4.10, 4.14]; ETA 4d 02:20; b8450226874b2ef6
2018-10-22 18:19:06 3 Stopping, please wait..
2018-10-22 18:19:08 3 EE 408400/86700001 [ 0.47%], 4.10 ms/it [4.10, 4.13]; ETA 4d 02:22; b25cad7eb46d0cf1 (check 2.08s)
2018-10-22 18:19:10 3 OK loaded: 320000/86700001, B1 0, blockSize 400, 2bbfb6c8f24890d2 (expected 2bbfb6c8f24890d2)
2018-10-22 18:19:10 3 Exiting because "stop requested"
2018-10-22 18:19:10 3 Bye
Even smaller than 86700001 produces EE, but later.[/QUOTE]
Thanks for the investigation. Interesting, I don't see the same (see below). I think maybe some driver difference or something else, e.g. different precision in sin/cos implementation on the GPU, may be the reason. But a slight relaxation of FFT size may be warranted.
2018-10-22 21:59:05 vega0 gpuowl 4.6-9b7ff7b-mod
2018-10-22 21:59:05 vega0 FFT 4608K: Width 512 (64x8), Height 512 (64x8), Middle 9; 18.37 bits/word
2018-10-22 21:59:05 vega0 Note: using short carry kernels
2018-10-22 21:59:07 vega0 gfx900-64x1630-@67:0.0 Vega [Radeon RX Vega]
2018-10-22 21:59:10 vega0 OpenCL compilation in 3774 ms, with "-DEXP=86700001u -DWIDTH=512u -DSMALL_HEIGHT=512u -DMIDDLE=9u -I. -cl-fast-relaxed-math -cl-std=CL2.0 "
2018-10-22 21:59:11 vega0 PRP M(86700001), FFT 4608K, 18.37 bits/word, B1 0
2018-10-22 21:59:12 vega0 OK loaded: 0/86700001, B1 0, blockSize 400, 0000000000000003 (expected 0000000000000003)
2018-10-22 21:59:12 vega0 Selected 0 P-1 trial points
2018-10-22 21:59:15 vega0 OK 800/86700001 [ 0.00%], 2.15 ms/it [2.15, 2.16]; ETA 2d 03:54; 16460e4280156eaa (check 1.11s)
2018-10-22 21:59:35 vega0 10000/86700001 [ 0.01%], 2.16 ms/it [2.16, 2.16]; ETA 2d 04:02; 82759ed0ecd6fc95
2018-10-22 21:59:57 vega0 20000/86700001 [ 0.02%], 2.16 ms/it [2.16, 2.16]; ETA 2d 04:04; d5075945f4b2b972
2018-10-22 22:00:18 vega0 30000/86700001 [ 0.03%], 2.16 ms/it [2.16, 2.16]; ETA 2d 04:04; 41a8bb8b01bf4633
2018-10-22 22:00:40 vega0 40000/86700001 [ 0.05%], 2.16 ms/it [2.16, 2.16]; ETA 2d 04:04; ee0734f60d7b1db3
2018-10-22 22:01:01 vega0 50000/86700001 [ 0.06%], 2.16 ms/it [2.16, 2.16]; ETA 2d 04:03; f6a660fe2b783ce6
2018-10-22 22:01:23 vega0 60000/86700001 [ 0.07%], 2.16 ms/it [2.16, 2.16]; ETA 2d 04:04; 4274d66af0efb8f4
2018-10-22 22:01:45 vega0 70000/86700001 [ 0.08%], 2.16 ms/it [2.16, 2.16]; ETA 2d 04:03; 750a4f4758744c06
2018-10-22 22:02:06 vega0 80000/86700001 [ 0.09%], 2.16 ms/it [2.16, 2.16]; ETA 2d 04:03; c711dd998fe2f029
2018-10-22 22:02:28 vega0 90000/86700001 [ 0.10%], 2.16 ms/it [2.16, 2.16]; ETA 2d 04:02; 6a0a2c5b2353d680
2018-10-22 22:02:50 vega0 100000/86700001 [ 0.12%], 2.16 ms/it [2.16, 2.17]; ETA 2d 04:02; 8cdc89078250d440
2018-10-22 22:03:11 vega0 110000/86700001 [ 0.13%], 2.16 ms/it [2.16, 2.16]; ETA 2d 04:01; 17bab91f2ac31b34
2018-10-22 22:03:33 vega0 120000/86700001 [ 0.14%], 2.16 ms/it [2.16, 2.16]; ETA 2d 04:01; 4b09dcf722c21e76
2018-10-22 22:03:55 vega0 130000/86700001 [ 0.15%], 2.16 ms/it [2.16, 2.16]; ETA 2d 04:02; 9a5730be57db997f
2018-10-22 22:04:16 vega0 140000/86700001 [ 0.16%], 2.16 ms/it [2.16, 2.16]; ETA 2d 04:01; 1f4140f6971a3fc8
2018-10-22 22:04:38 vega0 150000/86700001 [ 0.17%], 2.16 ms/it [2.16, 2.16]; ETA 2d 04:01; db1014b1d4121f4c
2018-10-22 22:05:01 vega0 OK 160000/86700001 [ 0.18%], 2.16 ms/it [2.16, 2.16]; ETA 2d 04:01; a6afed696697b5c4 (check 1.21s)
2018-10-22 22:05:22 vega0 170000/86700001 [ 0.20%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:59; 9bfc2eae15121eeb
2018-10-22 22:05:44 vega0 180000/86700001 [ 0.21%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:59; fac14bff3c6daa15
2018-10-22 22:06:06 vega0 190000/86700001 [ 0.22%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:59; bed36466316f8442
2018-10-22 22:06:27 vega0 200000/86700001 [ 0.23%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:59; b332138d2e641277
2018-10-22 22:06:49 vega0 210000/86700001 [ 0.24%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:59; c74057dc770ba4b7
2018-10-22 22:07:11 vega0 220000/86700001 [ 0.25%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:58; c03b5a734225ab6c
2018-10-22 22:07:32 vega0 230000/86700001 [ 0.27%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:57; 46160146db97217f
2018-10-22 22:07:54 vega0 240000/86700001 [ 0.28%], 2.16 ms/it [2.16, 2.17]; ETA 2d 03:58; 001f3c4bc2e3107e
2018-10-22 22:08:16 vega0 250000/86700001 [ 0.29%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:56; 3d01711945270baf
2018-10-22 22:08:37 vega0 260000/86700001 [ 0.30%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:57; 9bc0aa6cb0bfac3b
2018-10-22 22:08:59 vega0 270000/86700001 [ 0.31%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:56; ce0b50a6de9e3293
2018-10-22 22:09:21 vega0 280000/86700001 [ 0.32%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:55; b99f827733e4ec3d
2018-10-22 22:09:42 vega0 290000/86700001 [ 0.33%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:55; 9827be481aba3805
2018-10-22 22:10:04 vega0 300000/86700001 [ 0.35%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:55; f33b6414738221e7
2018-10-22 22:10:25 vega0 310000/86700001 [ 0.36%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:55; 5f6c14ec328abdee
2018-10-22 22:10:48 vega0 OK 320000/86700001 [ 0.37%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:54; 2bbfb6c8f24890d2 (check 1.11s)
2018-10-22 22:11:10 vega0 330000/86700001 [ 0.38%], 2.16 ms/it [2.15, 2.16]; ETA 2d 03:53; 8d1b60211b94ac90
2018-10-22 22:11:32 vega0 340000/86700001 [ 0.39%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:53; 05590b87a9ddfafa
2018-10-22 22:11:53 vega0 350000/86700001 [ 0.40%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:53; 1f3a102e8771564d
2018-10-22 22:12:15 vega0 360000/86700001 [ 0.42%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:53; 9f0890f84be5d70e
2018-10-22 22:12:36 vega0 370000/86700001 [ 0.43%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:53; bf5bb13efebafbae
2018-10-22 22:12:58 vega0 380000/86700001 [ 0.44%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:52; eadd1ee5a31802f5
2018-10-22 22:13:20 vega0 390000/86700001 [ 0.45%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:52; d398a7819315a916
2018-10-22 22:13:41 vega0 400000/86700001 [ 0.46%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:51; b8450226874b2ef6
2018-10-22 22:14:03 vega0 410000/86700001 [ 0.47%], 2.16 ms/it [2.16, 2.17]; ETA 2d 03:51; 75d4ceed1067a7f4
2018-10-22 22:14:25 vega0 420000/86700001 [ 0.48%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:50; 8cf582cfb8566ad5
2018-10-22 22:14:46 vega0 430000/86700001 [ 0.50%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:50; ca456c850b7640d5
2018-10-22 22:15:08 vega0 440000/86700001 [ 0.51%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:50; bb6a2381b36d8890
2018-10-22 22:15:30 vega0 450000/86700001 [ 0.52%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:49; f5f83bb8aefe41de
2018-10-22 22:15:51 vega0 460000/86700001 [ 0.53%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:49; 9d36032d12489b80
2018-10-22 22:16:13 vega0 470000/86700001 [ 0.54%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:49; 399fdd68854c4d92
2018-10-22 22:16:36 vega0 OK 480000/86700001 [ 0.55%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:48; ceac25548f3e21f7 (check 1.10s)
2018-10-22 22:16:57 vega0 490000/86700001 [ 0.57%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:47; 28371927276164f3 |
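Since both runs test the same exponent, their 64-bit residues can be compared iteration by iteration to narrow down where the RX 580 run went wrong. A minimal sketch, assuming the progress-line layout seen in these 4.6 logs; the regular expression and the names `residues` and `first_divergence` are illustrative, and the sample lines are copied from the two logs above:

```python
import re

# Matches progress lines such as
# "... 400000/86700001 [ 0.46%], 4.10 ms/it [4.10, 4.13]; ETA 4d 02:20; b8450226874b2ef6"
PROGRESS = re.compile(r"(\d+)/\d+ \[.*; ([0-9a-f]{16})\b")


def residues(log: str) -> dict[int, str]:
    """Map iteration -> residue, keeping the first occurrence of each iteration."""
    found: dict[int, str] = {}
    for line in log.splitlines():
        match = PROGRESS.search(line)
        if match:
            found.setdefault(int(match.group(1)), match.group(2))
    return found


def first_divergence(a: dict[int, str], b: dict[int, str]):
    """Return the lowest common iteration whose residues differ, or None."""
    for iteration in sorted(a.keys() & b.keys()):
        if a[iteration] != b[iteration]:
            return iteration
    return None


RX580_LOG = """\
2018-10-22 18:06:50 3 390000/86700001 [ 0.45%], 4.10 ms/it [4.10, 4.12]; ETA 4d 02:17; d398a7819315a916
2018-10-22 18:07:31 3 400000/86700001 [ 0.46%], 4.10 ms/it [4.10, 4.13]; ETA 4d 02:20; b8450226874b2ef6
2018-10-22 18:08:12 3 410000/86700001 [ 0.47%], 4.10 ms/it [4.10, 4.12]; ETA 4d 02:16; 4c3ccab2d6bfa17a
"""

VEGA_LOG = """\
2018-10-22 22:13:20 vega0 390000/86700001 [ 0.45%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:52; d398a7819315a916
2018-10-22 22:13:41 vega0 400000/86700001 [ 0.46%], 2.16 ms/it [2.16, 2.16]; ETA 2d 03:51; b8450226874b2ef6
2018-10-22 22:14:03 vega0 410000/86700001 [ 0.47%], 2.16 ms/it [2.16, 2.17]; ETA 2d 03:51; 75d4ceed1067a7f4
"""

print(first_divergence(residues(RX580_LOG), residues(VEGA_LOG)))
```

On these excerpts the two runs agree through iteration 400000 and differ at 410000. Taking the Vega run as the reference (its check at 480000 passed), the RX 580 error would have occurred somewhere between those two points, before the check at 480000 flagged it with EE.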
144m fft timing on RX480
Just for laughs, and to check the ghzd/day behavior, while I did a little yard work:
[CODE]C:\msys64\home\ken>openowl-V38-91c52fa-W64.exe -user kriesel -cpu condorella-rx480 -device 0
2018-10-22 14:14:26 condorella-rx480 gpuowl-OpenCL 3.8-91c52fa
2018-10-22 14:14:26 condorella-rx480 FFT 147456K: Width 4096 (512x8), Height 2048 (256x8), Middle 9; 9.93 bits/word
2018-10-22 14:14:26 condorella-rx480 Note: using long carry kernels
2018-10-22 14:14:27 condorella-rx480 Ellesmere-36x1266-@28:0.0 Radeon (TM) RX 480 Graphics
2018-10-22 14:14:32 condorella-rx480 OpenCL compilation in 4854 ms, with "-DEXP=1500000041u -DWIDTH=4096u -DSMALL_HEIGHT=2048u -DMIDDLE=9u -I. -cl-fast-relaxed-math -cl-std=CL2.0 "
2018-10-22 14:14:56 condorella-rx480 PRP M(1500000041), FFT 147456K, 9.93 bits/word, 0 GHz-day
2018-10-22 14:17:40 condorella-rx480 OK loaded: 0/1500000041, blockSize 400, 0000000000000003
2018-10-22 14:18:47 condorella-rx480 OK initial check: 0000000000000003
2018-10-22 14:22:27 condorella-rx480 OK 800/1500000041 [ 0.00%], 171.48 ms/it [162.68, 180.27] (0.0 GHz-day/day); ETA 2977d 00:58; 6e5cf1719717835b (check 82.44s) (saved)
2018-10-22 14:46:39 condorella-rx480 10000/1500000041 [ 0.00%], 157.80 ms/it [156.84, 163.21] (0.0 GHz-day/day); ETA 2739d 12:15; 4fc4ebb3728095f7
2018-10-22 15:12:53 condorella-rx480 20000/1500000041 [ 0.00%], 157.48 ms/it [156.66, 165.88] (0.0 GHz-day/day); ETA 2733d 22:53; 0b25accbcd18fb0c
2018-10-22 15:39:12 condorella-rx480 30000/1500000041 [ 0.00%], 157.88 ms/it [156.91, 164.99] (0.0 GHz-day/day); ETA 2740d 23:39; 8b85d889c0dc246b
2018-10-22 15:50:48 condorella-rx480 Stopping, please wait..
2018-10-22 15:52:12 condorella-rx480 OK 34400/1500000041 [ 0.00%], 158.29 ms/it [156.78, 165.15] (0.0 GHz-day/day); ETA 2748d 01:10; a8e7126d135ca84a (check 83.59s) (saved)
2018-10-22 15:52:13 condorella-rx480 Bye[/CODE](7.5 YEARS to complete) |
[QUOTE=SELROC;498493]Ok, there are a couple of bugs that have endured various versions:
1. FFT selection, sometimes selects FFT size too small for the exponent.
2. GpuOwl output with -h does not show program version. If we want to know which version is the executable, we must necessarily start a computation only to see the version number.[/QUOTE]
Both should be fixed now. If you encounter other exponents with errors caused by too-small default FFT size, please report and I'll have a look to further tune the default. |
[QUOTE=kriesel;498526]I concur, it is useful. Zero values can be ignored (or suppressed if Preda prefers).
It appears to only show zero at higher exponents or fft lengths; up to 335M (18M fft) was fine. [/QUOTE]
OK I'll look into adding it back. (It uses a table of "effort per FFT size" that I imported from James, and maybe that table does not contain very-high sizes; I didn't look though). |