[QUOTE=preda;499534]0 MULs means that no GCD multiplications were done. This is normal with B1=0. When B1 != 0, sometimes a number of MULs are done, and sometimes 0 are done, depending on iteration.
When MULs==0, ms/sq is the same as the old ms/it. But when MULs are done, ms/sq tries to measure only the "squaring" time (the normal PRP iteration), excluding the time taken by the MULs. Thus I changed the name to ms/sq to show that it's not the same as the old ms/it. Why I thought that indicating speed this way is good: because this number, ms/sq, is relatively stable and does not depend on the number of MULs (or the time they take), which varies between iteration blocks. Thus this number can be used to compare speed. The other option would be to take the total time (squares + MULs) and divide it by the number of iterations in the block. That number would be larger where there are more MULs and smaller where there are fewer, and thus a bit more difficult to read GPU performance from, IMO.[/QUOTE] Thank you, now it is clear. This is a good attempt to get a stable measure: in the latest test the ms/sq time is 0.18 everywhere except for the last iteration, which is 0.19 ms/sq. The introduction of smaller FFT sizes is also good. Now the program can be validated on new hardware with a quick test against the smallest prime. |
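The two ways of reporting speed described in the quote can be compared with a small sketch. The function name and the timing numbers below are hypothetical, and how gpuowl actually separates MUL time from squaring time is not shown in this thread; the sketch only illustrates why ms/sq stays stable while a total-time figure moves with the number of MULs.

```python
def block_timings(total_ms, mul_ms, squarings):
    """Return (ms_per_sq, ms_per_it) for one block of iterations.

    total_ms  : wall time of the whole block (squarings plus MULs)
    mul_ms    : part of total_ms spent on the extra P-1 multiplications
    squarings : number of PRP squarings in the block
    """
    ms_per_sq = (total_ms - mul_ms) / squarings  # squaring time only
    ms_per_it = total_ms / squarings             # everything, averaged
    return ms_per_sq, ms_per_it


# Hypothetical blocks: same squaring speed, different amounts of MUL work.
quiet = block_timings(total_ms=1600.0, mul_ms=0.0, squarings=10000)
busy = block_timings(total_ms=1800.0, mul_ms=200.0, squarings=10000)
print(quiet)  # both figures agree when no MULs are done
print(busy)   # ms/sq unchanged, ms/it inflated by the MULs
```

With no MULs the two figures coincide, which matches the statement that ms/sq equals the old ms/it when MULs==0.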
RX550 timings versus gpuowl version for m89m
RX550, AMD Adrenalin 18.10.2 driver for Win7 x64
m89000167, 5000K FFT for v2.0, 5120K for others (iterations 10000-20000)
Ver  ms/it (no P-1 or TF)
2.0  17.38
3.3  16.90
3.5  16.42 <--min
3.6  16.44
3.8  16.43
3.9  17.22
4.3  17.35
4.6  17.25
4.7  NA
5.0  17.25
Note the more exactly comparable methodology in this report than in the RX480 timings reported earlier, and the different location of the minimum. The difference among v3.5-3.8 is +-1 in the least significant digit, so it may be insignificant. |
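To put the spread in the table above in relative terms, here is a minimal sketch that computes each version's slowdown against the fastest one. The numbers are copied from the RX550 table; the variable and function names are illustrative only, and v4.7 is left out because it has no timing.

```python
# ms/it on the RX550 for m89000167, copied from the table above.
# Note v2.0 used a 5000K FFT while the others used 5120K.
timings = {
    "2.0": 17.38, "3.3": 16.90, "3.5": 16.42, "3.6": 16.44, "3.8": 16.43,
    "3.9": 17.22, "4.3": 17.35, "4.6": 17.25, "5.0": 17.25,
}

best = min(timings, key=timings.get)  # version with the lowest ms/it


def slowdown_pct(version):
    """Percent slower than the fastest version."""
    return 100.0 * (timings[version] / timings[best] - 1.0)


for version in timings:
    print(f"v{version}: {timings[version]:.2f} ms/it, +{slowdown_pct(version):.2f}%")
```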
[QUOTE=kriesel;499544]RX550, AMD Adrenalin 18.10.2 driver for Win7 x64
m89000167, 5000K FFT for v2.0, 5120K for others (iterations 10000-20000)
Ver  ms/it (no P-1 or TF)
2.0  17.38
3.3  16.90
3.5  16.42 <--min
3.6  16.44
3.8  16.43
3.9  17.22
4.3  17.35
4.6  17.25
4.7  NA
5.0  17.25
Note the more exactly comparable methodology in this report than in the RX480 timings reported earlier, and the different location of the minimum. The difference among v3.5-3.8 is +-1 in the least significant digit, so it may be insignificant.[/QUOTE]
The numbers are not exactly comparable by now: [CODE]
2018-11-04 16:24:42 gpuowl 5.0--mod
2018-11-04 16:24:42 RX580 -user selroc -cpu RX580 -device 0
2018-11-04 16:24:42 RX580 89000167 FFT 5120K: Width 256x4, Height 64x8, Middle 5; 16.98 bits/word
2018-11-04 16:24:42 RX580 using short carry kernels
2018-11-04 16:24:43 RX580 gfx803-36x1360-@4a:0.0 Ellesmere [Radeon RX 470/480]
2018-11-04 16:24:44 RX580 OpenCL compilation in 1076 ms, with "-DEXP=89000167u -DWIDTH=1024u -DSMALL_HEIGHT=512u -DMIDDLE=5u -I. -cl-fast-relaxed-math -cl-std=CL2.0 "
2018-11-04 16:24:44 RX580 89000167.owl not found, starting from the beginning.
2018-11-04 16:24:52 RX580 89000167 OK 800 0.00%; 4.44 ms/sq, 0 MULs; ETA 4d 13:50; 2744231e7051f3fe (check 1.95s)
2018-11-04 16:25:33 RX580 89000167 10000 0.01%; 4.46 ms/sq, 0 MULs; ETA 4d 14:20; 2a55d51cdf0d91cb
2018-11-04 16:26:18 RX580 89000167 20000 0.02%; 4.47 ms/sq, 0 MULs; ETA 4d 14:35; 8dcb0029e791db2a
2018-11-04 16:27:03 RX580 89000167 30000 0.03%; 4.48 ms/sq, 0 MULs; ETA 4d 14:37; 2fbd246d68f86f29
2018-11-04 16:27:47 RX580 89000167 40000 0.04%; 4.48 ms/sq, 0 MULs; ETA 4d 14:47; d85f84a6744d7090
2018-11-04 16:28:32 RX580 89000167 50000 0.06%; 4.49 ms/sq, 0 MULs; ETA 4d 14:50; afa46f7cdc5ffb7d
2018-11-04 16:29:17 RX580 89000167 60000 0.07%; 4.49 ms/sq, 0 MULs; ETA 4d 14:49; 98906e9529e4667f
2018-11-04 16:30:02 RX580 89000167 70000 0.08%; 4.49 ms/sq, 0 MULs; ETA 4d 14:49; 90b5e67934fcdcff
2018-11-04 16:30:07 RX580 Stopping, please wait..
2018-11-04 16:30:09 RX580 89000167 OK 71200 0.08%; 4.49 ms/sq, 0 MULs; ETA 4d 14:51; 14f11cfb55a43415 (check 1.96s)
2018-11-04 16:30:09 RX580 Exiting because "stop requested"
2018-11-04 16:30:09 RX580 Bye
[/CODE] |
[QUOTE=SELROC;499550]The numbers are not exactly comparable by now:
[/QUOTE] I claim the numbers are very comparable for the same GPU across gpuowl versions, despite the ms/it vs. ms/sq difference in labeling, as long as all versions were run with no P-1 activity, as my recent benchmark results for RX550 and RX480 posted for m89000167 were. The speed difference between the RX550 and the RX480 or RX580 is expected to be a considerable ratio; the RX550 is a low-wattage, slow card. Running the same iteration span for each gpuowl version made the RX550 timings more comparable than my earlier m89m benchmarking for the RX480, which used successive iteration ranges (not the same iteration span for different gpuowl versions). |
[QUOTE=kriesel;499551]I claim the numbers are very comparable for the same GPU across gpuowl versions, despite the ms/it vs. ms/sq difference in labeling, as long as all versions were run with no P-1 activity, as my recent benchmark results for RX550 and RX480 posted for m89000167 were. The speed difference between the RX550 and the RX480 or RX580 is expected to be a considerable ratio; the RX550 is a low-wattage, slow card. Running the same iteration span for each gpuowl version made the RX550 timings more comparable than my earlier m89m benchmarking for the RX480, which used successive iteration ranges (not the same iteration span for different gpuowl versions).[/QUOTE]
Our numbers come from different GPUs, different operating systems, and different drivers, so a difference must be accounted for. |
-list fft fails if worktodo does not exist
[CODE]C:\msys64\home\ken\gpuowl-compile\v5x>openowl -h
2018-11-04 10:20:41 gpuowl 5.0-df2bdf2
Command line options:

-user <name> : specify the user name.
-cpu <name> : specify the hardware name.
-time : display kernel profiling information.
-fft <size> : specify FFT size, such as: 5000K, 4M, +2, -1.
-block <value> : PRP GEC block size. Default 400. Smaller block is slower but detects errors sooner.
-carry long|short : force carry type. Short carry may be faster, but requires high bits/word.
-list fft : display a list of available FFT configurations.
-tf <bit-offset> : enable auto trial factoring before PRP. Pass 0 to bit-offset for default TF depth.
-device <N> : select a specific device:
 0 : Ellesmere-36x1266-@28:0.0 Radeon (TM) RX 480 Graphics
 1 : gfx804-8x1203-@3:0.0 Radeon 550 Series

C:\msys64\home\ken\gpuowl-compile\v5x>openowl -list fft
2018-11-04 10:20:56 gpuowl 5.0-df2bdf2
2018-11-04 10:20:56 -list fft
2018-11-04 10:20:56 Can't open 'worktodo.txt' (mode 'rb')
2018-11-04 10:20:56 Bye
[/CODE]A user might want to list the fft selection available, to create the worktodo list. Slight catch-22 here. I think it's reasonable to have -list fft act like -h, which runs whether there's a worktodo available or not, and terminates. -list fft is a form of help. |
[QUOTE=kriesel;499554]A user might want to list the fft selection available, to create the worktodo list. Slight catch-22 here. I think it's reasonable to have -list fft act like -h, which runs whether there's a worktodo available or not, and terminates. -list fft is a form of help.[/QUOTE]
Yes it makes sense. I'll look into implementing that. |
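The requested behavior amounts to handling informational options before the work file is opened. The following is only a sketch of that ordering, written in Python for brevity; it is not gpuowl's code, and every name in it (`run`, `wants_fft_list`, `list_fft_lines`, `missing_worktodo`) is hypothetical.

```python
def list_fft_lines():
    # Hypothetical stand-in for the real table printer; the rows are a few
    # of those that -list fft prints in this thread.
    return ["FFT maxExp W H M", "0.1M 2.6M 256 256 1", "0.2M 5.2M 256 512 1"]


def wants_fft_list(argv):
    """True when '-list' is immediately followed by 'fft'."""
    return any(a == "-list" and b == "fft" for a, b in zip(argv, argv[1:]))


def run(argv, load_worktodo):
    """Handle informational options first, then load the work file."""
    if "-h" in argv:
        return ["Command line options: ..."]
    if wants_fft_list(argv):
        return list_fft_lines()
    tasks = load_worktodo()  # only reached when real work is requested
    return [f"{len(tasks)} task(s) loaded"]


def missing_worktodo():
    # Simulates the failure seen in the log when worktodo.txt is absent.
    raise FileNotFoundError("Can't open 'worktodo.txt' (mode 'rb')")


print(run(["-list", "fft"], missing_worktodo))
```

With this ordering, `-list fft` and `-h` both return without ever calling the loader, so a missing worktodo.txt only matters when there is actual work to start.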
[QUOTE=kriesel;499554][CODE]C:\msys64\home\ken\gpuowl-compile\v5x>openowl -h
2018-11-04 10:20:41 gpuowl 5.0-df2bdf2
Command line options:

-user <name> : specify the user name.
-cpu <name> : specify the hardware name.
-time : display kernel profiling information.
-fft <size> : specify FFT size, such as: 5000K, 4M, +2, -1.
-block <value> : PRP GEC block size. Default 400. Smaller block is slower but detects errors sooner.
-carry long|short : force carry type. Short carry may be faster, but requires high bits/word.
-list fft : display a list of available FFT configurations.
-tf <bit-offset> : enable auto trial factoring before PRP. Pass 0 to bit-offset for default TF depth.
-device <N> : select a specific device:
 0 : Ellesmere-36x1266-@28:0.0 Radeon (TM) RX 480 Graphics
 1 : gfx804-8x1203-@3:0.0 Radeon 550 Series

C:\msys64\home\ken\gpuowl-compile\v5x>openowl -list fft
2018-11-04 10:20:56 gpuowl 5.0-df2bdf2
2018-11-04 10:20:56 -list fft
2018-11-04 10:20:56 Can't open 'worktodo.txt' (mode 'rb')
2018-11-04 10:20:56 Bye
[/CODE]A user might want to list the fft selection available, to create the worktodo list. Slight catch-22 here. I think it's reasonable to have -list fft act like -h, which runs whether there's a worktodo available or not, and terminates. -list fft is a form of help.[/QUOTE] I concur. |
gpuowl v5.0-df2bdf2 build for Win7 x64
1 Attachment(s)
A quick check shows it runs at least a few known Mersenne primes, m216091 and up, apparently correctly.
-list fft output: [CODE]C:\msys64\home\ken\gpuowl-compile\v5x>openowl -list fft
2018-11-04 10:20:56 gpuowl 5.0-df2bdf2
2018-11-04 10:20:56 -list fft
2018-11-04 10:20:56 Can't open 'worktodo.txt' (mode 'rb')
2018-11-04 10:20:56 Bye

C:\msys64\home\ken\gpuowl-compile\v5x>openowl -list fft
2018-11-04 10:28:00 gpuowl 5.0-df2bdf2
2018-11-04 10:28:00 -list fft
2018-11-04 10:28:00 FFT maxExp W H M
2018-11-04 10:28:00 0.1M 2.6M 256 256 1
2018-11-04 10:28:00 0.2M 5.2M 256 512 1
2018-11-04 10:28:00 0.2M 5.2M 512 256 1
2018-11-04 10:28:00 0.5M 10.2M 1024 256 1
2018-11-04 10:28:00 0.5M 10.2M 256 1024 1
2018-11-04 10:28:00 0.5M 10.2M 512 512 1
2018-11-04 10:28:00 0.6M 12.7M 256 256 5
2018-11-04 10:28:00 1.0M 20.0M 1024 512 1
2018-11-04 10:28:00 1.0M 20.0M 256 2048 1
2018-11-04 10:28:00 1.0M 20.0M 512 1024 1
2018-11-04 10:28:00 1.0M 20.0M 2048 256 1
2018-11-04 10:28:00 1.1M 22.5M 256 256 9
2018-11-04 10:28:00 1.2M 24.9M 256 512 5
2018-11-04 10:28:00 1.2M 24.9M 512 256 5
2018-11-04 10:28:00 2.0M 39.3M 1024 1024 1
2018-11-04 10:28:00 2.0M 39.3M 512 2048 1
2018-11-04 10:28:00 2.0M 39.3M 2048 512 1
2018-11-04 10:28:00 2.0M 39.3M 4096 256 1
2018-11-04 10:28:00 2.2M 44.1M 256 512 9
2018-11-04 10:28:00 2.2M 44.1M 512 256 9
2018-11-04 10:28:00 2.5M 48.9M 1024 256 5
2018-11-04 10:28:00 2.5M 48.9M 256 1024 5
2018-11-04 10:28:00 2.5M 48.9M 512 512 5
2018-11-04 10:28:00 4.0M 77.3M 1024 2048 1
2018-11-04 10:28:00 4.0M 77.3M 2048 1024 1
2018-11-04 10:28:00 4.0M 77.3M 4096 512 1
2018-11-04 10:28:00 4.5M 86.7M 1024 256 9
2018-11-04 10:28:00 4.5M 86.7M 256 1024 9
2018-11-04 10:28:00 4.5M 86.7M 512 512 9
2018-11-04 10:28:00 5.0M 96.1M 1024 512 5
2018-11-04 10:28:00 5.0M 96.1M 256 2048 5
2018-11-04 10:28:00 5.0M 96.1M 512 1024 5
2018-11-04 10:28:00 5.0M 96.1M 2048 256 5
2018-11-04 10:28:00 8.0M 151.8M 2048 2048 1
2018-11-04 10:28:00 8.0M 151.8M 4096 1024 1
2018-11-04 10:28:00 9.0M 170.3M 1024 512 9
2018-11-04 10:28:00 9.0M 170.3M 256 2048 9
2018-11-04 10:28:00 9.0M 170.3M 512 1024 9
2018-11-04 10:28:00 9.0M 170.3M 2048 256 9
2018-11-04 10:28:00 10.0M 188.7M 1024 1024 5
2018-11-04 10:28:00 10.0M 188.7M 512 2048 5
2018-11-04 10:28:00 10.0M 188.7M 2048 512 5
2018-11-04 10:28:00 10.0M 188.7M 4096 256 5
2018-11-04 10:28:00 16.0M 298.1M 4096 2048 1
2018-11-04 10:28:00 18.0M 334.3M 1024 1024 9
2018-11-04 10:28:00 18.0M 334.3M 512 2048 9
2018-11-04 10:28:00 18.0M 334.3M 2048 512 9
2018-11-04 10:28:00 18.0M 334.3M 4096 256 9
2018-11-04 10:28:00 20.0M 370.4M 1024 2048 5
2018-11-04 10:28:00 20.0M 370.4M 2048 1024 5
2018-11-04 10:28:00 20.0M 370.4M 4096 512 5
2018-11-04 10:28:00 36.0M 656.2M 1024 2048 9
2018-11-04 10:28:00 36.0M 656.2M 2048 1024 9
2018-11-04 10:28:00 36.0M 656.2M 4096 512 9
2018-11-04 10:28:00 40.0M 727.0M 2048 2048 5
2018-11-04 10:28:00 40.0M 727.0M 4096 1024 5
2018-11-04 10:28:00 72.0M 1287.5M 2048 2048 9
2018-11-04 10:28:00 72.0M 1287.5M 4096 1024 9
2018-11-04 10:28:00 80.0M 1426.4M 4096 2048 5
2018-11-04 10:28:00 144.0M 2525.2M 4096 2048 9
[/CODE]110503 fails, 132049 has errors, 216091 runs correctly to completion (not surprising due to low # of bits/word, and not a request for yet smaller fft lengths, just observations) [CODE]C:\msys64\home\ken\gpuowl-compile\v5x>openowl
2018-11-04 10:33:00 gpuowl 5.0-df2bdf2
2018-11-04 10:33:00 110503 FFT 128K: Width 64x4, Height 64x4; 0.84 bits/word
2018-11-04 10:33:00 using long carry kernels
2018-11-04 10:33:00 Ellesmere-36x1266-@28:0.0 Radeon (TM) RX 480 Graphics
2018-11-04 10:33:03 OpenCL compilation in 2391 ms, with "-DEXP=110503u -DWIDTH=256u -DSMALL_HEIGHT=256u -DMIDDLE=1u -I. -cl-fast-relaxed-math -cl-std=CL2.0 "
2018-11-04 10:33:03 110503.owl not found, starting from the beginning.
2018-11-04 10:33:03 powerSmooth(110503, 10000) has 14484 bits
Assertion failed!

Program: C:\msys64\home\ken\gpuowl-compile\v5x\openowl.exe
File: state.cpp, Line 24

Expression: 0 <= w && w < (1 << nBits)

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

C:\msys64\home\ken\gpuowl-compile\v5x>openowl
2018-11-04 10:33:18 gpuowl 5.0-df2bdf2
2018-11-04 10:33:18 132049 FFT 128K: Width 64x4, Height 64x4; 1.01 bits/word
2018-11-04 10:33:18 using long carry kernels
2018-11-04 10:33:19 Ellesmere-36x1266-@28:0.0 Radeon (TM) RX 480 Graphics
2018-11-04 10:33:21 OpenCL compilation in 2432 ms, with "-DEXP=132049u -DWIDTH=256u -DSMALL_HEIGHT=256u -DMIDDLE=1u -I. -cl-fast-relaxed-math -cl-std=CL2.0 "
2018-11-04 10:33:21 132049.owl not found, starting from the beginning.
2018-11-04 10:33:21 powerSmooth(132049, 10000) has 14484 bits
2018-11-04 10:33:23 132049 P-1 10000 69.04%; 0.16 ms/sq, 0 MULs; ETA 0d 00:00; d8645cee5574c284
2018-11-04 10:33:24 132049.owl loaded: k 0, B1 10000, block 400, res64 6379d2d731e5e48e, stage 1, baseBits 0
2018-11-04 10:33:24 132049 B1=10000 B2=70000 (effective B2=70000) selected 4142 P-1 points in 0.01s
2018-11-04 10:33:24 132049 EE 800 0.60%; 0.16 ms/sq, 1 MULs; ETA 0d 00:00; da4be711d7cf309d (check 0.08s)
2018-11-04 10:33:24 132049.owl loaded: k 0, B1 10000, block 400, res64 6379d2d731e5e48e, stage 1, baseBits 0
2018-11-04 10:33:24 132049 EE 800 0.60%; 0.28 ms/sq, 1 MULs; ETA 0d 00:01; da4be711d7cf309d (check 0.08s)
2018-11-04 10:33:24 132049.owl loaded: k 0, B1 10000, block 400, res64 6379d2d731e5e48e, stage 1, baseBits 0
2018-11-04 10:33:25 132049 EE 800 0.60%; 0.28 ms/sq, 1 MULs; ETA 0d 00:01; da4be711d7cf309d (check 0.08s)
2018-11-04 10:33:25 3 sequential errors, will stop.
2018-11-04 10:33:25 Exiting because "too many errors"
2018-11-04 10:33:25 Bye

C:\msys64\home\ken\gpuowl-compile\v5x>openowl
2018-11-04 10:33:42 gpuowl 5.0-df2bdf2
2018-11-04 10:33:42 216091 FFT 128K: Width 64x4, Height 64x4; 1.65 bits/word
2018-11-04 10:33:42 using long carry kernels
2018-11-04 10:33:43 Ellesmere-36x1266-@28:0.0 Radeon (TM) RX 480 Graphics
2018-11-04 10:33:46 OpenCL compilation in 2434 ms, with "-DEXP=216091u -DWIDTH=256u -DSMALL_HEIGHT=256u -DMIDDLE=1u -I. -cl-fast-relaxed-math -cl-std=CL2.0 "
2018-11-04 10:33:46 216091.owl not found, starting from the beginning.
2018-11-04 10:33:46 powerSmooth(216091, 10000) has 14484 bits
2018-11-04 10:33:48 216091 P-1 10000 69.04%; 0.16 ms/sq, 0 MULs; ETA 0d 00:00; 9e7518aa03950b26
2018-11-04 10:33:48 216091.owl loaded: k 0, B1 10000, block 400, res64 d8a71ba2415f2773, stage 1, baseBits 0
2018-11-04 10:33:48 216091 B1=10000 B2=130000 (effective B2=130000) selected 7611 P-1 points in 0.02s
2018-11-04 10:33:49 216091 OK 800 0.37%; 0.17 ms/sq, 1 MULs; ETA 0d 00:01; 5646ce8634d76602 (check 0.08s)
2018-11-04 10:33:49 216091 GCD no factor (0.07s)
2018-11-04 10:33:50 216091 10000 4.62%; 0.16 ms/sq, 287 MULs; ETA 0d 00:01; 6d0028f1a3744d15
2018-11-04 10:33:52 216091 20000 9.24%; 0.16 ms/sq, 1067 MULs; ETA 0d 00:01; e4c865e7e023a233
2018-11-04 10:33:54 216091 30000 13.86%; 0.16 ms/sq, 1053 MULs; ETA 0d 00:01; 52c18ca42b2e5a40
2018-11-04 10:33:55 216091 40000 18.48%; 0.16 ms/sq, 990 MULs; ETA 0d 00:01; 003716dc307b3768
2018-11-04 10:33:57 216091 50000 23.11%; 0.16 ms/sq, 882 MULs; ETA 0d 00:00; f19c985e00f9ab66
2018-11-04 10:33:59 216091 60000 27.73%; 0.16 ms/sq, 794 MULs; ETA 0d 00:00; 6679d5a415aece9e
2018-11-04 10:34:01 216091 70000 32.35%; 0.16 ms/sq, 566 MULs; ETA 0d 00:00; da83220b76e8b55b
2018-11-04 10:34:02 216091 80000 36.97%; 0.16 ms/sq, 339 MULs; ETA 0d 00:00; 3700d4ccd97a326a
2018-11-04 10:34:04 216091 90000 41.59%; 0.16 ms/sq, 346 MULs; ETA 0d 00:00; c0472c1f976aa2c1
2018-11-04 10:34:05 216091 100000 46.21%; 0.16 ms/sq, 329 MULs; ETA 0d 00:00; d3f402b7fb9adb65
2018-11-04 10:34:07 216091 110000 50.83%; 0.16 ms/sq, 304 MULs; ETA 0d 00:00; a625b1471a8e6481
2018-11-04 10:34:09 216091 120000 55.45%; 0.16 ms/sq, 328 MULs; ETA 0d 00:00; fe8081748d1a088e
2018-11-04 10:34:10 216091 130000 60.07%; 0.16 ms/sq, 325 MULs; ETA 0d 00:00; b38dbc5b3d73c584
2018-11-04 10:34:12 216091 140000 64.70%; 0.16 ms/sq, 0 MULs; ETA 0d 00:00; a71c266b43cc9171
2018-11-04 10:34:14 216091 150000 69.32%; 0.16 ms/sq, 0 MULs; ETA 0d 00:00; a6a2e15e86701788
2018-11-04 10:34:15 216091 OK 160000 73.94%; 0.16 ms/sq, 0 MULs; ETA 0d 00:00; 6cc0b6cbc453946a (check 0.09s)
2018-11-04 10:34:17 216091 170000 78.56%; 0.16 ms/sq, 0 MULs; ETA 0d 00:00; 16838b06bd004e23
2018-11-04 10:34:18 216091 180000 83.18%; 0.16 ms/sq, 0 MULs; ETA 0d 00:00; 38a44921f392a2fc
2018-11-04 10:34:20 216091 190000 87.80%; 0.16 ms/sq, 0 MULs; ETA 0d 00:00; 63580cfe1f80b303
2018-11-04 10:34:22 216091 200000 92.42%; 0.16 ms/sq, 0 MULs; ETA 0d 00:00; 7c3f2446e5e6fd09
2018-11-04 10:34:23 216091 210000 97.04%; 0.16 ms/sq, 0 MULs; ETA 0d 00:00; b6e9bb0a7c8ede6b
2018-11-04 10:34:24 PP 216090 / 216091, d8a71ba2415f2773 (base d8a71ba2415f2773)
2018-11-04 10:34:24 216091 OK 216400 100.00%; 0.17 ms/sq, 0 MULs; ETA 0d 00:00; e898188ce32335d4 (check 0.09s)
2018-11-04 10:34:24 {"exponent":"216091", "worktype":"PRP,P-1", "status":"P", "program":{"name":"gpuowl", "version":"5.0-df2bdf2"}, "timestamp":"2018-11-04 16:34:24 UTC", "aid":"0", "fft-length":131072, "res64":"d8a71ba2415f2773", "b2":"130000", "base":{"b1":"10000", "bias":{"2":19}, "res64":"d8a71ba2415f2773"}}
[/CODE]B2 bounds appear to be correctly reported [CODE]{"exponent":"216091", "worktype":"PRP,P-1", "status":"P", "program":{"name":"gpuowl", "version":"5.0-df2bdf2"}, "timestamp":"2018-11-04 16:34:24 UTC", "aid":"0", "fft-length":131072, "res64":"d8a71ba2415f2773", "b2":"130000", "base":{"b1":"10000", "bias":{"2":19}, "res64":"d8a71ba2415f2773"}}
{"exponent":"756839", "worktype":"PRP,P-1", "status":"P", "program":{"name":"gpuowl", "version":"5.0-df2bdf2"}, "timestamp":"2018-11-04 16:36:55 UTC", "aid":"0", "fft-length":131072, "res64":"0e12589efe2be6c5", "b2":"500000", "base":{"b1":"20000", "bias":{"2":19}, "res64":"0e12589efe2be6c5"}}
{"exponent":"859433", "worktype":"PRP,P-1", "status":"P", "program":{"name":"gpuowl", "version":"5.0-df2bdf2"}, "timestamp":"2018-11-04 16:39:23 UTC", "aid":"0", "fft-length":131072, "res64":"ac86e7a51cecadb0", "b2":"580000", "base":{"b1":"20000", "bias":{"2":19}, "res64":"ac86e7a51cecadb0"}}
[/CODE]m89000167 timing 4.521 ms/sq with no P-1 on RX480, Win7 x64, Adrenalin 18.10.2 driver. |
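The bits/word figures in these logs are consistent with an FFT length of 2 x WIDTH x SMALL_HEIGHT x MIDDLE words, using the -D values from the OpenCL compile line. That relation is inferred from the logs in this thread (for example WIDTH=256, SMALL_HEIGHT=256, MIDDLE=1 alongside "fft-length":131072), not taken from the gpuowl source, and the function names below are illustrative. A minimal sketch:

```python
def fft_words(width, height, middle=1):
    """FFT length in words, as inferred from the logs: 2 * W * H * M."""
    return 2 * width * height * middle


def bits_per_word(exponent, width, height, middle=1):
    """Average number of exponent bits carried by each FFT word."""
    return exponent / fft_words(width, height, middle)


# Exponents and -DWIDTH / -DSMALL_HEIGHT / -DMIDDLE values from the logs.
cases = [
    (110503, 256, 256, 1),
    (132049, 256, 256, 1),
    (216091, 256, 256, 1),
    (89000167, 1024, 512, 5),
]
for exponent, w, h, m in cases:
    kib = fft_words(w, h, m) // 1024
    print(f"{exponent}: FFT {kib}K, {bits_per_word(exponent, w, h, m):.2f} bits/word")
```

At the 128K length the three small exponents sit between roughly 0.8 and 1.7 bits/word, far below the 16.98 bits/word of the m89000167 run at 5120K.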
[QUOTE=kriesel;499222]
Some radix-3 transforms, and maybe 7 if it helps speed. 6M and 12M in particular. It's a particularly long jump between 20M and 36M, so adding 24M or 32M or both would be good. Similarly between 40M and 72M, 48M or 64M or both. [/QUOTE] I just added an FFT-3 "middle" step. |
V5.0-9c13870 build for Win 7 x64 OpenCL on AMD
1 Attachment(s)
[QUOTE=preda;499580]I just added an FFT-3 "middle" step.[/QUOTE]
[CODE]$ make openowl-win
g++ -std=c++17 -O2 -DREV=\"9c13870\" -Wall Worktodo.cpp Result.cpp common.cpp gpuowl.cpp Gpu.cpp clwrap.cpp Task.cpp checkpoint.cpp timeutil.cpp Args.cpp GCD.cpp Primes.cpp Stats.cpp state.cpp Signal.cpp -o openowl -lOpenCL -lgmp -pthread -L/opt/rocm/opencl/lib/x86_64 -L/opt/amdgpu-pro/lib/x86_64-linux-gnu -L/c/Windows/System32 -L. -static
Gpu.cpp: In member function 'PRPState Gpu::loadPRP(u32, u32, u32)':
Gpu.cpp:557:9: warning: unknown conversion type character 'l' in format [-Wformat=]
     log("%u EE loaded: %d, B1 %u, blockSize %d, %016llx (expected %016llx)\n",
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Gpu.cpp:557:9: warning: unknown conversion type character 'l' in format [-Wformat=]
Gpu.cpp:557:9: warning: too many arguments for format [-Wformat-extra-args]
Gpu.cpp: In member function 'PRPResult Gpu::isPrimePRP(u32, const Args&, u32, u32)':
Gpu.cpp:690:11: warning: unknown conversion type character 'l' in format [-Wformat=]
       log("%s %8d / %d, %016llx (base %016llx)\n", isPrime ? "PP" : "CC", kEnd, E, finalRes64, residue(base));
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Gpu.cpp:690:11: warning: unknown conversion type character 'l' in format [-Wformat=]
Gpu.cpp:690:11: warning: too many arguments for format [-Wformat-extra-args]
checkpoint.cpp: In member function 'void PRPState::loadInt(u32, u32, u32)':
checkpoint.cpp:167:7: warning: unknown conversion type character 'l' in format [-Wformat=]
   log("%s loaded: k %u, B1 %u, block %u, res64 %016llx, stage %u, baseBits %u\n",
       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
checkpoint.cpp:167:7: warning: format '%u' expects argument of type 'unsigned int', but argument 6 has type 'u64' {aka 'long long unsigned int'} [-Wformat=]
checkpoint.cpp:167:7: warning: too many arguments for format [-Wformat-extra-args]
[/CODE]Lots of new fft lengths due to the x3: 0.4, 0.8, 1.5, 3, 6, 12, 24, 48M. It's getting to be a large list.
Please change the -list fft to not require a worktodo.txt.[CODE]C:\msys64\home\ken\gpuowl-compile\v5.0-9c13870>openowl -h
2018-11-04 16:07:53 gpuowl 5.0-9c13870
Command line options:

-user <name> : specify the user name.
-cpu <name> : specify the hardware name.
-time : display kernel profiling information.
-fft <size> : specify FFT size, such as: 5000K, 4M, +2, -1.
-block <value> : PRP GEC block size. Default 400. Smaller block is slower but detects errors sooner.
-carry long|short : force carry type. Short carry may be faster, but requires high bits/word.
-list fft : display a list of available FFT configurations.
-tf <bit-offset> : enable auto trial factoring before PRP. Pass 0 to bit-offset for default TF depth.
-device <N> : select a specific device:
 0 : Ellesmere-36x1266-@28:0.0 Radeon (TM) RX 480 Graphics
 1 : gfx804-8x1203-@3:0.0 Radeon 550 Series

C:\msys64\home\ken\gpuowl-compile\v5.0-9c13870>openowl -list fft
2018-11-04 16:08:03 gpuowl 5.0-9c13870
2018-11-04 16:08:03 -list fft
2018-11-04 16:08:03 Can't open 'worktodo.txt' (mode 'rb')
2018-11-04 16:08:03 Bye

C:\msys64\home\ken\gpuowl-compile\v5.0-9c13870>copy ..\v3.8\worktodo.txt .
        1 file(s) copied.

C:\msys64\home\ken\gpuowl-compile\v5.0-9c13870>openowl -list fft
2018-11-04 16:08:34 gpuowl 5.0-9c13870
2018-11-04 16:08:34 -list fft
2018-11-04 16:08:34 FFT maxExp W H M
2018-11-04 16:08:34 0.1M 2.6M 256 256 1
2018-11-04 16:08:34 0.2M 5.2M 256 512 1
2018-11-04 16:08:34 0.2M 5.2M 512 256 1
2018-11-04 16:08:34 0.4M 7.7M 256 256 3
2018-11-04 16:08:34 0.5M 10.2M 1024 256 1
2018-11-04 16:08:34 0.5M 10.2M 256 1024 1
2018-11-04 16:08:34 0.5M 10.2M 512 512 1
2018-11-04 16:08:34 0.6M 12.7M 256 256 5
2018-11-04 16:08:34 0.8M 15.1M 256 512 3
2018-11-04 16:08:34 0.8M 15.1M 512 256 3
2018-11-04 16:08:34 1.0M 20.0M 1024 512 1
2018-11-04 16:08:34 1.0M 20.0M 256 2048 1
2018-11-04 16:08:34 1.0M 20.0M 512 1024 1
2018-11-04 16:08:34 1.0M 20.0M 2048 256 1
2018-11-04 16:08:34 1.1M 22.5M 256 256 9
2018-11-04 16:08:34 1.2M 24.9M 256 512 5
2018-11-04 16:08:34 1.2M 24.9M 512 256 5
2018-11-04 16:08:34 1.5M 29.7M 1024 256 3
2018-11-04 16:08:34 1.5M 29.7M 256 1024 3
2018-11-04 16:08:35 1.5M 29.7M 512 512 3
2018-11-04 16:08:35 2.0M 39.3M 1024 1024 1
2018-11-04 16:08:35 2.0M 39.3M 512 2048 1
2018-11-04 16:08:35 2.0M 39.3M 2048 512 1
2018-11-04 16:08:35 2.0M 39.3M 4096 256 1
2018-11-04 16:08:35 2.2M 44.1M 256 512 9
2018-11-04 16:08:35 2.2M 44.1M 512 256 9
2018-11-04 16:08:35 2.5M 48.9M 1024 256 5
2018-11-04 16:08:35 2.5M 48.9M 256 1024 5
2018-11-04 16:08:35 2.5M 48.9M 512 512 5
2018-11-04 16:08:35 3.0M 58.4M 1024 512 3
2018-11-04 16:08:35 3.0M 58.4M 256 2048 3
2018-11-04 16:08:35 3.0M 58.4M 512 1024 3
2018-11-04 16:08:35 3.0M 58.4M 2048 256 3
2018-11-04 16:08:35 4.0M 77.3M 1024 2048 1
2018-11-04 16:08:35 4.0M 77.3M 2048 1024 1
2018-11-04 16:08:35 4.0M 77.3M 4096 512 1
2018-11-04 16:08:35 4.5M 86.7M 1024 256 9
2018-11-04 16:08:35 4.5M 86.7M 256 1024 9
2018-11-04 16:08:35 4.5M 86.7M 512 512 9
2018-11-04 16:08:35 5.0M 96.1M 1024 512 5
2018-11-04 16:08:35 5.0M 96.1M 256 2048 5
2018-11-04 16:08:35 5.0M 96.1M 512 1024 5
2018-11-04 16:08:35 5.0M 96.1M 2048 256 5
2018-11-04 16:08:35 6.0M 114.7M 1024 1024 3
2018-11-04 16:08:35 6.0M 114.7M 512 2048 3
2018-11-04 16:08:35 6.0M 114.7M 2048 512 3
2018-11-04 16:08:35 6.0M 114.7M 4096 256 3
2018-11-04 16:08:35 8.0M 151.8M 2048 2048 1
2018-11-04 16:08:35 8.0M 151.8M 4096 1024 1
2018-11-04 16:08:35 9.0M 170.3M 1024 512 9
2018-11-04 16:08:35 9.0M 170.3M 256 2048 9
2018-11-04 16:08:35 9.0M 170.3M 512 1024 9
2018-11-04 16:08:35 9.0M 170.3M 2048 256 9
2018-11-04 16:08:35 10.0M 188.7M 1024 1024 5
2018-11-04 16:08:35 10.0M 188.7M 512 2048 5
2018-11-04 16:08:35 10.0M 188.7M 2048 512 5
2018-11-04 16:08:35 10.0M 188.7M 4096 256 5
2018-11-04 16:08:35 12.0M 225.3M 1024 2048 3
2018-11-04 16:08:35 12.0M 225.3M 2048 1024 3
2018-11-04 16:08:35 12.0M 225.3M 4096 512 3
2018-11-04 16:08:35 16.0M 298.1M 4096 2048 1
2018-11-04 16:08:35 18.0M 334.3M 1024 1024 9
2018-11-04 16:08:35 18.0M 334.3M 512 2048 9
2018-11-04 16:08:35 18.0M 334.3M 2048 512 9
2018-11-04 16:08:35 18.0M 334.3M 4096 256 9
2018-11-04 16:08:35 20.0M 370.4M 1024 2048 5
2018-11-04 16:08:35 20.0M 370.4M 2048 1024 5
2018-11-04 16:08:35 20.0M 370.4M 4096 512 5
2018-11-04 16:08:35 24.0M 442.3M 2048 2048 3
2018-11-04 16:08:35 24.0M 442.3M 4096 1024 3
2018-11-04 16:08:35 36.0M 656.2M 1024 2048 9
2018-11-04 16:08:35 36.0M 656.2M 2048 1024 9
2018-11-04 16:08:35 36.0M 656.2M 4096 512 9
2018-11-04 16:08:35 40.0M 727.0M 2048 2048 5
2018-11-04 16:08:35 40.0M 727.0M 4096 1024 5
2018-11-04 16:08:35 48.0M 868.1M 4096 2048 3
2018-11-04 16:08:35 72.0M 1287.5M 2048 2048 9
2018-11-04 16:08:35 72.0M 1287.5M 4096 1024 9
2018-11-04 16:08:35 80.0M 1426.4M 4096 2048 5
2018-11-04 16:08:35 144.0M 2525.2M 4096 2048 9
[/CODE]On RX480, Adrenalin 18.10.2 driver:[CODE]...
2018-11-04 17:05:29 6972593 5960000 85.47%; 0.38 ms/sq, 0 MULs; ETA 0d 00:06; 9192684b7c1359cd
2018-11-04 17:05:33 6972593 5970000 85.62%; 0.37 ms/sq, 0 MULs; ETA 0d 00:06; c2f9539990824bd3
2018-11-04 17:05:36 6972593 5980000 85.76%; 0.37 ms/sq, 0 MULs; ETA 0d 00:06; 0e55f43c273e071f
2018-11-04 17:05:40 6972593 5990000 85.91%; 0.37 ms/sq, 0 MULs; ETA 0d 00:06; 1b238cbce00977ec
2018-11-04 17:05:44 6972593 6000000 86.05%; 0.38 ms/sq, 0 MULs; ETA 0d 00:06; 226f17e463c15782
2018-11-04 17:05:48 6972593 6010000 86.19%; 0.37 ms/sq, 0 MULs; ETA 0d 00:06; 37cb92ee936c55d2
2018-11-04 17:05:51 6972593 6020000 86.34%; 0.37 ms/sq, 0 MULs; ETA 0d 00:06; c1966294670fbb2f
2018-11-04 17:05:55 6972593 6030000 86.48%; 0.37 ms/sq, 0 MULs; ETA 0d 00:06; f03d90475b5f5672
2018-11-04 17:05:59 6972593 6040000 86.62%; 0.37 ms/sq, 0 MULs; ETA 0d 00:06; 3130d3e8833a08d3
2018-11-04 17:06:03 6972593 6050000 86.77%; 0.38 ms/sq, 0 MULs; ETA 0d 00:06; fd70900ff37a05c4
2018-11-04 17:06:06 6972593 6060000 86.91%; 0.37 ms/sq, 0 MULs; ETA 0d 00:06; a0ececf155185dba
2018-11-04 17:06:10 6972593 6070000 87.05%; 0.37 ms/sq, 0 MULs; ETA 0d 00:06; e869127794b701de
2018-11-04 17:06:14 6972593 OK 6080000 87.20%; 0.37 ms/sq, 0 MULs; ETA 0d 00:06; d11c156803bb5922 (check 0.19s)
...[/CODE][CODE]{"exponent":"6972593", "worktype":"PRP,P-1", "status":"P", "program":{"name":"gpuowl", "version":"5.0-9c13870"}, "timestamp":"2018-11-04 23:11:49 UTC", "aid":"0", "fft-length":393216, "res64":"bc16906ca9e08ff7", "b2":"1440000", "base":{"b1":"80000", "bias":{"2":19}, "res64":"bc16906ca9e08ff7"}}
[/CODE] |
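The new lengths attributed to the x3 middle step can be reproduced by enumerating the W and H values that appear in the listing and multiplying by a middle factor of 3. This sketch assumes the FFT length is 2 x W x H x M words, which is inferred from the logs in this thread (for example W=256, H=256, M=3 alongside "fft-length":393216) rather than from the gpuowl source; the constant and function names are illustrative.

```python
# W and H values as they appear in the -list fft output.
WIDTHS = (256, 512, 1024, 2048, 4096)
HEIGHTS = (256, 512, 1024, 2048)


def lengths_in_m(middle):
    """Distinct FFT lengths, in units of M (2**20 words), for one middle factor."""
    return sorted({2 * w * h * middle / 2**20 for w in WIDTHS for h in HEIGHTS})


print(lengths_in_m(3))
```

Rounded to one decimal, these are the 0.4, 0.8, 1.5, 3, 6, 12, 24 and 48M entries that the listing gains with middle 3.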