mersenneforum.org

gpuowL: an OpenCL program for Mersenne primality testing (https://www.mersenneforum.org/showthread.php?t=22204)

SELROC 2018-11-04 12:15

[QUOTE=preda;499534]0 MULs means that no GCD multiplications were done. This is normal with B1=0. When B1 != 0, sometimes a number of MULs are done, and sometimes 0 are done, depending on iteration.

When MULs==0, ms/sq is the same as the old ms/it.
But when MULs are done, the ms/sq tries to measure only the "squaring" time (the normal PRP iteration), excluding the time taken by the MULs. Thus I changed the name to ms/sq to show that it's not the same as the old ms/it.

Why I thought that indicating speed this way is good: this number, ms/sq, is relatively stable and does not depend on the number of MULs (or the time they take), which varies between iteration blocks. Thus this number can be used to compare speed.

The other option would be: take total time (squares + muls) and divide it by the number of iterations in the block. This number would be larger where there are more MULs and smaller where there are fewer MULs, and thus a bit more difficult to read GPU perf from, IMO.[/QUOTE]


Thank you, now it is clear.
This is a good attempt to get a stable measure. In the latest test the ms/sq time is 0.18 everywhere except for the last iteration, which is 0.19 ms/sq.


The introduction of smaller FFT sizes is also good. Now the program can be validated on new hardware with a quick test against the smallest prime.
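
To make the distinction between the two measures concrete, here is a minimal sketch. The function name and all the numbers are hypothetical, and it simply assumes the time spent in MULs is known and can be subtracted; how gpuowl actually separates the two is not shown in this thread.

```python
def block_rates(squarings, muls, square_ms, mul_ms):
    """Return (ms/sq, total time per iteration) for one iteration block.

    All inputs are hypothetical; this is not gpuowl's own accounting code.
    """
    total_ms = squarings * square_ms + muls * mul_ms
    # ms/sq: only the squaring time, divided by the number of squarings.
    ms_per_sq = (total_ms - muls * mul_ms) / squarings
    # The alternative: total block time divided by the number of iterations.
    ms_per_it = total_ms / squarings
    return ms_per_sq, ms_per_it


# A block with no MULs and a block with 1000 MULs (assumed to cost as much as a squaring).
print(block_rates(10_000, 0, 0.16, 0.16))
print(block_rates(10_000, 1_000, 0.16, 0.16))
```

With no MULs the two numbers coincide. With 1000 MULs in the block, ms/sq stays at 0.16 while the total-time figure rises to about 0.176, which is the variation the ms/sq label is meant to hide.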

kriesel 2018-11-04 15:14

RX550 timings versus gpuowl version for m89m
 
RX550, AMD Adrenalin 18.10.2 driver for Win7 x64

m89000167 5000K for v2.0, 5120K for others (iterations 10000-20000)
Ver ms/it (no P-1 or TF)
2.0 17.38
3.3 16.90
3.5 16.42 <--min
3.6 16.44
3.8 16.43
3.9 17.22
4.3 17.35
4.6 17.25
4.7 NA
5.0 17.25

Note that the methodology in this report is more exactly comparable than in the RX480 timings reported earlier, and that the minimum falls at a different version. The difference among v3.5-3.8 is ±1 in the least significant digit, so it may be insignificant.
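
As a rough way to read the table, the sketch below expresses each version's ms/it relative to the fastest one. The timings are copied from the table above; the helper name is just for illustration.

```python
# ms/it on the RX550 for m89000167, copied from the table (v4.7 had no result).
timings = {
    "2.0": 17.38, "3.3": 16.90, "3.5": 16.42, "3.6": 16.44, "3.8": 16.43,
    "3.9": 17.22, "4.3": 17.35, "4.6": 17.25, "5.0": 17.25,
}

best = min(timings, key=timings.get)


def slowdown_pct(version):
    """Percent slower than the fastest version in the table."""
    return (timings[version] / timings[best] - 1) * 100


for version, ms in timings.items():
    print(f"v{version:<4} {ms:6.2f} ms/it  {slowdown_pct(version):+5.2f}%")
```

On these figures v3.6 and v3.8 come out within about 0.1% of v3.5, while v3.9 through v5.0 are roughly 5 to 6% slower.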

SELROC 2018-11-04 15:35

[QUOTE=kriesel;499544]RX550, AMD Adrenalin 18.10.2 driver for Win7 x64

m89000167 5000K for v2.0, 5120K for others (iterations 10000-20000)
Ver ms/it (no P-1 or TF)
2.0 17.38
3.3 16.90
3.5 16.42 <--min
3.6 16.44
3.8 16.43
3.9 17.22
4.3 17.35
4.6 17.25
4.7 NA
5.0 17.25

Note that the methodology in this report is more exactly comparable than in the RX480 timings reported earlier, and that the minimum falls at a different version. The difference among v3.5-3.8 is ±1 in the least significant digit, so it may be insignificant.[/QUOTE]


The numbers are not exactly comparable at this point:


[CODE]2018-11-04 16:24:42 gpuowl 5.0--mod
2018-11-04 16:24:42 RX580 -user selroc -cpu RX580 -device 0
2018-11-04 16:24:42 RX580 89000167 FFT 5120K: Width 256x4, Height 64x8, Middle 5; 16.98 bits/word
2018-11-04 16:24:42 RX580 using short carry kernels
2018-11-04 16:24:43 RX580 gfx803-36x1360-@4a:0.0 Ellesmere [Radeon RX 470/480]
2018-11-04 16:24:44 RX580 OpenCL compilation in 1076 ms, with "-DEXP=89000167u -DWIDTH=1024u -DSMALL_HEIGHT=512u -DMIDDLE=5u -I. -cl-fast-relaxed-math -cl-std=CL2.0 "
2018-11-04 16:24:44 RX580 89000167.owl not found, starting from the beginning.
2018-11-04 16:24:52 RX580 89000167 OK 800 0.00%; 4.44 ms/sq, 0 MULs; ETA 4d 13:50; 2744231e7051f3fe (check 1.95s)
2018-11-04 16:25:33 RX580 89000167 10000 0.01%; 4.46 ms/sq, 0 MULs; ETA 4d 14:20; 2a55d51cdf0d91cb
2018-11-04 16:26:18 RX580 89000167 20000 0.02%; 4.47 ms/sq, 0 MULs; ETA 4d 14:35; 8dcb0029e791db2a
2018-11-04 16:27:03 RX580 89000167 30000 0.03%; 4.48 ms/sq, 0 MULs; ETA 4d 14:37; 2fbd246d68f86f29
2018-11-04 16:27:47 RX580 89000167 40000 0.04%; 4.48 ms/sq, 0 MULs; ETA 4d 14:47; d85f84a6744d7090
2018-11-04 16:28:32 RX580 89000167 50000 0.06%; 4.49 ms/sq, 0 MULs; ETA 4d 14:50; afa46f7cdc5ffb7d
2018-11-04 16:29:17 RX580 89000167 60000 0.07%; 4.49 ms/sq, 0 MULs; ETA 4d 14:49; 98906e9529e4667f
2018-11-04 16:30:02 RX580 89000167 70000 0.08%; 4.49 ms/sq, 0 MULs; ETA 4d 14:49; 90b5e67934fcdcff
2018-11-04 16:30:07 RX580 Stopping, please wait..
2018-11-04 16:30:09 RX580 89000167 OK 71200 0.08%; 4.49 ms/sq, 0 MULs; ETA 4d 14:51; 14f11cfb55a43415 (check 1.96s)
2018-11-04 16:30:09 RX580 Exiting because "stop requested"
2018-11-04 16:30:09 RX580 Bye[/CODE]
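
For anyone collecting these figures from several runs, a small parser along the following lines may help. It is only a sketch: the regular expression is fitted to the progress lines shown in this thread and is an assumption about the format, not a documented gpuowl interface.

```python
import re

# Three progress lines copied from the log above.
LOG = """\
2018-11-04 16:24:52 RX580 89000167 OK 800 0.00%; 4.44 ms/sq, 0 MULs; ETA 4d 13:50; 2744231e7051f3fe (check 1.95s)
2018-11-04 16:25:33 RX580 89000167 10000 0.01%; 4.46 ms/sq, 0 MULs; ETA 4d 14:20; 2a55d51cdf0d91cb
2018-11-04 16:30:02 RX580 89000167 70000 0.08%; 4.49 ms/sq, 0 MULs; ETA 4d 14:49; 90b5e67934fcdcff
"""

# iteration count, percent done, ms/sq, number of MULs
PROGRESS = re.compile(r"\b(\d+) \d+\.\d+%; (\d+\.\d+) ms/sq, (\d+) MULs")


def parse_progress(log):
    """Return a list of (iteration, ms_per_sq, muls) tuples found in the log text."""
    return [(int(m[1]), float(m[2]), int(m[3])) for m in PROGRESS.finditer(log)]


for iteration, ms_per_sq, muls in parse_progress(LOG):
    print(iteration, ms_per_sq, muls)
```

As a sanity check on the log itself, 10000 iterations at 4.49 ms/sq is about 44.9 s, which agrees with the roughly 45 s spacing between consecutive progress lines.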

kriesel 2018-11-04 15:58

[QUOTE=SELROC;499550]The numbers are not exactly comparable at this point:
[/QUOTE]
I claim the numbers are very comparable for the same gpu across gpuowl versions, despite the ms/it vs. ms/sq difference in labeling, as long as all versions were run with no P-1 activity, as my recent benchmark results for RX550 and RX480 posted for m89000167 were. The speed difference between RX550 and RX480 or RX580 is expected to be a considerable ratio; the RX550 is a low-wattage, slow card. Running the same iteration span for each gpuowl version made the RX550 timings more comparable than my earlier m89m benchmarking for the RX480, which used successive iteration ranges (not the same iteration span for different gpuowl versions).

SELROC 2018-11-04 16:08

[QUOTE=kriesel;499551]I claim the numbers are very comparable for the same gpu across gpuowl versions, despite the ms/it vs. ms/sq difference in labeling, as long as all versions were run with no P-1 activity, as my recent benchmark results for RX550 and RX480 posted for m89000167 were. The speed difference between RX550 and RX480 or RX580 is expected to be a considerable ratio; the RX550 is a low-wattage, slow card. Running the same iteration span for each gpuowl version made the RX550 timings more comparable than my earlier m89m benchmarking for the RX480, which used successive iteration ranges (not the same iteration span for different gpuowl versions).[/QUOTE]


Our numbers come from different gpus, different operating systems, and different drivers, so a difference must be accounted for.

kriesel 2018-11-04 16:27

-list fft fails if worktodo does not exist
 
[CODE]C:\msys64\home\ken\gpuowl-compile\v5x>openowl -h
2018-11-04 10:20:41 gpuowl 5.0-df2bdf2

Command line options:

-user <name> : specify the user name.
-cpu <name> : specify the hardware name.
-time : display kernel profiling information.
-fft <size> : specify FFT size, such as: 5000K, 4M, +2, -1.
-block <value> : PRP GEC block size. Default 400. Smaller block is slower but detects errors sooner.
-carry long|short : force carry type. Short carry may be faster, but requires high bits/word.
-list fft : display a list of available FFT configurations.
-tf <bit-offset> : enable auto trial factoring before PRP. Pass 0 to bit-offset for default TF depth.
-device <N> : select a specific device:
0 : Ellesmere-36x1266-@28:0.0 Radeon (TM) RX 480 Graphics
1 : gfx804-8x1203-@3:0.0 Radeon 550 Series

C:\msys64\home\ken\gpuowl-compile\v5x>openowl -list fft
2018-11-04 10:20:56 gpuowl 5.0-df2bdf2
2018-11-04 10:20:56 -list fft
2018-11-04 10:20:56 Can't open 'worktodo.txt' (mode 'rb')
2018-11-04 10:20:56 Bye
[/CODE]A user might want to list the fft selection available, to create the worktodo list. Slight catch-22 here. I think it's reasonable to have -list fft act like -h, which runs whether there's a worktodo available or not, and terminates. -list fft is a form of help.
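
The requested behaviour amounts to handling informational options before the worktodo file is opened. The sketch below illustrates that ordering only; `dispatch` and its return strings are hypothetical, and gpuowl itself is written in C++, so this is not its actual argument handling.

```python
import os


def dispatch(argv, worktodo_path="worktodo.txt"):
    """Decide what to do for an argument list (hypothetical helper, not gpuowl code)."""
    # Informational options are checked first and return before any file access.
    if "-h" in argv:
        return "show help"
    for flag, value in zip(argv, argv[1:]):
        if flag == "-list" and value == "fft":
            return "list fft"
    # Only real work needs the worktodo file.
    if not os.path.exists(worktodo_path):
        return f"Can't open '{worktodo_path}'"
    return "run worktodo"


# With no worktodo present, listing FFT sizes still succeeds under this ordering.
print(dispatch(["-list", "fft"], worktodo_path="no-such-dir/worktodo.txt"))
```

Under this ordering the missing-file error can only appear when there is actual work to start.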

preda 2018-11-04 17:13

[QUOTE=kriesel;499554]A user might want to list the fft selection available, to create the worktodo list. Slight catch-22 here. I think it's reasonable to have -list fft act like -h, which runs whether there's a worktodo available or not, and terminates. -list fft is a form of help.[/QUOTE]

Yes it makes sense. I'll look into implementing that.

SELROC 2018-11-04 17:16

[QUOTE=kriesel;499554][CODE]C:\msys64\home\ken\gpuowl-compile\v5x>openowl -h
2018-11-04 10:20:41 gpuowl 5.0-df2bdf2

Command line options:

-user <name> : specify the user name.
-cpu <name> : specify the hardware name.
-time : display kernel profiling information.
-fft <size> : specify FFT size, such as: 5000K, 4M, +2, -1.
-block <value> : PRP GEC block size. Default 400. Smaller block is slower but detects errors sooner.
-carry long|short : force carry type. Short carry may be faster, but requires high bits/word.
-list fft : display a list of available FFT configurations.
-tf <bit-offset> : enable auto trial factoring before PRP. Pass 0 to bit-offset for default TF depth.
-device <N> : select a specific device:
0 : Ellesmere-36x1266-@28:0.0 Radeon (TM) RX 480 Graphics
1 : gfx804-8x1203-@3:0.0 Radeon 550 Series

C:\msys64\home\ken\gpuowl-compile\v5x>openowl -list fft
2018-11-04 10:20:56 gpuowl 5.0-df2bdf2
2018-11-04 10:20:56 -list fft
2018-11-04 10:20:56 Can't open 'worktodo.txt' (mode 'rb')
2018-11-04 10:20:56 Bye
[/CODE]A user might want to list the fft selection available, to create the worktodo list. Slight catch-22 here. I think it's reasonable to have -list fft act like -h, which runs whether there's a worktodo available or not, and terminates. -list fft is a form of help.[/QUOTE]


I concur.

kriesel 2018-11-04 18:10

gpuowl v5.0-df2bdf2 build for Win7 x64
 
A quick check shows it runs at least a few known primes, m216091 and up, apparently correctly.
-list fft output:
[CODE]C:\msys64\home\ken\gpuowl-compile\v5x>openowl -list fft
2018-11-04 10:20:56 gpuowl 5.0-df2bdf2
2018-11-04 10:20:56 -list fft
2018-11-04 10:20:56 Can't open 'worktodo.txt' (mode 'rb')
2018-11-04 10:20:56 Bye

C:\msys64\home\ken\gpuowl-compile\v5x>openowl -list fft
2018-11-04 10:28:00 gpuowl 5.0-df2bdf2
2018-11-04 10:28:00 -list fft
2018-11-04 10:28:00 FFT maxExp W H M
2018-11-04 10:28:00 0.1M 2.6M 256 256 1
2018-11-04 10:28:00 0.2M 5.2M 256 512 1
2018-11-04 10:28:00 0.2M 5.2M 512 256 1
2018-11-04 10:28:00 0.5M 10.2M 1024 256 1
2018-11-04 10:28:00 0.5M 10.2M 256 1024 1
2018-11-04 10:28:00 0.5M 10.2M 512 512 1
2018-11-04 10:28:00 0.6M 12.7M 256 256 5
2018-11-04 10:28:00 1.0M 20.0M 1024 512 1
2018-11-04 10:28:00 1.0M 20.0M 256 2048 1
2018-11-04 10:28:00 1.0M 20.0M 512 1024 1
2018-11-04 10:28:00 1.0M 20.0M 2048 256 1
2018-11-04 10:28:00 1.1M 22.5M 256 256 9
2018-11-04 10:28:00 1.2M 24.9M 256 512 5
2018-11-04 10:28:00 1.2M 24.9M 512 256 5
2018-11-04 10:28:00 2.0M 39.3M 1024 1024 1
2018-11-04 10:28:00 2.0M 39.3M 512 2048 1
2018-11-04 10:28:00 2.0M 39.3M 2048 512 1
2018-11-04 10:28:00 2.0M 39.3M 4096 256 1
2018-11-04 10:28:00 2.2M 44.1M 256 512 9
2018-11-04 10:28:00 2.2M 44.1M 512 256 9
2018-11-04 10:28:00 2.5M 48.9M 1024 256 5
2018-11-04 10:28:00 2.5M 48.9M 256 1024 5
2018-11-04 10:28:00 2.5M 48.9M 512 512 5
2018-11-04 10:28:00 4.0M 77.3M 1024 2048 1
2018-11-04 10:28:00 4.0M 77.3M 2048 1024 1
2018-11-04 10:28:00 4.0M 77.3M 4096 512 1
2018-11-04 10:28:00 4.5M 86.7M 1024 256 9
2018-11-04 10:28:00 4.5M 86.7M 256 1024 9
2018-11-04 10:28:00 4.5M 86.7M 512 512 9
2018-11-04 10:28:00 5.0M 96.1M 1024 512 5
2018-11-04 10:28:00 5.0M 96.1M 256 2048 5
2018-11-04 10:28:00 5.0M 96.1M 512 1024 5
2018-11-04 10:28:00 5.0M 96.1M 2048 256 5
2018-11-04 10:28:00 8.0M 151.8M 2048 2048 1
2018-11-04 10:28:00 8.0M 151.8M 4096 1024 1
2018-11-04 10:28:00 9.0M 170.3M 1024 512 9
2018-11-04 10:28:00 9.0M 170.3M 256 2048 9
2018-11-04 10:28:00 9.0M 170.3M 512 1024 9
2018-11-04 10:28:00 9.0M 170.3M 2048 256 9
2018-11-04 10:28:00 10.0M 188.7M 1024 1024 5
2018-11-04 10:28:00 10.0M 188.7M 512 2048 5
2018-11-04 10:28:00 10.0M 188.7M 2048 512 5
2018-11-04 10:28:00 10.0M 188.7M 4096 256 5
2018-11-04 10:28:00 16.0M 298.1M 4096 2048 1
2018-11-04 10:28:00 18.0M 334.3M 1024 1024 9
2018-11-04 10:28:00 18.0M 334.3M 512 2048 9
2018-11-04 10:28:00 18.0M 334.3M 2048 512 9
2018-11-04 10:28:00 18.0M 334.3M 4096 256 9
2018-11-04 10:28:00 20.0M 370.4M 1024 2048 5
2018-11-04 10:28:00 20.0M 370.4M 2048 1024 5
2018-11-04 10:28:00 20.0M 370.4M 4096 512 5
2018-11-04 10:28:00 36.0M 656.2M 1024 2048 9
2018-11-04 10:28:00 36.0M 656.2M 2048 1024 9
2018-11-04 10:28:00 36.0M 656.2M 4096 512 9
2018-11-04 10:28:00 40.0M 727.0M 2048 2048 5
2018-11-04 10:28:00 40.0M 727.0M 4096 1024 5
2018-11-04 10:28:00 72.0M 1287.5M 2048 2048 9
2018-11-04 10:28:00 72.0M 1287.5M 4096 1024 9
2018-11-04 10:28:00 80.0M 1426.4M 4096 2048 5
2018-11-04 10:28:00 144.0M 2525.2M 4096 2048 9[/CODE]110503 fails, 132049 has errors, 216091 runs correctly to completion
(not surprising due to low # of bits/word, and not a request for yet smaller fft lengths, just observations)
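
The bits/word values printed in the logs are consistent with simply dividing the exponent by the FFT length in words. That relationship is inferred here from the numbers in this thread rather than taken from the gpuowl source, so treat the sketch below as a plausibility check.

```python
# exponent -> FFT length in words (128K = 131072; 5120K = 5120 * 1024)
runs = {
    110503: 131072,
    132049: 131072,
    216091: 131072,
    89000167: 5120 * 1024,
}


def bits_per_word(exponent, fft_length):
    """Average number of bits of the exponent carried by each FFT word."""
    return exponent / fft_length


for exponent, fft_length in runs.items():
    print(f"{exponent:>9}  {bits_per_word(exponent, fft_length):.2f} bits/word")
```

This gives 0.84, 1.01 and 1.65 bits/word for the three small exponents at 128K, and 16.98 for 89000167 at 5120K, matching the values the program reports.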

[CODE]C:\msys64\home\ken\gpuowl-compile\v5x>openowl
2018-11-04 10:33:00 gpuowl 5.0-df2bdf2
2018-11-04 10:33:00 110503 FFT 128K: Width 64x4, Height 64x4; 0.84 bits/word
2018-11-04 10:33:00 using long carry kernels
2018-11-04 10:33:00 Ellesmere-36x1266-@28:0.0 Radeon (TM) RX 480 Graphics
2018-11-04 10:33:03 OpenCL compilation in 2391 ms, with "-DEXP=110503u -DWIDTH=256u -DSMALL_HEIGHT=256u -DMIDDLE=1u -I. -cl-fast-relaxed-math -cl-std=CL2.0 "
2018-11-04 10:33:03 110503.owl not found, starting from the beginning.
2018-11-04 10:33:03 powerSmooth(110503, 10000) has 14484 bits
Assertion failed!

Program: C:\msys64\home\ken\gpuowl-compile\v5x\openowl.exe
File: state.cpp, Line 24

Expression: 0 <= w && w < (1 << nBits)

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

C:\msys64\home\ken\gpuowl-compile\v5x>openowl
2018-11-04 10:33:18 gpuowl 5.0-df2bdf2
2018-11-04 10:33:18 132049 FFT 128K: Width 64x4, Height 64x4; 1.01 bits/word
2018-11-04 10:33:18 using long carry kernels
2018-11-04 10:33:19 Ellesmere-36x1266-@28:0.0 Radeon (TM) RX 480 Graphics
2018-11-04 10:33:21 OpenCL compilation in 2432 ms, with "-DEXP=132049u -DWIDTH=256u -DSMALL_HEIGHT=256u -DMIDDLE=1u -I. -cl-fast-relaxed-math -cl-std=CL2.0 "
2018-11-04 10:33:21 132049.owl not found, starting from the beginning.
2018-11-04 10:33:21 powerSmooth(132049, 10000) has 14484 bits
2018-11-04 10:33:23 132049 P-1 10000 69.04%; 0.16 ms/sq, 0 MULs; ETA 0d 00:00; d8645cee5574c284
2018-11-04 10:33:24 132049.owl loaded: k 0, B1 10000, block 400, res64 6379d2d731e5e48e, stage 1, baseBits 0
2018-11-04 10:33:24 132049 B1=10000 B2=70000 (effective B2=70000) selected 4142 P-1 points in 0.01s
2018-11-04 10:33:24 132049 EE 800 0.60%; 0.16 ms/sq, 1 MULs; ETA 0d 00:00; da4be711d7cf309d (check 0.08s)
2018-11-04 10:33:24 132049.owl loaded: k 0, B1 10000, block 400, res64 6379d2d731e5e48e, stage 1, baseBits 0
2018-11-04 10:33:24 132049 EE 800 0.60%; 0.28 ms/sq, 1 MULs; ETA 0d 00:01; da4be711d7cf309d (check 0.08s)
2018-11-04 10:33:24 132049.owl loaded: k 0, B1 10000, block 400, res64 6379d2d731e5e48e, stage 1, baseBits 0
2018-11-04 10:33:25 132049 EE 800 0.60%; 0.28 ms/sq, 1 MULs; ETA 0d 00:01; da4be711d7cf309d (check 0.08s)
2018-11-04 10:33:25 3 sequential errors, will stop.
2018-11-04 10:33:25 Exiting because "too many errors"
2018-11-04 10:33:25 Bye

C:\msys64\home\ken\gpuowl-compile\v5x>openowl
2018-11-04 10:33:42 gpuowl 5.0-df2bdf2
2018-11-04 10:33:42 216091 FFT 128K: Width 64x4, Height 64x4; 1.65 bits/word
2018-11-04 10:33:42 using long carry kernels
2018-11-04 10:33:43 Ellesmere-36x1266-@28:0.0 Radeon (TM) RX 480 Graphics
2018-11-04 10:33:46 OpenCL compilation in 2434 ms, with "-DEXP=216091u -DWIDTH=256u -DSMALL_HEIGHT=256u -DMIDDLE=1u -I. -cl-fast-relaxed-math -cl-std=CL2.0 "
2018-11-04 10:33:46 216091.owl not found, starting from the beginning.
2018-11-04 10:33:46 powerSmooth(216091, 10000) has 14484 bits
2018-11-04 10:33:48 216091 P-1 10000 69.04%; 0.16 ms/sq, 0 MULs; ETA 0d 00:00; 9e7518aa03950b26
2018-11-04 10:33:48 216091.owl loaded: k 0, B1 10000, block 400, res64 d8a71ba2415f2773, stage 1, baseBits 0
2018-11-04 10:33:48 216091 B1=10000 B2=130000 (effective B2=130000) selected 7611 P-1 points in 0.02s
2018-11-04 10:33:49 216091 OK 800 0.37%; 0.17 ms/sq, 1 MULs; ETA 0d 00:01; 5646ce8634d76602 (check 0.08s)
2018-11-04 10:33:49 216091 GCD no factor (0.07s)
2018-11-04 10:33:50 216091 10000 4.62%; 0.16 ms/sq, 287 MULs; ETA 0d 00:01; 6d0028f1a3744d15
2018-11-04 10:33:52 216091 20000 9.24%; 0.16 ms/sq, 1067 MULs; ETA 0d 00:01; e4c865e7e023a233
2018-11-04 10:33:54 216091 30000 13.86%; 0.16 ms/sq, 1053 MULs; ETA 0d 00:01; 52c18ca42b2e5a40
2018-11-04 10:33:55 216091 40000 18.48%; 0.16 ms/sq, 990 MULs; ETA 0d 00:01; 003716dc307b3768
2018-11-04 10:33:57 216091 50000 23.11%; 0.16 ms/sq, 882 MULs; ETA 0d 00:00; f19c985e00f9ab66
2018-11-04 10:33:59 216091 60000 27.73%; 0.16 ms/sq, 794 MULs; ETA 0d 00:00; 6679d5a415aece9e
2018-11-04 10:34:01 216091 70000 32.35%; 0.16 ms/sq, 566 MULs; ETA 0d 00:00; da83220b76e8b55b
2018-11-04 10:34:02 216091 80000 36.97%; 0.16 ms/sq, 339 MULs; ETA 0d 00:00; 3700d4ccd97a326a
2018-11-04 10:34:04 216091 90000 41.59%; 0.16 ms/sq, 346 MULs; ETA 0d 00:00; c0472c1f976aa2c1
2018-11-04 10:34:05 216091 100000 46.21%; 0.16 ms/sq, 329 MULs; ETA 0d 00:00; d3f402b7fb9adb65
2018-11-04 10:34:07 216091 110000 50.83%; 0.16 ms/sq, 304 MULs; ETA 0d 00:00; a625b1471a8e6481
2018-11-04 10:34:09 216091 120000 55.45%; 0.16 ms/sq, 328 MULs; ETA 0d 00:00; fe8081748d1a088e
2018-11-04 10:34:10 216091 130000 60.07%; 0.16 ms/sq, 325 MULs; ETA 0d 00:00; b38dbc5b3d73c584
2018-11-04 10:34:12 216091 140000 64.70%; 0.16 ms/sq, 0 MULs; ETA 0d 00:00; a71c266b43cc9171
2018-11-04 10:34:14 216091 150000 69.32%; 0.16 ms/sq, 0 MULs; ETA 0d 00:00; a6a2e15e86701788
2018-11-04 10:34:15 216091 OK 160000 73.94%; 0.16 ms/sq, 0 MULs; ETA 0d 00:00; 6cc0b6cbc453946a (check 0.09s)
2018-11-04 10:34:17 216091 170000 78.56%; 0.16 ms/sq, 0 MULs; ETA 0d 00:00; 16838b06bd004e23
2018-11-04 10:34:18 216091 180000 83.18%; 0.16 ms/sq, 0 MULs; ETA 0d 00:00; 38a44921f392a2fc
2018-11-04 10:34:20 216091 190000 87.80%; 0.16 ms/sq, 0 MULs; ETA 0d 00:00; 63580cfe1f80b303
2018-11-04 10:34:22 216091 200000 92.42%; 0.16 ms/sq, 0 MULs; ETA 0d 00:00; 7c3f2446e5e6fd09
2018-11-04 10:34:23 216091 210000 97.04%; 0.16 ms/sq, 0 MULs; ETA 0d 00:00; b6e9bb0a7c8ede6b
2018-11-04 10:34:24 PP 216090 / 216091, d8a71ba2415f2773 (base d8a71ba2415f2773)
2018-11-04 10:34:24 216091 OK 216400 100.00%; 0.17 ms/sq, 0 MULs; ETA 0d 00:00; e898188ce32335d4 (check 0.09s)
2018-11-04 10:34:24 {"exponent":"216091", "worktype":"PRP,P-1", "status":"P", "program":{"name":"gpuowl", "version":"5.0-df2bdf2"}, "timestamp":"2018-11-04 16:34:24 UTC", "aid":"0", "fft-length":131072, "res64":"d8a71ba2415f2773", "b2":"130000", "base":{"b1":"10000", "bias":{"2":19}, "res64":"d8a71ba2415f2773"}}[/CODE]B2 bounds appear to be correctly reported:
[CODE]{"exponent":"216091", "worktype":"PRP,P-1", "status":"P", "program":{"name":"gpuowl", "version":"5.0-df2bdf2"}, "timestamp":"2018-11-04 16:34:24 UTC", "aid":"0", "fft-length":131072, "res64":"d8a71ba2415f2773", "b2":"130000", "base":{"b1":"10000", "bias":{"2":19}, "res64":"d8a71ba2415f2773"}}
{"exponent":"756839", "worktype":"PRP,P-1", "status":"P", "program":{"name":"gpuowl", "version":"5.0-df2bdf2"}, "timestamp":"2018-11-04 16:36:55 UTC", "aid":"0", "fft-length":131072, "res64":"0e12589efe2be6c5", "b2":"500000", "base":{"b1":"20000", "bias":{"2":19}, "res64":"0e12589efe2be6c5"}}
{"exponent":"859433", "worktype":"PRP,P-1", "status":"P", "program":{"name":"gpuowl", "version":"5.0-df2bdf2"}, "timestamp":"2018-11-04 16:39:23 UTC", "aid":"0", "fft-length":131072, "res64":"ac86e7a51cecadb0", "b2":"580000", "base":{"b1":"20000", "bias":{"2":19}, "res64":"ac86e7a51cecadb0"}}
[/CODE]m89000167 timing 4.521 ms/sq with no P-1 on RX480, Win7 x64, Adrenalin 18.10.2 driver.
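
The B1 and B2 bounds can be pulled out of the result lines with a few lines of script. This is a sketch using the three JSON lines exactly as posted above; the field names are those visible in the output, and nothing else about the result format is assumed.

```python
import json

# The three result lines, copied verbatim from the post.
RESULTS = [
    '{"exponent":"216091", "worktype":"PRP,P-1", "status":"P", "program":{"name":"gpuowl", "version":"5.0-df2bdf2"}, "timestamp":"2018-11-04 16:34:24 UTC", "aid":"0", "fft-length":131072, "res64":"d8a71ba2415f2773", "b2":"130000", "base":{"b1":"10000", "bias":{"2":19}, "res64":"d8a71ba2415f2773"}}',
    '{"exponent":"756839", "worktype":"PRP,P-1", "status":"P", "program":{"name":"gpuowl", "version":"5.0-df2bdf2"}, "timestamp":"2018-11-04 16:36:55 UTC", "aid":"0", "fft-length":131072, "res64":"0e12589efe2be6c5", "b2":"500000", "base":{"b1":"20000", "bias":{"2":19}, "res64":"0e12589efe2be6c5"}}',
    '{"exponent":"859433", "worktype":"PRP,P-1", "status":"P", "program":{"name":"gpuowl", "version":"5.0-df2bdf2"}, "timestamp":"2018-11-04 16:39:23 UTC", "aid":"0", "fft-length":131072, "res64":"ac86e7a51cecadb0", "b2":"580000", "base":{"b1":"20000", "bias":{"2":19}, "res64":"ac86e7a51cecadb0"}}',
]


def bounds(line):
    """Return (exponent, B1, B2) from one result line; the values are stored as strings."""
    record = json.loads(line)
    return int(record["exponent"]), int(record["base"]["b1"]), int(record["b2"])


for line in RESULTS:
    print(bounds(line))
```

For 216091 this yields B1=10000 and B2=130000, the same bounds the console log printed when it selected the P-1 points.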

preda 2018-11-04 21:13

[QUOTE=kriesel;499222]
Some radix-3 transforms, and maybe 7 if it helps speed.
6M and 12M in particular.
It's a particularly long jump between 20M and 36M, so adding 24M or 32M or both would be good.
Similarly between 40M and 72M, 48M or 64M or both.
[/QUOTE]

I just added an FFT-3 "middle" step.

kriesel 2018-11-04 23:18

V5.0-9c13870 build for Win 7 x64 OpenCL on AMD
 
[QUOTE=preda;499580]I just added an FFT-3 "middle" step.[/QUOTE]
[CODE]$ make openowl-win
g++ -std=c++17 -O2 -DREV=\"9c13870\" -Wall Worktodo.cpp Result.cpp common.cpp gpuowl.cpp Gpu.cpp clwrap.cpp Task.cpp checkpoint.cpp timeutil.cpp Args.cpp GCD.cpp Primes.cpp Stats.cpp state.cpp Signal.cpp -o openowl -lOpenCL -lgmp -pthread -L/opt/rocm/opencl/lib/x86_64 -L/opt/amdgpu-pro/lib/x86_64-linux-gnu -L/c/Windows/System32 -L. -static
Gpu.cpp: In member function 'PRPState Gpu::loadPRP(u32, u32, u32)':
Gpu.cpp:557:9: warning: unknown conversion type character 'l' in format [-Wformat=]
log("%u EE loaded: %d, B1 %u, blockSize %d, %016llx (expected %016llx)\n",
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Gpu.cpp:557:9: warning: unknown conversion type character 'l' in format [-Wformat=]
Gpu.cpp:557:9: warning: too many arguments for format [-Wformat-extra-args]
Gpu.cpp: In member function 'PRPResult Gpu::isPrimePRP(u32, const Args&, u32, u32)':
Gpu.cpp:690:11: warning: unknown conversion type character 'l' in format [-Wformat=]
log("%s %8d / %d, %016llx (base %016llx)\n", isPrime ? "PP" : "CC", kEnd, E, finalRes64, residue(base));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Gpu.cpp:690:11: warning: unknown conversion type character 'l' in format [-Wformat=]
Gpu.cpp:690:11: warning: too many arguments for format [-Wformat-extra-args]
checkpoint.cpp: In member function 'void PRPState::loadInt(u32, u32, u32)':
checkpoint.cpp:167:7: warning: unknown conversion type character 'l' in format [-Wformat=]
log("%s loaded: k %u, B1 %u, block %u, res64 %016llx, stage %u, baseBits %u\n",
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
checkpoint.cpp:167:7: warning: format '%u' expects argument of type 'unsigned int', but argument 6 has type 'u64' {aka 'long long unsigned int'} [-Wformat=]
checkpoint.cpp:167:7: warning: too many arguments for format [-Wformat-extra-args]
[/CODE]Lots of new fft lengths due to the x3: 0.4, 0.8, 1.5, 3, 6, 12, 24, 48M. It's getting to be a large list. Please change the -list fft to not require a worktodo.txt.[CODE]C:\msys64\home\ken\gpuowl-compile\v5.0-9c13870>openowl -h
2018-11-04 16:07:53 gpuowl 5.0-9c13870

Command line options:

-user <name> : specify the user name.
-cpu <name> : specify the hardware name.
-time : display kernel profiling information.
-fft <size> : specify FFT size, such as: 5000K, 4M, +2, -1.
-block <value> : PRP GEC block size. Default 400. Smaller block is slower but detects errors sooner.
-carry long|short : force carry type. Short carry may be faster, but requires high bits/word.
-list fft : display a list of available FFT configurations.
-tf <bit-offset> : enable auto trial factoring before PRP. Pass 0 to bit-offset for default TF depth.
-device <N> : select a specific device:
0 : Ellesmere-36x1266-@28:0.0 Radeon (TM) RX 480 Graphics
1 : gfx804-8x1203-@3:0.0 Radeon 550 Series

C:\msys64\home\ken\gpuowl-compile\v5.0-9c13870>openowl -list fft
2018-11-04 16:08:03 gpuowl 5.0-9c13870
2018-11-04 16:08:03 -list fft
2018-11-04 16:08:03 Can't open 'worktodo.txt' (mode 'rb')
2018-11-04 16:08:03 Bye

C:\msys64\home\ken\gpuowl-compile\v5.0-9c13870>copy ..\v3.8\worktodo.txt .
1 file(s) copied.

C:\msys64\home\ken\gpuowl-compile\v5.0-9c13870>openowl -list fft
2018-11-04 16:08:34 gpuowl 5.0-9c13870
2018-11-04 16:08:34 -list fft
2018-11-04 16:08:34 FFT maxExp W H M
2018-11-04 16:08:34 0.1M 2.6M 256 256 1
2018-11-04 16:08:34 0.2M 5.2M 256 512 1
2018-11-04 16:08:34 0.2M 5.2M 512 256 1
2018-11-04 16:08:34 0.4M 7.7M 256 256 3
2018-11-04 16:08:34 0.5M 10.2M 1024 256 1
2018-11-04 16:08:34 0.5M 10.2M 256 1024 1
2018-11-04 16:08:34 0.5M 10.2M 512 512 1
2018-11-04 16:08:34 0.6M 12.7M 256 256 5
2018-11-04 16:08:34 0.8M 15.1M 256 512 3
2018-11-04 16:08:34 0.8M 15.1M 512 256 3
2018-11-04 16:08:34 1.0M 20.0M 1024 512 1
2018-11-04 16:08:34 1.0M 20.0M 256 2048 1
2018-11-04 16:08:34 1.0M 20.0M 512 1024 1
2018-11-04 16:08:34 1.0M 20.0M 2048 256 1
2018-11-04 16:08:34 1.1M 22.5M 256 256 9
2018-11-04 16:08:34 1.2M 24.9M 256 512 5
2018-11-04 16:08:34 1.2M 24.9M 512 256 5
2018-11-04 16:08:34 1.5M 29.7M 1024 256 3
2018-11-04 16:08:34 1.5M 29.7M 256 1024 3
2018-11-04 16:08:35 1.5M 29.7M 512 512 3
2018-11-04 16:08:35 2.0M 39.3M 1024 1024 1
2018-11-04 16:08:35 2.0M 39.3M 512 2048 1
2018-11-04 16:08:35 2.0M 39.3M 2048 512 1
2018-11-04 16:08:35 2.0M 39.3M 4096 256 1
2018-11-04 16:08:35 2.2M 44.1M 256 512 9
2018-11-04 16:08:35 2.2M 44.1M 512 256 9
2018-11-04 16:08:35 2.5M 48.9M 1024 256 5
2018-11-04 16:08:35 2.5M 48.9M 256 1024 5
2018-11-04 16:08:35 2.5M 48.9M 512 512 5
2018-11-04 16:08:35 3.0M 58.4M 1024 512 3
2018-11-04 16:08:35 3.0M 58.4M 256 2048 3
2018-11-04 16:08:35 3.0M 58.4M 512 1024 3
2018-11-04 16:08:35 3.0M 58.4M 2048 256 3
2018-11-04 16:08:35 4.0M 77.3M 1024 2048 1
2018-11-04 16:08:35 4.0M 77.3M 2048 1024 1
2018-11-04 16:08:35 4.0M 77.3M 4096 512 1
2018-11-04 16:08:35 4.5M 86.7M 1024 256 9
2018-11-04 16:08:35 4.5M 86.7M 256 1024 9
2018-11-04 16:08:35 4.5M 86.7M 512 512 9
2018-11-04 16:08:35 5.0M 96.1M 1024 512 5
2018-11-04 16:08:35 5.0M 96.1M 256 2048 5
2018-11-04 16:08:35 5.0M 96.1M 512 1024 5
2018-11-04 16:08:35 5.0M 96.1M 2048 256 5
2018-11-04 16:08:35 6.0M 114.7M 1024 1024 3
2018-11-04 16:08:35 6.0M 114.7M 512 2048 3
2018-11-04 16:08:35 6.0M 114.7M 2048 512 3
2018-11-04 16:08:35 6.0M 114.7M 4096 256 3
2018-11-04 16:08:35 8.0M 151.8M 2048 2048 1
2018-11-04 16:08:35 8.0M 151.8M 4096 1024 1
2018-11-04 16:08:35 9.0M 170.3M 1024 512 9
2018-11-04 16:08:35 9.0M 170.3M 256 2048 9
2018-11-04 16:08:35 9.0M 170.3M 512 1024 9
2018-11-04 16:08:35 9.0M 170.3M 2048 256 9
2018-11-04 16:08:35 10.0M 188.7M 1024 1024 5
2018-11-04 16:08:35 10.0M 188.7M 512 2048 5
2018-11-04 16:08:35 10.0M 188.7M 2048 512 5
2018-11-04 16:08:35 10.0M 188.7M 4096 256 5
2018-11-04 16:08:35 12.0M 225.3M 1024 2048 3
2018-11-04 16:08:35 12.0M 225.3M 2048 1024 3
2018-11-04 16:08:35 12.0M 225.3M 4096 512 3
2018-11-04 16:08:35 16.0M 298.1M 4096 2048 1
2018-11-04 16:08:35 18.0M 334.3M 1024 1024 9
2018-11-04 16:08:35 18.0M 334.3M 512 2048 9
2018-11-04 16:08:35 18.0M 334.3M 2048 512 9
2018-11-04 16:08:35 18.0M 334.3M 4096 256 9
2018-11-04 16:08:35 20.0M 370.4M 1024 2048 5
2018-11-04 16:08:35 20.0M 370.4M 2048 1024 5
2018-11-04 16:08:35 20.0M 370.4M 4096 512 5
2018-11-04 16:08:35 24.0M 442.3M 2048 2048 3
2018-11-04 16:08:35 24.0M 442.3M 4096 1024 3
2018-11-04 16:08:35 36.0M 656.2M 1024 2048 9
2018-11-04 16:08:35 36.0M 656.2M 2048 1024 9
2018-11-04 16:08:35 36.0M 656.2M 4096 512 9
2018-11-04 16:08:35 40.0M 727.0M 2048 2048 5
2018-11-04 16:08:35 40.0M 727.0M 4096 1024 5
2018-11-04 16:08:35 48.0M 868.1M 4096 2048 3
2018-11-04 16:08:35 72.0M 1287.5M 2048 2048 9
2018-11-04 16:08:35 72.0M 1287.5M 4096 1024 9
2018-11-04 16:08:35 80.0M 1426.4M 4096 2048 5
2018-11-04 16:08:35 144.0M 2525.2M 4096 2048 9
[/CODE]On RX480, Adrenalin 18.10.2 driver:[CODE]...
2018-11-04 17:05:29 6972593 5960000 85.47%; 0.38 ms/sq, 0 MULs; ETA 0d 00:06; 9192684b7c1359cd
2018-11-04 17:05:33 6972593 5970000 85.62%; 0.37 ms/sq, 0 MULs; ETA 0d 00:06; c2f9539990824bd3
2018-11-04 17:05:36 6972593 5980000 85.76%; 0.37 ms/sq, 0 MULs; ETA 0d 00:06; 0e55f43c273e071f
2018-11-04 17:05:40 6972593 5990000 85.91%; 0.37 ms/sq, 0 MULs; ETA 0d 00:06; 1b238cbce00977ec
2018-11-04 17:05:44 6972593 6000000 86.05%; 0.38 ms/sq, 0 MULs; ETA 0d 00:06; 226f17e463c15782
2018-11-04 17:05:48 6972593 6010000 86.19%; 0.37 ms/sq, 0 MULs; ETA 0d 00:06; 37cb92ee936c55d2
2018-11-04 17:05:51 6972593 6020000 86.34%; 0.37 ms/sq, 0 MULs; ETA 0d 00:06; c1966294670fbb2f
2018-11-04 17:05:55 6972593 6030000 86.48%; 0.37 ms/sq, 0 MULs; ETA 0d 00:06; f03d90475b5f5672
2018-11-04 17:05:59 6972593 6040000 86.62%; 0.37 ms/sq, 0 MULs; ETA 0d 00:06; 3130d3e8833a08d3
2018-11-04 17:06:03 6972593 6050000 86.77%; 0.38 ms/sq, 0 MULs; ETA 0d 00:06; fd70900ff37a05c4
2018-11-04 17:06:06 6972593 6060000 86.91%; 0.37 ms/sq, 0 MULs; ETA 0d 00:06; a0ececf155185dba
2018-11-04 17:06:10 6972593 6070000 87.05%; 0.37 ms/sq, 0 MULs; ETA 0d 00:06; e869127794b701de
2018-11-04 17:06:14 6972593 OK 6080000 87.20%; 0.37 ms/sq, 0 MULs; ETA 0d 00:06; d11c156803bb5922 (check 0.19s)
...[/CODE][CODE]{"exponent":"6972593", "worktype":"PRP,P-1", "status":"P", "program":{"name":"gpuowl", "version":"5.0-9c13870"}, "timestamp":"2018-11-04 23:11:49 UTC", "aid":"0", "fft-length":393216, "res64":"bc16906ca9e08ff7", "b2":"1440000", "base":{"b1":"80000", "bias":{"2":19}, "res64":"bc16906ca9e08ff7"}}
[/CODE]
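
The new lengths can be reproduced from the W, H and M columns of the listing, assuming the FFT length is 2 x W x H x M. That formula is an inference from the output in this thread (for example, W=1024, H=512, M=5 for the 5120K transform, and fft-length 393216 for W=256, H=256, M=3); it is not quoted from the gpuowl source.

```python
from itertools import product

# Width and height values that appear in the -list fft output.
WIDTHS = (256, 512, 1024, 2048, 4096)
HEIGHTS = (256, 512, 1024, 2048)


def fft_sizes(middles):
    """Set of FFT lengths (in words) under the assumed formula 2 * W * H * M."""
    return {2 * w * h * m for w, h, m in product(WIDTHS, HEIGHTS, middles)}


before = fft_sizes((1, 5, 9))      # middle steps listed by 5.0-df2bdf2
after = fft_sizes((1, 3, 5, 9))    # with the new middle step of 3
added = sorted(after - before)

# Print the added lengths in units of M (2**20 words).
print([size / 2**20 for size in added])
```

This prints 0.375, 0.75, 1.5, 3, 6, 12, 24 and 48, which the listing rounds to 0.4M and 0.8M for the two smallest. The number of (W, H, M) rows also grows from 60 to 80, matching the two listings.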


All times are UTC.
