mersenneforum.org  

Old 2018-11-04, 12:15   #859
SELROC
 

2·1,523 Posts

Quote:
Originally Posted by preda View Post
0 MULs means that no GCD multiplications were done. This is normal with B1=0. When B1 != 0, sometimes a number of MULs are done, and sometimes 0 are done, depending on iteration.

When MULs==0, ms/sq is the same as the old ms/it.
But when MULs are done, the ms/sq tries to measure only the "squaring" time (the normal PRP iteration), excluding the time taken by the MULs. Thus I changed the name to ms/sq to show that it's not the same as the old ms/it.

Why I thought that indicating speed this way is good: this number, ms/sq, is relatively stable and does not depend on the number of MULs (or the time they take), which varies from one iteration block to the next. Thus this number can be used to compare speed.

The other option would be: take total time (squares + muls) and divide it by the number of iterations in the block. This number would be larger where there are more MULs and smaller where there are fewer, and thus a bit more difficult to read GPU perf from, IMO.

Thank you, now it is clear.
This is a good attempt to get a stable measure. In the latest test the time is 0.18 ms/sq everywhere except for the last iteration, which is 0.19 ms/sq.
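
To put numbers on the two options described in the quote, here is a minimal sketch. It is not how gpuowl measures time internally; the function names, the 0.20 ms cost per MUL, and the block size are made-up values for illustration only.

```python
def ms_per_sq(total_ms, mul_ms, n_squarings):
    """Time per squaring, excluding the time spent on the extra MULs."""
    return (total_ms - mul_ms) / n_squarings


def ms_per_it(total_ms, n_squarings):
    """The other option: total block time divided by the iteration count."""
    return total_ms / n_squarings


# Hypothetical blocks of 10000 iterations at 0.16 ms per squaring,
# with an assumed (made-up) cost of 0.20 ms per MUL.
SQ_MS, MUL_MS, ITERS = 0.16, 0.20, 10000

for n_muls in (0, 300, 1000):
    mul_total = n_muls * MUL_MS
    total = ITERS * SQ_MS + mul_total
    print(n_muls,
          round(ms_per_sq(total, mul_total, ITERS), 3),
          round(ms_per_it(total, ITERS), 3))
```

Under these assumptions the first figure stays the same for every block, while the second one drifts upward with the number of MULs, which is the reason given for reporting ms/sq.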


Also, the introduction of a smaller FFT size is good. Now the program can be validated on new hardware with a quick test against the smallest prime.
Old 2018-11-04, 15:14   #860
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5,437 Posts
RX550 timings versus gpuowl version for m89m

RX550, AMD Adrenalin 18.10.2 driver for Win7 x64

m89000167, FFT 5000K for v2.0, 5120K for others (iterations 10000-20000)
Ver   ms/it (no P-1 or TF)
2.0   17.38
3.3   16.90
3.5   16.42  <--min
3.6   16.44
3.8   16.43
3.9   17.22
4.3   17.35
4.6   17.25
4.7   NA
5.0   17.25

Note that the methodology in this report is more exactly comparable than that of the RX480 timings reported earlier, and that the minimum falls at a different version. The differences among v3.5 to v3.8 are within ±1 in the last digit, so they may be insignificant.
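
For a quick read of how far each version is from the fastest one, here is a small sketch that uses only the RX550 timings from the table in this post. The function name is my own; v4.7 is left out because it has no timing, and v2.0 used a 5000K FFT, so its figure is not strictly like for like.

```python
# ms/it for m89000167 on the RX550, copied from the table in this post.
timings = {
    "2.0": 17.38, "3.3": 16.90, "3.5": 16.42, "3.6": 16.44, "3.8": 16.43,
    "3.9": 17.22, "4.3": 17.35, "4.6": 17.25, "5.0": 17.25,
}


def slowdown_vs_best(table):
    """Percent slower than the fastest entry, per version."""
    best = min(table.values())
    return {ver: 100.0 * (ms - best) / best for ver, ms in table.items()}


rel = slowdown_vs_best(timings)
for ver, pct in rel.items():
    print(f"v{ver}: {timings[ver]:.2f} ms/it (+{pct:.2f}% vs fastest)")
```

The spread among v3.5, v3.6 and v3.8 comes out at roughly a tenth of a percent, in line with the remark that it may be insignificant.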
Old 2018-11-04, 15:35   #861
SELROC
 

16724₈ Posts

Quote:
Originally Posted by kriesel View Post
RX550, AMD Adrenaline 18.10.2 driver for Win7 x64

m89000167 5000K for v2.0, 5120k for others (iterations 10000-20000)
Ver ms/it (no P-1 or TF)
2.0 17.38
3.3 16.90
3.5 16.42 <--min
3.6 16.44
3.8 16.43
3.9 17.22
4.3 17.35
4.6 17.25
4.7 NA
5.0 17.25

Note the more exactly comparable methodology in this report than the RX480 timings reported earlier, and the different location of the minimum. Difference v3.5-3.8 is +-1 least digit, so may be insignificant

The numbers are not exactly comparable by now:


Code:
2018-11-04 16:24:42 gpuowl 5.0--mod
2018-11-04 16:24:42 RX580 -user selroc -cpu RX580 -device 0 
2018-11-04 16:24:42 RX580 89000167 FFT 5120K: Width 256x4, Height 64x8, Middle 5; 16.98 bits/word
2018-11-04 16:24:42 RX580 using short carry kernels
2018-11-04 16:24:43 RX580 gfx803-36x1360-@4a:0.0 Ellesmere [Radeon RX 470/480]
2018-11-04 16:24:44 RX580 OpenCL compilation in 1076 ms, with "-DEXP=89000167u -DWIDTH=1024u -DSMALL_HEIGHT=512u -DMIDDLE=5u  -I. -cl-fast-relaxed-math -cl-std=CL2.0 "
2018-11-04 16:24:44 RX580 89000167.owl not found, starting from the beginning.
2018-11-04 16:24:52 RX580 89000167 OK      800  0.00%; 4.44 ms/sq,    0 MULs; ETA 4d 13:50; 2744231e7051f3fe (check 1.95s)
2018-11-04 16:25:33 RX580 89000167       10000  0.01%; 4.46 ms/sq,    0 MULs; ETA 4d 14:20; 2a55d51cdf0d91cb
2018-11-04 16:26:18 RX580 89000167       20000  0.02%; 4.47 ms/sq,    0 MULs; ETA 4d 14:35; 8dcb0029e791db2a
2018-11-04 16:27:03 RX580 89000167       30000  0.03%; 4.48 ms/sq,    0 MULs; ETA 4d 14:37; 2fbd246d68f86f29
2018-11-04 16:27:47 RX580 89000167       40000  0.04%; 4.48 ms/sq,    0 MULs; ETA 4d 14:47; d85f84a6744d7090
2018-11-04 16:28:32 RX580 89000167       50000  0.06%; 4.49 ms/sq,    0 MULs; ETA 4d 14:50; afa46f7cdc5ffb7d
2018-11-04 16:29:17 RX580 89000167       60000  0.07%; 4.49 ms/sq,    0 MULs; ETA 4d 14:49; 98906e9529e4667f
2018-11-04 16:30:02 RX580 89000167       70000  0.08%; 4.49 ms/sq,    0 MULs; ETA 4d 14:49; 90b5e67934fcdcff
2018-11-04 16:30:07 RX580 Stopping, please wait..
2018-11-04 16:30:09 RX580 89000167 OK    71200  0.08%; 4.49 ms/sq,    0 MULs; ETA 4d 14:51; 14f11cfb55a43415 (check 1.96s)
2018-11-04 16:30:09 RX580 Exiting because "stop requested"
2018-11-04 16:30:09 RX580 Bye
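
When comparing logs like this one across cards, it can help to pull the fields out of the progress lines mechanically. Below is a minimal sketch; the regular expression is my own and is written against the gpuowl 5.0 progress lines shown in this thread, so other versions may need a different pattern. It deliberately skips the P-1 stage lines and any line that is not a progress report.

```python
import re

# Pattern written against the progress lines in this thread (gpuowl 5.0).
# The exponent must be a whitespace-delimited token, so "RX580" is skipped.
PROGRESS = re.compile(
    r"(?<!\S)(?P<exponent>\d+)\s+(?:(?P<status>OK|EE)\s+)?(?P<iteration>\d+)\s+"
    r"(?P<percent>[\d.]+)%;\s+(?P<ms_per_sq>[\d.]+) ms/sq,\s+(?P<muls>\d+) MULs; "
    r"ETA (?P<eta>\d+d \d+:\d+); (?P<res64>[0-9a-f]{16})"
)


def parse_progress(line):
    """Return the fields of one progress line as a dict, or None if it is not one."""
    m = PROGRESS.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    for key in ("exponent", "iteration", "muls"):
        fields[key] = int(fields[key])
    for key in ("percent", "ms_per_sq"):
        fields[key] = float(fields[key])
    return fields


sample = ("2018-11-04 16:24:52 RX580 89000167 OK      800  0.00%; 4.44 ms/sq,"
          "    0 MULs; ETA 4d 13:50; 2744231e7051f3fe (check 1.95s)")
print(parse_progress(sample))
```

The optional hardware name after the timestamp is ignored, so lines with and without the -cpu label are handled the same way.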
Old 2018-11-04, 15:58   #862
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5,437 Posts

Quote:
Originally Posted by SELROC View Post
The numbers are not exactly comparable by now:
I claim the numbers are very comparable for the same GPU across gpuowl versions, despite the ms/it vs. ms/sq difference in labeling, as long as all versions were run with no P-1 activity, as was the case for my recent m89000167 benchmark results for the RX550 and RX480. The speed difference between the RX550 and the RX480 or RX580 is expected to be a considerable ratio; the RX550 is a low-wattage, slow card. Running the same iteration span for each gpuowl version made the RX550 timings more comparable than my earlier m89m benchmarking on the RX480, which used successive iteration ranges (not the same iteration span for different gpuowl versions).

Last fiddled with by kriesel on 2018-11-04 at 16:03
Old 2018-11-04, 16:08   #863
SELROC
 

2³×5×11² Posts

Quote:
Originally Posted by kriesel View Post
I claim the numbers are very comparable for the same GPU across gpuowl versions, despite the ms/it vs. ms/sq difference in labeling, as long as all versions were run with no P-1 activity, as was the case for my recent m89000167 benchmark results for the RX550 and RX480. The speed difference between the RX550 and the RX480 or RX580 is expected to be a considerable ratio; the RX550 is a low-wattage, slow card. Running the same iteration span for each gpuowl version made the RX550 timings more comparable than my earlier m89m benchmarking on the RX480, which used successive iteration ranges (not the same iteration span for different gpuowl versions).

Our numbers come from different GPUs, different operating systems, and different drivers; those differences must be accounted for.
Old 2018-11-04, 16:27   #864
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5,437 Posts
-list fft fails if worktodo does not exist

Code:
C:\msys64\home\ken\gpuowl-compile\v5x>openowl -h
2018-11-04 10:20:41 gpuowl 5.0-df2bdf2

Command line options:

-user <name>       : specify the user name.
-cpu  <name>       : specify the hardware name.
-time              : display kernel profiling information.
-fft <size>        : specify FFT size, such as: 5000K, 4M, +2, -1.
-block <value>     : PRP GEC block size. Default 400. Smaller block is slower but detects errors sooner.
-carry long|short  : force carry type. Short carry may be faster, but requires high bits/word.
-list fft          : display a list of available FFT configurations.
-tf <bit-offset>   : enable auto trial factoring before PRP. Pass 0 to bit-offset for default TF depth.
-device <N>        : select a specific device:
 0 : Ellesmere-36x1266-@28:0.0 Radeon (TM) RX 480 Graphics
 1 : gfx804-8x1203-@3:0.0 Radeon 550 Series

C:\msys64\home\ken\gpuowl-compile\v5x>openowl -list fft
2018-11-04 10:20:56 gpuowl 5.0-df2bdf2
2018-11-04 10:20:56 -list fft
2018-11-04 10:20:56 Can't open 'worktodo.txt' (mode 'rb')
2018-11-04 10:20:56 Bye
A user might want to list the fft selection available, to create the worktodo list. Slight catch-22 here. I think it's reasonable to have -list fft act like -h, which runs whether there's a worktodo available or not, and terminates. -list fft is a form of help.
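
To make the suggestion concrete, here is a sketch of the control flow I have in mind. It is not gpuowl's actual code (gpuowl is C++); the function name and return strings are hypothetical. The only point is the ordering: information-only options are handled before the work file is opened.

```python
import os


def run(argv, worktodo="worktodo.txt"):
    """Describe what this sketch would do for the given command-line arguments."""
    # Information-only options return before any work file is touched.
    if "-h" in argv:
        return "print help"
    if "-list" in argv:
        return "print FFT list"
    # Only real work needs worktodo.txt.
    if not os.path.exists(worktodo):
        return f"Can't open '{worktodo}'"
    return "start work"


print(run(["-list", "fft"], worktodo="no-such-dir/worktodo.txt"))
print(run([], worktodo="no-such-dir/worktodo.txt"))
```

With this ordering, -list fft behaves like -h in a directory that has no worktodo.txt yet.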

Last fiddled with by kriesel on 2018-11-04 at 16:45
Old 2018-11-04, 17:13   #865
preda
 
"Mihai Preda"
Apr 2015

3·457 Posts

Quote:
Originally Posted by kriesel View Post
A user might want to list the fft selection available, to create the worktodo list. Slight catch-22 here. I think it's reasonable to have -list fft act like -h, which runs whether there's a worktodo available or not, and terminates. -list fft is a form of help.
Yes, it makes sense. I'll look into implementing that.
Old 2018-11-04, 17:16   #866
SELROC
 

3²·7·59 Posts

Quote:
Originally Posted by kriesel View Post
A user might want to list the fft selection available, to create the worktodo list. Slight catch-22 here. I think it's reasonable to have -list fft act like -h, which runs whether there's a worktodo available or not, and terminates. -list fft is a form of help.

I concur.
Old 2018-11-04, 18:10   #867
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

12475₈ Posts
gpuowl v5.0-df2bdf2 build for Win7 x64

A quick check shows it runs at least a few known primes, m216091 and up, apparently correctly.
-list fft output:
Code:
C:\msys64\home\ken\gpuowl-compile\v5x>openowl -list fft
2018-11-04 10:20:56 gpuowl 5.0-df2bdf2
2018-11-04 10:20:56 -list fft
2018-11-04 10:20:56 Can't open 'worktodo.txt' (mode 'rb')
2018-11-04 10:20:56 Bye

C:\msys64\home\ken\gpuowl-compile\v5x>openowl -list fft
2018-11-04 10:28:00 gpuowl 5.0-df2bdf2
2018-11-04 10:28:00 -list fft
2018-11-04 10:28:00    FFT  maxExp    W    H M
2018-11-04 10:28:00   0.1M    2.6M  256  256 1
2018-11-04 10:28:00   0.2M    5.2M  256  512 1
2018-11-04 10:28:00   0.2M    5.2M  512  256 1
2018-11-04 10:28:00   0.5M   10.2M 1024  256 1
2018-11-04 10:28:00   0.5M   10.2M  256 1024 1
2018-11-04 10:28:00   0.5M   10.2M  512  512 1
2018-11-04 10:28:00   0.6M   12.7M  256  256 5
2018-11-04 10:28:00   1.0M   20.0M 1024  512 1
2018-11-04 10:28:00   1.0M   20.0M  256 2048 1
2018-11-04 10:28:00   1.0M   20.0M  512 1024 1
2018-11-04 10:28:00   1.0M   20.0M 2048  256 1
2018-11-04 10:28:00   1.1M   22.5M  256  256 9
2018-11-04 10:28:00   1.2M   24.9M  256  512 5
2018-11-04 10:28:00   1.2M   24.9M  512  256 5
2018-11-04 10:28:00   2.0M   39.3M 1024 1024 1
2018-11-04 10:28:00   2.0M   39.3M  512 2048 1
2018-11-04 10:28:00   2.0M   39.3M 2048  512 1
2018-11-04 10:28:00   2.0M   39.3M 4096  256 1
2018-11-04 10:28:00   2.2M   44.1M  256  512 9
2018-11-04 10:28:00   2.2M   44.1M  512  256 9
2018-11-04 10:28:00   2.5M   48.9M 1024  256 5
2018-11-04 10:28:00   2.5M   48.9M  256 1024 5
2018-11-04 10:28:00   2.5M   48.9M  512  512 5
2018-11-04 10:28:00   4.0M   77.3M 1024 2048 1
2018-11-04 10:28:00   4.0M   77.3M 2048 1024 1
2018-11-04 10:28:00   4.0M   77.3M 4096  512 1
2018-11-04 10:28:00   4.5M   86.7M 1024  256 9
2018-11-04 10:28:00   4.5M   86.7M  256 1024 9
2018-11-04 10:28:00   4.5M   86.7M  512  512 9
2018-11-04 10:28:00   5.0M   96.1M 1024  512 5
2018-11-04 10:28:00   5.0M   96.1M  256 2048 5
2018-11-04 10:28:00   5.0M   96.1M  512 1024 5
2018-11-04 10:28:00   5.0M   96.1M 2048  256 5
2018-11-04 10:28:00   8.0M  151.8M 2048 2048 1
2018-11-04 10:28:00   8.0M  151.8M 4096 1024 1
2018-11-04 10:28:00   9.0M  170.3M 1024  512 9
2018-11-04 10:28:00   9.0M  170.3M  256 2048 9
2018-11-04 10:28:00   9.0M  170.3M  512 1024 9
2018-11-04 10:28:00   9.0M  170.3M 2048  256 9
2018-11-04 10:28:00  10.0M  188.7M 1024 1024 5
2018-11-04 10:28:00  10.0M  188.7M  512 2048 5
2018-11-04 10:28:00  10.0M  188.7M 2048  512 5
2018-11-04 10:28:00  10.0M  188.7M 4096  256 5
2018-11-04 10:28:00  16.0M  298.1M 4096 2048 1
2018-11-04 10:28:00  18.0M  334.3M 1024 1024 9
2018-11-04 10:28:00  18.0M  334.3M  512 2048 9
2018-11-04 10:28:00  18.0M  334.3M 2048  512 9
2018-11-04 10:28:00  18.0M  334.3M 4096  256 9
2018-11-04 10:28:00  20.0M  370.4M 1024 2048 5
2018-11-04 10:28:00  20.0M  370.4M 2048 1024 5
2018-11-04 10:28:00  20.0M  370.4M 4096  512 5
2018-11-04 10:28:00  36.0M  656.2M 1024 2048 9
2018-11-04 10:28:00  36.0M  656.2M 2048 1024 9
2018-11-04 10:28:00  36.0M  656.2M 4096  512 9
2018-11-04 10:28:00  40.0M  727.0M 2048 2048 5
2018-11-04 10:28:00  40.0M  727.0M 4096 1024 5
2018-11-04 10:28:00  72.0M 1287.5M 2048 2048 9
2018-11-04 10:28:00  72.0M 1287.5M 4096 1024 9
2018-11-04 10:28:00  80.0M 1426.4M 4096 2048 5
2018-11-04 10:28:00 144.0M 2525.2M 4096 2048 9
110503 fails, 132049 has errors, 216091 runs correctly to completion
(not surprising due to low # of bits/word, and not a request for yet smaller fft lengths, just observations)
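
The bits/word figures in the logs can be reproduced with simple arithmetic. In this sketch I assume the FFT size in words is 2 × W × H × M; that is inferred from the log lines (for example WIDTH=256, SMALL_HEIGHT=256, MIDDLE=1 reported as a 128K FFT), not taken from the gpuowl source. The function names are my own.

```python
def fft_words(width, height, middle):
    # Assumption inferred from the logs: FFT size = 2 * W * H * M words.
    return 2 * width * height * middle


def bits_per_word(exponent, width, height, middle):
    """Average number of bits of the exponent carried by each FFT word."""
    return exponent / fft_words(width, height, middle)


# The three small exponents from this post, all on the 128K FFT (W=256, H=256, M=1).
for exponent in (110503, 132049, 216091):
    print(exponent, f"{bits_per_word(exponent, 256, 256, 1):.2f} bits/word")

# The 5120K run from the RX580 log earlier in the thread (W=1024, H=512, M=5).
print(89000167, f"{bits_per_word(89000167, 1024, 512, 5):.2f} bits/word")
```

Rounded to two decimals, these agree with the 0.84, 1.01, 1.65 and 16.98 bits/word printed in the logs, which shows how far below the usual range the small exponents sit.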

Code:
C:\msys64\home\ken\gpuowl-compile\v5x>openowl
2018-11-04 10:33:00 gpuowl 5.0-df2bdf2
2018-11-04 10:33:00 110503 FFT 128K: Width 64x4, Height 64x4; 0.84 bits/word
2018-11-04 10:33:00 using long carry kernels
2018-11-04 10:33:00 Ellesmere-36x1266-@28:0.0 Radeon (TM) RX 480 Graphics
2018-11-04 10:33:03 OpenCL compilation in 2391 ms, with "-DEXP=110503u -DWIDTH=256u -DSMALL_HEIGHT=256u -DMIDDLE=1u  -I. -cl-fast-relaxed-math -cl-std=CL2.0 "
2018-11-04 10:33:03 110503.owl not found, starting from the beginning.
2018-11-04 10:33:03 powerSmooth(110503, 10000) has 14484 bits
Assertion failed!

Program: C:\msys64\home\ken\gpuowl-compile\v5x\openowl.exe
File: state.cpp, Line 24

Expression: 0 <= w && w < (1 << nBits)

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

C:\msys64\home\ken\gpuowl-compile\v5x>openowl
2018-11-04 10:33:18 gpuowl 5.0-df2bdf2
2018-11-04 10:33:18 132049 FFT 128K: Width 64x4, Height 64x4; 1.01 bits/word
2018-11-04 10:33:18 using long carry kernels
2018-11-04 10:33:19 Ellesmere-36x1266-@28:0.0 Radeon (TM) RX 480 Graphics
2018-11-04 10:33:21 OpenCL compilation in 2432 ms, with "-DEXP=132049u -DWIDTH=256u -DSMALL_HEIGHT=256u -DMIDDLE=1u  -I. -cl-fast-relaxed-math -cl-std=CL2.0 "
2018-11-04 10:33:21 132049.owl not found, starting from the beginning.
2018-11-04 10:33:21 powerSmooth(132049, 10000) has 14484 bits
2018-11-04 10:33:23 132049 P-1    10000 69.04%; 0.16 ms/sq,    0 MULs; ETA 0d 00:00; d8645cee5574c284
2018-11-04 10:33:24 132049.owl loaded: k 0, B1 10000, block 400, res64 6379d2d731e5e48e, stage 1, baseBits 0
2018-11-04 10:33:24 132049 B1=10000 B2=70000 (effective B2=70000) selected 4142 P-1 points in 0.01s
2018-11-04 10:33:24 132049 EE      800  0.60%; 0.16 ms/sq,    1 MULs; ETA 0d 00:00; da4be711d7cf309d (check 0.08s)
2018-11-04 10:33:24 132049.owl loaded: k 0, B1 10000, block 400, res64 6379d2d731e5e48e, stage 1, baseBits 0
2018-11-04 10:33:24 132049 EE      800  0.60%; 0.28 ms/sq,    1 MULs; ETA 0d 00:01; da4be711d7cf309d (check 0.08s)
2018-11-04 10:33:24 132049.owl loaded: k 0, B1 10000, block 400, res64 6379d2d731e5e48e, stage 1, baseBits 0
2018-11-04 10:33:25 132049 EE      800  0.60%; 0.28 ms/sq,    1 MULs; ETA 0d 00:01; da4be711d7cf309d (check 0.08s)
2018-11-04 10:33:25 3 sequential errors, will stop.
2018-11-04 10:33:25 Exiting because "too many errors"
2018-11-04 10:33:25 Bye

C:\msys64\home\ken\gpuowl-compile\v5x>openowl
2018-11-04 10:33:42 gpuowl 5.0-df2bdf2
2018-11-04 10:33:42 216091 FFT 128K: Width 64x4, Height 64x4; 1.65 bits/word
2018-11-04 10:33:42 using long carry kernels
2018-11-04 10:33:43 Ellesmere-36x1266-@28:0.0 Radeon (TM) RX 480 Graphics
2018-11-04 10:33:46 OpenCL compilation in 2434 ms, with "-DEXP=216091u -DWIDTH=256u -DSMALL_HEIGHT=256u -DMIDDLE=1u  -I. -cl-fast-relaxed-math -cl-std=CL2.0 "
2018-11-04 10:33:46 216091.owl not found, starting from the beginning.
2018-11-04 10:33:46 powerSmooth(216091, 10000) has 14484 bits
2018-11-04 10:33:48 216091 P-1    10000 69.04%; 0.16 ms/sq,    0 MULs; ETA 0d 00:00; 9e7518aa03950b26
2018-11-04 10:33:48 216091.owl loaded: k 0, B1 10000, block 400, res64 d8a71ba2415f2773, stage 1, baseBits 0
2018-11-04 10:33:48 216091 B1=10000 B2=130000 (effective B2=130000) selected 7611 P-1 points in 0.02s
2018-11-04 10:33:49 216091 OK      800  0.37%; 0.17 ms/sq,    1 MULs; ETA 0d 00:01; 5646ce8634d76602 (check 0.08s)
2018-11-04 10:33:49 216091 GCD no factor (0.07s)
2018-11-04 10:33:50 216091       10000  4.62%; 0.16 ms/sq,  287 MULs; ETA 0d 00:01; 6d0028f1a3744d15
2018-11-04 10:33:52 216091       20000  9.24%; 0.16 ms/sq, 1067 MULs; ETA 0d 00:01; e4c865e7e023a233
2018-11-04 10:33:54 216091       30000 13.86%; 0.16 ms/sq, 1053 MULs; ETA 0d 00:01; 52c18ca42b2e5a40
2018-11-04 10:33:55 216091       40000 18.48%; 0.16 ms/sq,  990 MULs; ETA 0d 00:01; 003716dc307b3768
2018-11-04 10:33:57 216091       50000 23.11%; 0.16 ms/sq,  882 MULs; ETA 0d 00:00; f19c985e00f9ab66
2018-11-04 10:33:59 216091       60000 27.73%; 0.16 ms/sq,  794 MULs; ETA 0d 00:00; 6679d5a415aece9e
2018-11-04 10:34:01 216091       70000 32.35%; 0.16 ms/sq,  566 MULs; ETA 0d 00:00; da83220b76e8b55b
2018-11-04 10:34:02 216091       80000 36.97%; 0.16 ms/sq,  339 MULs; ETA 0d 00:00; 3700d4ccd97a326a
2018-11-04 10:34:04 216091       90000 41.59%; 0.16 ms/sq,  346 MULs; ETA 0d 00:00; c0472c1f976aa2c1
2018-11-04 10:34:05 216091      100000 46.21%; 0.16 ms/sq,  329 MULs; ETA 0d 00:00; d3f402b7fb9adb65
2018-11-04 10:34:07 216091      110000 50.83%; 0.16 ms/sq,  304 MULs; ETA 0d 00:00; a625b1471a8e6481
2018-11-04 10:34:09 216091      120000 55.45%; 0.16 ms/sq,  328 MULs; ETA 0d 00:00; fe8081748d1a088e
2018-11-04 10:34:10 216091      130000 60.07%; 0.16 ms/sq,  325 MULs; ETA 0d 00:00; b38dbc5b3d73c584
2018-11-04 10:34:12 216091      140000 64.70%; 0.16 ms/sq,    0 MULs; ETA 0d 00:00; a71c266b43cc9171
2018-11-04 10:34:14 216091      150000 69.32%; 0.16 ms/sq,    0 MULs; ETA 0d 00:00; a6a2e15e86701788
2018-11-04 10:34:15 216091 OK   160000 73.94%; 0.16 ms/sq,    0 MULs; ETA 0d 00:00; 6cc0b6cbc453946a (check 0.09s)
2018-11-04 10:34:17 216091      170000 78.56%; 0.16 ms/sq,    0 MULs; ETA 0d 00:00; 16838b06bd004e23
2018-11-04 10:34:18 216091      180000 83.18%; 0.16 ms/sq,    0 MULs; ETA 0d 00:00; 38a44921f392a2fc
2018-11-04 10:34:20 216091      190000 87.80%; 0.16 ms/sq,    0 MULs; ETA 0d 00:00; 63580cfe1f80b303
2018-11-04 10:34:22 216091      200000 92.42%; 0.16 ms/sq,    0 MULs; ETA 0d 00:00; 7c3f2446e5e6fd09
2018-11-04 10:34:23 216091      210000 97.04%; 0.16 ms/sq,    0 MULs; ETA 0d 00:00; b6e9bb0a7c8ede6b
2018-11-04 10:34:24 PP   216090 / 216091, d8a71ba2415f2773 (base d8a71ba2415f2773)
2018-11-04 10:34:24 216091 OK   216400 100.00%; 0.17 ms/sq,    0 MULs; ETA 0d 00:00; e898188ce32335d4 (check 0.09s)
2018-11-04 10:34:24 {"exponent":"216091", "worktype":"PRP,P-1", "status":"P", "program":{"name":"gpuowl", "version":"5.0-df2bdf2"}, "timestamp":"2018-11-04 16:34:24 UTC", "aid":"0", "fft-length":131072, "res64":"d8a71ba2415f2773", "b2":"130000", "base":{"b1":"10000", "bias":{"2":19}, "res64":"d8a71ba2415f2773"}}
B2 bounds appear to be correctly reported
Code:
{"exponent":"216091", "worktype":"PRP,P-1", "status":"P", "program":{"name":"gpuowl", "version":"5.0-df2bdf2"}, "timestamp":"2018-11-04 16:34:24 UTC", "aid":"0", "fft-length":131072, "res64":"d8a71ba2415f2773", "b2":"130000", "base":{"b1":"10000", "bias":{"2":19}, "res64":"d8a71ba2415f2773"}}
{"exponent":"756839", "worktype":"PRP,P-1", "status":"P", "program":{"name":"gpuowl", "version":"5.0-df2bdf2"}, "timestamp":"2018-11-04 16:36:55 UTC", "aid":"0", "fft-length":131072, "res64":"0e12589efe2be6c5", "b2":"500000", "base":{"b1":"20000", "bias":{"2":19}, "res64":"0e12589efe2be6c5"}}
{"exponent":"859433", "worktype":"PRP,P-1", "status":"P", "program":{"name":"gpuowl", "version":"5.0-df2bdf2"}, "timestamp":"2018-11-04 16:39:23 UTC", "aid":"0", "fft-length":131072, "res64":"ac86e7a51cecadb0", "b2":"580000", "base":{"b1":"20000", "bias":{"2":19}, "res64":"ac86e7a51cecadb0"}}
m89000167 timing 4.521 ms/sq with no P-1 on RX480, Win7 x64, Adrenalin 18.10.2 driver.
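
For checking the reported bounds against the log without reading the JSON by eye, a short sketch follows. It uses the first result line above verbatim; the function name and the choice of fields are my own, and nothing here is part of gpuowl or of the server's result handling.

```python
import json

# First result line from this post, copied verbatim.
line = ('{"exponent":"216091", "worktype":"PRP,P-1", "status":"P", '
        '"program":{"name":"gpuowl", "version":"5.0-df2bdf2"}, '
        '"timestamp":"2018-11-04 16:34:24 UTC", "aid":"0", "fft-length":131072, '
        '"res64":"d8a71ba2415f2773", "b2":"130000", '
        '"base":{"b1":"10000", "bias":{"2":19}, "res64":"d8a71ba2415f2773"}}')


def summarize(result_line):
    """Pick out the exponent, status, FFT length and P-1 bounds of one result line."""
    r = json.loads(result_line)
    return {
        "exponent": int(r["exponent"]),
        "status": r["status"],
        "fft_length": r["fft-length"],
        "b1": int(r["base"]["b1"]),
        "b2": int(r["b2"]),
    }


print(summarize(line))
```

For 216091 the extracted bounds are B1=10000 and B2=130000, the same values the log printed when it selected the P-1 points.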
Attached Files
File Type: 7z gpuowl-v50-df2bdf2.7z (373.0 KB, 96 views)
Old 2018-11-04, 21:13   #868
preda
 
"Mihai Preda"
Apr 2015

1371₁₀ Posts

Quote:
Originally Posted by kriesel View Post
Some radix-3 transforms, and maybe 7 if it helps speed.
6M and 12M in particular.
It's a particularly long jump between 20M and 36M, so adding 24M or 32M or both would be good.
Similarly between 40M and 72M, 48M or 64M or both.
I just added an FFT-3 "middle" step.
Old 2018-11-04, 23:18   #869
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

12475₈ Posts
V5.0-9c13870 build for Win 7 x64 OpenCL on AMD

Quote:
Originally Posted by preda View Post
I just added an FFT-3 "middle" step.
Code:
$ make openowl-win
g++ -std=c++17 -O2 -DREV=\"9c13870\" -Wall Worktodo.cpp Result.cpp common.cpp gpuowl.cpp Gpu.cpp clwrap.cpp Task.cpp checkpoint.cpp timeutil.cpp Args.cpp GCD.cpp Primes.cpp Stats.cpp state.cpp Signal.cpp -o openowl -lOpenCL -lgmp -pthread -L/opt/rocm/opencl/lib/x86_64 -L/opt/amdgpu-pro/lib/x86_64-linux-gnu -L/c/Windows/System32 -L. -static
Gpu.cpp: In member function 'PRPState Gpu::loadPRP(u32, u32, u32)':
Gpu.cpp:557:9: warning: unknown conversion type character 'l' in format [-Wformat=]
     log("%u EE loaded: %d, B1 %u, blockSize %d, %016llx (expected %016llx)\n",
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Gpu.cpp:557:9: warning: unknown conversion type character 'l' in format [-Wformat=]
Gpu.cpp:557:9: warning: too many arguments for format [-Wformat-extra-args]
Gpu.cpp: In member function 'PRPResult Gpu::isPrimePRP(u32, const Args&, u32, u32)':
Gpu.cpp:690:11: warning: unknown conversion type character 'l' in format [-Wformat=]
       log("%s %8d / %d, %016llx (base %016llx)\n", isPrime ? "PP" : "CC", kEnd, E, finalRes64, residue(base));
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Gpu.cpp:690:11: warning: unknown conversion type character 'l' in format [-Wformat=]
Gpu.cpp:690:11: warning: too many arguments for format [-Wformat-extra-args]
checkpoint.cpp: In member function 'void PRPState::loadInt(u32, u32, u32)':
checkpoint.cpp:167:7: warning: unknown conversion type character 'l' in format [-Wformat=]
   log("%s loaded: k %u, B1 %u, block %u, res64 %016llx, stage %u, baseBits %u\n",
       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
checkpoint.cpp:167:7: warning: format '%u' expects argument of type 'unsigned int', but argument 6 has type 'u64' {aka 'long long unsigned int'} [-Wformat=]
checkpoint.cpp:167:7: warning: too many arguments for format [-Wformat-extra-args]
Lots of new fft lengths due to the x3: 0.4, 0.8, 1.5, 3, 6, 12, 24, 48M. It's getting to be a large list. Please change the -list fft to not require a worktodo.txt.
Code:
C:\msys64\home\ken\gpuowl-compile\v5.0-9c13870>openowl -h
2018-11-04 16:07:53 gpuowl 5.0-9c13870

Command line options:

-user <name>       : specify the user name.
-cpu  <name>       : specify the hardware name.
-time              : display kernel profiling information.
-fft <size>        : specify FFT size, such as: 5000K, 4M, +2, -1.
-block <value>     : PRP GEC block size. Default 400. Smaller block is slower but detects errors sooner.
-carry long|short  : force carry type. Short carry may be faster, but requires high bits/word.
-list fft          : display a list of available FFT configurations.
-tf <bit-offset>   : enable auto trial factoring before PRP. Pass 0 to bit-offset for default TF depth.
-device <N>        : select a specific device:
 0 : Ellesmere-36x1266-@28:0.0 Radeon (TM) RX 480 Graphics
 1 : gfx804-8x1203-@3:0.0 Radeon 550 Series

C:\msys64\home\ken\gpuowl-compile\v5.0-9c13870>openowl -list fft
2018-11-04 16:08:03 gpuowl 5.0-9c13870
2018-11-04 16:08:03 -list fft
2018-11-04 16:08:03 Can't open 'worktodo.txt' (mode 'rb')
2018-11-04 16:08:03 Bye

C:\msys64\home\ken\gpuowl-compile\v5.0-9c13870>copy ..\v3.8\worktodo.txt .
        1 file(s) copied.

C:\msys64\home\ken\gpuowl-compile\v5.0-9c13870>openowl -list fft
2018-11-04 16:08:34 gpuowl 5.0-9c13870
2018-11-04 16:08:34 -list fft
2018-11-04 16:08:34    FFT  maxExp    W    H M
2018-11-04 16:08:34   0.1M    2.6M  256  256 1
2018-11-04 16:08:34   0.2M    5.2M  256  512 1
2018-11-04 16:08:34   0.2M    5.2M  512  256 1
2018-11-04 16:08:34   0.4M    7.7M  256  256 3
2018-11-04 16:08:34   0.5M   10.2M 1024  256 1
2018-11-04 16:08:34   0.5M   10.2M  256 1024 1
2018-11-04 16:08:34   0.5M   10.2M  512  512 1
2018-11-04 16:08:34   0.6M   12.7M  256  256 5
2018-11-04 16:08:34   0.8M   15.1M  256  512 3
2018-11-04 16:08:34   0.8M   15.1M  512  256 3
2018-11-04 16:08:34   1.0M   20.0M 1024  512 1
2018-11-04 16:08:34   1.0M   20.0M  256 2048 1
2018-11-04 16:08:34   1.0M   20.0M  512 1024 1
2018-11-04 16:08:34   1.0M   20.0M 2048  256 1
2018-11-04 16:08:34   1.1M   22.5M  256  256 9
2018-11-04 16:08:34   1.2M   24.9M  256  512 5
2018-11-04 16:08:34   1.2M   24.9M  512  256 5
2018-11-04 16:08:34   1.5M   29.7M 1024  256 3
2018-11-04 16:08:34   1.5M   29.7M  256 1024 3
2018-11-04 16:08:35   1.5M   29.7M  512  512 3
2018-11-04 16:08:35   2.0M   39.3M 1024 1024 1
2018-11-04 16:08:35   2.0M   39.3M  512 2048 1
2018-11-04 16:08:35   2.0M   39.3M 2048  512 1
2018-11-04 16:08:35   2.0M   39.3M 4096  256 1
2018-11-04 16:08:35   2.2M   44.1M  256  512 9
2018-11-04 16:08:35   2.2M   44.1M  512  256 9
2018-11-04 16:08:35   2.5M   48.9M 1024  256 5
2018-11-04 16:08:35   2.5M   48.9M  256 1024 5
2018-11-04 16:08:35   2.5M   48.9M  512  512 5
2018-11-04 16:08:35   3.0M   58.4M 1024  512 3
2018-11-04 16:08:35   3.0M   58.4M  256 2048 3
2018-11-04 16:08:35   3.0M   58.4M  512 1024 3
2018-11-04 16:08:35   3.0M   58.4M 2048  256 3
2018-11-04 16:08:35   4.0M   77.3M 1024 2048 1
2018-11-04 16:08:35   4.0M   77.3M 2048 1024 1
2018-11-04 16:08:35   4.0M   77.3M 4096  512 1
2018-11-04 16:08:35   4.5M   86.7M 1024  256 9
2018-11-04 16:08:35   4.5M   86.7M  256 1024 9
2018-11-04 16:08:35   4.5M   86.7M  512  512 9
2018-11-04 16:08:35   5.0M   96.1M 1024  512 5
2018-11-04 16:08:35   5.0M   96.1M  256 2048 5
2018-11-04 16:08:35   5.0M   96.1M  512 1024 5
2018-11-04 16:08:35   5.0M   96.1M 2048  256 5
2018-11-04 16:08:35   6.0M  114.7M 1024 1024 3
2018-11-04 16:08:35   6.0M  114.7M  512 2048 3
2018-11-04 16:08:35   6.0M  114.7M 2048  512 3
2018-11-04 16:08:35   6.0M  114.7M 4096  256 3
2018-11-04 16:08:35   8.0M  151.8M 2048 2048 1
2018-11-04 16:08:35   8.0M  151.8M 4096 1024 1
2018-11-04 16:08:35   9.0M  170.3M 1024  512 9
2018-11-04 16:08:35   9.0M  170.3M  256 2048 9
2018-11-04 16:08:35   9.0M  170.3M  512 1024 9
2018-11-04 16:08:35   9.0M  170.3M 2048  256 9
2018-11-04 16:08:35  10.0M  188.7M 1024 1024 5
2018-11-04 16:08:35  10.0M  188.7M  512 2048 5
2018-11-04 16:08:35  10.0M  188.7M 2048  512 5
2018-11-04 16:08:35  10.0M  188.7M 4096  256 5
2018-11-04 16:08:35  12.0M  225.3M 1024 2048 3
2018-11-04 16:08:35  12.0M  225.3M 2048 1024 3
2018-11-04 16:08:35  12.0M  225.3M 4096  512 3
2018-11-04 16:08:35  16.0M  298.1M 4096 2048 1
2018-11-04 16:08:35  18.0M  334.3M 1024 1024 9
2018-11-04 16:08:35  18.0M  334.3M  512 2048 9
2018-11-04 16:08:35  18.0M  334.3M 2048  512 9
2018-11-04 16:08:35  18.0M  334.3M 4096  256 9
2018-11-04 16:08:35  20.0M  370.4M 1024 2048 5
2018-11-04 16:08:35  20.0M  370.4M 2048 1024 5
2018-11-04 16:08:35  20.0M  370.4M 4096  512 5
2018-11-04 16:08:35  24.0M  442.3M 2048 2048 3
2018-11-04 16:08:35  24.0M  442.3M 4096 1024 3
2018-11-04 16:08:35  36.0M  656.2M 1024 2048 9
2018-11-04 16:08:35  36.0M  656.2M 2048 1024 9
2018-11-04 16:08:35  36.0M  656.2M 4096  512 9
2018-11-04 16:08:35  40.0M  727.0M 2048 2048 5
2018-11-04 16:08:35  40.0M  727.0M 4096 1024 5
2018-11-04 16:08:35  48.0M  868.1M 4096 2048 3
2018-11-04 16:08:35  72.0M 1287.5M 2048 2048 9
2018-11-04 16:08:35  72.0M 1287.5M 4096 1024 9
2018-11-04 16:08:35  80.0M 1426.4M 4096 2048 5
2018-11-04 16:08:35 144.0M 2525.2M 4096 2048 9
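
The list above looks like the full cross product of the W, H and M values that appear in it, with the FFT size equal to 2 × W × H × M words. That is inferred from the output (and from the 393216 fft-length reported for 6972593 with W=256, H=256, M=3), not from the source code, so treat this sketch as an assumption. The names are my own.

```python
# Values of W, H and M that appear in the -list fft output above.
WIDTHS = (256, 512, 1024, 2048, 4096)
HEIGHTS = (256, 512, 1024, 2048)
MIDDLES = (1, 3, 5, 9)


def fft_sizes_m(middles=MIDDLES):
    """Distinct FFT sizes in units of M (2**20 words), assuming size = 2*W*H*M."""
    sizes = {2 * w * h * m / 2**20
             for w in WIDTHS for h in HEIGHTS for m in middles}
    return sorted(sizes)


print(len(WIDTHS) * len(HEIGHTS) * len(MIDDLES))  # number of configurations
print(len(fft_sizes_m()))                         # number of distinct sizes
print(fft_sizes_m((3,)))                          # sizes added by the x3 middle step
```

Under that assumption the M=3 sizes are 0.375, 0.75, 1.5, 3, 6, 12, 24 and 48M, which the program prints rounded to one decimal as 0.4 and 0.8 for the two smallest.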
On RX480, Adrenalin 18.10.2 driver:
Code:
...
2018-11-04 17:05:29 6972593     5960000 85.47%; 0.38 ms/sq,    0 MULs; ETA 0d 00:06; 9192684b7c1359cd
2018-11-04 17:05:33 6972593     5970000 85.62%; 0.37 ms/sq,    0 MULs; ETA 0d 00:06; c2f9539990824bd3
2018-11-04 17:05:36 6972593     5980000 85.76%; 0.37 ms/sq,    0 MULs; ETA 0d 00:06; 0e55f43c273e071f
2018-11-04 17:05:40 6972593     5990000 85.91%; 0.37 ms/sq,    0 MULs; ETA 0d 00:06; 1b238cbce00977ec
2018-11-04 17:05:44 6972593     6000000 86.05%; 0.38 ms/sq,    0 MULs; ETA 0d 00:06; 226f17e463c15782
2018-11-04 17:05:48 6972593     6010000 86.19%; 0.37 ms/sq,    0 MULs; ETA 0d 00:06; 37cb92ee936c55d2
2018-11-04 17:05:51 6972593     6020000 86.34%; 0.37 ms/sq,    0 MULs; ETA 0d 00:06; c1966294670fbb2f
2018-11-04 17:05:55 6972593     6030000 86.48%; 0.37 ms/sq,    0 MULs; ETA 0d 00:06; f03d90475b5f5672
2018-11-04 17:05:59 6972593     6040000 86.62%; 0.37 ms/sq,    0 MULs; ETA 0d 00:06; 3130d3e8833a08d3
2018-11-04 17:06:03 6972593     6050000 86.77%; 0.38 ms/sq,    0 MULs; ETA 0d 00:06; fd70900ff37a05c4
2018-11-04 17:06:06 6972593     6060000 86.91%; 0.37 ms/sq,    0 MULs; ETA 0d 00:06; a0ececf155185dba
2018-11-04 17:06:10 6972593     6070000 87.05%; 0.37 ms/sq,    0 MULs; ETA 0d 00:06; e869127794b701de
2018-11-04 17:06:14 6972593 OK  6080000 87.20%; 0.37 ms/sq,    0 MULs; ETA 0d 00:06; d11c156803bb5922 (check 0.19s)
...
Code:
{"exponent":"6972593", "worktype":"PRP,P-1", "status":"P", "program":{"name":"gpuowl", "version":"5.0-9c13870"}, "timestamp":"2018-11-04 23:11:49 UTC", "aid":"0", "fft-length":393216, "res64":"bc16906ca9e08ff7", "b2":"1440000", "base":{"b1":"80000", "bias":{"2":19}, "res64":"bc16906ca9e08ff7"}}
Attached Files
File Type: 7z gpuowl-v5.0-9c13870.7z (373.1 KB, 74 views)
All times are UTC.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.