2018-08-11, 19:34   #565
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

19·311 Posts
Transform selection in V3.5 OpenOwL

Code:
C:\msys64\home\ken\v35test>openowl-v35-457601f-w64 -h
gpuowl-OpenCL 3.5-457601f

Command line options:

-user <name>       : specify the user name.
-cpu  <name>       : specify the hardware name.
-time              : display kernel profiling information.
-fft <size>        : specify FFT size, such as: 5000K, 4M, +2, -1.
-block 100|200|400 : select PRP-check block size. Smaller block is slower but detects errors earlier.
-carry long|short  : force carry type. Short carry may be faster, but requires high bits/word.
-list fft          : display a list of available FFT configurations.
-device <N>        : select a specific device:
 0 : Ellesmere-36x1266-@28:0.0 Radeon (TM) RX 480 Graphics
 1 : gfx804-8x1203-@3:0.0 Radeon 550 Series
 2 : Intel(R) Xeon(R) CPU           E5645  @ 2.40GHz-12x2394-@0:0.0
Code:
C:\msys64\home\ken\v35test>openowl-v35-457601f-w64 -list fft
gpuowl-OpenCL 3.5-457601f
   FFT  maxExp    W    H M
  0.5M   10.3M  512  512 1
  1.0M   20.3M 1024  512 1
  1.0M   20.3M  512 1024 1
  2.0M   39.8M 1024 1024 1
  2.0M   39.8M  512 2048 1
  2.0M   39.8M 2048  512 1
  2.5M   49.4M  512  512 5
  4.0M   78.0M 1024 2048 1
  4.0M   78.0M 2048 1024 1
  4.0M   78.0M 4096  512 1
  4.5M   87.5M  512  512 9
  5.0M   96.9M 1024  512 5
  5.0M   96.9M  512 1024 5
  8.0M  153.0M 2048 2048 1
  8.0M  153.0M 4096 1024 1
  9.0M  171.6M 1024  512 9
  9.0M  171.6M  512 1024 9
 10.0M  190.0M 1024 1024 5
 10.0M  190.0M  512 2048 5
 10.0M  190.0M 2048  512 5
 16.0M  300.0M 4096 2048 1
 18.0M  336.3M 1024 1024 9
 18.0M  336.3M  512 2048 9
 18.0M  336.3M 2048  512 9
 20.0M  372.5M 1024 2048 5
 20.0M  372.5M 2048 1024 5
 20.0M  372.5M 4096  512 5
 36.0M  659.0M 1024 2048 9
 36.0M  659.0M 2048 1024 9
 36.0M  659.0M 4096  512 9
 40.0M  730.0M 2048 2048 5
 40.0M  730.0M 4096 1024 5
 72.0M 1290.9M 2048 2048 9
 72.0M 1290.9M 4096 1024 9
 80.0M 1429.8M 4096 2048 5
144.0M 2527.5M 4096 2048 9

FFT 4096K: Width 1024 (256x4), Height 2048 (256x8); 18.48 bits/word
Note: using short carry kernels
 ...
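Reading the list above, the FFT column looks consistent with 2 x W x H x M words. That is my inference from the listed values, not something the help text states. A quick check in Python against a few rows copied from the list (the function name is just for illustration):

```python
# Rows copied from the -list fft output above: (listed FFT size in M, W, H, M).
ROWS = [
    (0.5, 512, 512, 1),
    (2.5, 512, 512, 5),
    (4.0, 1024, 2048, 1),
    (4.5, 512, 512, 9),
    (18.0, 2048, 512, 9),
    (144.0, 4096, 2048, 9),
]


def fft_size_m(w, h, m):
    """FFT size in units of M (2**20 words), ASSUMING size = 2 * W * H * M.

    The factor of 2 is inferred from the table, not taken from the program.
    """
    return 2 * w * h * m / 2**20


for listed, w, h, m in ROWS:
    computed = fft_size_m(w, h, m)
    print(f"W={w:4d} H={h:4d} M={m}: computed {computed:6.1f}M, listed {listed:6.1f}M")
```

If that relation holds, flavors of the same length differ only in how the same word count is split between W, H and M.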
But how does a user select among transforms of the same size? (Most lengths above come in more than one flavor, differing in W, H and M.)

Presumably the program selects the minimum adequate length for speed. But how does it choose one flavor of a given length over another?

Are there known speed differences, or other differences, between flavors of the same length?
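To make the question concrete, here is a small Python sketch of the "minimum adequate length" guess. It is only my assumption about the selection rule, not gpuowl's actual logic, and the names are made up for illustration. It uses a few rows copied from the list above, and it shows that the rule still leaves a tie among flavors:

```python
# (FFT size in M, maxExp in M, W, H, M), copied from the -list fft output above.
CONFIGS = [
    (2.0, 39.8, 1024, 1024, 1),
    (2.0, 39.8, 512, 2048, 1),
    (2.0, 39.8, 2048, 512, 1),
    (2.5, 49.4, 512, 512, 5),
    (4.0, 78.0, 1024, 2048, 1),
    (4.0, 78.0, 2048, 1024, 1),
    (4.0, 78.0, 4096, 512, 1),
    (4.5, 87.5, 512, 512, 9),
    (5.0, 96.9, 1024, 512, 5),
    (5.0, 96.9, 512, 1024, 5),
]


def candidates(exponent_m):
    """Return every flavor of the smallest FFT size whose maxExp covers the exponent.

    HYPOTHETICAL rule: "smallest adequate length wins". It does not say
    which flavor to prefer when several share that length.
    """
    adequate = [c for c in CONFIGS if c[1] >= exponent_m]
    if not adequate:
        return []
    smallest = min(c[0] for c in adequate)
    return [c for c in adequate if c[0] == smallest]


for exponent_m in (77.5, 80.0):
    print(f"exponent {exponent_m}M:")
    for size, max_exp, w, h, m in candidates(exponent_m):
        print(f"  {size:4.1f}M  W={w} H={h} M={m}")
```

For an exponent near 77.5M this leaves three 4.0M flavors to pick from, which is exactly the choice I am asking about.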

Last fiddled with by kriesel on 2018-08-11 at 19:38