mersenneforum.org

gpuOwL: an OpenCL program for Mersenne primality testing (https://www.mersenneforum.org/showthread.php?t=22204)

preda 2018-01-21 20:53

[QUOTE=kriesel;478003]The behavior I see in my logs is that it starts with 1k iterations.
I have sequences that differ near the start; iteration counts differ from line to line by 1k, 4k, 5k, 10k, 20k, ..., 50k, ..., 100k.
[CODE]OK 0 / 76812401 [ 0.00%], 0.00 ms/it; ETA 0d 00:00; 0000000000000003 [2018-01-19 13:02:44 Central Standard Time]
OK 1000 / 76812401 [ 0.00%], 11.96 ms/it; ETA 10d 15:06; aadc1acf24bf7d60 [2018-01-19 13:03:04 Central Standard Time]
OK 5000 / 76812401 [ 0.01%], 11.94 ms/it; ETA 10d 14:44; 3db0edb3db578456 [2018-01-19 13:03:59 Central Standard Time]
[/CODE]Is it possible to save the ending Gerbicz step size from one run to begin the next? If not, would you consider it as an option, for hardware and installations that are stable? There's a slight execution speed advantage, and it reduces screen clutter. (My test run has worked its way up to 200k in under 24 hours, with no errors flagged yet, but if halted and restarted will reset to 1k interval and start the slow climb again from there. It should be very stable, as it's a brand new GPU on a fresh Windows install, patched to current, then gpuowl installed and run.)[/QUOTE]

Let me explain my thinking here:
- after startup, the user expects something to come up on screen, to get feedback that it's working.
- after startup, a check should be done quickly, to validate early that it's not all broken.
- after startup, the "gerbicz memory" should be reset because the hardware situation may be different, such as: different GPU, different GPU setup (clocks), different fan setup, case, etc.

The ramp-up from 1K to 10K is fast (assuming no errors). The overhead difference between 10K and 200K is minor, and the log size is not big enough over that ramp-up to be a big problem.

Now, let's see the opposite where there is memory between start-ups:
- the user starts with the GPU "too hot", and accumulates, say, 10 Gerbicz errors during the night.
- the user realizes the problem, fixes the cooling, and restarts. But now it won't ramp up (only very slowly) because of the memory of those past 10 errors.
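To put rough numbers on the ramp-up cost, here is a minimal C++ sketch of a hypothetical interval policy: double the Gerbicz check interval after each passing check (capped at 200K), drop back to 1K after a failed check, and optionally start from a saved value (the opt-in behavior requested in the quoted post). This is an assumption for illustration, not gpuowl's actual schedule; the logs quoted above step through 1K, 5K, 10K, 20K, and so on. The names `CheckIntervalPolicy`, `startAt`, and `iterationsUntilInterval` are made up.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical adaptive Gerbicz check interval (NOT gpuowl's code).
struct CheckIntervalPolicy {
  std::uint32_t minInterval = 1000;
  std::uint32_t maxInterval = 200000;
  std::uint32_t current = 1000;  // reset to the minimum on every startup

  // Called after each Gerbicz check with its outcome.
  void onCheck(bool passed) {
    if (passed) {
      current = std::min(current * 2, maxInterval);  // grow while checks pass
    } else {
      current = minInterval;  // after an error, check again soon
    }
  }

  // Opt-in: resume from an interval saved by a previous run, clamped to range.
  void startAt(std::uint32_t saved) {
    current = std::max(minInterval, std::min(saved, maxInterval));
  }
};

// Iterations performed, assuming every check passes, before the interval
// first reaches at least `target`.
std::uint64_t iterationsUntilInterval(CheckIntervalPolicy p, std::uint32_t target) {
  assert(target <= p.maxInterval);  // otherwise the loop would never end
  std::uint64_t iters = 0;
  while (p.current < target) {
    iters += p.current;  // run one block of `current` iterations
    p.onCheck(true);     // the check at the end of the block passes
  }
  return iters;
}
```

Under this doubling assumption, reaching a 10K interval costs 1K + 2K + 4K + 8K = 15,000 iterations, about 3 minutes at the ~12 ms/it seen in the logs above; reaching the 200K cap costs 255,000 iterations, roughly 51 minutes.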

preda 2018-01-21 21:18

[QUOTE=kriesel;477927](Later...) Whoa, what happened in that middle line? Over a minute per iteration computed (>5000 times the preceding and following). Momentarily projecting runtime of over 150 years!

[CODE]OK 80000 / 76812401 [ 0.10%], 11.99 ms/it; ETA 10d 15:27; 6ee0f8a8a97d7812 [2018-01-19 13:19:36 Central Standard Time]
OK 100000 / 76812401 [ 0.13%], 63362.75 ms/it; ETA 56258d 15:28; 3fb24c04ec7569db [2018-01-19 13:23:43 Central Standard Time]
OK 150000 / 76812401 [ 0.20%], 11.99 ms/it; ETA 10d 15:25; 10bf91703f69c302 [2018-01-19 13:33:50 Central Standard Time][/CODE][/QUOTE]

I committed an attempted fix for the time-per-iteration overflow.
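The timestamps on the 80000 and 100000 lines are 247 s apart (13:19:36 to 13:23:43), about 12.35 ms/it for those 20,000 iterations, so the 63362.75 ms/it figure was a reporting glitch rather than a real slowdown. The cause and the committed fix aren't described here. As one general way to make this kind of report robust, the hypothetical C++ helpers below (`msPerIteration`, `formatEta`, not gpuowl's code) take the elapsed time from the monotonic `std::chrono::steady_clock` and do the division in floating point.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: average milliseconds per iteration over an interval
// measured with a monotonic clock; converted to double before dividing.
double msPerIteration(std::chrono::steady_clock::duration elapsed, std::uint64_t iters) {
  if (iters == 0) { return 0.0; }
  const double ms = std::chrono::duration<double, std::milli>(elapsed).count();
  return ms / static_cast<double>(iters);
}

// Hypothetical helper: remaining time formatted like the progress lines
// above, "<days>d HH:MM".
std::string formatEta(double msPerIt, std::uint64_t itersLeft) {
  const std::uint64_t secs =
      static_cast<std::uint64_t>(msPerIt * static_cast<double>(itersLeft) / 1000.0);
  const std::uint64_t days = secs / 86400;
  const std::uint64_t hours = (secs % 86400) / 3600;
  const std::uint64_t mins = (secs % 3600) / 60;
  char buf[64];
  std::snprintf(buf, sizeof buf, "%llud %02llu:%02llu",
                static_cast<unsigned long long>(days),
                static_cast<unsigned long long>(hours),
                static_cast<unsigned long long>(mins));
  return std::string(buf);
}
```

Typical use: record `auto start = std::chrono::steady_clock::now();` before a block of iterations and pass `std::chrono::steady_clock::now() - start` afterwards. Because steady_clock never jumps when the system clock is adjusted, that source of bogus timings is ruled out.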

preda 2018-01-21 21:19

[QUOTE=kriesel;477923]Hi,
I recently pulled down the Windows executable zip file for gpuowl 1.9 from [URL]http://www.mersenneforum.org/showpost.php?p=471663&postcount=226[/URL], unzipped, read its README.md, which says in part:[...]
Please update it to cover gpuowl 1.x also.

Thanks![/QUOTE]

I updated the README to some degree, let me know what else is missing.

preda 2018-01-21 21:49

[QUOTE=kriesel;477927] logging to gpuowl.log appears to not be occurring.[/QUOTE]

I tried to fix the log flush on Windows (pending verification).

On the topic of output redirection, GpuOwl does write to stdout (the normal standard output), so that's not the reason for any trouble there. I don't know more though.
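For reference, a minimal C++ sketch of the behavior being aimed at: each progress line goes to stdout and is appended to the log file, and both streams are flushed immediately. Flushing matters for crash reports like the ones later in this thread, because the C standard leaves it implementation-defined whether a process that ends via abort() (as on an assertion failure) flushes its stdio buffers. The class name `TeeLog` is hypothetical; this is not gpuowl's logging code.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical tee logger: every line goes to stdout and to a log file,
// flushed right away so nothing is left sitting in a buffer.
class TeeLog {
 public:
  explicit TeeLog(const char* path) : file_(std::fopen(path, "a")) {}
  ~TeeLog() { if (file_) { std::fclose(file_); } }
  TeeLog(const TeeLog&) = delete;
  TeeLog& operator=(const TeeLog&) = delete;

  bool ok() const { return file_ != nullptr; }

  void line(const std::string& text) {
    std::fputs(text.c_str(), stdout);
    std::fputc('\n', stdout);
    std::fflush(stdout);
    if (file_) {
      std::fputs(text.c_str(), file_);
      std::fputc('\n', file_);
      std::fflush(file_);  // the line is on disk even if we abort later
    }
  }

 private:
  std::FILE* file_;
};
```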

kriesel 2018-01-21 22:46

[QUOTE=preda;478057]Let me explain my thinking here:
[...][/QUOTE]

Thanks for the info.
I still think having the option is useful, particularly in the normal case of a familiar user who does not have a thermal problem. Some of us are Windows users who routinely apply patches, or have other GPUs that go AWOL until a reboot, or have unstable power and no UPS on some systems, but otherwise have reliable, stable hardware and a stable gpuowl installation.
I understand you have other priorities on your to-do list. (~6M fft seems like a high one to me.)

Re your 3 posts following the one quoted above:
Excellent and thank you, times 3.
If someone would build and post a Windows binary I'll give those changes a try.

xx005fs 2018-01-22 02:31

GV100
 
How fast will Nvidia's new GV100 chip be at LL tests? It has full double precision (around 7 TFLOPS on the V100) and enormous memory bandwidth. I suppose it will be the only card on the market able to do under 1 ms/it. Has anyone tested a similar Quadro card, such as the GP100, which has similar specs?

kriesel 2018-01-22 17:30

readme suggestions
 
[QUOTE=preda;478061]I updated the README to some degree, let me know what else is missing.[/QUOTE]

Thanks for getting to that.

I think it's helpful to explicitly state at the top of a readme which version number it applies to and was written about. (Then later if it lags behind releases, the reader has been warned at the start of reading.)

I feel it would be useful to expand on which exponent ranges are practical, and which are recommended, for each FFT length and transform type; probably a little table for at least each of the more useful transforms.

I think the statement "GpuOwL best handles exponents 70M - 78M." in the usage section applies specifically to 4M fft length DP transform.

Made-up example for illustration only follows (don't use these numbers! I've intentionally shown composite min and max values as a sign that they're not valid). Min and max express what the program can run accurately, while the recommended range is the subset of that which is not run more efficiently by some other program, since GpuOwL currently implements only power-of-two FFT lengths.

DP transform:
FFT length | min p | max p | recommended p range
2M | 19531255 | 38999995 | 35M-39M
4M | 38999915 | 78000005 | 70M-78M
8M | 77991235 | 155000015 | 140M-155M

M61 transform:
FFT length | min p | max p | recommended p range
2M | ? | ? | ?M-?M
4M | ? | ? | ?M-?M
8M | ? | ? | ?M-?M
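The limits in such a table come down to bits per FFT word, p / N, where N is the number of words in the transform. The logs later in this thread give one data point: "FFT 4M (1024 * 2048 * 2)" is N = 4,194,304 words (LOG_NWORDS=22), and 77959589 / 4194304 is about 18.59, matching the "(18.59 bits/word)" gpuowl prints. The 78M top of the 70M-78M range corresponds to about 18.6 bits/word. A small C++ sketch of the arithmetic, with hypothetical function names; the maximum bits/word is an assumed input that would have to be measured for each transform type.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Average number of exponent bits stored in each FFT word.
double bitsPerWord(std::uint64_t exponent, std::uint64_t fftWords) {
  return static_cast<double>(exponent) / static_cast<double>(fftWords);
}

// Largest exponent an FFT of `fftWords` words can hold at a given
// bits/word limit. The limit is an ASSUMED, transform-specific input.
std::uint64_t maxExponentFor(std::uint64_t fftWords, double maxBitsPerWord) {
  return static_cast<std::uint64_t>(
      std::floor(maxBitsPerWord * static_cast<double>(fftWords)));
}
```

Filling in a row of the table is then a matter of measuring the bits/word limit once per transform; the min p of one FFT length is roughly the max p of the next smaller one.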

Also, something more specific about the Gerbicz check intervals seems to me a useful addition, perhaps by pasting in part of one of your previous explanatory forum posts with a little editing.
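As a possible starting point for such a section, here is a toy C++ illustration of how a Gerbicz check works in a PRP-3 test, using the small Mersenne prime 2^31 - 1 so that ordinary 64-bit arithmetic suffices. With residues u_i = 3^(2^i) mod n and a running product d_t = u_0 * u_L * ... * u_(tL), the identity d_t = u_0 * d_(t-1)^(2^L) must hold, so a corrupted residue shows up as a mismatch. This is a sketch of the general technique with made-up function names, not gpuowl's implementation: real runs use multi-million-bit residues and FFT multiplication, and verify far less often than every block to keep the overhead small.

```cpp
#include <cassert>
#include <cstdint>

using u64 = std::uint64_t;

// (a * b) mod n; exact as long as n < 2^32, so both factors are below 2^32.
u64 mulmod(u64 a, u64 b, u64 n) { return (a * b) % n; }

// x^(2^k) mod n, i.e. k successive modular squarings.
u64 squarings(u64 x, u64 k, u64 n) {
  for (u64 i = 0; i < k; ++i) { x = mulmod(x, x, n); }
  return x;
}

// Toy PRP-3 run with a Gerbicz check after every block of L squarings.
// Invariant checked: d_t == u_0 * d_(t-1)^(2^L) (mod n).
// If corruptAtBlock matches a block number, that block's residue is damaged
// to simulate a hardware error. Returns true if every check passes.
bool runWithGerbiczCheck(u64 n, u64 L, int blocks, int corruptAtBlock) {
  assert(n > 3 && n < (u64{1} << 32));
  const u64 u0 = 3;
  u64 x = u0;  // current residue u_(tL)
  u64 d = u0;  // running product d_t
  for (int t = 1; t <= blocks; ++t) {
    const u64 dPrev = d;
    x = squarings(x, L, n);                        // the actual PRP work
    if (t == corruptAtBlock) { x = (x + 1) % n; }  // simulated error
    d = mulmod(d, x, n);
    // Verifying costs another L squarings; real programs therefore verify
    // only occasionally rather than after every block.
    if (d != mulmod(u0, squarings(dPrev, L, n), n)) { return false; }
  }
  return true;
}
```

Since 2^31 - 1 is prime and d_(t-1) is never zero mod n, any change to the residue in a block changes d_t, so the corrupted run in the test below is always caught.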

On Windows, -h did not work for me in the gpuowl-v1.9-94aa58f build (--help did). Has that been changed? Has it been tested on Windows?

Including what you wrote in a recent forum post about the various transform types would also be useful.

And, some sample output of the beginning of a normal run could be helpful.

kracker 2018-01-22 19:12

1 Attachment(s)
Latest build for Windows as of right now (commit 74f1a38)

kriesel 2018-01-23 17:11

please add a requirements section to the readme
 
Feel free to use, edit, or replace any of the following

Requirements

OpenCL installed, at least version x.x.
One or more units of OpenCL compatible hardware, with corresponding driver(s) supporting OpenCL of the required level, such as certain AMD GPUs, Intel IGPs, or CPUs. (Do NVIDIA GPUs work?)

Discrete (add-in card) GPUs give better performance because of their dedicated memory. Integrated graphics processors use memory shared with the CPU(s) and will affect performance of CPU applications.

In case of difficulty, it's recommended to verify the successful installation of OpenCL and compatible drivers with a utility, such as clinfo, oclDeviceQuery.exe, or the advanced tab of GPU-Z.
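Whatever minimum the readme ends up stating for x.x, it can be compared with the device's CL_DEVICE_VERSION string, which the OpenCL specification defines as "OpenCL<space><major.minor><space><vendor-specific information>" (clinfo and oclDeviceQuery both print it). A minimal C++ sketch of that comparison; `parseDeviceVersion` and `atLeast` are hypothetical names, and 2.0 in the test is only an example, chosen because gpuowl's compile lines later in this thread pass -cl-std=CL2.0.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

struct ClVersion {
  int major = 0;
  int minor = 0;
};

// Parse "OpenCL <major>.<minor> <vendor info>" (the CL_DEVICE_VERSION format).
// Returns false, leaving `out` untouched, if the string doesn't match.
bool parseDeviceVersion(const std::string& s, ClVersion& out) {
  int major = 0;
  int minor = 0;
  if (std::sscanf(s.c_str(), "OpenCL %d.%d", &major, &minor) != 2) { return false; }
  out.major = major;
  out.minor = minor;
  return true;
}

// True if version v is at least major.minor.
bool atLeast(const ClVersion& v, int major, int minor) {
  return v.major > major || (v.major == major && v.minor >= minor);
}
```

With "OpenCL 1.2 CUDA", as the NVIDIA devices in this thread report, `atLeast(v, 2, 0)` is false. Whether gpuowl strictly needs 2.0 is not established here.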

An indication of GPU RAM requirements vs. transform type and FFT length would be useful, perhaps as an additional column in the little tables I proposed earlier. I'm seeing only about 290MB occupied during a 4M FFT length -DP transform on an RX550. That may scale to roughly 1.3GB for a future 16M FFT implementation, and perhaps 2.7GB for 32M, which would not fit on that 2GB card. (It would probably also run far too slowly on that card to be practical, at a rough estimate of 2-3 years per exponent.)
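A straight linear extrapolation from the one measured point gives slightly lower figures than the estimates above. The hypothetical helper below ASSUMES memory use is proportional to FFT length and ignores fixed per-process overhead and any tables that grow differently, so it is only an order-of-magnitude guide.

```cpp
#include <cassert>
#include <cstdint>

// Scale a measured GPU memory footprint linearly with FFT length
// (lengths in units of M words, e.g. 4 for a 4M FFT).
std::uint64_t estimateMemoryMB(std::uint64_t measuredMB, std::uint64_t measuredFftM,
                               std::uint64_t targetFftM) {
  assert(measuredFftM != 0);
  return measuredMB * targetFftM / measuredFftM;
}
```

From 290MB at 4M this gives 1160MB at 16M and 2320MB at 32M; either way, 32M would not fit on a 2GB card.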

kriesel 2018-01-23 17:43

Does gpuowl run on nvidia gtx10x0? Crashes and other surprises
 
gpuowl attempts failed on three different model cards. This is the v1.9-74f1a38 version on Windows 7 Pro 64bit.

gpuowl.log:
[CODE]gpuOwL v1.9- GPU Mersenne primality checker
GeForce GTX 1070, 15x1708MHz


OpenCL compilation in 1794 ms, with "-I. -cl-fast-relaxed-math -cl-std=CL2.0 -DEXP=77959589u -DWIDTH=1024u -DHEIGHT=2048u -DLOG_NWORDS=22u -DFP_DP=1 "
PRP-3: FFT 4M (1024 * 2048 * 2) of 77959589 (18.59 bits/word) [2018-01-23 10:43:26 Central Standard Time]
Starting at iteration 0
error -5 (carryConv)
[/CODE]gpuowl.log again:[CODE]
gpuOwL v1.9- GPU Mersenne primality checker
GeForce GTX 1050 Ti, 6x1468MHz


OpenCL compilation in 46 ms, with "-I. -cl-fast-relaxed-math -cl-std=CL2.0 -DEXP=77959589u -DWIDTH=1024u -DHEIGHT=2048u -DLOG_NWORDS=22u -DFP_DP=1 "
PRP-3: FFT 4M (1024 * 2048 * 2) of 77959589 (18.59 bits/word) [2018-01-23 10:45:56 Central Standard Time]
Starting at iteration 0
error -5 (carryConv)
[/CODE]console running gpuowl via a tiny batch script for uniformity and time stamping:[CODE]c:\Users\Ken\Documents\gpuowl>echo starting gpuowl at Tue 01/23/2018 10:43:04.71 1>>gpuowlrun.txt

c:\Users\Ken\Documents\gpuowl>gpuowl -user kriesel -cpu condorette-gtx1070 -device 0
gpuOwL v1.9- GPU Mersenne primality checker
GeForce GTX 1070, 15x1708MHz


OpenCL compilation in 1794 ms, with "-I. -cl-fast-relaxed-math -cl-std=CL2.0 -DEXP=77959589u -DWIDTH=1024u -DHEIGHT=2048u -DLOG_NWORDS=22u -DFP_DP=1 "
PRP-3: FFT 4M (1024 * 2048 * 2) of 77959589 (18.59 bits/word) [2018-01-23 10:43:26 Central Standard Time]
Starting at iteration 0
error -5 (carryConv)
Assertion failed!

Program: c:\Users\Ken\Documents\gpuowl\gpuowl.exe
File: clwrap.h, Line 230

Expression: check(clEnqueueNDRangeKernel(queue, kernel, 1, __null, &workSize, &groupSize, 0, __null, __null), name.c_str())

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

c:\Users\Ken\Documents\gpuowl>echo exiting gpuowl at Tue 01/23/2018 10:43:43.67 1>>gpuowlrun.txt
[/CODE]This caused GPU-Z to be unable to access the GTX1070 GPU's sensors subsequently, and apparently crashed the driver or card. Both CUDALucas processes on the card also terminated. They were automatically restarted by their batch wrappers.
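The "error -5 (carryConv)" line appears to be the raw OpenCL status code from enqueuing the carryConv kernel (the failed assertion wraps clEnqueueNDRangeKernel). In the standard Khronos header CL/cl.h, -5 is CL_OUT_OF_RESOURCES, which on its own does not pinpoint the cause. Printing the symbolic name makes such reports easier to read; a minimal C++ sketch with a hypothetical `clErrorName` helper, listing only a subset of the codes defined in CL/cl.h:

```cpp
#include <cassert>
#include <cstring>

// Map an OpenCL status code to its symbolic name (values as in CL/cl.h).
const char* clErrorName(int code) {
  switch (code) {
    case 0:   return "CL_SUCCESS";
    case -1:  return "CL_DEVICE_NOT_FOUND";
    case -2:  return "CL_DEVICE_NOT_AVAILABLE";
    case -3:  return "CL_COMPILER_NOT_AVAILABLE";
    case -4:  return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -6:  return "CL_OUT_OF_HOST_MEMORY";
    case -11: return "CL_BUILD_PROGRAM_FAILURE";
    case -30: return "CL_INVALID_VALUE";
    case -36: return "CL_INVALID_COMMAND_QUEUE";
    case -48: return "CL_INVALID_KERNEL";
    case -52: return "CL_INVALID_KERNEL_ARGS";
    case -54: return "CL_INVALID_WORK_GROUP_SIZE";
    case -55: return "CL_INVALID_WORK_ITEM_SIZE";
    default:  return "UNKNOWN_CL_ERROR";
  }
}
```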

Retrying gpuowl failed in the same way, but this time on a different GPU (the GTX 1050 Ti, still with -device 0), without any change to the batch script. Perhaps OpenCL doesn't reconnect to a card or driver that's been restarted?

Note that this dropout means gpuowl runs may sometimes land on GPUs other than the ones the owner intended. A feature I've proposed for CUDALucas is device confirmation, to guard against such occurrences. Commodity GPUs don't have queryable serial numbers, so other parameters, such as model, BIOS version, PCIe address, etc. have been considered. Device number in the CUDA or OpenCL sense is not reliable. I've seen this device dropout push a primality test from a reliable GPU to a less reliable one, and cause multiple tasks to land on a single GPU.
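The device-confirmation idea can be sketched as comparing several identifying attributes at startup and refusing to run on any mismatch. A minimal C++ sketch; `DeviceIdentity` and `confirmDevice` are hypothetical, this is not an existing CUDALucas or gpuowl feature, and how each attribute is queried from the driver is left out.

```cpp
#include <cassert>
#include <string>

// Attributes that together identify one physical card (no serial number).
struct DeviceIdentity {
  std::string model;        // e.g. the CL_DEVICE_NAME string
  std::string biosVersion;  // as reported by vendor tools
  std::string pciAddress;   // bus:device.function
};

// Returns true only if every attribute matches what the owner configured;
// otherwise sets `reason` so the program can refuse to start and say why.
bool confirmDevice(const DeviceIdentity& expected, const DeviceIdentity& actual,
                   std::string& reason) {
  if (expected.model != actual.model) { reason = "model mismatch"; return false; }
  if (expected.biosVersion != actual.biosVersion) { reason = "BIOS version mismatch"; return false; }
  if (expected.pciAddress != actual.pciAddress) { reason = "PCIe address mismatch"; return false; }
  reason.clear();
  return true;
}
```

In the dropout above, a check like this on the model alone would already have stopped the second run from starting on the GTX 1050 Ti when the GTX 1070 was intended.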

console again:
[CODE]
c:\Users\Ken\Documents\gpuowl>echo starting gpuowl at Tue 01/23/2018 10:45:53.77 1>>gpuowlrun.txt

c:\Users\Ken\Documents\gpuowl>gpuowl -user kriesel -cpu condorette-gtx1070 -device 0
gpuOwL v1.9- GPU Mersenne primality checker
GeForce GTX 1050 Ti, 6x1468MHz


OpenCL compilation in 46 ms, with "-I. -cl-fast-relaxed-math -cl-std=CL2.0 -DEXP=77959589u -DWIDTH=1024u -DHEIGHT=2048u -DLOG_NWORDS=22u -DFP_DP=1 "
PRP-3: FFT 4M (1024 * 2048 * 2) of 77959589 (18.59 bits/word) [2018-01-23 10:45:56 Central Standard Time]
Starting at iteration 0
error -5 (carryConv)
Assertion failed!

Program: c:\Users\Ken\Documents\gpuowl\gpuowl.exe
File: clwrap.h, Line 230

Expression: check(clEnqueueNDRangeKernel(queue, kernel, 1, __null, &workSize, &groupSize, 0, __null, __null), name.c_str())

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

c:\Users\Ken\Documents\gpuowl>echo exiting gpuowl at Tue 01/23/2018 10:46:09.12 1>>gpuowlrun.txt
[/CODE]For some reason (perhaps one CUDALucas instance instead of two?) the GTX1050Ti's CUDALucas and GPU-Z were not affected like the GTX1070's were.

OpenCL seems confused about the memory capacity of the 3GB GTX1050Ti, reporting 4GB. At one point GPU-Z was reporting 3.8GB in use.

Killing and restarting GPU-Z for the GTX1070 and checking OpenCL on its advanced tab results in "OpenCL Device not found" on that GPU. GPU-Z also did not recover access to the card's sensors.

Console again; for good measure I stopped all GPU tasks on the system before trying the Quadro 2000.
[CODE]c:\Users\Ken\Documents\gpuowl>echo starting gpuowl at Tue 01/23/2018 11:34:37.89 1>>gpuowlrun.txt

c:\Users\Ken\Documents\gpuowl>gpuowl -user kriesel -cpu condorette-quadro200 -device 1
gpuOwL v1.9- GPU Mersenne primality checker
Quadro 2000, 4x1251MHz


OpenCL compilation in 1762 ms, with "-I. -cl-fast-relaxed-math -cl-std=CL2.0 -DEXP=77959589u -DWIDTH=1024u -DHEIGHT=2048u -DLOG_NWORDS=22u -DFP_DP=1 "
PRP-3: FFT 4M (1024 * 2048 * 2) of 77959589 (18.59 bits/word) [2018-01-23 11:34:40 Central Standard Time]
Starting at iteration 0
error -5 (carryConv)
Assertion failed!

Program: c:\Users\Ken\Documents\gpuowl\gpuowl.exe
File: clwrap.h, Line 230

Expression: check(clEnqueueNDRangeKernel(queue, kernel, 1, __null, &workSize, &groupSize, 0, __null, __null), name.c_str())

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

c:\Users\Ken\Documents\gpuowl>echo exiting gpuowl at Tue 01/23/2018 11:35:16.57 1>>gpuowlrun.txt[/CODE]For some reason the GTX1050Ti and Quadro 2000 GPU-Z instances were unaffected; sensors and OpenCL parameters were still displayed. I was able to make multiple run attempts on the 1050Ti.

Oddly, at one point GPU-Z indicated about 3.8GB of memory in use on the 3GB GTX1050Ti.

Time for a system restart to clean things up and resume CUDA.

kriesel 2018-01-23 17:53

ocldevicequery
 
Here's the oclDeviceQuery.exe output for the system in the preceding post, obtained a few days before the GpuOwL experiment. A rerun just now, before restarting the system, omits the GTX1070 entirely.

[CODE]C:\Users\Ken\Documents\oclDeviceQuery.exe Starting...

OpenCL SW Info:

CL_PLATFORM_NAME: NVIDIA CUDA
CL_PLATFORM_VERSION: OpenCL 1.2 CUDA 8.0.0
OpenCL SDK Revision: 7027912


OpenCL Device Info:

3 devices found supporting OpenCL:

---------------------------------
Device GeForce GTX 1070
---------------------------------
CL_DEVICE_NAME: GeForce GTX 1070
CL_DEVICE_VENDOR: NVIDIA Corporation
CL_DRIVER_VERSION: 378.66
CL_DEVICE_VERSION: OpenCL 1.2 CUDA
CL_DEVICE_OPENCL_C_VERSION: OpenCL C 1.2
CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU
CL_DEVICE_MAX_COMPUTE_UNITS: 15
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
CL_DEVICE_MAX_WORK_ITEM_SIZES: 1024 / 1024 / 64
CL_DEVICE_MAX_WORK_GROUP_SIZE: 1024
CL_DEVICE_MAX_CLOCK_FREQUENCY: 1708 MHz
CL_DEVICE_ADDRESS_BITS: 32
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 2048 MByte
CL_DEVICE_GLOBAL_MEM_SIZE: 8192 MByte
CL_DEVICE_ERROR_CORRECTION_SUPPORT: no
CL_DEVICE_LOCAL_MEM_TYPE: local
CL_DEVICE_LOCAL_MEM_SIZE: 48 KByte
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 64 KByte
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_PROFILING_ENABLE
CL_DEVICE_IMAGE_SUPPORT: 1
CL_DEVICE_MAX_READ_IMAGE_ARGS: 256
CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 16
CL_DEVICE_SINGLE_FP_CONFIG: denorms INF-quietNaNs round-to-nearest round-to-zero round-to-inf fma

CL_DEVICE_IMAGE <dim> 2D_MAX_WIDTH 16384
2D_MAX_HEIGHT 32768
3D_MAX_WIDTH 16384
3D_MAX_HEIGHT 16384
3D_MAX_DEPTH 16384

CL_DEVICE_EXTENSIONS: cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics
cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics
cl_khr_fp64
cl_khr_byte_addressable_store
cl_khr_icd
cl_khr_gl_sharing
cl_nv_compiler_options
cl_nv_device_attribute_query
cl_nv_pragma_unroll
cl_nv_d3d9_sharing
cl_nv_d3d10_sharing
cl_khr_d3d10_sharing
cl_nv_d3d11_sharing


CL_DEVICE_COMPUTE_CAPABILITY_NV: 6.1
NUMBER OF MULTIPROCESSORS: 15
NUMBER OF CUDA CORES: 4294967281
CL_DEVICE_REGISTERS_PER_BLOCK_NV: 65536
CL_DEVICE_WARP_SIZE_NV: 32
CL_DEVICE_GPU_OVERLAP_NV: CL_TRUE
CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV: CL_TRUE
CL_DEVICE_INTEGRATED_MEMORY_NV: CL_FALSE
CL_DEVICE_PREFERRED_VECTOR_WIDTH_<t> CHAR 1, SHORT 1, INT 1, LONG 1, FLOAT 1, DOUBLE 1


---------------------------------
Device Quadro 2000
---------------------------------
CL_DEVICE_NAME: Quadro 2000
CL_DEVICE_VENDOR: NVIDIA Corporation
CL_DRIVER_VERSION: 378.66
CL_DEVICE_VERSION: OpenCL 1.1 CUDA
CL_DEVICE_OPENCL_C_VERSION: OpenCL C 1.1
CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU
CL_DEVICE_MAX_COMPUTE_UNITS: 4
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
CL_DEVICE_MAX_WORK_ITEM_SIZES: 1024 / 1024 / 64
CL_DEVICE_MAX_WORK_GROUP_SIZE: 1024
CL_DEVICE_MAX_CLOCK_FREQUENCY: 1251 MHz
CL_DEVICE_ADDRESS_BITS: 32
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 256 MByte
CL_DEVICE_GLOBAL_MEM_SIZE: 1024 MByte
CL_DEVICE_ERROR_CORRECTION_SUPPORT: no
CL_DEVICE_LOCAL_MEM_TYPE: local
CL_DEVICE_LOCAL_MEM_SIZE: 48 KByte
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 64 KByte
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_PROFILING_ENABLE
CL_DEVICE_IMAGE_SUPPORT: 1
CL_DEVICE_MAX_READ_IMAGE_ARGS: 128
CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 8
CL_DEVICE_SINGLE_FP_CONFIG: denorms INF-quietNaNs round-to-nearest round-to-zero round-to-inf fma

CL_DEVICE_IMAGE <dim> 2D_MAX_WIDTH 16384
2D_MAX_HEIGHT 16384
3D_MAX_WIDTH 2048
3D_MAX_HEIGHT 2048
3D_MAX_DEPTH 2048

CL_DEVICE_EXTENSIONS: cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics
cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics
cl_khr_fp64
cl_khr_byte_addressable_store
cl_khr_icd
cl_khr_gl_sharing
cl_nv_compiler_options
cl_nv_device_attribute_query
cl_nv_pragma_unroll
cl_nv_d3d9_sharing
cl_nv_d3d10_sharing
cl_khr_d3d10_sharing
cl_nv_d3d11_sharing
cl_nv_copy_opts


CL_DEVICE_COMPUTE_CAPABILITY_NV: 2.1
NUMBER OF MULTIPROCESSORS: 4
NUMBER OF CUDA CORES: 192
CL_DEVICE_REGISTERS_PER_BLOCK_NV: 32768
CL_DEVICE_WARP_SIZE_NV: 32
CL_DEVICE_GPU_OVERLAP_NV: CL_TRUE
CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV: CL_TRUE
CL_DEVICE_INTEGRATED_MEMORY_NV: CL_FALSE
CL_DEVICE_PREFERRED_VECTOR_WIDTH_<t> CHAR 1, SHORT 1, INT 1, LONG 1, FLOAT 1, DOUBLE 1


---------------------------------
Device GeForce GTX 1050 Ti
---------------------------------
CL_DEVICE_NAME: GeForce GTX 1050 Ti
CL_DEVICE_VENDOR: NVIDIA Corporation
CL_DRIVER_VERSION: 378.66
CL_DEVICE_VERSION: OpenCL 1.2 CUDA
CL_DEVICE_OPENCL_C_VERSION: OpenCL C 1.2
CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU
CL_DEVICE_MAX_COMPUTE_UNITS: 6
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
CL_DEVICE_MAX_WORK_ITEM_SIZES: 1024 / 1024 / 64
CL_DEVICE_MAX_WORK_GROUP_SIZE: 1024
CL_DEVICE_MAX_CLOCK_FREQUENCY: 1468 MHz
CL_DEVICE_ADDRESS_BITS: 32
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 1024 MByte
CL_DEVICE_GLOBAL_MEM_SIZE: 4096 MByte
CL_DEVICE_ERROR_CORRECTION_SUPPORT: no
CL_DEVICE_LOCAL_MEM_TYPE: local
CL_DEVICE_LOCAL_MEM_SIZE: 48 KByte
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 64 KByte
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_PROFILING_ENABLE
CL_DEVICE_IMAGE_SUPPORT: 1
CL_DEVICE_MAX_READ_IMAGE_ARGS: 256
CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 16
CL_DEVICE_SINGLE_FP_CONFIG: denorms INF-quietNaNs round-to-nearest round-to-zero round-to-inf fma

CL_DEVICE_IMAGE <dim> 2D_MAX_WIDTH 16384
2D_MAX_HEIGHT 32768
3D_MAX_WIDTH 16384
3D_MAX_HEIGHT 16384
3D_MAX_DEPTH 16384

CL_DEVICE_EXTENSIONS: cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics
cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics
cl_khr_fp64
cl_khr_byte_addressable_store
cl_khr_icd
cl_khr_gl_sharing
cl_nv_compiler_options
cl_nv_device_attribute_query
cl_nv_pragma_unroll
cl_nv_d3d9_sharing
cl_nv_d3d10_sharing
cl_khr_d3d10_sharing
cl_nv_d3d11_sharing


CL_DEVICE_COMPUTE_CAPABILITY_NV: 6.1
NUMBER OF MULTIPROCESSORS: 6
NUMBER OF CUDA CORES: 4294967290
CL_DEVICE_REGISTERS_PER_BLOCK_NV: 65536
CL_DEVICE_WARP_SIZE_NV: 32
CL_DEVICE_GPU_OVERLAP_NV: CL_TRUE
CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV: CL_TRUE
CL_DEVICE_INTEGRATED_MEMORY_NV: CL_FALSE
CL_DEVICE_PREFERRED_VECTOR_WIDTH_<t> CHAR 1, SHORT 1, INT 1, LONG 1, FLOAT 1, DOUBLE 1


---------------------------------
2D Image Formats Supported (75)
---------------------------------
# Channel Order Channel Type

1 CL_R CL_FLOAT
2 CL_R CL_HALF_FLOAT
3 CL_R CL_UNORM_INT8
4 CL_R CL_UNORM_INT16
5 CL_R CL_SNORM_INT16
6 CL_R CL_SIGNED_INT8
7 CL_R CL_SIGNED_INT16
8 CL_R CL_SIGNED_INT32
9 CL_R CL_UNSIGNED_INT8
10 CL_R CL_UNSIGNED_INT16
11 CL_R CL_UNSIGNED_INT32
12 CL_A CL_FLOAT
13 CL_A CL_HALF_FLOAT
14 CL_A CL_UNORM_INT8
15 CL_A CL_UNORM_INT16
16 CL_A CL_SNORM_INT16
17 CL_A CL_SIGNED_INT8
18 CL_A CL_SIGNED_INT16
19 CL_A CL_SIGNED_INT32
20 CL_A CL_UNSIGNED_INT8
21 CL_A CL_UNSIGNED_INT16
22 CL_A CL_UNSIGNED_INT32
23 CL_RG CL_FLOAT
24 CL_RG CL_HALF_FLOAT
25 CL_RG CL_UNORM_INT8
26 CL_RG CL_UNORM_INT16
27 CL_RG CL_SNORM_INT16
28 CL_RG CL_SIGNED_INT8
29 CL_RG CL_SIGNED_INT16
30 CL_RG CL_SIGNED_INT32
31 CL_RG CL_UNSIGNED_INT8
32 CL_RG CL_UNSIGNED_INT16
33 CL_RG CL_UNSIGNED_INT32
34 CL_RA CL_FLOAT
35 CL_RA CL_HALF_FLOAT
36 CL_RA CL_UNORM_INT8
37 CL_RA CL_UNORM_INT16
38 CL_RA CL_SNORM_INT16
39 CL_RA CL_SIGNED_INT8
40 CL_RA CL_SIGNED_INT16
41 CL_RA CL_SIGNED_INT32
42 CL_RA CL_UNSIGNED_INT8
43 CL_RA CL_UNSIGNED_INT16
44 CL_RA CL_UNSIGNED_INT32
45 CL_RGBA CL_FLOAT
46 CL_RGBA CL_HALF_FLOAT
47 CL_RGBA CL_UNORM_INT8
48 CL_RGBA CL_UNORM_INT16
49 CL_RGBA CL_SNORM_INT16
50 CL_RGBA CL_SIGNED_INT8
51 CL_RGBA CL_SIGNED_INT16
52 CL_RGBA CL_SIGNED_INT32
53 CL_RGBA CL_UNSIGNED_INT8
54 CL_RGBA CL_UNSIGNED_INT16
55 CL_RGBA CL_UNSIGNED_INT32
56 CL_BGRA CL_UNORM_INT8
57 CL_BGRA CL_SIGNED_INT8
58 CL_BGRA CL_UNSIGNED_INT8
59 CL_ARGB CL_UNORM_INT8
60 CL_ARGB CL_SIGNED_INT8
61 CL_ARGB CL_UNSIGNED_INT8
62 CL_INTENSITY CL_FLOAT
63 CL_INTENSITY CL_HALF_FLOAT
64 CL_INTENSITY CL_UNORM_INT8
65 CL_INTENSITY CL_UNORM_INT16
66 CL_INTENSITY CL_SNORM_INT16
67 CL_LUMINANCE CL_FLOAT
68 CL_LUMINANCE CL_HALF_FLOAT
69 CL_LUMINANCE CL_UNORM_INT8
70 CL_LUMINANCE CL_UNORM_INT16
71 CL_LUMINANCE CL_SNORM_INT16
72 CL_BGRA CL_SNORM_INT8
73 CL_BGRA CL_SNORM_INT16
74 CL_ARGB CL_SNORM_INT8
75 CL_ARGB CL_SNORM_INT16

---------------------------------
3D Image Formats Supported (75)
---------------------------------
# Channel Order Channel Type

1 CL_R CL_FLOAT
2 CL_R CL_HALF_FLOAT
3 CL_R CL_UNORM_INT8
4 CL_R CL_UNORM_INT16
5 CL_R CL_SNORM_INT16
6 CL_R CL_SIGNED_INT8
7 CL_R CL_SIGNED_INT16
8 CL_R CL_SIGNED_INT32
9 CL_R CL_UNSIGNED_INT8
10 CL_R CL_UNSIGNED_INT16
11 CL_R CL_UNSIGNED_INT32
12 CL_A CL_FLOAT
13 CL_A CL_HALF_FLOAT
14 CL_A CL_UNORM_INT8
15 CL_A CL_UNORM_INT16
16 CL_A CL_SNORM_INT16
17 CL_A CL_SIGNED_INT8
18 CL_A CL_SIGNED_INT16
19 CL_A CL_SIGNED_INT32
20 CL_A CL_UNSIGNED_INT8
21 CL_A CL_UNSIGNED_INT16
22 CL_A CL_UNSIGNED_INT32
23 CL_RG CL_FLOAT
24 CL_RG CL_HALF_FLOAT
25 CL_RG CL_UNORM_INT8
26 CL_RG CL_UNORM_INT16
27 CL_RG CL_SNORM_INT16
28 CL_RG CL_SIGNED_INT8
29 CL_RG CL_SIGNED_INT16
30 CL_RG CL_SIGNED_INT32
31 CL_RG CL_UNSIGNED_INT8
32 CL_RG CL_UNSIGNED_INT16
33 CL_RG CL_UNSIGNED_INT32
34 CL_RA CL_FLOAT
35 CL_RA CL_HALF_FLOAT
36 CL_RA CL_UNORM_INT8
37 CL_RA CL_UNORM_INT16
38 CL_RA CL_SNORM_INT16
39 CL_RA CL_SIGNED_INT8
40 CL_RA CL_SIGNED_INT16
41 CL_RA CL_SIGNED_INT32
42 CL_RA CL_UNSIGNED_INT8
43 CL_RA CL_UNSIGNED_INT16
44 CL_RA CL_UNSIGNED_INT32
45 CL_RGBA CL_FLOAT
46 CL_RGBA CL_HALF_FLOAT
47 CL_RGBA CL_UNORM_INT8
48 CL_RGBA CL_UNORM_INT16
49 CL_RGBA CL_SNORM_INT16
50 CL_RGBA CL_SIGNED_INT8
51 CL_RGBA CL_SIGNED_INT16
52 CL_RGBA CL_SIGNED_INT32
53 CL_RGBA CL_UNSIGNED_INT8
54 CL_RGBA CL_UNSIGNED_INT16
55 CL_RGBA CL_UNSIGNED_INT32
56 CL_BGRA CL_UNORM_INT8
57 CL_BGRA CL_SIGNED_INT8
58 CL_BGRA CL_UNSIGNED_INT8
59 CL_ARGB CL_UNORM_INT8
60 CL_ARGB CL_SIGNED_INT8
61 CL_ARGB CL_UNSIGNED_INT8
62 CL_INTENSITY CL_FLOAT
63 CL_INTENSITY CL_HALF_FLOAT
64 CL_INTENSITY CL_UNORM_INT8
65 CL_INTENSITY CL_UNORM_INT16
66 CL_INTENSITY CL_SNORM_INT16
67 CL_LUMINANCE CL_FLOAT
68 CL_LUMINANCE CL_HALF_FLOAT
69 CL_LUMINANCE CL_UNORM_INT8
70 CL_LUMINANCE CL_UNORM_INT16
71 CL_LUMINANCE CL_SNORM_INT16
72 CL_BGRA CL_SNORM_INT8
73 CL_BGRA CL_SNORM_INT16
74 CL_ARGB CL_SNORM_INT8
75 CL_ARGB CL_SNORM_INT16

oclDeviceQuery, Platform Name = NVIDIA CUDA, Platform Version = OpenCL 1.2 CUDA 8.0.0, SDK Revision = 7027912, NumDevs = 3, Device = GeForce GTX 1070, Device = Quadro 2000, Device = GeForce GTX 1050 Ti

System Info:

Local Time/Date = 18:48:31, 1/19/2018
CPU Arch: 0
CPU Level: 6
# of CPU processors: 12
Windows Build: 7601
Windows Ver: 6.1 (Windows Vista / Windows 7)
[/CODE]
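Two values in this listing are not real counts: NUMBER OF CUDA CORES 4294967281 and 4294967290 are 2^32 - 15 and 2^32 - 6. With 15 and 6 multiprocessors, that is exactly what results if the per-multiprocessor core count came back as -1 (unknown compute capability) and the product was printed as a 32-bit unsigned number. That is a guess about oclDeviceQuery's internals, not checked against its source. Compute capability 6.1 parts have 128 cores per multiprocessor, giving 1920 for the GTX 1070 and 768 for the GTX 1050 Ti; the Quadro 2000's 4 x 48 = 192 is reported correctly. A small C++ sketch reproducing the arithmetic, with hypothetical names:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One row of a compute-capability -> cores-per-multiprocessor table.
struct CcEntry {
  int major;
  int minor;
  int cores;
};

// Look up cores per multiprocessor; -1 if the table doesn't know this CC.
int coresPerMp(const std::vector<CcEntry>& table, int major, int minor) {
  for (const auto& e : table) {
    if (e.major == major && e.minor == minor) { return e.cores; }
  }
  return -1;
}

// Total cores as a 32-bit unsigned value: a -1 per-multiprocessor figure
// wraps around to a number just below 2^32.
std::uint32_t reportedCores(int multiprocessors, int coresPerMultiprocessor) {
  return static_cast<std::uint32_t>(multiprocessors * coresPerMultiprocessor);
}
```

With a table that only knows compute capability 2.1 (an ASSUMED stand-in for an older tool), the Pascal cards come out as the wrapped values above; adding the 6.1 entry gives the real counts.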


All times are UTC.