mersenneforum.org  

Old 2018-01-21, 20:53   #265
preda
 
"Mihai Preda"
Apr 2015

55B₁₆ Posts

Quote:
Originally Posted by kriesel View Post
The behavior I see in my logs is it starts with 1k iterations.
I have sequences that differ near start;
iteration counts differ from line to line, by 1k,4k,5k,10k,20k...,50k,...,100k
Code:
OK        0 / 76812401 [ 0.00%], 0.00 ms/it; ETA 0d 00:00; 0000000000000003 [2018-01-19 13:02:44 Central Standard Time]
OK     1000 / 76812401 [ 0.00%], 11.96 ms/it; ETA 10d 15:06; aadc1acf24bf7d60 [2018-01-19 13:03:04 Central Standard Time]
OK     5000 / 76812401 [ 0.01%], 11.94 ms/it; ETA 10d 14:44; 3db0edb3db578456 [2018-01-19 13:03:59 Central Standard Time]
Is it possible to save the ending Gerbicz step size from one run to begin the next? If not, would you consider it as an option, for hardware and installations that are stable? There's a slight execution speed advantage, and it reduces screen clutter. (My test run has worked its way up to 200k in under 24 hours, with no errors flagged yet, but if halted and restarted will reset to 1k interval and start the slow climb again from there. It should be very stable, as it's a brand new GPU on a fresh Windows install, patched to current, then gpuowl installed and run.)
Let me explain my thinking here:
- after startup, the user expects something to come up on screen, to get feedback that it's working.
- after startup, a check should be done quickly, to validate early that it's not all broken.
- after startup, the "gerbicz memory" should be reset because the hardware situation may be different, such as: different GPU, different GPU setup (clocks), different fan setup, case, etc.

The ramp-up from 1K to 10K is fast (assuming no errors). The overhead difference between 10K and 200K is minor, and the log size is not big enough over that ramp-up to be a big problem.

Now, let's see the opposite where there is memory between start-ups:
- the user starts with the GPU "too hot", and gathers, let's say, 10 Gerbicz errors during the night.
- the user realizes the problem, fixes the cooling, and restarts. But now it won't ramp up (only very slowly) because of the memory of those past 10 errors.
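As an illustration of the trade-off described above, here is a minimal Python sketch of a check interval that resets on every startup, grows after clean checks, and shrinks after an error. It is not GpuOwl's actual code: the class name `CheckInterval`, the doubling and halving rule, and the 1K and 200K bounds are all assumptions, chosen only to mirror the figures mentioned in this thread.

```python
class CheckInterval:
    """Hypothetical ramp-up policy for the distance between Gerbicz checks."""

    def __init__(self, start=1_000, maximum=200_000):
        # No state is loaded from disk: every startup begins at `start`.
        self.start = start
        self.maximum = maximum
        self.current = start

    def on_check_ok(self):
        # Assumed rule: double after a clean check, capped at `maximum`.
        self.current = min(self.current * 2, self.maximum)

    def on_check_error(self):
        # Assumed rule: halve after an error, never below `start`.
        self.current = max(self.current // 2, self.start)


if __name__ == "__main__":
    interval = CheckInterval()
    steps = []
    for _ in range(10):
        steps.append(interval.current)
        interval.on_check_ok()
    print(steps)
```

With persistent memory, `current` would instead be restored from the previous run, which is the situation the second scenario above warns about.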
Old 2018-01-21, 21:18   #266
preda
 
"Mihai Preda"
Apr 2015

3·457 Posts

Quote:
Originally Posted by kriesel View Post
(Later...) Whoa, what happened in that middle line? Over a minute per iteration computed (>5000 times the preceding and following). Momentarily projecting runtime of over 150 years!

Code:
OK    80000 / 76812401 [ 0.10%], 11.99 ms/it; ETA 10d 15:27; 6ee0f8a8a97d7812 [2018-01-19 13:19:36 Central Standard Time]
OK   100000 / 76812401 [ 0.13%], 63362.75 ms/it; ETA 56258d 15:28; 3fb24c04ec7569db [2018-01-19 13:23:43 Central Standard Time]
OK   150000 / 76812401 [ 0.20%], 11.99 ms/it; ETA 10d 15:25; 10bf91703f69c302 [2018-01-19 13:33:50 Central Standard Time]
I attempted a fix for the time-per-iteration overflow and committed it.
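The wall-clock timestamps in the quoted log give a quick sanity check on that middle line. The Python sketch below recomputes the rate and the ETA from the two timestamps; the function names are illustrative and are not taken from GpuOwl. Because the timestamps are rounded to whole seconds, the result is only approximate.

```python
from datetime import datetime


def ms_per_iteration(t_start, t_end, it_start, it_end):
    """Average milliseconds per iteration between two log lines."""
    fmt = "%Y-%m-%d %H:%M:%S"
    elapsed = datetime.strptime(t_end, fmt) - datetime.strptime(t_start, fmt)
    return elapsed.total_seconds() * 1000.0 / (it_end - it_start)


def eta_days(ms_per_it, done, total):
    """Remaining run time in days, assuming the rate stays constant."""
    return (total - done) * ms_per_it / 1000.0 / 86_400


# Iterations 80000 -> 100000 from the quoted log.
rate = ms_per_iteration("2018-01-19 13:19:36", "2018-01-19 13:23:43",
                        80_000, 100_000)
print(f"{rate:.2f} ms/it")
print(f"{eta_days(rate, 100_000, 76_812_401):.1f} days remaining")
```

The 247 seconds between the two lines cover 20000 iterations, which is about 12.35 ms/it and roughly 11 days remaining, so the logged 63362.75 ms/it was a display problem and not a real slowdown.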
Old 2018-01-21, 21:19   #267
preda
 
"Mihai Preda"
Apr 2015

10101011011₂ Posts

Quote:
Originally Posted by kriesel View Post
Hi,
I recently pulled down the Windows executable zip file for gpuowl 1.9 from http://www.mersenneforum.org/showpos...&postcount=226, unzipped, read its README.md, which says in part:[...]
Please update it to cover gpuowl 1.x also.

Thanks!
I updated the README to some degree, let me know what else is missing.
Old 2018-01-21, 21:49   #268
preda
 
"Mihai Preda"
Apr 2015

3×457 Posts

Quote:
Originally Posted by kriesel View Post
logging to gpuowl.log appears to not be occurring.
I tried to fix the log flush on Windows (pending verification).

On the topic of output redirection, GpuOwl does write to stdout (the normal standard output), so that's not the reason for any trouble there. I don't know more though.
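For readers unfamiliar with the flush issue: text written to a file or to a redirected stdout can sit in a buffer for a long time unless it is flushed, which makes a log look empty while the program is running. The Python sketch below shows the general idea of flushing after every line. It is only an illustration under that assumption; the helper name `log_line` is hypothetical and this is not the change that was committed to GpuOwl.

```python
import os
import sys
import tempfile


def log_line(message, log_file):
    """Write one line to stdout and to an open log file, flushing both."""
    for stream in (sys.stdout, log_file):
        stream.write(message + "\n")
        stream.flush()  # push buffered text out immediately


if __name__ == "__main__":
    # Demo in a temporary directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.log")
        with open(path, "a", encoding="utf-8") as f:
            log_line("OK 1000 / 76812401", f)
```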
Old 2018-01-21, 22:46   #269
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

3×13×139 Posts

Quote:
Originally Posted by preda View Post
Let me explain my thinking here:
- after startup, the user expects something to come up on screen, to get feedback that it's working.
- after startup, a check should be done quickly, to validate early that it's not all broken.
- after startup, the "gerbicz memory" should be reset because the hardware situation may be different, such as: different GPU, different GPU setup (clocks), different fan setup, case, etc.

The ramp-up from 1K to 10K is fast (assuming no errors). The overhead difference between 10K and 200K is minor, and the log size is not big enough over that ramp-up to be a big problem.

Now, let's see the opposite where there is memory between start-ups:
- the user starts with the GPU "too hot", and gathers, let's say, 10 Gerbicz errors during the night.
- the user realizes the problem, fixes the cooling, and restarts. But now it won't ramp up (only very slowly) because of the memory of those past 10 errors.
Thanks for the info.
I still think having the option is useful, particularly in the normal case of a familiar user who does not have a thermal problem. Some of us are Windows users and routinely applying patches, or have other gpus that go awol until a reboot, or have unstable power and no UPS on some systems, but otherwise reliable and stable hardware and stable gpuowl installation.
I understand you have other priorities on your to-do list. (~6M fft seems like a high one to me.)

Re your 3 posts following the one quoted above:
Excellent and thank you, times 3.
If someone would build and post a Windows binary I'll give those changes a try.

Last fiddled with by kriesel on 2018-01-21 at 22:50
Old 2018-01-22, 02:31   #270
xx005fs
 
"Eric"
Jan 2018
USA

2²×53 Posts
GV100

How fast will Nvidia's new GV100 chip be in LL tests? It has full double precision (around 7 TFLOPS on the V100) and insane memory bandwidth. I suppose it will be the only card on the market able to do under 1 ms/it. Has anyone tested a similar Quadro card, such as the GP100, which has similar specs?
Old 2018-01-22, 17:30   #271
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

1010100101101₂ Posts
readme suggestions

Quote:
Originally Posted by preda View Post
I updated the README to some degree, let me know what else is missing.
Thanks for getting to that.

I think it's helpful to explicitly state at the top of a readme which version number it applies to and was written about. (Then later if it lags behind releases, the reader has been warned at the start of reading.)

I feel it would be useful to expand on which exponent ranges are practical, and which are recommended, versus fft length and transform type; probably a little table for at least each of the more useful transforms.

I think the statement "GpuOwL best handles exponents 70M - 78M." in the usage section applies specifically to 4M fft length DP transform.

A made-up example follows, for illustration only (don't use these numbers! I've intentionally chosen composite min and max values as a sign that they're not valid). Min and max express what is possible to run accurately with the program, while the recommended range is a subset of that, excluding exponents that some other program runs more efficiently, because GpuOwL currently implements power-of-two fft lengths only.

DP transform
fftlength | min p | max p | recommended p range
2M | 19531255 | 38999995 | 35M-39M
4M | 38999915 | 78000005 | 70M-78M
8M | 77991235 | 155000015 | 140M-155M

M61 transform:
fftlength | min p | max p | recommended p range
2M | ? | ? | ?M-?M
4M | ? | ? | ?M-?M
8M | ? | ? | ?M-?M
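One way to relate exponent ranges to fft length is the average number of bits stored per FFT word. A gpuowl log line quoted later in this thread reports "FFT 4M (1024 * 2048 * 2) of 77959589 (18.59 bits/word)", and the short Python sketch below reproduces that figure. The function name is illustrative, and whether the same bits/word limits carry over to 2M or 8M lengths is an assumption that real tables would have to verify; usable bits per word generally shrink somewhat as the fft grows.

```python
def bits_per_word(exponent, width, height):
    """Average bits stored per FFT word for an exponent and FFT shape."""
    words = width * height * 2  # matches "1024 * 2048 * 2" in the log line
    return exponent / words


FFT_4M = (1024, 2048)

# The two ends of the "70M - 78M" range plus the exponent from the log.
for p in (70_000_000, 77_959_589, 78_000_000):
    print(p, round(bits_per_word(p, *FFT_4M), 2))
```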

Also, something more specific about the Gerbicz check intervals seems to me a useful addition, perhaps by pasting in part of one of your previous explanatory forum posts with a little editing.

On Windows, -h did not work for me in the gpuowl-v1.9-94aa58f build (--help did). Has that been changed? Has it been tested on Windows?

Including what you wrote in a recent forum post about the various transform types would also be useful.

And, some sample output of the beginning of a normal run could be helpful.
Old 2018-01-22, 19:12   #272
kracker
 
"Mr. Meeseeks"
Jan 2012
California, USA

2³×271 Posts

Latest build for windows as of right now (commit 74f1a38)
Attached Files
File Type: zip gpuowl-74f1a38.zip (148.4 KB, 102 views)
Old 2018-01-23, 17:11   #273
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

152D₁₆ Posts
please add a requirements section to the readme

Feel free to use, edit, or replace any of the following

Requirements

OpenCL installed, at least version x.x.
One or more units of OpenCL compatible hardware, with corresponding driver(s) supporting OpenCL of the required level, such as certain AMD GPUs, Intel IGPs, or CPUs. (Do NVIDIA GPUs work?)

Discrete (add-in card) GPUs give better performance because of their dedicated memory. Integrated graphics processors use memory shared with the CPU(s) and will affect performance of CPU applications.

In case of difficulty, it's recommended to verify the successful installation of OpenCL and compatible drivers with a utility, such as clinfo, oclDeviceQuery.exe, or the advanced tab of GPU-Z.

An indication of GPU ram requirements vs. transform type and fft length would be useful, perhaps as an additional column in the little tables I proposed earlier. I'm seeing only about 290MB occupied during 4M fft length -DP transform on an RX550. That may scale to roughly 1.3GB for a future 16M fft implementation, 2.7GB? for 32M, which would not fit on that 2GB card. (It would probably also run way too slowly for that card to be practical, at roughly estimated 2-3 years per exponent.)
Old 2018-01-23, 17:43   #274
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5421₁₀ Posts
Does gpuowl run on nvidia gtx10x0? Crashes and other surprises

Attempts to run gpuowl failed on three different card models. This is version v1.9-74f1a38 on Windows 7 Pro 64-bit.

gpuowl.log:
Code:
gpuOwL v1.9- GPU Mersenne primality checker
GeForce GTX 1070, 15x1708MHz


OpenCL compilation in 1794 ms, with "-I. -cl-fast-relaxed-math -cl-std=CL2.0  -DEXP=77959589u -DWIDTH=1024u -DHEIGHT=2048u -DLOG_NWORDS=22u -DFP_DP=1 "
PRP-3: FFT 4M (1024 * 2048 * 2) of 77959589 (18.59 bits/word) [2018-01-23 10:43:26 Central Standard Time]
Starting at iteration 0
error -5 (carryConv)
gpuowl.log again:
Code:
gpuOwL v1.9- GPU Mersenne primality checker
GeForce GTX 1050 Ti,  6x1468MHz


OpenCL compilation in 46 ms, with "-I. -cl-fast-relaxed-math -cl-std=CL2.0  -DEXP=77959589u -DWIDTH=1024u -DHEIGHT=2048u -DLOG_NWORDS=22u -DFP_DP=1 "
PRP-3: FFT 4M (1024 * 2048 * 2) of 77959589 (18.59 bits/word) [2018-01-23 10:45:56 Central Standard Time]
Starting at iteration 0
error -5 (carryConv)
console running gpuowl via a tiny batch script for uniformity and time stamping:
Code:
c:\Users\Ken\Documents\gpuowl>echo starting gpuowl at Tue 01/23/2018 10:43:04.71  1>>gpuowlrun.txt

c:\Users\Ken\Documents\gpuowl>gpuowl -user kriesel -cpu condorette-gtx1070 -device 0
gpuOwL v1.9- GPU Mersenne primality checker
GeForce GTX 1070, 15x1708MHz


OpenCL compilation in 1794 ms, with "-I. -cl-fast-relaxed-math -cl-std=CL2.0  -DEXP=77959589u -DWIDTH=1024u -DHEIGHT=2048u -DLOG_NWORDS=22u -DFP_DP=1 "
PRP-3: FFT 4M (1024 * 2048 * 2) of 77959589 (18.59 bits/word) [2018-01-23 10:43:26 Central Standard Time]
Starting at iteration 0
error -5 (carryConv)
Assertion failed!

Program: c:\Users\Ken\Documents\gpuowl\gpuowl.exe
File: clwrap.h, Line 230

Expression: check(clEnqueueNDRangeKernel(queue, kernel, 1, __null, &workSize, &groupSize, 0, __null, __null), name.c_str())

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

c:\Users\Ken\Documents\gpuowl>echo exiting gpuowl at Tue 01/23/2018 10:43:43.67  1>>gpuowlrun.txt
This left GPU-Z unable to access the GTX 1070's sensors afterward, and apparently crashed the driver or the card. Both CUDALucas processes on the card also terminated. They were automatically restarted by their batch wrappers.

Retrying gpuowl failed in the same way, but on a different GPU, without any change to the batch script. Perhaps OpenCL doesn't reconnect to a card or driver that has restarted?

Note that this dropout may mean that gpuowl runs may sometimes occur on gpus other than the units intended by the owner. A feature I've proposed for CUDALucas is device confirmation, to guard against such occurrences. Commodity gpus don't have queryable serial numbers, so other parameters, such as model, bios version, pcie address, etc have been considered. Device number in the CUDA or OpenCL sense is not reliable. I've seen this device dropout push a primality test from a reliable gpu to a less reliable one, and cause multiple tasks to land on a single gpu.
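A device confirmation step of the kind proposed above could look roughly like the following Python sketch. It works on plain lists of device names, not on a real OpenCL enumeration, and the function name `confirm_device` and the device orderings are assumptions made up for illustration.

```python
def confirm_device(devices, index, expected_name):
    """Return True only if devices[index] exists and has the expected name."""
    return 0 <= index < len(devices) and devices[index] == expected_name


# Hypothetical enumerations; the orderings are assumed for illustration.
before_dropout = ["GeForce GTX 1070", "GeForce GTX 1050 Ti", "Quadro 2000"]
after_dropout = ["GeForce GTX 1050 Ti", "Quadro 2000"]

# Same "-device 0" request, checked against the card the owner intended.
print(confirm_device(before_dropout, 0, "GeForce GTX 1070"))
print(confirm_device(after_dropout, 0, "GeForce GTX 1070"))
```

The model name alone cannot tell two identical cards apart, which is why additional parameters such as bios version or pcie address come up in the proposal.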

console again:
Code:
c:\Users\Ken\Documents\gpuowl>echo starting gpuowl at Tue 01/23/2018 10:45:53.77  1>>gpuowlrun.txt

c:\Users\Ken\Documents\gpuowl>gpuowl -user kriesel -cpu condorette-gtx1070 -device 0
gpuOwL v1.9- GPU Mersenne primality checker
GeForce GTX 1050 Ti,  6x1468MHz


OpenCL compilation in 46 ms, with "-I. -cl-fast-relaxed-math -cl-std=CL2.0  -DEXP=77959589u -DWIDTH=1024u -DHEIGHT=2048u -DLOG_NWORDS=22u -DFP_DP=1 "
PRP-3: FFT 4M (1024 * 2048 * 2) of 77959589 (18.59 bits/word) [2018-01-23 10:45:56 Central Standard Time]
Starting at iteration 0
error -5 (carryConv)
Assertion failed!

Program: c:\Users\Ken\Documents\gpuowl\gpuowl.exe
File: clwrap.h, Line 230

Expression: check(clEnqueueNDRangeKernel(queue, kernel, 1, __null, &workSize, &groupSize, 0, __null, __null), name.c_str())

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

c:\Users\Ken\Documents\gpuowl>echo exiting gpuowl at Tue 01/23/2018 10:46:09.12  1>>gpuowlrun.txt
For some reason (perhaps one CUDALucas instance instead of two?) the GTX1050Ti's CUDALucas and GPU-Z were not affected like the GTX1070's were.

OpenCL seems confused about the memory capacity of the 3GB GTX1050Ti, reporting 4GB. At one point GPU-Z was reporting 3.8GB in use.

Killing and restarting GPU-Z for the GTX1070 and checking OpenCL on its advanced tab results in "OpenCL Device not found" on that GPU. GPU-Z also did not recover access to the card's sensors.

Console again; for good measure I stopped all gpu tasks on the system, before trying on the Quadro 2000.
Code:
c:\Users\Ken\Documents\gpuowl>echo starting gpuowl at Tue 01/23/2018 11:34:37.89  1>>gpuowlrun.txt

c:\Users\Ken\Documents\gpuowl>gpuowl -user kriesel -cpu condorette-quadro200 -device 1
gpuOwL v1.9- GPU Mersenne primality checker
Quadro 2000,  4x1251MHz


OpenCL compilation in 1762 ms, with "-I. -cl-fast-relaxed-math -cl-std=CL2.0  -DEXP=77959589u -DWIDTH=1024u -DHEIGHT=2048u -DLOG_NWORDS=22u -DFP_DP=1 "
PRP-3: FFT 4M (1024 * 2048 * 2) of 77959589 (18.59 bits/word) [2018-01-23 11:34:40 Central Standard Time]
Starting at iteration 0
error -5 (carryConv)
Assertion failed!

Program: c:\Users\Ken\Documents\gpuowl\gpuowl.exe
File: clwrap.h, Line 230

Expression: check(clEnqueueNDRangeKernel(queue, kernel, 1, __null, &workSize, &groupSize, 0, __null, __null), name.c_str())

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

c:\Users\Ken\Documents\gpuowl>echo exiting gpuowl at Tue 01/23/2018 11:35:16.57  1>>gpuowlrun.txt
For some reason the GTX1050Ti and Quadro 2000 GPU-Z instances were unaffected; sensors and opencl parameters still displayed. I was able to make multiple run attempts on the 1050Ti.

Oddly, at one point GPU-Z indicated about 3.8GB of memory in use on the 3GB GTX1050Ti.

Time for a system restart to clean things up and resume CUDA.

Last fiddled with by kriesel on 2018-01-23 at 17:55
Old 2018-01-23, 17:53   #275
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

3·13·139 Posts
ocldevicequery

Here's the OclDeviceQuery.exe output for the system in the preceding post, obtained a few days before the GpuOwL experiment. A rerun just now, before restarting the system, omits the GTX1070 entirely.

Code:
C:\Users\Ken\Documents\oclDeviceQuery.exe Starting...

OpenCL SW Info:

 CL_PLATFORM_NAME:     NVIDIA CUDA
 CL_PLATFORM_VERSION:     OpenCL 1.2 CUDA 8.0.0
 OpenCL SDK Revision:     7027912


OpenCL Device Info:

 3 devices found supporting OpenCL:

 ---------------------------------
 Device GeForce GTX 1070
 ---------------------------------
  CL_DEVICE_NAME:             GeForce GTX 1070
  CL_DEVICE_VENDOR:             NVIDIA Corporation
  CL_DRIVER_VERSION:             378.66
  CL_DEVICE_VERSION:             OpenCL 1.2 CUDA
  CL_DEVICE_OPENCL_C_VERSION:         OpenCL C 1.2 
  CL_DEVICE_TYPE:            CL_DEVICE_TYPE_GPU
  CL_DEVICE_MAX_COMPUTE_UNITS:        15
  CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS:    3
  CL_DEVICE_MAX_WORK_ITEM_SIZES:    1024 / 1024 / 64 
  CL_DEVICE_MAX_WORK_GROUP_SIZE:    1024
  CL_DEVICE_MAX_CLOCK_FREQUENCY:    1708 MHz
  CL_DEVICE_ADDRESS_BITS:        32
  CL_DEVICE_MAX_MEM_ALLOC_SIZE:        2048 MByte
  CL_DEVICE_GLOBAL_MEM_SIZE:        8192 MByte
  CL_DEVICE_ERROR_CORRECTION_SUPPORT:    no
  CL_DEVICE_LOCAL_MEM_TYPE:        local
  CL_DEVICE_LOCAL_MEM_SIZE:        48 KByte
  CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE:    64 KByte
  CL_DEVICE_QUEUE_PROPERTIES:        CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
  CL_DEVICE_QUEUE_PROPERTIES:        CL_QUEUE_PROFILING_ENABLE
  CL_DEVICE_IMAGE_SUPPORT:        1
  CL_DEVICE_MAX_READ_IMAGE_ARGS:    256
  CL_DEVICE_MAX_WRITE_IMAGE_ARGS:    16
  CL_DEVICE_SINGLE_FP_CONFIG:        denorms INF-quietNaNs round-to-nearest round-to-zero round-to-inf fma 

  CL_DEVICE_IMAGE <dim>            2D_MAX_WIDTH     16384
                    2D_MAX_HEIGHT     32768
                    3D_MAX_WIDTH     16384
                    3D_MAX_HEIGHT     16384
                    3D_MAX_DEPTH     16384

  CL_DEVICE_EXTENSIONS:            cl_khr_global_int32_base_atomics
                    cl_khr_global_int32_extended_atomics
                    cl_khr_local_int32_base_atomics
                    cl_khr_local_int32_extended_atomics
                    cl_khr_fp64
                    cl_khr_byte_addressable_store
                    cl_khr_icd
                    cl_khr_gl_sharing
                    cl_nv_compiler_options
                    cl_nv_device_attribute_query
                    cl_nv_pragma_unroll
                    cl_nv_d3d9_sharing
                    cl_nv_d3d10_sharing
                    cl_khr_d3d10_sharing
                    cl_nv_d3d11_sharing


  CL_DEVICE_COMPUTE_CAPABILITY_NV:    6.1
  NUMBER OF MULTIPROCESSORS:        15
  NUMBER OF CUDA CORES:            4294967281
  CL_DEVICE_REGISTERS_PER_BLOCK_NV:    65536
  CL_DEVICE_WARP_SIZE_NV:        32
  CL_DEVICE_GPU_OVERLAP_NV:        CL_TRUE
  CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV:    CL_TRUE
  CL_DEVICE_INTEGRATED_MEMORY_NV:    CL_FALSE
  CL_DEVICE_PREFERRED_VECTOR_WIDTH_<t>    CHAR 1, SHORT 1, INT 1, LONG 1, FLOAT 1, DOUBLE 1


 ---------------------------------
 Device Quadro 2000
 ---------------------------------
  CL_DEVICE_NAME:             Quadro 2000
  CL_DEVICE_VENDOR:             NVIDIA Corporation
  CL_DRIVER_VERSION:             378.66
  CL_DEVICE_VERSION:             OpenCL 1.1 CUDA
  CL_DEVICE_OPENCL_C_VERSION:         OpenCL C 1.1 
  CL_DEVICE_TYPE:            CL_DEVICE_TYPE_GPU
  CL_DEVICE_MAX_COMPUTE_UNITS:        4
  CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS:    3
  CL_DEVICE_MAX_WORK_ITEM_SIZES:    1024 / 1024 / 64 
  CL_DEVICE_MAX_WORK_GROUP_SIZE:    1024
  CL_DEVICE_MAX_CLOCK_FREQUENCY:    1251 MHz
  CL_DEVICE_ADDRESS_BITS:        32
  CL_DEVICE_MAX_MEM_ALLOC_SIZE:        256 MByte
  CL_DEVICE_GLOBAL_MEM_SIZE:        1024 MByte
  CL_DEVICE_ERROR_CORRECTION_SUPPORT:    no
  CL_DEVICE_LOCAL_MEM_TYPE:        local
  CL_DEVICE_LOCAL_MEM_SIZE:        48 KByte
  CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE:    64 KByte
  CL_DEVICE_QUEUE_PROPERTIES:        CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
  CL_DEVICE_QUEUE_PROPERTIES:        CL_QUEUE_PROFILING_ENABLE
  CL_DEVICE_IMAGE_SUPPORT:        1
  CL_DEVICE_MAX_READ_IMAGE_ARGS:    128
  CL_DEVICE_MAX_WRITE_IMAGE_ARGS:    8
  CL_DEVICE_SINGLE_FP_CONFIG:        denorms INF-quietNaNs round-to-nearest round-to-zero round-to-inf fma 

  CL_DEVICE_IMAGE <dim>            2D_MAX_WIDTH     16384
                    2D_MAX_HEIGHT     16384
                    3D_MAX_WIDTH     2048
                    3D_MAX_HEIGHT     2048
                    3D_MAX_DEPTH     2048

  CL_DEVICE_EXTENSIONS:            cl_khr_global_int32_base_atomics
                    cl_khr_global_int32_extended_atomics
                    cl_khr_local_int32_base_atomics
                    cl_khr_local_int32_extended_atomics
                    cl_khr_fp64
                    cl_khr_byte_addressable_store
                    cl_khr_icd
                    cl_khr_gl_sharing
                    cl_nv_compiler_options
                    cl_nv_device_attribute_query
                    cl_nv_pragma_unroll
                    cl_nv_d3d9_sharing
                    cl_nv_d3d10_sharing
                    cl_khr_d3d10_sharing
                    cl_nv_d3d11_sharing
                    cl_nv_copy_opts


  CL_DEVICE_COMPUTE_CAPABILITY_NV:    2.1
  NUMBER OF MULTIPROCESSORS:        4
  NUMBER OF CUDA CORES:            192
  CL_DEVICE_REGISTERS_PER_BLOCK_NV:    32768
  CL_DEVICE_WARP_SIZE_NV:        32
  CL_DEVICE_GPU_OVERLAP_NV:        CL_TRUE
  CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV:    CL_TRUE
  CL_DEVICE_INTEGRATED_MEMORY_NV:    CL_FALSE
  CL_DEVICE_PREFERRED_VECTOR_WIDTH_<t>    CHAR 1, SHORT 1, INT 1, LONG 1, FLOAT 1, DOUBLE 1


 ---------------------------------
 Device GeForce GTX 1050 Ti
 ---------------------------------
  CL_DEVICE_NAME:             GeForce GTX 1050 Ti
  CL_DEVICE_VENDOR:             NVIDIA Corporation
  CL_DRIVER_VERSION:             378.66
  CL_DEVICE_VERSION:             OpenCL 1.2 CUDA
  CL_DEVICE_OPENCL_C_VERSION:         OpenCL C 1.2 
  CL_DEVICE_TYPE:            CL_DEVICE_TYPE_GPU
  CL_DEVICE_MAX_COMPUTE_UNITS:        6
  CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS:    3
  CL_DEVICE_MAX_WORK_ITEM_SIZES:    1024 / 1024 / 64 
  CL_DEVICE_MAX_WORK_GROUP_SIZE:    1024
  CL_DEVICE_MAX_CLOCK_FREQUENCY:    1468 MHz
  CL_DEVICE_ADDRESS_BITS:        32
  CL_DEVICE_MAX_MEM_ALLOC_SIZE:        1024 MByte
  CL_DEVICE_GLOBAL_MEM_SIZE:        4096 MByte
  CL_DEVICE_ERROR_CORRECTION_SUPPORT:    no
  CL_DEVICE_LOCAL_MEM_TYPE:        local
  CL_DEVICE_LOCAL_MEM_SIZE:        48 KByte
  CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE:    64 KByte
  CL_DEVICE_QUEUE_PROPERTIES:        CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
  CL_DEVICE_QUEUE_PROPERTIES:        CL_QUEUE_PROFILING_ENABLE
  CL_DEVICE_IMAGE_SUPPORT:        1
  CL_DEVICE_MAX_READ_IMAGE_ARGS:    256
  CL_DEVICE_MAX_WRITE_IMAGE_ARGS:    16
  CL_DEVICE_SINGLE_FP_CONFIG:        denorms INF-quietNaNs round-to-nearest round-to-zero round-to-inf fma 

  CL_DEVICE_IMAGE <dim>            2D_MAX_WIDTH     16384
                    2D_MAX_HEIGHT     32768
                    3D_MAX_WIDTH     16384
                    3D_MAX_HEIGHT     16384
                    3D_MAX_DEPTH     16384

  CL_DEVICE_EXTENSIONS:            cl_khr_global_int32_base_atomics
                    cl_khr_global_int32_extended_atomics
                    cl_khr_local_int32_base_atomics
                    cl_khr_local_int32_extended_atomics
                    cl_khr_fp64
                    cl_khr_byte_addressable_store
                    cl_khr_icd
                    cl_khr_gl_sharing
                    cl_nv_compiler_options
                    cl_nv_device_attribute_query
                    cl_nv_pragma_unroll
                    cl_nv_d3d9_sharing
                    cl_nv_d3d10_sharing
                    cl_khr_d3d10_sharing
                    cl_nv_d3d11_sharing


  CL_DEVICE_COMPUTE_CAPABILITY_NV:    6.1
  NUMBER OF MULTIPROCESSORS:        6
  NUMBER OF CUDA CORES:            4294967290
  CL_DEVICE_REGISTERS_PER_BLOCK_NV:    65536
  CL_DEVICE_WARP_SIZE_NV:        32
  CL_DEVICE_GPU_OVERLAP_NV:        CL_TRUE
  CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV:    CL_TRUE
  CL_DEVICE_INTEGRATED_MEMORY_NV:    CL_FALSE
  CL_DEVICE_PREFERRED_VECTOR_WIDTH_<t>    CHAR 1, SHORT 1, INT 1, LONG 1, FLOAT 1, DOUBLE 1


  ---------------------------------
  2D Image Formats Supported (75)
  ---------------------------------
  #     Channel Order   Channel Type          

  1     CL_R            CL_FLOAT              
  2     CL_R            CL_HALF_FLOAT         
  3     CL_R            CL_UNORM_INT8         
  4     CL_R            CL_UNORM_INT16        
  5     CL_R            CL_SNORM_INT16        
  6     CL_R            CL_SIGNED_INT8        
  7     CL_R            CL_SIGNED_INT16       
  8     CL_R            CL_SIGNED_INT32       
  9     CL_R            CL_UNSIGNED_INT8      
  10    CL_R            CL_UNSIGNED_INT16     
  11    CL_R            CL_UNSIGNED_INT32     
  12    CL_A            CL_FLOAT              
  13    CL_A            CL_HALF_FLOAT         
  14    CL_A            CL_UNORM_INT8         
  15    CL_A            CL_UNORM_INT16        
  16    CL_A            CL_SNORM_INT16        
  17    CL_A            CL_SIGNED_INT8        
  18    CL_A            CL_SIGNED_INT16       
  19    CL_A            CL_SIGNED_INT32       
  20    CL_A            CL_UNSIGNED_INT8      
  21    CL_A            CL_UNSIGNED_INT16     
  22    CL_A            CL_UNSIGNED_INT32     
  23    CL_RG           CL_FLOAT              
  24    CL_RG           CL_HALF_FLOAT         
  25    CL_RG           CL_UNORM_INT8         
  26    CL_RG           CL_UNORM_INT16        
  27    CL_RG           CL_SNORM_INT16        
  28    CL_RG           CL_SIGNED_INT8        
  29    CL_RG           CL_SIGNED_INT16       
  30    CL_RG           CL_SIGNED_INT32       
  31    CL_RG           CL_UNSIGNED_INT8      
  32    CL_RG           CL_UNSIGNED_INT16     
  33    CL_RG           CL_UNSIGNED_INT32     
  34    CL_RA           CL_FLOAT              
  35    CL_RA           CL_HALF_FLOAT         
  36    CL_RA           CL_UNORM_INT8         
  37    CL_RA           CL_UNORM_INT16        
  38    CL_RA           CL_SNORM_INT16        
  39    CL_RA           CL_SIGNED_INT8        
  40    CL_RA           CL_SIGNED_INT16       
  41    CL_RA           CL_SIGNED_INT32       
  42    CL_RA           CL_UNSIGNED_INT8      
  43    CL_RA           CL_UNSIGNED_INT16     
  44    CL_RA           CL_UNSIGNED_INT32     
  45    CL_RGBA         CL_FLOAT              
  46    CL_RGBA         CL_HALF_FLOAT         
  47    CL_RGBA         CL_UNORM_INT8         
  48    CL_RGBA         CL_UNORM_INT16        
  49    CL_RGBA         CL_SNORM_INT16        
  50    CL_RGBA         CL_SIGNED_INT8        
  51    CL_RGBA         CL_SIGNED_INT16       
  52    CL_RGBA         CL_SIGNED_INT32       
  53    CL_RGBA         CL_UNSIGNED_INT8      
  54    CL_RGBA         CL_UNSIGNED_INT16     
  55    CL_RGBA         CL_UNSIGNED_INT32     
  56    CL_BGRA         CL_UNORM_INT8         
  57    CL_BGRA         CL_SIGNED_INT8        
  58    CL_BGRA         CL_UNSIGNED_INT8      
  59    CL_ARGB         CL_UNORM_INT8         
  60    CL_ARGB         CL_SIGNED_INT8        
  61    CL_ARGB         CL_UNSIGNED_INT8      
  62    CL_INTENSITY    CL_FLOAT              
  63    CL_INTENSITY    CL_HALF_FLOAT         
  64    CL_INTENSITY    CL_UNORM_INT8         
  65    CL_INTENSITY    CL_UNORM_INT16        
  66    CL_INTENSITY    CL_SNORM_INT16        
  67    CL_LUMINANCE    CL_FLOAT              
  68    CL_LUMINANCE    CL_HALF_FLOAT         
  69    CL_LUMINANCE    CL_UNORM_INT8         
  70    CL_LUMINANCE    CL_UNORM_INT16        
  71    CL_LUMINANCE    CL_SNORM_INT16        
  72    CL_BGRA         CL_SNORM_INT8         
  73    CL_BGRA         CL_SNORM_INT16        
  74    CL_ARGB         CL_SNORM_INT8         
  75    CL_ARGB         CL_SNORM_INT16        

  ---------------------------------
  3D Image Formats Supported (75)
  ---------------------------------
  #     Channel Order   Channel Type          

  1     CL_R            CL_FLOAT              
  2     CL_R            CL_HALF_FLOAT         
  3     CL_R            CL_UNORM_INT8         
  4     CL_R            CL_UNORM_INT16        
  5     CL_R            CL_SNORM_INT16        
  6     CL_R            CL_SIGNED_INT8        
  7     CL_R            CL_SIGNED_INT16       
  8     CL_R            CL_SIGNED_INT32       
  9     CL_R            CL_UNSIGNED_INT8      
  10    CL_R            CL_UNSIGNED_INT16     
  11    CL_R            CL_UNSIGNED_INT32     
  12    CL_A            CL_FLOAT              
  13    CL_A            CL_HALF_FLOAT         
  14    CL_A            CL_UNORM_INT8         
  15    CL_A            CL_UNORM_INT16        
  16    CL_A            CL_SNORM_INT16        
  17    CL_A            CL_SIGNED_INT8        
  18    CL_A            CL_SIGNED_INT16       
  19    CL_A            CL_SIGNED_INT32       
  20    CL_A            CL_UNSIGNED_INT8      
  21    CL_A            CL_UNSIGNED_INT16     
  22    CL_A            CL_UNSIGNED_INT32     
  23    CL_RG           CL_FLOAT              
  24    CL_RG           CL_HALF_FLOAT         
  25    CL_RG           CL_UNORM_INT8         
  26    CL_RG           CL_UNORM_INT16        
  27    CL_RG           CL_SNORM_INT16        
  28    CL_RG           CL_SIGNED_INT8        
  29    CL_RG           CL_SIGNED_INT16       
  30    CL_RG           CL_SIGNED_INT32       
  31    CL_RG           CL_UNSIGNED_INT8      
  32    CL_RG           CL_UNSIGNED_INT16     
  33    CL_RG           CL_UNSIGNED_INT32     
  34    CL_RA           CL_FLOAT              
  35    CL_RA           CL_HALF_FLOAT         
  36    CL_RA           CL_UNORM_INT8         
  37    CL_RA           CL_UNORM_INT16        
  38    CL_RA           CL_SNORM_INT16        
  39    CL_RA           CL_SIGNED_INT8        
  40    CL_RA           CL_SIGNED_INT16       
  41    CL_RA           CL_SIGNED_INT32       
  42    CL_RA           CL_UNSIGNED_INT8      
  43    CL_RA           CL_UNSIGNED_INT16     
  44    CL_RA           CL_UNSIGNED_INT32     
  45    CL_RGBA         CL_FLOAT              
  46    CL_RGBA         CL_HALF_FLOAT         
  47    CL_RGBA         CL_UNORM_INT8         
  48    CL_RGBA         CL_UNORM_INT16        
  49    CL_RGBA         CL_SNORM_INT16        
  50    CL_RGBA         CL_SIGNED_INT8        
  51    CL_RGBA         CL_SIGNED_INT16       
  52    CL_RGBA         CL_SIGNED_INT32       
  53    CL_RGBA         CL_UNSIGNED_INT8      
  54    CL_RGBA         CL_UNSIGNED_INT16     
  55    CL_RGBA         CL_UNSIGNED_INT32     
  56    CL_BGRA         CL_UNORM_INT8         
  57    CL_BGRA         CL_SIGNED_INT8        
  58    CL_BGRA         CL_UNSIGNED_INT8      
  59    CL_ARGB         CL_UNORM_INT8         
  60    CL_ARGB         CL_SIGNED_INT8        
  61    CL_ARGB         CL_UNSIGNED_INT8      
  62    CL_INTENSITY    CL_FLOAT              
  63    CL_INTENSITY    CL_HALF_FLOAT         
  64    CL_INTENSITY    CL_UNORM_INT8         
  65    CL_INTENSITY    CL_UNORM_INT16        
  66    CL_INTENSITY    CL_SNORM_INT16        
  67    CL_LUMINANCE    CL_FLOAT              
  68    CL_LUMINANCE    CL_HALF_FLOAT         
  69    CL_LUMINANCE    CL_UNORM_INT8         
  70    CL_LUMINANCE    CL_UNORM_INT16        
  71    CL_LUMINANCE    CL_SNORM_INT16        
  72    CL_BGRA         CL_SNORM_INT8         
  73    CL_BGRA         CL_SNORM_INT16        
  74    CL_ARGB         CL_SNORM_INT8         
  75    CL_ARGB         CL_SNORM_INT16        

oclDeviceQuery, Platform Name = NVIDIA CUDA, Platform Version = OpenCL 1.2 CUDA 8.0.0, SDK Revision = 7027912, NumDevs = 3, Device = GeForce GTX 1070, Device = Quadro 2000, Device = GeForce GTX 1050 Ti

System Info: 

 Local Time/Date = 18:48:31, 1/19/2018
 CPU Arch: 0
 CPU Level: 6
 # of CPU processors: 12
 Windows Build: 7601
 Windows Ver: 6.1 (Windows Vista / Windows 7)
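A side note on the "NUMBER OF CUDA CORES" values in the output above: 4294967281 and 4294967290 are exactly 2^32 - 15 and 2^32 - 6, and the two cards report 15 and 6 multiprocessors. One possible reading, which is an assumption and is not confirmed anywhere in this thread, is that the tool multiplied the multiprocessor count by -1 (for example an "unknown" cores-per-multiprocessor value for compute capability 6.1) and printed the result as an unsigned 32-bit number. The Python sketch below checks only the arithmetic.

```python
MASK32 = 0xFFFFFFFF


def as_uint32(value):
    """Reinterpret a Python int as an unsigned 32-bit value."""
    return value & MASK32


# (device name, multiprocessors, reported "CUDA cores") from the output above.
reported = (
    ("GeForce GTX 1070", 15, 4294967281),
    ("GeForce GTX 1050 Ti", 6, 4294967290),
)

for name, multiprocessors, cores in reported:
    # True means the value equals -1 * multiprocessors wrapped to 32 bits.
    print(name, as_uint32(-1 * multiprocessors) == cores)
```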
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.