[QUOTE=kriesel;478003]The behavior I see in my logs is it starts with 1k iterations.
I have sequences that differ near start; iteration counts differ from line to line, by 1k,4k,5k,10k,20k...,50k,...,100k
[CODE]OK 0 / 76812401 [ 0.00%], 0.00 ms/it; ETA 0d 00:00; 0000000000000003 [2018-01-19 13:02:44 Central Standard Time]
OK 1000 / 76812401 [ 0.00%], 11.96 ms/it; ETA 10d 15:06; aadc1acf24bf7d60 [2018-01-19 13:03:04 Central Standard Time]
OK 5000 / 76812401 [ 0.01%], 11.94 ms/it; ETA 10d 14:44; 3db0edb3db578456 [2018-01-19 13:03:59 Central Standard Time]
[/CODE]Is it possible to save the ending Gerbicz step size from one run to begin the next? If not, would you consider it as an option, for hardware and installations that are stable? There's a slight execution speed advantage, and it reduces screen clutter. (My test run has worked its way up to 200k in under 24 hours, with no errors flagged yet, but if halted and restarted will reset to 1k interval and start the slow climb again from there. It should be very stable, as it's a brand new GPU on a fresh Windows install, patched to current, then gpuowl installed and run.)[/QUOTE]
Let me explain my thinking here:
- after startup, the user expects something to come up on screen, to get feedback that it's working.
- after startup, a check should be done quickly, to validate early that it's not all broken.
- after startup, the "gerbicz memory" should be reset because the hardware situation may be different, such as: different GPU, different GPU setup (clocks), different fan setup, case, etc.
The ramp-up from 1K to 10K is fast (assuming no errors). The overhead difference between 10K and 200K is minor, and the log size is not big enough over that ramp-up to be a big problem.
Now, let's see the opposite where there is memory between start-ups:
- the user starts with the GPU "too hot", and gathers let's say 10 gerbicz errors during the night.
- the user realizes the problem, fixes the cooling, and restarts. But now it won't ramp up (only very slowly) because of the memory of those past 10 errors. |
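The trade-off described in this post can be sketched with a toy model. This is not gpuowl's actual policy: the ladder `STEPS`, the function name `iterations_to_reach_top`, and the rule "each remembered error demands one extra clean check per rung" are all assumptions made up for illustration, chosen only to resemble the 1k to 200k climb seen in the log.

```python
# Hypothetical ladder of check intervals (iterations between Gerbicz checks).
# The values only mimic the climb seen in the log; the real ladder may differ.
STEPS = [1_000, 5_000, 10_000, 20_000, 50_000, 100_000, 200_000]


def iterations_to_reach_top(remembered_errors: int) -> int:
    """Iterations needed to reach the largest step if no new errors occur.

    Assumed rule (illustrative only): each remembered error adds one extra
    clean check that must pass before moving up to the next rung.
    """
    checks_per_rung = 1 + remembered_errors
    return sum(step * checks_per_rung for step in STEPS[:-1])


fresh = iterations_to_reach_top(0)     # memory reset at startup
carried = iterations_to_reach_top(10)  # 10 errors remembered from a hot night
print(fresh, carried)
```

Under these made-up rules, a restart that carries 10 old errors needs 11 times as many iterations to climb back to the top interval as a restart that begins with a clean slate, which is the situation the last two bullets warn about.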
[QUOTE=kriesel;477927](Later...) Whoa, what happened in that middle line? Over a minute per iteration computed (>5000 times the preceding and following). Momentarily projecting runtime of over 150 years!
[CODE]OK 80000 / 76812401 [ 0.10%], 11.99 ms/it; ETA 10d 15:27; 6ee0f8a8a97d7812 [2018-01-19 13:19:36 Central Standard Time]
OK 100000 / 76812401 [ 0.13%], 63362.75 ms/it; ETA 56258d 15:28; 3fb24c04ec7569db [2018-01-19 13:23:43 Central Standard Time]
OK 150000 / 76812401 [ 0.20%], 11.99 ms/it; ETA 10d 15:25; 10bf91703f69c302 [2018-01-19 13:33:50 Central Standard Time][/CODE][/QUOTE]
I attempted a fix for the time-per-iteration overflow and committed it. |
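As a sanity check on the quoted log, the arithmetic can be redone from the printed values alone. This is a sketch, not gpuowl's code: the helper names `eta_days` and `observed_ms_per_it` are made up here, and the only inputs are the iteration counts, rates, and timestamps shown in the three log lines.

```python
from datetime import datetime

TOTAL_ITERATIONS = 76_812_401  # total iteration count shown in the log


def eta_days(done: int, ms_per_it: float) -> float:
    """Projected remaining run time in days at the given rate."""
    remaining = TOTAL_ITERATIONS - done
    return remaining * ms_per_it / 1000 / 86_400


def observed_ms_per_it(t0: str, t1: str, it0: int, it1: int) -> float:
    """Rate implied by two log timestamps and their iteration counts."""
    fmt = "%Y-%m-%d %H:%M:%S"
    elapsed = (datetime.strptime(t1, fmt) - datetime.strptime(t0, fmt)).total_seconds()
    return elapsed * 1000 / (it1 - it0)


# ETA implied by the bogus 63362.75 ms/it figure on the middle line.
bogus_days = eta_days(100_000, 63362.75)

# Rate actually implied by the timestamps of the first two lines.
real_rate = observed_ms_per_it(
    "2018-01-19 13:19:36", "2018-01-19 13:23:43", 80_000, 100_000
)
print(int(bogus_days), round(bogus_days / 365.25), round(real_rate, 2))
```

The bogus rate reproduces the 56258-day ETA (about 154 years), while the wall-clock timestamps give roughly 12.35 ms/it for that interval. That is consistent with the spike being a reporting error rather than a real slowdown.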
[QUOTE=kriesel;477923]Hi,
I recently pulled down the Windows executable zip file for gpuowl 1.9 from [URL]http://www.mersenneforum.org/showpost.php?p=471663&postcount=226[/URL], unzipped, read its README.md, which says in part:[...] Please update it to cover gpuowl 1.x also. Thanks![/QUOTE] I updated the README to some degree, let me know what else is missing. |
[QUOTE=kriesel;477927] logging to gpuowl.log appears to not be occurring.[/QUOTE]
I tried to fix the log flush on Windows (pending verification). On the topic of output redirection, GpuOwl does write to stdout (the normal standard output), so that's not the reason for any trouble there. I don't know more, though. |
[QUOTE=preda;478057]Let me explain my thinking here:
- after startup, the user expects something to come up on screen, to get feedback that it's working.
- after startup, a check should be done quickly, to validate early that it's not all broken.
- after startup, the "gerbicz memory" should be reset because the hardware situation may be different, such as: different GPU, different GPU setup (clocks), different fan setup, case, etc.
The ramp-up from 1K to 10K is fast (assuming no errors). The overhead difference between 10K and 200K is minor, and the log size is not big enough over that ramp-up to be a big problem.
Now, let's see the opposite where there is memory between start-ups:
- the user starts with the GPU "too hot", and gathers let's say 10 gerbicz errors during the night.
- the user realizes the problem, fixes the cooling, and restarts. But now it won't ramp up (only very slowly) because of the memory of those past 10 errors.[/QUOTE]
Thanks for the info. I still think having the option is useful, particularly in the normal case of a familiar user who does not have a thermal problem. Some of us are Windows users who routinely apply patches, or have other GPUs that go AWOL until a reboot, or have unstable power and no UPS on some systems, but otherwise have reliable and stable hardware and a stable gpuowl installation. I understand you have other priorities on your to-do list. (~6M fft seems like a high one to me.)
Re your 3 posts following the one quoted above: Excellent and thank you, times 3. If someone would build and post a Windows binary I'll give those changes a try. |
GV100
How fast will Nvidia's new GV100 chip be in LL tests? It has full double precision (around 7 Tflop on the V100) and insane memory bandwidth. I suppose it will be the only card on the market able to do under 1 ms/it. Has anyone ever tested with a similar Quadro card such as the GP100, which has similar specs?
|
readme suggestions
[QUOTE=preda;478061]I updated the README to some degree, let me know what else is missing.[/QUOTE]
Thanks for getting to that. I think it's helpful to explicitly state at the top of a readme which version number it applies to and was written about. (Then later if it lags behind releases, the reader has been warned at the start of reading.)
I feel it would be useful to expand on what exponent ranges are practical, and what are recommended, versus fft length and transform type; probably a little table for at least each of the more useful transforms. I think the statement "GpuOwL best handles exponents 70M - 78M." in the usage section applies specifically to the 4M fft length DP transform.
A made-up example for illustration only follows (don't use these numbers! I'm intentionally showing composite min and max values here as a sign they're not valid values). Min and max express what is possible to run accurately with the program, while the recommended range is a subset of that, excluding what is more efficiently run with some other program, due to GpuOwL currently implementing power-of-two fft lengths only.
DP transform:
fftlength | min p | max p | recommended p range
2M | 19531255 | 38999995 | 35M-39M
4M | 38999915 | 78000005 | 70M-78M
8M | 77991235 | 155000015 | 140M-155M
M61 transform:
fftlength | min p | max p | recommended p range
2M | ? | ? | ?M-?M
4M | ? | ? | ?M-?M
8M | ? | ? | ?M-?M
Also something more specific about the Gerbicz check intervals seems to me a useful addition, perhaps by pasting in part of one of your previous explanatory forum posts with a little editing.
On Windows, -h did not work for me in the gpuowl-v1.9-94aa58f build. (--help did.) Has that been changed? Tested on Windows?
Including what you wrote in a recent forum post about the various transform types would also be useful. And some sample output of the beginning of a normal run could be helpful. |
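One way to relate exponent ranges to fft length is the bits-per-word figure that gpuowl prints at startup, as in the "PRP-3: FFT 4M (1024 * 2048 * 2) of 77959589 (18.59 bits/word)" line logged elsewhere in this thread. The sketch below only redoes that division; the function name `bits_per_word` is made up here, and it says nothing about where the real accuracy limits lie.

```python
def bits_per_word(exponent: int, width: int, height: int) -> float:
    """Average bits stored per FFT word, for an FFT of width * height * 2 words."""
    return exponent / (width * height * 2)


# Values taken from the "FFT 4M (1024 * 2048 * 2) of 77959589" log line.
logged = bits_per_word(77_959_589, 1024, 2048)
print(f"{logged:.2f}")

# The README's 70M - 78M guideline for the 4M FFT, expressed as bits/word.
for p in (70_000_000, 78_000_000):
    print(p, f"{bits_per_word(p, 1024, 2048):.2f}")
```

For the 4M DP transform this puts the 70M - 78M guideline at roughly 16.7 to 18.6 bits per word. A README table could list this figure next to each fft length once real min and max values are known.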
1 Attachment(s)
Latest build for windows as of right now (commit 74f1a38)
|
please add a requirements section to the readme
Feel free to use, edit, or replace any of the following
Requirements
- OpenCL installed, at least version x.x.
- One or more units of OpenCL compatible hardware, with corresponding driver(s) supporting OpenCL of the required level, such as certain AMD GPUs, Intel IGPs, or CPUs. (Do NVIDIA GPUs work?)
Discrete (add-in card) GPUs give better performance because of their dedicated memory. Integrated graphics processors use memory shared with the CPU(s) and will affect performance of CPU applications.
In case of difficulty, it's recommended to verify the successful installation of OpenCL and compatible drivers with a utility, such as clinfo, oclDeviceQuery.exe, or the advanced tab of GPU-Z.
An indication of GPU ram requirements vs. transform type and fft length would be useful, perhaps as an additional column in the little tables I proposed earlier. I'm seeing only about 290MB occupied during 4M fft length -DP transform on an RX550. That may scale to roughly 1.3GB for a future 16M fft implementation, 2.7GB? for 32M, which would not fit on that 2GB card. (It would probably also run way too slowly for that card to be practical, at roughly estimated 2-3 years per exponent.) |
Does gpuowl run on nvidia gtx10x0? Crashes and other surprises
gpuowl attempts failed on three different model cards. This is the v1.9-74f1a38 version on Windows 7 Pro 64bit.
gpuowl.log: [CODE]gpuOwL v1.9- GPU Mersenne primality checker GeForce GTX 1070, 15x1708MHz OpenCL compilation in 1794 ms, with "-I. -cl-fast-relaxed-math -cl-std=CL2.0 -DEXP=77959589u -DWIDTH=1024u -DHEIGHT=2048u -DLOG_NWORDS=22u -DFP_DP=1 " PRP-3: FFT 4M (1024 * 2048 * 2) of 77959589 (18.59 bits/word) [2018-01-23 10:43:26 Central Standard Time] Starting at iteration 0 error -5 (carryConv) [/CODE]gpuowl.log again:[CODE] gpuOwL v1.9- GPU Mersenne primality checker GeForce GTX 1050 Ti, 6x1468MHz OpenCL compilation in 46 ms, with "-I. -cl-fast-relaxed-math -cl-std=CL2.0 -DEXP=77959589u -DWIDTH=1024u -DHEIGHT=2048u -DLOG_NWORDS=22u -DFP_DP=1 " PRP-3: FFT 4M (1024 * 2048 * 2) of 77959589 (18.59 bits/word) [2018-01-23 10:45:56 Central Standard Time] Starting at iteration 0 error -5 (carryConv) [/CODE]console running gpuowl via a tiny batch script for uniformity and time stamping:[CODE]c:\Users\Ken\Documents\gpuowl>echo starting gpuowl at Tue 01/23/2018 10:43:04.71 1>>gpuowlrun.txt c:\Users\Ken\Documents\gpuowl>gpuowl -user kriesel -cpu condorette-gtx1070 -device 0 gpuOwL v1.9- GPU Mersenne primality checker GeForce GTX 1070, 15x1708MHz OpenCL compilation in 1794 ms, with "-I. -cl-fast-relaxed-math -cl-std=CL2.0 -DEXP=77959589u -DWIDTH=1024u -DHEIGHT=2048u - DLOG_NWORDS=22u -DFP_DP=1 PRP-3: FFT 4M (1024 * 2048 * 2) of 77959589 (18.59 bits/word) [2018-01-23 10:43:26 Central Standard Time] Starting at iteration 0 error -5 (carryConv) Assertion failed! Program: c:\Users\Ken\Documents\gpuowl\gpuowl.exe File: clwrap.h, Line 230 Expression: check(clEnqueueNDRangeKernel(queue, kernel, 1, __null, &workSize, &groupSize, 0, __null, __null), name.c_str()) This application has requested the Runtime to terminate it in an unusual way. Please contact the application's support team for more information. 
c:\Users\Ken\Documents\gpuowl>echo exiting gpuowl at Tue 01/23/2018 10:43:43.67 1>>gpuowlrun.txt [/CODE]
This caused GPU-Z to be unable to access the GTX1070's sensors subsequently, and apparently crashed the driver or card. Both CUDALucas processes on the card also terminated. They were automatically restarted by their batch wrappers. Retrying gpuowl failed in the same way, but on a different GPU, without any change to the batch script. Perhaps OpenCL doesn't reconnect to a card or driver that's restarted?
Note that this dropout may mean that gpuowl runs may sometimes occur on GPUs other than the units intended by the owner. A feature I've proposed for CUDALucas is device confirmation, to guard against such occurrences. Commodity GPUs don't have queryable serial numbers, so other parameters, such as model, BIOS version, PCIe address, etc. have been considered. Device number in the CUDA or OpenCL sense is not reliable. I've seen this device dropout push a primality test from a reliable GPU to a less reliable one, and cause multiple tasks to land on a single GPU.
console again: [CODE] c:\Users\Ken\Documents\gpuowl>echo starting gpuowl at Tue 01/23/2018 10:45:53.77 1>>gpuowlrun.txt c:\Users\Ken\Documents\gpuowl>gpuowl -user kriesel -cpu condorette-gtx1070 -device 0 gpuOwL v1.9- GPU Mersenne primality checker GeForce GTX 1050 Ti, 6x1468MHz OpenCL compilation in 46 ms, with "-I. -cl-fast-relaxed-math -cl-std=CL2.0 -DEXP=77959589u -DWIDTH=1024u -DHEIGHT=2048u - DLOG_NWORDS=22u -DFP_DP=1 " PRP-3: FFT 4M (1024 * 2048 * 2) of 77959589 (18.59 bits/word) [2018-01-23 10:45:56 Central Standard Time] Starting at iteration 0 error -5 (carryConv) Assertion failed! Program: c:\Users\Ken\Documents\gpuowl\gpuowl.exe File: clwrap.h, Line 230 Expression: check(clEnqueueNDRangeKernel(queue, kernel, 1, __null, &workSize, &groupSize, 0, __null, __null), name.c_str()) This application has requested the Runtime to terminate it in an unusual way. 
Please contact the application's support team for more information. c:\Users\Ken\Documents\gpuowl>echo exiting gpuowl at Tue 01/23/2018 10:46:09.12 1>>gpuowlrun.txt [/CODE]For some reason (perhaps one CUDALucas instance instead of two?) the GTX1050Ti's CUDALucas and GPU-Z were not affected like the GTX1070's were. OpenCL seems confused about the memory capacity of the 3GB GTX1050Ti, reporting 4GB. At one point GPU-Z was reporting 3.8GB in use. Killing and restarting GPU-Z for the GTX1070 and checking OpenCL on its advanced tab results in "OpenCL Device not found" on that GPU. GPU-Z also did not recover access to the card's sensors. Console again; for good measure I stopped all gpu tasks on the system, before trying on the Quadro 2000. [CODE]c:\Users\Ken\Documents\gpuowl>echo starting gpuowl at Tue 01/23/2018 11:34:37.89 1>>gpuowlrun.txt c:\Users\Ken\Documents\gpuowl>gpuowl -user kriesel -cpu condorette-quadro200 -device 1 gpuOwL v1.9- GPU Mersenne primality checker Quadro 2000, 4x1251MHz OpenCL compilation in 1762 ms, with "-I. -cl-fast-relaxed-math -cl-std=CL2.0 -DEXP=77959589u -DWIDTH=1024u -DHEIGHT=2048u -DLOG_NWORDS=22u -DFP_DP=1 " PRP-3: FFT 4M (1024 * 2048 * 2) of 77959589 (18.59 bits/word) [2018-01-23 11:34:40 Central Standard Time] Starting at iteration 0 error -5 (carryConv) Assertion failed! Program: c:\Users\Ken\Documents\gpuowl\gpuowl.exe File: clwrap.h, Line 230 Expression: check(clEnqueueNDRangeKernel(queue, kernel, 1, __null, &workSize, &groupSize, 0, __null, __null), name.c_str()) This application has requested the Runtime to terminate it in an unusual way. Please contact the application's support team for more information. c:\Users\Ken\Documents\gpuowl>echo exiting gpuowl at Tue 01/23/2018 11:35:16.57 1>>gpuowlrun.txt[/CODE]For some reason the GTX1050Ti and Quadro 2000 GPU-Z instances were unaffected; sensors and opencl parameters still displayed. I was able to make multiple run attempts on the 1050Ti. 
Oddly, at one point GPU-Z indicated about 3.8GB of memory in use on the 3GB GTX1050Ti. Time for a system restart to clean things up and resume CUDA. |
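The device confirmation idea mentioned in this post could look something like the following. It is only a sketch: `confirm_device` is a hypothetical helper, it compares model names only (BIOS version and PCIe address are left out), and the two device lists are written by hand from the logs rather than queried from OpenCL.

```python
def confirm_device(devices: list[str], index: int, expected_name: str) -> None:
    """Raise if the device at `index` is not the model the owner intended."""
    if index >= len(devices):
        raise RuntimeError(f"device {index} not present; found {devices}")
    actual = devices[index]
    if actual != expected_name:
        raise RuntimeError(f"device {index} is '{actual}', expected '{expected_name}'")


# Hand-written enumerations (illustrative): before and after the GTX 1070
# dropped out, when "-device 0" started resolving to the GTX 1050 Ti.
before = ["GeForce GTX 1070", "Quadro 2000", "GeForce GTX 1050 Ti"]
after = ["GeForce GTX 1050 Ti", "Quadro 2000"]

confirm_device(before, 0, "GeForce GTX 1070")  # passes silently

try:
    confirm_device(after, 0, "GeForce GTX 1070")
except RuntimeError as err:
    print("refusing to start:", err)
```

With a check like this in the batch wrapper or the program itself, the second run would have stopped instead of silently landing on the GTX 1050 Ti.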
ocldevicequery
Here's the OclDeviceQuery.exe output for the system in the preceding post, obtained a few days before the GpuOwL experiment. A rerun just now, before restarting the system, omits the GTX1070 entirely.
[CODE]C:\Users\Ken\Documents\oclDeviceQuery.exe Starting... OpenCL SW Info: CL_PLATFORM_NAME: NVIDIA CUDA CL_PLATFORM_VERSION: OpenCL 1.2 CUDA 8.0.0 OpenCL SDK Revision: 7027912 OpenCL Device Info: 3 devices found supporting OpenCL: --------------------------------- Device GeForce GTX 1070 --------------------------------- CL_DEVICE_NAME: GeForce GTX 1070 CL_DEVICE_VENDOR: NVIDIA Corporation CL_DRIVER_VERSION: 378.66 CL_DEVICE_VERSION: OpenCL 1.2 CUDA CL_DEVICE_OPENCL_C_VERSION: OpenCL C 1.2 CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU CL_DEVICE_MAX_COMPUTE_UNITS: 15 CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3 CL_DEVICE_MAX_WORK_ITEM_SIZES: 1024 / 1024 / 64 CL_DEVICE_MAX_WORK_GROUP_SIZE: 1024 CL_DEVICE_MAX_CLOCK_FREQUENCY: 1708 MHz CL_DEVICE_ADDRESS_BITS: 32 CL_DEVICE_MAX_MEM_ALLOC_SIZE: 2048 MByte CL_DEVICE_GLOBAL_MEM_SIZE: 8192 MByte CL_DEVICE_ERROR_CORRECTION_SUPPORT: no CL_DEVICE_LOCAL_MEM_TYPE: local CL_DEVICE_LOCAL_MEM_SIZE: 48 KByte CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 64 KByte CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_PROFILING_ENABLE CL_DEVICE_IMAGE_SUPPORT: 1 CL_DEVICE_MAX_READ_IMAGE_ARGS: 256 CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 16 CL_DEVICE_SINGLE_FP_CONFIG: denorms INF-quietNaNs round-to-nearest round-to-zero round-to-inf fma CL_DEVICE_IMAGE <dim> 2D_MAX_WIDTH 16384 2D_MAX_HEIGHT 32768 3D_MAX_WIDTH 16384 3D_MAX_HEIGHT 16384 3D_MAX_DEPTH 16384 CL_DEVICE_EXTENSIONS: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing CL_DEVICE_COMPUTE_CAPABILITY_NV: 6.1 NUMBER OF MULTIPROCESSORS: 15 NUMBER OF CUDA CORES: 4294967281 CL_DEVICE_REGISTERS_PER_BLOCK_NV: 65536 CL_DEVICE_WARP_SIZE_NV: 32 CL_DEVICE_GPU_OVERLAP_NV: 
CL_TRUE CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV: CL_TRUE CL_DEVICE_INTEGRATED_MEMORY_NV: CL_FALSE CL_DEVICE_PREFERRED_VECTOR_WIDTH_<t> CHAR 1, SHORT 1, INT 1, LONG 1, FLOAT 1, DOUBLE 1 --------------------------------- Device Quadro 2000 --------------------------------- CL_DEVICE_NAME: Quadro 2000 CL_DEVICE_VENDOR: NVIDIA Corporation CL_DRIVER_VERSION: 378.66 CL_DEVICE_VERSION: OpenCL 1.1 CUDA CL_DEVICE_OPENCL_C_VERSION: OpenCL C 1.1 CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU CL_DEVICE_MAX_COMPUTE_UNITS: 4 CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3 CL_DEVICE_MAX_WORK_ITEM_SIZES: 1024 / 1024 / 64 CL_DEVICE_MAX_WORK_GROUP_SIZE: 1024 CL_DEVICE_MAX_CLOCK_FREQUENCY: 1251 MHz CL_DEVICE_ADDRESS_BITS: 32 CL_DEVICE_MAX_MEM_ALLOC_SIZE: 256 MByte CL_DEVICE_GLOBAL_MEM_SIZE: 1024 MByte CL_DEVICE_ERROR_CORRECTION_SUPPORT: no CL_DEVICE_LOCAL_MEM_TYPE: local CL_DEVICE_LOCAL_MEM_SIZE: 48 KByte CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 64 KByte CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_PROFILING_ENABLE CL_DEVICE_IMAGE_SUPPORT: 1 CL_DEVICE_MAX_READ_IMAGE_ARGS: 128 CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 8 CL_DEVICE_SINGLE_FP_CONFIG: denorms INF-quietNaNs round-to-nearest round-to-zero round-to-inf fma CL_DEVICE_IMAGE <dim> 2D_MAX_WIDTH 16384 2D_MAX_HEIGHT 16384 3D_MAX_WIDTH 2048 3D_MAX_HEIGHT 2048 3D_MAX_DEPTH 2048 CL_DEVICE_EXTENSIONS: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts CL_DEVICE_COMPUTE_CAPABILITY_NV: 2.1 NUMBER OF MULTIPROCESSORS: 4 NUMBER OF CUDA CORES: 192 CL_DEVICE_REGISTERS_PER_BLOCK_NV: 32768 CL_DEVICE_WARP_SIZE_NV: 32 CL_DEVICE_GPU_OVERLAP_NV: CL_TRUE CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV: CL_TRUE 
CL_DEVICE_INTEGRATED_MEMORY_NV: CL_FALSE CL_DEVICE_PREFERRED_VECTOR_WIDTH_<t> CHAR 1, SHORT 1, INT 1, LONG 1, FLOAT 1, DOUBLE 1 --------------------------------- Device GeForce GTX 1050 Ti --------------------------------- CL_DEVICE_NAME: GeForce GTX 1050 Ti CL_DEVICE_VENDOR: NVIDIA Corporation CL_DRIVER_VERSION: 378.66 CL_DEVICE_VERSION: OpenCL 1.2 CUDA CL_DEVICE_OPENCL_C_VERSION: OpenCL C 1.2 CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU CL_DEVICE_MAX_COMPUTE_UNITS: 6 CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3 CL_DEVICE_MAX_WORK_ITEM_SIZES: 1024 / 1024 / 64 CL_DEVICE_MAX_WORK_GROUP_SIZE: 1024 CL_DEVICE_MAX_CLOCK_FREQUENCY: 1468 MHz CL_DEVICE_ADDRESS_BITS: 32 CL_DEVICE_MAX_MEM_ALLOC_SIZE: 1024 MByte CL_DEVICE_GLOBAL_MEM_SIZE: 4096 MByte CL_DEVICE_ERROR_CORRECTION_SUPPORT: no CL_DEVICE_LOCAL_MEM_TYPE: local CL_DEVICE_LOCAL_MEM_SIZE: 48 KByte CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 64 KByte CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_PROFILING_ENABLE CL_DEVICE_IMAGE_SUPPORT: 1 CL_DEVICE_MAX_READ_IMAGE_ARGS: 256 CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 16 CL_DEVICE_SINGLE_FP_CONFIG: denorms INF-quietNaNs round-to-nearest round-to-zero round-to-inf fma CL_DEVICE_IMAGE <dim> 2D_MAX_WIDTH 16384 2D_MAX_HEIGHT 32768 3D_MAX_WIDTH 16384 3D_MAX_HEIGHT 16384 3D_MAX_DEPTH 16384 CL_DEVICE_EXTENSIONS: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing CL_DEVICE_COMPUTE_CAPABILITY_NV: 6.1 NUMBER OF MULTIPROCESSORS: 6 NUMBER OF CUDA CORES: 4294967290 CL_DEVICE_REGISTERS_PER_BLOCK_NV: 65536 CL_DEVICE_WARP_SIZE_NV: 32 CL_DEVICE_GPU_OVERLAP_NV: CL_TRUE CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV: CL_TRUE CL_DEVICE_INTEGRATED_MEMORY_NV: CL_FALSE 
CL_DEVICE_PREFERRED_VECTOR_WIDTH_<t> CHAR 1, SHORT 1, INT 1, LONG 1, FLOAT 1, DOUBLE 1 --------------------------------- 2D Image Formats Supported (75) --------------------------------- # Channel Order Channel Type 1 CL_R CL_FLOAT 2 CL_R CL_HALF_FLOAT 3 CL_R CL_UNORM_INT8 4 CL_R CL_UNORM_INT16 5 CL_R CL_SNORM_INT16 6 CL_R CL_SIGNED_INT8 7 CL_R CL_SIGNED_INT16 8 CL_R CL_SIGNED_INT32 9 CL_R CL_UNSIGNED_INT8 10 CL_R CL_UNSIGNED_INT16 11 CL_R CL_UNSIGNED_INT32 12 CL_A CL_FLOAT 13 CL_A CL_HALF_FLOAT 14 CL_A CL_UNORM_INT8 15 CL_A CL_UNORM_INT16 16 CL_A CL_SNORM_INT16 17 CL_A CL_SIGNED_INT8 18 CL_A CL_SIGNED_INT16 19 CL_A CL_SIGNED_INT32 20 CL_A CL_UNSIGNED_INT8 21 CL_A CL_UNSIGNED_INT16 22 CL_A CL_UNSIGNED_INT32 23 CL_RG CL_FLOAT 24 CL_RG CL_HALF_FLOAT 25 CL_RG CL_UNORM_INT8 26 CL_RG CL_UNORM_INT16 27 CL_RG CL_SNORM_INT16 28 CL_RG CL_SIGNED_INT8 29 CL_RG CL_SIGNED_INT16 30 CL_RG CL_SIGNED_INT32 31 CL_RG CL_UNSIGNED_INT8 32 CL_RG CL_UNSIGNED_INT16 33 CL_RG CL_UNSIGNED_INT32 34 CL_RA CL_FLOAT 35 CL_RA CL_HALF_FLOAT 36 CL_RA CL_UNORM_INT8 37 CL_RA CL_UNORM_INT16 38 CL_RA CL_SNORM_INT16 39 CL_RA CL_SIGNED_INT8 40 CL_RA CL_SIGNED_INT16 41 CL_RA CL_SIGNED_INT32 42 CL_RA CL_UNSIGNED_INT8 43 CL_RA CL_UNSIGNED_INT16 44 CL_RA CL_UNSIGNED_INT32 45 CL_RGBA CL_FLOAT 46 CL_RGBA CL_HALF_FLOAT 47 CL_RGBA CL_UNORM_INT8 48 CL_RGBA CL_UNORM_INT16 49 CL_RGBA CL_SNORM_INT16 50 CL_RGBA CL_SIGNED_INT8 51 CL_RGBA CL_SIGNED_INT16 52 CL_RGBA CL_SIGNED_INT32 53 CL_RGBA CL_UNSIGNED_INT8 54 CL_RGBA CL_UNSIGNED_INT16 55 CL_RGBA CL_UNSIGNED_INT32 56 CL_BGRA CL_UNORM_INT8 57 CL_BGRA CL_SIGNED_INT8 58 CL_BGRA CL_UNSIGNED_INT8 59 CL_ARGB CL_UNORM_INT8 60 CL_ARGB CL_SIGNED_INT8 61 CL_ARGB CL_UNSIGNED_INT8 62 CL_INTENSITY CL_FLOAT 63 CL_INTENSITY CL_HALF_FLOAT 64 CL_INTENSITY CL_UNORM_INT8 65 CL_INTENSITY CL_UNORM_INT16 66 CL_INTENSITY CL_SNORM_INT16 67 CL_LUMINANCE CL_FLOAT 68 CL_LUMINANCE CL_HALF_FLOAT 69 CL_LUMINANCE CL_UNORM_INT8 70 CL_LUMINANCE CL_UNORM_INT16 71 CL_LUMINANCE CL_SNORM_INT16 72 CL_BGRA 
CL_SNORM_INT8 73 CL_BGRA CL_SNORM_INT16 74 CL_ARGB CL_SNORM_INT8 75 CL_ARGB CL_SNORM_INT16 --------------------------------- 3D Image Formats Supported (75) --------------------------------- # Channel Order Channel Type 1 CL_R CL_FLOAT 2 CL_R CL_HALF_FLOAT 3 CL_R CL_UNORM_INT8 4 CL_R CL_UNORM_INT16 5 CL_R CL_SNORM_INT16 6 CL_R CL_SIGNED_INT8 7 CL_R CL_SIGNED_INT16 8 CL_R CL_SIGNED_INT32 9 CL_R CL_UNSIGNED_INT8 10 CL_R CL_UNSIGNED_INT16 11 CL_R CL_UNSIGNED_INT32 12 CL_A CL_FLOAT 13 CL_A CL_HALF_FLOAT 14 CL_A CL_UNORM_INT8 15 CL_A CL_UNORM_INT16 16 CL_A CL_SNORM_INT16 17 CL_A CL_SIGNED_INT8 18 CL_A CL_SIGNED_INT16 19 CL_A CL_SIGNED_INT32 20 CL_A CL_UNSIGNED_INT8 21 CL_A CL_UNSIGNED_INT16 22 CL_A CL_UNSIGNED_INT32 23 CL_RG CL_FLOAT 24 CL_RG CL_HALF_FLOAT 25 CL_RG CL_UNORM_INT8 26 CL_RG CL_UNORM_INT16 27 CL_RG CL_SNORM_INT16 28 CL_RG CL_SIGNED_INT8 29 CL_RG CL_SIGNED_INT16 30 CL_RG CL_SIGNED_INT32 31 CL_RG CL_UNSIGNED_INT8 32 CL_RG CL_UNSIGNED_INT16 33 CL_RG CL_UNSIGNED_INT32 34 CL_RA CL_FLOAT 35 CL_RA CL_HALF_FLOAT 36 CL_RA CL_UNORM_INT8 37 CL_RA CL_UNORM_INT16 38 CL_RA CL_SNORM_INT16 39 CL_RA CL_SIGNED_INT8 40 CL_RA CL_SIGNED_INT16 41 CL_RA CL_SIGNED_INT32 42 CL_RA CL_UNSIGNED_INT8 43 CL_RA CL_UNSIGNED_INT16 44 CL_RA CL_UNSIGNED_INT32 45 CL_RGBA CL_FLOAT 46 CL_RGBA CL_HALF_FLOAT 47 CL_RGBA CL_UNORM_INT8 48 CL_RGBA CL_UNORM_INT16 49 CL_RGBA CL_SNORM_INT16 50 CL_RGBA CL_SIGNED_INT8 51 CL_RGBA CL_SIGNED_INT16 52 CL_RGBA CL_SIGNED_INT32 53 CL_RGBA CL_UNSIGNED_INT8 54 CL_RGBA CL_UNSIGNED_INT16 55 CL_RGBA CL_UNSIGNED_INT32 56 CL_BGRA CL_UNORM_INT8 57 CL_BGRA CL_SIGNED_INT8 58 CL_BGRA CL_UNSIGNED_INT8 59 CL_ARGB CL_UNORM_INT8 60 CL_ARGB CL_SIGNED_INT8 61 CL_ARGB CL_UNSIGNED_INT8 62 CL_INTENSITY CL_FLOAT 63 CL_INTENSITY CL_HALF_FLOAT 64 CL_INTENSITY CL_UNORM_INT8 65 CL_INTENSITY CL_UNORM_INT16 66 CL_INTENSITY CL_SNORM_INT16 67 CL_LUMINANCE CL_FLOAT 68 CL_LUMINANCE CL_HALF_FLOAT 69 CL_LUMINANCE CL_UNORM_INT8 70 CL_LUMINANCE CL_UNORM_INT16 71 CL_LUMINANCE CL_SNORM_INT16 72 
CL_BGRA CL_SNORM_INT8 73 CL_BGRA CL_SNORM_INT16 74 CL_ARGB CL_SNORM_INT8 75 CL_ARGB CL_SNORM_INT16 oclDeviceQuery, Platform Name = NVIDIA CUDA, Platform Version = OpenCL 1.2 CUDA 8.0.0, SDK Revision = 7027912, NumDevs = 3, Device = GeForce GTX 1070, Device = Quadro 2000, Device = GeForce GTX 1050 Ti System Info: Local Time/Date = 18:48:31, 1/19/2018 CPU Arch: 0 CPU Level: 6 # of CPU processors: 12 Windows Build: 7601 Windows Ver: 6.1 (Windows Vista / Windows 7) [/CODE] |
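The odd "NUMBER OF CUDA CORES" values for the two Pascal cards (4294967281 and 4294967290) are exactly 2^32 minus the multiprocessor count. The sketch below reproduces them under one assumption, which is not confirmed anywhere in the output: that the tool's cores-per-multiprocessor lookup returns -1 for a compute capability it does not recognise (6.1 here) and the product is then printed as an unsigned 32-bit number. The names `as_uint32` and `UNKNOWN_CORES_PER_MP` are made up for the example.

```python
def as_uint32(value: int) -> int:
    """Reinterpret a (possibly negative) integer as an unsigned 32-bit value."""
    return value % 2**32


# Assumption: the lookup yields -1 for an architecture the tool does not know.
UNKNOWN_CORES_PER_MP = -1

reported = {
    name: as_uint32(multiprocessors * UNKNOWN_CORES_PER_MP)
    for name, multiprocessors in (("GeForce GTX 1070", 15), ("GeForce GTX 1050 Ti", 6))
}
print(reported)
```

Both values match the dump, while the Quadro 2000 (compute capability 2.1) shows a plausible 192. So the huge core counts look like a display quirk of the query tool and not a property of the cards.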