mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   mfakto: an OpenCL program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=15646)

sdbardwick 2013-01-29 10:33

Output from clinfo on Win7-64 with i7-3770s (HD4000) as sole GPU.
Intel OpenCL SDK 2011 (compute 1.1 only) installed, along with latest Intel graphics driver.

Attempting to run mfakto gives a "no platform found" error.
I suspect that while the OpenCL code itself (the .cl files) is mostly portable (the offline compiler doesn't seem to complain except about missing linked files and missing compile-time variables - BUT I am FAR from a competent witness), there must be changes needed to accommodate the Intel ecosystem.


[CODE]C:\Windows\System32>clinfo
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 1.1
Platform Name: Intel(R) OpenCL
Platform Vendor: Intel(R) Corporation
Platform Extensions: cl_intel_dx9_media_sharing cl_khr_byte_addressable_store cl_khr_gl_sharing cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics


Platform Name: Intel(R) OpenCL
Number of devices: 2
Device Type: CL_DEVICE_TYPE_CPU
Device ID: 32902
Max compute units: 8
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 1024
Max work group size: 1024
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 4
Preferred vector width double: 2
Native vector width char: 16
Native vector width short: 8
Native vector width int: 4
Native vector width long: 2
Native vector width float: 4
Native vector width double: 2
Max clock frequency: 3100Mhz
Address bits: 64
Max memory allocation: 2069654528
Image support: Yes
Max number of images read arguments: 480
Max number of images write arguments: 480
Max image 2D width: 8192
Max image 2D height: 8192
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 480
Max size of kernel argument: 3840
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: No
Round to +ve and infinity: No
IEEE754-2008 fused multiply-add: No
Cache type: Read/Write
Cache line size: 64
Cache size: 262144
Global memory size: 8278618112
Constant buffer size: 131072
Max number of constant args: 480
Local memory type: Global
Local memory size: 32768
Kernel Preferred work group size multiple: 128
Error correction support: 0
Unified memory for Host and Device: 1
Profiling timer resolution: 331
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: Yes
Queue properties:
Out-of-Order: Yes
Profiling : Yes
Platform ID: 00000000008B6C10
Name: Intel(R) Core(TM) i7-3770S CPU @ 3.10GHz
Vendor: Intel(R) Corporation
Device OpenCL C version: OpenCL C 1.1
Driver version: 1.1
Profile: FULL_PROFILE
Version: OpenCL 1.1 (Build 37149.37214)
Extensions: cl_khr_fp64 cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_intel_printf cl_ext_device_fission cl_intel_exec_by_local_thread cl_khr_gl_sharing cl_intel_dx9_media_sharing


Device Type: CL_DEVICE_TYPE_GPU
Device ID: 32902
Max compute units: 16
Max work items dimensions: 3
Max work items[0]: 512
Max work items[1]: 512
Max work items[2]: 512
Max work group size: 512
Preferred vector width char: 1
Preferred vector width short: 1
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 0
Native vector width char: 1
Native vector width short: 1
Native vector width int: 1
Native vector width long: 1
Native vector width float: 1
Native vector width double: 0
Max clock frequency: 350Mhz
Address bits: 64
Max memory allocation: 392167424
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 16384
Max image 2D height: 16384
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 1024
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: No
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: No
Cache type: Read/Write
Cache line size: 64
Cache size: 2097152
Global memory size: 1568669696
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 65536
Kernel Preferred work group size multiple: 16
Error correction support: 0
Unified memory for Host and Device: 1
Profiling timer resolution: 80
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue properties:
Out-of-Order: No
Profiling : Yes
Platform ID: 00000000008B6C10
Name: Intel(R) HD Graphics 4000
Vendor: Intel(R) Corporation
Device OpenCL C version: OpenCL C 1.1
Driver version: 9.17.10.2932
Profile: FULL_PROFILE
Version: OpenCL 1.1
Extensions: cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_gl_sharing cl_khr_d3d10_sharing cl_intel_dx9_media_sharing cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_event



C:\Windows\System32>
[/CODE]
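
For anyone comparing their own clinfo dump against the one above, the device type and name of each entry are the fields that matter most here. Below is a minimal Python sketch that pulls those pairs out of saved clinfo text. It assumes the `Device Type:` / `Name:` line layout shown above (other clinfo builds may format things differently), and the function name `list_devices` is made up for illustration.

```python
def list_devices(clinfo_text):
    """Return (device_type, name) pairs from clinfo-style text.

    Assumption: each device section starts with a 'Device Type:' line and
    later contains a line that starts with 'Name:' (not 'Platform Name:').
    """
    devices = []
    current_type = None
    for raw_line in clinfo_text.splitlines():
        line = raw_line.strip()
        if line.startswith("Device Type:"):
            current_type = line.split(":", 1)[1].strip()
        elif line.startswith("Name:") and current_type is not None:
            devices.append((current_type, line.split(":", 1)[1].strip()))
            current_type = None
    return devices


# Shortened sample in the same layout as the dump above.
sample = """\
Platform Name: Intel(R) OpenCL
Number of devices: 2
Device Type: CL_DEVICE_TYPE_CPU
Max compute units: 8
Name: Intel(R) Core(TM) i7-3770S CPU @ 3.10GHz
Device Type: CL_DEVICE_TYPE_GPU
Max compute units: 16
Name: Intel(R) HD Graphics 4000
"""

for device_type, name in list_devices(sample):
    print(device_type, "->", name)
```

A name that the console wrapped onto a second line would come back truncated, so such lines need rejoining before parsing.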

Bdot 2013-01-29 12:53

[QUOTE=sdbardwick;326470]Output from clinfo on Win7-64 with i7-3770s (HD4000) as sole GPU.
Intel OpenCL SDK 2011 (compute 1.1 only) installed, along with latest Intel graphics driver.

Attempting to run mfakto gives a "no platform found" error.
I suspect that while the OpenCL code itself (the .cl files) is mostly portable (the offline compiler doesn't seem to complain except about missing linked files and missing compile-time variables - BUT I am FAR from a competent witness), there must be changes needed to accommodate the Intel ecosystem.


[CODE]C:\Windows\System32>clinfo
Number of platforms: 1
...

Device Type: CL_DEVICE_TYPE_GPU
Device ID: 32902
Max compute units: 16
Max work items dimensions: 3
Max work items[0]: 512
Max work items[1]: 512
Max work items[2]: 512
Max work group size: 512
Preferred vector width char: 1
Preferred vector width short: 1
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 0
Native vector width char: 1
Native vector width short: 1
Native vector width int: 1
Native vector width long: 1
Native vector width float: 1
Native vector width double: 0
Max clock frequency: 350Mhz
Address bits: 64
Max memory allocation: 392167424
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 16384
Max image 2D height: 16384
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 1024
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: No
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: No
Cache type: Read/Write
Cache line size: 64
Cache size: 2097152
Global memory size: 1568669696
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 65536
Kernel Preferred work group size multiple: 16
Error correction support: 0
Unified memory for Host and Device: 1
Profiling timer resolution: 80
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue properties:
Out-of-Order: No
Profiling : Yes
Platform ID: 00000000008B6C10
Name: Intel(R) HD Graphics 4000
Vendor: Intel(R) Corporation
Device OpenCL C version: OpenCL C 1.1
Driver version: 9.17.10.2932
Profile: FULL_PROFILE
Version: OpenCL 1.1
Extensions: cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_gl_sharing cl_khr_d3d10_sharing cl_intel_dx9_media_sharing cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_event



C:\Windows\System32>
[/CODE][/QUOTE]
Cool, you get the bonus for being the first one to show the HD4000 in the clinfo output!

Further, mfakto reporting "no platform found" indicates that it [B]can[/B] load the required libraries, and that OpenCL initialization [B]did[/B] succeed. I did not think this was possible without installing AMD's Catalyst drivers.

There are four command lines I'd like you to test, posting the output if possible:

mfakto -d 11
mfakto -d 12
mfakto -d 11 --CLtest
mfakto -d 12 --CLtest

(For the latter two, the platform and device info as read by mfakto is useful; since mfakto reports but then ignores any error in this mode, a program crash is likely.)

A next step could be to try and install the Catalyst driver, depending on what the above tells us.

Rodrigo 2013-01-29 16:52

[QUOTE=Bdot;326467]If it did not even find the CPU, then you most likely did not install the AMD Catalyst driver. This machine probably has the highest chance to see mfakto running on the HD4000 as its display is already working ...

I just hope Catalyst would at least install the OpenCL runtime, even if it does not find any AMD hardware ... Would you give it a try?
[/QUOTE]
Hi Bdot,

OK, but don't I need to have an AMD GPU in place, in order to install the Catalyst driver? (It's a laptop, so adding a card is out of the question.) If that's not required, then I'll give it a try.

[QUOTE=Bdot;326467]When you have connected both GPUs to some monitor (can be the same), could you try pressing the Windows-P combination and select "Duplicate"? I think by default, windows does not automatically enable additional screens.

Also, if you go to the "Screen resolution" control panel applet, does it show both GPUs?
[/QUOTE]
The screen resolution applet finds only one GPU. :no: If I click on "Detect," it'll tell me "Another display not detected." However (and strangely), at that point the "Display" option starts showing two alternatives. Instead of only the name of the monitor, it starts to show also "Available display output on AMD Radeon HD 7770 series." If I select this, then the Resolution option goes gray, while the Orientation option changes name to "Multiple displays" and starts showing two choices, "No display detected" and "Try to connect anyway on VGA." (Since I don't have any VGA ports, I won't chance trying that one.)

This goes away if I close the screen resolution applet and reopen it, when once again only the monitor is shown.

I tried typing Win+P and selecting Duplicate, but it didn't seem to make any difference: going into the Screen Resolution applet afterward didn't yield any different choices or find any new displays.

But it's encouraging to see that sdbardwick got mfakto to recognize the HD4000 on his machine. I'll be listening closely to that conversation!

Rodrigo

kracker 2013-01-29 17:05

[QUOTE=Rodrigo;326489]Hi Bdot,

OK, but don't I need to have an AMD GPU in place, in order to install the Catalyst driver? (It's a laptop, so adding a card is out of the question.) If that's not required, then I'll give it a try.
...
[/QUOTE]

I don't think it will even let you install the drivers without an AMD card... I think

sdbardwick 2013-01-29 19:26

mfakto-x64 -d 11
 
[CODE]C:\mfakto>mfakto-x64 -d 11
mfakto 0.12-Win (64bit build)


Runtime options
Inifile mfakto.ini
SievePrimesMin 5000
SievePrimesMax 200000
SievePrimes 25000
SievePrimesAdjust 1
NumStreams 3
GridSize 4
WorkFile worktodo.txt
ResultsFile results.txt
Checkpoints enabled
CheckpointDelay 300s
Stages enabled
StopAfterFactor class
PrintMode full
V5UserID none
ComputerID none
AllowSleep yes
TimeStampInResults no
VectorSize 4
GPUType AUTO
SieveOnGPU no
SmallExp no
SieveCPUMask 0
Compiletime options
SIEVE_SIZE_LIMIT 36kiB
SIEVE_SIZE 289731bits
SIEVE_SPLIT 250
MORE_CLASSES enabled
Select device - GPU not found, fallback to CPU.
Get device info - Compiling kernels ..........

OpenCL device info
name Intel(R) Core(TM) i7-3770S CPU @ 3.10GHz (Intel(R) Corporation)
device (driver) version OpenCL 1.1 (Build 37149.37214) (1.1)
maximum threads per block 1024
maximum threads per grid 1073741824
number of multiprocessors 8 (8 compute elements)
clock rate 3100MHz

Automatic parameters
threads per grid 2097152
optimizing kernels for CPU

running a simple selftest ...
ERROR: selftest failed for M53015323 (mfakto_cl_barrett92)
no factor found
ERROR: selftest failed for M601983997 (barrett15_75)
no factor found
ERROR: selftest failed for M601983997 (mfakto_cl_barrett72)
no factor found
ERROR: selftest failed for M601983997 (mfakto_cl_barrett92)
no factor found
ERROR: selftest failed for M601983997 (mfakto_cl_barrett79)
no factor found
ERROR: selftest failed for M601983997 (mfakto_cl_71)
no factor found
ERROR: selftest failed for M49635893 (barrett15_75)
no factor found
ERROR: selftest failed for M49635893 (mfakto_cl_barrett72)
no factor found
ERROR: selftest failed for M49635893 (mfakto_cl_barrett92)
no factor found
ERROR: selftest failed for M49635893 (mfakto_cl_barrett79)
no factor found
ERROR: selftest failed for M49635893 (mfakto_cl_71)
no factor found
ERROR: selftest failed for M51375383 (barrett15_75)
no factor found
ERROR: selftest failed for M51375383 (mfakto_cl_barrett72)
no factor found
ERROR: selftest failed for M51375383 (mfakto_cl_barrett92)
no factor found
ERROR: selftest failed for M51375383 (mfakto_cl_barrett79)
no factor found
ERROR: selftest failed for M51375383 (mfakto_cl_71)
no factor found
ERROR: selftest failed for M51406301 (barrett15_75)
no factor found
ERROR: selftest failed for M51406301 (mfakto_cl_barrett92)
no factor found
ERROR: selftest failed for M51406301 (mfakto_cl_barrett79)
no factor found
ERROR: selftest failed for M51406301 (mfakto_cl_71)
no factor found
ERROR: selftest failed for M47644171 (barrett15_75)
no factor found
ERROR: selftest failed for M47644171 (mfakto_cl_barrett92)
no factor found
ERROR: selftest failed for M47644171 (mfakto_cl_barrett79)
no factor found
ERROR: selftest failed for M47644171 (mfakto_cl_71)
no factor found
ERROR: selftest failed for M51038681 (barrett15_75)
no factor found
ERROR: selftest failed for M51038681 (mfakto_cl_barrett92)
no factor found
ERROR: selftest failed for M51038681 (mfakto_cl_barrett79)
no factor found
ERROR: selftest failed for M51038681 (mfakto_cl_71)
no factor found
ERROR: selftest failed for M49717271 (barrett15_75)
no factor found
ERROR: selftest failed for M49717271 (mfakto_cl_barrett92)
no factor found
ERROR: selftest failed for M49717271 (mfakto_cl_barrett79)
no factor found
ERROR: selftest failed for M49717271 (mfakto_cl_71)
no factor found
ERROR: selftest failed for M50752613 (barrett15_75)
no factor found
ERROR: selftest failed for M50752613 (mfakto_cl_barrett92)
no factor found
ERROR: selftest failed for M50752613 (mfakto_cl_barrett79)
no factor found
ERROR: selftest failed for M50908933 (barrett15_75)
no factor found
ERROR: selftest failed for M50908933 (mfakto_cl_barrett92)
no factor found
ERROR: selftest failed for M50908933 (mfakto_cl_barrett79)
no factor found
ERROR: selftest failed for M53076719 (mfakto_cl_barrett92)
no factor found
ERROR: selftest failed for M53076719 (mfakto_cl_barrett79)
no factor found
ERROR: selftest failed for M53123843 (mfakto_cl_barrett92)
no factor found
ERROR: selftest failed for M53123843 (mfakto_cl_barrett79)
no factor found
ERROR: selftest failed for M60009109 (mfakto_cl_63)
no factor found
ERROR: selftest failed for M60002273 (mfakto_cl_63)
no factor found
ERROR: selftest failed for M60004333 (barrett15_75)
no factor found
ERROR: selftest failed for M60004333 (mfakto_cl_barrett72)
no factor found
ERROR: selftest failed for M60004333 (mfakto_cl_71)
no factor found
ERROR: selftest failed for M60004333 (mfakto_cl_63)
no factor found
ERROR: selftest failed for M3321928703 (mfakto_cl_barrett92)
no factor found
ERROR: selftest failed for M6599953 (mfakto_cl_63)
no factor found
Selftest statistics
number of tests 50
successful tests 0
no factor found 50

selftest FAILED!


C:\mfakto>[/CODE]
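
The failures above are spread across several kernels, and it can help to see how many each kernel accounts for. A minimal Python sketch for tallying them from a saved log follows. It assumes the `ERROR: selftest failed for M<exponent> (<kernel>)` line format seen in this output, and `tally_failures` is a hypothetical helper name, not part of mfakto.

```python
import re
from collections import Counter

# Matches lines such as:
#   ERROR: selftest failed for M53015323 (mfakto_cl_barrett92)
FAILURE_RE = re.compile(r"ERROR: selftest failed for M(\d+) \((\w+)\)")


def tally_failures(log_text):
    """Count selftest failures per kernel name in mfakto-style output."""
    return Counter(match.group(2) for match in FAILURE_RE.finditer(log_text))


# A few lines copied from the log above, as a small sample.
sample = """\
ERROR: selftest failed for M53015323 (mfakto_cl_barrett92)
no factor found
ERROR: selftest failed for M601983997 (barrett15_75)
no factor found
ERROR: selftest failed for M601983997 (mfakto_cl_barrett92)
no factor found
"""

for kernel, count in tally_failures(sample).most_common():
    print(f"{kernel}: {count}")
```

If every kernel fails, as it does here, the problem is more likely in the common setup than in any single kernel, but that is only a rule of thumb.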

sdbardwick 2013-01-29 19:29

mfakto-x64 -d 12
 
WAIT A SEC; LET ME CHECK SOMETHING BEFORE RELYING ON THIS ....
[CODE]C:\mfakto>mfakto-x64 -d 12
mfakto 0.12-Win (64bit build)


Runtime options
Inifile mfakto.ini
SievePrimesMin 5000
SievePrimesMax 200000
SievePrimes 25000
SievePrimesAdjust 1
NumStreams 3
GridSize 4
WorkFile worktodo.txt
ResultsFile results.txt
Checkpoints enabled
CheckpointDelay 300s
Stages enabled
StopAfterFactor class
PrintMode full
V5UserID none
ComputerID none
AllowSleep yes
TimeStampInResults no
VectorSize 4
GPUType AUTO
SieveOnGPU no
SmallExp no
SieveCPUMask 0
Compiletime options
SIEVE_SIZE_LIMIT 36kiB
SIEVE_SIZE 289731bits
SIEVE_SPLIT 250
MORE_CLASSES enabled
Select device - GPU not found, fallback to CPU.
Error: Only 1 devices found. Cannot use device 2 (bad parameter to option -d).
init_CL(3, 12) failed

C:\mfakto>[/CODE]
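
Judging from the messages above (`init_CL(3, 12) failed` and "Cannot use device 2" for `-d 12`), the two-digit `-d` value appears to be read as platform number followed by device number, both counted from 1. A minimal Python sketch of that decoding is below. The rule is inferred from this output only, not taken from the mfakto source, and `decode_device_option` is a made-up name.

```python
def decode_device_option(value):
    """Split a two-digit -d value into (platform, device), both 1-based.

    Assumption (inferred from the log, not from mfakto's code):
    tens digit = platform number, ones digit = device number.
    """
    if not 11 <= value <= 99 or value % 10 == 0:
        raise ValueError(f"expected a two-digit value with non-zero digits, got {value}")
    platform, device = divmod(value, 10)
    return platform, device


for option in (11, 12):
    platform, device = decode_device_option(option)
    print(f"-d {option}: platform {platform}, device {device}")
```

Read this way, `-d 12` asks for the second device on the first platform, which matches the complaint that only one device was found.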

Rodrigo 2013-01-29 19:38

These last two posts look more like what I've been getting on my machines. (Meaning: mfakto isn't recognizing the HD 4000.)

Rodrigo

sdbardwick 2013-01-29 19:56

1 Attachment(s)
I rebooted the computer to start fresh.
This box only has the Intel SDK, Intel graphics driver, Chrome, MSE, and Prime95 installed.

Attached are the results of mfakto-x64, mfakto-x64 -d 11, mfakto-x64 -d 12, and mfakto-x64 -d 11 --CLtest.
The last is truncated because mfakto crashes.

mfakto-x64 -d 12 --CLtest crashes also:
[CODE]C:\mfakto>mfakto-x64 -d 12 --CLtest

Runtime options
Inifile mfakto.ini
SievePrimesMin 5000
SievePrimesMax 200000
SievePrimes 25000
SievePrimesAdjust 1
NumStreams 3
GridSize 4
WorkFile worktodo.txt
ResultsFile results.txt
Checkpoints enabled
CheckpointDelay 300s
Stages enabled
StopAfterFactor class
PrintMode full
V5UserID none
ComputerID none
AllowSleep yes
TimeStampInResults no
VectorSize 4
GPUType AUTO
SieveOnGPU no
SmallExp no
SieveCPUMask 0
OpenCL Platform 1/1: Intel(R) Corporation, Version: OpenCL 1.1
Error: Only 1 devices found. Cannot use device 2 (bad parameter to option -d).
Device 1/1: Intel(R) HD Graphics 4000 (Intel(R) Corporation),
device version: OpenCL 1.1 , driver version: 9.17.10.2932
Extensions: cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_gl_sharing cl_khr_d3d10_sharing cl_intel_dx9_media_sharing cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_event
Global memory:1568669696, Global memory cache: 2097152, local memory: 65536, workgroup size: 512, Work dimensions: 3[512, 512, 512, 0, 0] , Max clock speed:350, compute units:16
Error -35: clCreateCommandQueue(dev#3)
Error -33: clGetProgramBuildInfo failed.Error -33: clGetProgramBuildInfo failed.

BUILD OUTPUT
§
END OF BUILD OUTPUT

C:\mfakto>[/CODE]
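
The numeric codes in the output above are standard OpenCL status values from the Khronos `cl.h` header: -33 is `CL_INVALID_DEVICE` and -35 is `CL_INVALID_QUEUE_PROPERTIES`. A small Python lookup sketch for decoding such codes follows. The table is only a subset of the header, and `describe` is a hypothetical helper name.

```python
# Subset of status codes as defined in the Khronos OpenCL cl.h header.
OPENCL_ERRORS = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -11: "CL_BUILD_PROGRAM_FAILURE",
    -30: "CL_INVALID_VALUE",
    -32: "CL_INVALID_PLATFORM",
    -33: "CL_INVALID_DEVICE",
    -34: "CL_INVALID_CONTEXT",
    -35: "CL_INVALID_QUEUE_PROPERTIES",
    -36: "CL_INVALID_COMMAND_QUEUE",
}


def describe(code):
    """Return the symbolic name for an OpenCL status code, if known."""
    return OPENCL_ERRORS.get(code, f"unknown OpenCL status {code}")


for code in (-35, -33):
    print(code, describe(code))
```

`CL_INVALID_QUEUE_PROPERTIES` from `clCreateCommandQueue` means the requested queue properties are valid but not supported by the device. The clinfo dump earlier in the thread lists "Out-of-Order: No" for the HD 4000, which might be related, but whether mfakto actually requests that property is an assumption this thread does not confirm.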

sdbardwick 2013-01-29 20:10

After Catalyst install
 
1 Attachment(s)
Results after installing AMD Catalyst (it automatically installed the AMD APP SDK).
Any truncated output is due to crash.

Rodrigo 2013-01-29 20:50

[QUOTE=sdbardwick;326540]Results after installing AMD Catalyst (it automatically installed the AMD APP SDK).
Any truncated output is due to crash.[/QUOTE]
Hmm, so unless I'm reading it wrong, this means that:

1. One [B]can[/B] install the Catalyst driver even without having AMD graphics in the machine; but

2. In spite of that, mfakto still doesn't work on the HD 4000.

Thanks for uploading the output files.

Rodrigo

Koyaanisqatsi 2013-01-30 01:01

[QUOTE=flashjh;319690]What I'd give for one of those, except $3000, of course ;)[/QUOTE]

They do 5 ms/it on the current workload for CUDALucas.


All times are UTC.
