|
|
#67 | |
|
Nov 2010
Germany
597₁₀ Posts |
Quote:
Thinking about it, maybe both are required ... I have both of them in my boxes ... |
|
|
|
|
|
|
#68 |
|
Nov 2010
Germany
255₁₆ Posts |
Nvidia supports OpenCL 1.0, but we need 1.1. For now the OpenCL version requires an AMD (ATI) GPU. If you install the above SDK and Catalyst drivers on an Nvidia box, then you will at least be able to run mfakto on your CPU ;-)
There is probably not much sense in running the OpenCL version on Nvidia when there is a CUDA version of the program (which in this case is the original). It might be useful for a performance comparison - but then you would only be comparing Nvidia's CUDA compiler against Nvidia's OpenCL compiler. |
|
|
|
|
|
#69 |
|
Nov 2010
Germany
1001010101₂ Posts |
mfakto 0.07 for Linux-64
|
|
|
|
|
|
#70 |
|
Sep 2010
Annapolis, MD, USA
2·3²·11 Posts |
I posted earlier in this thread saying I was interested. Still am. I installed AMD APP 2.5 and downloaded the mfakto 0.07 binary. It seems to run, but claims not to find my GPU so it uses the CPU instead. Any suggestions? I finally found that the help option is -h, but even that is not too clear. (I had expected --help to work, but it silently ignored me and continued doing what it was doing.)
(Note that I don't expect my lousy GPU to actually be very valuable to the project, but it's better than having it run idle, especially if it won't siphon CPU cycles away from what they're doing now.) My card is a Radeon HD 5450. |
|
|
|
|
|
#71 | ||
|
Nov 2010
Germany
3·199 Posts |
Quote:
The APP SDK contains a clinfo binary. (APP SDK path/bin/x86_64/clinfo) Run it and check whether it reports both the CPU and the GPU (one block with CL_DEVICE_TYPE_GPU, one with CL_DEVICE_TYPE_CPU). If the GPU block is there, paste it in here; maybe I can spot something.
You can also play around with the "-d" option. Try "-d 1", "-d 2", "-d 11", "-d 21" and see if any of them picks the GPU.
If the GPU is missing from the clinfo output, you are probably running the open "radeon" graphics driver. Try installing the Catalyst graphics driver mentioned in an earlier post. If that helps, I need to update the documentation ...
Quote:
I would expect this card to deliver about 10-12M factors per second, which should be somewhere between 12 and 18 GHz-days per day, while consuming ~300 MHz of CPU power. These are very rough estimates - once you get it running, let me know what reality looks like ;-) |
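For anyone who would rather not scroll through the whole clinfo dump, here is a minimal sketch of the check described above: it scans saved clinfo text for "Device Type:" lines and reports which device types show up. The function names and the shortened sample text are made up for illustration; only the "Device Type: CL_DEVICE_TYPE_..." line format is taken from the clinfo output in this thread.

```python
import re


def list_device_types(clinfo_text):
    """Return the device types reported by clinfo, in the order they appear."""
    pattern = re.compile(r"^\s*Device Type:\s*(CL_DEVICE_TYPE_\w+)", re.MULTILINE)
    return pattern.findall(clinfo_text)


def has_gpu(clinfo_text):
    """True if at least one device block is a GPU."""
    return "CL_DEVICE_TYPE_GPU" in list_device_types(clinfo_text)


# Hypothetical, heavily shortened clinfo output (one GPU block, one CPU block).
sample = """\
Number of devices: 2
Device Type: CL_DEVICE_TYPE_GPU
Device ID: 4098
Device Type: CL_DEVICE_TYPE_CPU
Device ID: 4098
"""

print(list_device_types(sample))
print("GPU visible to OpenCL:", has_gpu(sample))
```

To use it on a real machine, redirect clinfo's output to a file and pass the file contents to `has_gpu`. If it returns False, the graphics driver is the first thing to look at.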
||
|
|
|
|
|
#72 | |
|
Sep 2010
Annapolis, MD, USA
2·3²·11 Posts |
Quote:
Code:
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 1.1 AMD-APP-SDK-v2.5 (684.213)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
Platform Name: AMD Accelerated Parallel Processing
Number of devices: 2
Device Type: CL_DEVICE_TYPE_GPU
Device ID: 4098
Device Topology: PCI[ B#0, D#0, F#0 ]
Max compute units: 2
Max work items dimensions: 3
Max work items[0]: 128
Max work items[1]: 128
Max work items[2]: 128
Max work group size: 128
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 4
Preferred vector width double: 0
Native vector width char: 16
Native vector width short: 8
Native vector width int: 4
Native vector width long: 2
Native vector width float: 4
Native vector width double: 0
Max clock frequency: 0Mhz
Address bits: 32
Max memory allocation: 134217728
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 8192
Max image 2D height: 8192
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 1024
Alignment (bits) of base address: 32768
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: No
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: None
Cache line size: 0
Cache size: 0
Global memory size: 536870912
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 32768
Kernel Preferred work group size multiple: 32
Error correction support: 0
Unified memory for Host and Device: 0
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue properties:
Out-of-Order: No
Profiling : Yes
Platform ID: 0x7f1bb5145060
Name: Cedar
Vendor: Advanced Micro Devices, Inc.
Device OpenCL C version: OpenCL C 1.1
Driver version: CAL 1.4.1353
Profile: FULL_PROFILE
Version: OpenCL 1.1 AMD-APP-SDK-v2.5 (684.213)
Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt
Device Type: CL_DEVICE_TYPE_CPU
Device ID: 4098
Max compute units: 6
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 1024
Max work group size: 1024
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 4
Preferred vector width double: 0
Native vector width char: 16
Native vector width short: 8
Native vector width int: 4
Native vector width long: 2
Native vector width float: 4
Native vector width double: 0
Max clock frequency: 2700Mhz
Address bits: 64
Max memory allocation: 4149655552
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 8192
Max image 2D height: 8192
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 4096
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: No
Cache type: Read/Write
Cache line size: 64
Cache size: 65536
Global memory size: 16598622208
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Global
Local memory size: 32768
Kernel Preferred work group size multiple: 1
Error correction support: 0
Unified memory for Host and Device: 1
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: Yes
Queue properties:
Out-of-Order: No
Profiling : Yes
Platform ID: 0x7f1bb5145060
Name: AMD Phenom(tm) II X6 1045T Processor
Vendor: AuthenticAMD
Device OpenCL C version: OpenCL C 1.1
Driver version: 2.0
Profile: FULL_PROFILE
Version: OpenCL 1.1 AMD-APP-SDK-v2.5 (684.213)
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_media_ops cl_amd_popcnt cl_amd_printf
|
|
|
|
|
|
|
#73 |
|
Nov 2010
Germany
3×199 Posts |
The clinfo output looks good (except for the "Max clock frequency: 0Mhz" part). Based on this, the correct device number should be 11, and the CPU should be 12.
Could you please run mfakto -d 11 --CLtest and paste the output? --CLtest will display any errors but continue on to invoke a small test kernel. Depending on which errors occurred, this may lead to mfakto hanging (use Ctrl-C) or crashing - that is kind of expected.
Also, what exactly is the error when running mfakto -d 11?
I still wonder if installing the latest Catalyst driver would solve this. Do you see a chance of upgrading it?
Last fiddled with by Bdot on 2011-08-17 at 21:13 |
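As a rough illustration of how the numbers 11 and 12 follow from the clinfo listing: the sketch below assumes the two-digit -d value is the 1-based platform position followed by the 1-based device position. That scheme is only inferred from the values suggested in this thread ("-d 11", "-d 21", GPU = 11, CPU = 12), not taken from mfakto's documentation, and `device_option` is a made-up helper name.

```python
def device_option(platform_index, device_index):
    """Build a two-digit -d value from 1-based positions (assumed scheme)."""
    if not (1 <= platform_index <= 9 and 1 <= device_index <= 9):
        raise ValueError("this sketch only covers single-digit positions")
    return f"{platform_index}{device_index}"


# Device types in the order clinfo listed them on the single platform above.
devices = ["CL_DEVICE_TYPE_GPU", "CL_DEVICE_TYPE_CPU"]
options = {dev: device_option(1, i) for i, dev in enumerate(devices, start=1)}
print(options)
```

Under that assumption, the first device on the first platform (the Cedar GPU) maps to 11 and the second (the Phenom CPU) to 12.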
|
|
|
|
|
#74 |
|
Sep 2010
Annapolis, MD, USA
2·3²·11 Posts |
Code:
kurly@hex:~/mfakto/mfakto-0.07 - Linux/x86_64$ ./mfakto -d 11 --CLtest
Runtime options
SievePrimes 50000
SievePrimesAdjust 0
NumStreams 10
GridSize 3
WorkFile worktodo.txt
Checkpoints enabled
Stages enabled
StopAfterFactor class
PrintMode full
AllowSleep yes
VectorSize 4
No protocol specified
OpenCL Platform 1/1: Advanced Micro Devices, Inc., Version: OpenCL 1.1 AMD-APP-SDK-v2.5 (684.213)
GPU not found, fallback to CPU.
Device 1/1: AMD Phenom(tm) II X6 1045T Processor (AuthenticAMD), device version: OpenCL 1.1 AMD-APP-SDK-v2.5 (684.213), driver version: 2.0
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_media_ops cl_amd_popcnt cl_amd_printf
Global memory:16598622208, Global memory cache: 65536, local memory: 32768, workgroup size: 1024, Work dimensions: 3[1024, 1024, 1024, 0, 0] , Max clock speed:2700, compute units:6
loop 0: Error -7 in clGetEventProfilingInfo.(startTime) 0 threads: RES (32): 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
loop 1: 1 threads: RES (32): 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
loop 2: 2 threads: RES (32): 2 2 2 2 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
loop 3: 3 threads: RES (32): 3 2 2 2 3 3 3 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
loop 4: 4 threads: RES (32): 4 2 2 2 3 3 3 4 4 4 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
loop 5: 5 threads: RES (32): 5 2 2 2 3 3 3 4 4 4 5 5 5 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
loop 6: 6 threads: RES (32): 6 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
loop 7: 7 threads: RES (32): 7 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6 7 7 7 1 1 1 0 0 0 0 0 0 0 0 0 0
loop 8: 8 threads: RES (32): 8 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8 1 1 1 0 0 0 0 0 0 0
loop 9: 9 threads: RES (32): 9 2 2 2 3 3 3 4 4 4 6 6 6 7 7 7 5 5 5 8 8 8 9 9 9 1 1 1 0 0 0 0
loop 10:
Edit: Also, it did not crash, nor did I need to Ctrl-C.
Last fiddled with by KingKurly on 2011-08-17 at 22:37 Reason: added 'non-crash' info |
|
|
|
|
|
#75 | |
|
Nov 2010
Germany
3·199 Posts |
Quote:
And second, as the AMD GPU driver is closed source, many distributions ship the open "radeon" driver, which will not work with the AMD APP. "lsmod | grep radeon" should return nothing, while "lsmod | grep fglrx" should list at least one line if you have the AMD driver.
I just verified that the driver version you have (CAL 1.4.1353) is even lower than what comes with Catalyst 11.6, which is the minimum supported with AMD APP 2.4 and 2.5. So even if Ubuntu 11.04 ships the proprietary driver, it is too old.
Just yesterday AMD released Catalyst 11.8, so I tried installing it via a remote ssh session. Running "sh ati-driver-installer-11-8-x86.x86_64.run" was all it took, and it all worked well. You should reboot afterwards; that's it. According to the change log, it no longer even needs a running X server, but I did not test that (I need X anyway).
BTW, your --CLtest output is totally correct, except that it was the CPU which calculated it. The one error line is expected, as the test tried to start a kernel with 0 threads ;-) |
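The two lsmod checks can be combined into one small script. This is only a sketch: it classifies saved `lsmod` output by looking for the "radeon" and "fglrx" module names mentioned above. The function name is invented, and the sample text (including the module sizes) is hypothetical, not taken from anyone's machine.

```python
def loaded_gpu_driver(lsmod_text):
    """Report which AMD graphics kernel module appears in `lsmod` output."""
    # The first whitespace-separated token on each line is the module name.
    modules = {line.split()[0] for line in lsmod_text.splitlines() if line.strip()}
    has_radeon = "radeon" in modules
    has_fglrx = "fglrx" in modules
    if has_radeon and has_fglrx:
        return "both"
    if has_fglrx:
        return "fglrx"
    if has_radeon:
        return "radeon"
    return "none"


# Hypothetical lsmod output; sizes and use counts are invented.
sample_lsmod = """\
Module                  Size  Used by
fglrx                2641011  0
snd_hda_intel          24140  0
"""

print(loaded_gpu_driver(sample_lsmod))
```

A result of "fglrx" is what the AMD APP needs; "radeon" or "none" means the proprietary driver is not loaded.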
|
|
|
|
|
|
#76 | |||
|
Sep 2010
Annapolis, MD, USA
2×3²×11 Posts |
Quote:
Quote:
Quote:
Code:
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 1.1 AMD-APP-SDK-v2.5 (684.213)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
Platform Name: AMD Accelerated Parallel Processing
Number of devices: 2
Device Type: CL_DEVICE_TYPE_GPU
Device ID: 4098
Device Topology: PCI[ B#1, D#0, F#0 ]
Max compute units: 2
Max work items dimensions: 3
Max work items[0]: 128
Max work items[1]: 128
Max work items[2]: 128
Max work group size: 128
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 4
Preferred vector width double: 0
Native vector width char: 16
Native vector width short: 8
Native vector width int: 4
Native vector width long: 2
Native vector width float: 4
Native vector width double: 0
Max clock frequency: 0Mhz
Address bits: 32
Max memory allocation: 134217728
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 8192
Max image 2D height: 8192
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 1024
Alignment (bits) of base address: 32768
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: No
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: None
Cache line size: 0
Cache size: 0
Global memory size: 536870912
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 32768
Kernel Preferred work group size multiple: 32
Error correction support: 0
Unified memory for Host and Device: 0
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue properties:
Out-of-Order: No
Profiling : Yes
Platform ID: 0x7f8a50e37060
Name: Cedar
Vendor: Advanced Micro Devices, Inc.
Device OpenCL C version: OpenCL C 1.1
Driver version: CAL 1.4.1523
Profile: FULL_PROFILE
Version: OpenCL 1.1 AMD-APP-SDK-v2.5 (684.213)
Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt
|
|||
|
|
|
|
|
#77 |
|
Sep 2010
Annapolis, MD, USA
2·3²·11 Posts |
Sorry to make back-to-back posts, but there was enough new information that I thought it warranted making a new post.
I found that if I plug a monitor and keyboard into that computer and then log in locally, the video card is found and can be used just fine. It would be a bit of a burden to always have to log in locally, but I guess I can do that until a better solution is found. That said, I do have a new problem to report:
Code:
kurly@hex:~/mfakto/mfakto-0.07 - Linux/x86_64$ ./mfakto -d 11 -CLtest
mfakto 0.07 (64bit build)
Runtime options
SievePrimes 50000
SievePrimesAdjust 0
NumStreams 10
GridSize 3
WorkFile worktodo.txt
Checkpoints enabled
Stages enabled
StopAfterFactor class
PrintMode full
AllowSleep yes
VectorSize 4
Compiletime options
THREADS_PER_BLOCK 256
SIEVE_SIZE_LIMIT 64kiB
SIEVE_SIZE 482885bits
SIEVE_SPLIT 250
MORE_CLASSES enabled
Select device - Get device info - Compiling kernels..........
OpenCL device info
name Cedar (Advanced Micro Devices, Inc.)
device (driver) version OpenCL 1.1 AMD-APP-SDK-v2.5 (684.213) (CAL 1.4.1523)
maximum threads per block 128
maximum threads per grid 2097152
number of multiprocessors 2 (160 compute elements(estimate for ATI GPUs))
clock rate 0MHz
ERROR: THREADS_PER_BLOCK (256) > deviceinfo.maxThreadsPerBlock
-----------------------------------------------------------------------------
***Edit: I was able to build the source with THREADS_PER_BLOCK changed from 256 to 128, but it fails selftest 1-5 and 9-11. See below:
Code:
kurly@hex:~/mfakto/mfakto-0.07 - Linux/x86_64$ ./mfakto -d 11 -CLtest
mfakto 0.07 (64bit build)
Runtime options
SievePrimes 50000
SievePrimesAdjust 0
NumStreams 10
GridSize 3
WorkFile worktodo.txt
Checkpoints enabled
Stages enabled
StopAfterFactor class
PrintMode full
AllowSleep yes
VectorSize 4
Compiletime options
THREADS_PER_BLOCK 128
SIEVE_SIZE_LIMIT 64kiB
SIEVE_SIZE 482885bits
SIEVE_SPLIT 250
MORE_CLASSES enabled
Select device - Get device info - Compiling kernels..........
OpenCL device info
name Cedar (Advanced Micro Devices, Inc.)
device (driver) version OpenCL 1.1 AMD-APP-SDK-v2.5 (684.213) (CAL 1.4.1523)
maximum threads per block 128
maximum threads per grid 2097152
number of multiprocessors 2 (160 compute elements(estimate for ATI GPUs))
clock rate 0MHz
Automatic parameters
threads per grid 1048576
running a simple selftest...
########## testcase 1/14 ##########
ERROR: selftest failed for M49635893 (mfakto_cl_71) no factor found
########## testcase 2/14 ##########
ERROR: selftest failed for M51375383 (mfakto_cl_71) no factor found
########## testcase 3/14 ##########
ERROR: selftest failed for M47644171 (mfakto_cl_71) no factor found
########## testcase 4/14 ##########
ERROR: selftest failed for M51038681 (mfakto_cl_71) no factor found
########## testcase 5/14 ##########
ERROR: selftest failed for M49717271 (mfakto_cl_71) no factor found
########## testcase 6/14 ##########
########## testcase 7/14 ##########
########## testcase 8/14 ##########
########## testcase 9/14 ##########
ERROR: selftest failed for M60009109 (mfakto_cl_71) no factor found
########## testcase 10/14 ##########
ERROR: selftest failed for M60002273 (mfakto_cl_71) no factor found
########## testcase 11/14 ##########
ERROR: selftest failed for M60004333 (mfakto_cl_71) no factor found
########## testcase 12/14 ##########
########## testcase 13/14 ##########
########## testcase 14/14 ##########
Selftest statistics
number of tests 52
successfull tests 44
no factor found 8
selftest FAILED!
Last fiddled with by KingKurly on 2011-08-19 at 01:37 Reason: Got code to compile |
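When comparing selftest runs across builds, it can help to pull the failing testcase numbers out of the log automatically. The sketch below looks for the "testcase N/14" banners and the "ERROR: selftest failed" lines as they appear in the output above, and collapses the failing numbers into ranges. The function names and the shortened sample log are made up for illustration.

```python
import re


def failed_testcases(log_text):
    """Return testcase numbers that logged at least one 'selftest failed' error."""
    failed = []
    current = None
    for match in re.finditer(r"testcase (\d+)/\d+|ERROR: selftest failed", log_text):
        if match.group(1) is not None:
            current = int(match.group(1))  # a new testcase banner
        elif current is not None and current not in failed:
            failed.append(current)  # an error inside the current testcase
    return failed


def as_ranges(numbers):
    """Collapse ascending integers into ranges, e.g. [1, 2, 3, 9] -> '1-3, 9'."""
    ranges = []
    for n in numbers:
        if ranges and n == ranges[-1][1] + 1:
            ranges[-1][1] = n
        else:
            ranges.append([n, n])
    return ", ".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


# Shortened excerpt in the same shape as the selftest output above.
sample_log = (
    "########## testcase 1/14 ########## "
    "ERROR: selftest failed for M49635893 (mfakto_cl_71) no factor found "
    "########## testcase 2/14 ########## "
    "ERROR: selftest failed for M51375383 (mfakto_cl_71) no factor found "
    "########## testcase 6/14 ########## "
    "########## testcase 9/14 ########## "
    "ERROR: selftest failed for M60009109 (mfakto_cl_71) no factor found"
)

print(as_ranges(failed_testcases(sample_log)))
```

The parsing does not depend on line breaks, so it works on the log whether it is pasted as one long line or one entry per line.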
|
|
|