mersenneforum.org  

2011-08-16, 06:38   #67
Bdot
Nov 2010
Germany
3×199 Posts

Quote:
Originally Posted by monst
Can you also please post the correct versions of OpenCL.dll and any other DLLs that are required? Thanks.

I recommend installing AMD APP 2.5. On Windows, though, the latest Catalyst driver should also contain OpenCL; if you already have it, you may just need to extend the PATH.

Thinking about it, maybe both are required ... I have both of them on my boxes ...

2011-08-16, 07:43   #68
Bdot
Nov 2010
Germany
255₁₆ Posts

Quote:
Originally Posted by firejuggler
Isn't OpenCL supposed to work on Nvidia and Radeon cards?

Nvidia supports OpenCL 1.0, but we need 1.1. For now the OpenCL version requires an AMD (ATI) GPU. If you install the above SDK and Catalyst drivers on an Nvidia box, then you will at least be able to run mfakto on your CPU ;-)

There is probably not much sense in running the OpenCL version on Nvidia when there is a CUDA version of the program (which in this case is the original). Maybe for performance comparison, but then you would only be comparing Nvidia's CUDA compiler against Nvidia's OpenCL compiler.

2011-08-16, 08:55   #69
Bdot
Nov 2010
Germany
3·199 Posts

mfakto 0.07 for Linux-64
Attached Files
File Type: zip mfakto-0.07 - Linux.zip (87.9 KB, 262 views)

2011-08-16, 21:13   #70
KingKurly
Sep 2010
Annapolis, MD, USA
2×3²×11 Posts

Quote:
Originally Posted by Bdot
mfakto 0.07 for Linux-64

I posted earlier in this thread saying I was interested. Still am. I installed AMD APP 2.5 and downloaded the mfakto 0.07 binary. It seems to run, but claims not to find my GPU so it uses the CPU instead. Any suggestions? I finally found that the help option is -h, but even that is not too clear. (I had expected --help to work, but it silently ignored me and continued doing what it was doing.)

(Note that I don't expect my lousy GPU to actually be very valuable to the project, but it's better than having it run idle, especially if it won't siphon CPU cycles away from what they're doing now.)

My card is a Radeon HD 5450.

2011-08-17, 08:22   #71
Bdot
Nov 2010
Germany
3×199 Posts

Quote:
Originally Posted by KingKurly
I posted earlier in this thread saying I was interested. Still am. I installed AMD APP 2.5 and downloaded the mfakto 0.07 binary. It seems to run, but claims not to find my GPU so it uses the CPU instead.

If it runs (using the CPU), all libs are found OK.
The APP SDK contains a clinfo binary (APP SDK path/bin/x86_64/clinfo). Run it and see whether it reports both the CPU and the GPU (one block CL_DEVICE_TYPE_GPU, one CL_DEVICE_TYPE_CPU). If the GPU block is there, paste it in here; maybe I can spot something. You can also play around with the "-d" option. Try "-d 1", "-d 2", "-d 11", "-d 21". See if any of them picks the GPU.

If the GPU is missing from the clinfo output, you are probably running the open "radeon" graphics driver. Try installing the Catalyst graphics driver mentioned in an earlier post. If that helps, I need to update the documentation ...
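
A minimal sketch of the clinfo check described above, assuming the clinfo output has been saved as text and that each device block starts with a "Device Type:" line. The function name and the sample text are made up for illustration; none of this is part of mfakto.

```python
import re

def device_types(clinfo_text):
    """Return the CL_DEVICE_TYPE_* values in the order clinfo lists them."""
    pattern = r"^\s*Device Type:\s*(CL_DEVICE_TYPE_\w+)"
    return re.findall(pattern, clinfo_text, flags=re.MULTILINE)

# Illustrative excerpt only, not real output from any particular machine.
sample = """\
Number of devices:                 2
  Device Type:                     CL_DEVICE_TYPE_GPU
  Max compute units:                 2
  Device Type:                     CL_DEVICE_TYPE_CPU
  Max compute units:                 6
"""

types = device_types(sample)
print(types)
print("GPU visible to OpenCL:", "CL_DEVICE_TYPE_GPU" in types)
```

If the GPU entry is missing from the list, the driver is the first thing to look at; if it is present, the problem is more likely in device selection.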

Quote:
Originally Posted by KingKurly
I finally found that the help option is -h, but even that is not too clear. (I had expected --help to work, but it silently ignored me and continued doing what it was doing.)

I'll take that as an enhancement request for the next version. I'll also try to implement a "-d g" option to force running on the GPU or fail. There is already "-d c", which forces it to run on the CPU ...

Quote:
Originally Posted by KingKurly
(Note that I don't expect my lousy GPU to actually be very valuable to the project, but it's better than having it run idle, especially if it won't siphon CPU cycles away from what they're doing now.)

My card is a Radeon HD 5450.

I would expect this card to deliver about 10-12M factors per second, which should be somewhere between 12 and 18 GHz-days per day, while consuming ~300 MHz of CPU power. These are very rough estimates; once you get it running, let me know what reality looks like ;-)

2011-08-17, 14:49   #72
KingKurly
Sep 2010
Annapolis, MD, USA
2·3²·11 Posts

Quote:
Originally Posted by Bdot
If it runs (using the CPU), all libs are found OK.
The APP SDK contains a clinfo binary (APP SDK path/bin/x86_64/clinfo). Run it and see whether it reports both the CPU and the GPU (one block CL_DEVICE_TYPE_GPU, one CL_DEVICE_TYPE_CPU). If the GPU block is there, paste it in here; maybe I can spot something. You can also play around with the "-d" option. Try "-d 1", "-d 2", "-d 11", "-d 21". See if any of them picks the GPU.

I was able to run clinfo okay, and I will include the output below. I tried playing around with the -d option with no success, only several different error messages.

Code:
Number of platforms:                 1
  Platform Profile:                 FULL_PROFILE
  Platform Version:                 OpenCL 1.1 AMD-APP-SDK-v2.5 (684.213)
  Platform Name:                 AMD Accelerated Parallel Processing
  Platform Vendor:                 Advanced Micro Devices, Inc.
  Platform Extensions:                 cl_khr_icd cl_amd_event_callback cl_amd_offline_devices


  Platform Name:                 AMD Accelerated Parallel Processing
Number of devices:                 2
  Device Type:                     CL_DEVICE_TYPE_GPU
  Device ID:                     4098
  Device Topology:                 PCI[ B#0, D#0, F#0 ]
  Max compute units:                 2
  Max work items dimensions:             3
    Max work items[0]:                 128
    Max work items[1]:                 128
    Max work items[2]:                 128
  Max work group size:                 128
  Preferred vector width char:             16
  Preferred vector width short:             8
  Preferred vector width int:             4
  Preferred vector width long:             2
  Preferred vector width float:             4
  Preferred vector width double:         0
  Native vector width char:             16
  Native vector width short:             8
  Native vector width int:             4
  Native vector width long:             2
  Native vector width float:             4
  Native vector width double:             0
  Max clock frequency:                 0Mhz
  Address bits:                     32
  Max memory allocation:             134217728
  Image support:                 Yes
  Max number of images read arguments:         128
  Max number of images write arguments:         8
  Max image 2D width:                 8192
  Max image 2D height:                 8192
  Max image 3D width:                 2048
  Max image 3D height:                 2048
  Max image 3D depth:                 2048
  Max samplers within kernel:             16
  Max size of kernel argument:             1024
  Alignment (bits) of base address:         32768
  Minimum alignment (bytes) for any datatype:     128
  Single precision floating point capability
    Denorms:                     No
    Quiet NaNs:                     Yes
    Round to nearest even:             Yes
    Round to zero:                 Yes
    Round to +ve and infinity:             Yes
    IEEE754-2008 fused multiply-add:         Yes
  Cache type:                     None
  Cache line size:                 0
  Cache size:                     0
  Global memory size:                 536870912
  Constant buffer size:                 65536
  Max number of constant args:             8
  Local memory type:                 Scratchpad
  Local memory size:                 32768
  Kernel Preferred work group size multiple:     32
  Error correction support:             0
  Unified memory for Host and Device:         0
  Profiling timer resolution:             1
  Device endianess:                 Little
  Available:                     Yes
  Compiler available:                 Yes
  Execution capabilities:                 
    Execute OpenCL kernels:             Yes
    Execute native function:             No
  Queue properties:                 
    Out-of-Order:                 No
    Profiling :                     Yes
  Platform ID:                     0x7f1bb5145060
  Name:                         Cedar
  Vendor:                     Advanced Micro Devices, Inc.
  Device OpenCL C version:             OpenCL C 1.1 
  Driver version:                 CAL 1.4.1353
  Profile:                     FULL_PROFILE
  Version:                     OpenCL 1.1 AMD-APP-SDK-v2.5 (684.213)
  Extensions:                     cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt 


  Device Type:                     CL_DEVICE_TYPE_CPU
  Device ID:                     4098
  Max compute units:                 6
  Max work items dimensions:             3
    Max work items[0]:                 1024
    Max work items[1]:                 1024
    Max work items[2]:                 1024
  Max work group size:                 1024
  Preferred vector width char:             16
  Preferred vector width short:             8
  Preferred vector width int:             4
  Preferred vector width long:             2
  Preferred vector width float:             4
  Preferred vector width double:         0
  Native vector width char:             16
  Native vector width short:             8
  Native vector width int:             4
  Native vector width long:             2
  Native vector width float:             4
  Native vector width double:             0
  Max clock frequency:                 2700Mhz
  Address bits:                     64
  Max memory allocation:             4149655552
  Image support:                 Yes
  Max number of images read arguments:         128
  Max number of images write arguments:         8
  Max image 2D width:                 8192
  Max image 2D height:                 8192
  Max image 3D width:                 2048
  Max image 3D height:                 2048
  Max image 3D depth:                 2048
  Max samplers within kernel:             16
  Max size of kernel argument:             4096
  Alignment (bits) of base address:         1024
  Minimum alignment (bytes) for any datatype:     128
  Single precision floating point capability
    Denorms:                     Yes
    Quiet NaNs:                     Yes
    Round to nearest even:             Yes
    Round to zero:                 Yes
    Round to +ve and infinity:             Yes
    IEEE754-2008 fused multiply-add:         No
  Cache type:                     Read/Write
  Cache line size:                 64
  Cache size:                     65536
  Global memory size:                 16598622208
  Constant buffer size:                 65536
  Max number of constant args:             8
  Local memory type:                 Global
  Local memory size:                 32768
  Kernel Preferred work group size multiple:     1
  Error correction support:             0
  Unified memory for Host and Device:         1
  Profiling timer resolution:             1
  Device endianess:                 Little
  Available:                     Yes
  Compiler available:                 Yes
  Execution capabilities:                 
    Execute OpenCL kernels:             Yes
    Execute native function:             Yes
  Queue properties:                 
    Out-of-Order:                 No
    Profiling :                     Yes
  Platform ID:                     0x7f1bb5145060
  Name:                         AMD Phenom(tm) II X6 1045T Processor
  Vendor:                     AuthenticAMD
  Device OpenCL C version:             OpenCL C 1.1 
  Driver version:                 2.0
  Profile:                     FULL_PROFILE
  Version:                     OpenCL 1.1 AMD-APP-SDK-v2.5 (684.213)
  Extensions:                     cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_media_ops cl_amd_popcnt cl_amd_printf
Thank you for your help and your code, and I look forward to hearing back from you.

2011-08-17, 21:10   #73
Bdot
Nov 2010
Germany
597₁₀ Posts

The clinfo output looks good (except for the "Max clock frequency: 0Mhz" part). Based on this, the correct device number should be 11, and the CPU should be 12.
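
The numbers 11 and 12 are consistent with a two-digit scheme: platform index first, device index second, both counted from 1. That scheme is an assumption inferred from the values in this thread, not something confirmed by the mfakto documentation. A small sketch under that assumption (the function name is hypothetical):

```python
def mfakto_device_number(platform_index, device_index):
    """Build a -d argument from 1-based platform and device positions.

    ASSUMPTION: mfakto encodes the pair as 10 * platform + device.
    This is inferred from "-d 11" (GPU) and "-d 12" (CPU) in the thread.
    """
    if not (1 <= platform_index <= 9 and 1 <= device_index <= 9):
        raise ValueError("this sketch only handles single-digit indices")
    return 10 * platform_index + device_index

# Device order as clinfo listed it for the single platform: GPU first, CPU second.
devices = ["CL_DEVICE_TYPE_GPU", "CL_DEVICE_TYPE_CPU"]
for position, device_type in enumerate(devices, start=1):
    print(device_type, "-> -d", mfakto_device_number(1, position))
```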

Could you please run mfakto -d 11 --CLtest and paste the output? --CLtest will display any errors but continue to invoke a small test kernel. Depending on what errors occurred this may lead to mfakto hanging (use Ctrl-C) or crashing - that is kind of expected.

Also, what exactly is the error when running mfakto -d 11?

I still wonder whether installing the latest Catalyst driver would solve this. Is there a chance you could upgrade it?

Last fiddled with by Bdot on 2011-08-17 at 21:13

2011-08-17, 22:36   #74
KingKurly
Sep 2010
Annapolis, MD, USA
C6₁₆ Posts

Code:
kurly@hex:~/mfakto/mfakto-0.07 - Linux/x86_64$ ./mfakto -d 11 --CLtest

Runtime options
  SievePrimes               50000
  SievePrimesAdjust         0
  NumStreams                10
  GridSize                  3
  WorkFile                  worktodo.txt
  Checkpoints               enabled
  Stages                    enabled
  StopAfterFactor           class
  PrintMode                 full
  AllowSleep                yes
  VectorSize                4
No protocol specified
OpenCL Platform 1/1: Advanced Micro Devices, Inc., Version: OpenCL 1.1 AMD-APP-SDK-v2.5 (684.213)
GPU not found, fallback to CPU.
Device 1/1: AMD Phenom(tm) II X6 1045T Processor (AuthenticAMD),
device version: OpenCL 1.1 AMD-APP-SDK-v2.5 (684.213), driver version: 2.0
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_media_ops cl_amd_popcnt cl_amd_printf 
Global memory:16598622208, Global memory cache: 65536, local memory: 32768, workgroup size: 1024, Work dimensions: 3[1024, 1024, 1024, 0, 0] , Max clock speed:2700, compute units:6
loop 0: 
Error -7 in clGetEventProfilingInfo.(startTime)
0 threads: RES (32): 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
loop 1: 
1 threads: RES (32): 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
loop 2: 
2 threads: RES (32): 2 2 2 2 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
loop 3: 
3 threads: RES (32): 3 2 2 2 3 3 3 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
loop 4: 
4 threads: RES (32): 4 2 2 2 3 3 3 4 4 4 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
loop 5: 
5 threads: RES (32): 5 2 2 2 3 3 3 4 4 4 5 5 5 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
loop 6: 
6 threads: RES (32): 6 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 
loop 7: 
7 threads: RES (32): 7 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6 7 7 7 1 1 1 0 0 0 0 0 0 0 0 0 0 
loop 8: 
8 threads: RES (32): 8 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8 1 1 1 0 0 0 0 0 0 0 
loop 9: 
9 threads: RES (32): 9 2 2 2 3 3 3 4 4 4 6 6 6 7 7 7 5 5 5 8 8 8 9 9 9 1 1 1 0 0 0 0 
loop 10:
I would be willing to upgrade the drivers, although I would think that Ubuntu 11.04 (latest release) would be up to date. I generally use the computer without a monitor, so if you have a way to do it from the command-line, that would be best.

Edit: Also, it did not crash, nor did I need to Ctrl-C.

Last fiddled with by KingKurly on 2011-08-17 at 22:37 Reason: added 'non-crash' info

2011-08-18, 20:46   #75
Bdot
Nov 2010
Germany
597₁₀ Posts

Quote:
Originally Posted by KingKurly
I would be willing to upgrade the drivers, although I would think that Ubuntu 11.04 (latest release) would be up to date. I generally use the computer without a monitor, so if you have a way to do it from the command-line, that would be best.

Edit: Also, it did not crash, nor did I need to Ctrl-C.

That leaves two possible issues: in the past the AMD GPU drivers required a running X-Server - do you have that?

And second, as the AMD GPU driver is closed source, many distributions ship the open "radeon" driver, which will not work with the AMD APP.
"lsmod | grep radeon" should return empty, while "lsmod|grep fglrx" should list at least one line if you have the AMD driver.
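
The same check can be scripted. The sketch below assumes the text of "lsmod" has already been captured and that the module name is the first column; the function names and the sample lines (including the sizes) are made up for illustration. Unlike grep, it matches whole module names only.

```python
def loaded_gpu_modules(lsmod_text):
    """Return which of the modules 'radeon' and 'fglrx' appear in lsmod output."""
    found = set()
    for line in lsmod_text.splitlines():
        fields = line.split()
        if fields and fields[0] in ("radeon", "fglrx"):
            found.add(fields[0])
    return found

def diagnose(lsmod_text):
    """Summarise the driver situation in one sentence."""
    modules = loaded_gpu_modules(lsmod_text)
    if "radeon" in modules:
        return "open radeon driver is loaded; AMD APP will not work with it"
    if "fglrx" in modules:
        return "proprietary fglrx driver is loaded"
    return "neither radeon nor fglrx is loaded"

# Made-up lsmod excerpt; the numbers are placeholders.
sample = """\
Module                  Size  Used by
fglrx                2600000  50
snd_hda_intel          30000  2
"""
print(diagnose(sample))
```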

I just verified that the driver version you have (CAL 1.4.1353) is even lower than what comes with Catalyst 11.6, which is the minimum supported with AMD APP 2.4 and 2.5. So even if Ubuntu 11.04 ships the proprietary driver, it is too old.

Just yesterday AMD released Catalyst 11.8, so I tried to install it via a remote ssh session. I simply ran "sh ati-driver-installer-11-8-x86.x86_64.run" and it all worked well. You should reboot afterwards, that's it. According to the change log they now don't even need a running X-Server anymore, but I did not test that (I need X anyway).

BTW, your --CLtest output is totally correct, except that it was the CPU which calculated it. The one error line is expected as the test tried to start a kernel with 0 threads ;-)

2011-08-19, 01:11   #76
KingKurly
Sep 2010
Annapolis, MD, USA
306₈ Posts

Quote:
Originally Posted by Bdot
That leaves two possible issues: in the past the AMD GPU drivers required a running X-Server - do you have that?

Yes, I do.

Quote:
Originally Posted by Bdot
And second, as the AMD GPU driver is closed source, many distributions ship the open "radeon" driver, which will not work with the AMD APP.
"lsmod | grep radeon" should return empty, while "lsmod|grep fglrx" should list at least one line if you have the AMD driver.

I checked, and radeon was empty and fglrx had one line. (Continue reading, there's more!)

Quote:
Originally Posted by Bdot
I just verified that the driver version you have (CAL 1.4.1353) is even lower than what comes with Catalyst 11.6, which is the minimum supported with AMD APP 2.4 and 2.5. So even if Ubuntu 11.04 ships the proprietary driver, it is too old.

Just yesterday AMD released Catalyst 11.8, so I tried to install it via a remote ssh session. I simply ran "sh ati-driver-installer-11-8-x86.x86_64.run" and it all worked well. You should reboot afterwards, that's it. According to the change log they now don't even need a running X-Server anymore, but I did not test that (I need X anyway).

I downloaded 11.8 and upgraded to it. I believe I did it correctly, but I am still not able to make mfakto use the GPU. I have tried many parameters to -d, but none work; they all use only the CPU. This is the new output from clinfo:
Code:
Number of platforms:                 1
  Platform Profile:                 FULL_PROFILE
  Platform Version:                 OpenCL 1.1 AMD-APP-SDK-v2.5 (684.213)
  Platform Name:                 AMD Accelerated Parallel Processing
  Platform Vendor:                 Advanced Micro Devices, Inc.
  Platform Extensions:                 cl_khr_icd cl_amd_event_callback cl_amd_offline_devices


  Platform Name:                 AMD Accelerated Parallel Processing
Number of devices:                 2
  Device Type:                     CL_DEVICE_TYPE_GPU
  Device ID:                     4098
  Device Topology:                 PCI[ B#1, D#0, F#0 ]
  Max compute units:                 2
  Max work items dimensions:             3
    Max work items[0]:                 128
    Max work items[1]:                 128
    Max work items[2]:                 128
  Max work group size:                 128
  Preferred vector width char:             16
  Preferred vector width short:             8
  Preferred vector width int:             4
  Preferred vector width long:             2
  Preferred vector width float:             4
  Preferred vector width double:         0
  Native vector width char:             16
  Native vector width short:             8
  Native vector width int:             4
  Native vector width long:             2
  Native vector width float:             4
  Native vector width double:             0
  Max clock frequency:                 0Mhz
  Address bits:                     32
  Max memory allocation:             134217728
  Image support:                 Yes
  Max number of images read arguments:         128
  Max number of images write arguments:         8
  Max image 2D width:                 8192
  Max image 2D height:                 8192
  Max image 3D width:                 2048
  Max image 3D height:                 2048
  Max image 3D depth:                 2048
  Max samplers within kernel:             16
  Max size of kernel argument:             1024
  Alignment (bits) of base address:         32768
  Minimum alignment (bytes) for any datatype:     128
  Single precision floating point capability
    Denorms:                     No
    Quiet NaNs:                     Yes
    Round to nearest even:             Yes
    Round to zero:                 Yes
    Round to +ve and infinity:             Yes
    IEEE754-2008 fused multiply-add:         Yes
  Cache type:                     None
  Cache line size:                 0
  Cache size:                     0
  Global memory size:                 536870912
  Constant buffer size:                 65536
  Max number of constant args:             8
  Local memory type:                 Scratchpad
  Local memory size:                 32768
  Kernel Preferred work group size multiple:     32
  Error correction support:             0
  Unified memory for Host and Device:         0
  Profiling timer resolution:             1
  Device endianess:                 Little
  Available:                     Yes
  Compiler available:                 Yes
  Execution capabilities:                 
    Execute OpenCL kernels:             Yes
    Execute native function:             No
  Queue properties:                 
    Out-of-Order:                 No
    Profiling :                     Yes
  Platform ID:                     0x7f8a50e37060
  Name:                         Cedar
  Vendor:                     Advanced Micro Devices, Inc.
  Device OpenCL C version:             OpenCL C 1.1 
  Driver version:                 CAL 1.4.1523
  Profile:                     FULL_PROFILE
  Version:                     OpenCL 1.1 AMD-APP-SDK-v2.5 (684.213)
  Extensions:                     cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt
Any more ideas?
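
One way to confirm that the driver upgrade took effect is to compare the "Driver version" lines of the two clinfo runs. A minimal sketch, assuming the version always has the form "CAL x.y.z" as in the output above; the function name is made up for illustration.

```python
import re

def cal_version(driver_line):
    """Parse 'CAL 1.4.1353' (anywhere in the line) into a tuple of ints."""
    match = re.search(r"CAL\s+(\d+(?:\.\d+)*)", driver_line)
    if match is None:
        raise ValueError("no CAL version found in: %r" % driver_line)
    return tuple(int(part) for part in match.group(1).split("."))

# The two GPU driver lines reported in this thread, before and after the upgrade.
before = cal_version("Driver version:                 CAL 1.4.1353")
after = cal_version("Driver version:                 CAL 1.4.1523")

print(before, after)
print("driver was upgraded:", after > before)
```

Comparing tuples of integers avoids the pitfalls of comparing version strings character by character.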

2011-08-19, 01:25   #77
KingKurly
Sep 2010
Annapolis, MD, USA
306₈ Posts

Sorry to make back-to-back posts, but there was enough new information that I thought it warranted making a new post.

I found that if I plug a monitor and keyboard into that computer and then log in locally, the video card is found and can be used just fine. It would be a bit of a burden to always have to log in locally, but I guess I can do that until a better solution is found.

That said, I do have a new problem to report:

Code:
kurly@hex:~/mfakto/mfakto-0.07 - Linux/x86_64$ ./mfakto -d 11 -CLtest
mfakto 0.07 (64bit build)


Runtime options
  SievePrimes               50000
  SievePrimesAdjust         0
  NumStreams                10
  GridSize                  3
  WorkFile                  worktodo.txt
  Checkpoints               enabled
  Stages                    enabled
  StopAfterFactor           class
  PrintMode                 full
  AllowSleep                yes
  VectorSize                4
Compiletime options
  THREADS_PER_BLOCK         256
  SIEVE_SIZE_LIMIT          64kiB
  SIEVE_SIZE                482885bits
  SIEVE_SPLIT               250
  MORE_CLASSES              enabled
Select device - Get device info - Compiling kernels..........

OpenCL device info
  name                      Cedar (Advanced Micro Devices, Inc.)
  device (driver) version   OpenCL 1.1 AMD-APP-SDK-v2.5 (684.213) (CAL 1.4.1523)
  maximum threads per block 128
  maximum threads per grid  2097152
  number of multiprocessors 2 (160 compute elements(estimate for ATI GPUs))
  clock rate                0MHz

ERROR: THREADS_PER_BLOCK (256) > deviceinfo.maxThreadsPerBlock
As you can see, my GPU can only do 128 max threads per block, but you have the build compiled for 256. I will try downloading the source from a previous post in this thread and seeing if I can work around this issue myself, but I wanted to let you know that I surpassed one hurdle, now onto the next one!
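
The check that produced this error can be sketched as follows. The first function mirrors the message from the log; the second picks a candidate value for a rebuild. Choosing a power of two is an assumption of this sketch, not a documented mfakto requirement, and both function names are hypothetical.

```python
def check_threads_per_block(threads_per_block, max_threads_per_block):
    """Return an error string if the compiled-in value exceeds the device limit."""
    if threads_per_block > max_threads_per_block:
        return ("ERROR: THREADS_PER_BLOCK (%d) > deviceinfo.maxThreadsPerBlock"
                % threads_per_block)
    return None

def largest_power_of_two_at_most(limit):
    """Largest power of two that does not exceed limit (limit must be >= 1)."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return 1 << (limit.bit_length() - 1)

device_max = 128  # "maximum threads per block" reported for the Cedar GPU
print(check_threads_per_block(256, device_max))
print("candidate THREADS_PER_BLOCK:", largest_power_of_two_at_most(device_max))
```

Passing this check only means the program will start; it says nothing about whether the kernels then compute correct results with the smaller value.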

-----------------------------------------------------------------------------
***Edit: I was able to build the source with THREADS_PER_BLOCK changed from 256 to 128, but it fails selftest 1-5 and 9-11. See below:

Code:
kurly@hex:~/mfakto/mfakto-0.07 - Linux/x86_64$ ./mfakto -d 11 -CLtest
mfakto 0.07 (64bit build)


Runtime options
  SievePrimes               50000
  SievePrimesAdjust         0
  NumStreams                10
  GridSize                  3
  WorkFile                  worktodo.txt
  Checkpoints               enabled
  Stages                    enabled
  StopAfterFactor           class
  PrintMode                 full
  AllowSleep                yes
  VectorSize                4
Compiletime options
  THREADS_PER_BLOCK         128
  SIEVE_SIZE_LIMIT          64kiB
  SIEVE_SIZE                482885bits
  SIEVE_SPLIT               250
  MORE_CLASSES              enabled
Select device - Get device info - Compiling kernels..........

OpenCL device info
  name                      Cedar (Advanced Micro Devices, Inc.)
  device (driver) version   OpenCL 1.1 AMD-APP-SDK-v2.5 (684.213) (CAL 1.4.1523)
  maximum threads per block 128
  maximum threads per grid  2097152
  number of multiprocessors 2 (160 compute elements(estimate for ATI GPUs))
  clock rate                0MHz

Automatic parameters
  threads per grid          1048576

running a simple selftest...
########## testcase 1/14 ##########
ERROR: selftest failed for M49635893 (mfakto_cl_71)
  no factor found
########## testcase 2/14 ##########
ERROR: selftest failed for M51375383 (mfakto_cl_71)
  no factor found
########## testcase 3/14 ##########
ERROR: selftest failed for M47644171 (mfakto_cl_71)
  no factor found
########## testcase 4/14 ##########
ERROR: selftest failed for M51038681 (mfakto_cl_71)
  no factor found
########## testcase 5/14 ##########
ERROR: selftest failed for M49717271 (mfakto_cl_71)
  no factor found
########## testcase 6/14 ##########
########## testcase 7/14 ##########
########## testcase 8/14 ##########
########## testcase 9/14 ##########
ERROR: selftest failed for M60009109 (mfakto_cl_71)
  no factor found
########## testcase 10/14 ##########
ERROR: selftest failed for M60002273 (mfakto_cl_71)
  no factor found
########## testcase 11/14 ##########
ERROR: selftest failed for M60004333 (mfakto_cl_71)
  no factor found
########## testcase 12/14 ##########
########## testcase 13/14 ##########
########## testcase 14/14 ##########
Selftest statistics
  number of tests           52
  successfull tests         44
  no factor found           8

selftest FAILED!
The GPU does not make very much noise at all during the test, but I assume it is doing something!
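
For longer logs, the failing test cases can be pulled out with a short script. This is a sketch that assumes the selftest output keeps the exact line formats shown above; the function name is made up, and the sample is an excerpt of the log in this post.

```python
import re

def failed_testcases(log_text):
    """Return (testcase number, exponent, kernel name) for every failed selftest."""
    failures = []
    current = None
    for line in log_text.splitlines():
        header = re.match(r"#+ testcase (\d+)/(\d+) #+", line)
        if header:
            current = int(header.group(1))
            continue
        error = re.match(r"ERROR: selftest failed for (M\d+) \((\w+)\)", line)
        if error and current is not None:
            failures.append((current, error.group(1), error.group(2)))
    return failures

sample = """\
########## testcase 5/14 ##########
ERROR: selftest failed for M49717271 (mfakto_cl_71)
  no factor found
########## testcase 6/14 ##########
########## testcase 9/14 ##########
ERROR: selftest failed for M60009109 (mfakto_cl_71)
  no factor found
"""

for number, exponent, kernel in failed_testcases(sample):
    print(number, exponent, kernel)
```

In the full log above, every failing line names the same kernel, mfakto_cl_71, which may help narrow down where the 128-thread build goes wrong.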

Last fiddled with by KingKurly on 2011-08-19 at 01:37 Reason: Got code to compile

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
