mersenneforum.org  

Old 2011-07-05, 11:01   #56
jasonp
Tribal Bullet
 
Oct 2004

5·23·31 Posts

Shifts are 'modular' like this on any processor architecture you're likely to encounter. The only exception is the PowerPC line of processors, which allow one more bit to figure into the shift amount, so that e.g. a shift greater than 31 will zero a register.
Old 2011-07-05, 11:34   #57
ldesnogu
 
Jan 2008
France

3×199 Posts

Quote:
Originally Posted by jasonp View Post
Shifts are 'modular' like this on any processor architecture you're likely to encounter.
ARM doesn't behave that way, but that perhaps doesn't fall into the "likely to encounter" category.

FWIW, I had a bug in some MP code compiled with the MS C compiler, where a right shift by 32 was treated as a NOP. That is legal behavior, given that the ANSI C standard leaves shifts by amounts >= the width of the operand undefined.
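
A minimal sketch of how C code can avoid depending on any of the behaviors discussed above (shift counts that wrap on some CPUs, zero the register on others, and are undefined in C itself): check the shift count explicitly. The helper names shr32 and shl32 are made up for this illustration and are not taken from mfaktc or mfakto.

```c
#include <stdint.h>

/* Right shift of a 32-bit word that is well defined for any count:
 * counts of 32 or more yield 0 instead of relying on what the CPU
 * or compiler happens to do. */
uint32_t shr32(uint32_t x, unsigned n)
{
    return (n < 32u) ? (x >> n) : 0u;
}

/* Left shift with the same guarantee. */
uint32_t shl32(uint32_t x, unsigned n)
{
    return (n < 32u) ? (x << n) : 0u;
}
```

With these helpers, shr32(x, 32) is always 0, regardless of whether the hardware would have masked the count to 5 bits. The extra comparison costs a little, so whether it belongs in a hot loop is a separate question.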
Old 2011-07-05, 23:13   #58
Bdot
 
Nov 2010
Germany

3·199 Posts

Quote:
Originally Posted by jasonp View Post
Shifts are 'modular' like this on any processor architecture you're likely to encounter. The only exception is the PowerPC line of processors, which allow one more bit to figure into the shift amount, so that e.g. a shift greater than 31 will zero a register.
Well, I took that over from mfaktc, and we all know that it works there ... so you have another exception

And I learned something

Quote:
Originally Posted by ldesnogu View Post
ARM doesn't behave that way, but that perhaps doesn't fall into the "likely to encounter" category.
I also happen to program for ARM a little, but at a higher level. I don't think I've had to use bit shifts there so far ...

But most importantly, I completed the first tests with the vectorized Barrett kernels. Extending the previous list:
Quote:
Originally Posted by Bdot View Post
76 M/s mfakto_cl_71_4: 3x24-bit, 4-vectored kernel
68 M/s mfakto_cl_barrett79: 2.5x32-bit unvectored barrett kernel
53 M/s mfakto_cl_barrett92: 3x32-bit unvectored barrett kernel
44 M/s mfakto_cl_71: 3x24-bit unvectored kernel
96 M/s mfakto_cl_barrett79_8: 2.5x32-bit 8-vectored barrett kernel
92 M/s mfakto_cl_barrett79_2: 2.5x32-bit 2-vectored barrett kernel
88 M/s mfakto_cl_barrett79_4: 2.5x32-bit 4-vectored barrett kernel
72 M/s mfakto_cl_barrett92_8: 3x32-bit 8-vectored barrett kernel
71 M/s mfakto_cl_barrett92_4: 3x32-bit 4-vectored barrett kernel
70 M/s mfakto_cl_barrett92_2: 3x32-bit 2-vectored barrett kernel

Now it would be interesting to see how a 24-bit vectored barrett kernel would do ...
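
For readers unfamiliar with the term, here is a rough sketch of the idea behind Barrett reduction, which the barrett kernels in the list are named after: the division in x mod n is replaced by a multiplication with a precomputed reciprocal, followed by a small correction. This is a single-word illustration only. The mfakto kernels work on multi-word numbers of 72 to 92 bits, and the function names barrett_mu and barrett_reduce are assumptions made for this example, not mfakto code.

```c
#include <stdint.h>

/* Precompute mu = floor(2^32 / n) for a modulus n >= 2.
 * This is done once per modulus and reused for every reduction. */
uint32_t barrett_mu(uint32_t n)
{
    return (uint32_t)((UINT64_C(1) << 32) / n);
}

/* Compute x mod n using mu instead of a division by n. */
uint32_t barrett_reduce(uint32_t x, uint32_t n, uint32_t mu)
{
    /* Quotient estimate: never larger than x / n, at most 1 too small. */
    uint32_t q = (uint32_t)(((uint64_t)x * mu) >> 32);
    uint32_t r = x - q * n;   /* q * n <= x, so this cannot underflow */

    /* Correct the (rare) underestimate of the quotient. */
    while (r >= n)
        r -= n;
    return r;
}
```

The payoff comes when many values are reduced by the same modulus, since mu is computed only once.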
Old 2011-07-12, 09:03   #59
Bdot
 
Nov 2010
Germany

1001010101₂ Posts
Anyone running a HD4xxx?

I'd like to see what capabilities a HD4xxx has. Would anyone in possession of such a GPU please post the output of clinfo? I'm mostly interested in the "Extensions:" part, but if you could pm me the full output that'd be nice too.

clinfo is part of the AMD-APP-SDK ...

Thanks a lot ...
Old 2011-07-12, 13:51   #60
apsen
 
Jun 2011

131 Posts

Quote:
Originally Posted by Bdot View Post
I'd like to see what capabilities a HD4xxx has. Would anyone in possession of such a GPU please post the output of clinfo? I'm mostly interested in the "Extensions:" part, but if you could pm me the full output that'd be nice too.

clinfo is part of the AMD-APP-SDK ...

Thanks a lot ...
HD4550:

Code:
Number of platforms:				 1
  Platform Profile:				 FULL_PROFILE
  Platform Version:				 OpenCL 1.1 AMD-APP-SDK-v2.4 (650.9)
  Platform Name:				 AMD Accelerated Parallel Processing
  Platform Vendor:				 Advanced Micro Devices, Inc.
  Platform Extensions:				 cl_khr_icd cl_amd_event_callback cl_amd_offline_devices cl_khr_d3d10_sharing


  Platform Name:				 AMD Accelerated Parallel Processing
Number of devices:				 2
  Device Type:					 CL_DEVICE_TYPE_GPU
  Device ID:					 4098
  Max compute units:				 2
  Max work items dimensions:			 3
    Max work items[0]:				 128
    Max work items[1]:				 128
    Max work items[2]:				 128
  Max work group size:				 128
  Preferred vector width char:			 16
  Preferred vector width short:			 8
  Preferred vector width int:			 4
  Preferred vector width long:			 2
  Preferred vector width float:			 4
  Preferred vector width double:		 0
  Native vector width char:			 16
  Native vector width short:			 8
  Native vector width int:			 4
  Native vector width long:			 2
  Native vector width float:			 4
  Native vector width double:			 0
  Max clock frequency:				 600Mhz
  Address bits:					 32
  Max memory allocation:			 134217728
  Image support:				 No
  Max size of kernel argument:			 1024
  Alignment (bits) of base address:		 32768
  Minimum alignment (bytes) for any datatype:	 128
  Single precision floating point capability
    Denorms:					 No
    Quiet NaNs:					 Yes
    Round to nearest even:			 Yes
    Round to zero:				 Yes
    Round to +ve and infinity:			 Yes
    IEEE754-2008 fused multiply-add:		 Yes
  Cache type:					 None
  Cache line size:				 0
  Cache size:					 0
  Global memory size:				 268435456
  Constant buffer size:				 65536
  Max number of constant args:			 8
  Local memory type:				 Global
  Local memory size:				 16384
  Kernel Preferred work group size multiple:	 32
  Error correction support:			 0
  Unified memory for Host and Device:		 0
  Profiling timer resolution:			 1
  Device endianess:				 Little
  Available:					 Yes
  Compiler available:				 Yes
  Execution capabilities:				 
    Execute OpenCL kernels:			 Yes
    Execute native function:			 No
  Queue properties:				 
    Out-of-Order:				 No
    Profiling :					 Yes
  Platform ID:					 000000000173B118
  Name:						 ATI RV710
  Vendor:					 Advanced Micro Devices, Inc.
  Device OpenCL C version:			 OpenCL C 1.0 
  Driver version:				 CAL 1.4.1332
  Profile:					 FULL_PROFILE
  Version:					 OpenCL 1.0 AMD-APP-SDK-v2.4 (650.9)
  Extensions:					 cl_khr_gl_sharing cl_amd_device_attribute_query cl_khr_d3d10_sharing 


  Device Type:					 CL_DEVICE_TYPE_CPU
  Device ID:					 4098
  Max compute units:				 2
  Max work items dimensions:			 3
    Max work items[0]:				 1024
    Max work items[1]:				 1024
    Max work items[2]:				 1024
  Max work group size:				 1024
  Preferred vector width char:			 16
  Preferred vector width short:			 8
  Preferred vector width int:			 4
  Preferred vector width long:			 2
  Preferred vector width float:			 4
  Preferred vector width double:		 0
  Native vector width char:			 16
  Native vector width short:			 8
  Native vector width int:			 4
  Native vector width long:			 2
  Native vector width float:			 4
  Native vector width double:			 0
  Max clock frequency:				 2400Mhz
  Address bits:					 64
  Max memory allocation:			 2147483648
  Image support:				 Yes
  Max number of images read arguments:		 128
  Max number of images write arguments:		 8
  Max image 2D width:				 8192
  Max image 2D height:				 8192
  Max image 3D width:				 2048
  Max image 3D height:				 2048
  Max image 3D depth:				 2048
  Max samplers within kernel:			 16
  Max size of kernel argument:			 4096
  Alignment (bits) of base address:		 1024
  Minimum alignment (bytes) for any datatype:	 128
  Single precision floating point capability
    Denorms:					 Yes
    Quiet NaNs:					 Yes
    Round to nearest even:			 Yes
    Round to zero:				 Yes
    Round to +ve and infinity:			 Yes
    IEEE754-2008 fused multiply-add:		 No
  Cache type:					 Read/Write
  Cache line size:				 64
  Cache size:					 32768
  Global memory size:				 4294287360
  Constant buffer size:				 65536
  Max number of constant args:			 8
  Local memory type:				 Global
  Local memory size:				 32768
  Kernel Preferred work group size multiple:	 1
  Error correction support:			 0
  Unified memory for Host and Device:		 1
  Profiling timer resolution:			 426
  Device endianess:				 Little
  Available:					 Yes
  Compiler available:				 Yes
  Execution capabilities:				 
    Execute OpenCL kernels:			 Yes
    Execute native function:			 Yes
  Queue properties:				 
    Out-of-Order:				 No
    Profiling :					 Yes
  Platform ID:					 000000000173B118
  Name:						 Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz
  Vendor:					 GenuineIntel
  Device OpenCL C version:			 OpenCL C 1.1 
  Driver version:				 2.0
  Profile:					 FULL_PROFILE
  Version:					 OpenCL 1.1 AMD-APP-SDK-v2.4 (650.9)
  Extensions:					 cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_media_ops cl_amd_popcnt cl_amd_printf cl_khr_d3d10_sharing
Old 2011-07-13, 08:33   #61
Bdot
 
Nov 2010
Germany

597₁₀ Posts

Quote:
Originally Posted by apsen View Post
HD4550:

Code:
  Extensions:                     cl_khr_gl_sharing cl_amd_device_attribute_query cl_khr_d3d10_sharing
Thanks, so there really are no atomics available. I'll try to make mfakto adjust automatically ...
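
One way such an automatic adjustment could be sketched is to test the device's extension string for the atomics extension before choosing a code path. In a real host program the string would come from clGetDeviceInfo with CL_DEVICE_EXTENSIONS. The helper below only does the string check, so it has no OpenCL dependency. The name has_extension is hypothetical, and cl_khr_global_int32_base_atomics is used purely as an example, since the thread does not say which extension mfakto relies on.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if `name` occurs as a whole space-separated token in
 * `ext_list`. A plain strstr() is not enough, because one extension
 * name can be a prefix of another. */
bool has_extension(const char *ext_list, const char *name)
{
    size_t len = strlen(name);
    const char *p = ext_list;

    if (len == 0)
        return false;

    while ((p = strstr(p, name)) != NULL) {
        bool starts_token = (p == ext_list) || (p[-1] == ' ');
        bool ends_token = (p[len] == ' ') || (p[len] == '\0');
        if (starts_token && ends_token)
            return true;
        p += len;   /* keep searching after this partial match */
    }
    return false;
}
```

Fed the HD4550 "Extensions:" line posted above, has_extension(..., "cl_khr_global_int32_base_atomics") returns false, while the CPU device's line returns true, so a program could fall back to a non-atomic code path on the GPU.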
Old 2011-07-19, 08:35   #62
Bdot
 
Nov 2010
Germany

3×199 Posts
Status

Just a short update (so you don't think I've lost interest ):

After I upgraded my ATI drivers to a pre-release version of Catalyst 11.7, the barrett92 kernel no longer finds any factors (72-bit and barrett79 are still fine). I'm currently (re-)introducing Oliver's MODBASECASE checks, which I had skipped so far.

George has already enabled primenet's manual page for mfakto's results, but as (parts of) mfakto are broken with the new driver (or compiler?), I need to delay mfakto's "official release".

Going back to 11.6 is of no use, as it has serious issues with the kernel files now that they have grown bigger, and 11.7 does not lock up my machine anymore, so there are improvements ...

Last fiddled with by Bdot on 2011-07-19 at 08:36
Old 2011-08-15, 20:35   #63
Bdot
 
Nov 2010
Germany

1125₈ Posts
mfakto Release!

After I found and fixed the last (?) serious bugs in mfakto, all tests finished as they should. Therefore:

mfakto 0.07 releases

This is the first version to go public; let me know of any issues.

Attached are the Windows 32-bit and 64-bit binaries. Source will follow right away, Linux (SuSE 11.4) will come tomorrow.
Attached Files
File Type: zip mfakto-0.07 - Win.zip (190.1 KB, 294 views)
Old 2011-08-15, 20:37   #64
Bdot
 
Nov 2010
Germany

3·199 Posts

mfakto 0.07 sources
Attached Files
File Type: zip mfakto-0.07 - src.zip (135.9 KB, 265 views)
Old 2011-08-15, 20:59   #65
monst
 
Mar 2007

179 Posts
Default

Can you also please post the correct versions of OpenCL.dll and any other DLLs that are required? Thanks.
Old 2011-08-15, 20:59   #66
firejuggler
 
"Vincent"
Apr 2010
Over the rainbow

2³×5×73 Posts

Isn't OpenCL supposed to work on both NVIDIA and Radeon cards?
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
