mersenneforum.org  

Old 2013-10-08, 21:35   #56
Bdot
Nov 2010
Germany

1001010101₂ Posts

Quote:
Originally Posted by Prime95
Is it because one vector is unsigned and the other is signed?

Hmm. Hard to believe. Normally, a signed/unsigned mismatch is just a warning. It's true that the (boolean) result of a comparison is signed in OpenCL. Therefore I have the ternary there that should provide unsigned long ("<n>UL"). But here, it seems more like there is no way of promoting an unsigned long ("ulong") to a vector ("ulong2"). Normally, that promotion should be implicit - but not here. Even an explicit type cast did not work.

Maybe you or kracker could try a few more things, like

r2 += ((r1!=0)? (ulong_v)1UL : (ulong_v)0UL);

or have it select the size automatically?

r2 += ((r1!=0)? 1 : 0);

or integrate the addition into the selection:

r2 = ((r1!=0)? r2+1 : r2);

or ... almost worst case (because threads take a different code path):

if (r1!=0) ++r2;

or ... really worst case (because this is really a function call for conversion):

r2 += convert_##ulong_v((r1!=0)? 1UL : 0UL);


I did not try any of those - possibly some won't even work on AMD. If you find something that works, let me know.
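
To make the intent of the variants above concrete, here is a minimal sketch in plain C that emulates what they are meant to compute: add 1 to each lane of r2 whose r1 lane is nonzero. The type `ulong2_sketch` and the helper names are hypothetical stand-ins, not mfakto code, and the sketch assumes a 2-lane vector and that a true lane of a vector comparison is all ones.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for OpenCL's ulong2: two independent 64-bit lanes. */
typedef struct { uint64_t s0, s1; } ulong2_sketch;

/* Emulates the vector comparison (r1 != 0): a true lane becomes all ones
   (-1 when read as signed), a false lane becomes 0. */
static ulong2_sketch ne_zero(ulong2_sketch r1)
{
    ulong2_sketch mask;
    mask.s0 = (r1.s0 != 0) ? UINT64_MAX : 0;
    mask.s1 = (r1.s1 != 0) ? UINT64_MAX : 0;
    return mask;
}

/* Emulates r2 += ((r1!=0)? (ulong_v)1UL : (ulong_v)0UL); lane by lane:
   both constants are treated as broadcast to every lane before selecting. */
static ulong2_sketch add_one_where_nonzero(ulong2_sketch r2, ulong2_sketch r1)
{
    ulong2_sketch mask = ne_zero(r1);
    r2.s0 += mask.s0 ? 1u : 0u;
    r2.s1 += mask.s1 ? 1u : 0u;
    return r2;
}

/* Small self-check: only the lane whose r1 is nonzero gets incremented. */
static void demo_conditional_increment(void)
{
    ulong2_sketch r1 = { 0, 7 };
    ulong2_sketch r2 = { 10, 20 };
    ulong2_sketch out = add_one_where_nonzero(r2, r1);
    assert(out.s0 == 10 && out.s1 == 21);
}
```

Calling `demo_conditional_increment()` from a test program exercises the sketch. All five variants are supposed to produce this same per-lane result; they differ only in how the compiler has to get the constant 1 into vector form.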

Old 2013-10-08, 23:06   #57
kracker
"Mr. Meeseeks"
Jan 2012
California, USA

37×59 Posts

r2 += ((r1!=0)? (ulong_v)1UL : (ulong_v)0UL);
works. Do you want -st with that?

Old 2013-10-09, 21:44   #58
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL

17×487 Posts

Errors when building with TRACE_KERNEL=2

This compiler is very picky.
Attached Files
File Type: txt errors.txt (15.5 KB, 287 views)

Old 2013-10-09, 22:06   #59
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL

17·487 Posts

I've just started to look at the code. My OpenCL knowledge is near zero.

Anyway, this code scared me:

Code:
// AS_UINT is applied only to logical results. For vector operations, these are 0 (false) or -1 (true)
// For scalar operations, they result in 0 (false) or 1 (true) ==> to unify, negate here
#define AS_UINT_V -as_uint
It looks like this is typecasting to an unsigned and then negating it. Shouldn't you negate it while it is a signed int, as in:

Code:
// AS_UINT is applied only to logical results. For vector operations, these are 0 (false) or -1 (true)
// For scalar operations, they result in 0 (false) or 1 (true) ==> to unify, negate here
#define AS_UINT_V(x)  as_uint(-(x))

Last fiddled with by Prime95 on 2013-10-09 at 22:07

Old 2013-10-10, 20:34   #60
Bdot
Nov 2010
Germany

597₁₀ Posts

Quote:
Originally Posted by Prime95
I've just started to look at the code. My OpenCL knowledge is near zero.

Anyway, this code scared me:

Code:
// AS_UINT is applied only to logical results. For vector operations, these are 0 (false) or -1 (true)
// For scalar operations, they result in 0 (false) or 1 (true) ==> to unify, negate here
#define AS_UINT_V -as_uint
It looks like this is typecasting to an unsigned and then negating it. Shouldn't you negate it while it is a signed int, as in:

Code:
// AS_UINT is applied only to logical results. For vector operations, these are 0 (false) or -1 (true)
// For scalar operations, they result in 0 (false) or 1 (true) ==> to unify, negate here
#define AS_UINT_V(x)  as_uint(-(x))

You are correct with your observation. In the past, I did not care too much about VECTOR_SIZE=1 as it does not fit the AMD GPUs very well.
On the other hand, again, signed/unsigned issues are only warnings - apart from a right shift and mul_hi, they don't differ in their operations. I think I use AS_UINT only for additions/subtractions, but I did not verify that.
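
As an illustration of the point about signedness, the following plain C sketch compares the two negation orders discussed above and shows one case (the high half of a product, as mul_hi computes it) where the same bits do give different answers. The function names are hypothetical, and C conversions stand in for OpenCL's as_uint reinterpretation, which yields the same bits on two's complement hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Order 1: reinterpret as unsigned, then negate (the shape of -as_uint(x)).
   Unsigned negation wraps modulo 2^32. */
static uint32_t negate_after_cast(int32_t cmp)
{
    return (uint32_t)(0u - (uint32_t)cmp);
}

/* Order 2: negate while signed, then reinterpret (the shape of as_uint(-(x))).
   Only meant for comparison results (0, 1 or -1), so -cmp cannot overflow. */
static uint32_t cast_after_negate(int32_t cmp)
{
    return (uint32_t)(-cmp);
}

/* High 32 bits of the unsigned 64-bit product. */
static uint32_t mul_hi_unsigned(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* High 32 bits of the signed 64-bit product, returned as a bit pattern. */
static uint32_t mul_hi_signed(int32_t a, int32_t b)
{
    int64_t product = (int64_t)a * b;
    return (uint32_t)((uint64_t)product >> 32);
}

static void demo_sign_handling(void)
{
    /* Both negation orders agree for every comparison result. */
    assert(negate_after_cast(-1) == 1u && cast_after_negate(-1) == 1u);
    assert(negate_after_cast(0) == 0u && cast_after_negate(0) == 0u);
    assert(negate_after_cast(1) == cast_after_negate(1));

    /* mul_hi is one of the places where identical bits give different results. */
    assert(mul_hi_unsigned(0xFFFFFFFFu, 2u) == 1u);
    assert(mul_hi_signed(-1, 2) == 0xFFFFFFFFu);
}
```

So for the 0 / -1 / 1 values a comparison can produce, the order of the cast and the negation should not change the resulting bits. It only matters if the value is later fed to an operation that does depend on signedness.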

Regarding the TRACE_KERNEL, the compiler is right. For VECTOR_SIZE=1, the variables do not have the vector components. To trace the kernels, set at least VECTOR_SIZE=2. Or modify the trace statements to remove the ".s0" everywhere.
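
One way such trace statements could be made independent of the vector size is an accessor macro that expands to the value itself for VECTOR_SIZE=1 and to the first component otherwise. This is only a sketch in plain C: `TRACE_LANE0`, `uint_v` as defined here, and the struct standing in for a 2-component vector are assumptions for illustration, not the actual mfakto definitions.

```c
#include <assert.h>
#include <stdint.h>

#ifndef VECTOR_SIZE
#define VECTOR_SIZE 2            /* assumption for this sketch */
#endif

#if VECTOR_SIZE == 1
typedef uint32_t uint_v;                      /* plain scalar, has no .s0 */
#define TRACE_LANE0(x) (x)
#else
typedef struct { uint32_t s0, s1; } uint_v;   /* stand-in for a 2-lane vector */
#define TRACE_LANE0(x) ((x).s0)
#endif

/* Builds a value with the same number in every lane. */
static uint_v splat(uint32_t a)
{
#if VECTOR_SIZE == 1
    return a;
#else
    uint_v v = { a, a };
    return v;
#endif
}

/* What a trace statement would print: always the first lane. */
static uint32_t traced_value(uint_v v)
{
    return TRACE_LANE0(v);
}

static void demo_trace_lane(void)
{
    assert(traced_value(splat(42u)) == 42u);
}
```

Compiling the same file with -DVECTOR_SIZE=1 takes the scalar branch, so a trace written with the macro builds in both configurations.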


I currently have too much other stuff to do to actually dig into that - I hope in two weeks the situation normalizes again.

Old 2013-10-10, 20:42   #61
Bdot
Nov 2010
Germany

3·199 Posts

Quote:
Originally Posted by kracker
r2 += ((r1!=0)? (ulong_v)1UL : (ulong_v)0UL);
works. Do you want -st with that?

I'll build a PerformanceInfo version when I have time ... but that has to wait a bit.

Old 2013-11-02, 19:58   #62
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL

17·487 Posts

If I didn't screw things up too badly, I have a trace for bdot.
Attached Files
File Type: zip trace.zip (405.0 KB, 171 views)

Old 2013-11-03, 22:23   #63
Bdot
Nov 2010
Germany

3·199 Posts

The calculation looks good - no exceptionally big values, all within bounds. What does look odd is the initial shifted value, bb:
Code:
cl_barrett15_82: bb=fffff968:ffffffff:1e43:dd5b57:dd5bff:0:0:dd5b57:ffffffff:0:ef:216c0e, bit_max65=5
First, it is supposed to have only one bit set in the whole value; second, only the lower 15 bits of each component should ever be used.

This makes me think that passing this custom type (int180, a struct of 12 uints) does not work and leaves the initial shifted value uninitialized.
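
The two conditions can be checked mechanically. Below is a minimal sketch in plain C, assuming the 12 components are stored as an array of 32-bit words; `int180_sketch`, `bb_looks_valid` and the other names are hypothetical and only illustrate the check, they are not taken from the mfakto source.

```c
#include <assert.h>
#include <stdint.h>

#define LIMB_COUNT 12            /* int180: a struct of 12 uints */
#define LIMB_MASK  0x7FFFu       /* only the lower 15 bits may be used */

/* Hypothetical stand-in for the int180 type. */
typedef struct { uint32_t d[LIMB_COUNT]; } int180_sketch;

/* Counts the set bits of one component. */
static int count_bits(uint32_t x)
{
    int n = 0;
    while (x) { x &= x - 1; n++; }
    return n;
}

/* Returns 1 if every component fits in 15 bits and exactly one bit is set
   in the whole value, 0 otherwise. */
static int bb_looks_valid(const int180_sketch *bb)
{
    int bits = 0;
    for (int i = 0; i < LIMB_COUNT; i++) {
        if (bb->d[i] > LIMB_MASK)
            return 0;
        bits += count_bits(bb->d[i]);
    }
    return bits == 1;
}

static void demo_bb_check(void)
{
    /* The components from the trace line above, in the order printed. */
    int180_sketch traced = {{ 0xfffff968u, 0xffffffffu, 0x1e43u, 0xdd5b57u,
                              0xdd5bffu, 0u, 0u, 0xdd5b57u,
                              0xffffffffu, 0u, 0xefu, 0x216c0eu }};
    int180_sketch shifted = {{ 0 }};
    shifted.d[4] = 1u << 5;      /* arbitrary example of a single set bit */

    assert(!bb_looks_valid(&traced));
    assert(bb_looks_valid(&shifted));
}
```

The traced bb fails both conditions at once, which is consistent with the value being uninitialized rather than merely shifted by the wrong amount.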

I saw this bug in the AMD drivers two years ago and made a workaround at that time.
Changelog:
Code:
version 0.10 (2011-12-19)
- added workaround for compatibility with Catalyst 11.10 and above
...
However, since the GPU sieve was added, that workaround is implemented only in the top level of the functions, where it needs to be passed on one more time. This is OK for AMD, but maybe not for Intel. I'll try to change that.

Old 2013-11-03, 23:01   #64
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL

8279₁₀ Posts

Quote:
Originally Posted by Bdot
What does look odd is the initial shifted value, bb

Sorry, I should have explained what all I did to get it to compile. I commented out the bb initializations and undefined the work around #define. I think it was called something like WA_CATALYST_SOMETHING_OR_OTHER.

The compile problems when tracing are numerous. All the .s0, .s1, etc references are no good. Maybe I'm not working from the latest source?

Old 2013-11-03, 23:48   #65
Bdot
Nov 2010
Germany

1001010101₂ Posts

Quote:
Originally Posted by Prime95
Sorry, I should have explained what all I did to get it to compile. I commented out the bb initializations and undefined the work around #define. I think it was called something like WA_CATALYST_SOMETHING_OR_OTHER.

The compile problems when tracing are numerous. All the .s0, .s1, etc references are no good. Maybe I'm not working from the latest source?

OK, WA_FOR_CATALYST11_10_BUG is defined in params.h and mfakto_Kernels.cl. Did you remove it from both?

Regarding .s0, .s1 etc.: I have never before traced with VectorSize=1, when all these values are scalar instead of vectors. Starting with VectorSize=2, the subcomponents need to be addressed by .s0, .s1 ... but I only traced .s0 usually.

Old 2013-11-04, 00:25   #66
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL

17·487 Posts

Quote:
Originally Posted by Bdot
OK, WA_FOR_CATALYST11_10_BUG is defined in params.h and mfakto_Kernels.cl. Did you remove it from both?

I don't have a params.h file.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
