#694
"Mr. Meeseeks"
Jan 2012
California, USA
2168₁₀ Posts

#695
Mar 2013
1 Posts
How about this?
Code:
mask = (1 << (i37 & 31))
     | (1 << (i41 & 31)) | (1 << (i43 & 31)) | (1 << (i47 & 31))
     | (1 << (i53 & 31)) | (1 << (i59 & 31)) | (1 << (i61 & 31));

#696
"Oliver"
Mar 2005
Germany
11·101 Posts

#697
P90 years forever!
Aug 2002
Yeehaw, FL
7,537 Posts
Does OpenCL generate better code than CUDA does for 64-bit variables? The masking code can be rewritten to use 64-bit masks, although that is not a trivial change.
If memory lookups are cheap, then 1 << i37 can be replaced by a lookup into a 37-element array. If generating 0 or 1 from a conditional is "cheap", then 1 << i37 can be replaced with (i37 < 32) << i37, where i37 < 32 evaluates to 0 or 1.
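A minimal sketch of the two suggestions above in plain C, not taken from the code under discussion. The names `BIT_TABLE`, `bit_lut` and `bit_cond` are hypothetical, and the index is assumed to lie in 0..36. The extra `& 31` in `bit_cond` mimics a masked shift count and keeps the C version free of an out-of-range shift.

```c
#include <stdint.h>

/* Hypothetical 37-entry table: entry i holds 1 << i for i < 32, and 0 for i = 32..36. */
static const uint32_t BIT_TABLE[37] = {
    0x00000001u, 0x00000002u, 0x00000004u, 0x00000008u,
    0x00000010u, 0x00000020u, 0x00000040u, 0x00000080u,
    0x00000100u, 0x00000200u, 0x00000400u, 0x00000800u,
    0x00001000u, 0x00002000u, 0x00004000u, 0x00008000u,
    0x00010000u, 0x00020000u, 0x00040000u, 0x00080000u,
    0x00100000u, 0x00200000u, 0x00400000u, 0x00800000u,
    0x01000000u, 0x02000000u, 0x04000000u, 0x08000000u,
    0x10000000u, 0x20000000u, 0x40000000u, 0x80000000u,
    0u, 0u, 0u, 0u, 0u
};

/* Lookup variant: the caller must guarantee 0 <= i <= 36. */
static uint32_t bit_lut(uint32_t i)
{
    return BIT_TABLE[i];
}

/* Conditional variant: (i < 32) is 0 or 1, so the result is 0 for i >= 32.
   Masking the count with 31 keeps the shift in range. */
static uint32_t bit_cond(uint32_t i)
{
    return (uint32_t)(i < 32u) << (i & 31u);
}
```

For an index of 35, both helpers return 0, whereas 1 << (35 & 31) would set bit 3 and give 8.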

#698
Nov 2010
Germany
1125₈ Posts
Quote:
The array lookup also looks promising, as I have lots of constant memory available.

#699
∂2ω=0
Sep 2002
República de California
19·613 Posts
Quote:
If it's a matter of forcing OpenCL to respect your (unmasked) shift count, is writing a tiny inline-ASM macro for such shifts an option?

#700
Nov 2010
Germany
3·199 Posts
Quote:

#701
If I May
"Chris Halsall"
Sep 2002
Barbados
9767₁₀ Posts

#702
P90 years forever!
Aug 2002
Yeehaw, FL
7537₁₀ Posts
Doesn't the C standard state that x << y is implementation dependent if y is greater than the word size? If so, this is simply a case of non-portable C code written to extract maximum efficiency of a particular architecture. Neither AMD nor Nvidia did anything wrong.
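For reference, C11 (section 6.5.7) makes a shift undefined when the count is negative or at least the width of the promoted left operand, while OpenCL C uses only the low log2(N) bits of the count for an N-bit operand. A hedged sketch of the two behaviors in plain C follows; `shl_masked` and `shl_clamped` are hypothetical helper names, not part of any library.

```c
#include <stdint.h>

/* Masked shift for 32-bit operands: only the low 5 bits of the count are used,
   which is how OpenCL C defines << on a 32-bit type. */
static uint32_t shl_masked(uint32_t x, uint32_t n)
{
    return x << (n & 31u);
}

/* Clamped shift: counts of 32 or more yield 0, written out explicitly
   so the result does not depend on the target hardware. */
static uint32_t shl_clamped(uint32_t x, uint32_t n)
{
    return n < 32u ? x << n : 0u;
}
```

With a count of 37, `shl_masked(1, 37)` shifts by 5 and yields 32, while `shl_clamped(1, 37)` yields 0. Code that expects the second result has to ask for it explicitly.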

#703
Bemusing Prompter
"Danny"
Dec 2002
California
4533₈ Posts
Quote:

#704
If I May
"Chris Halsall"
Sep 2002
Barbados
9,767 Posts
Quote:
Quote:
Last fiddled with by chalsall on 2013-03-08 at 00:54