#1079
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
9,497 Posts
Code:
warning: operator '>>' has lower precedence than '+'; '+' will be evaluated first
With that in mind, you are right to worry about ternary '?'s possible precedence clash with '+'. I've looked at the OCL specs book (all versions) and they never even specified the precedence. That's odd.
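For what it's worth, OpenCL C is derived from C99, and in C '+' binds tighter than both '>>' and the ternary '?:'. Below is a minimal sketch in plain C (the function names are made up for illustration) of how the parenthesized and unparenthesized forms differ. A compiler may warn about the unparenthesized lines, which is the point.

```c
#include <assert.h>
#include <stdint.h>

/* Intended form: add 1 to r2 when r1 is non-zero. */
static uint64_t carry_parenthesized(uint64_t r2, uint64_t r1)
{
    return r2 + ((r1 != 0) ? 1UL : 0UL);
}

/* Outer parentheses dropped: parsed as (r2 + (r1 != 0)) ? 1UL : 0UL,
 * so the result can only ever be 0 or 1. */
static uint64_t carry_unparenthesized(uint64_t r2, uint64_t r1)
{
    return r2 + (r1 != 0) ? 1UL : 0UL;
}

/* Same trap with shifts: x >> 1 + 1 means x >> (1 + 1). */
static uint32_t shift_unparenthesized(uint32_t x)
{
    return x >> 1 + 1;
}

/* Small usage example for the three helpers above. */
static void precedence_demo(void)
{
    assert(carry_parenthesized(10, 5) == 11);
    assert(carry_unparenthesized(10, 5) == 1);
    assert(shift_unparenthesized(16) == 4);   /* not (16 >> 1) + 1 == 9 */
}
```

So wrapping the whole ternary in parentheses before adding it, as in the r2 += line quoted later in the thread, is the safe form.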
#1080
P90 years forever!
Aug 2002
Yeehaw, FL
19·397 Posts
#1081
P90 years forever!
Aug 2002
Yeehaw, FL
19×397 Posts
Debugging nightmare. Intel does not support printf in kernels.
#1082
Nov 2010
Germany
597₁₀ Posts
Quote:
You can use the ini file option OCLCompileOptions to set whichever option you need. Do you know what is the best available optimization I should use for INTEL? -O2 or just -O?
Quote:
Quote:
r2 += (ulong_v)((r1!=0)? 1UL : 0UL);
#1083
P90 years forever!
Aug 2002
Yeehaw, FL
1110101110111₂ Posts
Quote:
I saw the ini file option. I think we need a solution that works without using that. Maybe grep for "Intel" in the device capabilities or something?
Quote:
if((status == CL_BUILD_PROGRAM_FAILURE) || (mystuff.verbosity > 2))
I'd set verbosity to 3, so clGetProgramBuildInfo was called even when there was no error. Your log_size > 0 ought to solve this.
Quote:
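A rough sketch of the "grep for Intel" idea in plain C. The helper names are made up for illustration. In a real host program the string would presumably come from clGetDeviceInfo with CL_DEVICE_VENDOR, and the exact text a given driver reports is an assumption that should be checked on the target machine.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive substring test: returns 1 if needle occurs in haystack. */
static int contains_ignore_case(const char *haystack, const char *needle)
{
    size_t i, j;

    if (needle[0] == '\0')
        return 1;
    for (i = 0; haystack[i] != '\0'; i++) {
        /* The comparison also stops at the end of haystack, because
         * '\0' never equals a non-terminator character of needle. */
        for (j = 0; needle[j] != '\0'; j++) {
            if (tolower((unsigned char)haystack[i + j]) !=
                tolower((unsigned char)needle[j]))
                break;
        }
        if (needle[j] == '\0')
            return 1;
    }
    return 0;
}

/* Hypothetical helper: decide from the vendor string whether
 * Intel-specific compile options should be applied by default. */
static int vendor_is_intel(const char *vendor)
{
    return contains_ignore_case(vendor, "intel");
}

/* Usage example with sample strings (assumed, not taken from a driver). */
static void vendor_demo(void)
{
    assert(vendor_is_intel("Intel(R) Corporation"));
    assert(!vendor_is_intel("Advanced Micro Devices, Inc."));
}
```

Matching case-insensitively on a substring is a deliberate choice here, so that small differences in how the vendor name is spelled do not defeat the check.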
#1084
Nov 2010
Germany
3×199 Posts
Quote:
Quote:
Do you attempt to get all kernels to run? Most likely, there is no performance improvement from using mul24 instead of 32-bit multiplications on Intel, so barrett24 and barrett15 will be rather slow. montgomery.cl is good only in some corner cases; it was more of a test to get a feeling for how it compares to barrett.
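For readers unfamiliar with mul24: it is the OpenCL built-in that multiplies using only the low 24 bits of each 32-bit operand. A plain C sketch of the unsigned case follows. The function name is made up, and masking out-of-range inputs is an assumption, since the OpenCL spec leaves results for inputs outside the 24-bit range implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* Emulates unsigned mul24 for illustration: only the low 24 bits of
 * each operand take part, and the low 32 bits of the product are kept.
 * Masking larger inputs is an assumption, not guaranteed behavior. */
static uint32_t mul24_emulated(uint32_t x, uint32_t y)
{
    uint64_t product = (uint64_t)(x & 0xFFFFFFu) * (uint64_t)(y & 0xFFFFFFu);
    return (uint32_t)product;
}

/* Usage example: in-range operands behave like an ordinary multiply. */
static void mul24_demo(void)
{
    assert(mul24_emulated(3u, 5u) == 15u);
    assert(mul24_emulated(0xFFFFFFu, 2u) == 0x1FFFFFEu);
}
```

On hardware with a fast 24-bit multiplier this can beat a full 32-bit multiply; where it does not, kernels built around 24-bit limbs just do more multiplies for the same work.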
#1085
P90 years forever!
Aug 2002
Yeehaw, FL
19·397 Posts
My bad. Printf does work -- whew! You must delete the .elf file every time you change a .cl file. I'll see if I can change the makefile to do this for me automatically.
Right now, I'm just trying to get the program to pass the self-test. The (or a) problem is in GPU sieving or in translating the sieve into k_deltas. Optimization, if I'm so motivated, comes later.
#1086
Nov 2010
Germany
3·199 Posts
#1087
P90 years forever!
Aug 2002
Yeehaw, FL
16567₈ Posts
Quote:
Next mini-bug: in extract_bits, the variables exponent, k_base, shiftcount, bit_max64, and bb are undefined in this code:
Code:
#if (TRACE_SIEVE_KERNEL > 0)
if (lid==TRACE_SIEVE_TID) printf((__constant char *)"extract_bits: exp=%d=%#x, k=%x:%x:%x, bits=%d, shift=%d, bit_max64=%d, bb=%x:%x:%x:%x:%x:%x, wpt=%u, base addr=%#x\n",
exponent, exponent, k_base.d2, k_base.d1, k_base.d0, bits_to_process, shiftcount, bit_max64, bb.d5, bb.d4, bb.d3, bb.d2, bb.d1, bb.d0, words_per_thread, bit_array);
#endif
Last fiddled with by Prime95 on 2014-06-30 at 04:30
#1088
P90 years forever!
Aug 2002
Yeehaw, FL
16567₈ Posts
Here is the code in CalcModularInverses that is tripping up the Intel port:
Code:
facdist = (ulong) (2 * NUM_CLASSES) * (ulong) exponent;
I also tried
facdist = 2*NUM_CLASSES; ulongtemp = exponent; facdist *= ulongtemp;
without success. Any other ideas?
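For reference, here is what that line should compute in standard C terms, next to what a multiply carried out in 32 bits would produce. This is only a sketch: the NUM_CLASSES value and the 32-bit type of exponent are assumptions for illustration, and it does not establish that truncation is what actually goes wrong on Intel.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_CLASSES 4620u   /* assumed value, for illustration only */

/* Both operands are widened to 64 bits before the multiply,
 * so the full product is kept. */
static uint64_t facdist_wide(uint32_t exponent)
{
    return (uint64_t)(2 * NUM_CLASSES) * (uint64_t)exponent;
}

/* The multiply happens in 32 bits and is widened afterwards,
 * so the product wraps modulo 2^32 before it is stored. */
static uint64_t facdist_truncated(uint32_t exponent)
{
    uint32_t narrow = (2 * NUM_CLASSES) * exponent;
    return narrow;
}

/* Usage example: the two agree for small exponents and diverge
 * once the product no longer fits in 32 bits. */
static void facdist_demo(void)
{
    assert(facdist_wide(100u) == facdist_truncated(100u));
    assert(facdist_wide(1000000u) == 9240000000ULL);
    assert(facdist_truncated(1000000u) == 650065408ULL);
}
```

Comparing the value printed from the kernel against both results for a known exponent would show whether the high 32 bits are being lost.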
#1089
Nov 2010
Germany
3·199 Posts
Quote:
I'd try (can't test right now)
facdist = (ulong) exponent * 2 * NUM_CLASSES;
or even
facdist = (ulong) exponent * 2ULL * NUM_CLASSES##ULL;
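One caveat on the second form: the C preprocessor only processes '##' inside a macro definition, so writing it directly in an expression is not expected to compile. A sketch of how the suffix could be attached through a two-level macro instead. The macro names TO_ULL and PASTE_ULL_ and the NUM_CLASSES value are hypothetical, and this only works if NUM_CLASSES expands to a bare integer literal with no parentheses or suffix.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_CLASSES 4620    /* assumed value, for illustration only */

/* Two levels are needed so that NUM_CLASSES is expanded to 4620
 * before '##' glues the ULL suffix onto it. */
#define PASTE_ULL_(x) x##ULL
#define TO_ULL(x)     PASTE_ULL_(x)

/* Every factor is 64-bit here, so no 32-bit intermediate can occur. */
static uint64_t facdist_suffixed(uint32_t exponent)
{
    return (uint64_t)exponent * 2ULL * TO_ULL(NUM_CLASSES);
}

/* Usage example: the pasted literal has unsigned long long type. */
static void suffix_demo(void)
{
    assert(sizeof(TO_ULL(NUM_CLASSES)) == sizeof(unsigned long long));
    assert(facdist_suffixed(1000000u) == 9240000000ULL);
}
```

A plain cast such as (ulong) NUM_CLASSES should have the same effect on the arithmetic without any preprocessor tricks.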