[CODE]warning: operator '>>' has lower precedence than '+'; '+' will be evaluated first[/CODE]
Sheesh! That's quite an inconvenient precedence order. At least they emit a warning.
With that in mind, you are right to worry about a possible precedence clash between the ternary '?:' and '+'. I've looked at the OCL specs book (all versions) and they never even specified the precedence. That's odd. |
[QUOTE=Batalov;376972]I've looked at the OCL specs book (all versions) and they never even specified the precedence. That's odd.[/QUOTE]
I presume the standard specifies using C operator precedence. |
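For reference, OpenCL C is derived from C99, so the C precedence table is a reasonable working assumption here, although nothing in this thread confirms what the OpenCL spec itself says. A minimal sketch in plain C of the two readings of a shift mixed with an addition (the function names are made up for illustration):

```c
#include <stdint.h>

/* What an unparenthesized `a >> b + c` means in C: '+' binds tighter
 * than '>>', so the addition is evaluated first. */
uint32_t shift_by_sum(uint32_t a, uint32_t b, uint32_t c)
{
    return a >> (b + c);
}

/* What the author of such an expression often intended instead. */
uint32_t shift_then_add(uint32_t a, uint32_t b, uint32_t c)
{
    return (a >> b) + c;
}
```

With a = 64, b = 2, c = 1 the first returns 8 and the second returns 17, so explicit parentheses both silence the warning and pin down the meaning.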
Debugging nightmare. Intel does not support printf in kernels.
|
[QUOTE=Prime95;376965]The Intel compiler is whining, can someone please change the source code for the warnings below[/QUOTE]
Thank you, this is probably the reason why I never got this function to work. It is only used in a test kernel, not in any factoring.
[QUOTE=Prime95;376970]bug reports: 1) The Intel compiler does not like -O3. [/QUOTE]
You can use the ini file option OCLCompileOptions to set whichever option you need. Do you know what is the best available optimization I should use for INTEL? -O2 or just -O?
[QUOTE=Prime95;376970] 2) When clBuildProgram is called with invalid build options it returns error -43. If verbosity is set to 3, then clGetBuildInfo tries to get the build log -- there is none -- and some trash characters are output[/QUOTE]
Hmm, if you don't see "Error ... clGetProgramBuildInfo failed.", then this would be an INTEL API bug. Or, it returns an OK status but no log ... I've added a check if the resulting log size is >0.
[QUOTE=Prime95;376971]Please change line 84 of Montgomery.cl to: r2 += ((r1!=0)? (ulong_v)1UL : (ulong_v)0UL);[/QUOTE]
OK, done. But why? Shouldn't this be the same as
r2 += (ulong_v)((r1!=0)? 1UL : 0UL); |
[QUOTE=Bdot;376992]
You can use the ini file option OCLCompileOptions to set whichever option you need. Do you know what is the best available optimization I should use for INTEL? -O2 or just -O?[/quote]
I eliminated the -O argument completely. I've not read the Intel docs to see if -O is supported.
I saw the ini file option. I think we need a solution that works without using that. Maybe grep for "Intel" in the device capabilities or something?
[quote]Hmm, if you don't see "Error ... clGetProgramBuildInfo failed.", then this would be an INTEL API bug. Or, it returns an OK status but no log ... I've added a check if the resulting log size is >0.[/quote]
The problem, I think, is this line:
if((status == CL_BUILD_PROGRAM_FAILURE) || (mystuff.verbosity > 2))
I'd set verbosity to 3, so clGetProgramBuildInfo was called even when there was no error. Your log_size > 0 check ought to solve this.
[quote]OK, done. But why? Shouldn't this be the same as r2 += (ulong_v)((r1!=0)? 1UL : 0UL);[/QUOTE]
I'm with you, it should be the same. |
[QUOTE=Prime95;376995]I eliminated the -O argument completely. I've not read the Intel docs to see if -O is supported.
I saw the ini file option. I think we need a solution that works without using that. Maybe grep for "Intel" in the device capabilities or something? [/QUOTE]
Agreed, I may need to decouple the device detection from the kernel compilation, so that the compilation can use what has been found out about the device. Currently that has to be set with the GPUType ini file option. When that is set to NVIDIA, for example, the kernel compilation skips -O3. This should be automatic - I'll put that in for the next version.
[QUOTE=Prime95;376995] The problem I think is this line: if((status == CL_BUILD_PROGRAM_FAILURE) || (mystuff.verbosity > 2)) I'd set verbosity to 3, so clGetProgramBuildInfo was called even when there was no error. [/QUOTE]
This was intentional, so that there is a chance to see build warnings or other output. AMD always provides non-zero build info.
Do you attempt to get all kernels to run? Most likely, there is no performance improvement when using mul24 instead of 32-bit multiplications on Intel. Therefore barrett24 and barrett15 will be rather slow. montgomery.cl is good only in some corner cases; it was more of a test to get a feeling for how it compares to barrett. |
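As a rough illustration of the "grep for Intel" idea: the vendor string that clGetDeviceInfo returns for CL_DEVICE_VENDOR could be searched for a vendor name and the optimization flag chosen from that. The sketch below is plain C and only shows the string matching; the helper names and the flag policy are assumptions drawn from this discussion (skip -O3 on Intel and NVIDIA), not the actual mfakto code.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if the vendor string mentions `name`. */
static int vendor_contains(const char *vendor, const char *name)
{
    return vendor != NULL && strstr(vendor, name) != NULL;
}

/* Hypothetical policy: no -O3 for Intel (the compiler rejects it) or
 * NVIDIA (already skipped via GPUType); -O3 for everything else. */
const char *pick_opt_flag(const char *vendor)
{
    if (vendor_contains(vendor, "Intel") || vendor_contains(vendor, "NVIDIA"))
        return "";
    return "-O3";
}
```

Note that strstr is case-sensitive, so a vendor string spelled differently from the names above would fall through to the default.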
[QUOTE=Prime95;376980]Debugging nightmare. Intel does not support printf in kernels.[/QUOTE]
My bad. Printf does work -- whew! One must delete the .elf file every time you change a .cl file. I'll see if I can change the makefile to do this for me automatically. Right now, I'm just trying to get the program to pass the self-test. The (or a) problem is in GPU sieving or in translating the sieve into k_deltas. Optimization, if I'm so motivated, comes later. |
[QUOTE=Prime95;377009]My bad. Printf does work -- whew! One must delete the .elf file every time you change a .cl file. [/QUOTE]
Remove the UseBinfile setting from the ini file, then it will always recompile. |
[QUOTE=Bdot;377011]Remove the UseBinfile setting from the ini file, then it will always recompile.[/QUOTE]
Thanks! Next mini-bug: in extract_bits, the variables exponent, k_base, shiftcount, bit_max64, and bb are undefined in this code:
[CODE]#if (TRACE_SIEVE_KERNEL > 0)
    if (lid==TRACE_SIEVE_TID) printf((__constant char *)"extract_bits: exp=%d=%#x, k=%x:%x:%x, bits=%d, shift=%d, bit_max64=%d, bb=%x:%x:%x:%x:%x:%x, wpt=%u, base addr=%#x\n",
        exponent, exponent, k_base.d2, k_base.d1, k_base.d0, bits_to_process, shiftcount, bit_max64,
        bb.d5, bb.d4, bb.d3, bb.d2, bb.d1, bb.d0, words_per_thread, bit_array);
#endif
[/CODE] |
Here is the code in CalcModularInverses that is tripping up the Intel port:
[CODE] facdist = (ulong) (2 * NUM_CLASSES) * (ulong) exponent; [/CODE]
I can't get Intel to produce 64-bit quantities here. What does the OpenCL spec say about 64-bit multiplies? I also tried
facdist = 2*NUM_CLASSES; ulongtemp = exponent; facdist *= ulongtemp;
without success. Any other ideas? |
[QUOTE=Prime95;377037]Here is the code in CalcModularInverses that is tripping up the Intel port:
[CODE] facdist = (ulong) (2 * NUM_CLASSES) * (ulong) exponent; [/CODE]I can't get Intel to produce 64-bit quantities here. What does the OpenCL spec say about 64-bit multiplies? I also tried facdist = 2*NUM_CLASSES; ulongtemp = exponent; facdist *= ulongtemp; without success. Any other ideas?[/QUOTE]
I noticed that the calculations are done from left to right, and the first factor usually determines the size of the calculation. If the target needs size adjustment, that is done on the result of the multiplication. I'd try (can't test right now)
facdist = (ulong) exponent * 2 * NUM_CLASSES;
or even
facdist = (ulong) exponent * 2ULL * NUM_CLASSES##ULL; |
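For comparison, here is how the same expressions behave in standard C, which OpenCL C is based on. This is only a sketch of the language rules, not a statement about what the Intel compiler discussed here actually did. The value of NUM_CLASSES and the function names are assumptions for illustration, and `uint64_t`/`uint32_t` stand in for OpenCL's `ulong`/`uint`. In standard C the multiply is carried out in 64 bits as soon as either operand is 64-bit, regardless of operand order; it only wraps when every operand is 32-bit.

```c
#include <stdint.h>

#define NUM_CLASSES 4620u  /* assumed value, for illustration only */

/* Both operands cast, as in the original line: 64-bit multiply. */
uint64_t facdist_wide(uint32_t exponent)
{
    return (uint64_t)(2 * NUM_CLASSES) * (uint64_t)exponent;
}

/* Cast on the leftmost operand, as suggested above: also 64-bit,
 * because the cast result is promoted through the whole chain. */
uint64_t facdist_wide_left(uint32_t exponent)
{
    return (uint64_t)exponent * 2 * NUM_CLASSES;
}

/* No cast: the product is computed in 32 bits and wraps modulo 2^32
 * before it is widened to the return type. */
uint64_t facdist_narrow(uint32_t exponent)
{
    return 2 * NUM_CLASSES * exponent;
}
```

If the two wide variants give different results on a particular OpenCL compiler, that would point at the compiler rather than at the source expression.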