mersenneforum.org  

Old 2014-06-29, 02:35   #1079
Batalov
 
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

9,497 Posts

Code:
warning: operator '>>' has lower precedence than '+'; '+' will be evaluated first

Sheesh! That's quite an inconvenient precedence order. At least they put in a warning.

With that in mind, you are right to worry about a possible precedence clash between the ternary '?' and '+'.

I've looked at the OCL specs book (all versions) and they never even specified the precedence. That's odd.
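For reference, here is a minimal sketch of the two precedence traps discussed above, written in plain C on the assumption that OpenCL C follows the same precedence rules. The function names are made up for illustration, and a compiler may emit exactly the kind of warning quoted above for the unparenthesized forms.

```c
#include <stdint.h>

/* '+' binds tighter than '>>', so this parses as x >> (n + 1). */
static uint32_t shift_unparenthesized(uint32_t x, uint32_t n) {
    return x >> n + 1;
}

/* What was probably meant: shift first, then add. */
static uint32_t shift_intended(uint32_t x, uint32_t n) {
    return (x >> n) + 1;
}

/* '+' also binds tighter than '?:', so this parses as
   (r2 + flag) ? 1u : 0u and the value of r2 is lost. */
static uint32_t add_unparenthesized(uint32_t r2, uint32_t flag) {
    return r2 + flag ? 1u : 0u;
}

/* Parenthesizing the conditional keeps the addition intact. */
static uint32_t add_intended(uint32_t r2, uint32_t flag) {
    return r2 + (flag ? 1u : 0u);
}
```

With x = 64 and n = 2 the first pair gives 8 and 17; with r2 = 10 and flag = 0 the second pair gives 1 and 10. Note that a compound assignment such as `r2 += cond ? a : b;` is not affected, because `+=` has lower precedence than `?:`.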
Old 2014-06-29, 05:19   #1080
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

19·397 Posts

Quote:
Originally Posted by Batalov View Post
I've looked at the OCL specs book (all versions) and they never even specified the precedence. That's odd.
I presume the standard specifies using C operator precedence.
Old 2014-06-29, 05:21   #1081
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

19×397 Posts

Debugging nightmare. Intel does not support printf in kernels.
Old 2014-06-29, 11:43   #1082
Bdot
 
Nov 2010
Germany

597₁₀ Posts

Quote:
Originally Posted by Prime95 View Post
The Intel compiler is whining, can someone please change the source code for the warnings below
Thank you, this is probably the reason why I never got this function to work. It is only used in a test kernel, not in any factoring.

Quote:
Originally Posted by Prime95 View Post
bug reports:

1) The Intel compiler does not like -O3.
You can use the ini file option OCLCompileOptions to set whichever option you need. Do you know what is the best available optimization I should use for INTEL? -O2 or just -O?
Quote:
Originally Posted by Prime95 View Post
2) When clBuildProgram is called with invalid build options it returns error -43. If verbosity is set to 3, then clGetBuildInfo tries to get the build log -- there is none -- and some trash characters are output
Hmm, if you don't see "Error ... clGetProgramBuildInfo failed.", then this would be an INTEL API bug. Or, it returns an OK status but no log ... I've added a check if the resulting log size is >0.
Quote:
Originally Posted by Prime95 View Post
Please change line 84 of Montgomery.cl to:

r2 += ((r1!=0)? (ulong_v)1UL : (ulong_v)0UL);
OK, done. But why? Shouldn't this be the same as
r2 += (ulong_v)((r1!=0)? 1UL : 0UL);
Old 2014-06-29, 13:45   #1083
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

1110101110111₂ Posts

Quote:
Originally Posted by Bdot View Post
You can use the ini file option OCLCompileOptions to set whichever option you need. Do you know what is the best available optimization I should use for INTEL? -O2 or just -O?
I eliminated the -O argument completely. I've not read the Intel docs to see if -O is supported.

I saw the ini file option. I think we need a solution that works without using that. Maybe grep for "Intel" in the device capabilities or something?

Quote:
Hmm, if you don't see "Error ... clGetProgramBuildInfo failed.", then this would be an INTEL API bug. Or, it returns an OK status but no log ... I've added a check if the resulting log size is >0.
The problem I think is this line:

if((status == CL_BUILD_PROGRAM_FAILURE) || (mystuff.verbosity > 2))

I'd set verbosity to 3, so clGetProgramBuildInfo was called even when there was no error.

Your log_size > 0 ought to solve this.
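The guard being discussed can be sketched as a small predicate in plain C. This is only an illustration under stated assumptions: `should_print_build_log` is a hypothetical helper, `BUILD_FAILED` stands in for `CL_BUILD_PROGRAM_FAILURE` (which is -11 in `CL/cl.h`), and `log_size` is assumed to be the size reported by a `clGetProgramBuildInfo` query for `CL_PROGRAM_BUILD_LOG`.

```c
#include <stddef.h>

/* Stand-in for CL_BUILD_PROGRAM_FAILURE so the sketch builds without OpenCL headers. */
#define BUILD_FAILED (-11)

/* Hypothetical helper: decide whether the build log should be printed.
   The log is wanted on a build failure or at high verbosity, but it is
   only printed when the runtime reported a non-empty log. */
static int should_print_build_log(int status, int verbosity, size_t log_size) {
    int wanted = (status == BUILD_FAILED) || (verbosity > 2);
    return wanted && (log_size > 0);
}
```

With status -43 (invalid build options), verbosity 3 and a log size of 0, the predicate returns 0, so no uninitialized buffer is printed. A successful build at verbosity 3 with a non-empty log still prints, which keeps build warnings visible.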

Quote:
OK, done. But why? Shouldn't this be the same as
r2 += (ulong_v)((r1!=0)? 1UL : 0UL);
I'm with you, it should be the same.
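Both spellings are meant to add 1 to every lane of r2 whose r1 lane is non-zero. The plain-C sketch below emulates that per-lane behaviour; the two-lane width and the name `add_carry_per_lane` are assumptions for illustration, not the actual kernel code. In OpenCL C a vector comparison yields all-ones (-1) for true lanes, and a ternary with a vector condition selects per lane like select(). Whether a compiler accepts two scalar operands together with a vector condition seems to be the part that varies, and casting each operand to the vector type sidesteps that question.

```c
#include <stdint.h>

#define LANES 2  /* hypothetical vector width */

/* Emulates r2 += (r1 != 0) ? 1 : 0, applied independently to each lane. */
static void add_carry_per_lane(uint64_t r2[LANES], const uint64_t r1[LANES]) {
    for (int i = 0; i < LANES; ++i) {
        /* A vector comparison gives -1 (all bits set) for true, 0 for false. */
        int64_t mask = (r1[i] != 0) ? -1 : 0;
        /* Vector ?: picks the second operand where the mask's top bit is set. */
        r2[i] += (mask < 0) ? 1u : 0u;
    }
}
```

Starting from r2 = {5, 7} and r1 = {0, 9}, only the second lane is incremented.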
Old 2014-06-29, 19:46   #1084
Bdot
 
Nov 2010
Germany

3×199 Posts

Quote:
Originally Posted by Prime95 View Post
I eliminated the -O argument completely. I've not read the Intel docs to see if -O is supported.

I saw the ini file option. I think we need a solution that works without using that. Maybe grep for "Intel" in the device capabilities or something?
Agreed, I may need to decouple the device detection from the kernel compilation, so that the compilation can use what has been found out about the device. Currently that needs to be set as the GPUType ini file option. When that is set to NVIDIA, for example, the kernel compilation skips -O3. This should be automatic - I'll put that in for the next version.
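A rough sketch of what the automatic choice could look like, in plain C. It assumes the vendor string has already been read with clGetDeviceInfo and CL_DEVICE_VENDOR; `pick_opt_flag` is a hypothetical helper, and the set of vendors that skip -O3 only reflects what has been reported in this thread (Intel and NVIDIA).

```c
#include <string.h>

/* Hypothetical helper: choose the optimization flag for the build options
   from the device vendor string. Returns "" when -O3 should be skipped. */
static const char *pick_opt_flag(const char *vendor) {
    if (vendor == NULL) {
        return "";              /* unknown device: pass no optimization flag */
    }
    if (strstr(vendor, "Intel") != NULL) {
        return "";              /* the Intel compiler rejected -O3 */
    }
    if (strstr(vendor, "NVIDIA") != NULL) {
        return "";              /* NVIDIA builds already skip -O3 */
    }
    return "-O3";               /* default for everything else */
}
```

The returned string would be appended to the other options before calling clBuildProgram, so the OCLCompileOptions ini setting could remain as a manual override.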

Quote:
Originally Posted by Prime95 View Post
The problem I think is this line:

if((status == CL_BUILD_PROGRAM_FAILURE) || (mystuff.verbosity > 2))

I'd set verbosity to 3, so clGetProgramBuildInfo was called even when there was no error.
This was intentional so there is a chance to see build warnings or other output. AMD always provides non-zero build info.

Do you attempt to get all kernels to run? Most likely, there is no performance improvement when using mul24 instead of 32-bit multiplications on Intel. Therefore barrett24 and barrett15 will be rather slow. montgomery.cl is good only in some corner cases; it was more of a test so I could get a feeling for how it compares to barrett.
Old 2014-06-29, 20:33   #1085
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

19·397 Posts

Quote:
Originally Posted by Prime95 View Post
Debugging nightmare. Intel does not support printf in kernels.
My bad. Printf does work -- whew! One must delete the .elf file every time you change a .cl file. I'll see if I can change the makefile to do this for me automatically.

Right now, I'm just trying to get the program to pass the self-test. The (or a) problem is in GPU sieving or in translating the sieve into k_deltas. Optimization, if I'm so motivated, comes later.
Old 2014-06-29, 21:32   #1086
Bdot
 
Nov 2010
Germany

3·199 Posts

Quote:
Originally Posted by Prime95 View Post
My bad. Printf does work -- whew! One must delete the .elf file every time you change a .cl file.
Remove the UseBinfile setting from the ini file, then it will always recompile.
Old 2014-06-30, 00:47   #1087
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

16567₈ Posts

Quote:
Originally Posted by Bdot View Post
Remove the UseBinfile setting from the ini file, then it will always recompile.
Thanks!

Next mini-bug: in extract_bits, the variables exponent, k_base, shiftcount, bit_max64, and bb are undefined in this code:

Code:
#if (TRACE_SIEVE_KERNEL > 0)
    if (lid==TRACE_SIEVE_TID) printf((__constant char *)"extract_bits: exp=%d=%#x, k=%x:%x:%x, bits=%d, shift=%d, bit_max64=%d, bb=%x:%x:%x:%x:%x:%x, wpt=%u, base addr=%#x\n",
        exponent, exponent, k_base.d2, k_base.d1, k_base.d0, bits_to_process, shiftcount, bit_max64, bb.d5, bb.d4, bb.d3, bb.d2, bb.d1, bb.d0, words_per_thread, bit_array);
#endif

Last fiddled with by Prime95 on 2014-06-30 at 04:30
Old 2014-06-30, 04:29   #1088
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

16567₈ Posts

Here is the code in CalcModularInverses that is tripping up the Intel port:

Code:
	facdist = (ulong) (2 * NUM_CLASSES) * (ulong) exponent;
I can't get Intel to produce 64-bit quantities here. What does the OpenCL spec say about 64-bit multiplies?

I also tried

facdist = 2*NUM_CLASSES; ulongtemp = exponent; facdist *= ulongtemp;

without success.

Any other ideas?
Old 2014-06-30, 09:33   #1089
Bdot
 
Nov 2010
Germany

3·199 Posts

Quote:
Originally Posted by Prime95 View Post
Here is the code in CalcModularInverses that is tripping up the Intel port:

Code:
    facdist = (ulong) (2 * NUM_CLASSES) * (ulong) exponent;
I can't get Intel to produce 64-bit quantities here. What does the OpenCL spec say about 64-bit multiplies?

I also tried

facdist = 2*NUM_CLASSES; ulongtemp = exponent; facdist *= ulongtemp;

without success.

Any other ideas?
I noticed that the calculations are done from left to right, and the first factor usually determines the size of the calculation. If the target needs size adjustment, that is done on the result of the multiplication.

I'd try (can't test right now)

facdist = (ulong) exponent * 2 * NUM_CLASSES;

or even

facdist = (ulong) exponent * 2UL * (ulong) NUM_CLASSES;
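To show the general pitfall in isolation, here is a minimal plain-C sketch. The value 4620 for NUM_CLASSES is an assumption for illustration and both function names are hypothetical. Under standard C rules a product is computed in the width of its operands, so a widening cast has to be applied to an operand before the multiplication, not to the result. By the same rules the expression from CalcModularInverses, which casts both factors first, should already be a 64-bit multiply, so truncation there would point at the compiler rather than the source.

```c
#include <stdint.h>

#define NUM_CLASSES 4620u  /* assumed value, for illustration only */

/* All operands are 32-bit, so the product wraps modulo 2^32
   before it is widened to the 64-bit return type. */
static uint64_t facdist_truncated(uint32_t exponent) {
    return (uint32_t)(2u * NUM_CLASSES) * exponent;
}

/* Widening one operand first makes every multiplication 64-bit. */
static uint64_t facdist_wide(uint32_t exponent) {
    return (uint64_t)exponent * 2u * NUM_CLASSES;
}
```

For exponent = 1000000 the wide version returns 9240000000, while the truncated version returns 650065408, which is 9240000000 modulo 2^32.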
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.