#45
Nov 2010
Germany
255₁₆ Posts
Quote:
All errors are caused by the 32- or 24-bit kernels; the 15-bit kernels have no failures in this run. (However, the 32-bit kernels are much faster.) This probably rules out different rounding, as the 15-bit kernels rely much more on proper rounding: they require up to 22 bits per float to be accurate, whereas the 32-bit kernels only require 20. The only other "big" difference is that the 15-bit kernels do not use mul_hi or any other 32-bit multiplication.
What next?
#46
"Mr. Meeseeks"
Jan 2012
California, USA
37×59 Posts
Quote:
#47
P90 years forever!
Aug 2002
Yeehaw, FL
17·487 Posts
Quote:
#48
Nov 2010
Germany
3·199 Posts
OK, I'll prepare that.
Could you please run the quick selftest for VectorSize=2? I think I have some issue with VectorSize=1; on AMD I see more or less "random" failures there as well ...
#49
"Mr. Meeseeks"
Jan 2012
California, USA
37·59 Posts
The kernels don't compile with any VectorSize other than 1...
Code:
mfakto 0.14pre1-Win (64bit build)
Runtime options
Inifile mfakto.ini
Verbosity 1
SieveOnGPU no
SievePrimesMin 5000
SievePrimesMax 200000
SievePrimes 25000
SievePrimesAdjust 1
NumStreams 3
GridSize 4
SieveCPUMask 0
GPUSieveSize 64Mi bits
WorkFile worktodo.txt
ResultsFile results.txt
Checkpoints enabled
CheckpointDelay 300s
Stages enabled
StopAfterFactor class
PrintMode compact
V5UserID none
ComputerID none
TimeStampInResults yes
VectorSize 2
GPUType AUTO
SmallExp no
WARNING: Cannot read UseBinfile from inifile, set to 0 by default
UseBinfile 0
Compiletime options
SIEVE_SIZE_LIMIT 36kiB
SIEVE_SIZE 289731bits
SIEVE_SPLIT 250
MORE_CLASSES enabled
Select device - Get device info -
INFO: Device does not support out-of-order operations. Fallback to in-order queues.
Compiling kernels.
BUILD OUTPUT
In file included from :90:
.\montgomery.cl:82:6: error: can't convert between vector values of different size ('ulong2' and 'long __attribute__((ext_vector_type(2)))')
r2 += (r1!=0)? 1UL : 0UL;
~~ ^ ~~~~~~~~~~~~~~~~~~
error: front end compiler failed build.
END OF BUILD OUTPUT
Error -11 (Build program failure ): clBuildProgram
init_CL(3, 11) failed
#50
Nov 2010
Germany
3·199 Posts
#51
"Mr. Meeseeks"
Jan 2012
California, USA
4207₈ Posts
#52
Nov 2010
Germany
3·199 Posts
#53
"Mr. Meeseeks"
Jan 2012
California, USA
37·59 Posts
This is with CPU sieving and VectorSize 1: it works "normally" only with VectorSize 1.
edit: more later, sorry for the quick message, busy today :(
edit2: with the new montgomery.cl and VectorSize 2:
Code:
.\montgomery.cl:82:9: error: invalid conversion between ext-vector type 'ulong2' and 'long __attribute__((ext_vector_type(2)))'
r2 += (ulong_v)((r1!=0)? 1UL : 0UL);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Last fiddled with by kracker on 2013-10-05 at 23:59
#54
Nov 2010
Germany
3×199 Posts
Quote:
Speaking of which, AMD compiles fine but has difficulties evaluating simple expressions:
Code:
if (get_global_id(0)==TRACE_TID) printf((__constant char *)"div2.6: q.d4=%x, carry=%x, nn.d3=%x\n",
q.d4, carry, nn.d3);
q.d4 = q.d4 - nn.d3 + carry;
if (get_global_id(0)==TRACE_TID) printf((__constant char *)"div2.7: q.d4=%x, carry=%x, nn.d3=%x\n",
q.d4, carry, nn.d3);
Code:
div2.6: q.d4=51ed, carry=0, nn.d3=51ed
div2.7: q.d4=394d8646, carry=0, nn.d3=51ed
0x51ed - 0x51ed + 0 = 0x394d8646. This is the reason why all the selftests fail for me at VectorSize=1. Only at higher VectorSizes does the GPU seem to feel the need to save the calculated result to the intended place ... (not sure which part of the memory is overwritten at VectorSize=1; on every run I get a different result for q.d4).
#55
P90 years forever!
Aug 2002
Yeehaw, FL
17×487 Posts