#122
"Kieren"
Jul 2011
In My Own Galaxy!
27AE₁₆ Posts
#123
Mar 2010
Jyvaskyla, Finland
2²×3² Posts
FWIW:
Code:
Adapter 0 - AMD Radeon HD 7900 Series
New Core Peak : 1000
New Memory Peak : 1500
Platform :Advanced Micro Devices, Inc.
Device 0 : Tahiti
start M57885161 fft length = 3145728
err = 0.352051, increasing n from 3145728
start M57885161 fft length = 3276800
Iteration 10000 M( 57885161 )C, 0x76c27556683cd84d, n = 3276800, clLucas v1.00 err = 0.1289 (4:01 real, 24.1701 ms/iter, ETA 388:32:03)
Iteration 20000 M( 57885161 )C, 0xfd8e311d20ffe6ab, n = 3276800, clLucas v1.00 err = 0.1289 (4:00 real, 23.9985 ms/iter, ETA 385:42:34)
I also tested a couple of 5870s, but they keep throwing this error:
Code:
Error: CommandQueue::enqueueNDRangeKernel() failed.
Error code : CL_INVALID_WORK_GROUP_SIZE
Location : Kernels.cpp:425
However, I'm quite happy with the overall development so far. Nice work, msft.
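As a rough cross-check of the ETA column in the Tahiti log above, here is a minimal sketch in C. It assumes the remaining work is simply (exponent - iteration) iterations at the most recent ms/iter figure; clLucas may count or average differently, and `eta_seconds` and `format_hms` are hypothetical helper names, not part of clLucas.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: remaining iterations = exponent - iteration, each taking
 * ms_per_iter milliseconds. The real program may use a different count. */
static double eta_seconds(long exponent, long iteration, double ms_per_iter)
{
    assert(iteration >= 0 && iteration <= exponent);
    return (double)(exponent - iteration) * ms_per_iter / 1000.0;
}

/* Format a duration in seconds as H:MM:SS, rounded to the nearest second. */
static void format_hms(double seconds, char *buf, size_t len)
{
    long total = (long)(seconds + 0.5);
    snprintf(buf, len, "%ld:%02ld:%02ld",
             total / 3600, (total / 60) % 60, total % 60);
}
```

Feeding in the first log line (exponent 57885161, iteration 10000, 24.1701 ms/iter) lands within a few minutes of the logged 388:32:03, which suggests the ETA is essentially remaining iterations times the current per-iteration time.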
#126
"Mr. Meeseeks"
Jan 2012
California, USA
2³×271 Posts
HD 7770
Code:
Iteration 10000 M( 22256453 )C, 0x3d9450d492b7e880, n = 1179648, clLucas v1.00 err = 0.2656 (2:39 real, 15.9270 ms/iter, ETA 98:23:35)
Iteration 10000 M( 24732709 )C, 0x81a12a304a754572, n = 1310720, clLucas v1.00 err = 0.2813 (3:04 real, 18.4028 ms/iter, ETA 126:21:56)
Iteration 10000 M( 29412433 )C, 0x27d7d112a73aa203, n = 1572864, clLucas v1.00 err = 0.25 (4:02 real, 24.1517 ms/iter, ETA 197:14:18)
Iteration 10000 M( 30620113 )C, 0x212dca3cec0acde2, n = 1638400, clLucas v1.00 err = 0.25 (5:11 real, 31.0753 ms/iter, ETA 264:13:34)
Iteration 10000 M( 32993419 )C, 0xcf86a69b844e35c0, n = 1769472, clLucas v1.00 err = 0.2813 (7:18 real, 43.7117 ms/iter, ETA 400:26:52)
Iteration 10000 M( 36418493 )C, 0x2f1388379572d5b4, n = 1966080, clLucas v1.00 err = 0.25 (5:30 real, 33.0295 ms/iter, ETA 333:57:52)
Iteration 10000 M( 38955173 )C, 0x8a45e3bbd4e4fc9b, n = 2097152, clLucas v1.00 err = 0.25 (2:03 real, 12.2586 ms/iter, ETA 132:35:49)
Iteration 10000 M( 43792559 )C, 0x7048d84bbfb0f810, n = 2359296, clLucas v1.00 err = 0.2813 (5:54 real, 35.3544 ms/iter, ETA 429:56:54)
Iteration 10000 M( 48375209 )C, 0xf957e240d591a99e, n = 2621440, clLucas v1.00 err = 0.2188 (6:33 real, 39.2538 ms/iter, ETA 527:18:36)
Iteration 10000 M( 57899201 )C, 0xa2ac01bbc76d92ee, n = 3145728, clLucas v1.00 err = 0.25 (9:00 real, 53.9709 ms/iter, ETA 867:43:57)
Iteration 10000 M( 60622229 )C, 0xd81c849f11fd1054, n = 3276800, clLucas v1.00 err = 0.2813 (11:06 real, 66.5953 ms/iter, ETA 1121:12:22)
Iteration 10000 M( 65066623 )C, 0xde7aeb8cc7a2a826, n = 3538944, clLucas v1.00 err = 0.2539 (16:00 real, 96.0663 ms/iter, ETA 1735:51:51)
Iteration 10000 M( 67662869 )C, 0xf854d1dee3fbb5d7, n = 3932160, clLucas v1.00 err = 0.05078 (11:59 real, 71.8933 ms/iter, ETA 1350:59:44)
Iteration 10000 M( 76722161 )C, 0x4b6ba0a6078e4bbb, n = 4194304, clLucas v1.00 err = 0.25 (4:14 real, 24.9663 ms/iter, ETA 540:43:00)
#127
"Carl Darby"
Oct 2012
Spring Mountains, Nevada
315₁₀ Posts
The errors are nicely low. Is this using the double double sin and cos data for the ffts?
#128
"Mr. Meeseeks"
Jan 2012
California, USA
2³·271 Posts
Quote:
EDIT: Just realized I won't be able to submit my DC to PrimeNet (yet) when it is done.
Last fiddled with by kracker on 2013-09-16 at 02:04
#129
Jul 2009
Tokyo
2·5·61 Posts
#130
Jul 2009
Tokyo
2·5·61 Posts
Quote:
Double sin and cos; I wrote an FFTW-like function.
Code:
#include <math.h>

/*
 * Improve accuracy by reducing x = m/n to the range [0..1/8]
 * before multiplication by 2 * PI.
 * Stores sin(2*PI*m/n) in *si and cos(2*PI*m/n) in *co.
 * For 0 <= m <= n, every branch passes an angle in [0, PI/4] to sin/cos.
 */
#define K2PI 6.2831853071795864769252867665590057683943388
#define m2pi(m, n) ((K2PI * (m)) / (n))
static void ap_sincos(int m, int n, double * si, double * co)
{
    double s,c,theta;
    int n14,n24,n34,n44;
    int m14,m44;
    int n18,n28,n38,n48,n58,n68,n78;
    int m88;

    /* Quarter-turn boundaries: n14..n44 = n, 2n, 3n, 4n; m44 = 4m. */
    n14 = n;
    n24 = n14 + n14;
    n34 = n14 + n24;
    n44 = n24 + n24;
    m14 = m;
    m44 = m14 + m14;
    m44 = m44 + m44;

    /* Octant boundaries: n18..n78 = n, 2n, ..., 7n; m88 = 8m. */
    n18 = n;
    n28 = n18 + n18;
    n38 = n18 + n28;
    n48 = n28 + n28;
    n58 = n28 + n38;
    n68 = n38 + n38;
    n78 = n38 + n48;
    m88 = m + m;
    m88 = m88 + m88;
    m88 = m88 + m88;

    if(n18 > m88)           /* m/n < 1/8 */
    {
        theta = m2pi(m44,n44);
        s = sin(theta);
        c = cos(theta);
    }
    else if(n28 > m88)      /* m/n < 2/8 */
    {
        theta = m2pi(n14-m44,n44);
        s = cos(theta);
        c = sin(theta);
    }
    else if(n38 > m88)      /* m/n < 3/8 */
    {
        theta = m2pi(m44-n14,n44);
        s = cos(theta);
        c = -sin(theta);
    }
    else if(n48 > m88)      /* m/n < 4/8 */
    {
        theta = m2pi(n24-m44,n44);
        s = sin(theta);
        c = -cos(theta);
    }
    else if(n58 > m88)      /* m/n < 5/8 */
    {
        theta = m2pi(m44-n24,n44);
        s = -sin(theta);
        c = -cos(theta);
    }
    else if(n68 > m88)      /* m/n < 6/8 */
    {
        theta = m2pi(n34-m44,n44);
        s = -cos(theta);
        c = -sin(theta);
    }
    else if(n78 > m88)      /* m/n < 7/8 */
    {
        theta = m2pi(m44-n34,n44);
        s = -cos(theta);
        c = sin(theta);
    }
    else                    /* m/n >= 7/8 */
    {
        theta = m2pi(n44-m44,n44);
        s = -sin(theta);
        c = cos(theta);
    }
    *si = s;
    *co = c;
}
#131
Jul 2009
Tokyo
610₁₀ Posts
Quote:
I found a bug related to this issue; it will be fixed in the next version.
#132
Jul 2009
Tokyo
2·5·61 Posts