#78
Jul 2009
Tokyo
1001100010₂ Posts

#79
Jul 2009
Tokyo
2·5·61 Posts
Code borrowed from FFTW3.
clFFT-2.0/src/library/generator.stockham.cpp:
Code:
#define K2PI 6.2831853071795864769252867665590057683943388
#define by2pi(m, n) ((K2PI * (m)) / (n))

/*
 * Improve accuracy by reducing x to range [0..1/8]
 * before multiplication by 2 * PI.
 */
static void real_cexp(int m, int n, double * si, double * co)
{
    double theta, c, s, t;
    unsigned octant = 0;
    int quarter_n = n;

    n += n; n += n;
    m += m; m += m;

    if (m < 0) m += n;
    if (m > n - m) { m = n - m; octant |= 4; }
    if (m - quarter_n > 0) { m = m - quarter_n; octant |= 2; }
    if (m > quarter_n - m) { m = quarter_n - m; octant |= 1; }

    theta = by2pi(m, n);
    c = cos(theta); s = sin(theta);

    if (octant & 1) { t = c; c = s; s = t; }
    if (octant & 2) { t = c; c = -s; s = t; }
    if (octant & 4) { s = -s; }

    *co = c;
    *si = s;
}
....
// Twiddle factors
for(size_t k=0; k<(L/radix); k++)
{
    double theta = TWO_PI * ((double)k)/((double)L);
    for(size_t j=1; j<radix; j++)
    {
        //double c = cos(((double)j) * theta);
        //double s = sin(((double)j) * theta);
        double c,s;
        real_cexp(k*j,L,&s,&c);
        s = -s;
        wc[nt] = c;
        ws[nt++] = s;
    }
}
Code:
Iteration 10000 M( 22256453 )C, 0x3d9450d492b7e880, n = 1179648, CUDALucas v1.66 err = 0.3125 err2 = 0.004543 (3:04 real, 18.4173 ms/iter, ETA 0:00)
Iteration 10000 M( 24732709 )C, 0x81a12a304a754572, n = 1310720, CUDALucas v1.66 err = 0.2812 err2 = 0.004707 (3:22 real, 20.2297 ms/iter, ETA 0:00)
Iteration 10000 M( 29412433 )C, 0x27d7d112a73aa203, n = 1572864, CUDALucas v1.66 err = 0.2656 err2 = 0.005665 (4:39 real, 27.9322 ms/iter, ETA 0:00)
Iteration 10000 M( 30620113 )C, 0x212dca3cec0acde2, n = 1638400, CUDALucas v1.66 err = 0.2812 err2 = 0.005173 (6:24 real, 38.4612 ms/iter, ETA 0:00)
Iteration 10000 M( 32993419 )C, 0xcf86a69b844e35c0, n = 1769472, CUDALucas v1.66 err = 0.2812 err2 = 0.006081 (9:09 real, 54.9640 ms/iter, ETA 0:00)
Iteration 10000 M( 36418493 )C, 0x2f1388379572d5b4, n = 1966080, CUDALucas v1.66 err = 0.2812 err2 = 0.005577 (7:16 real, 43.5101 ms/iter, ETA 0:00)
Iteration 10000 M( 38955173 )C, 0x8a45e3bbd4e4fc9b, n = 2097152, CUDALucas v1.66 err = 0.2812 err2 = 0.006908 (2:38 real, 15.8387 ms/iter, ETA 0:00)
Iteration 10000 M( 43792559 )C, 0x7048d84bbfb0f810, n = 2359296, CUDALucas v1.66 err = 0.2969 err2 = 0.007465 (6:52 real, 41.1908 ms/iter, ETA 0:00)
Iteration 10000 M( 48375209 )C, 0xf957e240d591a99e, n = 2621440, CUDALucas v1.66 err = 0.2188 err2 = 0.006596 (7:31 real, 45.1089 ms/iter, ETA 0:00)
Iteration 10000 M( 57899201 )C, 0xa2ac01bbc76d92ee, n = 3145728, CUDALucas v1.66 err = 0.2734 err2 = 0.007641 (11:05 real, 66.5079 ms/iter, ETA 0:00)
Iteration 10000 M( 60622229 )C, 0xd81c849f11fd1054, n = 3276800, CUDALucas v1.66 err = 0.2812 err2 = 0.008522 (13:49 real, 82.9004 ms/iter, ETA 0:00)
Iteration 10000 M( 65066623 )C, 0xde7aeb8cc7a2a826, n = 3538944, CUDALucas v1.66 err = 0.3125 err2 = 0.007557 (20:36 real, 123.6623 ms/iter, ETA 0:00)
Iteration 10000 M( 67662869 )C, 0xf854d1dee3fbb5d7, n = 3932160, CUDALucas v1.66 err = 0.05469 err2 = 0.01732 (15:33 real, 93.2952 ms/iter, ETA 0:00)
Iteration 10000 M( 72000007 )C, 0x404aa83a2e247882, n = 3932160, CUDALucas v1.66 err = 0.2812 err2 = 0.008392 (16:11 real, 97.1101 ms/iter, ETA 0:00)
Iteration 10000 M( 76722161 )C, 0x4b6ba0a6078e4bbb, n = 4194304, CUDALucas v1.66 err = 0.2812 err2 = 0.0108 (5:22 real, 32.2733 ms/iter, ETA 0:00)

#80
Jul 2009
Tokyo
2×5×61 Posts |
Radeon on Ubuntu has an overclocking tool (amdconfig).
Code:
$ amdconfig
Adapter 0 - AMD Radeon HD 7700 Series
Core (MHz) Memory (MHz)
Current Clocks : 300 150
Current Peak : 800 1150
Configurable Peak Range : [300-900] [150-1250]
GPU load : 0%
Adapter 0 - AMD Radeon HD 7700 Series
Sensor 0: Temperature - 63.00 C
$ ./CUDALucas -f 2097152 31813487
start M31813487 fft length = 2097152
Iteration 10000 M( 31813487 )C, 0xcb7faef05fdc6491, n = 2097152, CUDALucas v1.66 err = 0.002441 err2 = 0.04688 (2:53 real, 17.3402 ms/iter, ETA 0:00)
$ amdconfig --od-setclocks=900,1250
Adapter 0 - AMD Radeon HD 7700 Series
Core (MHz) Memory (MHz)
Current Clocks : 300 150
Current Peak : 800 1150
Configurable Peak Range : [300-900] [150-1250]
GPU load : 0%
Adapter 0 - AMD Radeon HD 7700 Series
Sensor 0: Temperature - 53.00 C
Adapter 0 - AMD Radeon HD 7700 Series
New Core Peak : 900
New Memory Peak : 1250
$ ./CUDALucas -f 2097152 31813487
start M31813487 fft length = 2097152
Iteration 10000 M( 31813487 )C, 0xcb7faef05fdc6491, n = 2097152, CUDALucas v1.66 err = 0.002441 err2 = 0.04654 (2:36 real, 15.5674 ms/iter, ETA 0:00)

#81
"Mr. Meeseeks"
Jan 2012
California, USA
100010000111₂ Posts
Quote:

#82
"Mr. Meeseeks"
Jan 2012
California, USA
887₁₆ Posts
Quote:
I also changed WIN32 to _WIN32 because, for some strange reason, WIN32 doesn't work. I have to attach the file.

#83
Jul 2009
Tokyo
1001100010₂ Posts
Quote:
Code:
Microsoft (R) C/C++ Optimizing Compiler Version 16.00.30319.01 for x64
Copyright (C) Microsoft Corporation. All rights reserved.
CUDALucas.cpp
CUDALucas.cpp(741) : error C2632: 'int' followed by 'char' is illegal
CUDALucas.cpp(743) : error C2062: type 'char' unexpected
Can anyone help with this issue?

#84
Jul 2009
Tokyo
1142₈ Posts
Researching OpenCL FFT:
https://github.com/miracle2121/hpc12-fp-rl1609
Code:
/opt/AMDAPP/samples/opencl/lucas/ny$ pwd
/opt/AMDAPP/samples/opencl/lucas/ny
/opt/AMDAPP/samples/opencl/lucas/ny$ tar -xvf hpc12-fp-rl1609-master.tar.bz2
/opt/AMDAPP/samples/opencl/lucas/ny/hpc12-fp-rl1609-master$ make
/opt/AMDAPP/samples/opencl/lucas/ny/hpc12-fp-rl1609-master$ cp build/debug/x86_64/clfft .
/opt/AMDAPP/samples/opencl/lucas/ny/hpc12-fp-rl1609-master$ ./clfft 2097152 Capeverde
Found platform #0: AMD Accelerated Parallel Processing
Found device #0: Capeverde
Found device #1: Intel(R) Celeron(R) CPU G465 @ 1.90GHz
Elapsed time: 0.006272s
Performance: 35.11 Gflops
/opt/AMDAPP/samples/opencl/lucas/ny/hpc12-fp-rl1609-master$ ./clfft 16777216 Capeverde
Found platform #0: AMD Accelerated Parallel Processing
Found device #0: Capeverde
Found device #1: Intel(R) Celeron(R) CPU G465 @ 1.90GHz
Elapsed time: 0.051074s
Performance: 39.42 Gflops

#87
Jul 2009
Tokyo
1142₈ Posts