mersenneforum.org  

Old 2013-09-01, 15:01   #78
msft
 
Jul 2009
Tokyo

1001100010₂ Posts

Quote:
Originally Posted by kracker
g++ and VC are the same.

I know.
This program is useless.
If you have VC, why not build CLLucas on Windows?
Old 2013-09-02, 11:10   #79
msft
 
 
Jul 2009
Tokyo

2·5·61 Posts

Code borrowed from fftw3.

clFFT-2.0/src/library/generator.stockham.cpp:
Code:
#define K2PI 6.2831853071795864769252867665590057683943388
#define by2pi(m, n) ((K2PI * (m)) / (n))
/*
 * Improve accuracy by reducing x to range [0..1/8]
 * before multiplication by 2 * PI.
 */
static void real_cexp(int m, int n, double * si, double * co)
{
     double theta, c, s, t;
     unsigned octant = 0;
     int quarter_n = n;
     n += n; n += n;
     m += m; m += m;
     if (m < 0) m += n;
     if (m > n - m) { m = n - m; octant |= 4; }
     if (m - quarter_n > 0) { m = m - quarter_n; octant |= 2; }
     if (m > quarter_n - m) { m = quarter_n - m; octant |= 1; }
     theta = by2pi(m, n);
     c = cos(theta); s = sin(theta);
     if (octant & 1) { t = c; c = s; s = t; }
     if (octant & 2) { t = c; c = -s; s = t; }
     if (octant & 4) { s = -s; }
     *co = c;
     *si = s;
}
....
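// Twiddle factors
for(size_t k=0; k<(L/radix); k++)
{
        // theta is only needed by the old cos/sin path, kept below for reference
        double theta = TWO_PI * ((double)k)/((double)L);
        for(size_t j=1; j<radix; j++)
        {
                //double c = cos(((double)j) * theta);
                //double s = sin(((double)j) * theta);

                // Octant-reduced evaluation of the angle 2*pi*(k*j)/L
                double c,s;
                real_cexp((int)(k*j),(int)L,&s,&c);
                s = -s; // negate the sine, so (wc, ws) = (cos, -sin)
                wc[nt]   = c;
                ws[nt++] = s;
        }
}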
Code:
Iteration 10000 M( 22256453 )C, 0x3d9450d492b7e880, n = 1179648, CUDALucas v1.66 err = 0.3125 err2 = 0.004543 (3:04 real, 18.4173 ms/iter, ETA 0:00)
Iteration 10000 M( 24732709 )C, 0x81a12a304a754572, n = 1310720, CUDALucas v1.66 err = 0.2812 err2 = 0.004707 (3:22 real, 20.2297 ms/iter, ETA 0:00)
Iteration 10000 M( 29412433 )C, 0x27d7d112a73aa203, n = 1572864, CUDALucas v1.66 err = 0.2656 err2 = 0.005665 (4:39 real, 27.9322 ms/iter, ETA 0:00)
Iteration 10000 M( 30620113 )C, 0x212dca3cec0acde2, n = 1638400, CUDALucas v1.66 err = 0.2812 err2 = 0.005173 (6:24 real, 38.4612 ms/iter, ETA 0:00)
Iteration 10000 M( 32993419 )C, 0xcf86a69b844e35c0, n = 1769472, CUDALucas v1.66 err = 0.2812 err2 = 0.006081 (9:09 real, 54.9640 ms/iter, ETA 0:00)
Iteration 10000 M( 36418493 )C, 0x2f1388379572d5b4, n = 1966080, CUDALucas v1.66 err = 0.2812 err2 = 0.005577 (7:16 real, 43.5101 ms/iter, ETA 0:00)
Iteration 10000 M( 38955173 )C, 0x8a45e3bbd4e4fc9b, n = 2097152, CUDALucas v1.66 err = 0.2812 err2 = 0.006908 (2:38 real, 15.8387 ms/iter, ETA 0:00)
Iteration 10000 M( 43792559 )C, 0x7048d84bbfb0f810, n = 2359296, CUDALucas v1.66 err = 0.2969 err2 = 0.007465 (6:52 real, 41.1908 ms/iter, ETA 0:00)
Iteration 10000 M( 48375209 )C, 0xf957e240d591a99e, n = 2621440, CUDALucas v1.66 err = 0.2188 err2 = 0.006596 (7:31 real, 45.1089 ms/iter, ETA 0:00)
Iteration 10000 M( 57899201 )C, 0xa2ac01bbc76d92ee, n = 3145728, CUDALucas v1.66 err = 0.2734 err2 = 0.007641 (11:05 real, 66.5079 ms/iter, ETA 0:00)
Iteration 10000 M( 60622229 )C, 0xd81c849f11fd1054, n = 3276800, CUDALucas v1.66 err = 0.2812 err2 = 0.008522 (13:49 real, 82.9004 ms/iter, ETA 0:00)
Iteration 10000 M( 65066623 )C, 0xde7aeb8cc7a2a826, n = 3538944, CUDALucas v1.66 err = 0.3125 err2 = 0.007557 (20:36 real, 123.6623 ms/iter, ETA 0:00)
Iteration 10000 M( 67662869 )C, 0xf854d1dee3fbb5d7, n = 3932160, CUDALucas v1.66 err = 0.05469 err2 = 0.01732 (15:33 real, 93.2952 ms/iter, ETA 0:00)
Iteration 10000 M( 72000007 )C, 0x404aa83a2e247882, n = 3932160, CUDALucas v1.66 err = 0.2812 err2 = 0.008392 (16:11 real, 97.1101 ms/iter, ETA 0:00)
Iteration 10000 M( 76722161 )C, 0x4b6ba0a6078e4bbb, n = 4194304, CUDALucas v1.66 err = 0.2812 err2 = 0.0108 (5:22 real, 32.2733 ms/iter, ETA 0:00)
Issue fixed (by exchanging the wheel, i.e. swapping in the fftw3 twiddle routine instead of the plain cos/sin calls).
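For anyone who wants to see what the octant reduction buys, here is a small standalone sketch. It repeats the real_cexp routine from the code above and adds a baseline, naive_cexp, that calls cos/sin on the full angle. The name naive_cexp and the comparison are my own illustration, not part of clFFT or fftw3.

```cpp
#include <cassert>
#include <cmath>

// Same octant reduction as real_cexp above: the angle handed to cos/sin
// always lies in [0, pi/4], and symmetry restores the correct octant.
static void real_cexp(int m, int n, double* si, double* co)
{
    assert(n > 0);
    const double k2pi = 6.2831853071795864769252867665590057683943388;
    unsigned octant = 0;
    int quarter_n = n;
    n *= 4;   // work in units of 1/(4n) of a turn
    m *= 4;
    if (m < 0) m += n;
    if (m > n - m) { m = n - m; octant |= 4; }
    if (m - quarter_n > 0) { m = m - quarter_n; octant |= 2; }
    if (m > quarter_n - m) { m = quarter_n - m; octant |= 1; }
    double theta = (k2pi * m) / n;
    double c = std::cos(theta), s = std::sin(theta), t;
    if (octant & 1) { t = c; c = s; s = t; }
    if (octant & 2) { t = c; c = -s; s = t; }
    if (octant & 4) { s = -s; }
    *co = c;
    *si = s;
}

// Hypothetical baseline: evaluate cos/sin directly on 2*pi*m/n.
static void naive_cexp(int m, int n, double* si, double* co)
{
    assert(n > 0);
    const double k2pi = 6.2831853071795864769252867665590057683943388;
    double theta = (k2pi * m) / n;
    *co = std::cos(theta);
    *si = std::sin(theta);
}
```

At a quarter turn (m = n/4) the reduced version returns cos = 0 and sin = 1 exactly, because the angle actually passed to cos/sin is 0. The naive version typically leaves a tiny nonzero cosine there, since pi/2 is not exactly representable as a double.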
Old 2013-09-02, 13:29   #80
msft
 
 
Jul 2009
Tokyo

2×5×61 Posts

Radeon on Ubuntu has an overclocking tool.
Code:
$ amdconfig

Adapter 0 - AMD Radeon HD 7700 Series
                            Core (MHz)    Memory (MHz)
           Current Clocks :    300           150
             Current Peak :    800           1150
  Configurable Peak Range : [300-900]     [150-1250]
                 GPU load :    0%

Adapter 0 - AMD Radeon HD 7700 Series     
            Sensor 0: Temperature - 63.00 C  
$ ./CUDALucas -f 2097152 31813487

start M31813487 fft length = 2097152
Iteration 10000 M( 31813487 )C, 0xcb7faef05fdc6491, n = 2097152, CUDALucas v1.66 err = 0.002441 err2 = 0.04688 (2:53 real, 17.3402 ms/iter, ETA 0:00)

$ amdconfig --od-setclocks=900,1250

Adapter 0 - AMD Radeon HD 7700 Series
                            Core (MHz)    Memory (MHz)
           Current Clocks :    300           150
             Current Peak :    800           1150
  Configurable Peak Range : [300-900]     [150-1250]
                 GPU load :    0%

Adapter 0 - AMD Radeon HD 7700 Series     
            Sensor 0: Temperature - 53.00 C  
             
Adapter 0 - AMD Radeon HD 7700 Series     
            New Core Peak   : 900
            New Memory Peak : 1250
$ ./CUDALucas -f 2097152 31813487

start M31813487 fft length = 2097152
Iteration 10000 M( 31813487 )C, 0xcb7faef05fdc6491, n = 2097152, CUDALucas v1.66 err = 0.002441 err2 = 0.04654 (2:36 real, 15.5674 ms/iter, ETA 0:00)
About a 10% speedup (17.3402 ms/iter down to 15.5674 ms/iter).
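The arithmetic behind that figure, as a minimal sketch. The function names are made up for illustration; the inputs are the two ms/iter values from the runs above.

```cpp
#include <cassert>

// Percentage of time per iteration saved by the faster run.
double percent_time_saved(double before_ms, double after_ms)
{
    assert(before_ms > 0.0);
    return 100.0 * (before_ms - after_ms) / before_ms;
}

// How many times more iterations per second the faster run achieves.
double throughput_ratio(double before_ms, double after_ms)
{
    assert(after_ms > 0.0);
    return before_ms / after_ms;
}
```

Calling percent_time_saved(17.3402, 15.5674) gives roughly 10.2, and throughput_ratio(17.3402, 15.5674) roughly 1.11. For comparison, the core peak went from 800 to 900 MHz (a 12.5% increase) and the memory peak from 1150 to 1250 MHz (about 8.7%).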
Old 2013-09-02, 15:38   #81
kracker
 
"Mr. Meeseeks"
Jan 2012
California, USA

100010000111₂ Posts

Quote:
Originally Posted by msft
I know.
This program is useless.
If you have VC, why not build CLLucas on Windows?

I guess I can try, but I've never really tried it, and MinGW works (fftw, gmp, and most of the AMD SDK samples compile correctly for me...). It's just that clFFT doesn't at the moment.
Old 2013-09-04, 14:37   #82
kracker
 
"Mr. Meeseeks"
Jan 2012
California, USA

887₁₆ Posts

Quote:
Originally Posted by kracker
I guess I can try, but I've never really tried it, and MinGW works (fftw, gmp, and most of the AMD SDK samples compile correctly for me...). It's just that clFFT doesn't at the moment.

@msft: Here it is, on request.
I also changed WIN32 to _WIN32 because, apparently, for some strange reason WIN32 doesn't work.
I had to attach it.
Attached Files
File Type: txt CUDALucas.cpp.txt (32.1 KB, 240 views)
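A possible explanation, as far as I know: `_WIN32` is predefined by the compiler itself on both MSVC and MinGW, while plain `WIN32` usually comes from Windows headers or project settings and is not guaranteed to be there. A minimal sketch of a guard keyed on `_WIN32` follows. The function names are made up for illustration and are not taken from CUDALucas.cpp.

```cpp
#include <cassert>
#include <cstring>

// Pick the platform branch with the compiler-predefined macro.
const char* platform_name()
{
#if defined(_WIN32)
    return "windows";
#else
    return "other";
#endif
}

// Convenience wrapper around the string returned above.
bool is_windows()
{
    const char* p = platform_name();
    assert(p != nullptr);
    return std::strcmp(p, "windows") == 0;
}
```

The same `#if defined(_WIN32)` test can wrap the Windows-only includes and functions, so the file builds unchanged with g++ on Linux.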
Old 2013-09-04, 16:17   #83
msft
 
Jul 2009
Tokyo

1001100010₂ Posts

Quote:
Originally Posted by kracker
@msft: Here it is, on request.
I also changed WIN32 to _WIN32 because, apparently, for some strange reason WIN32 doesn't work.
I had to attach it.

kracker, thank you very much.

Code:
Microsoft (R) C/C++ Optimizing Compiler Version 16.00.30319.01 for x64
Copyright (C) Microsoft Corporation. All rights reserved.

CUDALucas.cpp
CUDALucas.cpp(741) : error C2632: 'int' followed by 'char' is illegal
CUDALucas.cpp(743) : error C2062: type 'char' unexpected
The source gives these errors.
Can anyone help with this issue?
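One thing that might be worth checking. This is only a guess, since it depends on what line 741 actually contains: the Windows header rpcndr.h has `#define small char`, so a declaration such as `int small` expands to `int char` and produces exactly this C2632 message. Other macros that expand to a type name would have the same effect. The sketch below fakes that macro so it compiles on any platform and shows the usual workaround. The variable and function names are hypothetical.

```cpp
#include <cassert>

// Stand-in for the macro from the Windows header rpcndr.h, defined here
// only so that the sketch behaves the same on every platform.
#define small char

// Workaround: drop the macro before using the identifier as a name.
// Without this #undef, "int small" below would expand to "int char".
#ifdef small
#undef small
#endif

// Counts how many of 2..7 divide n, using a variable called "small".
int count_small_factors(int n)
{
    assert(n > 0);
    int small = 0;
    for (int d = 2; d <= 7; ++d)
        if (n % d == 0) ++small;
    return small;
}
```

Renaming the variable works just as well as the `#undef` and avoids touching the header's macro.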
Old 2013-09-05, 04:37   #84
msft
 
Jul 2009
Tokyo

1142₈ Posts

Researching OpenCL FFT:

https://github.com/miracle2121/hpc12-fp-rl1609

Code:
/opt/AMDAPP/samples/opencl/lucas/ny$ pwd
/opt/AMDAPP/samples/opencl/lucas/ny
/opt/AMDAPP/samples/opencl/lucas/ny$ tar -xvf hpc12-fp-rl1609-master.tar.bz2
/opt/AMDAPP/samples/opencl/lucas/ny/hpc12-fp-rl1609-master$ make
/opt/AMDAPP/samples/opencl/lucas/ny/hpc12-fp-rl1609-master$ cp build/debug/x86_64/clfft .
/opt/AMDAPP/samples/opencl/lucas/ny/hpc12-fp-rl1609-master$ ./clfft 2097152 Capeverde
Found platform #0: AMD Accelerated Parallel Processing
Found device #0: Capeverde
Found device #1: Intel(R) Celeron(R) CPU G465 @ 1.90GHz

Elapsed time: 0.006272s
Performance: 35.11 Gflops
/opt/AMDAPP/samples/opencl/lucas/ny/hpc12-fp-rl1609-master$ ./clfft 16777216 Capeverde
Found platform #0: AMD Accelerated Parallel Processing
Found device #0: Capeverde
Found device #1: Intel(R) Celeron(R) CPU G465 @ 1.90GHz

Elapsed time: 0.051074s
Performance: 39.42 Gflops
Attached Files
File Type: bz2 hpc12-fp-rl1609-master.tar.bz2 (955.2 KB, 143 views)
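The Gflops figures above are consistent with the common 5·N·log2(N) operation count for a complex FFT of length N, divided by the elapsed time. That the tool uses this convention is my inference from the numbers, not something checked in its source. A small sketch, with a hypothetical function name:

```cpp
#include <cassert>
#include <cmath>

// Nominal FFT throughput in Gflops, assuming 5*N*log2(N) floating-point
// operations for one transform of length n.
double fft_gflops(double n, double seconds)
{
    assert(n > 1.0 && seconds > 0.0);
    return 5.0 * n * std::log2(n) / seconds / 1e9;
}
```

With the two runs above, fft_gflops(2097152, 0.006272) comes out at about 35.11 and fft_gflops(16777216, 0.051074) at about 39.42, matching the printed values.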
Old 2013-09-06, 01:16   #85
msft
 
Jul 2009
Tokyo

262₁₆ Posts

M31813487: first LLD (Lucas-Lehmer double-check) with a Radeon.
Code:
M( 31813487 )C, 0xa2fa8271570a8d__, n = 2097152, CUDALucas v1.66
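For readers new to the thread, the test behind this result is the Lucas-Lehmer test: start with s = 4, repeat s = s² − 2 mod (2^p − 1) a total of p − 2 times, and M(p) = 2^p − 1 is prime exactly when the final s is 0 (for odd prime p). The real program does each squaring with FFTs of length n on the GPU. The sketch below is only a tiny plain-integer version for p up to 31, with a made-up function name.

```cpp
#include <cassert>
#include <cstdint>

// Lucas-Lehmer test for M(p) = 2^p - 1. Restricted to 3 <= p <= 31 so that
// s*s fits in 64 bits; p itself must be an odd prime for the result to be
// meaningful.
bool lucas_lehmer(unsigned p)
{
    assert(p >= 3 && p <= 31);
    const std::uint64_t m = (std::uint64_t(1) << p) - 1;
    std::uint64_t s = 4;
    for (unsigned i = 0; i < p - 2; ++i)
        s = (s * s + m - 2) % m;   // adding m avoids unsigned underflow
    return s == 0;
}
```

For example, lucas_lehmer(7) returns true (127 is prime) and lucas_lehmer(11) returns false (2047 = 23 × 89). As I understand the output format, the C after the exponent in the line above marks a composite result, and the hex value is the low 64 bits of the final s.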
Old 2013-09-06, 04:37   #86
Karl M Johnson
 
 
Mar 2010

3·137 Posts

Quote:
Originally Posted by msft
M31813487: first LLD (Lucas-Lehmer double-check) with a Radeon.
Code:
M( 31813487 )C, 0xa2fa8271570a8d__, n = 2097152, CUDALucas v1.66

I call that a milestone!
Hehe, it says CUDALucas in there, but I guess without that the server would not accept the result.
Old 2013-09-06, 13:48   #87
msft
 
Jul 2009
Tokyo

1142₈ Posts

Quote:
Originally Posted by Karl M Johnson
I call that a milestone!
Hehe, it says CUDALucas in there, but I guess without that the server would not accept the result.

In the next version, I will change the name to clLucas.
Old 2013-09-06, 13:52   #88
kracker
 
 
"Mr. Meeseeks"
Jan 2012
California, USA

37×59 Posts

Quote:
Originally Posted by msft
M31813487: first LLD (Lucas-Lehmer double-check) with a Radeon.
Code:
M( 31813487 )C, 0xa2fa8271570a8d__, n = 2097152, CUDALucas v1.66

It looks like it is a match! Nice job!