mersenneforum.org  

Old 2013-08-31, 15:04   #67
henryzz
Just call me Henry
 
"David"
Sep 2007
Liverpool (GMT/BST)

3×23×89 Posts

Some of those FFT lengths are slower than larger ones. It might be worth finding out which ones those are.
Does CUDALucas use 128-bit sin/cos?
Old 2013-08-31, 15:34   #68
msft
 
Jul 2009
Tokyo

2×5×61 Posts

Quote:
Originally Posted by henryzz
Some of those FFT lengths are slower than larger ones. It might be worth finding out which ones those are.
Does CUDALucas use 128-bit sin/cos?

clFFT is open source, so now it is your turn, everyone.
CUFFT is accurate enough.
CUDALucas (CuLu) does not use 128-bit sin/cos.
Old 2013-08-31, 15:39   #69
msft
 
Jul 2009
Tokyo

2·5·61 Posts

I think I found a way to make do without 128-bit sin/cos.
It is elementary-school arithmetic.
Code:
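double theta = TWO_PI * ((double)k)/((double)L);

for(size_t j=1; j<radix; j++)
{
        /* NOTE: as posted, jt is never positive (assuming k >= 0 and L > 0),
           so only the first branch of each chain below is ever taken. The
           comparisons read as if jt were meant to be the positive angle. */
        double jt = - (j * theta);
        double pi = 3.14159265358979323846264338327950288419716939937510 ;
        double hpi=pi*0.5;
        double s,c;

        /* every branch evaluates to -sin(jt), with the argument folded per quadrant */
        if(jt < hpi)     s=-sin(jt);
        else if(jt < pi) s=-sin(pi-jt);
        else             s=sin(jt-pi);

        /* every branch evaluates to cos(jt), with the argument folded per quadrant */
        if(jt < hpi)           c=sin(hpi-jt);
        else if(jt < pi)       c=-cos(pi-jt);
        else if(jt < (pi+hpi)) c=sin(hpi-jt);
        else                   c=cos(jt);

        /* force exact values at a quarter and three quarters of a turn */
        double r = -((double)k/(double)L)*(double)j;
        if(r==-0.25){c=0;s=-1;}
        if(r==-0.75){c=0;s=1;}

        wc[nt]   = c;
        ws[nt++] = s;
}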
Old 2013-08-31, 15:58   #70
kracker
 
"Mr. Meeseeks"
Jan 2012
California, USA

37×59 Posts

Quote:
Originally Posted by msft
clFFT with 128-bit sin/cos.
Code:
Iteration 10000 M( 22256453 )C, 0x3d9450d492b7e880, n = 1179648, CUDALucas v1.66 err = 0.2734 (2:56 real, 17.6419 ms/iter, ETA 0:00)
Iteration 10000 M( 24732709 )C, 0x81a12a304a754572, n = 1310720, CUDALucas v1.66 err = 0.2812 (3:15 real, 19.5596 ms/iter, ETA 0:00)
Iteration 10000 M( 29412433 )C, 0x27d7d112a73aa203, n = 1572864, CUDALucas v1.66 err = 0.2812 (4:23 real, 26.3191 ms/iter, ETA 0:00)
Iteration 10000 M( 30620113 )C, 0x212dca3cec0acde2, n = 1638400, CUDALucas v1.66 err = 0.25 (6:20 real, 37.9453 ms/iter, ETA 0:00)
Iteration 10000 M( 32993419 )C, 0xcf86a69b844e35c0, n = 1769472, CUDALucas v1.66 err = 0.2812 (8:28 real, 50.8171 ms/iter, ETA 0:00)
Iteration 10000 M( 36418493 )C, 0x2f1388379572d5b4, n = 1966080, CUDALucas v1.66 err = 0.25 (7:10 real, 43.0098 ms/iter, ETA 0:00)
Iteration 10000 M( 38955173 )C, 0x8a45e3bbd4e4fc9b, n = 2097152, CUDALucas v1.66 err = 0.2812 (2:48 real, 16.8227 ms/iter, ETA 0:00)
Iteration 10000 M( 43792559 )C, 0x7048d84bbfb0f810, n = 2359296, CUDALucas v1.66 err = 0.2812 (6:52 real, 41.2492 ms/iter, ETA 0:00)
Iteration 10000 M( 48375209 )C, 0xf957e240d591a99e, n = 2621440, CUDALucas v1.66 err = 0.2188 (7:37 real, 45.7406 ms/iter, ETA 0:00)
Iteration 10000 M( 57899201 )C, 0xa2ac01bbc76d92ee, n = 3145728, CUDALucas v1.66 err = 0.2656 (9:59 real, 59.8521 ms/iter, ETA 0:00)
Iteration 10000 M( 60622229 )C, 0xd81c849f11fd1054, n = 3276800, CUDALucas v1.66 err = 0.2812 (14:03 real, 84.2392 ms/iter, ETA 0:00)
Iteration 10000 M( 65066623 )C, 0xde7aeb8cc7a2a826, n = 3538944, CUDALucas v1.66 err = 0.2812 (19:46 real, 118.6206 ms/iter, ETA 0:00)
Iteration 10000 M( 67662869 )C, 0xf854d1dee3fbb5d7, n = 3932160, CUDALucas v1.66 err = 0.05469 (16:38 real, 99.8373 ms/iter, ETA 0:00)
Iteration 10000 M( 72000007 )C, 0x404aa83a2e247882, n = 3932160, CUDALucas v1.66 err = 0.25 (16:41 real, 100.0775 ms/iter, ETA 0:00)
Iteration 10000 M( 76722161 )C, 0x4b6ba0a6078e4bbb, n = 4194304, CUDALucas v1.66 err = 0.2812 (6:03 real, 36.2961 ms/iter, ETA 0:00)
Iteration 10000 M( 86109511 )C, 0x760de83047f1f7e9, n = 4718592, CUDALucas v1.66 err = 0.25 (32:13 real, 193.3191 ms/iter, ETA 0:00)
Accuracy is improved.

Looking back at those results, the iteration times look uneven: for example, exponent 36418493 takes 43 ms/iter while 38955173 takes 16 ms/iter, and the pattern repeats roughly down the list. Is that normal?
Old 2013-08-31, 16:16   #71
msft
 
Jul 2009
Tokyo

2×5×61 Posts

Quote:
Originally Posted by kracker
Is that normal?

That is normal.
Every FFT length except a power of two is slow.
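
To see which of the lengths in the benchmark are powers of two, a length can be tested and split into its small prime factors. This is only a sketch; is_power_of_two() and strip_factor() are hypothetical helper names, not CUDALucas or clFFT functions:

```c
#include <stdint.h>

/* Hypothetical helper: true when n has exactly one bit set. */
static int is_power_of_two(uint64_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Hypothetical helper: divide every factor p out of *n and
   return how many were removed. */
static unsigned strip_factor(uint64_t *n, uint64_t p)
{
    unsigned count = 0;
    while (*n != 0 && *n % p == 0) {
        *n /= p;
        count++;
    }
    return count;
}
```

For example, n = 2097152 is 2^21, while n = 1966080 factors as 2^17 * 3 * 5, which matches the fast and slow rows in the log quoted earlier in the thread.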
Old 2013-08-31, 16:26   #72
henryzz
Just call me Henry
 
"David"
Sep 2007
Liverpool (GMT/BST)

3×23×89 Posts

Quote:
Originally Posted by msft
That is normal.
Every FFT length except a power of two is slow.

Do you have external control over the FFT length? If so, tell it to use power-of-two FFT lengths.
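
Rounding a requested length up to the next power of two is straightforward. A minimal sketch, assuming a larger FFT length is always acceptable for the exponent being tested; next_power_of_two() is a hypothetical helper, not an existing option or function of the program:

```c
#include <stdint.h>

/* Hypothetical helper: smallest power of two that is >= n.
   Returns 0 when the result would not fit in 64 bits. */
static uint64_t next_power_of_two(uint64_t n)
{
    uint64_t p = 1;

    if (n > (UINT64_C(1) << 63))
        return 0;
    while (p < n)
        p <<= 1;
    return p;
}
```

In the log quoted earlier in the thread, n = 1966080 would round up to 2097152, the length that ran at 16.8 ms/iter instead of 43.0 ms/iter (the two rows are for different exponents, so the comparison is only indicative).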
Old 2013-08-31, 16:33   #73
msft
 
Jul 2009
Tokyo

2×5×61 Posts

Quote:
Originally Posted by henryzz
Do you have external control over the FFT length? If so, tell it to use power-of-two FFT lengths.

The -f option is enabled.
Old 2013-09-01, 02:52   #74
kracker
 
"Mr. Meeseeks"
Jan 2012
California, USA

37·59 Posts

Well, apparently it is the same with CUFFT.
Old 2013-09-01, 04:20   #75
msft
 
Jul 2009
Tokyo

610₁₀ Posts

Quote:
Originally Posted by kracker
Well, apparently it is the same with CUFFT.

CUFFT has progressed.
Old 2013-09-01, 13:59   #76
msft
 
Jul 2009
Tokyo

262₁₆ Posts

Hi,
My system has a sin() issue.
Could someone please report the result of running this on a Windows + VC system?

Intel Celeron G465 @ 1.90GHz
ubuntu 12.04 LTS 64bit
gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)
Code:
$ cat sin.c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
double sinsin(double a)
{ return sin(a);}
int main()
{
        printf(" 64bit sin(%30.27lf) = %30.27lf\n",
                0.509281621578032916985989687,
                sinsin(0.509281621578032916985989687));
        printf("128bit sin(%30.27lf) = %30.27lf\n",
                0.509281621578032916985989687,
                0.487550160148435940410394096);
}
$ cc -S sin.c
$ cat sin.s
        .file   "sin.c"
        .text
        .globl  sinsin
        .type   sinsin, @function
sinsin:
.LFB0:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        subq    $16, %rsp
        movsd   %xmm0, -8(%rbp)
        movsd   -8(%rbp), %xmm0
        call    sin
        leave
...
$ cc sin.c -lm
$ ./a.out
 64bit sin( 0.509281621578032916985989687) =  0.487550160148435995921545327
128bit sin( 0.509281621578032916985989687) =  0.487550160148435940410394096
This program is wrong.
Please ignore.

Last fiddled with by msft on 2013-09-01 at 14:05 Reason: Misconception.
Old 2013-09-01, 14:56   #77
kracker
 
"Mr. Meeseeks"
Jan 2012
California, USA

37×59 Posts

Code:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
double sinsin(double a)
{ return sin(a);}
int main()
{
        printf(" 64bit sin(%30.27lf) = %30.27lf\n",
                0.509281621578032916985989687,
                sinsin(0.509281621578032916985989687));
        printf("128bit sin(%30.27lf) = %30.27lf\n",
                0.509281621578032916985989687,
                0.487550160148435940410394096);
}
g++ and VC give the same result.
All times are UTC.


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
