![]() |
![]() |
#1 |
P90 years forever!
Aug 2002
Yeehaw, FL
177308 Posts |
![]()
I cannot locate my Knuth book. I need the fastest possible algorithm to compute sine(pi * a/b) and/or cosine(pi * a/b) on a GPU where a and b are positive 32-bit integers and a is less than half of b. The result is a 64-bit double. Accuracy is important.
GPU coding: Loops very bad, table lookup is bad, numerous if statements are bad. To get an idea of the kind of code I'm looking for. Two related samples are given. Here is sample 32-bit float code: http://mooooo.ooo/chebyshev-sine-approximation/ Here is code that has too much precision from a double-double library: Code:
doubledouble sin(const doubledouble& x) { static const doubledouble tab[9]={ // tab[b] := sin(b*Pi/16)... 0.0, "0.1950903220161282678482848684770222409277", "0.3826834323650897717284599840303988667613", "0.5555702330196022247428308139485328743749", "0.7071067811865475244008443621048490392850", "0.8314696123025452370787883776179057567386", "0.9238795325112867561281831893967882868225", "0.9807852804032304491261822361342390369739", 1.0 }; static const doubledouble sinsTab[7] = { // Chebyshev coefficients "0.9999999999999999999999999999993767021096", "-0.1666666666666666666666666602899977158461", "8333333333333333333322459353395394180616.0e-42", "-1984126984126984056685882073709830240680.0e-43", "2755731922396443936999523827282063607870.0e-45", "-2505210805220830174499424295197047025509.0e-47", "1605649194713006247696761143618673476113.0e-49" }; if (fabs(x.h())<1.0e-7) return x*(1.0-sqr(x)/3); int a,b; doubledouble sins, coss, k1, k3, t2, s, s2, sinb, cosb; // reduce x: -Pi < x <= Pi k1=x/doubledouble::TwoPi; k3=k1-rint(k1); // determine integers a and b to minimize |s|, where s=x-a*Pi/2-b*Pi/16 t2=4*k3; a=int(rint(t2)); b=int(rint((8*(t2-a)))); // must have |a|<=2 and |b|<=7 now s=doubledouble::Pi*(k3+k3-(8*a+b)/16.0); // s is now reduced argument. -Pi/32 < s < Pi/32 s2=sqr(s); // Chebyshev series on -Pi/32..Pi/32, max abs error 2^-98=3.16e-30, whereas // power series has error 6e-28 at Pi/32 with terms up to x^13 included. sins=s*(sinsTab[0]+(sinsTab[1]+(sinsTab[2]+(sinsTab[3]+(sinsTab[4]+ (sinsTab[5]+sinsTab[6]*s2)*s2)*s2)*s2)*s2)*s2); coss=sqrt(1.0-sqr(sins)); // ok as -Pi/32 < s < Pi/32 // here sinb=sin(b*Pi/16) etc. sinb= (b>=0) ? tab[b] : -tab[-b]; cosb=tab[8-abs(b)]; if (0==a) { return sins*cosb+coss*sinb; } else if (1==a) { return -sins*sinb+coss*cosb; } else if (-1==a) { return sins*sinb-coss*cosb; } else { // |a|=2 return -sins*cosb-coss*sinb; } // i.e. return sins*(cosa*cosb-sina*sinb)+coss*(sina*cosb+cosa*sinb); } |
![]() |
![]() |
![]() |
#2 |
P90 years forever!
Aug 2002
Yeehaw, FL
23×1,019 Posts |
![]()
Just found this stackoverflow page which has some promising leads for me to investigate:
https://stackoverflow.com/questions/...math-functions |
![]() |
![]() |
![]() |
#3 |
Dec 2012
The Netherlands
1,801 Posts |
![]()
Stepping back from the immediate problem for a moment, it speaks volumes about current GPU design that sine & cosine are not available as fast accurate primitive instructions!
|
![]() |
![]() |
![]() |
#4 |
Just call me Henry
"David"
Sep 2007
Liverpool (GMT/BST)
37×163 Posts |
![]()
I am fairly sure I have seen the Knuth books online somewhere.
|
![]() |
![]() |
![]() |
#5 |
Jun 2003
10101001111002 Posts |
![]()
I have TAoCP with me, but a quick look didn't give any formula for sine/cosine.
|
![]() |
![]() |
![]() |
#6 |
"Marv"
May 2009
near the Tannhäuser Gate
32316 Posts |
![]()
I was a bit disappointed that "Hacker's Delight" didn't have any discussion of this.
However, HAKMEM has an algorithm for the PDP-10 in about a dozen lines. It uses push & pop, however and, of course, has a limited word length. Still, it might give you some ideas. It's on page 75 https://w3.pppl.gov/~hammett/work/2009/AIM-239-ocr.pdf I googled pdp 10 emulator and found a ton of references. |
![]() |
![]() |
![]() |
#7 |
"Bob Silverman"
Nov 2003
North of Boston
7,507 Posts |
![]() |
![]() |
![]() |
![]() |
#8 | |
"Yves"
Jul 2017
Belgium
83 Posts |
![]()
The only reference to Sin/Cos I found :
Vol 2 of TAoCP- Seminumerical Algorithms 4.6.4 Evaluation of polynomials - Adaptation of the coefficients page 432 - second printing Quote:
|
|
![]() |
![]() |
![]() |
#9 |
"Ben"
Feb 2007
7·13·41 Posts |
![]()
fastapprox: https://github.com/pmineiro/fastappr...src/fasttrig.h
Contains no loops, conditionals, or lookup tables, but accuracy probably not what you want. Maybe something in the approach could be useful though. |
![]() |
![]() |
![]() |
#10 | ||
P90 years forever!
Aug 2002
Yeehaw, FL
11111110110002 Posts |
![]()
From the stackexchage post, these 2 links are just what the doctor ordered:
Quote:
Quote:
Last fiddled with by Prime95 on 2020-01-09 at 22:10 |
||
![]() |
![]() |
![]() |
#11 | |
Feb 2017
Nowhere
184416 Posts |
![]()
I dug up an old (1966) IBM System 360 manual here. I thought of it because my dad programmed IBM 360's and kept some of the old manuals. Perhaps the old algorithm can be adapted. I had to do some fiddling -- copy-paste did not convert to text very well. While I was doing this, I see some more modern renditions have been found.
Quote:
Last fiddled with by Dr Sardonicus on 2020-01-10 at 01:53 Reason: Fixing transcription errors |
|
![]() |
![]() |
![]() |
Thread Tools | |
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
Karatsuba algorithm relation to Matrix Multiplication (Strassen Algorithm)? | neomacdev | Computer Science & Computational Number Theory | 1 | 2019-07-30 01:40 |
A Study of Primes Through Non-Resonant Sine Waves | EdH | Math | 56 | 2010-04-11 19:47 |
Signs of Remainders of Cosine and Sine Series | jinydu | Homework Help | 2 | 2008-04-29 01:22 |
Calculating the number of intersections between sine and cosine curves | ShiningArcanine | Miscellaneous Math | 2 | 2006-08-06 21:55 |
Hard proplem for finding sine function | tinhnho | Miscellaneous Math | 6 | 2005-01-17 05:42 |