20200109, 04:58  #1 
P90 years forever!
Aug 2002
Yeehaw, FL
11^{2}·59 Posts 
Algorithm wanted  sine and/or cosine
I cannot locate my Knuth book. I need the fastest possible algorithm to compute sine(pi * a/b) and/or cosine(pi * a/b) on a GPU where a and b are positive 32bit integers and a is less than half of b. The result is a 64bit double. Accuracy is important.
GPU coding: Loops very bad, table lookup is bad, numerous if statements are bad. To get an idea of the kind of code I'm looking for. Two related samples are given. Here is sample 32bit float code: http://mooooo.ooo/chebyshevsineapproximation/ Here is code that has too much precision from a doubledouble library: Code:
doubledouble sin(const doubledouble& x) { static const doubledouble tab[9]={ // tab[b] := sin(b*Pi/16)... 0.0, "0.1950903220161282678482848684770222409277", "0.3826834323650897717284599840303988667613", "0.5555702330196022247428308139485328743749", "0.7071067811865475244008443621048490392850", "0.8314696123025452370787883776179057567386", "0.9238795325112867561281831893967882868225", "0.9807852804032304491261822361342390369739", 1.0 }; static const doubledouble sinsTab[7] = { // Chebyshev coefficients "0.9999999999999999999999999999993767021096", "0.1666666666666666666666666602899977158461", "8333333333333333333322459353395394180616.0e42", "1984126984126984056685882073709830240680.0e43", "2755731922396443936999523827282063607870.0e45", "2505210805220830174499424295197047025509.0e47", "1605649194713006247696761143618673476113.0e49" }; if (fabs(x.h())<1.0e7) return x*(1.0sqr(x)/3); int a,b; doubledouble sins, coss, k1, k3, t2, s, s2, sinb, cosb; // reduce x: Pi < x <= Pi k1=x/doubledouble::TwoPi; k3=k1rint(k1); // determine integers a and b to minimize s, where s=xa*Pi/2b*Pi/16 t2=4*k3; a=int(rint(t2)); b=int(rint((8*(t2a)))); // must have a<=2 and b<=7 now s=doubledouble::Pi*(k3+k3(8*a+b)/16.0); // s is now reduced argument. Pi/32 < s < Pi/32 s2=sqr(s); // Chebyshev series on Pi/32..Pi/32, max abs error 2^98=3.16e30, whereas // power series has error 6e28 at Pi/32 with terms up to x^13 included. sins=s*(sinsTab[0]+(sinsTab[1]+(sinsTab[2]+(sinsTab[3]+(sinsTab[4]+ (sinsTab[5]+sinsTab[6]*s2)*s2)*s2)*s2)*s2)*s2); coss=sqrt(1.0sqr(sins)); // ok as Pi/32 < s < Pi/32 // here sinb=sin(b*Pi/16) etc. sinb= (b>=0) ? tab[b] : tab[b]; cosb=tab[8abs(b)]; if (0==a) { return sins*cosb+coss*sinb; } else if (1==a) { return sins*sinb+coss*cosb; } else if (1==a) { return sins*sinbcoss*cosb; } else { // a=2 return sins*cosbcoss*sinb; } // i.e. return sins*(cosa*cosbsina*sinb)+coss*(sina*cosb+cosa*sinb); } 
20200109, 05:28  #2 
P90 years forever!
Aug 2002
Yeehaw, FL
15743_{8} Posts 
Just found this stackoverflow page which has some promising leads for me to investigate:
https://stackoverflow.com/questions/...mathfunctions 
20200109, 09:48  #3 
Dec 2012
The Netherlands
10110101101_{2} Posts 
Stepping back from the immediate problem for a moment, it speaks volumes about current GPU design that sine & cosine are not available as fast accurate primitive instructions!

20200109, 09:48  #4 
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
2^{2}×1,433 Posts 
I am fairly sure I have seen the Knuth books online somewhere.

20200109, 13:39  #5 
Jun 2003
5×23×41 Posts 
I have TAoCP with me, but a quick look didn't give any formula for sine/cosine.

20200109, 15:18  #6 
"Marv"
May 2009
near the Tannhäuser Gate
210_{16} Posts 
I was a bit disappointed that "Hacker's Delight" didn't have any discussion of this.
However, HAKMEM has an algorithm for the PDP10 in about a dozen lines. It uses push & pop, however and, of course, has a limited word length. Still, it might give you some ideas. It's on page 75 https://w3.pppl.gov/~hammett/work/2009/AIM239ocr.pdf I googled pdp 10 emulator and found a ton of references. 
20200109, 15:48  #7 
Nov 2003
16100_{8} Posts 

20200109, 19:17  #8  
"Yves"
Jul 2017
Belgium
32_{16} Posts 
The only reference to Sin/Cos I found :
Vol 2 of TAoCP Seminumerical Algorithms 4.6.4 Evaluation of polynomials  Adaptation of the coefficients page 432  second printing Quote:


20200109, 21:19  #9 
"Ben"
Feb 2007
110011100010_{2} Posts 
fastapprox: https://github.com/pmineiro/fastappr...src/fasttrig.h
Contains no loops, conditionals, or lookup tables, but accuracy probably not what you want. Maybe something in the approach could be useful though. 
20200109, 22:10  #10  
P90 years forever!
Aug 2002
Yeehaw, FL
15743_{8} Posts 
From the stackexchage post, these 2 links are just what the doctor ordered:
Quote:
Quote:
Last fiddled with by Prime95 on 20200109 at 22:10 

20200110, 01:43  #11  
Feb 2017
Nowhere
2^{3}×3×149 Posts 
I dug up an old (1966) IBM System 360 manual here. I thought of it because my dad programmed IBM 360's and kept some of the old manuals. Perhaps the old algorithm can be adapted. I had to do some fiddling  copypaste did not convert to text very well. While I was doing this, I see some more modern renditions have been found.
Quote:
Last fiddled with by Dr Sardonicus on 20200110 at 01:53 Reason: Fixing transcription errors 

Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
Karatsuba algorithm relation to Matrix Multiplication (Strassen Algorithm)?  neomacdev  Computer Science & Computational Number Theory  1  20190730 01:40 
A Study of Primes Through NonResonant Sine Waves  EdH  Math  56  20100411 19:47 
Signs of Remainders of Cosine and Sine Series  jinydu  Homework Help  2  20080429 01:22 
Calculating the number of intersections between sine and cosine curves  ShiningArcanine  Miscellaneous Math  2  20060806 21:55 
Hard proplem for finding sine function  tinhnho  Miscellaneous Math  6  20050117 05:42 