mersenneforum.org  

Go Back   mersenneforum.org > Extra Stuff > Programming

Reply
 
Thread Tools
Old 2020-01-09, 04:58   #1
Prime95
P90 years forever!
 
Prime95's Avatar
 
Aug 2002
Yeehaw, FL

112·59 Posts
Default Algorithm wanted - sine and/or cosine

I cannot locate my Knuth book. I need the fastest possible algorithm to compute sine(pi * a/b) and/or cosine(pi * a/b) on a GPU where a and b are positive 32-bit integers and a is less than half of b. The result is a 64-bit double. Accuracy is important.

GPU coding: Loops very bad, table lookup is bad, numerous if statements are bad.

To get an idea of the kind of code I'm looking for. Two related samples are given.

Here is sample 32-bit float code: http://mooooo.ooo/chebyshev-sine-approximation/

Here is code that has too much precision from a double-double library:
Code:
doubledouble sin(const doubledouble& x) {
  static const doubledouble tab[9]={ // tab[b] := sin(b*Pi/16)...
    0.0,
    "0.1950903220161282678482848684770222409277",
    "0.3826834323650897717284599840303988667613",
    "0.5555702330196022247428308139485328743749",
    "0.7071067811865475244008443621048490392850",
    "0.8314696123025452370787883776179057567386",
    "0.9238795325112867561281831893967882868225",
    "0.9807852804032304491261822361342390369739",
    1.0
  };
  static const doubledouble sinsTab[7] = { // Chebyshev coefficients
    "0.9999999999999999999999999999993767021096",
    "-0.1666666666666666666666666602899977158461",
    "8333333333333333333322459353395394180616.0e-42",
    "-1984126984126984056685882073709830240680.0e-43",
    "2755731922396443936999523827282063607870.0e-45",
    "-2505210805220830174499424295197047025509.0e-47",
    "1605649194713006247696761143618673476113.0e-49"
  };
  if (fabs(x.h())<1.0e-7) return x*(1.0-sqr(x)/3);
  int a,b; doubledouble sins, coss, k1, k3, t2, s, s2, sinb, cosb;
  // reduce x: -Pi < x <= Pi
  k1=x/doubledouble::TwoPi; k3=k1-rint(k1);
  // determine integers a and b to minimize |s|, where  s=x-a*Pi/2-b*Pi/16
  t2=4*k3;
  a=int(rint(t2));
  b=int(rint((8*(t2-a))));   // must have |a|<=2 and |b|<=7 now
  s=doubledouble::Pi*(k3+k3-(8*a+b)/16.0); // s is now reduced argument. -Pi/32 < s < Pi/32
  s2=sqr(s); 
  // Chebyshev series on -Pi/32..Pi/32, max abs error 2^-98=3.16e-30, whereas
  // power series has error 6e-28 at Pi/32 with terms up to x^13 included.
  sins=s*(sinsTab[0]+(sinsTab[1]+(sinsTab[2]+(sinsTab[3]+(sinsTab[4]+
         (sinsTab[5]+sinsTab[6]*s2)*s2)*s2)*s2)*s2)*s2);
  coss=sqrt(1.0-sqr(sins));  // ok as -Pi/32 < s < Pi/32
  // here sinb=sin(b*Pi/16) etc.
  sinb= (b>=0) ? tab[b] : -tab[-b]; cosb=tab[8-abs(b)];
  if (0==a) {
    return  sins*cosb+coss*sinb;
  } else if (1==a) {
    return -sins*sinb+coss*cosb;
  } else if (-1==a) {
    return  sins*sinb-coss*cosb;
  } else  { // |a|=2
    return -sins*cosb-coss*sinb;
  }
  // i.e. return sins*(cosa*cosb-sina*sinb)+coss*(sina*cosb+cosa*sinb);
}
Prime95 is offline   Reply With Quote
Old 2020-01-09, 05:28   #2
Prime95
P90 years forever!
 
Prime95's Avatar
 
Aug 2002
Yeehaw, FL

112·59 Posts
Default

Just found this stackoverflow page which has some promising leads for me to investigate:

https://stackoverflow.com/questions/...math-functions
Prime95 is offline   Reply With Quote
Old 2020-01-09, 09:48   #3
Nick
 
Nick's Avatar
 
Dec 2012
The Netherlands

101101011012 Posts
Default

Stepping back from the immediate problem for a moment, it speaks volumes about current GPU design that sine & cosine are not available as fast accurate primitive instructions!
Nick is offline   Reply With Quote
Old 2020-01-09, 09:48   #4
henryzz
Just call me Henry
 
henryzz's Avatar
 
"David"
Sep 2007
Cambridge (GMT/BST)

22·1,433 Posts
Default

I am fairly sure I have seen the Knuth books online somewhere.
henryzz is offline   Reply With Quote
Old 2020-01-09, 13:39   #5
axn
 
axn's Avatar
 
Jun 2003

10010011010112 Posts
Default

I have TAoCP with me, but a quick look didn't give any formula for sine/cosine.
axn is offline   Reply With Quote
Old 2020-01-09, 15:18   #6
tServo
 
tServo's Avatar
 
"Marv"
May 2009
near the Tannhäuser Gate

24·3·11 Posts
Default

I was a bit disappointed that "Hacker's Delight" didn't have any discussion of this.

However, HAKMEM has an algorithm for the PDP-10 in about a dozen lines.
It uses push & pop, however and, of course, has a limited word length.

Still, it might give you some ideas. It's on page 75

https://w3.pppl.gov/~hammett/work/2009/AIM-239-ocr.pdf

I googled pdp 10 emulator and found a ton of references.
tServo is offline   Reply With Quote
Old 2020-01-09, 15:48   #7
R.D. Silverman
 
R.D. Silverman's Avatar
 
Nov 2003

161008 Posts
Default

Quote:
Originally Posted by axn View Post
I have TAoCP with me, but a quick look didn't give any formula for sine/cosine.
You may want to look up 'Pade Approximation", Wiki has a short article
including an expression for sin()
R.D. Silverman is offline   Reply With Quote
Old 2020-01-09, 19:17   #8
De Wandelaar
 
De Wandelaar's Avatar
 
"Yves"
Jul 2017
Belgium

2×52 Posts
Default

The only reference to Sin/Cos I found :
Vol 2 of TAoCP- Seminumerical Algorithms
4.6.4 Evaluation of polynomials - Adaptation of the coefficients
page 432 - second printing
Quote:
Let us now return to our original problem of evaluating a given polynomialu(x) as rapidly as possible, for "random" values of x. The importance of this problem is due partly to the fact that standard functions such as sin x, cos x, ex, etc., are usually computed by subroutines which rely on evaluation of certain polynomials; these polynomials are evaluated so often, it is desirable to find the fatest possible way to do the computation.
I didn't found a more specific discussion about sin and cos.
De Wandelaar is offline   Reply With Quote
Old 2020-01-09, 21:19   #9
bsquared
 
bsquared's Avatar
 
"Ben"
Feb 2007

2×17×97 Posts
Default

fastapprox: https://github.com/pmineiro/fastappr...src/fasttrig.h
Contains no loops, conditionals, or lookup tables, but accuracy probably not what you want. Maybe something in the approach could be useful though.
bsquared is offline   Reply With Quote
Old 2020-01-09, 22:10   #10
Prime95
P90 years forever!
 
Prime95's Avatar
 
Aug 2002
Yeehaw, FL

112·59 Posts
Default

From the stackexchage post, these 2 links are just what the doctor ordered:

Quote:
One directory includes an implementation in C, contributed by IBM. Since October 2011, this is the code that actually runs when you call sin() on a typical x86-64 Linux system. It is apparently faster than the fsin assembly instruction. Source code: https://sourceware.org/git/?p=glibc....c;hb=HEAD#l194, look for __sin (double x).
Quote:
fdlibm's implementation of sin in pure C is much simpler than glibc's and is nicely commented. Source code: http://www.netlib.org/fdlibm/s_sin.c and http://www.netlib.org/fdlibm/k_sin.c
It is not too hard to convert my [0,pi/2] range to [-pi/4,pi/4]

Last fiddled with by Prime95 on 2020-01-09 at 22:10
Prime95 is offline   Reply With Quote
Old 2020-01-10, 01:43   #11
Dr Sardonicus
 
Dr Sardonicus's Avatar
 
Feb 2017
Nowhere

23×3×149 Posts
Default

I dug up an old (1966) IBM System 360 manual here. I thought of it because my dad programmed IBM 360's and kept some of the old manuals. Perhaps the old algorithm can be adapted. I had to do some fiddling -- copy-paste did not convert to text very well. While I was doing this, I see some more modern renditions have been found.

Quote:
xxxlSCN Subprogram (DSIN and DCOS)

Algorithm

1. Divide |x| by pi/4 and separate the quotient (z) into its integer part (q) and
its fraction part (r). Then, z = |x|*4/pi = q + r, where q is an integer and r is within the range, 0 <= r < 1.

2. If the cosine is desired, add 2 to q. If the sine is desired and if x is negative, add 4 to q. This adjustment of q reduces the general case to the computation of
sin (x) for x > 0, because

cos ( ± x) = sin (|x| + pi/2 ) , and

sin (-x) = sin (|x| + pi).

3. Let q0 == q mod 8.

Then, for

q0 = 0,sin (x) = sin(pi/4* r)

q0 = 1, sin (x) = cos(pi/4*(1 - r))

q0 = 2, sin (x) = cos(pi/4*r)

q0 = 3, sin (x) = sin(pi/4*(1 - r))

q0 = 4, sin (x) = -sin(pi/4*r)

q0 = 5, sin (x) = - cos(pi/4*(1 - r))

q0 = 6, sin (x) = -cos(pi/4*r)

q0 = 7, sin (x) = -sin(pi/4*(1 - r))

These formulas reduce each case to the computation of either sin(pi/4*r1) or cos(pi/4*r1); where r1 is either r or (1 - r), and is within the range, 0 <= r1 <= 1.

4. Finally, either sin (pi/4*r1) or cos ( pi/4*r1) is computed, using the Chebyshev interpolation of degree 6 in r1^2 for the sine, and of degree 7 in r1^2 for the cosine.

The maximum relative error of the sine polynomial is 2-58 and that of the cosine polynomial is 2-64.3.

Effect of an Argument Error

E\; \sim \; \Delta As the value of the argument increases, \Delta inceases.Because the function value diminishes periodically, no consistent relative error control can be maintained outside of the principal range, - pi/2 <= x <= + pi/2.

Last fiddled with by Dr Sardonicus on 2020-01-10 at 01:53 Reason: Fixing transcription errors
Dr Sardonicus is offline   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
Karatsuba algorithm relation to Matrix Multiplication (Strassen Algorithm)? neomacdev Computer Science & Computational Number Theory 1 2019-07-30 01:40
A Study of Primes Through Non-Resonant Sine Waves EdH Math 56 2010-04-11 19:47
Signs of Remainders of Cosine and Sine Series jinydu Homework Help 2 2008-04-29 01:22
Calculating the number of intersections between sine and cosine curves ShiningArcanine Miscellaneous Math 2 2006-08-06 21:55
Hard proplem for finding sine function tinhnho Miscellaneous Math 6 2005-01-17 05:42

All times are UTC. The time now is 23:22.

Tue Oct 27 23:22:14 UTC 2020 up 47 days, 20:33, 2 users, load averages: 1.37, 1.54, 1.66

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2020, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.