CORDIC Arcsine implementation fails

The "single rotate" arcsine goes badly wrong when the argument is just greater than the initial value of 'x', which is the magic scaling factor 1/An ~= 0.607252935 ~= 0x26DD3B6A.

This is because, for all arguments > 0, the first step always has y = 0 < arg, so d = +1, which sets y = 1/An, and leaves x = 1/An. Looking at the second step:

  • if arg <= 1/An, then d = -1, and the steps which follow converge to a good answer

  • if arg > 1/An, then d = +1, and this step moves further away from the right answer. For a range of values a little bigger than 1/An, the subsequent steps all have d = -1, but they are unable to correct the result :-(
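To make the two cases concrete, here is a minimal floating-point sketch of the single rotate scheme as described above. It is not the fixed-point code under discussion: the function name `cordic_asin_single`, the use of `double`, and the default of 40 iterations are all my assumptions for illustration.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: floating-point sketch of the single rotate CORDIC
// arcsine. Illustration only; arg is assumed to lie in [0, 1].
double cordic_asin_single(double arg, int iterations = 40)
{
    assert(arg >= 0.0 && arg <= 1.0);

    double x = 0.6072529350088813;  // 1/An, the scaling factor
    double y = 0.0;
    double z = 0.0;                 // accumulated angle

    for (int i = 0; i < iterations; ++i) {
        const double d = (y < arg) ? 1.0 : -1.0;   // rotate y toward arg
        const double shift = std::ldexp(1.0, -i);  // 2^-i
        const double x_next = x - d * y * shift;
        const double y_next = y + d * x * shift;
        x = x_next;
        y = y_next;
        z += d * std::atan(shift);
    }
    return z;
}
```

With this sketch, `cordic_asin_single(0.5)` should land close to `asin(0.5)`, while `cordic_asin_single(0.608)` takes d = +1 on both of the first two steps and then only d = -1, ending near 0.7548 instead of roughly 0.6535.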

I found:

 arg = 0.607 (ie 0x26D91687), relative error 7.139E-09 -- OK    
 arg = 0.608 (ie 0x26E978D5), relative error 1.550E-01 -- APPALLING !!
 arg = 0.685 (ie 0x2BD70A3D), relative error 2.667E-04 -- BAD !!
 arg = 0.686 (ie 0x2BE76C8B), relative error 1.232E-09 -- OK, again

The descriptions of the method warn about abs(arg) >= 0.98 (or so), and I found that somewhere after 0.986 the process fails to converge and the relative error jumps to ~5E-02 and hits 1E-01 (!!) at arg=1 :-(

As you did, I also found that for 0.303 < arg < 0.313 the relative error jumps to ~3E-02, and reduces slowly until things return to normal. (In this case step 2 overshoots so far that the remaining steps cannot correct it.)

So... the single rotate CORDIC for arcsine looks rubbish to me :-(

Added later... when I looked even closer at the single rotate CORDIC, I found many more small regions where the relative error is BAD... I would not touch this as a method at all... it's not just rubbish, it's useless.
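For anyone who wants to hunt for such regions themselves, a sweep along these lines is one way to do it. This is only a sketch: `max_relative_error` is a name I made up, `std::asin` is taken as the reference, and the candidate function is passed in as a parameter.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Hypothetical helper: worst relative error of `approx` against std::asin
// over steps + 1 evenly spaced arguments in [lo, hi].
double max_relative_error(const std::function<double(double)>& approx,
                          double lo, double hi, int steps)
{
    assert(steps >= 1 && lo <= hi);
    double worst = 0.0;
    for (int k = 0; k <= steps; ++k) {
        const double arg = std::fmin(lo + (hi - lo) * k / steps, hi);
        const double exact = std::asin(arg);
        if (exact == 0.0) continue;  // relative error is undefined at arg = 0
        const double rel = std::fabs((approx(arg) - exact) / exact);
        if (rel > worst) worst = rel;
    }
    return worst;
}
```

Running it over narrow windows (say 0.30 to 0.32, or 0.60 to 0.69) with a fine step makes the bad regions stand out much more clearly than a coarse sweep over the whole range.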

BTW: I thoroughly recommend "Software Manual for the Elementary Functions", William Cody and William Waite, Prentice-Hall, 1980. The methods for calculating the functions are not so interesting any more (but there is a thorough, practical discussion of the relevant range-reductions required). However, for each function they give a good test procedure.

To review a few things mentioned in the comments:

  • The given code outputs values identical to another CORDIC implementation. This includes the stated inaccuracies.
  • The largest error is as you approach arcsin(1).
  • The second largest error is that the values of arcsin(0.60726) to arcsin(0.68514) all return 0.754805.
  • There are some vague references to inaccuracies in the CORDIC method for some functions, including arcsin. The suggested fix is to perform "double-iterations", although I have been unable to get this to work (all values give a large amount of error).
  • The alternate CORDIC implementation has a comment /* |a| < 0.98 */ in the arcsin() implementation, which would seem to reinforce that there are known inaccuracies close to 1.
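For reference, one common formulation of the double-iteration idea looks roughly like the sketch below. It is not the code tested above, and I am assuming the usual description of the technique: each step applies the same micro-rotation twice, so the vector grows by exactly 1 + 2^-2i, and the comparison target is scaled by the same factor instead of pre-scaling x by 1/An. The name `cordic_asin_double` and the floating-point types are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: floating-point sketch of a double-iteration CORDIC
// arcsine. arg is assumed to lie in [-1, 1].
double cordic_asin_double(double arg, int iterations = 40)
{
    assert(arg >= -1.0 && arg <= 1.0);

    double x = 1.0;
    double y = 0.0;
    double z = 0.0;    // accumulated angle
    double t = arg;    // target, rescaled each step to match the vector length

    for (int i = 0; i < iterations; ++i) {
        // The vector may pass 90 degrees, so the direction depends on sign(x).
        const double sx = (x < 0.0) ? -1.0 : 1.0;
        const double d = (y < t) ? sx : -sx;
        const double shift = std::ldexp(1.0, -i);  // 2^-i

        for (int r = 0; r < 2; ++r) {              // same rotation, twice
            const double x_next = x - d * y * shift;
            const double y_next = y + d * x * shift;
            x = x_next;
            y = y_next;
        }
        z += 2.0 * d * std::atan(shift);
        t += t * shift * shift;                    // t *= 1 + 2^-2i
    }
    return z;
}
```

The point of the design is that y and t always refer to a vector of the same length, so the comparison stays meaningful from the first step rather than only after the gain has settled.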

As a rough comparison of a few different methods, consider the following results (all tests performed on a Windows 7 desktop computer using MSVC++ 2010, benchmarks timed using 10M iterations over the arcsin() range 0-1):

  1. Question CORDIC Code: 1050 ms, 0.008 avg error, 0.173 max error
  2. Alternate CORDIC Code (ref): 2600 ms, 0.008 avg error, 0.173 max error
  3. atan() CORDIC Code: 2900 ms, 0.21 avg error, 0.28 max error
  4. CORDIC Using Double-Iterations: 4700 ms, 0.26 avg error, 0.917 max error (???)
  5. Math Built-in asin(): 200 ms, 0 avg error, 0 max error
  6. Rational Approximation (ref): 250 ms, 0.21 avg error, 0.26 max error
  7. Linear Table Lookup (see below): 100 ms, 0.000001 avg error, 0.00003 max error
  8. Taylor Series (7th power, ref): 300 ms, 0.01 avg error, 0.16 max error

These results are on a desktop so how relevant they would be for an embedded system is a good question. If in doubt, profiling/benchmarking on the relevant system would be advised. Most solutions tested don't have very good accuracy over the range (0-1) and all but one are actually slower than the built-in asin() function.
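If you want to repeat the timing on your own target, a harness roughly like this would do. It is a sketch, not the harness used for the numbers above: `time_asin_ms` is a hypothetical name, and going through `std::function` adds call overhead of its own, so only compare numbers measured the same way.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <functional>

// Hypothetical helper: times `iterations` calls of f on arguments spread
// over [0, 1) and returns the elapsed milliseconds. The sum of the results
// is written to `checksum` so the calls cannot be optimized away.
double time_asin_ms(const std::function<double(double)>& f,
                    int iterations, double& checksum)
{
    assert(iterations > 0);
    const auto start = std::chrono::steady_clock::now();
    double sum = 0.0;
    for (int k = 0; k < iterations; ++k) {
        sum += f(static_cast<double>(k) / iterations);
    }
    const auto stop = std::chrono::steady_clock::now();
    checksum = sum;
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Pass each candidate as a lambda, for example `[](double a) { return std::asin(a); }`, and run every candidate with the same iteration count.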

The linear table lookup code is posted below and is my usual method for any expensive mathematical function when speed is desired over accuracy. It simply uses a 1024 element table with linear interpolation. It seems to be both the fastest and most accurate of all methods tested, although the built-in asin() is not much slower really (test it!). It can easily be adjusted for more or less accuracy by changing the size of the table.

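// Please test this code before using in anything important!
#include <math.h>
#include <stddef.h>

const size_t ASIN_TABLE_SIZE = 1024;
double asin_values[ASIN_TABLE_SIZE]; // named so it does not clash with asin_table()

// Fill the table with asin(i / ASIN_TABLE_SIZE) for each index i.
int init_asin_table (void)
{
    for (size_t i = 0; i < ASIN_TABLE_SIZE; ++i)
    {
        double f = (double) i / ASIN_TABLE_SIZE; // double, so asin() is not evaluated in float precision
        asin_values[i] = asin(f);
    }

    return 0;
}

// Table lookup with linear interpolation; returns 0 for |a| > 1.
double asin_table (double a)
{
    static int s_Init = init_asin_table(); // Call automatically the first time or call it manually
    (void) s_Init;
    double sign = 1.0;

    if (a < 0)
    {
        a = -a;
        sign = -1.0;
    }

    if (a > 1) return 0;

    double fi = a * ASIN_TABLE_SIZE;
    size_t i = (size_t) fi;
    double decimal = fi - i;

    // The last table interval has no upper neighbour, so it returns pi/2.
    if (i >= ASIN_TABLE_SIZE-1) return sign * 3.14159265359/2;

    return sign * ((1.0 - decimal)*asin_values[i] + decimal*asin_values[i+1]);
}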
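To see how the table size trades against accuracy, something like the following sketch can be used. `table_max_error` is a hypothetical helper, it builds its own table with one extra entry for asin(1), and it only samples up to a `limit` below 1 because the slope of asin() blows up there, so it is not a measurement of the exact code above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: worst absolute error of an n-interval linearly
// interpolated asin table, sampled over [0, limit] with limit < 1.
double table_max_error(std::size_t n, double limit, int samples = 100000)
{
    assert(n >= 2 && limit >= 0.0 && limit < 1.0 && samples >= 1);

    std::vector<double> table(n + 1);
    for (std::size_t i = 0; i <= n; ++i) {
        table[i] = std::asin(static_cast<double>(i) / n);
    }

    double worst = 0.0;
    for (int k = 0; k <= samples; ++k) {
        const double a = limit * k / samples;
        const double fi = a * n;
        std::size_t i = static_cast<std::size_t>(fi);
        if (i >= n) i = n - 1;                 // guard against rounding up to n
        const double frac = fi - i;
        const double approx = (1.0 - frac) * table[i] + frac * table[i + 1];
        worst = std::fmax(worst, std::fabs(approx - std::asin(a)));
    }
    return worst;
}
```

For linear interpolation the error should shrink roughly with the square of the step, so comparing, say, `table_max_error(256, 0.9)` with `table_max_error(1024, 0.9)` gives a feel for what a bigger table buys.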


