CORDIC Arcsine implementation fails
The "single rotate" arcsine goes badly wrong when the argument is just greater than the initial value of 'x', where that is the magical scaling factor -- 1/An ~= 0.607252935 ~= 0x26DD3B6A.
This is because, for all arguments > 0, the first step always has y = 0 < arg, so d = +1, which sets y = 1/An, and leaves x = 1/An. Looking at the second step:
if arg <= 1/An, then d = -1, and the steps which follow converge to a good answer
if arg > 1/An, then d = +1, and this step moves further away from the right answer, and for a range of values a little bigger than 1/An, the subsequent steps all have d = -1, but are unable to correct the result :-(
I found:
arg = 0.607 (ie 0x26D91687), relative error 7.139E-09 -- OK
arg = 0.608 (ie 0x26E978D5), relative error 1.550E-01 -- APALLING !!
arg = 0.685 (ie 0x2BD70A3D), relative error 2.667E-04 -- BAD !!
arg = 0.686 (ie 0x2BE76C8B), relative error 1.232E-09 -- OK, again
The descriptions of the method warn about abs(arg) >= 0.98 (or so), and I found that somewhere after 0.986 the process fails to converge and the relative error jumps to ~5E-02 and hits 1E-01 (!!) at arg=1 :-(
As you did, I also found that for 0.303 < arg < 0.313 the relative error jumps to ~3E-02, and reduces slowly until things return to normal. (In this case step 2 overshoots so far that the remaining steps cannot correct it.)
So... the single rotate CORDIC for arcsine looks rubbish to me :-(
Added later... when I looked even closer at the single rotate CORDIC, I found many more small regions where the relative error is BAD...
...so I would not touch this as a method at all... it's not just rubbish, it's useless.
BTW: I thoroughly recommend "Software Manual for the Elementary Functions", William Cody and William Waite, Prentice-Hall, 1980. The methods for calculating the functions are not so interesting any more (but there is a thorough, practical discussion of the relevant range-reductions required). However, for each function they give a good test procedure.
To review a few things mentioned in the comments:
- The given code outputs values identical to another CORDIC implementation. This includes the stated inaccuracies.
- The largest error is as you approach
arcsin(1)
. - The second largest error is that the values of
arcsin(0.60726)
toarcsin(0.68514)
all return0.754805
. - There are some vague references to inaccuracies in the CORDIC method for some functions including arcsin. The given solution is to perform "double-iterations" although I have been unable to get this to work (all values give a large amount of error).
- The alternate CORDIC implemention has a comment
/* |a| < 0.98 */
in the arcsin() implementation which would seem to reinforce that there is known inaccuracies close to 1.
As a rough comparison of a few different methods consider the following results (all tests performed on a desktop, Windows7 computer using MSVC++ 2010, benchmarks timed using 10M iterations over the arcsin() range 0-1):
- Question CORDIC Code: 1050 ms, 0.008 avg error, 0.173 max error
- Alternate CORDIC Code (ref): 2600 ms, 0.008 avg error, 0.173 max error
- atan() CORDIC Code: 2900 ms, 0.21 avg error, 0.28 max error
- CORDIC Using Double-Iterations: 4700 ms, 0.26 avg error, 0.917 max error (???)
- Math Built-in asin(): 200 ms, 0 avg error, 0 max error
- Rational Approximation (ref): 250 ms, 0.21 avg error, 0.26 max error
- Linear Table Lookup (see below) 100 ms, 0.000001 avg error, 0.00003 max error
- Taylor Series (7th power, ref): 300 ms, 0.01 avg error, 0.16 max error
These results are on a desktop so how relevant they would be for an embedded system is a good question. If in doubt, profiling/benchmarking on the relevant system would be advised. Most solutions tested don't have very good accuracy over the range (0-1) and all but one are actually slower than the built-in asin()
function.
The linear table lookup code is posted below and is my usual method for any expensive mathematical function when speed is desired over accuracy. It simply uses a 1024 element table with linear interpolation. It seems to be both the fastest and most accurate of all methods tested, although the built-in asin()
is not much slower really (test it!). It can easily be adjusted for more or less accuracy by changing the size of the table.
// Please test this code before using in anything important!
const size_t ASIN_TABLE_SIZE = 1024;
double asin_table[ASIN_TABLE_SIZE];
int init_asin_table (void)
{
for (size_t i = 0; i < ASIN_TABLE_SIZE; ++i)
{
float f = (float) i / ASIN_TABLE_SIZE;
asin_table[i] = asin(f);
}
return 0;
}
double asin_table (double a)
{
static int s_Init = init_asin_table(); // Call automatically the first time or call it manually
double sign = 1.0;
if (a < 0)
{
a = -a;
sign = -1.0;
}
if (a > 1) return 0;
double fi = a * ASIN_TABLE_SIZE;
double decimal = fi - (int)fi;
size_t i = fi;
if (i >= ASIN_TABLE_SIZE-1) return Sign * 3.14159265359/2;
return Sign * ((1.0 - decimal)*asin_table[i] + decimal*asin_table[i+1]);
}