Numerically stable method for solving quadratic equations
This answer assumes that the primary concern here is robustness with regard to accuracy, rather than robustness with regard to overflow or underflow in intermediate floating-point computations. The question indicates an awareness of the problem of subtractive cancellation when the commonly used mathematical formula is applied directly using floating-point arithmetic, and the techniques to work around it.
An additional issue to be considered is the accurate computation of the term b²-4ac
. It is examined in detail in the following research note:
William Kahan, "On the Cost of Floating-Point Computation Without Extra-Precise Arithmetic", Nov. 21, 2004 (online)
Recent follow-up work to Kahan's note looked at the more general issue of computing the difference of two products ab-cd
:
Claude-Pierre Jeannerod, Nicolas Louvet, Jean-Michel Muller, "Further analysis of Kahan's algorithm for the accurate computation of 2 x 2 determinants." Mathematics of Computation, Vol. 82, No. 284, Oct. 2013, pp. 2245-2264 (online)
This makes use of the fused multiply-add operation, or FMA, which is available on almost all modern processors including x86-64, ARM64, and GPUs. It is exposed in C/C++ as a standard math function fma()
. Note that on platforms without hardware support for FMA, fma()
must use emulation, which is often quite slow, and some emulations have been found to have serious functional deficiencies.
FMA computes a*b+c
using the full product (neither rounded nor truncated in any way) and applies a single rounding at the end. This allows the accurate computation of the product of two native-precision floating-point numbers as the unevaluated sum of two native-precision floating-point numbers, without resorting to the use of extended-precision arithmetic in intermediate computation: h = a * b
and l = fma (a, b,- h)
where h+l
represents the product a*b
exactly. This provides for the efficient computation of ab-cd
as follows:
/*
diff_of_products() computes a*b-c*d with a maximum error <= 1.5 ulp
Claude-Pierre Jeannerod, Nicolas Louvet, and Jean-Michel Muller,
"Further Analysis of Kahan's Algorithm for the Accurate Computation
of 2x2 Determinants". Mathematics of Computation, Vol. 82, No. 284,
Oct. 2013, pp. 2245-2264
*/
double diff_of_products (double a, double b, double c, double d)
{
double w = d * c;
double e = fma (-d, c, w);
double f = fma (a, b, -w);
return f + e;
}
With this building block, the real roots of a quadratic equation can be computed with high accuracy as follows, provided the discriminant is positive:
/* compute the real roots of a quadratic equation: ax² + bx + c = 0,
provided the discriminant b²-4ac is positive
*/
void solve_quadratic (double a, double b, double c, double *x0, double *x1)
{
double q = -0.5 * (b + copysign (sqrt (diff_of_products (b, b, 4.0*a, c)), b));
*x0 = q / a;
*x1 = c / q;
}
In extensive testing with test cases that do not overflow or underflow in intermediate computation, the maximum error observed in the computed solutions never exceeded 3 ulps.
The Herbie tool for automatically rearranging floating point expressions to reduce rounding error usually provides a good starting point for addressing errors like this.
In this case you can see its output for the positive root of the quadratic using the online demo, to get these results: