Numerically stable algorithm for solving the quadratic equation when $a$ is very small or $0$

The previous answers correctly identify that there are two quadratic formulas, and that each has a different range of numeric stability; however, they miss the other subtle instabilities.

To recap: the standard quadratic formula, \begin{align*} x=\frac{-b\pm\sqrt{b^2-4\,a\,c}}{2\,a}, \end{align*} is not unique. A second formula can be generated by multiplying the numerator and denominator by the "conjugate" of the numerator, i.e., \begin{align*} x&=\frac{-b\pm\sqrt{b^2-4\,a\,c}}{2\,a}\times\frac{-b\mp\sqrt{b^2-4\,a\,c}}{-b\mp\sqrt{b^2-4\,a\,c}}= \frac{2\,c}{-b\mp\sqrt{b^2-4\,a\,c}}, \end{align*} where $\mp=-(\pm)$. These two expressions are analytically equivalent; however, each is numerically unstable for certain values of the coefficients.

The instability discussed by the other posts is if the term, $4\,a\,c$, is small compared to $b^2$ and the sign of the radical term and $b$ are the same, then catastropic cancellation occurs.

Fortunately, the two quadratic formulas have opposite signs on the radical term for the same roots, thus it is possible to avoid catastrophic cancellation by selecting the stable form. I.e.,

\begin{align*} x_1 &= \frac{-b-{\rm sign}(b)\sqrt{b^2-4\,a\,c}}{2a}& x_2 = \frac{c}{a\,x_1} \end{align*}

For completeness, there are at least three other numerical instabilities in this formula:

When $a=0$: thus the equation is linear and has at most one root given by $-c/b$. It should be noted that if $b=0$ as well, the equation is actually $c=0$ and leaves $x$ undefined/unconstrained!
When $a\neq0$ and $c=0$: thus one root is at $x=0$. In this case, the second form of the quadratic equation will yield a NaN ($0/0$) for the second root. It is best therefore to determine the second root by factoring out the root at zero to give $-b/a$.
Overflow: the floating point representation might over/under-flow during calculation and this is most critical for the $b$ coefficient. Defining $\mathcal{R}_{max}$ as the largest representable floating point number ($\approx1.8\times10^{308}$ for double precision) then $b^2$ will overflow to infinity at $|b|>\sqrt{\mathcal{R}_{max}}$ according to the IEEE 754 specification (which is "only" $\approx1.304\times10^{154}$). You might not define this as an instability, more an out of range error, but I think the reduced upper range of $b$ is surprising and certainly caught me out!
??: I'm pretty sure I've missed some more, floating point arithmetic can be very subtle.

I have written a compile-time algebra library and you can take a look at my implementation of the quadratic formula. Below it is a "more-stable" implementation of the cubic formula, shamelessly stolen from D. Herbison-Evans. For higher-order polynomials I rely on bisection techniques based around Budan's theorem which isn't nearly as difficult to make stable!

Sometimes the Citardauq Formula $$x=\frac{2c}{-b\mp \sqrt{b^2-4ac}}$$ can be useful. One chooses the sign that gives no cancellation in the denominator.

Note that only for one from $x_1=\frac{-b-\sqrt{b^2-4ac}}{2a}$ and $x_2\frac{-b+\sqrt{b^2-4ac}}{2a}$ you do "close subtraction", so you can calculate one of them and the second using Viete'a furmula: $x_1 x_2=\frac{c}{a}$.

Numerically stable algorithm for solving the quadratic equation when $a$ is very small or $0$

Tags:

Numerical Methods

Quadratics

Related

Recent Posts