Why does factorizing of polynomials work

I totally recommend getting a copy of first edition of Hungerford's algebra. He really explains abstract algebra starting with the integers: primes and divisibility, then he goes to integer polynomials, then he goes and goes. He starts out with the ideas that elementary students learn (prime numbers) and talks about why unique factorizations exist (Fundamental Theorem of Arithmetic), then shows that polynomial factorizations are basically the same. Lots of examples, lots of problems, and he really explains how things work (also the first edition is really cheap... the later editions go at it like more modern books and introduce groups first... but Hungerford in the 1st edition does the simple and easy to understand examples and ideas first... I just found it much easier to learn.). I really like the book. A really useful thing that you might not have thought about is how being able to factor a number like $21$ into primes $3$ and $7$ is related to factoring a polynomial $x^2 + x - 6$ into smaller polynomials $x-2$ and $x+3$... but actually $x-2$ and $x-3$ are irreducible polynomials: you can't really write it as a product of smaller polynomials, just like $3$ and $7$ cannot be written as the product of smaller numbers (except 1... but the same goes with those linear polynomials, you can write $x-2 = \frac{1}{2}(2x - 2)$... but that really did not do anything special, just like writing $7 = 7 \times 1$ really did not do anything special.)

But to answer your question: polynomials cannot always be factored. Consider $x^2 + 1$. It cannot be factored because if could you could write it as $(x+a)(x+b) = x^2 + (a+b)x + ab,$ that would mean that $ab = 1$ and $a + b = 0$. But solving for $b$ gives: $a = -b$, so plugging this into the other equation relating $a$ and $b$ gives us: $$1 = ab = (-b)b = -b^2,$$ which means that $b^2 = -1$. So, since the square of any real number is positive or zero, we see that we cannot factor it.... using just real numbers. But we can using complex numbers. If $i = \sqrt{-1}$, then $b = i$ and $a = -i$ gives a factorization: $$x^2 + 1 = (x+i)(x-i).$$

In general, if you know a polynomial $f(x)$ (that is not a constant) has a root $r$ (which means that $f(r) = 0$), then $f$ can be factored as $f(x) = (x-r)q(x),$ where $q$ is another polynomial of degree one smaller than $f$. So, if we know that every polynomial that is not a constant has a root, either we are done and $q$ is a constant or $q$ is not a constant then there is a root $r'$ of $q(x)$, so $q(x) = (x-r')q'(x)$ for some polynomial $q'(x)$ of degree one less than $q(x)$. So, we can write $f(x) = (x-r)(x-r')q'(x)$... and we can keep factoring.

So, the complex numbers have the property that every polynomial has a root. (This is a deep result, called the Fundamental Theorem of Algebra) So, this means that any polynomial with real coefficients can be factored using complex numbers. If you just want to factor it with real coefficients then you cannot always factor it, but you can factor it into quadratics and linear terms. E.g. $$x^4-1 = (x^2 - 1)(x^2 + 1) = (x+1)(x-1)(x^2 + 1).$$ The reason is that every odd degree polynomial with real coefficients has a real root by the intermediate value theorem, so I can always factor an odd polynomial $f(x)$ as $(x-r)q(x)$, but $q(x)$ is even and I might not be able to factor it. (An other example is $f(x) = x^5 - x^4 + x - 1 = (x-1)(x^4 + 1)$)


As I'm sure you're aware, you can represent a polynomial as either $p(x)=a_0+a_1x+a_2x^2+\ldots$ or as $p(x)=(x-r_1)(x-r_2)\ldots$ where each $r_k$ is a root of $p(x)$. In other words, as a sum of powers of $x$, or as a product of roots.

So, to my mind, there's a few interesting questions here, listed in order of difficulty.

  1. Why are $r_k$ roots of $p(x)$?
  2. Why can we write $p(x)$ as both a product and a sum, simultaneously?
  3. Are the product and sum formulae equal?
  4. Can we write every polynomial as a product of roots?

$r_k$ are roots of $p(x)$

Starting off with an easy one, each $r_k$ must be a root of $(x-r_1)(x-r_2)\ldots$ since we can substitute in $r_k$ and find that one of the bracketed terms will be $(r_k-r_k)=0$ so the whole expression must be $0$. Hence, each $r_k$ is a root of $p(x)$.

$p(x)$ can be expressed as both a product and as a sum

We demonstrate this algebraically by multiplying out all the factors. Notice that as we multiply everything out, we pick either $x$ or $r_k$ from each bracket. So one term will be a product of $x$, one term will be only $r_k$, and all other combinations in-between.

$$\begin{aligned}(x-r_1)(x-r_2)\ldots(x-r_n) &=(\underbrace{x\times x\ldots}_n)+\ldots+(-1)^n(r_1r_2\ldots r_n) \\&=x^n+b_{n-1}x^{n-1}+\ldots+b_1x+b_0\end{aligned}$$

Where, each $b_k$ is a constant, made of some combination of $r_k$. So now, we've got a representation of $p(x)$ as a sum of powers of $x$, as desired. In fact, the value of each coefficient, $b_k$, is given by Vietas's Formulas. You may ask why there's no coefficient, $b_k$ for $x^n$; it was just easier to read this way but you could multiply both sides by $b_k$.

The product and sum formulae are the same: $\sum a_kx^k=\prod(x-r_k)$ (Matching of polynomial coefficients)

We know that $p(x)=a_0+a_1x+\ldots$ but we've just shown that $p(x)=b_0+b_1x+\ldots$. For me, the next question I'd ask is whether $a_k$ and $b_k$ are the same coefficients. Sure, we've shown that the polynomial equals a sum of powers of $x$ but is it the only possible sum?

Well for our first representation, $p(0)=a_0$ and in our second representation, $p(0)=b_0$. So $p(0)=a_0=b_0$, so we can subtract $a_0$ or $b_0$ from each representation, then divide through by $x$ to see that $a_1+a_2x+\ldots=b_1+b_2x+\ldots$. So then, we can substitute $x=0$ again to see that $a_1=b_1$. We can repeat this method to see that each $a_k=b_k$, so the two representations are identical.

This means our product is the exact same thing as the sum $a_0+a_1x+\ldots$. So we could also have gone the other way around, and factorized $a_0+a_1x+\ldots$ into $(x-r_1)(x-r_2)\ldots$, which may answer your question. But this is all assuming those roots actually exist. How do we know they do?

Polynomials, of degree $n$, have exactly $n$ complex roots

This is the hardest part to prove and I imagine it's what you were most interested in. Why is it valid to say that $p(x)=(x-r_1)(x-r_2)\ldots$? Why should $p(x)$ have any roots, anyway? We assumed it did, but we didn't prove it.

As @flawr and @user357980 have pointed out, what we're now looking for is the Fundamental Theorem of Algebra. This theorem tells us that every $n$th degree polynomial, has $n$ roots.*

What we have to watch out for is that the roots of the polynomial may be Complex Numbers. We could have a polynomial like $p(x)=x^2+1$, where the way to factorise it is as $p(x)=(x+i)(x-i)$. Nevertheless, though we may have to use complex numbers, this theorem gives us the knowledge that we can factorize polynomials.

Please let me know if you think I've made any errors.


* To be more precise, the polynomials need to be non=constant and with coefficients and roots in $\mathbb{C}$


You cannot factor every (real) polynomial into (real) linear factors, for example $x^2+1$ cannot be factored further. In order to factor out one linear factor, you need to have degree 3. But in the complex numbers you can - actually you can by definition, as the complex numbers in some sense are defined to be the numbers such that all polynomials can be factored into linear polynomials. This is the Fundamental Theorem of Algebra.

In order to understand more about that, I recommend reading up on complex numbers and how to calculate with them just to get a feeling, then I recommend working through some algebra book. I like recommending "Algebra" by Michael Artin.