Why are the solutions of polynomial equations so unconstrained over the quaternions?
When I was first learning abstract algebra, the professor gave the usual sequence of results for polynomials over a field: the Division Algorithm, the Remainder Theorem, and the Factor Theorem, followed by the Corollary that if $D$ is an integral domain, and $E$ is any integral domain that contains $D$, then a polynomial of degree $n$ with coefficients in $D$ has at most $n$ distinct roots in $E$.
He then challenged us, as homework, to go over the proof of the Factor Theorem and to point out exactly which of the field axioms are used in the proof, where, and how.
Every single one of us missed the fact that commutativity is used.
Here's the issue: the division algorithm (on either side) does hold in $\mathbb{H}[x]$ (in fact, over any ring, commutative or not, in which the leading coefficient of the divisor is a unit). So given a polynomial $p(x)$ with coefficients in $\mathbb{H}$, and a nonzero $a(x)\in\mathbb{H}[x]$, there exist unique $q(x)$ and $r(x)$ in $\mathbb{H}[x]$ such that $p(x) = q(x)a(x) + r(x)$, and $r(x)=0$ or $\deg(r)\lt\deg(a)$. (There also exist unique $q'(x)$ and $s(x)$ such that $p(x) = a(x)q'(x) + s(x)$ and $s(x)=0$ or $\deg(s)\lt\deg(a)$.)
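To make this concrete, here is a minimal sketch in Python of right division by a monic linear factor $x-a$ in $\mathbb{H}[x]$. The 4-tuple encoding $(w,x,y,z) = w+xi+yj+zk$ and the helper names (`qmul`, `divide_right`, and so on) are my own ad hoc choices, not any standard library:

```python
# A minimal sketch, not a library: quaternions as 4-tuples (w, x, y, z)
# standing for w + xi + yj + zk; polynomials as lists of coefficients
# indexed by degree.

def qmul(p, q):
    """Hamilton product of two quaternions."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2)

def qadd(p, q):
    return tuple(s + t for s, t in zip(p, q))

def divide_right(p, a):
    """Divide p(x) by (x - a) on the right: p(x) = q(x)(x - a) + r.
    Horner-style synthetic division, q_{i-1} = p_i + q_i * a; it needs
    no commutativity because (x - a) is monic."""
    q = [None] * (len(p) - 1)
    carry = (0, 0, 0, 0)                      # plays the role of q_i
    for i in range(len(p) - 1, 0, -1):
        carry = qadd(p[i], qmul(carry, a))
        q[i - 1] = carry
    r = qadd(p[0], qmul(q[0], a))             # remainder r = p_0 + q_0 * a
    return q, r

one, zero = (1, 0, 0, 0), (0, 0, 0, 0)
i, j, k = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)

# x^2 + 1 divided by (x - i): quotient x + i, remainder 0.
print(divide_right([one, zero, one], i))  # ([(0,1,0,0), (1,0,0,0)], (0,0,0,0))

# jx + k divided by (x - i): quotient j, remainder ji + k = -k + k = 0,
# so jx + k = j(x - i) even though j does not commute with i.
print(divide_right([k, j], i))            # ([(0,0,1,0)], (0,0,0,0))
```

In both runs the remainder comes out equal to the left evaluation $b_0 + b_1a + \cdots + b_na^n$ of the dividend at $a$; what can and cannot be done with evaluation is exactly the subject of what follows.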
The usual argument runs as follows: given $a\in\mathbb{H}$ and $p(x)$, divide $p(x)$ by $x-a$ to get $p(x) = q(x)(x-a) + r$, with $r$ constant. Evaluating at $a$ we get $p(a) = q(a)(a-a)+r = r$, so $r=p(a)$. Hence $a$ is a root if and only if $(x-a)$ divides $p(x)$.
If $b$ is a root of $p(x)$ with $b\neq a$, then evaluating at $b$ we have $0=p(b) = q(b)(b-a)$; since $b-a\neq 0$, we get $q(b)=0$, so $b$ must be a root of $q$; since $\deg(q)=\deg(p)-1$, the inductive hypothesis tells us that $q(x)$ has at most $\deg(p)-1$ distinct roots, so $p$ has at most $\deg(p)$ distinct roots.
And that is where we are using commutativity: to go from $p(x) = q(x)(x-a)$ to $p(b) = q(b)(b-a)$.
Let $R$ be a ring, and let $a\in R$. Then $a$ induces a set-theoretic map from $R[x]$ to $R$, "evaluation at $a$", $\varepsilon_a\colon R[x]\to R$, given by $$\varepsilon_a(b_0+b_1x+\cdots + b_nx^n) = b_0 + b_1a + \cdots + b_na^n.$$ This map is a group homomorphism, and if $a$ is central it is also a ring homomorphism; if $a$ is not central, then it is not a ring homomorphism: if $b\in R$ satisfies $ab\neq ba$, then $bx = xb$ in $R[x]$, but $\varepsilon_a(x)\varepsilon_a(b) = ab\neq ba = \varepsilon_a(xb)$.
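Here is a quick numeric check of that failure with $a=i$ and $b=j$, reusing the same ad hoc 4-tuple encoding and `qmul` helper as in the sketch above:

```python
def qmul(p, q):
    """Hamilton product of quaternions encoded as (w, x, y, z)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2)

i, j = (0, 1, 0, 0), (0, 0, 1, 0)

# In H[x] the indeterminate is central, so xj and jx are the same polynomial,
# with the single coefficient j in degree 1; evaluating it at i gives j*i.
print(qmul(j, i))   # eps_i(xj) = ji = -k, printed as (0, 0, 0, -1)

# But the product of the separate evaluations is eps_i(x) * eps_i(j) = i*j.
print(qmul(i, j))   # ij = +k, printed as (0, 0, 0, 1) -- not equal
```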
The "evaluation" map also induces a set theoretic map from $R[x]$ to $R^R$, the ring of all $R$-valued functions in $R$, with the pointwise addition and multiplication ($(f+g)(a) = f(a)+g(a)$, $(fg)(a) = f(a)g(a)$); the map sends $p(x)$ to the function $\mathfrak{p}\colon R\to R$ given by $\mathfrak{p}(a) = \varepsilon_a(p(x))$. This map is a group homomorphism, but it is not a ring homomorphism unless $R$ is commutative.
This means that from $p(x) = q(x)(x-a) + r(x)$ we cannot in general conclude that $p(c) = q(c)(c-a) +r(c)$ unless $c$ commutes in $R$ with $a$. So the Remainder Theorem may fail to hold (if the coefficients involved do not commute with $a$ in $R$), which in turn means that the Factor Theorem may fail to hold, so one has to be careful with the statements (see Marc van Leeuwen's answer). And even when both of them hold for the particular $a$ in question, the inductive argument will fail if $b$ does not commute with $a$, because we cannot go from $p(x) = q(x)(x-a)$ to $p(b)=q(b)(b-a)$.
This is exactly what happens with, say, $p(x) = x^2+1$ in $\mathbb{H}[x]$. We are fine as far as showing that, say, $x-i$ is a factor of $p(x)$, because it so happens that when we divide by $x-i$, all coefficients involved centralize $i$ (we just get $(x+i)(x-i)$). But when we try to argue that any root different from $i$ must be a root of $x+i$, we run into the problem that we cannot guarantee that $b^2+1$ equals $(b+i)(b-i)$ unless we know that $b$ centralizes $i$. As it happens, the centralizer of $i$ in $\mathbb{H}$ is $\mathbb{R}[i]$, so we only conclude that the only other complex root is $-i$. But this leaves open the possibility that there may be roots of $x^2+1$ that do not centralize $i$, and that is exactly what occurs: $j$, and $k$, and all numbers of the form $ai+bj+ck$ with $a^2+b^2+c^2=1$ are roots, and if either $b$ or $c$ is nonzero, then they don't centralize $i$, so we cannot go from $x^2+1 = (x+i)(x-i)$ to "$(ai+bj+ck)^2+1 = (ai+bj+ck+i)(ai+bj+ck-i)$".
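Both the sphere of roots and the failure of the factored form can be checked numerically; again a sketch using the same ad hoc 4-tuple encoding and helpers:

```python
import math, random

def qmul(p, q):
    """Hamilton product of quaternions encoded as (w, x, y, z)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2)

qadd = lambda p, q: tuple(s + t for s, t in zip(p, q))
qsub = lambda p, q: tuple(s - t for s, t in zip(p, q))
one, i, j = (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0)

# A random point ai + bj + ck on the unit sphere of pure quaternions.
v = [random.gauss(0, 1) for _ in range(3)]
n = math.sqrt(sum(t * t for t in v))
u = (0, v[0] / n, v[1] / n, v[2] / n)

print(qadd(qmul(u, u), one))  # u^2 + 1 = (0, 0, 0, 0) up to rounding: a root

# Substituting b = j into the *factored* form gives something else entirely:
print(qmul(qadd(j, i), qsub(j, i)))   # (j+i)(j-i) = 2k, i.e. (0, 0, 0, 2)
print(qadd(qmul(j, j), one))          # while j^2 + 1 = (0, 0, 0, 0)
```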
And that is what goes wrong, and that is where commutativity is hiding.
The finiteness of the number of roots of a polynomial $f(x)\in K[x]$ where $K$ is a field depends on two interlaced facts:
$K[x]$ is a Unique Factorization Domain: every polynomial $f(x)$ factors in an essentially unique way as a product of irreducibles;
if $f(\alpha)=0$ then $f(x)=(x-\alpha)g(x)$ where $\deg g(x)=(\deg f(x))-1$.
The combination of these two facts (the first one in particular) no longer holds if you think of the polynomial $f(x)$ as a polynomial with coefficients in the ring $\Bbb H$ of Hamilton quaternions. This is because the latter is not commutative.
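As a concrete failure of uniqueness, $x^2+1$ splits into monic linear factors in two genuinely different ways, $(x+i)(x-i)$ and $(x+j)(x-j)$. The sketch below, with the same ad hoc 4-tuple quaternion encoding as in the previous answer, expands both products (recall that $x$ itself is central in $\Bbb H[x]$):

```python
def qmul(p, q):
    """Hamilton product of quaternions encoded as (w, x, y, z)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2)

def qadd(p, q):
    return tuple(s + t for s, t in zip(p, q))

def polymul(f, g):
    """Product of two polynomials over H, coefficients listed by degree."""
    out = [(0, 0, 0, 0)] * (len(f) + len(g) - 1)
    for m, fm in enumerate(f):
        for n, gn in enumerate(g):
            out[m + n] = qadd(out[m + n], qmul(fm, gn))
    return out

one, i, j = (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0)
neg = lambda p: tuple(-t for t in p)

# Both products expand to [1, 0, 1], i.e. the polynomial x^2 + 1.
print(polymul([i, one], [neg(i), one]))   # (x + i)(x - i)
print(polymul([j, one], [neg(j), one]))   # (x + j)(x - j)
```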
You may also ponder this fact: in a commutative environment, the conjugation map $a\mapsto\phi_h(a)=hah^{-1}$ is always trivial. Not so in $\Bbb H$, again as a side effect of non-commutativity. The point is that if an element $a$ satisfies an algebraic relation with real coefficients (such as $a^2=-1$), then so do all its conjugates $\phi_h(a)$: the reals are central in $\Bbb H$, so $\phi_h$ fixes the coefficients and $\phi_h(a)^2 = \phi_h(a^2) = -1$.
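For instance, conjugating $i$ by $h = 1+j$ produces $-k$, a root of $x^2+1$ far from $\pm i$; a quick check with the same ad hoc quaternion encoding used above:

```python
def qmul(p, q):
    """Hamilton product of quaternions encoded as (w, x, y, z)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2)

def qinv(p):
    """Inverse of a nonzero quaternion: conjugate divided by squared norm."""
    w, x, y, z = p
    n2 = w * w + x * x + y * y + z * z
    return (w / n2, -x / n2, -y / n2, -z / n2)

i = (0, 1, 0, 0)
h = (1, 0, 1, 0)                    # h = 1 + j
phi = qmul(qmul(h, i), qinv(h))     # phi_h(i) = h i h^{-1}
print(phi)                          # (0.0, 0.0, 0.0, -1.0): that's -k
print(qmul(phi, phi))               # (-1.0, 0.0, 0.0, 0.0): still squares to -1
```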
I would like to emphasize a point which is made in Arturo Magidin's answer but perhaps in different words: if $D$ is a noncommutative division ring, then the ring $D[x]$ of polynomials over $D$ does not do what you want it to do.
If $F$ is a field, then one reason you might care about working with polynomials $F[x]$ is that they describe all the expressions you could potentially get from some unknown $x \in F$ (or perhaps $x \in \bar{F}$ or perhaps something even more general than this) via addition and multiplication.
Why does this break down when you replace $F$ with a noncommutative division ring $D$? The problem is that if you work with some unknown $x \in D$ (or in some ring containing $D$) then $x$, by assumption, doesn't necessarily commute with every element in $D$, so starting from $x$ and adding and multiplying you get not only expressions like $$a_0 + a_1 x + a_2 x^2 + ...$$
but more complicated expressions like $$a_0 + a_{1,0} x + x a_{1,1} + a_{1, 2} x a_{1, 3} + a_{2,0} x^2 + x a_{2,1} x + x^2 a_{2,2} + a_{2, 3} x^2 a_{2,4} + a_{2, 5} x a_{2, 6} x a_{2,7} + ... $$
The resulting algebraic structure is quite a bit more complicated than $D[x]$. Already you can't in general combine expressions of the form $axb$ and $cxd$, so even to describe the expressions you can get by using $x$ once, I should really have written $$a_0 + a_{1,0} x a_{1,1} + a_{1,2} x a_{1,3} + a_{1,4} x a_{1,5} + ...$$
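To see concretely that such expressions don't collapse, consider the toy example $L(x) = ixj + jxi$ (my own choice, checked below with the ad hoc 4-tuple quaternion encoding from the earlier answers). If $L$ were of the single-sandwich form $exf$, then $L(1) = ef = 0$ would force $e = 0$ or $f = 0$, making $L$ identically zero; but $L(i) = -2j \neq 0$.

```python
def qmul(p, q):
    """Hamilton product of quaternions encoded as (w, x, y, z)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2)

qadd = lambda p, q: tuple(s + t for s, t in zip(p, q))
one, i, j = (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0)

def L(x):
    """The expression i*x*j + j*x*i: each term uses x once, yet the sum
    cannot be rewritten as a single sandwich e*x*f."""
    return qadd(qmul(qmul(i, x), j), qmul(qmul(j, x), i))

print(L(one))   # (0, 0, 0, 0): any e*x*f equal to L would need ef = 0
print(L(i))     # (0, 0, -2, 0): yet L is not the zero map
```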