How can Busy beaver($10 \uparrow \uparrow 10$) have no provable upper bound?
The Wikipedia article explains exactly what it means. Here is an elaboration.
Define the complexity of a number $n$ to be the smallest number of states for a Turing machine that returns $n$ when run with $0$ as an input. Certainly every number has a complexity. Now let $T$ be any effective formal theory whose language includes statements of the form "$n$ has complexity greater than $k$" for each $n$ and $k$. Some of these sentences may be provable in the theory, and some may not. Say that the theory is complexity sound if each statement "$n$ has complexity greater than $k$" that is provable in the theory is correct. Both PA and ZFC are effective, complexity sound theories.
Theorem. For any effective, complexity sound theory $T$, there is some $k$ such that $T$ does not prove, for any number $n$, that $n$ has complexity greater than $k$.
Proof by contradiction. Assume $T$ is complexity sound and proves statements of the form "$n$ has complexity greater than $k$" for arbitrarily large $k$. Consider a program $P$ that does the following when run with $0$ as an input. First, $P$ computes its own source code (like a quine; formally, the proof uses Kleene's recursion theorem). Then $P$ counts the number of states $s$ in this source code. Next, $P$ enumerates all statements of the form "$n$ has complexity greater than $k$" that are provable from $T$ until it finds one with $k > s$; by assumption, it will find one. Finally, $P$ returns the $n$ from that statement. Because $T$ is an effective theory, $P$ can do all this computably. But then $n$ is returned by $P$, which has $s$ states, so $n$ has complexity at most $s$. Yet $T$ proves "$n$ has complexity greater than $k$" for some $k > s$, so $T$ is not complexity sound, which is a contradiction. That completes the proof.
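The search step of $P$ can be sketched in Python. This is only an illustration under loud assumptions: `toy_theorems` is a hypothetical stand-in for enumerating the theorems of $T$ (a real $P$ would enumerate actual proofs), and `OWN_STATES` is a made-up value for $s$ that a real $P$ would obtain from its own source code via the recursion theorem.

```python
from itertools import count
from typing import Iterator, Tuple

# A pair (n, k) stands for the statement "n has complexity greater than k".
Theorem = Tuple[int, int]


def search(theorems: Iterator[Theorem], own_states: int) -> int:
    """Return n from the first enumerated theorem (n, k) with k > own_states."""
    for n, k in theorems:
        if k > own_states:
            return n
    raise RuntimeError("enumeration ended without a suitable theorem")


def toy_theorems() -> Iterator[Theorem]:
    """Hypothetical enumeration; it does NOT reflect what any real theory proves."""
    for k in count(1):
        # Pretend T proves "2**k has complexity greater than k".
        yield (2 ** k, k)


OWN_STATES = 5  # hypothetical value of s
result = search(toy_theorems(), OWN_STATES)
```

The contradiction in the proof comes from the fact that whatever `search` returns has been produced by a program of size `own_states`, while the theorem it was taken from claims a complexity strictly above that size.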
Thus there is a bound on the largest $k$ such that $T$ proves "$n$ has complexity greater than $k$" for any $n$. In fact, we can actually write down $P$ explicitly, and thus $k$ is no more than the number of states of the version of $P$ we write down. For any reasonable theory, like PA or ZFC, this $k$ will be huge but not anywhere near $10\uparrow\uparrow 10$.
If a sufficiently strong theory cannot prove any $n$ has complexity larger than $k$, then it also cannot prove any statement of the form "$m$ is an upper bound on $\Sigma(k)$", because if it could prove that it would also prove "$m+1$ has complexity greater than $k$". So in particular neither PA nor ZFC can prove an upper bound on $\Sigma(k)$ when $k$ is sufficiently large.
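Spelled out, writing $C(n)$ for the complexity of $n$ (notation introduced here only for convenience) and reading $\Sigma(k)$ as the largest number returned by a halting machine with at most $k$ states, the step is:

$$T \vdash \Sigma(k) \le m \quad\Longrightarrow\quad T \vdash C(m+1) > k,$$

because a machine with at most $k$ states that returned $m+1$ would force $\Sigma(k) \ge m+1 > m$.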
Just as the Wikipedia article says, the theorem above is just a recasting of Chaitin's incompleteness theorem in which the length of the program is replaced with the number of states.
In fact, even $\Sigma(7918)$ has no provable upper bound in ZFC, assuming SRP is consistent.
This was shown by Adam Yedidia and Scott Aaronson. Aaronson talks about the result here.
They constructed a machine $Z$ that ZFC cannot prove runs forever, although it does run forever assuming additional axioms, namely the SRP (Stationary Ramsey Property).
If $\Sigma(7918)$ had a provable upper bound in ZFC, then $Z$ could just be run for that many steps to determine whether it halts, so ZFC would prove that $Z$ runs forever. Contradiction.
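The "run it for that many steps" idea can be sketched as a bounded simulation. This is a minimal sketch, not the Yedidia and Aaronson construction: it treats the bound as a step budget, as the argument above does. The transition-table format, the name `halts_within`, and the example machines are assumptions chosen for illustration. `BB2` is the well-known 2-state busy beaver champion, used only as a small test case.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple

# (state, symbol read) -> (symbol to write, head move "L" or "R", next state)
Machine = Dict[Tuple[str, int], Tuple[int, str, str]]


def halts_within(machine: Machine, max_steps: int,
                 start: str = "A", halt: str = "H") -> Optional[int]:
    """Run on a blank tape; return the step count at halting, or None if the
    machine has not halted after max_steps steps."""
    tape = defaultdict(int)  # blank tape: every cell reads 0
    pos, state = 0, start
    for step in range(1, max_steps + 1):
        write, move, state = machine[(state, tape[pos])]
        tape[pos] = write
        pos += 1 if move == "R" else -1
        if state == halt:
            return step
    return None


# The 2-state busy beaver champion: halts after 6 steps.
BB2: Machine = {
    ("A", 0): (1, "R", "B"),
    ("A", 1): (1, "L", "B"),
    ("B", 0): (1, "L", "A"),
    ("B", 1): (1, "R", "H"),
}

# A machine that moves right forever and never halts.
LOOP: Machine = {
    ("A", 0): (0, "R", "A"),
    ("A", 1): (0, "R", "A"),
}
```

With a proven bound in hand, a `None` result at that budget would amount to a proof that the machine never halts, which is exactly what ZFC cannot have for $Z$.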
While what has been discussed in the comments is relevant (there exist individual Turing machines which have undecidable Halting problems), I doubt this is what the statement in the Wikipedia article is referring to, mainly because there's absolutely no need to mention a bound as high as $10 \uparrow \uparrow 10$ to find a Busy Beaver with an undecidable Halting problem.
Any universal Turing machine has an undecidable Halting problem, for if it didn't, you would be able to solve the Halting problem for any machine it simulates. There are universal Turing machines with only a handful of states (see e.g. this paper of Neary and Woods). The restriction that a Busy Beaver starts with a blank tape isn't of consequence, because the machine can just write a machine description onto the tape and then proceed to simulate it.
I suspect the phrase "ordinary mathematics" is an oblique reference to first-order Peano Arithmetic or similar, and the statement is referring to some proof that any proof of a bound for TMs with $10 \uparrow \uparrow 10$ states must use transfinite methods, even if the TM is total (like the Hydra rewriting game, or something in a similar vein). But this is just a guess.
("Ordinary mathematics" normally suggests ZFC to me, but a proof that this result is independent of ZFC would be too astonishing for me to take that interpretation here.)