Are there non-standard counterexamples to the Fermat Last Theorem?
I am going to respond to two questions quoted below, which come from this comment. Here "TP" means "transfer principle". I've switched the order of the questions.
... why it does not prove that any sentence for which TP is preserved and which is true in the standard model, must be true in all models, and hence provable in PA.
Here is an informal summary of how the hyperreal construction works. If we begin with any structure $M$, with an associated formal language, and take an ultrapower, we will obtain a structure $M^*$. By Los's theorem, $M^*$ will satisfy the same sentences as $M$ in that formal language. In the cases of interest, $M$ has an internal real line $\mathbb{R}$ and an internal semiring of naturals $\mathbb{N}$. The ultraproduct $M^*$ with then have an internal "hyperreal" line $\mathbb{R}^*$ and an internal semiring of "hypernaturals" $\mathbb{N}^*$.
If we begin with different models $M_1$ and $M_2$ (e.g. if one of them is nonstandard), we simply obtain different models $M_1^*$ and $M_2^*$. Sentences from the appropriate language are true in $M_1$ if and only if they are true in $M_1^*$, and true in $M_2$ if and only if they are true in $M_2^*$, but if there was no connection between $M_1$ and $M_2$ to begin with then the ultrapower construction can't create one. The transfer principle only holds between each individual model $M$ and its ultrapower $M^*$.
At this point I am most curious as to why a naive transplantation of the TP argument to "nonstandard" embedding does not (indirectly) prove that FLT holds in all models of PA, even if it does not provide any means of (directly) converting Wiles's proof into PA.
The original question is right - "Wiles's proof uses not as much analysis as high end algebraic geometry". This is bad for nonstandard analysis, though, because when we make the hyperreal line we normally begin with a structure $M$ that is just barely more than the field of real numbers. This is good if we want to work with the kinds of constructions used in elementary analysis, but not as useful if we need to work with more general set-theoretic constructions.
Wiles' proof as literally read uses various set-theoretic constructions of "universes". So, to try to approach that in a nonstandard setting, we would want to start with a model $M$ that is much more than just a copy of the real line. That would not seem to help us show that FLT holds in every model of PA.
The thing about the proof is that the "literal" reading is too strong. I am no expert in the algebraic geometry used, but I followed the discussions in several forums, and here is the situation as I understand it. The proof relies on several general lemmas which, to be proved in utmost generality, were proved using very strong methods. However, in concrete cases such as FLT only weaker versions of the lemmas are needed, and experts seem to believe that the weaker versions should be provable in PA. But actually working out the proof would require a lot of effort to prove in PA the special, weaker cases of all the necessary lemmas and then combine them to get FLT.
This is very analogous in my mind to the fact that in logic we often prove things using strong axioms, but experts in logic recognize that these things are also provable, in concrete cases, in much weaker systems. We don't generally dwell on that, or even point it out, unless there is a specific reason. Indeed, the number of things which I know are provable in PA is much larger than the number for which I have ever written out a proof in PA.
Colin McLarty has published several papers on the axioms needed for FLT. You could look at What does it take to prove Fermat's Last Theorem? from the Bulletin of Symbolic Logic which is relatively accessible.