Why use ZF over NFU?
I've never studied NF or NFU in any great detail, but I have found some points rather subtle and potentially not worth the effort to work around.
Stratification. NF cannot escape its roots in Russellian type theory. This means that it is difficult to directly compare sets of different "types". For example, consider the "maps" $X \to \mathscr{P}(X)$ used in Cantor's theorem. This is in some sense illegal in NF: you can only have maps between sets of the same type, and the type of $\mathscr{P}(X)$ is one higher than that of $X$. The standard NF circumlocution here is to talk about maps $\mathscr{P}_1 (X) \to \mathscr{P} (X)$ instead, where $\mathscr{P}_1 (X)$ denotes the set of singleton subsets of $X$. Cantor's proof carries through without a problem once this workaround is adopted... but one has to realise that this is a different theorem. In particular, if $V$ is the universal set, we can only prove that there is no surjection $\mathscr{P}_1 (V) \to \mathscr{P}(V)$ – a far cry from the indubitable fact that $\mathscr{P}(V) \subseteq V$! (In fact, if you have even one urelement, then $\mathscr{P}(V) \subsetneq V$.)
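To see concretely where stratification bites, here is a rough type-count (glossing over the choice of ordered pair, since both the Kuratowski and the Quine pair force the two coordinates to receive the same type). For $f : X \to \mathscr{P}(X)$, Cantor's diagonal set would be $$ B = \{ x \in X : x \notin f(x) \}, $$ and unfolding "$x \in f(x)$" as "$\exists y \, (\langle x, y \rangle \in f \wedge x \in y)$" we need $y$ to be one type higher than $x$ (from $x \in y$) and of the same type as $x$ (from the pair), so no assignment works and stratified comprehension does not deliver $B$. For $f : \mathscr{P}_1 (X) \to \mathscr{P}(X)$ the diagonal set is $\{ x \in X : x \notin f(\{ x \}) \}$, where the pair now ties $y$ to $\{ x \}$ rather than to $x$; giving $x$ type $n$ and both $\{ x \}$ and $y$ type $n + 1$ is a stratified assignment, so the set exists and the familiar argument goes through.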
Failure of cartesian-closedness. Consider the evaluation "map" $\textrm{ev} : Y^X \times X \to Y$. The graph of this "map" cannot be a set in NF for various reasons. [1, 2] Roughly speaking, if $X$ is a set of type $n$, then $Y$ must also be a set of type $n$, in which case $Y^X \subseteq \mathscr{P}(X \times Y)$ is a set of type $n + 1$; but then we are unable to write down a stratified definition of $Y^X \times X$. Thus, the category of NF sets is not cartesian closed!
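To make the obstruction concrete, here is a minimal sketch of the type-count, assuming a type-level ordered pair as is usual in NF. The graph would have to be $$ \mathrm{ev} = \{ \langle \langle f, x \rangle, y \rangle : \langle x, y \rangle \in f \}, $$ and stratification pulls in two directions: the outer pair $\langle f, x \rangle$ requires $f$ and $x$ to receive the same type, while the clause $\langle x, y \rangle \in f$ requires $f$ to receive a type one higher than that of $x$. No assignment satisfies both, which is the same conflict that blocks $Y^X \times X$ above.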
Local types. What I have said so far appears to contradict the (finite) axiomatisation given by Holmes [3], which says e.g. that you can always form the set $X \times Y$ or $Y^X$. But this is only an apparent contradiction. The fact of the matter is that sets in NF do not actually have a type. I like to think of NF as being "locally typed": sets only gain types when they interact with other sets. Thus, $Y^X \times X$ exists (in some sense), but we cannot make much use of it. Holmes writes,
Another fact about stratification restrictions should be noted: a variable free in $\phi$ in a set definition $\{ x \mid \phi \}$ may appear with more than one type without preventing the set from existing, as long as this set definition is not itself embedded in a further set definition. The reason for this is that such a set definition can be made stratified by distinguishing all free variables: the resulting definition is supposed to work for all assignments of values to those free variables, including those in which some of the free variables (even ones of different type) are identified with one another. For example, the set $\{ x, \{ y \} \}$ has a stratified definition; the set $\{ x, \{ x \} \}$ has an unstratified definition, but the existence of $\{ x, \{ y \} \}$ for all values of $x$ and $y$ ensures the existence of $\{ x, \{ x \} \}$ for any $x$. But a term $\{ x, \{ x \} \}$ cannot appear in the definition of a further set in which $x$ is bound.
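Spelling out the type-count behind Holmes' example: in $\{ x, \{ y \} \}$ we may give $x$ type $n$ and $y$ type $n - 1$, so that $\{ y \}$ also has type $n$ and the whole set has type $n + 1$ – a stratified assignment. In $\{ x, \{ x \} \}$ the variable $x$ would need types $n$ and $n - 1$ simultaneously, so no assignment exists; but substituting $y := x$ into the first set shows that $\{ x, \{ x \} \}$ exists all the same.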
Weaknesses. Holmes [4] points out that bare NFU has the same consistency strength as PA, and NFU + Infinity + AC has the same consistency strength as Mac Lane set theory (MAC) (i.e. Zermelo set theory but with only $\Delta_0$-separation). It is true that in many cases we only need $\Delta_0$-separation to form the sets we want – but when induction seeps into the picture we have to start worrying.
Consider the vector space $V$ of all real polynomials. There is no doubt that we can form the dual space $V^* = \textrm{Hom}_\mathbb{R} (V, \mathbb{R})$ in MAC – but Mathias [5] says that MAC cannot prove "the $n$-th iterated dual space exists for all natural numbers $n$". This is because the cardinalities of the iterated dual spaces grow too fast – after all, $V_{\omega + \omega}$ is a model of MAC. This seems contrived, but it is enough for me to start worrying about whether important inductive constructions can be carried out in MAC. And if MAC is not strong enough to do this bit of mathematics, why should NFU + Infinity + Choice be? (The two theories are not bi-interpretable, so it's not literally true that they prove the same theorems.)
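To get a feel for how quickly these duals grow, write $V^{(n)}$ for the $n$-th iterated dual (my notation, not Mathias's). The Erdős–Kaplansky theorem gives $\dim W^* = (2^{\aleph_0})^{\dim W} = 2^{\dim W}$ for any infinite-dimensional real vector space $W$, so $$ \dim V = \aleph_0, \qquad \dim V^{(n)} = \beth_n \ \text{ for } n \geq 1, $$ and hence $|V^{(n)}| = \beth_n$ as well: the iterated duals march up the $\beth$ hierarchy, which gives some feel for why the uniform statement outruns MAC.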
Personally, I'd rather have unrestricted comprehension. But then I may as well ask for a pony...
The problem with NF(U) is that it forces you to deal with the logic all the time. You can't just take an arbitrary property, check that the collection it defines is bounded by some set, and move on, the way Separation lets you in ZF.
You need to keep verifying whether the defining formulas are stratified, and hence whether you can use them to define new sets. Of course, you can use classes (in the sense of definable collections which are not sets), but this is not how we want a foundational set theory to work. We want the sets needed for mathematics to actually be sets.
I have to admit my own ignorance about how much mathematics can be developed within the confines of the logic-pushing theories NF and NFU; however, there are two points to make here:
Mathematics, and indeed much of the human world, is based on "first come, first served". It takes time and incredible force to revolutionize the world. ZF was quite natural in many ways, and it was very usable for people outside set theory. Of course you can always start working in NF(U) and hope it will catch on, write books, and so on.
However, as things stand, ZF is the king of the hill, and the fact is that most mathematicians don't really care what the set-theoretic framework is. They just want to do their mathematics. As it happens, I am currently working on a project related to model categories and models of ZFC, and I see how little attention we pay to many foundational questions. Do we take things which are internally definable? Externally definable? Do we assume there is a model and do this and that?
No. We simply do the mathematics and solve the problems as they come along. ZF is very good and natural for that. Here is an example: suppose I want to argue something about the collection of Hausdorff topologies on the real numbers. I don't need to show that the formula defining a Hausdorff topology on $\mathbb R$ has some syntactic property in order to conclude that this collection is a set. I simply write it down and move along.
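For instance (a sketch of the obvious formalisation, with $T$ just an illustrative name): every topology on $\mathbb R$ is a set of subsets of $\mathbb R$, so a single instance of Separation inside $\mathscr{P}(\mathscr{P}(\mathbb R))$ suffices, $$ T = \{ \tau \in \mathscr{P}(\mathscr{P}(\mathbb R)) : \tau \text{ is a Hausdorff topology on } \mathbb R \}, $$ and no syntactic condition on the defining formula ever needs to be checked.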
We also don't know about the relative consistency of these theories: NFU could be stronger or weaker than ZF, or the two could be equiconsistent. Of course, it is really just a philosophical question how and where you place your foundational "jump" and choose to believe that your mathematics is consistent (you have to assume something, anyway).
The proof of the consistency of NFU assumes the consistency of some other theory, and even the consistency of first-order logic. We don't know, and we will never know, whether our extremely abstract foundational theories are consistent. We will always have to assume that they are.
Most mathematicians (whatever they officially claim) are not pure formalists: they are guided by various conceptions of the mathematical universes they purport to describe. That goes for set theorists too.
Zermelo set theory with choice was introduced (wasn't it?) as a compendium of the principles of set-building that mathematicians actually appealed to. Later, Replacement was thrown into the mix, giving us what we now call ZFC.
Now, at the outset it all looked a bit ad hoc (cooked up to give us a paradox-avoiding toolkit for the working mathematician). And the fact that it "worked" wasn't initially regarded as enough to quiet the search for alternatives.
However, already in Gödel's 1933 lecture the suggestion is emerging that ZFC (with Foundation) is a theory that actually fits a lovely and natural conception of sets as forming a well-founded cumulative hierarchy. It took (it seems, in retrospect) a surprisingly long time for this conception to become canonical: but nowadays most people imbibe it with their first set-theory course -- think of all those diagrams of the universe of pure sets as a V-shaped stack of levels, with the von Neumann ordinals as the vertical spine. So, in short, ZFC now comes with a nice, intuitively graspable story about the universe of sets that it purports to give us a (partial) description of. No wonder it can seem very attractive.
What, on the other hand, does the universe of sets that NF purports to describe "look like"? NF, it seems, comes with no such motivating story -- or at least, not one with such intuitive and natural appeal. (Perhaps it is no coincidence that NF was proposed by that arch-nominalist Quine, who would put no value on our purported intuitive grasp of the structure of the cumulative hierarchy.)
So there's a prima facie contrast: a 'natural' intuitive conception underpinning ZFC vs. (what can seem like) a paradox-blocking syntactic trick giving us NF. That's one rationale -- isn't it? -- for the continued predominance of standard set theory (or of a close variant like Scott-Potter, whose attraction is precisely that it more explicitly starts from the idea of a cumulative hierarchy). I am not here endorsing that rationale: I'm just suggesting that, rightly or wrongly, something like this plays a significant part in the story.