What is so special about set theory anyway?
It's not clear to me whether your question is more about the role of sets versus other foundational objects, or about how set theory can be extended with large cardinal axioms to discuss models of stronger and stronger theories. Monroe's answer addresses the latter. Regarding the former, maybe I'll take this opportunity to push an analogy that I think deserves more currency. In computer science there are notions such as:
A Turing-complete programming language. All Turing-complete languages are expressive enough to simulate each other, and this is essentially the definition of the class. In particular, a language is Turing-complete as soon as (1) it can simulate at least one known Turing-complete language and (2) it can be simulated by at least one known Turing-complete language. Thus, to define Turing-completeness it suffices to give one example of such a language; the classical example is Turing machines.
An NP-complete problem is, by definition, one that is solvable in polynomial time by a nondeterministic algorithm and to which any other such problem can be reduced in polynomial time (by a deterministic algorithm). This means that a problem is NP-complete if, and only if, (1) it can be reduced in polynomial time to some known NP-complete problem, and (2) some known NP-complete problem can be reduced to it in polynomial time. So we could equivalently define the class of NP-complete problems by giving one example.
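To make "reduced in polynomial time" concrete, here is a minimal Python sketch of one textbook reduction between two known NP-complete problems, Independent Set and Vertex Cover. It relies on the standard fact that a graph on n vertices has an independent set of size k exactly when it has a vertex cover of size n - k. The function names are illustrative choices, and the brute-force checkers are exponential; they are there only to sanity-check the reduction on a tiny graph.

```python
from itertools import combinations


def independent_set_to_vertex_cover(vertices, edges, k):
    """Map an Independent Set instance (G, k) to a Vertex Cover instance (G, n - k).

    The map only computes n - k, so it clearly runs in polynomial time.
    Assumes 0 <= k <= len(vertices).
    """
    return vertices, edges, len(vertices) - k


def has_independent_set(vertices, edges, k):
    """Brute force: is there a set of k vertices with no edge inside it?"""
    return any(
        all(not (u in s and v in s) for u, v in edges)
        for s in map(set, combinations(vertices, k))
    )


def has_vertex_cover(vertices, edges, k):
    """Brute force: is there a set of k vertices that touches every edge?"""
    return any(
        all(u in s or v in s for u, v in edges)
        for s in map(set, combinations(vertices, k))
    )


if __name__ == "__main__":
    # A 4-cycle: its largest independent set has size 2.
    V = [0, 1, 2, 3]
    E = [(0, 1), (1, 2), (2, 3), (3, 0)]
    for k in range(len(V) + 1):
        answer_is = has_independent_set(V, E, k)
        answer_vc = has_vertex_cover(*independent_set_to_vertex_cover(V, E, k))
        print(k, answer_is, answer_vc)
```

The same map, read backwards, reduces Vertex Cover to Independent Set, so each of the two problems satisfies both conditions (1) and (2) relative to the other.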
I suggest that we should recognize an analogous notion of a mathematics-complete theory: a theory which is expressive enough that all of mathematics can be encoded into it (modulo suitable extensions such as large cardinal axioms, use of stronger or weaker logics, etc.). To define the class of mathematics-complete theories, it suffices to give one example thereof, and empirical evidence suggests that set theory is one such choice (indeed, the first to be discovered).
Mathematics-complete theories are often called foundations for mathematics. I don't propose to do away with this terminology, but it sometimes produces confusion, since some people seem to mean something more by it. The phrase "mathematics-complete" emphasizes that we mean nothing more (or less) than that all of mathematics can be encoded into a theory, whether or not this encoding is "natural" or "intuitive".
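As a small illustration of an encoding that works without being especially "natural", here is a Python sketch that models hereditarily finite sets with `frozenset` and encodes natural numbers as von Neumann ordinals and ordered pairs as Kuratowski pairs. These two encodings are standard textbook choices rather than anything specified above, and the function names are illustrative.

```python
def successor(n):
    """Von Neumann successor: n + 1 is n union {n}."""
    return n | frozenset([n])


def natural(k):
    """Encode the natural number k as a set: 0 is the empty set, k + 1 is successor(k)."""
    n = frozenset()
    for _ in range(k):
        n = successor(n)
    return n


def pair(a, b):
    """Kuratowski ordered pair: (a, b) is {{a}, {a, b}}."""
    return frozenset([frozenset([a]), frozenset([a, b])])


if __name__ == "__main__":
    zero, one, three = natural(0), natural(1), natural(3)

    # The encoding does its job: k has exactly k elements, and pairs are ordered.
    print(len(three) == 3)
    print(pair(zero, one) != pair(one, zero))

    # It also proves artifacts nobody intended: "1 is an element of 3",
    # and "1 is an element of the ordered pair (0, 0)".
    print(one in three)
    print(one in pair(zero, zero))
```

The last two checks are artifacts of the encoding rather than facts about numbers or pairs, which is the sense in which an encoding can be adequate without being intuitive.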
So I guess my answer to your questions is that one such formal property is "set theory can be encoded into it, and it can be encoded into set theory". The apparent primacy of set theory in this definition is just an artifact of the fact that set theory was the first example of such a theory; an equivalent definition would be "type theory can be encoded into it, and it can be encoded into type theory".
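The claim that the two definitions are equivalent can be spelled out in symbols. This is only a sketch: the relation symbol and the abbreviations ST and TT are notation assumed here, and the argument assumes that "can be encoded into" is transitive and that set theory and type theory can each be encoded into the other.

```latex
% Assumed notation: S \preceq T means "S can be encoded into T"
% (modulo suitable extensions); ST = set theory, TT = type theory.
\[
  T \text{ is mathematics-complete}
  \iff
  \mathrm{ST} \preceq T \ \text{ and } \ T \preceq \mathrm{ST}.
\]
% If \preceq is transitive and ST \preceq TT \preceq ST, then
\[
  \mathrm{ST} \preceq T \preceq \mathrm{ST}
  \iff
  \mathrm{TT} \preceq T \preceq \mathrm{TT},
\]
% because, for the forward direction,
%   TT \preceq ST \preceq T   and   T \preceq ST \preceq TT,
% and the reverse direction is symmetric.
```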
It is my understanding that set theory's interpretive power is a quasi-empirical fact, and that at present there is no grand theoretical explanation of the phenomenon. By "set theory," I mean the mathematical research area, not a specific formal theory.
Set theory has strong interpretive power in the sense that it provides a "measuring stick" for gauging the relative consistency (aka interpretability) of mathematical theories in general: the large cardinal hierarchy. We have no explanation for why consistency strengths should line up along a single measuring stick rather than form a mere partial order, but empirically the hierarchy appears to be well-ordered. Large cardinals have had enough success in proving relative consistency results that we have come to accept them as the arbiters of consistency (of strong enough theories).
That large cardinal concepts are formalized in set theory rather than in some other foundational theory is a matter of history. Things could conceivably have turned out differently. The study of interpretability of weaker theories is typically done in the context of subsystems of second-order arithmetic, where the notion of set plays a much smaller role.
One could raise the following objection to the large cardinal hierarchy as an ultimate arbiter. There are many unsolved relative consistency questions. Proving these statements consistent/independent relative to large cardinals would be considered a solution. But perhaps there are statements that will not be measured by large cardinals because their consistency strength is not comparable to large cardinals. Our belief in the supremacy of large cardinals (or social conformity to the practice) just prevents us from designating other statements as important nodes in the sea of consistency strengths. An incomparable statement would remain invisible to us as such.
I agree with other respondents that it is unlikely that one will be able to come up with some kind of formal argument that distinguishes set theory from other "mathematics-complete" systems (to use Mike Shulman's term, which I like!), because mathematicians are so good at rephrasing one language in terms of another. There is surely also some degree of what one might call "historical accident" involved; we instinctively think of a model as a set because that's how we were taught, and how our teachers were taught, etc.
That said, I think a case can be made that set theory has some psychological advantages when it comes to addressing certain questions, e.g., "Is mathematics consistent?" and "Exactly which assumptions are used to derive which theorems?" Today, most mathematicians take a rather breezy attitude towards the consistency question, assuming that it's all been sorted out by logicians and that it's no concern of theirs. If you take consistency seriously, however, then it is vitally important to try to build everything from the ground up, one small step at a time, in as simple and clear a manner as possible. As long as you limit yourself to finitary mathematics, several alternatives to set theory are available (arithmetic, syntax, type theory, ...), but for infinitary reasoning, set theory seems to be the psychologically best choice for most people.
Even for finitary reasoning, set theory has the advantage of being a very flexible and fine-grained tool. In reverse mathematics, the standard approach is to consider subsystems of second-order arithmetic. Arithmetic is certainly a natural foundation for finitary mathematics, but it quickly becomes an irresistible temptation to introduce set-theoretic reasoning because it makes it so easy to introduce any new concept or axiom that you might come up with. That is why "second-order" arithmetic becomes the de facto foundation.
There are other goals you might have for a "foundation for mathematics" for which set theory is arguably not the optimal approach. For example, maybe you have developed some conception of the mathematical universe, and you want some formal theory that directly captures the structures and concepts you have in mind. Then category theory or homotopy type theory might be more attractive. Still, I think set theory is hard to beat as a way to "get off the ground," so to speak. Fully appreciating the merits of category theory or homotopy type theory requires a lot of prior mathematical knowledge and experience, and I think that most people would have trouble acquiring all that background knowledge without ever appealing to any set-theoretic concepts along the way.