How are sets defined in reverse mathematics?

The answer is in the name of the book.

The system described is equivalent to a particular (monadic) second-order theory using Henkin semantics.

Instead of talking about "sets", Simpson could have instead talked about predicate variables (which would be better in my opinion...) Using uppercase to refer to predicate variables and lowercase to refer to number variables to disambiguate, Simpson could have written comprehension, say, as: $$\exists P.\forall n.P(n)\leftrightarrow \varphi(n)$$ where $\varphi$ is a formula not containing $P$ freely. It is clear what operations you can perform on predicate variables. Namely, given a predicate variable $P$, the only thing you can do with it is apply it to a number term. There's no need to provide a "construction" of these anymore than there is a need to provide a "construction" of "number" variables (which Simpson also doesn't do).

As usual for a logic, Simpson provides a semantics which interprets this formal syntax into set theoretic terms. In this semantics, you get to choose what "number" variables refer to by choosing the semantic domain which could have nothing to do with natural numbers. The question remains on how to interpret predicate variables. The common options are Henkin semantics and full (or standard) semantics. In both of the semantics, we interpret predicate variables as subsets of the domain and $P(n)$, for example, if the interpretation of $n$ is a member of the interpretation of $P$. The difference between Henkin semantics and full semantics is simply whether you get to choose which subsets are allowable interpretations for predicate variables (Henkin semantics), or if any subset of the domain is always allowed (full semantics). In other words, for Henkin semantics you select a subset $\mathcal S\subseteq \mathcal P(D)$ where $D$ is the domain (for "number" variables), and for full semantics we require $\mathcal S = \mathcal P(D)$.

This more logical rendition makes it more clear that the "sets" of $Z_2$ have nothing a priori to do with set-theoretic sets just like the "numbers" have no a priori relation to natural numbers. The semantics connects it to set theory, but we could use other semantics such as categorical semantics that don't interpret things as (sub)sets, or we could eschew semantics entirely and only work with the proof theory.


I'm not sure I've understood your question correctly, but let me take a stab at it:

The short version is that we can develop RCA$_0$ (and any other theory, for that matter) entirely "autonomously," at least as far as the formalism is concerned, in the same way that ZF itself is developed.


You write:

But, right after that, he starts talking about sets, set variables, set memberships, the natural numbers set, etc. without giving a construction or explaining how these are supposed to work.

The development is exactly parallel to that of ZF. We begin with $(i)$ a formal syntax and $(ii)$ an informal semantics, where we talk about sets, elements, etc.; we then $(iii)$ write a formal theory in our syntax meant to (more-or-less) capture those intuitions. Only steps $(i)$ and $(iii)$ are actually needed; step $(ii)$ is "just" providing motivation and intuition. Note that in setting up ZF, we use the word "set" without any formal system already defining it, that is, we use it as an intuitive indicator rather than a technical term.

Reverse mathematics develops similarly. Just as "set" and "element of" are primitive terms in ZF, RCA$_0$ (and similar systems) have as undefined terms "natural number," "set (of natural numbers)," "plus," "times," and "element of." We "come in" with various intuitions about how these should behave, and then write down a specific set of axioms which forms RCA$_0$.

This brings us to the point when you argue that we should have an "ambient set theory,"

since then we can add the suitable comprehension axioms without them being already "true" by the set theory axiomatic.

I think what's going on here is that you are objecting to the decision to include a sentence as an axiom of RCA$_0$ without first showing that it is true according to some larger system. But the theory is autonomous - we don't need to appeal to anything other than intuition when we set RCA$_0$ up than when we set ZF up. This is not to say that an ambient set theory is never useful, but rather that it's not necessary for the development of reverse mathematics.