What "metatheory" did early set theory/logic researchers use to prove semantic results?
I don’t know the history well enough for a full answer, but here is a partial answer, on the mathematical aspects. When you write:
It is clear that these researchers were not talking about using first-order ZFC as a metatheory […] And yet they were obviously talking about something. Did they have a different notion of semantics than the modern set-theoretic one?
and
The modern approach seems to be, usually, to interpret a "model" specifically as a set in some other (typically first-order) "set metatheory."
you seem to be following a somewhat common misconception: that one can’t do set-based semantics without having some set theory in mind as a metatheory.
But this isn’t the case! The fundamental definition of a (Tarskian) model is just as a set with certain extra structure — just like a group, or a ring, or similar. Not “a set in ZFC”, or “a set in NBG”, but just a set, which we can then reason about using whatever techniques and principles we use for mathematical reasoning in general.
Of course, in that reasoning, we’re likely to follow some established principles, like those justified by ZFC or NBG or some other specific theory. (Historically, such foundational theories were developed exactly to try to codify/justify the principles generally used and accepted.) And logicians are, for a variety of reasons, more likely than other mathematicians to be explicit about what principles they’re following in a particular piece of work. But fundamentally, you don’t need an explicit set-theoretic metatheory to study set-based semantics, any more than you need one to study groups or rings or Riemann surfaces.
As I said, I’m not especially well-read historically, but from the papers I’ve read from that period, my impression is that most researchers were using the modern (Tarskian) notion of semantics, and that some authors wrote explicitly about what sort of metatheory they were using, while others didn’t. But the lack of an explicit metatheory is not any failure of rigour or clarity in their notion of models; it’s normal mathematical practice, certainly of the time and at least arguably of today as well.
Peter LeFanu Lumsdaine has correctly remarked that one need not specify a precise set-theoretic metatheory in order to prove something like the completeness theorem. This remark is borne out if we look at, for example, Gödel's original papers.
Gödel's collected works have been published by Oxford University Press, and English translations are included. Regarding the completeness theorem, which was first published in "Über die Vollständigkeit des Logikkalküls," he makes the following remarks:
In conclusion, let me make a remark about the means of proof used in what follows. Concerning them, no restriction whatsoever has been made. In particular, essential use is made of the principle of the excluded middle for infinite collections (the nondenumerable infinite, however, is not used in the main proof).
He then goes into a rather extended defense of his decision to use the law of the excluded middle in his proof. Note in particular that he does not give a careful definition of the "metatheory" in which he is working.
In his famous 1931 paper on the incompleteness theorem, Gödel starts off by mentioning both Principia Mathematica (PM) and Zermelo–Fraenkel set theory, but soon narrows his focus to PM. He does mention that all the syntactic concepts that he uses in his proof are expressible within the system PM, but he does not belabor this point. After the main argument, he remarks that the proof is intuitionistically valid. Again, there is no precise description of the "metatheory" in which he is working.
As Andreas Blass pointed out, the meta-theory can be ordinary mathematics, at least in theory. In practice, without an explicit meta-theory, authority figures decide what is allowed and what is not. Tarski (like Cantor before him) learned this lesson the hard way, as can be read in accounts of Tarski's theorem about choice from 1924:
... when he tried to publish the theorem in Comptes Rendus de l'Académie des Sciences Paris, Fréchet and Lebesgue refused to present it. Fréchet wrote that an implication between two well known propositions is not a new result. Lebesgue wrote that an implication between two false propositions is of no interest.
It is no surprise that the modern notions of model and meta-theory are due to Tarski (and his colleague Robert Vaught) from 1956. But Tarski already presented a "non-modern" notion of meta-theory in 1933; see the SEP entry on Tarski's Truth Definitions.
The issue is a little bit complex, but here is my own summary of the difference between those two notions for a start:
If I understood it correctly, for the 1933 version, the model (i.e. the algebraic structure about which we talk) is part of the meta-language and not mentioned separately. The assignment of objects to variables on the other hand is what can satisfy a given formula. A formula is (defined to be) true if it is satisfied by all possible assignments of objects to variables.
The 1956 version is treated less explicitly in the linked SEP entry, but it is hinted that the model is no longer an implicit part of the meta-language, but an explicit object from set theory. A model can satisfy a given formula (or sentence), similar to how an "assignment of objects to variables" could satisfy a given formula in the 1933 version. But the text also hints that the 1956 version relies more heavily on an underlying set theory, while the 1933 version explicitly tried to minimize "the set-theoretic requirements of the truth definition".
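To make "satisfied by an assignment" and "true if satisfied by all assignments" a bit more concrete, here is a minimal Python sketch for a toy finite structure. Everything in it is my own illustration and not taken from Tarski or the SEP entry: the structure (integers mod 3 with addition), the tuple encoding of formulas, and the names DOMAIN, add, evaluate, satisfies and is_true are all hypothetical. The structure is hard-wired into the code rather than passed in as an argument, which loosely mirrors my reading of the 1933 setup where the structure lives in the meta-language; a 1956-style version would presumably take the structure as an explicit parameter.

```python
from itertools import product

# Hypothetical toy structure, fixed "in the meta-language" (i.e. in this code):
# the integers mod 3 with addition.
DOMAIN = [0, 1, 2]


def add(a, b):
    return (a + b) % 3


# Illustrative encoding (an assumption, not a standard one):
#   terms:    a variable name (str) or ("add", term, term)
#   formulas: ("eq", term, term), ("not", f), ("and", f, g),
#             ("forall", var, f), ("exists", var, f)

def evaluate(term, assignment):
    """Value of a term under an assignment of objects to variables."""
    if isinstance(term, str):
        return assignment[term]
    op, left, right = term
    if op != "add":
        raise ValueError(f"unknown term constructor: {op!r}")
    return add(evaluate(left, assignment), evaluate(right, assignment))


def satisfies(assignment, formula):
    """Does the assignment satisfy the formula in the fixed structure?"""
    tag = formula[0]
    if tag == "eq":
        return evaluate(formula[1], assignment) == evaluate(formula[2], assignment)
    if tag == "not":
        return not satisfies(assignment, formula[1])
    if tag == "and":
        return satisfies(assignment, formula[1]) and satisfies(assignment, formula[2])
    if tag in ("forall", "exists"):
        _, var, body = formula
        # Re-assign the bound variable to each object of the domain in turn.
        results = [satisfies({**assignment, var: d}, body) for d in DOMAIN]
        return all(results) if tag == "forall" else any(results)
    raise ValueError(f"unknown formula constructor: {tag!r}")


def is_true(formula, free_vars):
    """A formula is true if every assignment to its free variables satisfies it."""
    return all(
        satisfies(dict(zip(free_vars, values)), formula)
        for values in product(DOMAIN, repeat=len(free_vars))
    )


commutativity = ("eq", ("add", "x", "y"), ("add", "y", "x"))
idempotence = ("eq", ("add", "x", "x"), "x")
has_neutral = ("forall", "x", ("exists", "y", ("eq", ("add", "x", "y"), "x")))
```

With these definitions, is_true(commutativity, ["x", "y"]) holds, while is_true(idempotence, ["x"]) fails because the assignment x = 1 does not satisfy the formula, even though the assignment x = 0 does. For a sentence such as has_neutral there are no free variables, so the only assignment to check is the empty one.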