Categorical foundations without set theory
On the subject of categorical versus set-theoretic foundations there is too much complicated discussion about structure that misses the essential point about whether "collections" are necessary.
It doesn't matter exactly what your personal list of mathematical requirements may be -- rings, the category of them, fibrations, 2-categories or whatever -- developing the appropriate foundational system for it is just a matter of "programming", once you understand the general setting.
The crucial issue is whether you are taken in by the Great Set-Theoretic Swindle that mathematics depends on collections (completed infinities). (I am sorry that it is necessary to use strong language here in order to flag the fact that I reject a widely held but mistaken opinion.)
Set theory as a purported foundation for mathematics does not and cannot turn collections into objects. It just axiomatises some of the intuitions about how we would like to handle collections, based on the relationship called "inhabits" (eg "Paul inhabits London", "3 inhabits N"). This binary relation, written $\epsilon$, is formalised using first order predicate calculus, usually with just one sort, the universe of sets. The familiar axioms of (whichever) set theory are formulae in first order predicate calculus together with $\epsilon$.
(There are better and more modern ways of capturing the intuitions about collections, based on the whole of the 20th century's experience of algebra and other subjects, for example using pretoposes and arithmetic universes, but they would be a technical distraction from the main foundational issue.)
Lawvere's "Elementary Theory of the Category of Sets" axiomatises some of the intuitions about the category of sets, using the same methodology. Now there are two sorts (the members of one are called "objects" or "sets" and of the other "morphisms" or "functions"). The axioms of a category or of an elementary topos are formulae in first order predicate calculus together with domain, codomain, identity and composition.
Set theorists claim that this use of category theory for foundations depends on prior use of set theory, on the grounds that you need to start with "the collection of objects" and "the collection of morphisms". Curiously, they think that their own approach is immune to the same criticism.
I would like to make it clear that I do NOT share this view of Lawvere's.
Prior to 1870 completed infinities were considered to be nonsense.
When you learned arithmetic at primary school, you learned some rules that said that, when you had certain symbols on the page in front of you, such as "5+7", you could add certain other symbols, in this case "=12". If you followed the rules correctly, the teacher gave you a gold star, but if you broke them you were told off.
Maybe you learned another set of rules about how you could add lines and circles to a geometrical figure ("Euclidean geometry"). Or another one involving "integration by parts". And so on. NEVER was there a "completed infinity".
Whilst the mainstream of pure mathematics allowed itself to be seduced by completed infinities in set theory, symbolic logic continued and continues to formulate systems of rules that permit certain additions to be made to arrays of characters written on a page. There are many different systems -- the point of my opening paragraph is that you can design your own system to meet your own mathematical requirements -- but a certain degree of uniformity has been achieved in the way that they are presented.
We need an inexhaustible supply of VARIABLES for which we may substitute.
There are FUNCTION SYMBOLS that form terms from variables and other terms.
There are BASE TYPES such as 0 and N, and CONSTRUCTORS for forming new types, such as $\times$, $+$, $/$, $\to$, ....
There are TRUTH VALUES ($\bot$ and $\top$), RELATION SYMBOLS ($=$) and CONNECTIVES and QUANTIFIERS for forming new predicates.
Each variable has a type, formation of terms and predicates must respect certain typing rules, and each formation, equality or assertion of a predicate is made in the CONTEXT of certain type-assignments and assumptions.
There are RULES for asserting equations, predicates, etc.
We can, for example, formulate ZERMELO TYPE THEORY in this style. It has type-constructors called powerset and {x:X|p(x)} and a relation-symbol called $\epsilon$. Obviously I am not going to write out all of the details here, but it is not difficult to make this agree with what ordinary mathematicians call "set theory" and is adequate for most of their requirements
Alternatively, one can formulate the theory of an elementary topos is this style, or any other categorical structure that you require. Then a "ring" is a type together with some morphisms for which certain equations are provable.
If you want to talk about "the category of sets" or "the category of rings" WITHIN your tpe theory then this can be done by adding types known as "universes", terms that give names to objects in the internal category of sets and a dependent type that provides a way of externalising the internal sets.
So, although the methodology is the one that is practised by type theorists, it can equally well be used for category theory and the traditional purposes of pure mathematics. (In fact, it is better to formalise a type theory such as my "Zermelo type theory" and then use a uniform construction to turn it into a category such as a topos. This is easier because the associativity of composition is awkward to handle in a recursive setting. However, this is a technical footnote.)
A lot of these ideas are covered in my book "Practical Foundations of Mathematics" (CUP 1999), http://www.PaulTaylor.EU/Practical-Foundations Since writing the book I have written things in a more type-theoretic than categorical style, but they are equivalent. My programme called "Abstract Stone Duality", http://www.PaulTaylor.EU/ASD is an example of the methodology above, but far more radical than the context of this question in its rejection of set theory, ie I see toposes as being just as bad.
Since Tom Leinster queries my reference to actual/completed versus potential/incomplete infinities, maybe we should ask a philosopher whether I am using these terms in the standard way.
In any case, I am not doing metaphysics. I am just describing the way in which it appears to me that mathematicians actually work, in contrast to the way they say they work because they have been trained to say such things. When you compare my remarks with the others on this page, please note that they are based on thinking about these things for myself over 25 years, originally from a categorical perspective but increasingly influenced by symbolic logic, and not on reciting bits of textbooks.
To do ordinary arithmetic, you may need very (arbitrarily) large numbers, but you don't all of them together. So far as I can gather from history, mathematicians up to the mid-19th century managed very well to deal with things in this way, for example defining functions as expressions.
Post-Cantor, 20th century mathematicians got into the habit of introducing the completed infinity before the structure. For example, we say "a group is a set with...", relegating the essence of symmetry to second place. This is like saying that humanity is a collection of pieces of flesh, onto which faces are painted as an afterthought.
Categorists, being part of the pure mathematical culture, did the same thing, in the vast majority of cases with great profit. However, when it comes to foundations, treating the universe first as a completed infinity (and only afterwards containing products, function-spaces, powersets or whatever other structure you require) inevitably leads into the set-theoretic trap.
By contrast, type theoretic methods build up the universe by means of the actual operations that you actually want to consider, just as the symmetry group of the Rubik cube is built up from individual rotations. Moreover, despite the fact that type theory looks completely different from category theory or algebra, it is an accurate underpinning of the actual methods of reasoning of mathematics. See, for example, my discussion of the idiom "there exists" in my book.
This is not dogmatic Finitism or Logicism and is readily adaptable to considering the object $\bf N$ along with individual natural numbers, an internal category $\bf Set$ along with individual types, and so on.
Now let me consider the other approaches to this question.
First order logic. This was the first usable general technique in mathematical logic. Like other disciplines, it starts with the completed infinity and adds properties to it. Does it presuppose a set theory? Well, yes, in the same sense that a boot-loader presupposes a primitive operating system. I would be more convinced that first-order logic is independent of set theory if there were a branch of model theory that had examples of structures whose carriers were topological spaces or algebraic varieties.
In fact, first order logic can be set up in the type-theoretic way that I have described above. But if you're going to do that, you may as well set up the type theory that you actually want to use.
If we're looking for a metalanguage specificaly for categorical logic (say, in which to construct toposes) then first order logic is not the right structure. It is easy to describe an internal category in a category with all finite finits, and, by adding more diagrams, we can talk about internal toposes too. However, it's much more interesting to consider free internal structures, for which we need an arithmetic universe, although unfortunately there is next to zero literature on this topic.
Fibrations, 2-categories, etc. None of what I have said contradicts the use of these categorical techniques. I personally consider that fibrations, and especially hyperdoctrines, are obfuscation, but other people find them useful. However, they organise the world, but they do not bring it into existence, which was the thrust of the original question.
Yes, you can do category theory without set theory. A category can easily be expressed in first order logic, like metacategories in MacLane's book. However, there are many concepts in category theory which require notions of small collections vs large. There is a way to reason with these entirely in category theory, and for this you need (categorical) fibrations, or similarly indexed categories.
You need a base category, say B, which will act like a category of sets. You probably need it to satisfy certain properties, like pullbacks/cartesian-closed/topos... depending on how much you want to do. A fibration is a functor $p:C \to B$ which satisfies certain properties. Objects of C are like collections of objects which are indexed over objects of B.
The standard example is p:Fam(Set)$\to$ Set, where Fam(Set) are families of sets $\lbrace S_i\rbrace_{i\in \Lambda}$, where $p$ sends this family to $\Lambda$.
Small categories over the base category are easy to handle using internal category theory, if your base category has pullbacks. These are just pairs of objects (Obj, Mor) in B along with various morphisms like head:Mor $\to$ Obj expressing the categorical properties.
Now for large categories you will need the concept of locally small categories over the base. For this we can use locally internal categories, which are special fibrations which are also enriched in B. In these you can talk about collections of morphisms, indexed by objects in B. Concepts like complete and cocomplete can be defined for locally internal categories.
With all of this machinery you can prove that indexed adjoint functor theorem and various other theorems in category theory.
Once you have an axiom stating that you have a sufficiently nice base category you do large parts of category theory without mentioning sets. With NBG you can use the category of sets instead of an axiom.
For a article on fibrations see "Fibered Categories and the Foundations of Naive Category Theory" by Bénabou. Johnstone's "Sketches of an Elephant" vol1 has a few chapters on locally internal categories.