Why do elementary topoi have initial objects?
I believe you can construct the initial object elementarily in the following steps, without any knowledge of monadicity.
Recall that we can define for any object $A$ the universal quantification operator $\forall_A : \Omega^A \to \Omega$ as the classifying morphism of the subobject $1 \to \Omega^A$ with transpose given by $1 \times A \to 1 \xrightarrow{t} \Omega$, where $t$ is the universal monomorphism. (In $\mathrm{Set}$, for a family $f : A \to \Omega$ of truth values, $\forall_A(f)$ is the truth value of $\forall x \in A{:}\ f(x)$.)
In particular, we can construct the global element of $\Omega$ denoting falsity as the composition $1 \to \Omega^\Omega \xrightarrow{\forall_\Omega} \Omega$, where $1 \to \Omega^\Omega$ is the transpose of $\Omega \xrightarrow{\mathrm{id}} \Omega$. (In $\mathrm{Set}$, this global element is the truth value of $\forall \varphi \in \Omega{:}\ \varphi$, i.e. $\bot$.)
We then obtain the initial object as the subobject of $1$ classified by this morphism.
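To see the construction at work in the one case that can be computed by hand, here is a sketch that models finite sets in Python. It assumes $\Omega=\{\text{False},\text{True}\}$ with $t$ picking True; the helper `forall` is a hypothetical name that only models $\forall_A$ in $\mathrm{Set}$, not in a general topos.

```python
# Set-only model: Omega = {False, True}, and t : 1 -> Omega picks True.
OMEGA = (False, True)

def forall(A, phi):
    """Model of forall_A applied to a family phi : A -> Omega (given as a dict).

    In Set, forall_A classifies the point of Omega^A whose transpose is the
    constant-True family, so forall_A(phi) is True exactly when phi is
    constantly True.
    """
    return all(phi[a] for a in A)

# The transpose 1 -> Omega^Omega of id_Omega is the identity family on Omega.
identity_family = {p: p for p in OMEGA}

# The global element 1 -> Omega^Omega -> Omega, i.e. "forall phi in Omega: phi".
false_value = forall(OMEGA, identity_family)

# The subobject of 1 = {()} classified by this global element.
ONE = ((),)
initial = [x for x in ONE if false_value]

print(false_value, initial)  # False []
```

As expected, the global element is falsity and the subobject of $1$ it classifies is empty.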
To verify that the constructed object is indeed the initial object, you probably need to know the universal property of $\forall_A$. It is given, with full proof, in Theorem 13.3 of Thomas Streicher's beautiful notes on category theory and categorical logic.
I know of no quick and easy way to see it. However, Johnstone's Topos Theory has a reasonably nice discussion of why in Lemma 1.31, Theorem 1.34, and Corollary 1.36.
The thrust is that the functor $\Omega^-:\mathcal{E}^{op}\to\mathcal{E}$ in a topos is monadic, and therefore creates all limits. This ends up meaning that $\mathcal{E}^{op}$ has all finite limits because $\mathcal{E}$ does; but $\mathcal{E}^{op}$ has finite limits exactly when $\mathcal{E}$ has finite colimits.
Edit: To round out the answer a little, I incorporate my comment that the usual proof of finite cocompleteness relies on all three axioms in an essential way. To apply Beck's monadicity theorem to $\Omega^-$ one must mainly show that $\mathcal{E}^{op}$ has coequalisers of reflexive pairs; these are exactly equalisers of coreflexive pairs in $\mathcal{E}$, which exist because $\mathcal{E}$ has equalisers. One shows that $\Omega^-$ preserves these coequalisers by using the Beck-Chevalley condition, which relies on the subobject classifier's universal property. And finally, if you don't have all finite limits, none of this gives you all finite colimits. On reflection, though, it does seem that "has equalisers of coreflexive pairs" could replace the "finite limits" condition if all you want is an initial object.
Thus, if $\mathbf{Set}$ were to somehow lack an initial object, our interpretation of it as a category would need to either
- fail to be Cartesian closed,
- fail to have some equalisers (and thus fail to be finitely complete),
- or fail to have a subobject classifier.
Any of these would be pretty hard to swallow, and someone who wanted to deny that the empty set works as an initial object would have the burden of explaining what the objects and morphisms are in their category of sets.
Outline: An explicit limit construction of the initial object can be found, e.g., in Mac Lane and Moerdijk's Sheaves in Geometry and Logic. I will recall this construction and hopefully argue elementarily that it indeed describes an initial object. Elementary means that the argument can be formalized in the first-order theory of elementary topoi.
Construction: Let ${\mathscr E}$ be an elementary topos with power functor $P: {\mathscr E}\to {\mathscr E}^{\text{op}}$ and subobject classifier $\Omega=P1$. Further, denote by $\varepsilon_X: X\to P^2 X$ the map corresponding to the evaluation $X\times PX\to \Omega$, i.e. to the subobject $\in_X\in\text{Sub}(X\times PX)$, and by $!: P^2 1 \to 1$ the unique morphism.
Claim: The equalizer of $\varepsilon_{\Omega}, P!: P1\rightrightarrows P^3 1$ is an initial object in ${\mathscr E}$.
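In $\mathrm{Set}$ the equalizer in the claim can be computed outright. The sketch below models finite sets as Python frozensets; `powerset`, `singleton_unit` and `inverse_image` are hypothetical helper names implementing $P$ on objects, $\varepsilon_X$, and $P$ on morphisms (inverse image) respectively. It is a check of the Set instance only, not a proof for a general topos.

```python
from itertools import combinations

def powerset(xs):
    """All subsets of the finite set xs, as a frozenset of frozensets."""
    xs = list(xs)
    return frozenset(
        frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)
    )

def singleton_unit(X):
    """epsilon_X : X -> P^2 X, sending x to the set of subsets containing x."""
    PX = powerset(X)
    return {x: frozenset(U for U in PX if x in U) for x in X}

def inverse_image(g, Y, Z):
    """P g : P Z -> P Y for g : Y -> Z (a dict), i.e. V |-> g^{-1}(V)."""
    return {V: frozenset(y for y in Y if g[y] in V) for V in powerset(Z)}

ONE = frozenset({0})
P1 = powerset(ONE)          # Omega: {} plays "false", {0} plays "true"
P2 = powerset(P1)           # P^2 1 = P Omega
bang = {T: 0 for T in P2}   # the unique map ! : P^2 1 -> 1

eps = singleton_unit(P1)                # epsilon_Omega : P1 -> P^3 1
p_bang = inverse_image(bang, P2, ONE)   # P! : P1 -> P^3 1

# Equalizer of epsilon_Omega and P! as a subset of P1.
equalizer = [U for U in P1 if eps[U] == p_bang[U]]
print(len(P1), len(P2), equalizer)  # 2 4 []
```

Both truth values fail: $\varepsilon_\Omega$ sends each of them to a two-element set, while $P!$ sends false to $\emptyset$ and true to all of $P^2 1$, so the equalizer is empty.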
Proof: Let $X\in {\mathscr E}$ be any object admitting a morphism $f: X\to P1=\Omega$ such that $$\varepsilon_{\Omega} \circ f = P!\circ f: X\to P^3 1.$$ I claim that $X$ is initial in ${\mathscr E}$. (Remark: This seems stronger than the claim, but it is not, since in any topos any morphism to the initial object is an isomorphism; actually, this is a by-product of our proof.)
Let $A\hookrightarrow X$ be the subobject of $X$ classified by $f$, and write $\chi := f\colon X\to\Omega$ for its characteristic map. Consider the natural bijection $${\mathscr E}(X,P^3 1) = {\mathscr E}(X,P(P^2 1))\ \cong\ \text{Sub}_{\mathscr E}(X\times P^2 1) = \text{Sub}_{\mathscr E}(X\times P\Omega).$$ If I didn't mix things up again, under this bijection the morphism $\varepsilon_\Omega\circ f$ corresponds to the pullback of $\in_\Omega$ along $\chi\times\text{id}: X\times P\Omega\to \Omega\times P\Omega$. On the other hand, the morphism $P!\circ f$ corresponds to the subobject $A\times P\Omega\hookrightarrow X\times P\Omega$. Intuitively, the first subobject contains those $(x,T)$ where the set $T$ of truth values contains the truth value $\chi(x)$ of the statement that $x$ belongs to $A$; the second contains those $(a,T)$ with $a\in A$ and $T$ an arbitrary set of truth values.
To see that this forces $X=0$, consider first a third subobject of $X\times P\Omega$, namely the graph of $X\to \Omega\to P\Omega$, where $\Omega\to P\Omega$ is the singleton map. Intuitively, it is the set of pairs $(x,\{\chi(x)\})$, i.e. the set $T$ contains exactly the truth value of $x$ belonging to $A$. This subobject is contained in the subobject corresponding to the l.h.s., hence also in $A\times P\Omega$, since the two subobjects agree by assumption. Projecting onto the first factor, which maps the graph isomorphically onto $X$, reveals $A=X$, so both subobjects are in fact the whole of $X\times P\Omega$. Appealing to the intuitive description of the l.h.s. again, this means that in the 'context of $X$', every set of truth values contains the truth value $1$ (true).
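In $\mathrm{Set}$ the two subobjects and the graph can be enumerated for small finite $X$. The brute-force sketch below (Set only; $\Omega$ is modelled as Python booleans, and `subobjects` and `equalizer_condition_holds` are hypothetical helper names) checks that the graph always lies in the l.h.s. subobject and that the l.h.s. and r.h.s. subobjects coincide only when $X$ is empty, matching the conclusion of the argument.

```python
from itertools import combinations

def powerset(xs):
    """All subsets of the finite set xs, as a list of frozensets."""
    xs = list(xs)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

OMEGA = (False, True)
P_OMEGA = powerset(OMEGA)

def subobjects(X, A):
    """The three subobjects of X x P(Omega) from the argument, computed in Set."""
    chi = {x: x in A for x in X}                                  # characteristic map of A
    lhs = {(x, T) for x in X for T in P_OMEGA if chi[x] in T}     # from eps_Omega . f
    rhs = {(x, T) for x in A for T in P_OMEGA}                    # from P! . f
    graph = {(x, frozenset({chi[x]})) for x in X}                 # graph of X -> Omega -> P(Omega)
    return lhs, rhs, graph

def equalizer_condition_holds(X, A):
    """Does the subobject A of X satisfy eps_Omega . f = P! . f?"""
    lhs, rhs, _ = subobjects(X, A)
    return lhs == rhs

for n in range(4):
    X = list(range(n))
    for A in powerset(X):
        lhs, rhs, graph = subobjects(X, A)
        assert graph <= lhs                              # {chi(x)} contains chi(x)
        assert equalizer_condition_holds(X, A) == (n == 0)
print("ok")
```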
Let's try to formalize this as saying that ${\mathscr E}/_X$ is trivial. Revisiting the formal description of the l.h.s. subobject of $X\times P\Omega$ for $A=X$, we see that the map $X\times P\Omega\xrightarrow{\text{eval}_1\circ\pi_{P\Omega}} \Omega$, where $\text{eval}_1\colon P\Omega\to\Omega$ tests whether $1$ is a member, factors through $1\to\Omega$. Restricting along the singleton map $s: \Omega\hookrightarrow P\Omega$ and using that $\text{eval}_1\circ s = \text{id}_{\Omega}$, we see that the projection $X\times\Omega\to\Omega$ factors through $1\to\Omega$. Hence the subobject $X\times 1\to X\times \Omega$ that this projection classifies is all of $X\times\Omega$, i.e. it is an isomorphism.
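The two facts used in this step can be checked in $\mathrm{Set}$ with a few lines of Python. This is again only a finite model, with $\Omega$ as booleans; `singleton`, `eval_true` and `top_inclusion_is_iso` are hypothetical helper names. It confirms $\text{eval}_1\circ s=\text{id}_\Omega$ and that $X\times 1\to X\times\Omega$ is an isomorphism exactly when $X$ is empty.

```python
OMEGA = (False, True)

def singleton(p):
    """s : Omega -> P(Omega), p |-> {p}."""
    return frozenset({p})

def eval_true(T):
    """eval_1 : P(Omega) -> Omega, T |-> (True is a member of T)."""
    return True in T

# eval_1 . s = id_Omega: True lies in {p} exactly when p is True.
assert all(eval_true(singleton(p)) == p for p in OMEGA)

def top_inclusion_is_iso(X):
    """Is X x 1 -> X x Omega, (x, *) |-> (x, True), an isomorphism in Set?

    The map is injective by construction, so it is an iso iff it is surjective.
    """
    image = {(x, True) for x in X}
    return image == {(x, p) for x in X for p in OMEGA}

print(top_inclusion_is_iso([]), top_inclusion_is_iso([0]))  # True False
```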
This map, however, is the universal monomorphism of the topos ${\mathscr E}/_X$, whose subobject classifier is $X\times\Omega\to X$. Its being an isomorphism implies that every monomorphism in ${\mathscr E}/_X$ is an isomorphism, hence ${\mathscr E}/_X$ is trivial. Finally, the triviality of ${\mathscr E}/_X$ implies that $X$ is initial: if $g,h: X\rightrightarrows Y$ are two parallel arrows, then their equalizer $K\hookrightarrow X$ can be regarded as a monomorphism in ${\mathscr E}/_X$, so it is an isomorphism, so $K=X$ and $g=h$.
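Edit: To show that there exists a morphism $X\to Y$ for any $Y$, note first that every object $Z\to X$ of ${\mathscr E}/_X$ is subterminal: its diagonal $Z\to Z\times_X Z$ is a monomorphism, hence an isomorphism, so $Z\to X$ is a monomorphism, hence an isomorphism. In particular the projection $X\times Y\to X$ is an isomorphism, and composing its inverse with the projection $X\times Y\to Y$ gives a morphism $X\to Y$.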
Note that these arguments can be made explicit in the first-order theory of topoi, avoiding 'stepping outside' of ${\mathscr E}$.