What's an example of a subcategory whose closure under colimits takes a lot of steps to form by iteration?
An example where you need a proper class of steps is 6.38 in my book with Adámek. Concerning the last question, the answer is positive under Vopěnka's principle. The reason is that the colimit closure of $\mathcal S$ is locally presentable (see 6.28 and 6.29 in the same book) and thus it has a small dense subcategory. This argument works for $\mathcal S$ large as well.
Let me just give a community wiki exposition of the example 6.38 that Professor Rosický mentions from his book with Adámek, Locally Presentable and Accessible Categories, since although the book is quite clear, it doesn't quite reach the conclusion needed here, since the point of the example is slightly different.
Consider structures $X$ with partial unary operations indexed by the ordinals $\{\alpha_i\}_{i \in \mathrm{Ord}}$ such that the following condition is satisfied:
For every $j \in \mathrm{Ord}$ and $x \in X$, the value $\alpha_j(x)$ is defined if and only if for every $i<j$, $\alpha_i(x) = x$.
(So for each $x \in X$, there is an initial segment of the ordinals where $\alpha_i(x)$ is defined but just has $\alpha_i(x) = x$, and then at the top there is an ordinal $j$ where $\alpha_j(x)$ is defined but $\alpha_j(x) \neq x$; thereafter $\alpha_i(x)$ is not defined for bigger $i$ (actually there is an alternative which won't be relevant for the example: we could have $\alpha_i(x) = x$ for all $i \in \mathrm{Ord}$). The ordinal $j$ can vary with $x$.)
We let $\mathcal{C}$ be the category of all such structures, and all homomorphisms (i.e. a homomorphism is a function $f: X \to Y$ such that if $x\in X$ and $\alpha_i(x)$ is defined, then $\alpha_i(f(x))$ is defined and $\alpha_i(f(x)) = f(\alpha_i(x))$). Then $\mathcal{C}$ is cocomplete. Coproducts are just disjoint unions. Coequalizers are a little bit tricky (but this is sort of the point). To form the coequalizer of $f,g : X \overset{\to}{\to} Y$, first take the coequalizer in $\mathsf{Set}$, defining the operations in the only possible way. Then we may need to add in more points, because it may be the case that $\alpha_i([y]) = [y]$ for all $i< j$, but nevertheless there is no $y \in [y]$ such that $\alpha_j(y)$ is defined. In this case, we simply freely add a point $[y]_0$ with $\alpha_j([y]) = [y]_0$. Then we need to throw in more points $([y]_n)_{n \in \mathbb{N}}$ with $\alpha_0([y]_n) = [y]_{n+1}$.
For $j \in \mathrm{Ord}$, let $\mathcal{C}_j \subset \mathcal{C}$ denote the subcategory where all operations $\alpha_i$ are undefined for $i \geq j$. The point which is implicit in the book is that if $F: \mathcal{D} \to \mathcal{C}_j$ is a diagram in $\mathcal{C}_j$, then $\mathrm{colim} F \in \mathcal{C}_{j+1}$. After all, $\mathrm{colim} F$ is a coequalizer of coproducts of objects of $\mathcal{C}_j$ -- the coproducts don't introduce any new operations, and we can explicitly analyze the construction of coequalizers to see that we at worst added operations of one index higher than the sup of those already appearing.
Now Adámek and Rosický take $\mathcal{S}$ to be a certain singleton category $\{A\} \subset \mathcal{C}_1$ (namely, $A$ has underlying set $\mathbb{N}$ with $\alpha_0$ the successor operation), and show that at each step, $\mathrm{colim}^\mathcal{C}_j(\mathcal{S})$ does indeed add a new object $A_j \in \mathcal{C}_{j+1}$ (the underlying set is still $\mathbb{N}$, with successor operation, but at $0 \in \mathbb{N}$ the successor is delayed until the $j$th operation). I think this is quite clearly explained in the book. (Although I think there is a slight error -- the initial step of the induction should simply point out that $A = A_0$. The coequalizer of the indicated diagram is actually $A_1$.)
I wish I could give a reference but in 1965 John Isbell claimed that the limit completion of the 2 element group (considered as a one object category) requires a proper class of steps, but I didn't understand his argument and I don't believe he ever published it.