Why is the number of irreducible components upper semicontinuous in nice situations?
Here is an alternative to Count Dracula's (correct) argument that emphasizes instead the constancy of the Hilbert polynomial for a flat family of projective schemes.
As above, assume that $Y$ is a DVR. For one fixed irreducible component $Z_{\eta}$ of $X_\eta$ of minimal dimension $d$, denote by $Z$ the Zariski closure of $Z_{\eta}$ in $X$ together with its closed immersion $u:Z\to X$. Denote by $W_{\eta}$ the union of all other irreducible components of $X_{\eta}$, and denote by $v:W\to X$ the closure of $W_\eta$ in $X$. By construction, both $Z$ and $W$ are flat over $Y$, since every associated point is a generic point of $Z_\eta$, resp. $W_\eta$.
The closed immersions $u$ and $v$ determine an associated morphism of $\mathcal{O}_X$-modules, $$(u^\#, v^\#):\mathcal{O}_X \to u_*\mathcal{O}_Z \oplus v_*\mathcal{O}_W.$$ The restriction of $(u^\#,v^\#)$ to $X_\eta$ is injective. Thus the kernel of $(u^\#,v^\#)$ is a subsheaf of $\mathcal{O}_X$ that is torsion for $\mathcal{O}_Y$. Since $\mathcal{O}_X$ is flat over $\mathcal{O}_Y$, the kernel of $(u^\#,v^\#)$ is the zero sheaf.
Denote the cokernel of $(u^\#,v^\#)$ by $\mathcal{Q}$. The restriction of $\mathcal{Q}$ to $X_\eta$ has support whose dimension is strictly smaller than the dimension of any irreducible component of $X_\eta$. In particular, the Hilbert polynomial of $\mathcal{O}_{X_\eta}$ agrees with the Hilbert polynomial of $\mathcal{O}_{Z_\eta}\oplus \mathcal{O}_{W_\eta}$ modulo the subspace of numerical polynomials of degree strictly less than $d = \dim(Z_\eta)$.
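To spell out this bookkeeping, write $P_{\mathcal{F}}$ for the Hilbert polynomial of a coherent sheaf $\mathcal{F}$ on a fiber (this notation is introduced here only for illustration, and it assumes a fixed relatively ample invertible sheaf on $X$). On the generic fiber the sequence $0\to \mathcal{O}_{X_\eta}\to \mathcal{O}_{Z_\eta}\oplus \mathcal{O}_{W_\eta}\to \mathcal{Q}_\eta\to 0$ is exact, so additivity of Hilbert polynomials in short exact sequences gives $$P_{\mathcal{O}_{Z_\eta}} + P_{\mathcal{O}_{W_\eta}} - P_{\mathcal{O}_{X_\eta}} = P_{\mathcal{Q}_\eta}, \qquad \deg P_{\mathcal{Q}_\eta} < d.$$ The degree bound holds because the degree of the Hilbert polynomial of a nonzero coherent sheaf equals the dimension of its support, and $P_{\mathcal{Q}_\eta}$ is the zero polynomial if $\mathcal{Q}_\eta = 0$.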
Now consider the restriction $(u_0^\#,v_0^\#)$ of $(u^\#,v^\#)$ to the closed fiber $X_0$. By the flatness hypothesis, the Hilbert polynomials of the domain and target of this homomorphism equal the Hilbert polynomials on the generic fiber. Thus the difference of these Hilbert polynomials on $X_0$ equals the difference of these polynomials on $X_\eta$, and we know that this difference is a polynomial of degree strictly less than $d$. The cokernel of $(u_0^\#,v_0^\#)$ equals the restriction $\mathcal{Q}_0$. If the induced morphism $\mathcal{O}_{Z_0}\to \mathcal{Q}_0$ is nonzero at some generic point of $Z_0$, then the support of $\mathcal{Q}_0$ has an irreducible component of dimension $\geq d$. Thus the Hilbert polynomial of $\mathcal{Q}_0$ has degree $\geq d$. Since the difference polynomial has degree strictly less than $d$, the kernel of $(u_0^\#,v_0^\#)$ has Hilbert polynomial of degree $\geq d$ counterbalancing the Hilbert polynomial of $\mathcal{Q}_0$. In particular, the kernel of $(u_0^\#,v_0^\#)$ is not zero.
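As a sketch of the counterbalancing step, write $\mathcal{K}_0$ for the kernel of $(u_0^\#,v_0^\#)$ and $P_{\mathcal{F}}$ for the Hilbert polynomial of a coherent sheaf $\mathcal{F}$ (both pieces of notation are introduced here only for illustration). Additivity along the exact sequence $0\to \mathcal{K}_0\to \mathcal{O}_{X_0}\to \mathcal{O}_{Z_0}\oplus \mathcal{O}_{W_0}\to \mathcal{Q}_0\to 0$, combined with flatness of $X$, $Z$ and $W$ over $Y$, gives $$P_{\mathcal{Q}_0} - P_{\mathcal{K}_0} = P_{\mathcal{O}_{Z_0}} + P_{\mathcal{O}_{W_0}} - P_{\mathcal{O}_{X_0}} = P_{\mathcal{O}_{Z_\eta}} + P_{\mathcal{O}_{W_\eta}} - P_{\mathcal{O}_{X_\eta}},$$ and the right-hand side has degree strictly less than $d$. So if $\deg P_{\mathcal{Q}_0}\geq d$, then the terms of degree $\geq d$ in $P_{\mathcal{Q}_0}$ must be cancelled by those of $P_{\mathcal{K}_0}$, which forces $\mathcal{K}_0\neq 0$.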
By the flatness hypothesis, every associated point of $X$ is contained in $X_\eta$. Thus, every generic point $\xi$ of $X_0$ is the specialization of a generic point that is either in $Z$ or in $W$. Thus the localization $\mathcal{O}_{X_0,\xi}$ either factors through $\mathcal{O}_{Z_0,\xi}$ or factors through $\mathcal{O}_{W_0,\xi}$. Therefore the kernel of $(u_0^\#,v_0^\#)$ is in the kernel of the localization at every generic point of $X_0$. Since the kernel is nonzero, $X_0$ has embedded associated points, contradicting the hypothesis that $X_0$ is geometrically reduced. Therefore, by way of contradiction, the support of $\mathcal{Q}_0$ does not contain $Z_0$. So $Z_0$ is not contained in $W_0$.
Now we continue by induction on the number of irreducible components, replacing $X$ by $W$.
OK, the comment of nfdc23 reduces the question to the case where the base is a discrete valuation ring. I also agree with what she says about closures, but I think there is a small part missing: why is the closure of an irreducible component of $X_\eta$ not contained in the closure of another irreducible component? For example, why can't it happen that $X_\eta$ is the union of a threefold and a point and $X_0$ is just a threefold? (I suggest keeping this example in mind when reading below.)
I'm sure there is an easy solution to this, but I find it fun to deduce this from a result of Hartshorne about connectedness of punctured spectra. Namely, let $A, B \subset X$ be closures of irreducible components of $X_\eta$. (By the way, you can always first make a finite extension of the base dvr to make sure that the irreducible components of the generic fibre are geometrically irreducible.) Assume that $A_0 \subset B_0$ to get a contradiction. Let $x \in A_0$ be a generic point of an irreducible component. Consider the local ring $O_{X, x}$.
Case I. $\dim(O_{X, x}) = 1$. In this case $O_{X, x}$ is a dvr because $X_0$ is reduced. It is then clear that there is a unique point of $X_\eta$ specializing to $x$, and we get our desired contradiction.
Case II. $\dim(O_{X, x}) \geq 2$. Because $X_0$ is reduced we see that $O_{X, x}/\pi$ has depth at least $1$, where $\pi$ is the uniformizer of the base dvr $R$. Since $\pi$ is a nonzerodivisor on $O_{X, x}$ by flatness, $O_{X, x}$ has depth at least $2$. Then Hartshorne's connectedness result shows that the punctured spectrum $U$ of $O_{X, x}$ is connected. But the generic point of $A$ is an isolated point of $U$, which is a contradiction unless the generic point of $B$ is the generic point of $A$, and we win.
An expanded version of nfdc23's answer in the comments to this question can be found at http://arxiv.org/pdf/1601.05840v1.pdf, Proposition 2.9. An even more expanded version can be found at http://arxiv.org/pdf/1605.01117v1.pdf, Proposition 3.2.5.