How should one think about sheafification and the difference between a sheaf and a presheaf?
There are two ways a presheaf can fail to be a sheaf.
- It has local sections that should patch together to give a global section, but don't;
- It has non-zero sections which are locally zero.
Once the problems are divided into these two classes, it is easy to see what sheafifying does: it adds the missing sections from the first problem, and it throws away the extra sections from the second problem.
Sections of the latter kind tend to be easier to notice, but they are less common. Usually, when a construction or functor must be sheafified, it is because it has local sections that should patch together but don't.
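To make the two failure modes concrete, here is a small Python sketch on a hypothetical two-point discrete space $\{p, q\}$ covered by $\{p\}$ and $\{q\}$ (this toy space and all the function names are illustrative, not from the answer above). The helper `check_cover` tests, for a single cover, whether restriction to the cover is injective (the second failure) and whether every compatible family glues (the first failure). A genuine sheaf check would have to range over all covers of all open sets.

```python
from itertools import product

# Hypothetical toy space: two points p, q with the discrete topology.
P, Q = "p", "q"
X = frozenset({P, Q})
COVER = [frozenset({P}), frozenset({Q})]  # an open cover of X


def functions_on(U, values=(0, 1)):
    """All functions U -> values, each encoded as a tuple of (point, value) pairs."""
    pts = sorted(U)
    return [tuple(zip(pts, vals)) for vals in product(values, repeat=len(pts))]


def restrict_function(s, U, V):
    """Ordinary restriction of a function on U to the subset V."""
    return tuple((x, v) for x, v in s if x in V)


def check_cover(sections, restrict, U, cover):
    """Return (locality_ok, gluing_ok) for a presheaf on the cover `cover` of U.

    sections(W) lists the sections over W; restrict(s, W, V) restricts s from W to V.
    """
    images = [tuple(restrict(s, U, Ui) for Ui in cover) for s in sections(U)]
    # Locality: distinct sections over U have distinct families of restrictions.
    locality_ok = len(set(images)) == len(images)
    # Gluing: every family that agrees on all overlaps comes from a section over U.
    compatible = [
        fam
        for fam in product(*(sections(Ui) for Ui in cover))
        if all(
            restrict(fam[i], cover[i], cover[i] & cover[j])
            == restrict(fam[j], cover[j], cover[i] & cover[j])
            for i in range(len(cover))
            for j in range(len(cover))
        )
    ]
    gluing_ok = all(fam in images for fam in compatible)
    return locality_ok, gluing_ok


# A sheaf: all functions to {0, 1}.
print(check_cover(functions_on, restrict_function, X, COVER))  # (True, True)


# First failure: require f(p) == f(q) whenever both points are in the domain.
def sections_p_eq_q(U):
    fs = functions_on(U)
    if P in U and Q in U:
        fs = [s for s in fs if dict(s)[P] == dict(s)[Q]]
    return fs


print(check_cover(sections_p_eq_q, restrict_function, X, COVER))  # (True, False)


# Second failure: an extra section 1 over X that restricts to 0 on every smaller open.
def sections_extra(U):
    return [0, 1] if U == X else [0]


def restrict_extra(s, U, V):
    return s if V == U else 0


print(check_cover(sections_extra, restrict_extra, X, COVER))  # (False, True)
```

Sheafifying the second presheaf would discard the section `1`; sheafifying the first would add back the two functions with $f(p) \neq f(q)$.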
A simple example of a presheaf with this property is the presheaf $F_{p=q}$ of continuous functions on the circle $S^1$ which have the same value at two distinct points $p,q\in S^1$. When I restrict to an open neighborhood of $p$ that doesn't contain $q$, the condition on the values goes away. Because the same thing is true for open neighborhoods of $q$ which don't contain $p$, the condition defining this presheaf has no effect on sufficiently small open sets. It follows that this presheaf is locally the same as the sheaf of continuous functions. Therefore, for any function on $S^1$ which takes different values at $p$ and $q$, I can restrict it to an open cover on which each local section is in $F_{p=q}$, but this function is not in $F_{p=q}$. This is why $F_{p=q}$ is not a sheaf.
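When we sheafify, we just add in all these sections to get the full sheaf of continuous functions. This is clear because both the sheafification of $F_{p=q}$ and the sheaf of continuous functions agree locally with $F_{p=q}$, and any two sheaves which agree locally are the same (by "agree locally" I mean that the local sections and the restriction maps between them agree).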
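One explicit cover exhibiting the failure (my choice of cover, not spelled out above) uses the complements of the two marked points. Writing $C(V)$ for the continuous functions on an open set $V$:

$$
\begin{aligned}
&U_1 = S^1 \setminus \{q\}, \qquad U_2 = S^1 \setminus \{p\}, \qquad U_1 \cup U_2 = S^1 \quad (\text{since } p \neq q),\\
&F_{p=q}(U_1) = C(U_1), \qquad F_{p=q}(U_2) = C(U_2) \quad (\text{each } U_i \text{ misses one of } p, q),\\
&f \in C(S^1),\ f(p) \neq f(q) \implies f|_{U_1},\ f|_{U_2} \text{ agree on } U_1 \cap U_2, \text{ yet } f \notin F_{p=q}(S^1).
\end{aligned}
$$

Since a continuous function is determined by its restrictions, $f$ is the only possible gluing of this compatible family, and it is missing from $F_{p=q}(S^1)$.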
This example really does come up in practice. Consider the quotient map $S^1\rightarrow \infty$, where $\infty$ denotes the figure-eight space obtained from $S^1$ by identifying $p$ and $q$. If I pull back the sheaf of continuous functions on $\infty$ in the naive way, the resulting presheaf on $S^1$ is $F_{p=q}$. To get a sheaf, we need to sheafify.
The Yoneda embedding does not respect colimits, so, loosely speaking, the way of gluing is not respected in this transition. Maybe considering sheaves instead of presheaves is a way of repairing this failure.
Actually, that's not at all a bad way of thinking about it: in the classical case of sheaves on a topological space, sheafification of the Yoneda embedding preserves colimits by open covers. In the general case, one replaces "open cover" by "covering sieve".
Let me amplify on that, starting with the topological space case. If $U_i$ is a covering of $U$, then we have an "exact sequence"
$$\sum_{i, j} U_i \cap U_j \stackrel{\to}{\to} \sum_i U_i \stackrel{\pi}{\to} U$$
where the two parallel arrows constitute the kernel pair of $\pi$ and $\pi$ is the coequalizer of its own kernel pair (a regular epi, in the parlance). So $U$ is a colimit of $U_i$'s (either in $Top$ or in the topology as poset). Applying the Yoneda embedding to this colimit diagram, we are led to a sequence
$$\sum_{i, j} \hom(-, U_i \cap U_j) \stackrel{\to}{\to} \sum_i \hom(-, U_i) \to \hom(-, U)$$
which is clearly not exact. It is "left exact" in that the two parallel arrows are again the kernel pair of the arrow on the right (in part because the Yoneda embedding preserves pullbacks), but the arrow on the right is not the coequalizer of the kernel pair in the category of presheaves.
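One quick way to see the failure: colimits of presheaves are computed objectwise, so a coequalizer of presheaves is surjective on every object. For opens, $\hom(V, W)$ is a single point if $V \subseteq W$ and empty otherwise, so evaluating at $U$ itself, in the typical case where no $U_i$ is all of $U$, gives

$$
\Big(\sum_i \hom(-, U_i)\Big)(U) = \sum_i \hom(U, U_i) = \varnothing, \qquad \hom(U, U) = \{\mathrm{id}_U\},
$$

so the arrow on the right is not surjective at $U$, hence not an epimorphism of presheaves, hence not a coequalizer.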
But a sheaf $F$ "thinks" this is exact, because when we hom the sequence above into $F$, we get a diagram
$$F(U) \to \prod_i F(U_i) \stackrel{\to}{\to} \prod_{i, j} F(U_i \cap U_j)$$
(where we have made use of the Yoneda lemma) which is exact by definition of sheaf. Since this is exact for all sheaves, it must be that the associated sheaf functor applied to the inexact sequence above is exact, in particular the associated sheaf functor applied to the arrow on the right is a coequalizer, as you suggested might happen.
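Spelled out, "exact" here means the left arrow is an equalizer of the two parallel arrows. Calling the parallel arrows $p_1, p_2$ (names introduced only for this remark), they send a family $(s_i)_i$ to

$$
p_1\big((s_i)_i\big) = \big(s_i|_{U_i \cap U_j}\big)_{i,j}, \qquad p_2\big((s_i)_i\big) = \big(s_j|_{U_i \cap U_j}\big)_{i,j},
$$

and exactness says that restriction is a bijection

$$
F(U) \xrightarrow{\ \sim\ } \Big\{ (s_i)_i \in \prod_i F(U_i) \;:\; s_i|_{U_i \cap U_j} = s_j|_{U_i \cap U_j} \text{ for all } i, j \Big\}, \qquad s \mapsto (s|_{U_i})_i .
$$

Injectivity rules out distinct sections with the same local restrictions (the second way of failing to be a sheaf), and surjectivity is exactly the gluing that fails in the first way.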
The general situation for Grothendieck topologies is similar. If $U$ is an object of the site, a covering sieve $R$ of $U$,
$$R \hookrightarrow \hom(-, U),$$
can be expressed as a colimit of representables mapping into $\hom(-, U)$. A presheaf $F$ is a sheaf for the topology if it "thinks" that $\hom(-, U)$ is the colimit of the diagram, insofar as homming into $F$ induces an isomorphism
$$F(U) \to \hom_{PSh}(R, F)$$
by definition of sheaf. So the associated sheaf functor thinks that the colimit of the evident diagram $El(R) \to C \to PSh(C)$ in the category of presheaves is the representable $\hom(-, U)$, following the same reasoning as above.
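For a topological space (taking the site to be the poset of open sets), this specializes to the earlier picture. The sieve generated by a cover $\{U_i\}$ of $U$ is

$$
R(V) = \begin{cases} \{*\} & \text{if } V \subseteq U_i \text{ for some } i, \\ \varnothing & \text{otherwise,} \end{cases}
$$

and a natural transformation $R \to F$ is the same thing as a compatible family: send it to $(s_i)_i$ with $s_i$ its component at $U_i$, and conversely set the component at $V \subseteq U_i$ to $s_i|_V$, which is well defined exactly because $s_i$ and $s_j$ agree on $U_i \cap U_j$. Hence

$$
\hom_{PSh}(R, F) \cong \Big\{ (s_i)_i \in \prod_i F(U_i) \;:\; s_i|_{U_i \cap U_j} = s_j|_{U_i \cap U_j} \Big\},
$$

and the condition $F(U) \cong \hom_{PSh}(R, F)$ becomes the equalizer condition for the cover.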
Here is another instance of Greg Muller's first way a presheaf can fail to be a sheaf:
Let $X$ be the plane, and for any open set $U$ let $F(U)$ be the set of all bounded continuous functions from $U$ to $\mathbb{R}$. Then $F$ is a presheaf, but it is certainly not a sheaf: plenty of functions are locally bounded but not globally bounded! So you are not able to glue together little bounded pieces, because you might end up with an unbounded function. What is the sheafification of this presheaf? It is just the sheaf which lets you glue those pieces together, i.e. the sheaf of all continuous real-valued functions! You can check this using the universal property, or (I think it will be enlightening) go back to your favorite construction of sheafification and check that this is the functor you get.
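For a concrete instance (my choice of function and cover), take the norm function and the cover of the plane by open disks of integer radius:

$$
\begin{aligned}
&f(x) = \|x\|, \qquad D_n = \{ x \in \mathbb{R}^2 : \|x\| < n \}, \quad n = 1, 2, 3, \dots, \qquad \bigcup_n D_n = \mathbb{R}^2,\\
&f|_{D_n} \in F(D_n) \ \ (\text{bounded by } n), \qquad f|_{D_n} \text{ and } f|_{D_m} \text{ agree on } D_n \cap D_m,\\
&\text{but the only continuous gluing is } f \text{ itself, which is unbounded, so } f \notin F(\mathbb{R}^2).
\end{aligned}
$$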