Intuition behind Cantor-Bernstein-Schröder
$\newcommand{\ran}{\operatorname{ran}}$Here’s one way to think about it. Suppose that $y\in\ran f$; then we can pull $y$ back to $f^{-1}(y)\in X$. If $f^{-1}(y)\in\ran g$, we can pull it back to $g^{-1}(f^{-1}(y))\in Y$. If we continue this pulling back, one of two things must happen: either we reach a dead end at a point of $X$ or $Y$ that can’t be pulled back (because it’s in $Y\setminus\ran f$ or $X\setminus\ran g$), or we don’t.
Let $X_0=X\setminus\ran g$, the set of points of $X$ that cannot be pulled back at all, and let $Y_0=Y\setminus\ran f$. More generally, for each $n\in\omega$ let $X_n$ be the set of points of $X$ that can be pulled back exactly $n$ times, and let $Y_n$ be the set of points of $Y$ that can be pulled back exactly $n$ times. Finally, let $X_\omega$ and $Y_\omega$ by the subsets of $X$ and $Y$, respectively whose points can be pulled back infinitely many times.
At this point a sketch helps; it should show the partitions $\{X_n:n\le\omega\}$ of $X$ and $\{Y_n:n\le\omega\}$ of $Y$, and it should include arrows indicating what parts of $X$ get mapped to what parts of $Y$ and vice versa. To avoid having arrows crossing, I’ve taken $X$ and $Y$ apart in the following diagram.
$$\begin{array}{} X_0&\overset{f}\longrightarrow&Y_1&\overset{g}\longrightarrow&X_2&\overset{f}\longrightarrow& Y_3&\overset{g}\longrightarrow&X_4&\dots&X_\omega\\ Y_0&\overset{g}\longrightarrow&X_1&\overset{f}\longrightarrow&Y_2&\overset{g}\longrightarrow&X_3&\overset{f}\longrightarrow&Y_4&\dots&Y_\omega \end{array}$$
Each of the arrows is a bijection, so I can break up the diagram into $\omega$ self-contained parts. The first two parts are:
$$\begin{array}{} X_0&\overset{f}\longrightarrow&Y_1\\ Y_0&\overset{g}\longrightarrow&X_1 \end{array}\qquad \begin{array}{} X_2&\overset{f}\longrightarrow&Y_3\\ Y_2&\overset{g}\longrightarrow&X_3 \end{array}$$
Ignoring $X_\omega$ and $Y_\omega$ for the moment, I can rearrange the rest of the diagram to give my a bijection from $X\setminus X_\omega$ to $Y\setminus Y_\omega$:
$$\begin{array}{ccc} X_0&\overset{f}\longrightarrow&Y_1\\ X_1&\overset{g^{-1}}\longrightarrow&Y_0\\ X_2&\overset{f}\longrightarrow&Y_3\\ X_3&\overset{g^{-1}}\longrightarrow&Y_2\\ \vdots&\vdots&\vdots\\ X_{2k}&\overset{f}\longrightarrow&Y_{2k+1}\\ X_{2k+1}&\overset{g^{-1}}\longrightarrow&Y_{2k}\\ \vdots&\vdots&\vdots \end{array}$$
Finally, I claim that $f[X_\omega]=Y_\omega$: everything in $X_\omega$ can be pulled back infinitely often, so everything in $f[X_\omega]$ can be pulled back infinitely often, and therefore $f[X_\omega]\subseteq Y_\omega$. On the other hand, if $y\in Y_\omega$, then $y$ can be pulled back infinitely often, so it must be possible to pull $f^{-1}(y)$ back infinitely often, and therefore $f^{-1}(y)\in X_\omega$. Thus, $Y_\omega\subseteq f[X_\omega]$ as well. The diagram above can now be completed to show a bijection from $X$ onto $Y$:
$$\begin{array}{ccc} X_0&\overset{f}\longrightarrow&Y_1\\ X_1&\overset{g^{-1}}\longrightarrow&Y_0\\ X_2&\overset{f}\longrightarrow&Y_3\\ X_3&\overset{g^{-1}}\longrightarrow&Y_2\\ \vdots&\vdots&\vdots\\ X_{2k}&\overset{f}\longrightarrow&Y_{2k+1}\\ X_{2k+1}&\overset{g^{-1}}\longrightarrow&Y_{2k}\\ \vdots&\vdots&\vdots\\ X_\omega&\overset{f}\longrightarrow&Y_\omega \end{array}$$
The bijection is defined piecewise, but that’s no problem.
There are a few details to be filled in to make this a fully rigorous proof, but I think that it does give a reasonable idea of one possible intuition.
Added: Here’s a very rough sketch. Arrows from left to right are (parts of) $f$, and arrows from right to left are (parts of) $g$.
It is a reasonably long and seemingly complicated proof, but actually the idea is very simple (or at least it is in the proof I've seen). We have injections $f:X\to Y$ and $g:Y\to X$. To build a bijection $h:X\to Y$ we need to send each point in $X$ to a unique point in $Y$, making sure we hit everything.
We can start by saying $h(x) = f(x)$. But of course, this doesn't hit everything if f is not surjective. But , for each element $y$ we don't hit in $Y$ (i.e. the set ${Y\setminus{f(X)}}$) there is a unique element $g(y)$ in $X$ which we can make map to $y$. So we edit $h$ such that now $$h(x) = \begin{cases} g^{-1}(x) & \mbox{if } g^{-1}(x) \notin f(X) \\ f(x) & \mbox{otherwise} \end{cases}$$
So this time we hit everything we didn't get first time from the $g^{-1}(x) $ part. But now we're missing some other parts! (namely the values $f(x)$ where $x \in g(Y\setminus f(X))$ but $f(x) \notin Y\setminus f(X)$.) We can define the sets that we "miss" in each iteration of improving $h$ as $C_n$ and say $C = \displaystyle \bigcup_1^{\infty}C_n$. Then defining $$h(x) = \begin{cases} g^{-1}(x) &\mbox{if } g^{-1}(x) \in C \\ f(x) &\mbox{otherwise} \end{cases}$$
And by thinking of how we built it up, it kind of makes a bit more sense as to why it is a bijection.
Please note that this isn't the most efficient way to carry out the proof, but I explained it in the way that I would come up with it whilst trying to see why it works, rather than what I'd write down formally. Also please note it's been a long time since I've seen the proof and I may have gotten some of the specifics wrong (please anyone correct me if this is the case!), but this is the right general idea (I hope!)
My favorite proof is due to R. H. Cox, and I learned it from the wonderful Handbook of Analysis and its Foundations by Schechter, from which I've taken the illustration below. The basic idea is that an injection from $Y$ to $X$ identifies $Y$ with a subset of $X$. So we can reduce the problem to showing that if we have $Y\subseteq X$ and $f:X\to Y$ is an injection, then $|X|=|Y|$ and this is less confusing. let $C=X\backslash Y$. The sequence of sets $\big(f^n(C))$ is disjoint since $f(X)\subseteq Y$ and $f$ is injective. Let $S=\bigcup_{n=0}^\infty f^n(C)$. Define $g:X\to Y$ by
$$g(x) = \begin{cases} f(x)&\mbox{if } x\in S \\ x & \mbox{if }x\notin S. \end{cases}$$
It is easily verified that $g$ is a bijection.
Remark: According to Kunen's book on the foundations of mathematics, Dedekind already proved this version of the Cantor-Bernstein theorem in 1887.