Heuristic behind the Fourier-Mukai transform
First, recall the classical Fourier transform. It goes something like this: take a function $f(x)$; its Fourier transform is the function $g(y) := \int f(x)e^{2\pi i xy} dx$. I really know almost nothing about the classical Fourier transform, but one of the main points is that the Fourier transform is supposed to be an invertible operation.
The Fourier-Mukai transform in algebraic geometry gets its name because it at least superficially resembles the classical Fourier transform. (And of course because it was studied by Mukai.) Let me give a rough picture of the Fourier-Mukai transform and how it resembles the classical situation.
Take two varieties $X$ and $Y$, and a sheaf $\mathcal{P}$ on $X \times Y$. The sheaf $\mathcal{P}$ is sometimes called the "integral kernel". Take a sheaf $\mathcal{F}$ on $X$. Think of $\mathcal{F}$ as being analogous to the function $f(x)$ in the classical situation, and think of $\mathcal{P}$ as being analogous to some function of $x$ and $y$.
Now pull $\mathcal{F}$ back along the projection $p_1 : X \times Y \to X$. Think of the pullback $p_1^\ast \mathcal{F}$ as being analogous to the function $F(x,y) := f(x)$. Think of $\mathcal{P}$ as being analogous to the function $e^{2\pi i xy}$ (but maybe not exactly, see below).
Next, take the tensor product $p_1^\ast \mathcal{F} \otimes \mathcal{P}$. This is analogous to the function $F(x,y) e^{2\pi i xy} = f(x)e^{2\pi i xy}$.
Finally, push $p_1^\ast\mathcal{F} \otimes \mathcal{P}$ down along the projection $p_2: X \times Y \to Y$. The result is the Fourier-Mukai transform of $\mathcal{F}$ --- it is $p_{2,\ast} (p_1^\ast \mathcal{F} \otimes \mathcal{P})$. This last pushforward step can be thought of as "integration along the fiber" --- here the fiber direction is the $X$ direction. So the analogous thing in the classical situation is $g(y) = \int f(x)e^{2\pi i xy}dx$ --- the Fourier transform of $f(x)$!
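The three steps above (pull back, multiply by the kernel, push forward) can be mimicked for ordinary functions on a finite set, where "integration along the fiber" is just a finite sum. The following is only a toy sketch under stated assumptions: the finite group $\mathbb{Z}/n$ stands in for both $X$ and $Y$, functions are stored as Python dictionaries, and all function names (`pull_back`, `multiply`, `push_forward`, `integral_transform`) are made up for this illustration.

```python
import cmath

def pull_back(f, ys):
    """F(x, y) := f(x): extend f to the product, constant in y."""
    return {(x, y): fx for x, fx in f.items() for y in ys}

def multiply(F, kernel):
    """Pointwise product with the kernel on the product set."""
    return {xy: F[xy] * kernel[xy] for xy in F}

def push_forward(G, ys):
    """Sum over the first coordinate: a discrete 'integration along the fiber'."""
    g = {y: 0 for y in ys}
    for (x, y), value in G.items():
        g[y] += value
    return g

def integral_transform(f, kernel, ys):
    """Pull back, multiply by the kernel, push forward."""
    return push_forward(multiply(pull_back(f, ys), kernel), ys)

# Assumption: Z/n plays the role of both X and Y.
n = 5
points = list(range(n))

# Discrete analogue of e^{2 pi i x y}; keys are (source point, target point).
fourier_kernel = {(x, y): cmath.exp(2j * cmath.pi * x * y / n)
                  for x in points for y in points}

# Kernel of the inverse transform, going from Y back to X.
inverse_kernel = {(y, x): cmath.exp(-2j * cmath.pi * x * y / n) / n
                  for x in points for y in points}

f = {x: complex(x * x + 1) for x in points}
g = integral_transform(f, fourier_kernel, points)
f_recovered = integral_transform(g, inverse_kernel, points)
```

Here `f_recovered` agrees with `f` up to floating-point rounding, which is the toy version of the Fourier transform being invertible. Replacing `fourier_kernel` by an arbitrary dictionary on the product gives a general "integral transform", which need not be invertible.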
But to make all of this actually work out, we have to use the derived pushforward (and the derived tensor product), not just the ordinary ones. And so we have to work with derived categories.
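In symbols, the derived version of the recipe reads as follows. The name $\Phi_{\mathcal{P}}$ for the transform and the notation $D^b(X)$ for the bounded derived category of coherent sheaves are assumptions of this sketch; they are common in the literature but were not fixed in the text above.

$$\Phi_{\mathcal{P}} : D^b(X) \to D^b(Y), \qquad \Phi_{\mathcal{P}}(\mathcal{F}) := Rp_{2,\ast}\bigl(p_1^\ast \mathcal{F} \otimes^{L} \mathcal{P}\bigr).$$

The pullback $p_1^\ast$ does not need to be derived, because the projection $p_1$ is flat.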
When $X$ is an abelian variety, $Y$ is the dual abelian variety, and $\mathcal{P}$ is the so-called Poincaré line bundle on $X \times Y$, the Fourier-Mukai transform gives an equivalence of the derived category of coherent sheaves on $X$ with the derived category of coherent sheaves on $Y$. This was proven by Mukai. I think this is supposed to be analogous to the statement I made about the classical Fourier transform being invertible. In other words, I think the Poincaré line bundle is really supposed to be analogous to the function $e^{2\pi i xy}$. A more general choice of $\mathcal{P}$ corresponds, in the classical situation, to so-called integral transforms, which have been previously discussed here. This is probably why $\mathcal{P}$ is called the integral kernel. You may also be interested in reading about Pontryagin duality, which is a version of the Fourier transform for locally compact abelian topological groups; this is obviously quite similar, at least superficially, to Mukai's result about abelian varieties. However, I don't know enough to say anything more than that.
There are some cool theorems of Orlov (I forget the precise statements, but you can probably find them easily in any of the books suggested so far) which say that in certain cases any derived equivalence is induced by a Fourier-Mukai transform. Note that the converse is not true: a random Fourier-Mukai transform (i.e. a random choice of the sheaf $\mathcal{P}$) is probably not a derived equivalence.
I think Huybrechts' book "Fourier-Mukai transforms in algebraic geometry" is a good book to look at.
Edit: I hope this gives you a better idea of what is going on, though I have to admit that I don't know of any good heuristic idea behind, e.g., Mukai's result. It is analogous to the Fourier transform and to Pontryagin duality, and thus I suppose we can apply whatever heuristic ideas we have about the Fourier transform to the Fourier-Mukai transform. But I don't know of any heuristic ideas that explain the Fourier-Mukai transform in a direct way, without appealing to analogies with things outside of algebraic geometry proper. Hopefully somebody else can say something about that.
But there is certainly something deep going on. Just as $\mathrm{CommRing}$ behaves a lot like $\mathrm{Set}^{\mathrm{op}}$, I think there is probably some kind of general phenomenon that sheaves (or vector bundles) behave a lot like functions, which is what's happening here. Pullback of sheaves behaves a lot like pullback of functions... Pushforward of sheaves behaves a lot like integration of functions... Tensor product of sheaves behaves a lot like multiplication of functions...
Just a complement to the answer of Kevin Lin.
There is a case where the analogy between sheaves and functions is more than an analogy: the case of varieties over a finite field. More precisely, if $X$ is a variety over $\mathbb{F}_p$ and $F$ is an $l$-adic constructible sheaf on $X$, one can associate to $F$ a function (in the set-theoretic sense) on the set of $\mathbb{F}_p$-points of $X$ by mapping $x$ to the trace of the Frobenius acting on the stalk of $F$ at (a geometric point over) $x$. This defines a sheaf-to-function correspondence compatible with all the analogies cited by Kevin.
If we fix a nontrivial additive character of $\mathbb{F}_p$, then one has the usual Fourier transform for functions on $\mathbb{F}_p$. One can ask for an analogue for $l$-adic sheaves on the affine space $\mathbb{A}^n$. It exists: it is the Fourier-Deligne transform. The fact that the function associated to the Fourier-Deligne transform of a sheaf is the (usual) Fourier transform of the function associated to the sheaf is a consequence of the Grothendieck trace formula.
In fact, the Fourier-Deligne transform is a Fourier-Mukai transform for the derived category of $l$-adic constructible sheaves on $\mathbb{A}^n$! OK, when one speaks about Fourier-Mukai, one usually thinks about complex algebraic geometry and categories of coherent sheaves, but I think it can be useful to keep the above situation in mind, where we really have a sheaf/function dictionary. This dictionary was one of the motivations for the formulation of the geometric Langlands program (see some expository articles of Frenkel, for example).
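To make the dictionary a bit more concrete, here is a sketch in formulas. The notation is assumed for this illustration and not taken from the text above: $\psi$ is the fixed additive character, $\mathcal{L}_\psi$ is the associated Artin-Schreier sheaf on $\mathbb{A}^1$, $\mathcal{L}_\psi(\langle x, y\rangle)$ is its pullback along the standard pairing $\mathbb{A}^n \times \mathbb{A}^n \to \mathbb{A}^1$, $p_1$ and $p_2$ are the two projections, and $t_K$ is the trace function of a complex $K$. Shift and sign conventions vary between references.

$$t_K(x) = \sum_i (-1)^i \, \mathrm{Tr}\bigl(\mathrm{Frob}_x, \mathcal{H}^i(K)_{\bar{x}}\bigr),$$

$$\mathcal{F}_\psi(K) = Rp_{2,!}\bigl(p_1^\ast K \otimes \mathcal{L}_\psi(\langle x, y\rangle)\bigr)[n],$$

$$t_{\mathcal{F}_\psi(K)}(y) = (-1)^n \sum_{x \in \mathbb{F}_p^n} t_K(x)\, \psi(\langle x, y\rangle).$$

The middle formula has exactly the pull back, tensor with a kernel, push forward shape of a Fourier-Mukai transform (with the pushforward taken with compact supports), and the last line is the statement that the trace formula turns it into the usual finite Fourier transform, up to the sign coming from the shift.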
You may want to look at Tom Bridgeland's PhD thesis.