What is spectral leakage?
If you pass a signal through a linear time-invariant (LTI) system, the output will not contain any frequency that was not already present at the input.
However, if the system is non-linear (or its behaviour changes over time), the output will contain new frequencies not present in the original input. These new frequencies are the spectral leakage.
Some common examples:
a) When evaluating the Fourier transform of an infinite (or very long) signal, not all of the signal can be considered (or you may not want to consider all of it: if you are interested in how the spectrum changes over time, you repeatedly cut the signal, always taking the most recent part). The signal is cut using a window. This step multiplies the signal by a function (the window), so it is not a linear time-invariant operation, and it creates artificial new frequencies that are not present in the real input. That distorts the measurement.
b) When you pass a signal $x(t)=\cos(w_{a}t)$ through a system with output $y(t)=Ax(t)+Bx^2(t)$ (any amplifier has some non-linear part), the output signal will have components at frequencies $w=0$, $w=w_a$ (the input) and $w=2w_a$ (squaring in time convolves the original spectrum with itself). In particular, leakage at $w=0$ can be problematic.
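Example b) can be checked numerically. The following is a minimal sketch, assuming NumPy and arbitrary illustrative values (the sampling rate `fs`, the tone `f_a` and the gains `A`, `B` are not from any real amplifier). The tone is chosen to fall exactly on an FFT bin, so the truncation itself adds no leakage and only the quadratic term creates new lines.

```python
import numpy as np

fs = 1000          # sampling rate in Hz (assumed)
n = 1000           # one second of samples, so FFT bins are 1 Hz apart
f_a = 50           # input frequency in Hz (assumed), exactly on a bin
A, B = 1.0, 0.2    # hypothetical linear and quadratic gains

t = np.arange(n) / fs
x = np.cos(2 * np.pi * f_a * t)
y = A * x + B * x**2

# One-sided amplitude spectrum: divide by n, then double every bin
# except DC and Nyquist.
spectrum = np.abs(np.fft.rfft(y)) / n
spectrum[1:-1] *= 2

peaks = np.flatnonzero(spectrum > 1e-6)
for k in peaks:
    print(f"{k} Hz: amplitude {spectrum[k]:.3f}")
# 0 Hz: amplitude 0.100
# 50 Hz: amplitude 1.000
# 100 Hz: amplitude 0.100
```

The lines at 0 Hz and 100 Hz both have amplitude $B/2$, as expected from $\cos^2(w_a t)=\frac{1+\cos(2w_a t)}{2}$.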
Although "leakage" is usually used as a negative term, creating new frequencies is sometimes the goal. An example is a radio modulator, which moves the baseband signal to radio frequencies.
Returning to the case of a spectrum analyzer based on the Fourier transform, leakage causes a conflict of interests: the analyzer is expected to display the FFT of the "current" input, taking only the most recent part of it. However, that is a very short window, which causes broad leakage. A longer window causes less distortion, but uses older signal data.
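The trade-off between window length and leakage can be illustrated with a short experiment. This is only a sketch, assuming NumPy, a plain truncation (rectangular window) and arbitrary values for `fs` and `f0`; the helper `half_power_width` is a hypothetical name introduced here. It measures how wide the spectral peak of a pure tone becomes for a short and for a long cut.

```python
import numpy as np

fs = 1000.0   # sampling rate in Hz (assumed)
f0 = 100.0    # tone frequency in Hz (assumed)

def half_power_width(n_samples, n_fft=1 << 16):
    """Width in Hz of the spectral peak of a tone cut to n_samples samples."""
    t = np.arange(n_samples) / fs
    x = np.cos(2 * np.pi * f0 * t)           # truncation = rectangular window
    mag = np.abs(np.fft.rfft(x, n_fft))      # zero-padding only interpolates the spectrum
    freqs = np.fft.rfftfreq(n_fft, d=1 / fs)
    above = freqs[mag >= mag.max() / np.sqrt(2)]
    return above.max() - above.min()

width_short = half_power_width(50)    # 50 ms of signal
width_long = half_power_width(500)    # 500 ms of signal
print(f"50 samples:  peak width {width_short:.2f} Hz")
print(f"500 samples: peak width {width_long:.2f} Hz")
```

The peak of the short cut should come out roughly ten times wider than that of the long cut: a single input frequency is smeared over a band whose width is inversely proportional to the window duration.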
The best window shape is the rectangular one, in that it causes the least distortion, but it has several problems. The greatest of these is that it is an idealization: a perfectly abrupt cut does not exist in the real analog world.
Detailed analysis of some examples.
1) Electrical device, non-linear, memoryless: this kind of device (resistor, diode, transistor, ...) has an output $y(t)=y(x(t))$ that depends only on the current input. (By contrast, a device with memory that uses the current input and the input one second ago has an output $y(t)=y(x(t),x(t-1))$.)
Expanding $y$ as a Taylor series, we have:
$y(x(t))=A+Bx(t)+Cx^2(t)+Dx^3(t)+\cdots$
It is typical to use a single frequency, $x(t)=\cos(Wt)$, as the input. In this way, any output not at frequency $w=W$ is leakage:
$$ \begin{align*} y(\cos(Wt))&=A+B\cos(Wt)+C\cos^2(Wt)+D\cos^3(Wt)+\cdots \\ &=A+B\cos(Wt)+C\frac{1+\cos(2Wt)}{2}+D\frac{3\cos(Wt)+\cos(3Wt)}{4}+\cdots \end{align*} $$
Note how the linear part $B$ doesn't create new frequencies; the constant term $A$ creates output at the new frequency $w=0$; the non-linear part $C$ creates $w=0$ and $w=2W$; the non-linear part $D$ creates $w=3W$ and an interference at $w=W$; and so on.
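The contribution of each Taylor term can be verified numerically. The sketch below assumes NumPy and made-up coefficients `A`, `B`, `C`, `D`; it relies on the identities $\cos^2\theta=\frac{1+\cos 2\theta}{2}$ and $\cos^3\theta=\frac{3\cos\theta+\cos 3\theta}{4}$, which predict amplitudes $A+C/2$ at $w=0$, $B+3D/4$ at $w=W$, $C/2$ at $w=2W$ and $D/4$ at $w=3W$.

```python
import numpy as np

n = 1024
k0 = 8                            # input frequency W, in FFT bins (assumed)
A, B, C, D = 0.3, 1.0, 0.4, 0.2   # hypothetical Taylor coefficients

t = np.arange(n)
x = np.cos(2 * np.pi * k0 * t / n)
y = A + B * x + C * x**2 + D * x**3

# One-sided amplitude spectrum (DC and Nyquist bins are not doubled).
amp = np.abs(np.fft.rfft(y)) / n
amp[1:-1] *= 2

expected = {0: A + C / 2, k0: B + 3 * D / 4, 2 * k0: C / 2, 3 * k0: D / 4}
for k, value in expected.items():
    print(f"bin {k:2d}: measured {amp[k]:.4f}, predicted {value:.4f}")
```

With these values the term $D$ raises the line at $w=W$ from $B=1$ to $1.15$, which is the "interference" at the input frequency.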
2) Multiplicative processes, $y(t)=x(t)w(t)$: this is the case of windowing, where $x(t)$ is the input and $w(t)$ the window. It is practical to decompose $w(t)$ into a Fourier series:
$w(t)=A+B\cos(w_0t)+C\cos(w_1t)+\cdots+B'\sin(w_0t)+\cdots$.
Note: if $w(t)$ is not periodic, and a window is not, there are infinitely many different $w_i$ to take into account, and the sum is replaced by an integral. For simplicity we stay with real functions, although it is more practical to use complex exponentials $e^{jwt}$.
Now, if the input is $x(t)=\cos(Wt)$ we have:
$$\begin{align*} y(t)&=\cos(Wt)\,w(t) \\ &=\cos(Wt)(A+B\cos(w_0t)+C\cos(w_1t)+\cdots) \\&=A\cos(Wt)+B\frac{\cos((W+w_0)t)+\cos((W-w_0)t)}{2}+C\frac{\cos((W+w_1)t)+\cos((W-w_1)t)}{2}+\cdots \end{align*} $$
Notice how the terms of the decomposition of the window $w(t)$ create frequencies around the single frequency $W$ of the input: $W+w_0, W-w_0, W+w_1, W-w_1, \ldots$. These are the leakage.
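A window whose decomposition has only two terms makes this easy to see. The sketch below assumes NumPy and uses the periodic Hann window, $w(t)=0.5-0.5\cos(w_0 t)$ with $w_0$ equal to one FFT bin, so in the notation above $A=0.5$ and $B=-0.5$. The record length `n` and the input bin `k0` are arbitrary choices.

```python
import numpy as np

n = 256
k0 = 20                                        # input frequency W, in FFT bins (assumed)

t = np.arange(n)
x = np.cos(2 * np.pi * k0 * t / n)             # input, exactly on bin k0
w = 0.5 - 0.5 * np.cos(2 * np.pi * t / n)      # periodic Hann window: A=0.5, B=-0.5
y = x * w

# One-sided amplitude spectrum (DC and Nyquist bins are not doubled).
amp = np.abs(np.fft.rfft(y)) / n
amp[1:-1] *= 2

for k in np.flatnonzero(amp > 1e-9):
    print(f"bin {k}: amplitude {amp[k]:.3f}")
# bin 19: amplitude 0.250
# bin 20: amplitude 0.500
# bin 21: amplitude 0.250
```

The input line survives with amplitude $A$, and the single cosine term of the window creates the two neighbours at $W\pm w_0$ with amplitude $|B|/2$.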
3) Multiplicative processes used in modulation, $y(t)=x(t)f(t)$, where $f(t)$ is the carrier $f(t)=\cos(Wt)$ and $x(t)$ is the baseband signal.
We decompose $x(t)$ into a Fourier series. Again:
$$y(t)=x(t)\cos(Wt)=\cos(Wt)(A+B\cos(w_0t)+C\cos(w_1t)+\cdots)=A\cos(Wt)+B\frac{\cos((W+w_0)t)+\cos((W-w_0)t)}{2}+C\frac{\cos((W+w_1)t)+\cos((W-w_1)t)}{2}+\cdots $$
In the usual case $w_i \ll W$; for example, $x(t)$ can contain audio frequencies while $\cos(Wt)$ is a carrier at gigahertz frequencies.
Note how all frequencies of $x(t)$ have been "moved" to higher frequencies: $w_0 \to W \pm w_0$; $w_1 \to W \pm w_1$; and so on. Thus, we move from baseband to radio frequencies, where atmospheric absorption is smaller, making radio transmission possible.
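The frequency shift can be reproduced with a toy baseband signal. This is a sketch under assumed values, using NumPy: the coefficients `A`, `B`, `C`, the baseband bins and the carrier bin are all illustrative, and the carrier is far below real radio frequencies so that everything fits in one short FFT. The helper `occupied_bins` is a hypothetical name introduced here.

```python
import numpy as np

n = 4096
carrier_bin = 1000           # carrier frequency W, in FFT bins (assumed)
base_bins = (10, 25)         # baseband frequencies w_0 and w_1, in FFT bins (assumed)
A, B, C = 1.0, 0.5, 0.25     # hypothetical Fourier coefficients of x(t)

t = np.arange(n)
x = (A
     + B * np.cos(2 * np.pi * base_bins[0] * t / n)
     + C * np.cos(2 * np.pi * base_bins[1] * t / n))
y = x * np.cos(2 * np.pi * carrier_bin * t / n)

def occupied_bins(signal):
    """Return the FFT bins that carry a non-negligible amplitude."""
    amp = np.abs(np.fft.rfft(signal)) / len(signal)
    return np.flatnonzero(amp > 1e-9).tolist()

print(occupied_bins(x))   # [0, 10, 25]
print(occupied_bins(y))   # [975, 990, 1000, 1010, 1025]
```

Each baseband line at $w_i$ reappears as a pair at $W\pm w_i$, and the constant term $A$ becomes the carrier line at $W$.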