Understanding the statement of the bandwidth theorem
The bandwidth theorem was first fully understood in an all-time fantastic paper: Gabor's "Theory of Communication" (1946). I'd recommend checking it out. Not the easiest read if you're new to Fourier analysis, but it's really nice, and Part 2 even gives a detailed analysis of human hearing and how it relates.
The bandwidth theorem is a statement about (Fourier) signal analysis. Forget QM for now. Let's just talk about any function $\psi(x)$. It could be complex- or real-valued.
(Note: I'd call this function $f(x)$, but then $f$ could be confused for frequency. In signal analysis they usually talk about a real-valued signal $s(t)$ which is a function of time, and its spectrum $S(f)$ which is complex.)
Fourier analysis says that every (square-integrable) function $\psi(x)$ has a unique frequency spectrum $\phi(f)$.
The meaning of this spectrum is that $\psi(x)$ can be written as a superposition of the complex exponentials $e^{2\pi i f x}$ (in other words, as a superposition of sines and cosines of frequency $f$). The spectrum $\phi(f)$, also called the Fourier transform of $\psi$, tells you how much of each sinusoidal component you need to use.
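Explicitly, in the convention used here (frequency $f$ measured in cycles per unit of $x$), the transform pair is $$ \psi(x) = \int_{-\infty}^{\infty} \phi(f)\, e^{2\pi i f x} \, df \qquad \textrm{and} \qquad \phi(f) = \int_{-\infty}^{\infty} \psi(x)\, e^{-2\pi i f x} \, dx. $$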
It turns out that "narrow" functions have a "broad" spectrum, and "broad" functions have a "narrow" spectrum:
- The spectrum of a localized spike (Dirac delta function) contains all frequencies equally.
- The spectrum of a perfect sinusoidal wave (totally spread out through space) has just a single frequency.
- The spectrum of a Gaussian distribution of width $\sigma$ is a Gaussian distribution of width proportional to $1/\sigma$ (all three transforms are written out just below).
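Written out in the same convention, these three transform pairs are $$ \delta(x) \;\longleftrightarrow\; 1, \qquad e^{2\pi i f_0 x} \;\longleftrightarrow\; \delta(f - f_0), \qquad e^{-x^2/(2\sigma^2)} \;\longleftrightarrow\; \sigma\sqrt{2\pi}\, e^{-2\pi^2 \sigma^2 f^2}, $$ where the last spectrum is a Gaussian in $f$ of standard deviation $1/(2\pi\sigma)$: the wider the function, the narrower its spectrum.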
The bandwidth theorem makes this observation quantitative: the product of the width $\sigma_x$ of a function and the bandwidth $\sigma_f$ of its spectrum obeys $\sigma_x \, \sigma_f \geq \frac{1}{2}$.
The width and bandwidth of the function and spectrum are simply defined as $\sqrt{2\pi}$ times their standard deviations (i.e. times their root-mean-square deviations from the mean): $$ {\sigma_x }^2 = 2\pi \int (x - \bar{x})^2 \, |\psi(x)|^2 \,dx \qquad \textrm{and} \qquad {\sigma_f }^2 = 2\pi \int (f - \bar{f})^2 \, |\phi(f)|^2 \,df $$ where $\bar{x}$ and $\bar{f}$ are the mean (expected) values of the position and frequency. This assumes $\psi(x)$ is normalized so that $\int |\psi(x)|^2 \, dx = 1$; if not, both sides get divided by a normalization factor. The factor of $2\pi$ is conventional and relates to information content, as explained by Gabor.
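If you want to check this numerically, here's a minimal sketch (the grid parameters `N`, `L`, and the width `s` are arbitrary choices): it approximates the spectrum with NumPy's FFT, computes $\sigma_x$ and $\sigma_f$ exactly as defined above, and prints the products. The Gaussian saturates the bound at $1/2$; a lumpier function does not.

```python
import numpy as np

# Discretized check of the bandwidth theorem.
# Widths follow the convention above: sigma^2 = 2*pi * (variance of the density).
N, L = 2**14, 200.0                    # grid size and window length (arbitrary)
dx = L / N
x = (np.arange(N) - N // 2) * dx

def sigma(u, density, du):
    """sqrt(2*pi) times the standard deviation of a normalized density on grid u."""
    mean = np.sum(u * density) * du
    return np.sqrt(2 * np.pi * np.sum((u - mean) ** 2 * density) * du)

def spectrum(psi):
    """phi(f) = integral of psi(x) exp(-2*pi*i*f*x) dx, approximated by the FFT."""
    return np.fft.fftshift(np.fft.fft(np.fft.ifftshift(psi))) * dx

f = np.fft.fftshift(np.fft.fftfreq(N, d=dx))
df = f[1] - f[0]

s = 1.0                                            # Gaussian width parameter (arbitrary)
for psi in (np.exp(-x**2 / (4 * s**2)),            # Gaussian: saturates the bound
            x * np.exp(-x**2 / (4 * s**2))):       # two-lobed function: does not
    psi = psi / np.sqrt(np.sum(np.abs(psi)**2) * dx)   # normalize so integral |psi|^2 dx = 1
    phi = spectrum(psi)
    print(sigma(x, np.abs(psi)**2, dx) * sigma(f, np.abs(phi)**2, df))
# prints ~0.5 for the Gaussian and ~1.5 for the second function, both >= 1/2
```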
If you rename $\sigma$ to $\Delta$ and change $x$ to a time variable $t$, nothing at all changes, and you can write the theorem as $\Delta t \, \Delta f \geq \frac{1}{2}$. This is a property of functions, not of energies, quantum mechanics, momentum, or anything else.
This is related to (but not caused by) the fact that $[x,\frac{\partial}{\partial x}]=-1$ (to check this, remember to apply the commutator to a test function). The relation to quantum operators is obtained by simply plugging in $\hat{x} = x$ and $\hat{p} = -i\hbar \frac{\partial}{\partial x}$, which gives $[\hat{x},\hat{p}] = i\hbar$. The role of QM in obtaining the uncertainty principle is usually overblown: all the interesting generalized uncertainty relations boil down to the bandwidth theorem. For finite-dimensional observables, there is always a state (an eigenstate of one of the operators) with zero uncertainty in that operator and finite uncertainty in the other, so no nontrivial state-independent bound exists there.

In other words, the reason for the Heisenberg uncertainty principle in QM is not some magic of operators: it is simply that the state is represented by a wavefunction, and any function can either have a well-defined position, or a well-defined frequency, but not both (spatial frequency corresponds to momentum in QM). If anyone tells you taking $\hbar \to 0$ makes the uncertainty principle "go away in the classical limit", don't listen.
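To connect the numbers explicitly (a quick sketch using the de Broglie relation $p = hf$, with $f$ the spatial frequency): in terms of plain standard deviations the bandwidth theorem reads $\Delta x \, \Delta f \geq \frac{1}{4\pi}$ (just divide out the two factors of $\sqrt{2\pi}$), so $$ \Delta x \, \Delta p = h \, \Delta x \, \Delta f \geq \frac{h}{4\pi} = \frac{\hbar}{2}, $$ which is exactly the Heisenberg uncertainty principle.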
An amazing consequence of this theorem is that, if allowed to communicate using a finite bandwidth for a finite amount of time, you can only transmit a finite amount of information... and Gabor's analysis tells you how much information that is.
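Roughly speaking, a real signal confined to a bandwidth $B$ for a duration $T$ can carry about $$ N \approx 2BT $$ independent real samples (the classical Nyquist-rate count: $2B$ samples per second for $T$ seconds). Gabor arrives at the same count by tiling the duration-bandwidth rectangle into minimum-uncertainty cells, his "logons", each holding one independent datum.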