Can we find the exponential radioactive decay formula from first principles?
If you want to be very nitpicky about it, the decay will not be exponential. The exponential approximation breaks down both at small times and at long times:
At small times, perturbation theory dictates that the amplitude of the decay channel grows linearly with time, which means that the probability of decay is only quadratic at small times, and the survival probability is slightly rounded near $t=0$ before going down as $e^{-t/\tau}$. This should not be surprising, because the survival probability is time-reversal invariant and should therefore be an even function of $t$.
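To make the short-time claim quantitative, here is a minimal sketch of the expansion, under the assumption that the initial state $|\varphi⟩$ has finite energy moments $⟨H⟩$ and $⟨H^2⟩$: expanding the survival amplitude $⟨\varphi|e^{-iHt}|\varphi⟩$ (defined more carefully below) to second order in $t$ gives $$ ⟨\varphi|e^{-iHt}|\varphi⟩=1-it⟨H⟩-\tfrac{t^2}{2}⟨H^2⟩+O(t^3), \quad\text{so}\quad P(t)=1-(\Delta H)^2\,t^2+O(t^4), $$ with $(\Delta H)^2=⟨H^2⟩-⟨H⟩^2$. The survival probability therefore starts off flat and quadratic, rather than with the finite slope $-1/\tau$ of an exponential.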
At very long times, there are bounds on how fast the bound state amplitude can decay which are essentially due to the fact that the hamiltonian is bounded from below, and which I demonstrate in detail below.
Both of these regimes are very hard to observe experimentally. At short times, you usually need very good time resolution and the ability to prepare your system essentially instantaneously. At long times, you probably wouldn't need to go that far out, but it is typically very hard to get a good signal-to-noise ratio because the exponential decay has pretty much killed all your systems, so you need very large populations to really see this.
However, both sorts of deviations can indeed be observed experimentally. At long times, the first observation is
Violation of the Exponential-Decay Law at Long Times. C Rothe, SI Hintschich and AP Monkman. Phys. Rev. Lett. 96, 163601 (2006); Durham University eprint.
(To emphasize the difficulty of these observations: they had to observe an unstable system over 20 lifetimes to see the deviations from the exponential, by which time only $\sim10^{-9}$ of the population remains.) For short times, the first observations are
Experimental evidence for non-exponential decay in quantum tunnelling. SR Wilkinson et al. Nature 387 no. 6633 p.575 (1997). UT Austin eprint,
which measured tunnelling of sodium atoms inside an optical lattice, and
Observation of the Quantum Zeno and Anti-Zeno Effects in an Unstable System. MC Fischer, B Gutiérrez-Medina and MG Raizen. Phys. Rev. Lett. 87, 040402 (2001), UT Austin eprint (ps).
To be clear, the survival probability of a metastable state is for all practical intents and purposes exponential. It's only with a careful experiment - with large populations over very long times, or with very fine temporal control - that you can observe these deviations.
Consider a system initialized at $t=0$ in the state $|\psi(0)⟩=|\varphi⟩$ and left to evolve under a time-independent hamiltonian $H$. At time $t$, the survival amplitude is, by definition, $$ A(t)=⟨\varphi|\psi(t)⟩=⟨\varphi|e^{-iHt}|\varphi⟩ $$ and the survival probability is $P(t)=|A(t)|^2$. (Note, however, that this is a reasonable but loaded definition; for more details see this other answer of mine.) Suppose that $H$ has a complete eigenbasis $|E,a⟩$, where $E$ is the energy and $a$ is an extra index labelling the eigenvalues of a set $\alpha$ of operators that commute with $H$ and complete it to a CSCO, so you can write the identity operator as $$1=\int\mathrm dE \,\mathrm da\,|E,a⟩⟨E,a|.$$ If you plug this into the expression for $A(t)$ you can easily bring it into the form $$ A(t)=\int \mathrm dE\, B(E)e^{-iEt},\quad\text{where}\quad B(E)=\int \mathrm da\, |⟨E,a|\varphi⟩|^2. $$ Here it's easy to see that $B(E)\geq0$ and $\int B(E)\mathrm dE=1$, so $B(E)$ needs to be pretty nicely behaved, and in particular it is in $L^1$ over the energy spectrum.
This is where the energy spectrum comes in. In any actual physical theory, the spectrum of the hamiltonian needs to be bounded from below, so there is a minimal energy $E_\text{min}$, set to 0 for convenience, below which the spectrum has no support. This looks quite innocent, and it allows us to refine our expression for $A(t)$ into the harmless-looking $$ A(t)=\int_{0}^\infty \mathrm dE\, B(E)e^{-iEt}.\tag1 $$ As it turns out, this has now prevented the asymptotic decay $e^{-t/\tau}$ from happening.
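As an illustration of equation $(1)$, here is a minimal numerical sketch (my own, not part of the original argument) that assumes a toy spectral density: a Lorentzian of width $\Gamma$ centred at $E_0>0$ and truncated to $E\geq0$. The survival probability tracks $e^{-\Gamma t}$ over many lifetimes and then crosses over into a much slower power-law tail coming from the $E=0$ edge of the spectrum, long after the exponential has decayed by many orders of magnitude.

```python
import numpy as np

# Toy spectral density: Lorentzian of width Gamma centred at E0, truncated to E >= 0.
# Parameters are purely illustrative; hbar = 1 throughout.
E0, Gamma = 10.0, 0.5
E = np.linspace(0.0, 200.0, 400001)                  # energy grid well past the resonance
dE = E[1] - E[0]
w = np.full_like(E, dE); w[0] *= 0.5; w[-1] *= 0.5   # trapezoid quadrature weights

B = (Gamma / (2 * np.pi)) / ((E - E0) ** 2 + (Gamma / 2) ** 2)
B /= np.sum(w * B)                                   # enforce  int B(E) dE = 1

def survival_probability(t):
    """P(t) = |A(t)|^2 with A(t) = int_0^inf B(E) exp(-iEt) dE, i.e. equation (1)."""
    A = np.sum(w * B * np.exp(-1j * E * t))
    return np.abs(A) ** 2

for t in [1.0, 5.0, 10.0, 20.0, 50.0, 100.0]:
    print(f"t = {t:6.1f}   P(t) = {survival_probability(t):.3e}   "
          f"exp(-Gamma*t) = {np.exp(-Gamma * t):.3e}")
```

The short-time rounding is also present in this toy model, but its scale is set by the artificial high-energy cutoff of the grid, so the long-time tail is the cleaner feature to look at here.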
The reason for this is that in this form $A(t)$ is analytic in the lower half-plane. To see this, consider a complex time $t$ with $\mathrm{Im}(t)\leq0$, for which \begin{align} |A(t)| & =\left|\int_0^\infty B(E)e^{-iEt}\mathrm dE\right| \leq\int_0^\infty \left| B(E)e^{-iEt}\right|\mathrm dE =\int_{0}^\infty \left| B(E)\right|e^{+E \,\mathrm{Im}(t)}\mathrm dE \\ & \leq\int_{0}^\infty \left| B(E)\right|\mathrm dE=1, \end{align} since $\mathrm{Im}(t)\leq 0$. This means that the integral $(1)$ converges for all $t$ with $\mathrm{Im}(t)\leq 0$, and because of its form it is analytic in $t$ in the interior of that region.
This is nice, but it is also damning, because analytic functions can be very restricted in terms of how they can behave. In particular, $A(t)$ grows exponentially in the direction of increasing $\mathrm{Im}(t)$ and decays exponentially in the direction of decreasing $\mathrm{Im}(t)$. This means that its behaviour along $\mathrm{Re}(t)$ should in principle be something like oscillatory, but you can get away with something like a decay. What you cannot get away with, however, is exponential decay along both directions of $\mathrm{Re}(t)$ - it is simply no longer compatible with the demands of analyticity.
The way to make this precise is to use something called the Paley-Wiener theorem, which in this specific setting demands that $$ \int_{-\infty}^\infty \frac{\left|\ln|A(t)|\right|}{1+t^2}\mathrm dt<\infty. $$ That is, of course, a wonky integral if anyone ever saw one, but you can see that if $A(t)\sim e^{-|t|/\tau}$ for large times $|t|$ (and $|A(t)|$ must be time-reversal symmetric, so the same holds as $t\to-\infty$), then this integral (only just) diverges, as the short check below shows. There's more one can say about why this happens, but for me the bottom line is: analyticity demands some restrictions on how fast $A(t)$ can decay along the real axis, and when you do the calculation this turns out to be it.
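To see how the exponential just barely fails this condition: if $|A(t)|\sim e^{-|t|/\tau}$ at large $|t|$, then $\left|\ln|A(t)|\right|\sim|t|/\tau$ and the tails of the integral behave like $$ \int^{\infty}\frac{|t|/\tau}{1+t^2}\,\mathrm dt\sim\frac1\tau\int^\infty\frac{\mathrm dt}{t}, $$ which diverges logarithmically. By contrast, a stretched-exponential tail $|A(t)|\sim e^{-\gamma|t|^\beta}$ with $\beta<1$ keeps the integral finite, so that kind of decay remains compatible with the Paley-Wiener bound.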
(For those wondering: yes, this bound is saturated. The place to start digging is the Beurling-Malliavin theorem, but I can't promise it won't be painful.)
For more details on the proofs and the intuition behind this stuff, see my MathOverflow question The Paley-Wiener theorem and exponential decay and Alexandre Eremenko's answer there, as well as the paper
L. Fonda, G. C. Ghirardi and A. Rimini. Decay theory of unstable quantum systems. Rep. Prog. Phys. 41, pp. 587-631 (1978). §3.1 and 3.2.
from which most of this stuff was taken.
It's not true in general that radioactive decay is exponential. Emilio Pisanty's answer discusses this from a fancy mathematical point of view, but it's possible to understand this in extremely elementary terms.
Exponential decay follows from linearity, irreversibility, and the assumption of a well-defined initial state
In fields like atomic and nuclear physics, we normally think of the initial state of the decay as a single state, which has a definite energy. In these fields this is normally an excellent approximation, since the uncertainty in energy imposed by the energy-time uncertainty principle is much smaller than the excitation energies. For example, the first excited state of a nucleus might typically have an excitation energy of 1 MeV and a gamma-decay lifetime of 1 ps, which gives it an energy uncertainty of order $\hbar/\tau\sim 10^{-9}$ MeV. The relation $\Delta E\ll E$, where $E$ is the energy of the decay, is called the Weisskopf-Wigner approximation in the context of atomic physics. When this approximation holds, the initial, unstable state has no internal memory or dynamics that can have any effect on the problem.
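As a quick order-of-magnitude check of that figure (using $\hbar\approx6.6\times10^{-22}\ \mathrm{MeV\,s}$): $$ \Delta E\sim\frac{\hbar}{\tau}\approx\frac{6.6\times10^{-22}\ \mathrm{MeV\,s}}{10^{-12}\ \mathrm{s}}\approx7\times10^{-10}\ \mathrm{MeV}, $$ roughly nine orders of magnitude below the 1 MeV excitation energy, so $\Delta E\ll E$ holds comfortably.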
Let $|0\rangle$ be the unstable state, and let $|i\rangle$ be a final state labeled by quantum numbers $i$. If the (normalized) state at $t=0$ is
$$\Psi_0 = |0\rangle,$$
then at some later time we will have some other state
$$\Psi_1 = a|0\rangle + \sum b_i|i\rangle.$$
The new state is obtained as $\Psi_1=M\Psi_0$, where $M$ is the time-evolution operator for whatever time interval we're talking about. Now suppose that the decay is irreversible, so that $M|i\rangle$ never overlaps with $|0\rangle$. Then if we operate with $M$ a second time, we get
$$\Psi_2 = a^2|0\rangle + \ldots$$
This is exponential decay, with the survival probability going like $1$, $|a|^2$, $|a|^4$, ... We must have $|a|\le 1$ by unitarity, since otherwise normalization would fail at $t>0$. (We could of course have irreversible absorption rather than irreversible emission, and then $|a|\ge 1$, but then the initial state can't be $|0\rangle$. The case $|a|=1$ happens when $|0\rangle$ is an exact energy eigenstate.)
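To spell out why repeated multiplication by a fixed factor is the same thing as an exponential: if the survival amplitude shrinks by the same factor $a$ over every interval $\Delta t$, then after $n=t/\Delta t$ intervals $$ P(t)=\left(|a|^2\right)^{t/\Delta t}=e^{-t/\tau}, \qquad \tau=\frac{\Delta t}{\ln\left(1/|a|^2\right)}, $$ so a constant per-step survival factor is exactly exponential decay in $t$.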
So this seems like an ironclad argument that decay must be exponential. What could possibly go wrong?
Failure at short times
One thing that goes wrong is if we extrapolate backward in time to $t<0$. The probability of the unstable state is now greater than 1, which violates unitarity. This means that on purely trivial grounds the behavior cannot be exponential for all $t$. At early enough times, there must be strongly nonexponential behavior, and we expect some kind of smooth transition between the two regimes.
There is nothing terribly mysterious about this, and it doesn't require fancy ideas about Fourier transforms or quantum field theory. It just has to do with the preparation of the initial state. For example, suppose we make up a potential consisting of a well, a barrier, and an exterior region, and we solve the one-dimensional Schrödinger equation in this potential (a minimal simulation along these lines is sketched below). If we start our simulation by initializing the wavefunction to some arbitrary shape inside the unstable well, then it will consist of a superposition of energy states. The higher-energy components tunnel through the barrier at a high rate. After some short time, they're all gone, and what remains inside the well looks roughly like a half-wavelength standing wave pattern. This will then decay exponentially as described above. The transition is basically just the time period during which the metastable state is settling down after having been prepared.
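Here is a minimal sketch of that kind of simulation, with purely illustrative parameters and units $\hbar=m=1$ (the split-operator method and the absorbing boundary are my own choices, not something specified above): a well bounded by a hard wall on the left and a finite barrier on the right, with the survival probability printed as the state leaks out.

```python
import numpy as np

# Split-step (Fourier) evolution of a 1D metastable well: hard wall | well | barrier | free space.
# A negative-imaginary ramp near the right edge absorbs the escaped wave so it cannot
# wrap around through the periodic boundary imposed by the FFT.
N, L = 4096, 400.0
x = np.arange(N) * (L / N)
dx = x[1] - x[0]
k = 2 * np.pi * np.fft.fftfreq(N, d=dx)

V = np.zeros(N, dtype=complex)
V[x < 2.0] = 50.0                                    # hard wall on the left
V[(x >= 6.0) & (x < 7.5)] = 1.0                      # tunnelling barrier
right = x > 360.0
V[right] -= 0.5j * ((x[right] - 360.0) / 40.0) ** 2  # absorbing boundary

# Initial state: half-wavelength standing wave inside the well (2 < x < 6).
psi0 = np.where((x > 2.0) & (x < 6.0), np.sin(np.pi * (x - 2.0) / 4.0), 0.0).astype(complex)
psi0 /= np.sqrt(np.sum(np.abs(psi0) ** 2) * dx)

dt, n_steps = 0.05, 30000
kinetic = np.exp(-0.5j * k ** 2 * dt)                # exact free propagator in k-space
potential = np.exp(-1j * V * dt)                     # potential step (includes absorber)

psi = psi0.copy()
for step in range(1, n_steps + 1):
    psi = np.fft.ifft(kinetic * np.fft.fft(potential * psi))
    if step % 3000 == 0:
        survival = np.abs(np.sum(np.conj(psi0) * psi) * dx) ** 2
        print(f"t = {step * dt:7.1f}   survival = {survival:.4e}")
```

After the initial transient, successive printed values should fall by roughly the same factor, which is exactly the repeated-multiplication argument above; the very late values are limited by the absorbing boundary and shouldn't be over-interpreted.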
You could say that you're just going to prepare the system in an energy eigenstate initially, so that there is no transition period before exponential behavior sets in. That doesn't work, because the energy eigenstates are not localized inside the metastable well. States of good energy aren't localized, and states that are localized aren't states of good energy. A localized state can be approximately a state of good energy, and it's this approximation that we're implicitly making in most cases in atomic and nuclear physics -- the Weisskopf-Wigner approximation.
Failure at long times
At very long times, there is a different reason to expect deviations from exponential behavior. A tailing-off exponential falls rapidly through one order of magnitude after another. If there is any rate at all for the inverse process (e.g., reabsorption of a photon), then there will come a time at which this rate balances the rate of decay. Because exponentials die off so fast, we don't actually expect this time to be very long, although it depends on the environment. Beyond this point in time, the rate of emission will approach a nonzero constant that equals the rate of absorption.
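A classical two-level rate-equation caricature (my gloss, not part of the argument above) captures this balance: if the excited-state probability $P$ decays at rate $\Gamma$ and is repopulated by the environment at some small rate $R$, then $$ \frac{\mathrm dP}{\mathrm dt}=-\Gamma P+R\,(1-P) \quad\Longrightarrow\quad P(t)\;\longrightarrow\;\frac{R}{\Gamma+R}, $$ so instead of heading to zero the population levels off at a small constant where emission and absorption balance, and the apparent decay curve flattens out.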
An experimental example of this sort of thing is Rothe et al., "Violation of the Exponential-Decay Law at Long Times", Phys. Rev. Lett. 96, 163601 (2006), doi:10.1103/PhysRevLett.96.163601. It doesn't seem to be on arXiv, but you can find it on sci-hub. They used luminescent organic molecules in a solution of toluene or ethanol. They excited the molecules using a pulsed laser. I don't think there is anything fundamentally very surprising or mysterious about this. The molecules are in a messy liquid environment and are exposed to jostling from the electromagnetic fields of other molecules. In this environment, they get reexcited at some rate. There are some nontrivial facts about the mathematical behavior of the decay curves, but the mere fact of a deviation from exponential decay can only be because the process isn't perfectly irreversible.