Why large deviation theory?

Large deviation theory deals with the decay of the probabilities of rare events on an exponential scale. If $(S_n)_{n \in \mathbb{N}}$ is a random walk, then a "rare event" is an event whose probability vanishes in the limit, i.e. $\lim_{n \to \infty} \mathbb{P}(S_n \in A) = 0$. Large deviation theory aims to determine the asymptotics of

$$\mathbb{P}(S_n \in A) \quad \text{as $n \to \infty$}$$

How fast does $\mathbb{P}(S_n \in A)$ tend to zero? Roughly speaking, a large deviation estimate tells us that

$$\mathbb{P}(S_n \in A) \approx \exp \left( -n J(A) \right) \qquad \text{for large $n$}$$

for a certain rate $J(A) \geq 0$. In other words, the probability $\mathbb{P}(S_n \in A)$ decays exponentially in $n$ with rate $J(A)$. If, for instance, you are interested in finding $n \in \mathbb{N}$ such that $\mathbb{P}(S_n \in A) \leq \epsilon$ for some given $\epsilon>0$, this is really useful because it tells you how large you have to choose $n$.
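To see how this is used, take the heuristic at face value: $\exp(-nJ(A)) \leq \epsilon$ as soon as

$$n \geq \frac{\log(1/\epsilon)}{J(A)},$$

provided $J(A) > 0$; the rate thus translates a target probability directly into an order of magnitude for $n$ (up to the subexponential corrections hidden in the "$\approx$").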

You are right that it is, in general, not easy to determine the rate function. The case of independent, identically distributed random variables illustrates quite nicely why large deviation theory is nevertheless worthwhile:

Let $(X_j)_{j \in \mathbb{N}}$ be a sequence of independent and identically distributed random variables and $S_n := \sum_{j=1}^n X_j$ the associated random walk. If we want to determine the asymptotics of

$$\mathbb{P}\left(\frac{S_n}{n} > x\right)$$

using the distribution function, then we have to calculate the distribution function of $S_n$ for each $n$, and this will certainly require a huge amount of computation. If we use large deviation theory instead, then we only have to compute

$$I(x) = \sup_y \{y \cdot x - \lambda(y)\}$$

for the cumulant generating function $\lambda(y) = \log \mathbb{E}\exp(y X_1)$, i.e. the logarithm of the moment generating function. Note that these quantities do not depend on $n$: we compute them once and then we are done. Moreover, the large deviation principle also allows us to estimate probabilities of the form

$$\mathbb{P}\left(\frac{S_n}{n} \in A\right)$$

using the rate function $I$; we are not restricted to events $A$ of the particular form $(x,\infty)$. Probabilities like these are very hard to compute from the density function of $S_n$ (which is itself, in general, very hard or impossible to compute).
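As a rough numerical sketch of this recipe (my own illustration, not part of the original answer; the Bernoulli(1/2) increments, the search interval for $y$, and the threshold $x = 0.8$ are arbitrary choices), one can evaluate the Legendre transform with a one-dimensional optimizer:

```python
# Minimal sketch: Cramer's rate function I(x) = sup_y { y*x - lambda(y) } for
# Bernoulli(1/2) increments, with lambda the cumulant generating function.
# The distribution, bounds and threshold below are arbitrary illustrative choices.
import numpy as np
from scipy.optimize import minimize_scalar

def cgf(y, p=0.5):
    """Cumulant generating function of a Bernoulli(p) increment: log E[exp(y*X_1)]."""
    return np.log(1 - p + p * np.exp(y))

def rate(x, p=0.5):
    """Legendre transform I(x) = sup_y (y*x - cgf(y)), evaluated numerically."""
    res = minimize_scalar(lambda y: -(y * x - cgf(y, p)),
                          bounds=(-50.0, 50.0), method="bounded")
    return -res.fun

x, n = 0.8, 200
I_x = rate(x)   # closed form for Bernoulli(1/2): x*log(2x) + (1-x)*log(2(1-x)) ≈ 0.1927
print(f"I({x}) ≈ {I_x:.4f}")
print(f"P(S_n/n >= {x}) is roughly exp(-n*I(x)) ≈ {np.exp(-n * I_x):.2e}")
```

The same `rate` function then gives the exponential order of $\mathbb{P}(S_n/n \in A)$ for other events $A$ by minimizing $I$ over $A$, which is exactly the point: one Legendre transform, computed once, serves all $n$ and all events.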


For 1: "easily obtain" is a misnomer here. As an example, you can get estimates for $P(S_n>x)$ very easily just by appealing to the CLT. The problem is that the CLT approximation is no longer reliable when you are too far from the mean (at distances that grow with $n$). So you can "easily obtain" that $P(S_n>x)$ is close to 0, just like a Gaussian tail, but it is not at all clear how close.

As an example, imagine that $S_n$ denotes the total insurance payments for disasters in a given year. The chance of a hurricane causing 10 trillion dollars of damage is close to 0. But how close to 0? Are we talking once every 10 years? 100 years? 1,000,000 years? To price your insurance policy, you would need a good estimate of this probability, and a Gaussian distribution would generally grossly underestimate it.
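To make "how close to 0" concrete, here is a small numerical sketch (my own illustration, with arbitrary choices of distribution and threshold): for i.i.d. Exp(1) increments the tail of $S_n$ is known exactly (it is a Gamma tail), and the naive Gaussian approximation misses it by many orders of magnitude, while $e^{-nI(x)}$ captures the right exponential order.

```python
# Sketch: exact tail vs. CLT vs. large deviation estimate for S_n = sum of
# n i.i.d. Exp(1) variables (so S_n ~ Gamma(n, 1) exactly). Numbers are arbitrary.
import numpy as np
from scipy.stats import gamma, norm

n, x = 100, 2.0                              # threshold x for the sample mean S_n/n
exact = gamma.sf(n * x, a=n)                 # P(S_n >= n*x), exact Gamma(n,1) tail
clt = norm.sf((n * x - n) / np.sqrt(n))      # Gaussian with matching mean n and variance n
I_x = x - 1 - np.log(x)                      # Cramer rate function of Exp(1): I(x) = x - 1 - log x

print(f"exact tail          ≈ {exact:.2e}")              # ~ 1.9e-15
print(f"Gaussian (CLT) tail ≈ {clt:.2e}")                # ~ 7.6e-24: off by ~8 orders of magnitude
print(f"exp(-n*I(x))        ≈ {np.exp(-n * I_x):.2e}")   # ~ 4.7e-14: right exponential order
```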

For 2: note that it's not necessarily harder. Exponentiation is really natural: in some easy situations, Markov's inequality gives $P(S_n>x)= P(e^{tS_n}>e^{tx})\leq \frac{\mathbb{E}[e^{tS_n}]}{e^{tx}} = \frac{\phi(t)^n}{e^{tx}}$, where $\phi$ is the moment generating function of $X_1$, and this already yields a decent upper bound after minimizing over $t>0$. What's difficult is the lower bound on $P(S_n>x)$, and this usually takes the most work in large deviation theory.
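As a standard concrete instance of that optimization (my own addition, with the threshold taken proportional to $n$): for standard Gaussian increments, $\phi(t) = e^{t^2/2}$, so for $x>0$ and any $t>0$

$$\mathbb{P}(S_n > nx) = \mathbb{P}\big(e^{tS_n} > e^{tnx}\big) \leq \frac{\phi(t)^n}{e^{tnx}} = \exp\!\Big(n\Big(\frac{t^2}{2} - tx\Big)\Big),$$

and the exponent is minimized at $t = x$, giving the bound $\exp(-nx^2/2)$, which is precisely the exponential order predicted by the rate function $I(x) = x^2/2$.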

For 3: an expectation like $E[e^{nk\,S_n/n}]$, written out as an approximate Riemann sum, is effectively Laplace's approximation at work: $e^{-an}+e^{-bn}\approx e^{-an}$ when $a<b$ (even if, say, $a=b-0.000001$). There's a saying that "rare events happen in the cheapest way possible", and this captures that statement: the most probable rare event is the one that carries the smallest exponent, and this is exactly where the concept of a "rate" comes in, to pin down how fast the probability goes to 0.
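A quick numerical check of that dominance (my own sketch; the constants are arbitrary): on the exponential scale, $e^{-an}+e^{-bn}$ is indistinguishable from $e^{-\min(a,b)n}$, no matter how small the gap between $a$ and $b$.

```python
# Sketch: -log(e^{-an} + e^{-bn}) / n tends to min(a, b), even for a tiny gap.
# logaddexp is used so the large-n terms do not underflow to zero.
import numpy as np

a, b = 1.0, 1.000001                        # arbitrary constants with a < b
for n in (10**2, 10**4, 10**6, 10**8):
    log_sum = np.logaddexp(-a * n, -b * n)  # log(e^{-an} + e^{-bn}), computed stably
    print(n, -log_sum / n)                  # converges to min(a, b) = a = 1.0
```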