Intuitive explanation of $L^2$-norm

OK, let's see if this helps. Suppose you have two functions $f,g:[a,b]\to \mathbb{R}$. If someone asks you for the distance between $f(x)$ and $g(x)$, the answer is easy: $|f(x)-g(x)|$. But if I ask for the distance between $f$ and $g$ themselves, the question sounds kind of absurd, since the two functions are a different distance apart at every point. What I can ask is: what is the distance between $f$ and $g$ on average? That is $$ \dfrac{1}{b-a}\int_a^b |f(x)-g(x)|\,dx=\dfrac{\|f-g\|_1}{b-a}, $$ which gives the $L^1$-norm. But this is just one of many ways to do the averaging. Another is based on the integral $$ \|f-g\|_{p}:=\left[\int_a^b|f(x)-g(x)|^p\, dx\right]^{1/p}, $$ which is the $L^p$-norm in general ($1\le p<\infty$).
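As a numerical sanity check of these definitions, here is a minimal Python sketch that approximates $\|f-g\|_p$ with a midpoint Riemann sum. The functions $f(x)=x$, $g(x)=x^2$ on $[0,1]$ and the helper name `lp_distance` are illustrative choices, not from the original answer. By hand, $\|f-g\|_1=\int_0^1(x-x^2)\,dx=1/6$ and $\|f-g\|_2=\left(\int_0^1(x-x^2)^2\,dx\right)^{1/2}=\sqrt{1/30}$.

```python
def lp_distance(f, g, a, b, p, n=100_000):
    """Approximate ||f - g||_p on [a, b] by a midpoint Riemann sum with n subintervals."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h          # midpoint of the i-th subinterval
        total += abs(f(x) - g(x)) ** p
    return (total * h) ** (1.0 / p)


def f(x):
    return x


def g(x):
    return x * x


a, b = 0.0, 1.0
average_distance = lp_distance(f, g, a, b, p=1) / (b - a)   # ||f - g||_1 / (b - a)
l2_distance = lp_distance(f, g, a, b, p=2)

print(f"average distance = {average_distance:.8f}  (exact 1/6 = {1 / 6:.8f})")
print(f"L2 distance      = {l2_distance:.8f}  (exact sqrt(1/30) = {(1 / 30) ** 0.5:.8f})")
```

Both averages measure "how far apart" $f$ and $g$ are, but they weight large and small gaps differently, which is the point of having a whole family of $L^p$ norms.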

Let us investigate the norm of $f(x)=x$ on $[0,1]$ for different $L^p$ norms. I suggest you draw the graphs of $x^{p}$ for a few values of $p$: the larger $p$ is, the flatter $x^{p}$ becomes near the origin, so the integral favors the vicinity of $x=1$ more and more as $p$ grows. Indeed, $$ \|x\|_p=\left[\int_0^1 x^{p}\,dx\right]^{1/p}=\frac{1}{(p+1)^{1/p}}, $$ which increases from $1/2$ at $p=1$ to $1/\sqrt{3}\approx 0.577$ at $p=2$ and tends to $1$ as $p\to\infty$. More generally, on $[0,1]$ the $L^p$ norm is never larger than the $L^m$ norm if $m>p$, because the higher power $m$ downplays the points where $|f|$ is small even more than the power $p$ does. So depending on what you want to capture in your averaging, and how you want to define "the distance" between functions, you use different $L^p$ norms.
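The closed form $(p+1)^{-1/p}$ and the growth of $\|x\|_p$ with $p$ can be checked numerically. A minimal Python sketch, again using a midpoint Riemann sum (the helper `lp_norm` and the list of exponents are illustrative assumptions):

```python
def lp_norm(f, a, b, p, n=100_000):
    """Approximate ||f||_p on [a, b] by a midpoint Riemann sum with n subintervals."""
    h = (b - a) / n
    total = sum(abs(f(a + (i + 0.5) * h)) ** p for i in range(n))
    return (total * h) ** (1.0 / p)


exponents = [1, 2, 4, 8, 16]
numeric = [lp_norm(lambda x: x, 0.0, 1.0, p) for p in exponents]
closed_form = [(p + 1) ** (-1.0 / p) for p in exponents]

for p, approx, exact in zip(exponents, numeric, closed_form):
    print(f"p = {p:2d}   numeric = {approx:.6f}   (p+1)^(-1/p) = {exact:.6f}")
```

The two columns agree, and both increase with $p$, as expected on an interval of length $1$.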

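This also motivates why the $L^\infty$ norm is nothing but the essential supremum of $|f|$: as you let $p\to \infty$, everything is filtered out except the largest values of $|f(x)|$, and $\|f\|_p\to\|f\|_\infty$. For $f(x)=x$ on $[0,1]$ this is the limit $(p+1)^{-1/p}\to 1=\sup_{[0,1]}x$.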
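To watch $\|f\|_p$ climb toward the supremum, here is a hedged Python sketch for the illustrative choice $f(x)=x(1-x)$ on $[0,1]$, whose maximum is $1/4$ at $x=1/2$. Dividing the samples by their largest value $M$ before raising them to the power $p$ is an implementation choice of this sketch (not from the original answer) that avoids floating-point underflow for large $p$; it does not change the result, since $\|f\|_p=M\,\|f/M\|_p$.

```python
def lp_norm_scaled(f, a, b, p, n=200_000):
    """Midpoint-rule estimate of ||f||_p on [a, b].

    Samples are divided by their maximum M before taking the p-th power, using
    ||f||_p = M * ||f / M||_p, so large p does not underflow to 0.0.
    """
    h = (b - a) / n
    samples = [abs(f(a + (i + 0.5) * h)) for i in range(n)]
    m = max(samples)
    if m == 0.0:
        return 0.0
    total = sum((s / m) ** p for s in samples)
    return m * (total * h) ** (1.0 / p)


def f(x):
    return x * (1.0 - x)   # maximum value 1/4, attained at x = 1/2


exponents = [1, 2, 10, 100, 1000]
norms = [lp_norm_scaled(f, 0.0, 1.0, p) for p in exponents]
for p, value in zip(exponents, norms):
    print(f"p = {p:4d}   ||f||_p = {value:.6f}")
```

The printed values start at $1/6$ for $p=1$ and increase with $p$, staying below $1/4$ while getting closer to it; the convergence is slow because the region where $f$ is near its maximum shrinks only gradually.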


There are several good answers here, one accepted. Nevertheless I'm surprised not to see the $L^2$ norm described as the infinite dimensional analogue of Euclidean distance.

In the plane, the length of the vector $(x,y)$, that is, the distance between $(x,y)$ and the origin, is $\sqrt{x^2 + y^2}$. In $n$-dimensional space it's the square root of the sum of the squares of the components: $\|v\|=\sqrt{v_1^2+\dots+v_n^2}$.

Now think of a function as a vector with infinitely many components (its value at each point of the domain), and replace summation by integration to get the $L^2$ norm of a function: $$ \|f\|_2=\left(\int_a^b |f(x)|^2\,dx\right)^{1/2}. $$
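The "vector with infinitely many components" picture can be made concrete by sampling. In this hedged Python sketch (the helper names and the example $f(x)=x$ on $[0,1]$ are illustrative assumptions), the plain Euclidean norm of $n$ samples grows like $\sqrt{n}$, so each squared component has to be weighted by the spacing $\Delta x$. The weighted quantity $\sqrt{\sum_i f(x_i)^2\,\Delta x}$ is a Riemann sum for the integral and converges to $\|x\|_2=1/\sqrt{3}$.

```python
import math


def euclidean_norm(v):
    """Square root of the sum of the squares of the components."""
    return math.sqrt(sum(c * c for c in v))


def sampled_norms(f, a, b, n):
    """Return (plain, weighted) Euclidean norms of n midpoint samples of f on [a, b]."""
    dx = (b - a) / n
    samples = [f(a + (i + 0.5) * dx) for i in range(n)]
    plain = euclidean_norm(samples)        # ignores the spacing: grows like sqrt(n)
    weighted = math.sqrt(dx) * plain       # sqrt(sum f(x_i)^2 * dx): Riemann sum for ||f||_2
    return plain, weighted


print(euclidean_norm([3.0, 4.0]))          # length of the plane vector (3, 4)
results = {n: sampled_norms(lambda x: x, 0.0, 1.0, n) for n in (10, 100, 1_000, 10_000)}
for n, (plain, weighted) in results.items():
    print(f"n = {n:6d}   plain = {plain:10.4f}   weighted = {weighted:.6f}")
print(f"exact ||x||_2 = 1/sqrt(3) = {1 / math.sqrt(3):.6f}")
```

The weight $\Delta x$ is exactly what turns "sum over components" into "integral over the domain".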

Finally, I'd tack on the end of the last sentence of @levap's answer:

> ... the $L^2$ norm has the advantage that it comes from an inner product and so all the techniques from inner product spaces (orthogonal projections, etc.) can be applied when we use the $L^2$ norm.
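As one illustration of those inner-product techniques, here is a hedged Python sketch of an orthogonal projection in $L^2[0,1]$, using $\langle f,g\rangle=\int_0^1 f(x)g(x)\,dx$, so that $\|f\|_2=\sqrt{\langle f,f\rangle}$. It projects the illustrative choice $f(x)=x^2$ onto the span of $1$ and $x$, i.e. it finds the straight line closest to $x^2$ in the $L^2$ norm, by solving the $2\times 2$ normal equations. Solving them by hand gives $x-\tfrac16$. The helper names are assumptions of this sketch, and the integrals are midpoint Riemann sums.

```python
def inner(f, g, a=0.0, b=1.0, n=100_000):
    """Midpoint-rule approximation of <f, g> = integral of f(x) * g(x) over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) * g(a + (i + 0.5) * h) for i in range(n))


def project_onto_lines(f):
    """Orthogonal projection of f onto span{1, x}: returns (c0, c1) for c0 + c1 * x."""
    one = lambda x: 1.0
    ident = lambda x: x
    # Gram matrix [[<1,1>, <1,x>], [<x,1>, <x,x>]] and right-hand side [<f,1>, <f,x>].
    g11, g12, g22 = inner(one, one), inner(one, ident), inner(ident, ident)
    r1, r2 = inner(f, one), inner(f, ident)
    det = g11 * g22 - g12 * g12
    # Cramer's rule for the 2x2 normal equations.
    c0 = (r1 * g22 - g12 * r2) / det
    c1 = (g11 * r2 - g12 * r1) / det
    return c0, c1


f = lambda x: x * x
c0, c1 = project_onto_lines(f)
residual = lambda x: f(x) - (c0 + c1 * x)

print(f"projection: {c0:.6f} + {c1:.6f} x")
print(inner(residual, lambda x: 1.0), inner(residual, lambda x: x))
```

The second line prints two numbers that are zero up to rounding: the error of the best approximation is orthogonal to the subspace, which is the geometric fact that makes projections work in $L^2$.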


If you have some physics background, then the squared $L^2$ norm $\|f\|_2^2$ can often be interpreted as the "energy" of a wave or signal (see the question "Physical interpretation of L1 Norm and L2 Norm").

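In quantum physics, the $L^2$ norm carries a probabilistic meaning: a wave function $\psi$ is normalized so that $\|\psi\|_2=1$, and $\int_A|\psi(x)|^2\,dx$ is the probability of finding the particle in the region $A$. Similarly, the probability of detecting a particular pure state $\phi$ (with $\|\phi\|_2=1$) is $|\langle\phi,\psi\rangle|^2$, the squared length of the projection of $\psi$ onto $\phi$.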

In statistics, minimizing the $L^2$ norm of the difference between two functions is exactly the method of least squares (see the question "Differences between the L1-norm and the L2-norm").
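A small discrete illustration of least squares, and of how it differs from $L^1$ fitting: when data $y_1,\dots,y_n$ are approximated by a single constant $c$, the sum $\sum_i (y_i-c)^2$ (a squared discrete $L^2$ distance) is minimized by the mean, while $\sum_i |y_i-c|$ (a discrete $L^1$ distance) is minimized by the median. The data set and the brute-force grid search below are illustrative assumptions, chosen so that the result does not rely on those closed forms. Note how the outlier $100$ pulls the $L^2$ fit but not the $L^1$ fit.

```python
data = [1.0, 2.0, 3.0, 4.0, 100.0]   # illustrative data with one outlier


def l2_loss(c):
    """Sum of squared residuals: the least squares objective."""
    return sum((y - c) ** 2 for y in data)


def l1_loss(c):
    """Sum of absolute residuals: the least absolute deviations objective."""
    return sum(abs(y - c) for y in data)


# Brute-force search over the grid 1.00, 1.01, ..., 100.00.
candidates = [1.0 + k * 0.01 for k in range(9901)]
best_l2 = min(candidates, key=l2_loss)
best_l1 = min(candidates, key=l1_loss)

mean = sum(data) / len(data)
median = sorted(data)[len(data) // 2]   # odd number of points: the middle one

print(f"least squares constant:   {best_l2:.4f}   mean   = {mean}")
print(f"least absolute constant:  {best_l1:.4f}   median = {median}")
```

Minimizing a norm and minimizing its square give the same minimizer, which is why "least squares" and "smallest $L^2$ distance" describe the same fit.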

In mathematics, we prefer it over many other possible norms because it gives function spaces the structure of a Hilbert space. A great deal of beautiful theory becomes available once we embrace the $L^2$ norm, and I'd say you'll gain intuition along the way as you study it. It may not be clear yet why we prefer it, but eventually you'll see.
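One concrete way to see what is special about $L^2$: a norm comes from an inner product exactly when it satisfies the parallelogram law $\|f+g\|^2+\|f-g\|^2=2\|f\|^2+2\|g\|^2$ (a theorem of Jordan and von Neumann). The hedged Python sketch below checks this law for the illustrative pair of indicator functions of $[0,\tfrac12)$ and $[\tfrac12,1]$. By hand, the two sides are $2$ and $2$ in the $L^2$ norm, but $2$ and $1$ in the $L^1$ norm. The helper names are assumptions of this sketch.

```python
def lp_norm(f, p, n=1000):
    """Midpoint-rule estimate of ||f||_p on [0, 1]."""
    h = 1.0 / n
    return (h * sum(abs(f((i + 0.5) * h)) ** p for i in range(n))) ** (1.0 / p)


def parallelogram_gap(f, g, p):
    """LHS - RHS of ||f+g||^2 + ||f-g||^2 = 2||f||^2 + 2||g||^2 in the L^p norm."""
    lhs = lp_norm(lambda x: f(x) + g(x), p) ** 2 + lp_norm(lambda x: f(x) - g(x), p) ** 2
    rhs = 2 * lp_norm(f, p) ** 2 + 2 * lp_norm(g, p) ** 2
    return lhs - rhs


def f(x):
    return 1.0 if x < 0.5 else 0.0   # indicator of [0, 1/2)


def g(x):
    return 1.0 - f(x)                # indicator of [1/2, 1]


gap_l2 = parallelogram_gap(f, g, 2)
gap_l1 = parallelogram_gap(f, g, 1)
print(f"L2 gap: {gap_l2:.3e}")   # zero up to rounding: L^2 satisfies the law
print(f"L1 gap: {gap_l1:.3f}")   # nonzero: L^1 cannot come from an inner product
```

A single failing pair is enough to rule out an inner product for $L^1$, whereas for $L^2$ the law holds for every pair, since it follows directly from expanding $\langle f\pm g, f\pm g\rangle$.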