Lorentz invariance of the Minkowski metric

I believe it can be useful to define the following concepts (I won't be very formal here for pedagogical reasons):

Any event can be described through four real numbers, which we take to be the moment in time at which it happens and the position in space where it takes place. We call these four numbers the coordinates of the event, and we collect them in a tuple, which we call $x\equiv (t,\boldsymbol r)$. These numbers depend, of course, on which reference frame we are using: we could, for example, use a different origin for $t$ or a different orientation for $\boldsymbol r$. This means that for $x$ to make sense, we must pick a certain reference frame. Call it $S$, for example.

Had we chosen a different frame, say $S'$, the components of the same event would be $x'$, i.e., four real numbers, in principle different from those before. We declare that the new reference frame is inertial if and only if $x'$ and $x$ are related through $$ x'=\Lambda x \tag{1} $$ for a certain matrix $\Lambda$ that depends, for example, on the relative orientation of the two reference frames. There are certain conditions $\Lambda$ must fulfill, which will be discussed in a moment.

We define a vector to be any set of four real numbers such that, if its components in $S$ are $v=(v^0,\boldsymbol v)$, then in $S'$ its components must be $$ v'=\Lambda v \tag{2} $$

For example, the coordinates $x$ of an event are, by definition, a vector, because of $(1)$. There are more examples of vectors in physics: the electromagnetic potential, the current density, the momentum of a particle, etc.

It turns out that it is really useful to define the following operation for vectors: if $u,v$ are two vectors, then we define $$ u\cdot v\equiv u^0 v^0-\boldsymbol u\cdot\boldsymbol v\tag{3} $$ The reason this operation is useful is that it is quite ubiquitous in physics: it appears in many formulas, for example in conservation laws, the wave equation, the Dirac equation, the energy-momentum relation, etc.

We define the operation $\cdot$ through the components of the vectors, but we know these components are frame-dependent, so if $\cdot$ is to be a well-defined operation, we must have $$ u\cdot v=u'\cdot v' \tag{4} $$ because otherwise $\cdot$ would be pretty useless.
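
If you like to see things concretely, here is a minimal numpy sketch of the product $(3)$ (the function name `minkowski_dot` is my own choice, not standard):

```python
import numpy as np

# A direct transcription of eq. (3); the name "minkowski_dot" is my own.
def minkowski_dot(u, v):
    """u.v = u^0 v^0 - (spatial dot product), for 4-component arrays."""
    return u[0] * v[0] - np.dot(u[1:], v[1:])

u = np.array([2.0, 1.0, 0.0, 0.0])
v = np.array([3.0, 0.0, 1.0, 1.0])
print(minkowski_dot(u, v))  # 2*3 - (1*0 + 0*1 + 0*1) = 6.0
```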

This relation $(4)$ won't be true in general, but only for some matrices $\Lambda$. Thus, we declare that the matrices $\Lambda$ can only be those which make $(4)$ true. This is a restriction on $\Lambda$: only some matrices will represent changes of reference frame. Note that in pure mathematics, any invertible matrix defines a change of basis; in physics, only a subset of matrices are acceptable changes of basis.

So, what are the possible $\Lambda$'s that satisfy $(4)$? Well, the easiest way to study this is to rewrite $(3)$ using a different notation: define $$ \eta=\begin{pmatrix} 1 &&&\\&-1&&\\&&-1&\\&&&-1\end{pmatrix} \tag{5} $$

This is just a matrix that will simplify our discussion. We should not try to find a deep meaning in $\eta$ (it turns out there is a lot of geometry behind $\eta$, but this is not important right now). Using $\eta$, it's easy to check that $(3)$ can be written as $$ u\cdot v=u^\mathrm{T}\eta v \tag{6} $$ where on the r.h.s. we use the standard matrix product. If we plug $u'=\Lambda u$ and $v'=\Lambda v$ in here and set $u\cdot v=u'\cdot v'$, we find that we must have $$ \Lambda^\mathrm{T} \eta \Lambda=\eta \tag{7} $$

This is a relation that defines $\Lambda$: any possible change of reference frame must be such that $(7)$ is satisfied. If it is not, then $\Lambda$ cannot relate two inertial frames. This relation is not in fact a statement of how $\eta$ transforms (as you say in the OP), but rather a restriction on $\Lambda$. It is customary to say that $\eta$ transforms as $(7)$, which will be explained in a moment. For now, just think of $(7)$ as characterising the possible matrices $\Lambda$.
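
To make this concrete, here is a small numerical sketch in Python: I take the standard boost along $x$ with velocity $\beta=0.6$ (a textbook example of a $\Lambda$, assumed here rather than derived above) and check both $(7)$ and the frame-independence of the product $(4)$:

```python
import numpy as np

eta = np.diag([1.0, -1.0, -1.0, -1.0])   # eq. (5)

def boost_x(beta):
    """Standard boost along x with velocity beta (units where c = 1)."""
    gamma = 1.0 / np.sqrt(1.0 - beta**2)
    L = np.eye(4)
    L[0, 0] = L[1, 1] = gamma
    L[0, 1] = L[1, 0] = -gamma * beta
    return L

Lam = boost_x(0.6)

# eq. (7): Lambda^T eta Lambda = eta
print(np.allclose(Lam.T @ eta @ Lam, eta))       # True

# eq. (4): u.v computed in S equals u'.v' computed in S'
u = np.array([2.0, 1.0, 0.0, 0.0])
v = np.array([3.0, 0.0, 1.0, 1.0])
up, vp = Lam @ u, Lam @ v                        # eq. (2)
print(np.isclose(u @ eta @ v, up @ eta @ vp))    # True, via eq. (6)
```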

At this point, it is useful to introduce index notation. If $v$ is a vector, we call its components $v^\mu$, with $\mu=0,1,2,3$. On the other hand, we write the components of a change of frame as $\Lambda^\mu{}_\nu$. With this notation, $(2)$ can be written as $$ v'^\mu=\Lambda^\mu{}_\nu v^\nu \tag{8} $$

Also, using index notation, the product of two vectors can be written as $$ u\cdot v=\eta_{\mu\nu}u^\mu v^\nu \tag{9} $$ where $\eta_{\mu\nu}$ are the components of $\eta$.

Index notation is useful because it allows us to define the following concept: a tensor is an object with several indices, e.g. $A^{\mu\nu}$. But not any object with indices is a tensor: the components of a tensor in different frames of reference must be related through $$ \begin{align} &A'^{\mu\nu}=\Lambda^\mu{}_\rho \Lambda^\nu{}_\sigma\ A^{\rho\sigma} \\ &B'^\mu{}_\nu=\Lambda^\mu{}_\rho(\Lambda^\mathrm{T})_\nu{}^\sigma\ B^\rho{}_\sigma\\ &C'^{\mu\nu}{}_\pi{}^\tau=\Lambda^\mu{}_\rho \Lambda^\nu{}_\sigma (\Lambda^\mathrm{T})_\pi{}^\psi \Lambda^\tau{}_\omega\ C^{\rho\sigma}{}_\psi{}^\omega \end{align}\tag{10} $$ and the obvious generalisation for more indices: for every upper index, a factor of $\Lambda$, and for every lower index, a factor of $\Lambda^\mathrm{T}$. If the components of an object with indices don't satisfy $(10)$, then that object is not a tensor. According to this definition, any vector is a tensor (with just one index).
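
As a sketch of $(10)$ in action, the following numpy snippet (reusing the example boost from before) builds a rank-2 tensor as the outer product of two vectors, transforms it with one factor of $\Lambda$ per upper index via `einsum`, and checks consistency:

```python
import numpy as np

eta = np.diag([1.0, -1.0, -1.0, -1.0])
gamma = 1.0 / np.sqrt(1.0 - 0.6**2)
Lam = np.eye(4)
Lam[0, 0] = Lam[1, 1] = gamma
Lam[0, 1] = Lam[1, 0] = -gamma * 0.6             # the same example boost

u = np.array([2.0, 1.0, 0.0, 0.0])
v = np.array([3.0, 0.0, 1.0, 1.0])
A = np.einsum('m,n->mn', u, v)                   # A^{mu nu} = u^mu v^nu

# first line of eq. (10): one factor of Lambda per upper index
A_prime = np.einsum('mr,ns,rs->mn', Lam, Lam, A)

# consistency: same as building the tensor from the transformed vectors
print(np.allclose(A_prime, np.einsum('m,n->mn', Lam @ u, Lam @ v)))  # True
```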

I don't like to use index notation too much: $v'=\Lambda v$ is easier than $v'^\mu=\Lambda^\mu{}_\nu v^\nu$, don't you think? But sometimes we have to use index notation, because matrix notation is not possible: with tensors of three or more indices, matrices cannot be used. Tensors with one index are just vectors. You'll sometimes hear that matrices are tensors with two indices, which is not quite true: if you remember your course on linear algebra, under a change of basis matrices transform like $M\to C^\mathrm{T} M C$, which is like $(10)$ in the case of one upper/one lower index. Therefore, matrices are like tensors with one upper/one lower index. This is the reason we wrote $\Lambda$ as $\Lambda^\mu{}_\nu$: it is a matrix, but it is also a tensor.

Also, $(7)$ looks pretty much like $(10)$, right? This is the reason people say $(7)$ expresses the transformation properties of $\eta$. While not false, I recommend not taking this too seriously: formally it is right, but in principle $\eta$ is just a set of numbers that simplifies our notation for scalar products. It turns out you can think of it as a tensor, but only a posteriori: it is not defined as a tensor, but it turns out to be one. Actually, it is a trivial tensor (essentially the only one, as we'll see below), whose components are the same in every frame of reference (by definition). If you were to calculate the components of $\eta$ in another frame of reference using $(10)$, you would find that they are the same. This is what is meant by the statement the metric is invariant. We actually define it to be invariant: we define what a change of reference frame is through the restriction that $\eta$ be invariant. It doesn't make sense to try to prove that $\eta$ is invariant, as this is a definition. $(7)$ doesn't really prove that $\eta$ is invariant; rather, it defines what a change of reference frame is.

For completeness I'd like to make the following definitions:

  • We say an object is invariant if it takes the same value in any frame of reference. You can check that if $v$ is a vector, then $v\cdot v$ takes the same value in any frame, i.e., $v^2$ is invariant.

  • We say an object is covariant if it doesn't take the same value in every frame of reference, but its different values are related in a well-defined way: the components of a covariant object must satisfy $(10)$. This means tensors are covariant by definition.

For example, a vector is not invariant because its components are frame-dependent. But as vectors are tensors, they are covariant. We really like invariant objects because they simplify a lot of problems. We also like covariant objects because, even though these objects are frame-dependent, they transform in a well-defined way, making them easy to work with. You'll understand this better after you solve many problems in SR and GR: in the end you will be thankful for covariant objects.
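
A quick numerical illustration of both definitions, again using the example boost from above:

```python
import numpy as np

eta = np.diag([1.0, -1.0, -1.0, -1.0])
gamma = 1.0 / np.sqrt(1.0 - 0.6**2)
Lam = np.eye(4)
Lam[0, 0] = Lam[1, 1] = gamma
Lam[0, 1] = Lam[1, 0] = -gamma * 0.6

v = np.array([5.0, 1.0, 2.0, 3.0])
vp = Lam @ v

print(np.allclose(vp, v))                      # False: components change (covariant, not invariant)
print(np.isclose(v @ eta @ v, vp @ eta @ vp))  # True:  v.v is invariant
```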

So, what does it mean for $\eta$ to be invariant? It means its components are the same in every (inertial) frame of reference. How do we prove this? We actually can't, because we define this to be true. How can we prove $\eta$ is the only invariant tensor? We can't, because it is not actually true: the most general invariant tensor is proportional to the metric. Proof: let $N_{\mu\nu}$ be a tensor that is invariant by assumption. Then, as it is a tensor, in matrix notation we have $$ N'=\Lambda^\mathrm{T}N\Lambda \tag{11} $$

But we must also have $N'=N$ for it to be invariant. This means $\Lambda^\mathrm T N\Lambda=N$ for every allowed $\Lambda$. Now, $(7)$ implies $\Lambda^\mathrm{T}=\eta\Lambda^{-1}\eta$, so the condition becomes $\Lambda^{-1}(\eta N)\Lambda=\eta N$, i.e., $[\eta N,\Lambda]=0$ for every Lorentz matrix $\Lambda$. By Schur's lemma, $\eta N$ must be proportional to the identity, and therefore $N$ must be proportional to $\eta$. QED.
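
If you want to see the mechanics of this proof numerically, here is a sketch; the diagonal matrix tested at the end is an arbitrary example of mine, chosen only to show that a generic $N$ fails the commutation condition:

```python
import numpy as np

eta = np.diag([1.0, -1.0, -1.0, -1.0])
gamma = 1.0 / np.sqrt(1.0 - 0.6**2)
Lam = np.eye(4)
Lam[0, 0] = Lam[1, 1] = gamma
Lam[0, 1] = Lam[1, 0] = -gamma * 0.6

# the step used in the proof: (7) implies Lambda^{-1} = eta Lambda^T eta
print(np.allclose(np.linalg.inv(Lam), eta @ Lam.T @ eta))          # True

def commutes(N):
    """Does eta N commute with the example boost?"""
    return np.allclose(eta @ N @ Lam, Lam @ eta @ N)

print(commutes(3.0 * eta))                          # True:  N proportional to eta passes
print(commutes(np.diag([1.0, -2.0, -1.0, -1.0])))   # False: a generic diagonal N fails
```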


And what about the Levi-Civita symbol? We are usually told that it is also an invariant tensor, which is not actually true: it is invariant, but it is not a tensor, it is a pseudo-tensor. In SR it doesn't satisfy $(10)$ for every $\Lambda$, but only for a certain subset of matrices $\Lambda$ (see the proper orthochronous Lorentz group), and in GR it is a tensor density (discussed in many posts on SE).

The proof of the covariance of the LC symbol is usually stated as follows (you'll have to fill in the details): the definition of the determinant of a matrix can be stated as $\det(A)\,\varepsilon^{\mu\nu\rho\sigma}=\varepsilon^{abcd}A^\mu{}_a A^\nu{}_b A^\rho{}_c A^\sigma{}_d$. Matrices in the proper orthochronous Lorentz group have unit determinant, i.e., $\det(\Lambda)=1$. If you use this together with the definition of $\det$, you get $\varepsilon^{\mu\nu\rho\sigma}=\varepsilon^{abcd}\Lambda^\mu{}_a\Lambda^\nu{}_b\Lambda^\rho{}_c\Lambda^\sigma{}_d$, which is the same as $(10)$ for the object $\varepsilon^{\mu\nu\rho\sigma}$. This proves that, when restricted to this subset of the Lorentz group, the Levi-Civita symbol is a tensor.
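
Here is a numerical check of this identity; the `perm_sign` helper is my own construction of the Levi-Civita symbol, and the boost is the same example as before:

```python
import numpy as np
from itertools import permutations

def perm_sign(p):
    """Sign of a permutation, by counting inversions."""
    inv = sum(p[i] > p[j] for i in range(len(p)) for j in range(i + 1, len(p)))
    return -1.0 if inv % 2 else 1.0

eps = np.zeros((4, 4, 4, 4))                 # the Levi-Civita symbol
for p in permutations(range(4)):
    eps[p] = perm_sign(p)

gamma = 1.0 / np.sqrt(1.0 - 0.6**2)
Lam = np.eye(4)
Lam[0, 0] = Lam[1, 1] = gamma
Lam[0, 1] = Lam[1, 0] = -gamma * 0.6         # proper orthochronous: det(Lam) = +1

# eps^{abcd} Lam^mu_a Lam^nu_b Lam^rho_c Lam^sigma_d = det(Lam) eps^{mu nu rho sigma}
lhs = np.einsum('abcd,ma,nb,rc,sd->mnrs', eps, Lam, Lam, Lam, Lam)
print(np.allclose(lhs, np.linalg.det(Lam) * eps))   # True
```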


Raising and Lowering indices: this is something that is usually made to seem more important than it really is. IMHO, we can fully formulate SR and GR without even mentioning raising and lowering indices. If you define an object with its indices raised, you should keep its indices where they are: in general there is no good reason to move an index. That being said, I'll explain what these operations are, just for completeness.

The first step is to define the inverse of the metric. Using matrix notation, the metric is its own inverse: $\eta \eta=1$. But we want to use index notation, so we define another object, call it $\zeta$, with components $\zeta^{\mu\nu}=\eta_{\mu\nu}$. With this, you can check that $\eta\eta=1$ can be written as $\eta_{\mu\nu}\zeta^{\nu\rho}=\delta_\mu^\rho$, where $\delta$ is the Kronecker symbol. For now, $\delta$ is just a symbol that simplifies the notation. Note that $\zeta$ is not standard notation, but we will keep it for the next few paragraphs.

(People usually use the same letter for both objects, writing $\eta^{\mu\nu}$ instead of $\zeta^{\mu\nu}$; we'll discuss why in a moment. For now, note that these are different objects, with different index structure: $\eta$ has lower indices and $\zeta$ has upper indices.)

We can use $\eta$ and $\zeta$ to raise and lower indices, which we now define.

Let's say you have a certain tensor $A^{\mu\nu}{}_\rho$. We want to define what it means to raise the index $\rho$: it means to define a new object $\bar A$ with components $$ \bar A^{\mu\nu\rho}\equiv \zeta^{\rho\sigma}A^{\mu\nu}{}_\sigma \tag{12} $$ (this is called raising the index $\rho$, for obvious reasons).

Using $(10)$ you can prove that this new object is actually a tensor. We usually drop the bar $\bar{\phantom{A}}$ and write $A^{\mu\nu\rho}$. We actually shouldn't do this: these objects are different. We can tell them apart from the index placement, so we relax the notation by not writing the bar. In this post, we'll keep the bar for pedagogical reasons.

In an analogous way, we can lower an index, for example the $\mu$ index: we define another object $\tilde A$, with components $$ \tilde A_\mu{}^\nu{}_\rho\equiv \eta_{\mu\sigma} A^{\sigma\nu}{}_\rho \tag{13} $$ (we lowered $\mu$)

This new object is also a tensor. The three objects $A,\bar A,\tilde A$ are actually different, but we can tell them apart through the index placement, so we can drop the tildes and bars. For now, we won't.
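
A small numpy sketch of raising an index as in $(12)$ and then lowering it again (the names `zeta` and `A_bar` just mirror the notation above; the components of $A$ are random stand-ins):

```python
import numpy as np

eta = np.diag([1.0, -1.0, -1.0, -1.0])    # eta_{mu nu}
zeta = np.diag([1.0, -1.0, -1.0, -1.0])   # zeta^{mu nu}, the inverse

print(np.allclose(eta @ zeta, np.eye(4)))    # eta_{mu nu} zeta^{nu rho} = delta

A = np.random.randn(4, 4, 4)                 # stand-in components A^{mu nu}_rho

A_bar = np.einsum('rs,mns->mnr', zeta, A)        # eq. (12): raise rho
A_again = np.einsum('rs,mns->mnr', eta, A_bar)   # eq. (13)-style lowering undoes it

print(np.allclose(A_again, A))               # True
```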

We'll discuss the usefulness of these operations in a moment. For now, note that if you raise both indices of the metric, you get $$ \bar{\bar{\eta}}^{\mu\nu}\equiv\zeta^{\mu\rho}\zeta^{\nu\sigma} \eta_{\rho\sigma}=\zeta^{\mu\rho}\delta^\nu_\rho=\zeta^{\mu\nu} \tag{14} $$ which means that $\bar{\bar{\eta}}=\zeta$. As we usually drop the bars, this means that we can use the same letter $\eta$ for both objects. In principle, they are different: $\eta_{\mu\nu}$ is the metric, and $\zeta^{\mu\nu}$ is its inverse. In practice, we use $\eta_{\mu\nu}$ and $\eta^{\mu\nu}$ for both these objects, and even call them both metric. From now on, we will use $\eta$ both for the metric and its inverse, but we keep the bars for other objects.

With this in mind, we get the following important result: $$ \eta_{\mu\nu}\eta^{\nu\rho}=\delta_\mu^\rho \tag{15} $$ which is actually a tautology: it is the definition of the inverse of the metric.

So, what is the use of these operations? For example, what do we get if we lower the index of a vector $v$? Well, we get a new tensor, but it is not a vector (you can check that $(2)$ is not satisfied), so we call it a covector. This distinction is not really important in SR, but in other branches of physics vectors and covectors are really, really different.

So, what is the covector associated to $v$? Call this covector $\bar v$. Its components are $\bar v_\mu=\eta_{\mu\nu} v^\nu$ by definition. Why is this useful? Well, one reason is that by lowering an index, the scalar product $\cdot$ turns into the standard matrix product: $$ u\cdot v=\bar u v \tag{16} $$ as you can check (compare this to $(3)$ or $(6)$). So in principle, raising and lowering indices is supposed to simplify notation. Actually, in the end, you'll see that people write $uv$ instead of $u\cdot v$ or $u_\mu v^\mu$, so the notation gets simplified without the need to raise or lower any index.
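
You can verify $(16)$ numerically in a couple of lines:

```python
import numpy as np

eta = np.diag([1.0, -1.0, -1.0, -1.0])
u = np.array([2.0, 1.0, 0.0, 0.0])
v = np.array([3.0, 0.0, 1.0, 1.0])

u_bar = eta @ u                             # covector components u_mu = eta_{mu nu} u^nu
print(np.isclose(u_bar @ v, u @ eta @ v))   # True: eq. (16)
```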

The following fact is rather interesting: we know that if we raise both indices of the metric we get the metric again. But what do we get if we raise only one index of the metric? That is, what is $\bar \eta$? Or, put another way, what is $\eta^\mu{}_\nu$? Well, according to the definition, it is $$ \eta^\mu{}_\nu=\eta_{\nu\rho}\eta^{\mu\rho}=\delta^\mu_\nu \tag{17} $$ where I used $(15)$. This means that $\bar \eta=\delta$: the metric is the same object as the Kronecker symbol, which is a cool result. As we know that raising or lowering an index of a tensor results in a new tensor, we find that the Kronecker symbol is actually a tensor! We could even prove this from the definition of tensors, i.e., check that $(10)$ is satisfied for $\delta$, but we don't need to: we already know it must be true (check it if you want to).
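
And a short check of $(17)$:

```python
import numpy as np

eta_low = np.diag([1.0, -1.0, -1.0, -1.0])   # eta_{mu nu}
eta_up = np.diag([1.0, -1.0, -1.0, -1.0])    # eta^{mu nu}

# eq. (17): eta^mu_nu = eta_{nu rho} eta^{mu rho} = delta^mu_nu
mixed = np.einsum('nr,mr->mn', eta_low, eta_up)
print(np.allclose(mixed, np.eye(4)))         # True
```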


As a side note: you (like many people) write prime marks on the indices, while I (like many others) write the primes on the tensors. IMHO the latter convention is better, because it is the tensor that is changing, not the indices. For example, what you wrote as $\eta_{\mu'\nu'}=\eta_{\mu\nu}$ looks better when written as $\eta'_{\mu\nu}=\eta_{\mu\nu}$: it says that the $\mu\nu$ components of both objects are equal, rather than that the $\mu'$ component of one equals the $\mu$ component of the other (which makes little sense and leaves the indices mismatched).


To summarise: $\eta_{\mu\nu}\rightarrow \eta_{\mu'\nu'}=\Lambda^{\alpha}{}_{\mu'}\eta_{\alpha\beta}\Lambda^{\beta}{}_{\nu'}$ just says that the metric transforms as a tensor, as you would expect from its indices; there's nothing special about that. Being invariant means that when you make the transformation you get back the same matrix: $\eta_{\mu'\nu'} = \eta_{\mu\nu}$. In matrix notation, we require that $\eta = \Lambda^\mathrm{T} \eta \Lambda$, which is precisely $(7)$. This is not true for an arbitrary tensor.