Why does this $\sigma \pi \sigma^{-1}$ keep appearing in my group theory book? (cycle decomposition)

Conjugation amounts to changing perspective. Here are three major examples to illustrate this point: notational representations of permutations with respect to labels on objects, matrices representing linear transformations with respect to ordered bases, and loops a la the fundamental group as studied in algebraic topology and homotopy theory.

$\sf \color{DarkOrange}{Permutations}$. As you've seen, $\pi(x_1\cdots x_n)\pi^{-1}=(\pi(x_1)\cdots\pi(x_n))$. Let's see if I can explain what this means. Suppose we have a set $X$ and a permutation $\sigma$ on it. Furthermore, say we relabel all of the elements of $X$. (The way to encode this is as a bijection $\pi:X\to Y$ for some set of labels $Y$.) Then there should be a corresponding permutation $\sigma'$ of $Y$ which does the same thing to $Y$ that $\sigma$ does to $X$, just with the elements relabeled. More concretely, if our permutation $\sigma$ sends $x_1$ to $x_2$ (in $X$) then $\sigma'$ should send $\pi(x_1)$ to $\pi(x_2)$. The resulting effect on cycles (and hence on the cycle type of any permutation) is now visible: as $\sigma$ maps $x_1\mapsto x_2$ and $x_2\mapsto x_3$ and so on, we must have $\sigma'$ mapping $\pi(x_1)\mapsto\pi(x_2)$ and $\pi(x_2)\mapsto\pi(x_3)$ and so on.

Consider the following scenario. A blind woman shuffles an arrangement of $n$ things placed in front of her - and has memorized this shuffle, even though she can't see the labels that are put on the $n$ things (for instance, they could have the numerals $1$ through $n$, or the words "one" through whatever, or those same words in a different language, etc.). We sighted people can describe her shuffle using, say, cycle notation or one-line notation. Whatever the case, if we then relabel the objects and the woman performs her shuffle, then there will be a change in how we represent the permutation. Indeed, if we swap the labels, then she shuffles the objects, then we swap the labels back to their originals, the effect is the same as if she had just directly performed the shuffle without any label swapping at all. This means the following diagram commutes:

$$\require{AMScd}\begin{CD} X @>\pi>> Y\\@VV{\sigma}V @VV{\sigma'}V\\ X @>\pi>> Y \end{CD} \tag{$\circ$}$$

Thus, we have $\sigma'\circ\pi=\pi\circ\sigma$. If $Y=X$ (so we use the same labels for the objects the woman shuffles, but we shuffle the labels themselves!) then this is $\sigma'\pi=\pi\sigma$, i.e. $\sigma'=\pi\sigma\pi^{-1}$.
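To make the relabeling concrete, here is a minimal Python sketch (the particular set $X=\{1,\dots,5\}$, the shuffle $(1\,2\,3)$ and the relabeling `pi` are made-up choices for illustration). It checks that $\sigma'=\pi\sigma\pi^{-1}$ makes the square $(\circ)$ commute and that $\sigma'$ is the cycle $(\pi(1)\,\pi(2)\,\pi(3))$.

```python
# Permutations of X = {1, ..., 5} stored as dicts: x -> image of x.

def compose(f, g):
    """Return 'f after g': apply g first, then f."""
    return {x: f[g[x]] for x in g}

def inverse(f):
    return {y: x for x, y in f.items()}

def from_cycle(cycle, n):
    """Permutation of {1..n} given by the single cycle (x1 x2 ... xk)."""
    perm = {x: x for x in range(1, n + 1)}
    for a, b in zip(cycle, cycle[1:] + cycle[:1]):
        perm[a] = b
    return perm

n = 5
sigma = from_cycle([1, 2, 3], n)        # the shuffle (1 2 3)
pi = {1: 4, 2: 1, 3: 5, 4: 2, 5: 3}     # an arbitrary relabeling

# sigma' = pi sigma pi^{-1}
sigma_prime = compose(pi, compose(sigma, inverse(pi)))

# The square commutes: sigma' after pi equals pi after sigma.
assert compose(sigma_prime, pi) == compose(pi, sigma)

# sigma' is the relabeled cycle (pi(1) pi(2) pi(3)) = (4 1 5).
assert sigma_prime == from_cycle([pi[1], pi[2], pi[3]], n)
```

Both assertions pass: the conjugate is the same shuffle, read off through the new labels.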

The diagram $(\circ)$ is important and ubiquitous in mathematics. It is how we transport the structure of the symmetries of some object to the symmetries of an isomorphic object (often taking place in some category). This pattern repeats itself in other examples. It appears when we wish to define equivariant and intertwining operators. Anytime something happens in one situation, and you want to convert it to another situation, something along these lines is happening.

On the other hand, group theory also has conjugation appear a lot simply because it is useful in exploring the structure of a group. Two elements commuting $ab=ba$ is equivalent to $b=aba^{-1}$ (you can use this to compute the probability two elements commute!). If one defines the maps $\varphi_g(x):=gxg^{-1}$, then for each $g\in G$ the map $\varphi_g:G\to G$ is an automorphism.
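As a small illustration of the commuting-probability remark, the sketch below brute-forces $S_3$ (chosen only because it is small). It counts commuting pairs using the criterion $aba^{-1}=b$, counts conjugacy classes, and confirms the standard fact that the probability equals the number of conjugacy classes divided by $|G|$. It also spot-checks that each $\varphi_g$ respects products.

```python
from fractions import Fraction
from itertools import permutations

# Elements of S_3 as tuples p, where p[i] is the image of i.
G = list(permutations(range(3)))

def mul(p, q):
    """Product 'p after q': apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def inv(p):
    out = [0] * len(p)
    for i, image in enumerate(p):
        out[image] = i
    return tuple(out)

def conj(g, x):
    """phi_g(x) = g x g^{-1}."""
    return mul(mul(g, x), inv(g))

# a and b commute exactly when a b a^{-1} == b.
commuting_pairs = sum(1 for a in G for b in G if conj(a, b) == b)
assert commuting_pairs == sum(1 for a in G for b in G if mul(a, b) == mul(b, a))

# Conjugacy classes: orbits of G acting on itself by conjugation.
classes = {frozenset(conj(g, x) for g in G) for x in G}

prob = Fraction(commuting_pairs, len(G) ** 2)
assert prob == Fraction(len(classes), len(G))

# Each phi_g is a homomorphism: phi_g(xy) = phi_g(x) phi_g(y).
assert all(conj(g, mul(x, y)) == mul(conj(g, x), conj(g, y))
           for g in G for x in G for y in G)
```

For $S_3$ this gives three conjugacy classes and a commuting probability of $1/2$.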

You need conjugation to define normal subgroups, which are kernels of group homomorphisms and (equivalently) the congruence classes of the identity whose cosets are the congruence classes of all the other elements. One uses normal subgroups to break apart groups into pieces called composition factors (see the Jordan-Hölder theorem) - indeed in this way finite cyclic groups break apart into cyclic groups of prime order with multiplicities, exactly analogous to natural numbers breaking apart into products of primes with multiplicities. Another structure-theoretic result coming from the arithmetic of a group's order is seen in the three Sylow theorems, whose statements and proofs rely on conjugation.

$\sf \color{DarkOrange}{Matrices}$. Often students are introduced to linear algebra in a concrete geometric way, and often the only vectors introduced are coordinate vectors and linear maps are simply defined by matrices, but there is more abstraction behind the picture than this. Vectors don't need to be coordinate vectors - they just need to be things, things that can be added or multiplied by scalars. One learns somewhat early on that every vector space has a basis. Given a basis $\{e_1,\cdots,e_n\}$ of a vector space $V$ of finite dimension $n$, any vector $v\in V$ may be written as $v=x_1e_1+\cdots+x_ne_n$ uniquely, and so we can represent any vector $v$ by a corresponding coordinate vector $(x_1,\cdots,x_n)$. So in some sense, nothing of value is lost by restricting our attention to coordinate vectors (or so a physicist might reason, but a set theorist would certainly argue), but often we want to change bases.

Bases are also how linear transformations $T:V\to W$ get represented as matrices. Say $\{f_i\}$ is a basis for $W$ (with $\dim W=m$). Since $T(v)=T(x_1e_1+\cdots+x_ne_n)=x_1T(e_1)+\cdots+x_nT(e_n)$, in order to know where $T$ sends every vector $v\in V$ it suffices to know where $T$ sends the basis vectors, and moreover, each of $T(e_1),\cdots,T(e_n)$ may be written as

$$T(e_1)= t_{11}f_1+\cdots+t_{m1}f_m \\ \vdots \\ T(e_n)=t_{1n}f_1+\cdots+t_{mn}f_m$$

Thus, $T$ is specified by a rectangular array of scalars. One may check that if writing $T(v)$ in coordinates (according to the chosen basis of $W$), the result is the same as if we multiply out

$$\begin{bmatrix}t_{11} & \cdots & t_{1n} \\ \vdots & \ddots & \vdots \\ t_{m1} & \cdots & t_{mn}\end{bmatrix}\begin{bmatrix}x_1 \\ \vdots \\ x_n\end{bmatrix}.$$

One may further check that if $T:V\to W$ and $S:W\to U$ are two linear transformations and $\circ$ denotes composition, then the matrix of $S\circ T$ is the matrix product of $S$ and $T$'s matrices (note bases must be chosen for each of $U,V,W$ for this to make sense). This is where matrix multiplication comes from in the first place!

Now suppose we just have $T:V\to V$ for simplicity, and $\cal B$ is a basis for $V$. Denote the corresponding matrix by $[T]_{\cal B}$. If we have another basis ${\cal C}$, how does $[T]_{\cal C}$ relate to $[T]_{\cal B}$?

(I suppose at this point I should observe that all of our bases have not just been bases, but ordered bases. Technically this is important for bookkeeping purposes, but sources frequently forget to mention this qualification at all!) Given ${\cal B}=\{e_i\}$ and ${\cal C}=\{f_i\}$, there is a unique transformation $P:V\to V$ with $P(e_i)=f_i$ for each $i$. Let's abbreviate ${\cal C}=P{\cal B}$. The way to figure out what $[T]_{P{\cal B}}$ is, is to look at how vectors are affected.

Denote by $[v]_{\cal B}$ the vector $v$ written as a coordinate vector with respect to ordered basis $\cal B$. Then we know that $[Tv]_{\cal B}=[T]_{\cal B}[v]_{\cal B}$. Moreover, if $v=x_1e_1+\cdots+x_ne_n$ then $P(v)=x_1f_1+\cdots+x_nf_n$ and so we conclude $[Pv]_{P{\cal B}}=[v]_{\cal B}$, both being $(x_1,\cdots,x_n)$. Now from $[v]_{\cal B}=[Pv]_{P\cal B}$ and $[v]_{P\cal B}=[P^{-1}v]_{\cal B}$ we see that $[P]_{\cal B}=[P]_{P\cal B}$, call this $\Phi^{-1}$. Understandably, $\Phi$ is called the change of basis matrix, because $\Phi:[v]_{\cal B}\mapsto [v]_{P\cal B}$ for all $v$. This $\Phi$ is analogous to our permutation $\pi$: it changes our perspective on the vector space.

Then $[Tv]_{P\cal B}=[P^{-1}Tv]_{\cal B}=[P^{-1}T]_{\cal B}[v]_{\cal B}=[P]_{\cal B}^{-1}[T]_{\cal B}[Pv]_{P\cal B}=\Phi[T]_{\cal B}\Phi^{-1}[v]_{P\cal B}$. Since this holds for all $v$, from this we may conclude $[T]_{P\cal B}=\Phi[T]_{\cal B}\Phi^{-1}$. Thus we see in linear algebra, similar (i.e. conjugate) matrices represent the same linear transformation but with respect to different ordered bases. (Okay, technically we saw the converse of this, but seeing that given any matrix $\Phi$ there are corresponding bases is not too difficult afterwards.)
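Here is a small numeric sanity check of $[T]_{P\cal B}=\Phi[T]_{\cal B}\Phi^{-1}$, as a sketch only: the $2\times 2$ matrix `T_B` and the new basis vectors `f1`, `f2` are made-up numbers, $\cal B$ is taken to be the standard basis, and the helper functions are written out by hand rather than taken from a library. The final loop verifies the result independently, by checking that column $j$ of the new matrix really holds the coordinates of $T(f_j)$ with respect to $f_1,f_2$.

```python
from fractions import Fraction

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inverse_2x2(M):
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

# B is the standard basis of Q^2; all numbers below are illustrative.
T_B = [[2, 1],
       [0, 3]]                  # [T]_B
f1, f2 = (1, 1), (1, 2)         # new basis C = P(B), in B-coordinates
P_B = [[f1[0], f2[0]],
       [f1[1], f2[1]]]          # [P]_B has the f_i as columns; this is Phi^{-1}

Phi = inverse_2x2(P_B)          # the change of basis matrix
T_C = matmul(Phi, matmul(T_B, P_B))   # Phi [T]_B Phi^{-1}

# Independent check: T(f_j) = T_C[0][j]*f1 + T_C[1][j]*f2 for each j.
for j, f in enumerate((f1, f2)):
    Tf = [T_B[i][0] * f[0] + T_B[i][1] * f[1] for i in range(2)]
    combo = [T_C[0][j] * f1[i] + T_C[1][j] * f2[i] for i in range(2)]
    assert Tf == combo

assert T_C == [[3, 2], [0, 2]]
```

In this example the two matrices $\begin{bmatrix}2&1\\0&3\end{bmatrix}$ and $\begin{bmatrix}3&2\\0&2\end{bmatrix}$ look different but describe the same transformation in two ordered bases.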

$\sf \color{DarkOrange}{Loops}$. A path in a topological space $X$ is a continuous map $[0,1]\to X$. If we reparametrize the path (so in particular, it traces out the same image) or continuously wiggle the path, we do not fundamentally change the path - this inspires the idea of a homotopy class of paths. Paths that begin and end at the same point are loops. There is an obvious way of concatenating two paths (trace out the first path first, and the second path second - although both with double their usual "speed"), and homotopy equivalence of paths is a congruence relation for this operation, so we can define the fundamental group $\pi_1(X,x)$ to be the homotopy classes of loops from $x$ to itself, with this intuitive notion of composition. See the link for more information.

Intuitively, one should be able to continuously move $x$ itself around and subsequently move the loops in $\pi_1(X,x)$ around with it. Indeed, if $\gamma:[0,1]\to X$ is a path from $x$ to $y$, then given any (homotopy class of) loop $t\in\pi_1(X,x)$ we can form a new path that starts at $y$, goes along $\gamma$ backwards, goes along $t$ from $x$ to itself, and then goes back along $\gamma$ to $y$. In this way, we input loops $t$ based at $x$ and get loops $t'$ based at $y$. Let's write this up as a diagram:

$$\begin{CD} x @>\gamma>> y\\@VV{t}V @VV{t'}V\\ x @>\gamma>> y \end{CD} $$

This is a bit informal, but you should be able to get the idea. Thus $t'=\gamma\circ t\circ\gamma^{-1}$ where $\gamma^{-1}$ has the obvious interpretation: take $\gamma$ backwards from $y$ to $x$. Then $t=\gamma^{-1}\circ t'\circ\gamma$, so this process is invertible. Altogether this means conjugation is an isomorphism $\pi_1(X,x)\to\pi_1(X,y)$. If we have that $x=y$ then this amounts to taking a loop $\gamma$ and moving other loops $t$ around it (keeping the basepoint always on $\gamma$) to get a new loop $t'$ when we finally come back to the original basepoint.
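The step from "invertible" to "isomorphism" also needs the map to respect concatenation. A short check, written in the same right-to-left $\circ$ notation as above, and using that going along $\gamma$ and straight back is homotopic to the constant loop at $x$ (here $s$ is a second loop based at $x$, a name introduced only for this sketch):

$$(t\circ s)'=\gamma\circ t\circ s\circ\gamma^{-1}\simeq\gamma\circ t\circ(\gamma^{-1}\circ\gamma)\circ s\circ\gamma^{-1}=(\gamma\circ t\circ\gamma^{-1})\circ(\gamma\circ s\circ\gamma^{-1})=t'\circ s'.$$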


Suppose you are thinking of symmetries of some object. (Groups are abstractions of the symmetries of objects.) For example, say it's a sphere. A sphere has a family of symmetries that are rotations around the axis through the north and south poles. Let's focus on one symmetry in particular, the rotation by $17°$ around this polar axis, call it $\pi$, for "polar". Perhaps you understand this family of symmetries pretty well from contemplating the rotation of the Earth.

Now you would like to consider the symmetry that is the rotation of the sphere by $17°$ around not the polar axis, but the axis through Stockholm. How can you understand this?

It turns out that it's easy! To rotate by $17°$ around Stockholm, first find the rotation that takes Stockholm up to the north pole. Let's call that $\sigma$, for "Stockholm". Then rotate the sphere by $17°$ around the polar axis; that's $\pi$, which we already understand. And then rotate the north pole back to Stockholm, using $\sigma^{-1}$. Stockholm went up to the north pole, and then came back, so it is a fixed point of the entire transformation, which is therefore a rotation around the axis through Stockholm. Behold, a rotation of $17°$ around Stockholm, which we obtained as $\sigma\pi\sigma^{-1}$ (reading the product from left to right: first $\sigma$, then $\pi$, then $\sigma^{-1}$).

And in general suppose that $x$ is any symmetry of the sphere. It must take some point up to the north pole, say $p_x$. Then $x\pi x^{-1}$ is the rotation of the sphere by $17°$ around the axis through $p_x$.

From this we can see that whatever properties $\pi$ has, all the $\sigma \pi \sigma^{-1}$ will have the same properties. For example, if the order of $\pi$ is $n$, then the order of $\sigma \pi \sigma^{-1}$ is also $n$, because it is doing the same thing, just in a different direction. Similarly, if $\pi$ acting on some set partitions the set into orbits of certain sizes, then so will $\sigma\pi\sigma^{-1}$; this is why conjugate permutations have the same cycle lengths.
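Both claims are easy to confirm by brute force. The sketch below uses $S_4$, an arbitrary small example, and checks for every pair $\sigma,\pi$ that the conjugate has the same order and the same orbit sizes (cycle type) as $\pi$. The helper names are made up for this illustration. The check does not depend on whether products are read left to right or right to left, since the loop runs over every $\sigma$ and hence over every $\sigma^{-1}$ too.

```python
from itertools import permutations

G = list(permutations(range(4)))   # S_4; p[i] is the image of i

def mul(p, q):
    """Apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def inv(p):
    out = [0] * len(p)
    for i, image in enumerate(p):
        out[image] = i
    return tuple(out)

def cycle_type(p):
    """Sorted list of orbit sizes of p acting on {0, ..., n-1}."""
    seen, lengths = set(), []
    for start in range(len(p)):
        length, x = 0, start
        while x not in seen:
            seen.add(x)
            x = p[x]
            length += 1
        if length:               # skip points already covered by an earlier orbit
            lengths.append(length)
    return sorted(lengths)

def order(p):
    """Smallest k >= 1 with p^k equal to the identity."""
    identity = tuple(range(len(p)))
    k, q = 1, p
    while q != identity:
        q = mul(q, p)
        k += 1
    return k

for s in G:
    for p in G:
        c = mul(mul(s, p), inv(s))        # the conjugate of p by s
        assert cycle_type(c) == cycle_type(p)
        assert order(c) == order(p)
```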

(Now perhaps you should go consider the symmetries of a square. Let $h$ be the horizontal flip, and let $r$ be a $90°$ rotation. Then $rhr^{-1}$ means you do a $90°$ rotation, then a horizontal flip, then rotate back. What does that add up to? The resulting symmetry is conjugate to the horizontal flip. This conjugacy relation divides the symmetries of the square into separate conjugacy classes, which are symmetries that are in a certain sense the same. Calculate the conjugacy classes for the square—there are five—and observe that the conjugate symmetries really are intuitively similar to one another.)
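If you would like to check your answer to the square exercise by machine, here is one possible sketch. It assumes the corners are numbered $0,1,2,3$ clockwise starting from the top left, so that the horizontal flip swaps the left and right corners. That numbering and the tuple encoding are choices made for this illustration only.

```python
r = (1, 2, 3, 0)   # 90° rotation: corner i moves to where corner i+1 was
h = (1, 0, 3, 2)   # horizontal flip: swaps 0<->1 (top) and 3<->2 (bottom)

def mul(p, q):
    """Apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def inv(p):
    out = [0] * len(p)
    for i, image in enumerate(p):
        out[image] = i
    return tuple(out)

# Generate the whole symmetry group by closing {r, h} under products.
group = {r, h}
while True:
    bigger = group | {mul(a, b) for a in group for b in group}
    if bigger == group:
        break
    group = bigger

# Conjugacy classes: for each x, collect g x g^{-1} over all g.
classes = {frozenset(mul(mul(g, x), inv(g)) for g in group) for x in group}
sizes = sorted(len(c) for c in classes)

assert len(group) == 8
assert sizes == [1, 1, 2, 2, 2]

# r h r^{-1} swaps top and bottom corners: it is the vertical flip.
assert mul(mul(r, h), inv(r)) == (3, 2, 1, 0)
```

The five classes come out with sizes $1,1,2,2,2$, and $rhr^{-1}$ lands in the same class as $h$, as the text predicts.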