Degrees of Freedom in Affine Transformation and Homogeneous Transformation
I am not an expert and have just started thinking about this myself. I am intrigued by how many different ways there are to think about transforms / degrees of freedom.
I think the simplest way to see that an Affine transform has 6 degrees of freedom is that there are 6 variables in the matrix:
$$ \begin{bmatrix} m_{00} & m_{01} & m_{02} \\ m_{10} & m_{11} & m_{12} \\ 0 & 0 & 1 \\ \end{bmatrix} $$
No matter what value we choose for any of those variables, we get an Affine transform (strictly, if the transform has to be invertible, the upper-left 2x2 block must be non-singular, but that only rules out a thin set of values). Although the Similarity transform can also be written as a matrix of this form, it is more constrained - once 4 of the variables are fixed (say $m_{00}$, $m_{10}$ and the two translation entries), the other 2 are forced: for a similarity without reflection, $m_{11} = m_{00}$ and $m_{01} = -m_{10}$. So it has fewer degrees of freedom, even though it can still be written as a matrix with 6 variables. Similarly, we can use an Affine transform to describe a simple translation, as long as we set the upper-left 2x2 block to the identity matrix and only change the two translation variables.
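To make the counting concrete, here is a minimal sketch using NumPy. The function names (`affine`, `similarity`, `translation`) are made up for illustration, and the similarity here is assumed to exclude reflections. Each constructor takes exactly as many arguments as the transform has degrees of freedom, yet all three return the same kind of 3x3 matrix.

```python
import numpy as np

def affine(m00, m01, m02, m10, m11, m12):
    """6 free numbers -> 6 degrees of freedom."""
    return np.array([[m00, m01, m02],
                     [m10, m11, m12],
                     [0.0, 0.0, 1.0]])

def similarity(scale, theta, tx, ty):
    """4 free numbers; the 2x2 block is scale * rotation (no reflection)."""
    c = scale * np.cos(theta)
    s = scale * np.sin(theta)
    # m00 == m11 and m01 == -m10 are forced, so only 4 entries are really free.
    return affine(c, -s, tx, s, c, ty)

def translation(tx, ty):
    """2 free numbers; the 2x2 block is pinned to the identity."""
    return affine(1.0, 0.0, tx, 0.0, 1.0, ty)

S = similarity(2.0, 0.3, 5.0, -1.0)
T = translation(3.0, 4.0)
print(S)
print(T @ np.array([1.0, 1.0, 1.0]))  # the point (1, 1) in homogeneous form
```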
The purest mathematical idea of an Affine transform is these 6 numbers and the way you multiply them with a vector to get a new vector. What this transform actually does can be described in a variety of ways - as 6 operations that you are doing one after the other (translate x, translate y, scale x, scale y, rotate, shear), or one thing you are doing all at once. If you think of them in terms of these operations, you might be confused by this matrix:
$$ \begin{bmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \\ \end{bmatrix} $$
This matrix can be thought of as a rotation by 180 degrees about the origin, as scaling x by -1 and y by -1, or as reflecting every point through the origin. All of these descriptions are equivalent, and this is the only matrix that describes them.
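A quick numerical check of that equivalence, as a sketch with NumPy (the helper names `rotation` and `scaling` are just for illustration):

```python
import numpy as np

def rotation(theta):
    """Homogeneous 3x3 rotation about the origin by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def scaling(sx, sy):
    """Homogeneous 3x3 scaling along the x and y axes."""
    return np.diag([sx, sy, 1.0])

rot180 = rotation(np.pi)
scale_neg = scaling(-1.0, -1.0)

# The two descriptions give the same matrix, up to floating-point rounding
# (sin(pi) is not exactly zero in floating point).
print(np.allclose(rot180, scale_neg))
```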
Another way we could think about degrees of freedom is with how many fingers you would need to describe this transform by dragging points. A translation I can describe with one finger - by dragging a single point to its new location. Open Google Maps on your phone and try it. Each finger counts for two degrees of freedom since you can move it horizontally, and vertically.
A Euclidean transform has 3 DOF - you need one finger to translate the shape, then the second finger you can use to rotate it, but this finger only has one degree of freedom. This one is better illustrated not in Google Maps, but with a credit card on a desk - one finger moves the card, the other rotates it, but the second finger is less free since it always has to stay at a fixed distance from the first finger. Moving the second finger arbitrarily would try to stretch the card, which is impossible. So, the first finger has 2 DOF, the second finger has one more.
A similarity transform has four degrees of freedom - Google Maps works for this one again. Drag two fingers on your phone on Google Maps at the same time. No matter where you drag your two fingers, the app is able to find a similarity transform for you - one that keeps the map the same shape, but translates, rotates, and scales it.
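Here is a small sketch of what the app has to solve, assuming no reflection and treating 2D points as complex numbers so that a similarity becomes `z -> a * z + b`. The function name and the sample finger positions are invented for illustration. Two complex unknowns (`a` and `b`) are four real numbers, which matches the four degrees of freedom given by two fingers.

```python
import cmath

def similarity_from_two_points(p0, p1, q0, q1):
    """Find a, b with a*p0 + b == q0 and a*p1 + b == q1.

    abs(a) is the scale, cmath.phase(a) the rotation angle,
    and b the translation. Requires p0 != p1.
    """
    a = (q1 - q0) / (p1 - p0)
    b = q0 - a * p0
    return a, b

# Fingers start at (0, 0) and (1, 0) and end at (2, 1) and (2, 3).
p0, p1 = 0 + 0j, 1 + 0j
q0, q1 = 2 + 1j, 2 + 3j
a, b = similarity_from_two_points(p0, p1, q0, q1)

print("scale:", abs(a))
print("rotation (radians):", cmath.phase(a))
print("translation:", (b.real, b.imag))
```

Any pair of distinct start points and any pair of end points gives a solution, which is why the map can always follow two fingers.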
You would need to drag three fingers to do an Affine transform - Google Maps doesn't support this, since it would skew the map, so you wouldn't be able to use it to navigate anymore - but you can kind of pretend using a handkerchief: two of the fingers can translate and rotate it (pretend they can scale it too), and then the third finger can skew it this way and that. Almost any drag
And dragging four fingers would let you do a 2D homogeneous (projective) transformation. You can try this in a photo editing program called GIMP. It's under Tools > Transform Tools > Perspective, and it lets you drag four different points around - so it counts for 8 degrees of freedom.
Note that not every possible position of the 4 points is necessarily a valid transform, but it still counts as 8 degrees of freedom since the points can still freely move in 2 dimensions - there are just certain positions they can't take.
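The four-point case can be sketched in the same spirit with NumPy. This is only an illustration under one assumption: the bottom-right entry of the matrix is fixed to 1, which leaves 8 unknowns (it breaks down for the transforms where that entry would have to be 0). Each dragged point contributes two equations, so four points give exactly the 8 equations needed. The function names and the sample corner positions are made up.

```python
import numpy as np

def homography_from_4_points(src, dst):
    """Solve for the 8 unknown entries of H (with H[2, 2] fixed to 1)."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h00*x + h01*y + h02) / (h20*x + h21*y + 1), likewise for v,
        # rearranged so that each equation is linear in the unknowns.
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.extend([u, v])
    h = np.linalg.solve(np.array(rows, dtype=float),
                        np.array(rhs, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def apply_h(H, point):
    """Apply H to a 2D point and divide by the homogeneous coordinate."""
    x, y, w = H @ np.array([point[0], point[1], 1.0])
    return np.array([x / w, y / w])

# Corners of a unit square dragged to a general quadrilateral.
src = [(0, 0), (1, 0), (1, 1), (0, 1)]
dst = [(0, 0), (2, 0), (3, 2), (0, 1)]
H = homography_from_4_points(src, dst)
print(H)
```

If the dragged points end up in a degenerate position (for example three of them on one line), the linear system becomes singular and `np.linalg.solve` raises an error, which is the "certain values they can't take" part.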
Who knows, I hope this helps!
I have finally come across an answer which I find convincing and satisfactory. The actual explanation is on page 40 of the book: Hartley, Richard, and Andrew Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2003.
Consider a planar (2D) affine transformation,
$H_A = \begin{bmatrix}A& t\\0&1\end{bmatrix}$.
The element $A$ is a 2 x 2 non-singular matrix. This can be decomposed as:
$A = R(\theta)R(-\phi)DR(\phi)$
where R is a rotation by the angle shown in the argument and $D = \begin{bmatrix}\lambda_1& 0\\0&\lambda_2\end{bmatrix}$.
Reading the product from right to left: since there is a rotation by $\phi$ first, the non-isotropic scaling is applied in directions different from those of the original geometry (so instead of scaling along the default x and y axes, we scale along the direction at angle $\phi$ to the x axis and the direction orthogonal to it). Once the scaling has been applied in the required directions, the shape is rotated back by $-\phi$, and then the required rotation by $\theta$ is applied.
Compared to a similarity transformation, the only new geometry is the non-isotropic scaling. This accounts for the two extra degrees of freedom: the angle $\phi$ and the scaling ratio $\lambda_1 / \lambda_2$. Together with the 2 parameters for translation, 1 for rotation and 1 to fix the overall scale, this gives a total of 6 DoF.
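As a rough numerical illustration (my own sketch, not code from the book), the decomposition can be recovered from the singular value decomposition $A = U S V^T$ by taking $R(\phi) = V^T$, $D = S$ and $R(\theta) = U V^T$, after flipping signs so that the rotation factors have determinant +1. The helper names `rot` and `decompose_linear_part` are hypothetical. When $\det A < 0$, this sketch lets $\lambda_2$ come out negative.

```python
import numpy as np

def rot(angle):
    """2x2 rotation matrix for the given angle in radians."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]])

def decompose_linear_part(A):
    """Return (theta, phi, lam1, lam2) with
    A == rot(theta) @ rot(-phi) @ diag(lam1, lam2) @ rot(phi)."""
    U, S, Vt = np.linalg.svd(A)
    F = np.diag([1.0, -1.0])
    if np.linalg.det(Vt) < 0:
        # U @ F @ diag(S) @ F @ Vt == U @ diag(S) @ Vt, so this keeps A unchanged
        U, Vt = U @ F, F @ Vt
    D = np.diag(S)
    if np.linalg.det(U) < 0:
        # only happens when det(A) < 0; move the sign into the scaling
        U, D = U @ F, F @ D
    R_theta = U @ Vt                      # R(theta) = U R(phi)
    theta = np.arctan2(R_theta[1, 0], R_theta[0, 0])
    phi = np.arctan2(Vt[1, 0], Vt[0, 0])  # R(phi) = Vt
    return theta, phi, D[0, 0], D[1, 1]

A = np.array([[1.5, 0.4],
              [0.2, 0.9]])
theta, phi, lam1, lam2 = decompose_linear_part(A)
rebuilt = rot(theta) @ rot(-phi) @ np.diag([lam1, lam2]) @ rot(phi)
print(np.allclose(rebuilt, A))
```

The four numbers returned here, plus the two translation entries, are the six degrees of freedom. Note that the parameters are not unique (for example, swapping $\lambda_1$ and $\lambda_2$ while shifting $\phi$ by 90 degrees gives the same $A$), so the sketch only promises that the product reproduces $A$.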