How to follow matrix operations in proofs?
This is a good question which you have already answered for yourself.
Mathematicians do this a lot:
carefully write everything down, and try playing with small concrete matrices.
After a while (a long while sometimes) you see how it works. Then you can parse similar expressions more quickly.
There really is no shortcut.
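For example, here is a minimal NumPy sketch of what "playing with concrete matrices" can look like (the identity and the shapes are my own arbitrary choices, just for illustration). It checks the transpose rule $(AB)^T = B^TA^T$ on concrete numbers, the same rule used to pass from $(1)$ to $(2)$ below:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))  # arbitrary small matrices
B = rng.standard_normal((4, 2))

# Check the transpose-of-a-product rule on concrete numbers:
# (AB)^T should equal B^T A^T entry by entry.
lhs = (A @ B).T
rhs = B.T @ A.T
print(np.allclose(lhs, rhs))  # True
```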
Let's visualize it.
We have the scalar expression:
$$\boldsymbol\theta^T \mathbf x_i - y_i = \begin{bmatrix}&&\boldsymbol\theta^T&&\end{bmatrix}\begin{bmatrix} \\ \\ \mathbf x_i \\ \\ \\ \end{bmatrix} - y_i \tag 1 $$

Transpose this expression to get the same scalar:
$$(\boldsymbol\theta^T \mathbf x_i - y_i)^T = \mathbf x_i^T \boldsymbol\theta - y_i = \begin{bmatrix} & & \mathbf x_i^T & & \end{bmatrix}\begin{bmatrix}\\\\\boldsymbol\theta\\\\\\\end{bmatrix} - y_i \tag 2 $$

Extend into matrices and vectors:
$$\begin{bmatrix}\\X \boldsymbol\theta - \mathbf y \\\\\end{bmatrix} = \begin{bmatrix} & & \mathbf x_1^T & & \\ &&\vdots\\ & & \mathbf x_n^T & & \\\end{bmatrix} \begin{bmatrix}\\\\\boldsymbol\theta\\\\\\\end{bmatrix} - \begin{bmatrix}\\\mathbf y\\\\\end{bmatrix} \tag 3 $$

The definition of the norm says:
$$\sum |a_i|^2 = \|\mathbf a\|^2 = \mathbf a^T \mathbf a \tag 4$$

Substitute our expression into the norm:
$$\sum_{i=1}^n |\boldsymbol\theta^T \mathbf x_i - y_i|^2 = \sum_{i=1}^n |(X \boldsymbol\theta - \mathbf y)_i|^2 = \|X \boldsymbol\theta - \mathbf y\|^2 = (X \boldsymbol\theta - \mathbf y)^T (X \boldsymbol\theta - \mathbf y) \tag 5$$
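To convince yourself of $(5)$ on concrete numbers, here is a minimal NumPy sketch (the sizes $n=5$, $d=3$ and the random data are arbitrary assumptions, not from the question):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 5, 3                          # arbitrary sizes: n samples, d features
X = rng.standard_normal((n, d))      # rows are x_i^T
theta = rng.standard_normal(d)
y = rng.standard_normal(n)

# Left side of (5): sum over samples of the squared scalar residuals
loop_sum = sum((theta @ X[i] - y[i])**2 for i in range(n))

# Middle: squared Euclidean norm of the residual vector X theta - y
r = X @ theta - y
norm_sq = np.linalg.norm(r)**2

# Right side: the quadratic form (X theta - y)^T (X theta - y)
quad = r @ r

print(np.allclose([loop_sum, norm_sq], quad))  # True
```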
The first sum $$\sum_{i=1}^n\|\theta^Tx_i-y_i\|^2$$ is nothing other than the expanded form of the norm squared of some vector (in this case it can be regarded as a distance, but I want to keep things simple). The norm squared can be expressed as the scalar product of that vector with itself. Let me explain: take a vector $\mathbf{v}\in\mathbb{R}^3$ defined as $$\mathbf{v} = (x,y,z).$$ From the Pythagorean theorem, we know that the length squared of this vector is $$\|\mathbf{v}\|^2 = x^2+y^2+z^2.$$ Mathematically speaking, the length and the norm are the same thing (the norm is more general). We can define the norm of a vector with the scalar product in this manner: $$\|\mathbf{v}\|^2=\mathbf{v}\cdot \mathbf{v} = \mathbf{v}^T\mathbf{v}= \left(\begin{matrix}x&y&z\end{matrix}\right)\left(\begin{matrix}x\\y\\z\end{matrix}\right) = x^2+y^2+z^2 = \sum_{i=1}^3 x_i^2$$ by basic matrix multiplication, where I've written $(x,y,z)=(x_1,x_2,x_3)$.
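A quick numeric sanity check of this, with made-up values for $(x, y, z)$:

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])   # v = (x, y, z), made-up values

# Sum of squared components (Pythagorean theorem)...
sum_of_squares = (v**2).sum()

# ...agrees with the dot product v^T v and with the built-in norm squared.
print(sum_of_squares, v @ v, np.linalg.norm(v)**2)  # all 14.0
```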
So back to our example $$\sum_{i=1}^n\|\theta^Tx_i-y_i\|^2.$$ In this case the vector whose norm we're taking has $i$-th component $v_i = \theta^Tx_i-y_i$, which is the difference (component by component) of the two vectors $X\theta$ (whose $i$-th component is $\theta^Tx_i$, the rows of $X$ being the $x_i^T$) and $\mathbf{y}$. So the vector itself is to be written as $$\mathbf{v} = X\theta-\mathbf{y}.$$ Now, using what I said earlier, the norm squared of this vector is $$\mathbf{v}^T \mathbf{v}= (X\theta-\mathbf{y})^T(X\theta-\mathbf{y}).$$
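And a small sketch (again with arbitrary made-up data) that builds $\mathbf{v}$ entry by entry as $v_i = \theta^Tx_i - y_i$ and confirms it is exactly $X\theta - \mathbf{y}$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 4, 2
X = rng.standard_normal((n, d))   # rows are x_i^T
theta = rng.standard_normal(d)
y = rng.standard_normal(n)

# Build v one component at a time: v_i = theta^T x_i - y_i ...
v = np.array([theta @ X[i] - y[i] for i in range(n)])

# ...and it is exactly the vector X theta - y.
print(np.allclose(v, X @ theta - y))            # True
# Its dot product with itself is the norm squared (v^T v).
print(np.isclose(v @ v, np.linalg.norm(v)**2))  # True
```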