Implementing gradient descent for multiple variables in Octave using "sum"
The general rule of thumb is as follows: if you encounter something of the form
SUM_i f(x_i, y_i, ...) g(a_i, b_i, ...)
then you can easily vectorize it (which is what the code above does) as
f(x, y, ...)' * g(a, b, ...)
This is just an ordinary dot product, which in mathematics (in a finite-dimensional Euclidean space) looks like
<A, B> = SUM_i A_i B_i = A'B
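A minimal Octave sketch of this identity follows. The vectors a and b are made-up example data, not taken from the question; the point is only that the explicit loop, the sum form, and the transpose-multiply form agree.

```octave
% Hypothetical example data (column vectors of equal length)
a = [1; 2; 3];
b = [4; 5; 6];

% Explicit loop: SUM_i a_i * b_i
s_loop = 0;
for i = 1:numel(a)
  s_loop = s_loop + a(i) * b(i);
end

% The same quantity written with sum and element-wise multiplication
s_sum = sum(a .* b);

% The same quantity as a dot product: a' * b
s_dot = a' * b;

printf("%d %d %d\n", s_loop, s_sum, s_dot);
```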
thus
(X * theta - y)' * X
is just
<X * theta - y, X> = <H_theta(X) - y, X> = SUM_i (H_theta(X_i) - y_i) X_i
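Here X_i is the i-th row of X, so the result is a row vector with one entry per column of X. As you can see, this works both ways, since it is just the mathematical definition of the dot product.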
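To make the equivalence concrete, here is a small sketch that computes the same quantity three ways: with a loop over the training examples, with sum, and with the transpose-multiply form. The data in X, y and theta are invented for illustration, and the variable names err, grad_loop, grad_sum and grad_vec are my own, not from the question.

```octave
% Hypothetical data: m = 3 examples, n = 2 features (first column is the intercept)
X = [1 1; 1 2; 1 3];
y = [1; 2; 3];
theta = [0; 0];
m = length(y);

% Residuals H_theta(X_i) - y_i for every example, as an m x 1 vector
err = X * theta - y;

% Explicit loop: SUM_i (H_theta(X_i) - y_i) * X_i, where X_i is the i-th row
grad_loop = zeros(1, size(X, 2));
for i = 1:m
  grad_loop = grad_loop + err(i) * X(i, :);
end

% The same thing with sum: scale each row of X by its residual, then add the rows
grad_sum = sum(bsxfun(@times, err, X), 1);

% The vectorized form from the text
grad_vec = err' * X;

disp(grad_loop);
disp(grad_sum);
disp(grad_vec);
```

All three results are 1 x n row vectors, so the result needs a transpose before it is combined with a column vector such as theta.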