Why theta*X and not theta'*X in practice?
theta'*X is used to calculate the hypothesis for a single training example, when X is a vector. In that case you transpose theta so that theta'*X matches the definition of h(x).
In practice, since you have more than one training example, X is a matrix (your training set) of dimension m x n, where m is the number of training examples and n is the number of features.
Now, you want to calculate h(x) for all your training examples with your theta parameters in just one move, right?
Here is the trick: theta has to be an n x 1 vector. Then, when you do the matrix-vector multiplication X*theta, you obtain an m x 1 vector containing h(x) for every training example in your training set (the X matrix). Matrix multiplication builds this vector row by row: each entry is the product of one row of X with theta, which is exactly the h(x) definition for that training example.
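To make the row-by-row claim concrete, here is a minimal sketch in plain Python (not Octave, so it needs nothing beyond the standard library). The numbers are made up, and `matmul` and `transpose` are hypothetical helpers written only for this illustration. It computes X*theta once for the whole training set and checks that each entry equals theta'*x for the corresponding example.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    if len(A[0]) != len(B):
        raise ValueError("inner dimensions must agree")
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    """Swap rows and columns."""
    return [list(col) for col in zip(*A)]

# Toy data (assumed values): m = 3 training examples, n = 2 features.
X = [[1.0, 2.0],
     [1.0, 3.0],
     [1.0, 4.0]]            # m x n, one training example per row
theta = [[0.5], [2.0]]      # n x 1 column vector

# All hypotheses in one move: (m x n) * (n x 1) -> (m x 1)
h_all = matmul(X, theta)
print(h_all)                # [[4.5], [6.5], [8.5]]

# The same values, one example at a time, using theta' * x
h_single = []
for row in X:
    x = transpose([row])                    # n x 1 column for this example
    h_single.append(matmul(transpose(theta), x)[0])   # (1 x n) * (n x 1)

print(h_single == h_all)    # True
```

Each row of X is one example, which is why X goes on the left and theta is not transposed.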
You can do the math by hand; I did, and now it is clear. Hope this helps someone. :)
I don't know what the dimensions of your theta and X are (you haven't provided anything), but it all depends on the dimensions of X, theta and the hypothesis. Let's say m is the number of features and n is the number of examples. Then, if theta is an mx1 vector and X is an nxm matrix, X*theta is an nx1 hypothesis vector.
But you will get the same values if you calculate theta'*X with X stored as mxn instead (the result is then a 1xn row rather than an nx1 column). You can also get the same result with theta*X if theta is 1xm and X is mxn.
Edit:
As @Tasos Papastylianou pointed out, the same result will be obtained if X is mxn: then (theta.'*X).' or X.'*theta are answers. If the hypothesis should be a 1xn vector, then theta.'*X is an answer. If theta is 1xm, X is mxn and the hypothesis is 1xn, then theta*X is also a correct answer.
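The variants above can be checked side by side. This is only a sketch in plain Python with made-up numbers (m = 2 features, n = 3 examples, following this answer's naming); `matmul` and `transpose` are hypothetical helpers defined here, and `transpose` plays the role of `.'`.

```python
def matmul(A, B):
    """Multiply matrices stored as lists of rows; reject mismatched shapes."""
    if len(A[0]) != len(B):
        raise ValueError(
            f"cannot multiply {len(A)}x{len(A[0])} by {len(B)}x{len(B[0])}")
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    """Swap rows and columns (the role of .' in Octave)."""
    return [list(col) for col in zip(*A)]

theta = [[0.5], [2.0]]                        # m x 1
X_nm = [[1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]   # n x m, examples in rows
X_mn = transpose(X_nm)                        # m x n, examples in columns

col = matmul(X_nm, theta)                     # X*theta         -> n x 1
row = matmul(transpose(theta), X_mn)          # theta.'*X       -> 1 x n
print(col)                                    # [[4.5], [6.5], [8.5]]
print(row)                                    # [[4.5, 6.5, 8.5]]

# With X stored as m x n, both of these give the n x 1 column again
print(transpose(row) == col)                  # (theta.'*X).'   -> True
print(matmul(transpose(X_mn), theta) == col)  # X.'*theta       -> True

# theta already stored as a 1 x m row: plain theta*X gives the 1 x n row
theta_row = [[0.5, 2.0]]
print(matmul(theta_row, X_mn) == row)         # True

# theta.'*X with X stored as n x m does not conform (1x2 times 3x2)
try:
    matmul(transpose(theta), X_nm)
    mismatch = False
except ValueError as err:
    mismatch = True
    print(err)
```

The values are identical in every valid case; only the orientation of the result (row or column) changes with how theta and X are stored.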
In mathematics, a 'vector' is conventionally written as a vertically-stacked array, e.g. a 3x1 column, and signifies a single point in a 3-dimensional space.
A 'horizontal' vector typically signifies an array of observations, e.g. a 1x3 row is a tuple of 3 scalar observations.
Equally, a matrix can be thought of as a collection of vectors. E.g., a 3x4 matrix is a collection of four 3-dimensional vectors, one per column.
A scalar can be thought of as a matrix of size 1x1, and therefore its transpose is the same as the original.
More generally, an n-by-m matrix W can also be thought of as a transformation from an m-dimensional vector x to an n-dimensional vector y, since multiplying that matrix with an m-dimensional vector will yield a new n-dimensional one. If your 'matrix' W is '1xn', then this denotes a transformation from an n-dimensional vector to a scalar.
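As a small illustration of the "matrix as a transformation" view, the sketch below (plain Python, made-up integers, hypothetical `matmul` helper) maps a 3-dimensional vector to a 2-dimensional one with a 2x3 matrix, and to a single number with a 1x3 matrix.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    if len(A[0]) != len(B):
        raise ValueError("inner dimensions must agree")
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

x = [[3], [4], [5]]          # 3-dimensional column vector (3 x 1)

W = [[1, 0, 2],
     [0, 1, -1]]             # 2 x 3: maps 3 dimensions to 2
y = matmul(W, x)
print(y)                     # [[13], [-1]]

w = [[1, 2, 3]]              # 1 x 3: maps 3 dimensions to a scalar
s = matmul(w, x)
print(s)                     # [[26]], a 1 x 1 result, i.e. a scalar
```

The 1x3 case is the shape of a single prediction: a row of parameters times a column of features.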
Therefore, notationally, it is customary to introduce the problem from the mathematical notation point of view, e.g. y = Wx.
However, for computational reasons, sometimes it makes more sense to perform the calculation as a "vector times a matrix" rather than "matrix times a vector". Since (Wx)' === x'W', sometimes we solve the problem like that, and treat x' as a horizontal vector. Also, if W is not a matrix, but a scalar, then Wx denotes scalar multiplication, and therefore in this case Wx === xW.
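The identity (Wx)' === x'W' is easy to verify numerically. The following is a minimal sketch in plain Python with made-up integers; `matmul`, `transpose` and `scale` are hypothetical helpers for this example only.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    if len(A[0]) != len(B):
        raise ValueError("inner dimensions must agree")
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    """Swap rows and columns."""
    return [list(col) for col in zip(*A)]

def scale(c, v):
    """Scalar multiplication of every entry of v by c."""
    return [[c * entry for entry in row] for row in v]

W = [[1, 0, 2],
     [0, 1, -1]]             # 2 x 3
x = [[3], [4], [5]]          # 3 x 1

lhs = transpose(matmul(W, x))              # (Wx)'  -> 1 x 2
rhs = matmul(transpose(x), transpose(W))   # x'W'   -> 1 x 2
print(lhs, rhs)                            # [[13, -1]] [[13, -1]]
print(lhs == rhs)                          # True

# If W is a scalar, the order does not matter: Wx and xW are the same
c = 2
print(scale(c, x) == [[entry * c for entry in row] for row in x])  # True
```

Note that the order of the factors flips under the transpose; x'W' works, whereas W'x' generally does not even conform.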
I don't know the exercises you speak of, but my assumption would be that in the course the instructor introduced theta as a proper, vertical vector, but then transposed it to perform proper calculations, i.e. a transformation from a vector of n-dimensions to a scalar (which is your prediction).
Then in the exercises, presumably you were either dealing with a scalar 'theta', so there was no point transposing it and it was left as theta for convenience, or theta was defined as a horizontal (i.e. transposed) vector to begin with for some reason (e.g. printing convenience), and was then left in that state when performing the necessary transformation.