tf.multiply vs tf.matmul to calculate the dot product
tf.multiply(X, Y) or the * operator does element-wise multiplication so that:
[[1 2]   [[1 3]   [[1 6]
 [3 4]] . [2 1]] =  [6 4]]
whereas tf.matmul does matrix multiplication so that:
[[1 0]   [[1 3]   [[1 3]
 [0 1]] . [2 1]] =  [2 1]]
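A minimal sketch to reproduce both examples (assuming TensorFlow 2.x with eager execution; the variable names are mine):

import tensorflow as tf

X = tf.constant([[1., 2.],
                 [3., 4.]])
Y = tf.constant([[1., 3.],
                 [2., 1.]])
I = tf.eye(2)  # the identity matrix [[1 0], [0 1]] from the second example

print(tf.multiply(X, Y))  # element-wise: [[1. 6.] [6. 4.]]
print(X * Y)              # the * operator gives the same result
print(tf.matmul(I, Y))    # matrix product: [[1. 3.] [2. 1.]]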
Using tf.matmul(X, X, transpose_b=True) means that you are calculating X . X^T, where ^T indicates the transpose of the matrix and . is matrix multiplication.
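As a quick sanity check (same TensorFlow 2.x assumption), transpose_b=True gives the same result as transposing explicitly:

import tensorflow as tf

X = tf.constant([[1., 2.],
                 [3., 4.]])

a = tf.matmul(X, X, transpose_b=True)  # X . X^T
b = tf.matmul(X, tf.transpose(X))      # explicit transpose, same result
print(tf.reduce_all(tf.equal(a, b)))   # True

The transpose_b form can also be slightly more efficient, since the matmul kernel can fold the transpose into the product instead of materializing X^T first.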
tf.reduce_sum(_, axis=1) takes the sum along the 1st axis (counting from 0), which means you are summing the rows:
tf.reduce_sum([[a, b], [c, d]], axis=1) = [a+b, c+d]
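Concretely, with a small sketch (TensorFlow 2.x assumed):

import tensorflow as tf

M = tf.constant([[1., 2.],
                 [3., 4.]])

print(tf.reduce_sum(M, axis=1))  # row sums: [3. 7.]
print(tf.reduce_sum(M, axis=0))  # column sums, for contrast: [4. 6.]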
This means that:
tf.reduce_sum(tf.multiply(X, X), axis=1) = [X[1].X[1], ..., X[n].X[n]]
so that is the one you want if you only want the squared norm of each row. On the other hand:
tf.matmul(X, X, transpose_b=True) = [
[ X[1].X[1], X[1].X[2], ..., X[1].X[n] ],
[ X[2].X[1], ..., X[2].X[n] ],
...
[ X[n].X[1], ..., X[n].X[n] ]
]
so that is what you need if you want the similarity between all pairs of rows.
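The two results are related: the squared row norms are exactly the diagonal of that pairwise matrix, which you can check with a sketch like this (TensorFlow 2.x assumed):

import tensorflow as tf

X = tf.constant([[1., 2.],
                 [3., 4.]])

sq_norms = tf.reduce_sum(tf.multiply(X, X), axis=1)  # [5. 25.]
gram = tf.matmul(X, X, transpose_b=True)             # [[5. 11.] [11. 25.]]

print(tf.linalg.diag_part(gram))  # [5. 25.], the same as sq_norms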
What tf.multiply(X, X) does is essentially multiply each element of the matrix by itself, so
[[1 2]
 [3 4]]
would turn into
[[1  4]
 [9 16]]
whereas tf.reduce_sum(_, axis=1) takes the sum of each row, so the result for the previous example is
[5 25]
which is exactly (by definition) equal to [X[0, :] @ X[0, :], X[1, :] @ X[1, :]].
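You can verify that walkthrough directly (TensorFlow 2.x assumed; tf.tensordot with axes=1 is just the dot product of two vectors):

import tensorflow as tf

X = tf.constant([[1., 2.],
                 [3., 4.]])

print(tf.multiply(X, X))                         # [[1. 4.] [9. 16.]]
print(tf.reduce_sum(tf.multiply(X, X), axis=1))  # [5. 25.]
print(tf.tensordot(X[0, :], X[0, :], axes=1))    # 5.0
print(tf.tensordot(X[1, :], X[1, :], axes=1))    # 25.0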
Just write it out with variable names [[a b] [c d]] instead of actual numbers and look at what tf.matmul(X, X) and tf.multiply(X, X) do.
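If you would rather not expand the symbols by hand, sympy can do the bookkeeping for you; this sketch is outside TensorFlow and just illustrates the two operations:

import sympy as sp

a, b, c, d = sp.symbols('a b c d')
X = sp.Matrix([[a, b], [c, d]])

print(X.multiply_elementwise(X))  # tf.multiply analogue: Matrix([[a**2, b**2], [c**2, d**2]])
print(X * X)                      # tf.matmul analogue: Matrix([[a**2 + b*c, a*b + b*d], [a*c + c*d, b*c + d**2]])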