Binary numbers instead of one-hot vectors
Encoding with binary numbers is fine, but depending on your task and model you will probably need to add another layer (or a filter), because the binary representation introduces spurious shared features between unrelated categories.
For example, a binary encoding for the input x = [x1, x2]:
'apple' = [0, 0]
'orange' = [0, 1]
'table' = [1, 0]
'chair' = [1, 1]
It means that orange and chair share the same feature x2. Now, with predictions for two classes y:
'fruit' = 0
'furniture' = 1
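Concretely, that encoding and those labels might look like this (a small NumPy sketch; the names X and y are just illustrative):

```python
import numpy as np

# Binary codes for the four categories, one row per sample
X = np.array([
    [0, 0],  # 'apple'
    [0, 1],  # 'orange'
    [1, 0],  # 'table'
    [1, 1],  # 'chair'
], dtype=float)

# Class labels: 0 = 'fruit', 1 = 'furniture'
y = np.array([0, 0, 1, 1], dtype=float)
```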
And a linear model (weights W = [w1, w2] and bias b) fit to a labeled data sample:
(argmin W) Loss = y - (w1 * x1 + w2 * x2 + b)
Whenever you update the weight w2 for chair as furniture, you get an undesirable update, as if you were also predicting orange as furniture.
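Here is a quick numeric sketch of that coupling. Since the raw residual above is not directly minimizable, this uses a squared-error version of the loss as a stand-in, with an illustrative learning rate:

```python
import numpy as np

# Same encoding as above: apple, orange, table, chair
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 1, 1], dtype=float)  # fruit = 0, furniture = 1

w = np.zeros(2)  # [w1, w2]
b = 0.0
lr = 0.5

# One gradient step on the 'chair' -> 'furniture' sample alone,
# minimizing the squared residual (y - (w1*x1 + w2*x2 + b))**2
x_chair, y_chair = X[3], y[3]
residual = y_chair - (w @ x_chair + b)
w += lr * 2 * residual * x_chair  # touches BOTH w1 and w2, since x = [1, 1]
b += lr * 2 * residual

# 'orange' shares x2 with 'chair', so its prediction drifts toward
# 'furniture' even though no orange sample was seen
print("orange prediction:", w @ X[1] + b)  # 2.0, was 0.0 before the step
```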
In this particular case, if you add another layer with weights U = [u1, u2] (giving each hidden unit its own input weights, so W becomes a 2×2 matrix), you can probably solve this issue:
(argmin U,W) Loss = y - (u1 * (w11 * x1 + w12 * x2 + b1) +
                         u2 * (w21 * x1 + w22 * x2 + b2) +
                         b3)
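A minimal sketch of that two-layer form with hand-picked (not trained) weights, just to show that the hidden layer can decouple the shared feature:

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 1, 1], dtype=float)

# Illustrative solution: hidden unit 1 isolates x1, hidden unit 2
# isolates x2, and the output layer listens to unit 1 only, so an
# update through u1 never touches the x2 pathway
W = np.array([[1.0, 0.0],   # hidden unit 1: w11*x1 + w12*x2
              [0.0, 1.0]])  # hidden unit 2: w21*x1 + w22*x2
b_hidden = np.zeros(2)      # [b1, b2]
U = np.array([1.0, 0.0])    # [u1, u2]
b3 = 0.0

hidden = X @ W.T + b_hidden  # shape (4, 2)
pred = hidden @ U + b3       # u1*h1 + u2*h2 + b3
print(pred)                  # [0. 0. 1. 1.] -- matches y for all four samples
```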
OK, so why not avoid this misrepresentation in the first place by using one-hot encoding? :)
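For comparison, a one-hot sketch of the same four categories, where no two items share an active feature:

```python
import numpy as np

# One row per category; each item gets its own exclusive feature,
# so an update for 'chair' cannot leak into 'orange'
X_onehot = np.eye(4)  # rows: apple, orange, table, chair
print(X_onehot)
```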