Keras Multitask learning with two different input sample size
NEW ANSWER:
Here I am writing a solution with TensorFlow 2. What you need is:

- to define a dynamic input that takes its shape from the data,
- to use average pooling so your dense layer dimension is independent of the input dimensions,
- to calculate the losses separately.

Here is your example, modified to work:
## Do this
# pip install tensorflow==2.0.0
import tensorflow as tf
import tensorflow.keras as keras
import numpy as np
from tensorflow.keras.models import Model

data_1 = np.array([[25, 5, 11, 24, 6],
                   [25, 5, 11, 24, 6],
                   [25, 0, 11, 24, 6],
                   [25, 11, 28, 11, 24],
                   [25, 11, 6, 11, 11]])

data_2 = np.array([[25, 11, 31, 6, 11],
                   [25, 11, 28, 11, 31],
                   [25, 11, 11, 11, 31]])

Y_1 = np.array([[2.33],
                [2.59],
                [2.59],
                [2.54],
                [4.06]])

Y_2 = np.array([[2.9],
                [2.54],
                [4.06]])

# Dynamic inputs: shape=(None,) accepts sequences of any length
user_input = keras.layers.Input(shape=(None,), name='Input_1')
products_input = keras.layers.Input(shape=(None,), name='Input_2')

# Shared embedding layer; no input_length, so it accepts any sequence length
shared_embed = keras.layers.Embedding(37, 3)
user_vec_1 = shared_embed(user_input)
user_vec_2 = shared_embed(products_input)

# Task 1 FC layers: pooling makes the dense input size
# independent of the sequence length
x = keras.layers.GlobalAveragePooling1D()(user_vec_1)
nn = keras.layers.Dense(90, activation='relu', name='layer_1')(x)
result_a = keras.layers.Dense(1, activation='linear', name='output_1')(nn)

# Task 2 FC layers
x = keras.layers.GlobalAveragePooling1D()(user_vec_2)
nn1 = keras.layers.Dense(90, activation='relu', name='layer_2')(x)
result_b = keras.layers.Dense(1, activation='linear', name='output_2')(nn1)

model = Model(inputs=[user_input, products_input], outputs=[result_a, result_b])

loss = tf.keras.losses.MeanSquaredError()
optimizer = tf.keras.optimizers.Adam()

loss_values = []
num_iter = 300
for i in range(num_iter):
    with tf.GradientTape() as tape:
        # Forward pass.
        logits = model([data_1, data_2])
        # Compute the two losses separately, then sum them.
        loss_value = loss(Y_1, logits[0]) + loss(Y_2, logits[1])
    loss_values.append(loss_value)
    gradients = tape.gradient(loss_value, model.trainable_weights)
    optimizer.apply_gradients(zip(gradients, model.trainable_weights))

import matplotlib.pyplot as plt
plt.plot(range(num_iter), loss_values)
plt.xlabel("iterations")
plt.ylabel('loss value')
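Since the inputs are dynamic, the trained model also accepts sequences of lengths other than 5. A quick check with made-up token sequences (any integers below the vocabulary size of 37 work):

# Length-7 and length-3 inputs; within a batch all rows of one input
# must share a length, but the two inputs may differ.
test_1 = np.array([[25, 5, 11, 24, 6, 1, 2]])
test_2 = np.array([[25, 11, 31]])
pred_a, pred_b = model([test_1, test_2])
print(pred_a.numpy(), pred_b.numpy())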
OLD ANSWER:
It seems your problem is not a coding problem, it's a machine learning one! You have to pair your datasets: that is, at every training step you have to feed your Keras model a batch on both of its input layers.

The solution is up-sampling your smaller dataset so that both datasets are the same size. How you do that depends on the semantics of your datasets. The other option is down-sampling your bigger dataset, which is not recommended.

In a very basic situation, if we assume samples are i.i.d. across datasets, you can use the following code:
random_indices = np.random.choice(data_2.shape[0],
                                  data_1.shape[0], replace=True)
upsampled_data_2 = data_2[random_indices]
This gives you a new version of your smaller dataset, upsampled_data_2, that contains some repeated samples but has the same size as your bigger dataset.
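Note that the labels need the same re-indexing, or they fall out of alignment with the repeated samples; a one-line sketch, assuming Y_2 holds the labels for data_2:

# Reuse the same indices so every repeated sample keeps its own label
upsampled_Y_2 = Y_2[random_indices]
# Both inputs now have matching sizes, e.g.:
# model.fit([data_1, upsampled_data_2], [Y_1, upsampled_Y_2], epochs=10)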
It's not clear in your question if you're trying to:

1. Build a single model that takes a user and a product, and predicts two things about that (user, product) pair. If the user and product aren't paired, then it's not clear that this means anything (as @matias-valdenegro pointed out). If you pair up a random element of the other type (as in the first answer), hopefully each output will just learn to ignore the other input. This would be equivalent to:
2. Build two models that share an embedding layer (in which case the concat doesn't make any sense). If Y1 has the same length as data1 and Y2 has the same shape as data2, then this is probably what you want. This way, if you have a user you can run the user model, and if you have a product you can run the product model.
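For concreteness, here is a minimal sketch of what option #2 could look like: two single-input models built on one shared embedding. The layer sizes mirror the NEW ANSWER, and the names user_model / product_model are assumptions reused by the snippets below:

import tensorflow.keras as keras
from tensorflow.keras.models import Model

user_input = keras.layers.Input(shape=(None,), name='user_input')
product_input = keras.layers.Input(shape=(None,), name='product_input')

# One embedding layer, shared by both models
shared_embed = keras.layers.Embedding(37, 3)

# User model
u = keras.layers.GlobalAveragePooling1D()(shared_embed(user_input))
u = keras.layers.Dense(90, activation='relu')(u)
user_model = Model(user_input, keras.layers.Dense(1, activation='linear')(u))

# Product model
p = keras.layers.GlobalAveragePooling1D()(shared_embed(product_input))
p = keras.layers.Dense(90, activation='relu')(p)
product_model = Model(product_input, keras.layers.Dense(1, activation='linear')(p))

user_model.compile(optimizer='rmsprop', loss='mse')
product_model.compile(optimizer='rmsprop', loss='mse')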
I think you really want #2. To train it you can do something like:
step = 0
for user_batch, product_batch in zip(user_data.shuffle(buffer_size=16).repeat(),
                                     product_data.shuffle(buffer_size=16).repeat()):
    user_model.train_on_batch(*user_batch)
    product_model.train_on_batch(*product_batch)
    step += 1  # count steps so the loop terminates
    if step > STEPS:
        break
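The loop above assumes user_data and product_data are tf.data.Dataset objects that yield (inputs, labels) batches, and that STEPS is defined; a sketch under those assumptions, reusing the arrays from the NEW ANSWER:

import tensorflow as tf

STEPS = 300  # assumed training budget, not from the original answer
user_data = tf.data.Dataset.from_tensor_slices((data_1, Y_1)).batch(2)
product_data = tf.data.Dataset.from_tensor_slices((data_2, Y_2)).batch(2)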
Or, wrap them both in a combined model:
user_result = user_model(user_input)
product_result = product_model(product_input)

model = Model(inputs=[user_input, product_input],
              outputs=[user_result, product_result])

model.compile(optimizer='rmsprop',
              loss='mse',
              metrics=['accuracy'])

# fit() requires all inputs to have the same number of samples; with
# unequal sizes, pair data_1 with the upsampled data from above
# (e.g. upsampled_data_2 and upsampled_Y_2).
model.fit([data_1, data_2], [Y_1, Y_2], epochs=10)
Regardless of which training procedure you use, you should normalize the output ranges so that the two models' losses are comparable. The first procedure alternates epochs or steps; the second does a single gradient step on the weighted sum of the two losses. You may want to check which loss weighting works best for you.
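If you go with the combined model, one place to apply that weighting is compile's loss_weights argument; the values below are placeholders to tune, not recommendations:

model.compile(optimizer='rmsprop',
              loss='mse',
              loss_weights=[1.0, 0.5])  # one weight per output; tune for your data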