Keras Multitask learning with two different input sample sizes

NEW ANSWER:

Here is a solution written with TensorFlow 2. What you need is:

  1. to define a dynamic input that takes its shape from the data

  2. to use average pooling so your dense layer dimension is independent of the input dimensions

  3. to calculate the losses separately

Here is your example modified to work:

## Do this
#pip install tensorflow==2.0.0

import numpy as np
import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras.models import Model


data_1=np.array([[25,  5, 11, 24,  6],
       [25,  5, 11, 24,  6],
       [25,  0, 11, 24,  6],
       [25, 11, 28, 11, 24],
       [25, 11,  6, 11, 11]])

data_2=np.array([[25, 11, 31,  6, 11],
       [25, 11, 28, 11, 31],
       [25, 11, 11, 11, 31]])

Y_1=np.array([[2.33],
       [2.59],
       [2.59],
       [2.54],
       [4.06]])


Y_2=np.array([[2.9],
       [2.54],
       [4.06]])



# Dynamic inputs: the sequence length is taken from the data
user_input = keras.layers.Input(shape=(None,), name='Input_1')
products_input = keras.layers.Input(shape=(None,), name='Input_2')

# A single embedding layer shared by both tasks
shared_embed = keras.layers.Embedding(37, 3)
user_vec_1 = shared_embed(user_input)
user_vec_2 = shared_embed(products_input)

# Task 1 FC layers: average pooling collapses the time dimension,
# so the dense layers do not depend on the sequence length
x = keras.layers.GlobalAveragePooling1D()(user_vec_1)
nn = keras.layers.Dense(90, activation='relu', name='layer_1')(x)
result_a = keras.layers.Dense(1, activation='linear', name='output_1')(nn)

# Task 2 FC layers
x = keras.layers.GlobalAveragePooling1D()(user_vec_2)
nn1 = keras.layers.Dense(90, activation='relu', name='layer_2')(x)
result_b = keras.layers.Dense(1, activation='linear', name='output_2')(nn1)

model = Model(inputs=[user_input, products_input], outputs=[result_a, result_b])


loss = tf.keras.losses.MeanSquaredError()
optimizer = tf.keras.optimizers.Adam()

loss_values = []
num_iter = 300
for i in range(num_iter):
    with tf.GradientTape() as tape:
        # Forward pass on both inputs.
        logits = model([data_1, data_2])
        # Compute the two losses separately, then sum them.
        loss_value = loss(Y_1, logits[0]) + loss(Y_2, logits[1])
        loss_values.append(float(loss_value))
    gradients = tape.gradient(loss_value, model.trainable_weights)
    optimizer.apply_gradients(zip(gradients, model.trainable_weights))

import matplotlib.pyplot as plt
plt.plot(range(num_iter), loss_values)
plt.xlabel('iterations')
plt.ylabel('loss value')
plt.show()

*(Plot: loss value over the 300 training iterations.)*
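Because the inputs are declared with shape=(None,) and the pooling collapses the time dimension, the trained model also accepts sequences of a length different from the training data. A quick check (the token values here are arbitrary, chosen only to stay below the vocabulary size of 37):

preds = model([np.array([[25, 5, 11, 24, 6, 3, 2]]),  # length 7
               np.array([[25, 11, 31]])])             # length 3
print(preds[0].numpy(), preds[1].numpy())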

OLD ANSWER:

It seems your problem is not a coding problem, it's a machine learning problem! You have to pair your datasets: that is, you have to feed your Keras model on both of its input layers at each training step.

The solution is to upsample your smaller dataset so that both datasets are the same size; how you do that depends on the semantics of your datasets. The other option is downsampling your bigger dataset, which is not recommended.

In a very basic situation, if we assume samples are i.i.d. across datasets, you can use the following code:

# Draw data_1.shape[0] indices from data_2, with replacement
random_indices = np.random.choice(data_2.shape[0], data_1.shape[0], replace=True)

upsampled_data_2 = data_2[random_indices]

So, you get a new version of your smaller dataset, upsampled_data_2, which contains some repeated samples but has the same size as your bigger dataset.
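Remember to resample the labels with the same indices so that inputs and targets stay aligned; a minimal sketch, assuming a combined two-input model like the one in the question:

# Reuse the same indices so each repeated sample keeps its target
upsampled_Y_2 = Y_2[random_indices]

model.fit([data_1, upsampled_data_2], [Y_1, upsampled_Y_2], epochs=10)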


It's not clear in your question if you're trying to:

  1. Build a single model that takes a user and a product, and predicts two things about that (user, product) pair. If the user and product aren't paired, then it's not clear that this means anything (as @matias-valdenegro pointed out). If you pair up a random element of the other type (as in the first answer), hopefully each output will just learn to ignore the other input. This would be equivalent to:

  2. Build two models that share an embedding layer (in which case the concat doesn't make any sense). If Y1 has the same length as data1 and Y2 has the same shape as data2, then this is probably what you want. This way, if you have a user you can run the user model, and if you have a product you can run the product model (see the sketch after this list).
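A minimal sketch of option #2, just to pin down the names used below; the pooling and layer sizes are assumptions, and only the embedding is shared:

# Hypothetical minimal models -- only shared_embed is common to both
shared_embed = keras.layers.Embedding(37, 3)

user_input = keras.layers.Input(shape=(None,))
user_features = keras.layers.GlobalAveragePooling1D()(shared_embed(user_input))
user_model = Model(user_input, keras.layers.Dense(1)(user_features))

products_input = keras.layers.Input(shape=(None,))
product_features = keras.layers.GlobalAveragePooling1D()(shared_embed(products_input))
product_model = Model(products_input, keras.layers.Dense(1)(product_features))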

I think you really want #2. To train it you can do something like:

# Alternate one gradient step per task until the step budget is spent
step = 0
for user_batch, product_batch in zip(user_data.shuffle(8).repeat(),
                                     product_data.shuffle(8).repeat()):
  user_model.train_on_batch(*user_batch)
  product_model.train_on_batch(*product_batch)
  step += 1
  if step > STEPS:
    break
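Here user_data and product_data are assumed to be tf.data pipelines of (inputs, labels) batches; STEPS, the batch size, and the shuffle buffer size are arbitrary placeholders, e.g.:

STEPS = 100
user_data = tf.data.Dataset.from_tensor_slices((data_1, Y_1)).batch(1)
product_data = tf.data.Dataset.from_tensor_slices((data_2, Y_2)).batch(1)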

Or, wrap them both in a combined model:

user_result = user_model(user_input)
product_result = product_model(products_input)

model = Model(inputs=[user_input, products_input],
              outputs=[user_result, product_result])

model.compile(optimizer='rmsprop',
              loss='mse',
              metrics=['accuracy'])

# fit needs both input arrays to have the same number of samples,
# e.g. after upsampling as in the first answer
model.fit([data_1, upsampled_data_2], [Y_1, upsampled_Y_2], epochs=10)

Regardless of which training procedure you use, you should normalize the output ranges so that the two models' losses are comparable. The first procedure alternates epochs or steps; the second does a single gradient step on the weighted sum of the two losses. You may want to check which loss weighting works best for you.
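For the second procedure, Keras lets you set that weighting at compile time via loss_weights; the values below are placeholders to tune:

# Each output's loss is scaled by its weight before summing (values are placeholders)
model.compile(optimizer='rmsprop',
              loss='mse',
              loss_weights=[1.0, 0.5])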