Mathematica 11.1 - Issue with Training Function for Sequence Data

The generator syntax of NetTrain doesn't currently support training nets with variable-length inputs. This is not well documented, so it's a documentation bug.

One workaround for now: explicitly specify the input size, so the net trains on sequences of fixed length.
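For example, a minimal sketch (layer sizes and input dimensions are illustrative): attach the "Input" specification to the NetChain itself rather than to the recurrent layer:

net = NetChain[
  {
   LongShortTermMemoryLayer[512],
   SequenceLastLayer[],
   2,
   SoftmaxLayer[]
   },
  "Input" -> {2, 3},
  "Output" -> NetDecoder[{"Class", {0, 1}}]
  ]

With the input size fixed at the chain level, this net trains with a generator function.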


I think this is a bug: the generator function doesn't work with recurrent layers.

net1 = NetChain[
  {
   LinearLayer[100, "Input" -> {2, 3}],
   2,
   SoftmaxLayer[]
   },
  "Output" -> NetDecoder[{"Class", {0, 1}}]
  ]

net2 = NetChain[
  {
   LongShortTermMemoryLayer[512, "Input" -> {2, 3}],
   SequenceLastLayer[],
   2,
   SoftmaxLayer[]
   },
  "Output" -> NetDecoder[{"Class", {0, 1}}]
  ]

generator = Function[
   Thread[RandomReal[{-1, 1}, {#BatchSize, 2, 3}] -> RandomInteger[1, #BatchSize]]
   ];
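Called with an association, the generator produces a list of input -> class rules (the arrays and classes are random, so the exact values vary):

generator[<|"BatchSize" -> 2|>]
(* a list of two rules, each mapping a 2x3 real array to 0 or 1 *)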

With net1 the generator works without problems.

trained = NetTrain[net1, generator, BatchSize -> 4]

But with net2 I see an error:

(screenshot of the NetTrain error message)

A possible workaround is to insert a LinearLayer before the LongShortTermMemoryLayer:

net3 = NetChain[
  {
   LinearLayer[{2, 3}, "Input" -> {2, 3}],
   LongShortTermMemoryLayer[512],
   SequenceLastLayer[],
   2,
   SoftmaxLayer[]
   },
  "Output" -> NetDecoder[{"Class", {0, 1}}]
  ]

trained = NetTrain[net3, generator, BatchSize -> 4]

UPDATE

As Sebastian writes, the NetTrain generator syntax doesn't currently support training nets with variable-length inputs. However, net2 has a fixed-length input; the problem is that "Input" should not be specified inside LongShortTermMemoryLayer itself. Moving it to the NetChain level works with the generator:

net2new = NetChain[
  {
   LongShortTermMemoryLayer[512],
   SequenceLastLayer[],
   2,
   SoftmaxLayer[]
   },
  "Input" -> {2, 3},
  "Output" -> NetDecoder[{"Class", {0, 1}}]
  ]