Mathematica 11.1 - Issue with Training Function for Sequence Data
The generator syntax of NetTrain doesn't currently support training nets with variable-length inputs. This is not well documented, so it's a documentation bug.
One workaround for now: explicitly specify the input size, so the net works with fixed-length sequences.
I think this is a bug: the generator function doesn't work with recurrent layers.
net1 = NetChain[
  {
    LinearLayer[100, "Input" -> {2, 3}],
    2,
    SoftmaxLayer[]
  },
  "Output" -> NetDecoder[{"Class", {0, 1}}]
]
net2 = NetChain[
  {
    LongShortTermMemoryLayer[512, "Input" -> {2, 3}],
    SequenceLastLayer[],
    2,
    SoftmaxLayer[]
  },
  "Output" -> NetDecoder[{"Class", {0, 1}}]
]
generator = Function[
  Thread[RandomReal[{-1, 1}, {#BatchSize, 2, 3}] -> RandomInteger[1, #BatchSize]]
];
With net1, the generator works without problems.
trained = NetTrain[net1, generator, BatchSize -> 4]
But for net2, the same call produces an error.
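For reference, the failing call is identical to the one used for net1 (assuming the definitions above); only the net differs:

```mathematica
(* In Mathematica 11.1 this errors out, even though net2
   has a fixed-size {2, 3} input *)
trained = NetTrain[net2, generator, BatchSize -> 4]
```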
A possible workaround is to insert a LinearLayer before the LongShortTermMemoryLayer:
net3 = NetChain[
  {
    LinearLayer[{2, 3}, "Input" -> {2, 3}],
    LongShortTermMemoryLayer[512],
    SequenceLastLayer[],
    2,
    SoftmaxLayer[]
  },
  "Output" -> NetDecoder[{"Class", {0, 1}}]
]
trained = NetTrain[net3, generator, BatchSize -> 4]
UPDATE
As Sebastian writes, the generator syntax of NetTrain doesn't currently support training nets with variable-length inputs. But net2 has a fixed-length input. The catch is that "Input" should not be specified in the body of LongShortTermMemoryLayer; it should be given at the NetChain level instead. This version works with the generator:
net2new = NetChain[
  {
    LongShortTermMemoryLayer[512],
    SequenceLastLayer[],
    2,
    SoftmaxLayer[]
  },
  "Input" -> {2, 3},
  "Output" -> NetDecoder[{"Class", {0, 1}}]
]
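The corrected net can then be trained with the same call used for net1 and net3 (assuming the generator defined above):

```mathematica
(* Works: the input shape is declared on the NetChain,
   not inside the recurrent layer *)
trained = NetTrain[net2new, generator, BatchSize -> 4]
```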