Why is the input scaled in tf.nn.dropout in TensorFlow?
Let's say the network had n neurons and we applied a dropout rate of 1/2. In the training phase, we would be left with n/2 neurons on average. So if you were expecting output x with all the neurons, you will now get roughly x/2. So for every batch, the network weights are trained according to this x/2.
In the testing/inference/validation phase, we don't apply any dropout, so all n neurons contribute and the output is x rather than the x/2 the weights were trained for, which would give you an incorrect result. So what you can do is scale it down to x/2 during testing.
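The mismatch can be checked with a small numeric sketch. The activations and weights below are made up for illustration, and the mask is picked by hand (rather than sampled) so that it drops exactly half of the total; with a random mask the x/2 figure only holds on average. This is plain Python, not TensorFlow code.

```python
# Hypothetical activations of n = 4 neurons feeding a single output unit.
activations = [1.0, 2.0, 3.0, 4.0]
weights = [0.5, 0.5, 0.5, 0.5]
keep_prob = 0.5

# Hand-picked mask that keeps half of the neurons (1 = keep, 0 = drop).
mask = [1, 0, 0, 1]


def weighted_sum(acts, ws):
    """Output of one unit: sum of activation * weight."""
    return sum(a * w for a, w in zip(acts, ws))


# Training: dropped neurons contribute nothing to the sum.
masked = [a * m for a, m in zip(activations, mask)]
train_output = weighted_sum(masked, weights)

# Testing with no correction: every neuron contributes.
test_output_unscaled = weighted_sum(activations, weights)

# Testing with the correction: scale the output by the keep probability.
test_output_scaled = keep_prob * test_output_unscaled

print(train_output, test_output_unscaled, test_output_scaled)
```

Here the unscaled test output is twice what the weights saw during training, and multiplying by keep_prob brings it back to the training-time level.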
Rather than applying this scaling only in the testing phase, TensorFlow's dropout layer scales the kept outputs up during training, so that the expected sum is the same with dropout (training) or without it (testing).
This scaling enables the same network to be used for training (with keep_prob < 1.0) and evaluation (with keep_prob == 1.0). From the Dropout paper:
The idea is to use a single neural net at test time without dropout. The weights of this network are scaled-down versions of the trained weights. If a unit is retained with probability p during training, the outgoing weights of that unit are multiplied by p at test time as shown in Figure 2.
Rather than adding ops to scale down the weights by keep_prob at test time, the TensorFlow implementation adds an op to scale up the kept activations by 1. / keep_prob at training time. The effect on performance is negligible, and the code is simpler (because we use the same graph and treat keep_prob as a tf.placeholder() that is fed a different value depending on whether we are training or evaluating the network).
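The train-time scaling described above can be sketched in plain Python, assuming a flat list of activations. The helper name inverted_dropout and its rng parameter are made up for illustration; this is not TensorFlow's implementation, only the same idea of one function serving both phases.

```python
import random


def inverted_dropout(values, keep_prob, rng=random):
    """Keep each value with probability keep_prob; scale survivors by 1 / keep_prob."""
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    scale = 1.0 / keep_prob
    # rng.random() lies in [0, 1), so keep_prob == 1.0 keeps every value unchanged.
    return [v * scale if rng.random() < keep_prob else 0.0 for v in values]


values = [1.0] * 10000

# Training: roughly half of the values are zeroed, the survivors are doubled.
train_out = inverted_dropout(values, keep_prob=0.5, rng=random.Random(0))

# Evaluation: the same function with keep_prob == 1.0 acts as the identity.
eval_out = inverted_dropout(values, keep_prob=1.0)

train_mean = sum(train_out) / len(train_out)  # stays close to the input mean
eval_mean = sum(eval_out) / len(eval_out)     # equals the input mean

print(train_mean, eval_mean)
```

Because the survivors are scaled up while training, nothing has to change at evaluation time: only the value passed as keep_prob differs between the two phases.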