Link to paper: [Deep Networks with Stochastic Depth (arXiv:1603.09382)](https://arxiv.org/abs/1603.09382)
This paper presents a way to reduce the expected depth of deep residual networks during training by randomly dropping a subset of residual blocks and bypassing them with identity connections. The 'survival' probability p_l of block l decreases linearly with depth, from p_0 = 1.0 at the first block to p_L = 0.5 at the last, i.e. p_l = 1 - (l/L)(1 - p_L), so that layers extracting low-level features are kept with higher probability. At test time, every residual function is kept but scaled by its survival probability p_l, the expected fraction of training passes in which it was active. This model achieves lower test errors than constant-depth ResNets (with ReLU activations) on CIFAR-10, CIFAR-100, and SVHN.
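Below is a minimal PyTorch sketch of the idea, not the paper's reference implementation: a `StochasticDepthBlock` wrapper (a hypothetical name) that takes an arbitrary residual function `body` as f_l, drops it per minibatch with probability 1 - p_l during training, and scales it by p_l at test time. The post-addition ReLU used in the paper's formulation is omitted for brevity.

```python
import torch
import torch.nn as nn


class StochasticDepthBlock(nn.Module):
    """Residual block with stochastic depth (sketch).

    `body` is any residual function f_l, e.g. conv-BN-ReLU-conv-BN;
    the skip path is the identity.
    """

    def __init__(self, body: nn.Module, survival_prob: float):
        super().__init__()
        self.body = body
        self.survival_prob = survival_prob

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            # Training: keep the block with probability p_l (one Bernoulli
            # draw per minibatch); otherwise bypass it entirely through
            # the identity connection.
            if torch.rand(1).item() < self.survival_prob:
                return x + self.body(x)
            return x
        # Test: the block is always active, scaled by p_l so activations
        # match their expected value under the training-time dropping.
        return x + self.survival_prob * self.body(x)


def survival_prob(l: int, L: int, p_last: float = 0.5) -> float:
    """Linear decay rule: p_0 = 1.0 down to p_L = p_last at the last block."""
    return 1.0 - (l / L) * (1.0 - p_last)


# Example usage (hypothetical shapes): block 5 of a 54-block network.
block = StochasticDepthBlock(
    nn.Sequential(
        nn.Conv2d(16, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
        nn.Conv2d(16, 16, 3, padding=1), nn.BatchNorm2d(16),
    ),
    survival_prob=survival_prob(l=5, L=54),
)
```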