Link to paper: [1411.1792] How transferable are features in deep neural networks?
This paper studies the transferability of features learnt at different layers of a convolutional neural network. Typically, the initial layers of a CNN learn features that resemble Gabor filters or color blobs and are fairly general, while later layers are more task-specific. Main contributions:
- They create two splits of the ImageNet dataset (A/B) and explore how performance varies across several network configurations:
- Base: CNN trained on A or B.
- Selffer: first n layers are copied from a base network, and the rest of the network is randomly initialized and trained on the same task.
- Transfer: first n layers are copied from a base network, and the rest of the network is randomly initialized and trained on the other task.
- Each of these 'copied' layers can either be fine-tuned or kept frozen (see the code sketch after this list).
- Selffer networks without fine-tuning don't perform well when the split is in the middle of the network (n = 3-6). This is because neurons in these layers co-adapt to each other's activations in complex ways, and these co-adaptations are broken when the network is split.
- As we approach the final layers, there is less for the network to learn, and so these layers can be trained independently.
- Fine-tuning a selffer network gives it the chance to re-learn co-adaptations.
- Transfer networks with smaller n perform better than those with larger n, indicating that features become more task-specific in higher layers.
- Fine-tuning transfer networks, however, results in better performance than training on the target dataset alone. They argue that this generalization boost comes from having seen the base dataset, and that it persists even after considerable fine-tuning.
- Fine-tuning works much better than using random features.
- Features are more transferable across related tasks than unrelated tasks.
- They study how task similarity affects transferability by comparing two random splits of ImageNet classes with a split into man-made vs. natural classes.
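
A minimal PyTorch-style sketch of the selffer/transfer construction referenced above (my own illustration, not the authors' code): copy the first n layers from a trained base network, re-initialize the rest, and optionally freeze the copied layers. The names `make_transfer_net`, `base_net`, `n_copied`, and `freeze_copied` are hypothetical.

```python
import copy
import torch.nn as nn

def make_transfer_net(base_net: nn.Sequential, n_copied: int, freeze_copied: bool) -> nn.Sequential:
    """Build a selffer/transfer-style network from a trained base network.

    The first `n_copied` layers keep the base network's weights; the remaining
    layers are re-initialized and trained on the (same or different) target task.
    """
    new_net = copy.deepcopy(base_net)
    layers = list(new_net.children())

    # Re-initialize every layer above the copied ones.
    for layer in layers[n_copied:]:
        if hasattr(layer, "reset_parameters"):
            layer.reset_parameters()

    # The "frozen" condition: copied layers receive no gradient updates.
    # Leaving requires_grad=True instead corresponds to the fine-tuned condition.
    if freeze_copied:
        for layer in layers[:n_copied]:
            for p in layer.parameters():
                p.requires_grad = False

    return new_net
```

Training such a network on the base network's own split corresponds to a selffer network, while training it on the other split corresponds to a transfer network; when freezing, one would also pass only the parameters with `requires_grad=True` to the optimizer.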
Strengths
- Experiments are thorough, and the results are intuitive and insightful.
Weaknesses / Notes
- This paper only analyzes transferability across different splits of ImageNet (as similar/dissimilar tasks). They should have reported results on transferability from one task to another (classification/detection) or from one dataset to another (ImageNet/MSCOCO).
- It would be interesting to study the role of dropout in preventing co-adaptations while transferring features.