Understanding Deep Learning Requires Rethinking Generalization, https://arxiv.org/abs/1611.03530
This paper discusses the generalization ability of deep neural networks.
Conventional wisdom attributes small generalization error either to properties of the model family, or to the regularization techniques used during training.
The authors designed several randomization tests to examine the effective capacity of neural networks. They trained on modified versions of the labels and input images: true labels, partially corrupted labels, fully random labels, shuffled pixels (one fixed permutation applied to every image), random pixels (an independent permutation per image), and Gaussian noise in place of the images (a sketch of these corruptions follows the list below). The experimental results were:
a) the learning rate schedule did not need to change; b) once the fitting starts, it converges quickly; c) the network converges to (over)fit the training set perfectly.
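A minimal sketch of these corruptions, assuming flattened images arrive as a numpy array `X` of shape `(n, d)` with integer labels `y`; the function names and signatures here are illustrative, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt_labels(y, num_classes, fraction):
    """Replace a given fraction of labels with uniformly random ones.
    fraction=0.0 keeps the true labels; fraction=1.0 gives fully random labels."""
    y = y.copy()
    mask = rng.random(len(y)) < fraction
    y[mask] = rng.integers(0, num_classes, size=mask.sum())
    return y

def shuffle_pixels(X):
    """Apply one fixed pixel permutation to every image."""
    perm = rng.permutation(X.shape[1])
    return X[:, perm]

def random_pixels(X):
    """Apply an independent pixel permutation to each image."""
    return np.stack([img[rng.permutation(X.shape[1])] for img in X])

def gaussian_pixels(X):
    """Replace the images entirely with Gaussian noise matching the data's mean/std."""
    return rng.normal(X.mean(), X.std(), size=X.shape)
```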
On corrupted data sets, deep neural networks take longer to converge.
The authors enabled and disabled regularization measures in deep neural networks to measure the role of regularization. Without regularization, the generalization error of deep neural networks is larger than with it. However, deep neural networks without regularization still achieve low generalization error.
The experiments with both explicit and implicit regularizers consistently suggest that regularizers, when properly tuned, can help improve generalization performance. However, it is unlikely that regularization is the fundamental reason for generalization, as the networks continue to perform well after all regularizers are removed.
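A hedged sketch of the enable/disable comparison in PyTorch; the architecture, hyperparameters, and `use_*` flags below are illustrative assumptions, not the paper's exact setup (the paper uses Inception, AlexNet, and MLP variants):

```python
import torch
import torch.nn as nn

def make_model(use_dropout: bool) -> nn.Module:
    # A toy MLP standing in for the paper's architectures.
    layers = [nn.Flatten(), nn.Linear(3 * 32 * 32, 512), nn.ReLU()]
    if use_dropout:
        layers.append(nn.Dropout(p=0.5))   # explicit regularizer: dropout
    layers.append(nn.Linear(512, 10))
    return nn.Sequential(*layers)

def make_optimizer(model: nn.Module, use_weight_decay: bool):
    # Explicit regularizer: weight decay (L2 penalty).
    wd = 5e-4 if use_weight_decay else 0.0
    return torch.optim.SGD(model.parameters(), lr=0.01,
                           momentum=0.9, weight_decay=wd)

# Train one configuration per on/off combination and compare test accuracy.
for use_dropout in (False, True):
    for use_weight_decay in (False, True):
        model = make_model(use_dropout)
        opt = make_optimizer(model, use_weight_decay)
        # ... standard training loop on the dataset, then evaluate ...
```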
Finite-sample Expressivity
The authors then examined the expressive power of neural networks on a finite sample of size $n$.
There exists a two-layer neural network with ReLU activations and $2n + d$ weights that can represent any function on a sample of size $n$ in $d$ dimensions.
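The construction behind this claim can be sketched directly, assuming numpy and variable names of my choosing: project the sample onto a random direction $a$ ($d$ weights), place the $n$ ReLU biases $b$ so they interleave the sorted projections, and solve the resulting lower-triangular system for the $n$ output weights $w$, giving $2n + d$ weights in total:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 5
X = rng.normal(size=(n, d))          # n samples in d dimensions
y = rng.normal(size=n)               # arbitrary target values

a = rng.normal(size=d)               # random projection: the z_i are distinct a.s.
z = X @ a
order = np.argsort(z)
X, y, z = X[order], y[order], z[order]

# Choose b_1 < z_1 < b_2 < z_2 < ... so unit j is active exactly on samples i >= j.
b = np.empty(n)
b[0] = z[0] - 1.0
b[1:] = (z[:-1] + z[1:]) / 2.0

A = np.maximum(z[:, None] - b[None, :], 0.0)   # A[i, j] = ReLU(z_i - b_j)
w = np.linalg.solve(A, y)                      # lower triangular, nonzero diagonal

pred = np.maximum(X @ a[:, None] - b, 0.0) @ w  # the two-layer ReLU network
assert np.allclose(pred, y)                     # fits the sample exactly
```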
Regularization of Linear Models
The authors appealed to linear models to argue that the source of generalization is hard to pin down even in that simple setting: with more parameters than samples, infinitely many solutions fit the training data exactly, yet SGD initialized at zero stays in the span of the data points and converges to the minimum $\ell_2$-norm solution, a form of implicit regularization.
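A small sketch of that argument, with dimensions, step size, and iteration count as my own assumptions (full-batch gradient descent is used here; its updates, like SGD's, are linear combinations of the data rows, so the same span argument applies):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 100                       # more parameters than samples
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

w = np.zeros(d)                      # zero init keeps w in the row span of X
lr = 1.0 / np.linalg.norm(X, ord=2) ** 2   # safe step: 1 / sigma_max^2
for _ in range(5000):
    w -= lr * X.T @ (X @ w - y)      # gradient of 0.5 * ||Xw - y||^2

w_min_norm = np.linalg.pinv(X) @ y   # minimum-norm interpolating solution
print(np.linalg.norm(w - w_min_norm))  # ~0: GD found the min-norm solution
```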
New Terms
- Non-parametric randomization test
- Early stopping
- Weight decay
- Rademacher complexity (see the definition after this list)
- Uniform stability
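Of these, Rademacher complexity is worth writing down. For a hypothesis class $\mathcal{H}$ and a sample $x_1, \dots, x_n$, the empirical Rademacher complexity is

$$
\hat{\mathfrak{R}}_n(\mathcal{H}) = \mathbb{E}_{\sigma}\left[ \sup_{h \in \mathcal{H}} \frac{1}{n} \sum_{i=1}^{n} \sigma_i h(x_i) \right],
$$

where the $\sigma_i$ are i.i.d. uniform $\pm 1$ random variables. Since the randomization tests show these networks can fit random labels, $\hat{\mathfrak{R}}_n(\mathcal{H}) \approx 1$ for the relevant classes, so standard Rademacher-based generalization bounds are vacuous here.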