One of the great mysteries of deep neural networks (DNNs) is that they generalize well even though they are heavily overparameterized. The classical wisdom in statistical learning is the bias-variance tradeoff: a model that is too complex overfits, so even though its training loss is small, its test error remains high. It is therefore considered important to control model complexity (e.g., through explicit regularization) to avoid overfitting. In contrast, modern DNNs have so many layers and neurons that they can easily fit the training data perfectly, yet the resulting models still generalize well to new test data. Understanding why and when overparameterization and overfitting can (or cannot) work well will have a profound impact on future model architectures and training methods.
Our group has made several discoveries toward answering this open question. First, we study the generalization power of overparameterized linear models, which have more parameters/features than training samples and can therefore fit the training samples perfectly. Related studies in the literature have reported a so-called “double-descent” phenomenon for the min L2-norm solutions that overfit the data. In contrast, our work focuses on a different type of overfitting solution, the one that minimizes the L1-norm. We are the first to provide sharp analytical results on the double-descent of such min L1-norm overfitting solutions, and our results reveal that their double-descent can be drastically different from that of min L2-norm solutions. Second, we study the generalization power of two-layer neural networks using neural tangent kernels (NTK), which can be viewed as a linear approximation of neural networks. Our work is the first to characterize the double-descent of such two-layer NTK models, and we show that their double-descent can again be drastically different from that of linear models with i.i.d. features. Our ongoing work aims to extend these results to deeper neural networks.
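To make the two kinds of overfitting solutions concrete, the following is a minimal sketch, not the analysis from our papers. It builds an overparameterized linear model with i.i.d. Gaussian features and computes both the min L2-norm and the min L1-norm solutions that fit the training data exactly. The function names, the problem sizes (`n`, `p`, `s`), the sparse ground truth, and the noise level are all illustrative assumptions. The min L1-norm solution is obtained by writing the problem as a linear program (basis pursuit).

```python
import numpy as np
from scipy.optimize import linprog


def min_l2_interpolator(X, y):
    """Minimum L2-norm w satisfying X w = y, via the pseudoinverse."""
    return np.linalg.pinv(X) @ y


def min_l1_interpolator(X, y):
    """Minimum L1-norm w satisfying X w = y (basis pursuit as an LP).

    Split w = u - v with u, v >= 0, so that sum(u) + sum(v) equals
    ||w||_1 at the optimum.
    """
    n, p = X.shape
    cost = np.ones(2 * p)
    A_eq = np.hstack([X, -X])
    res = linprog(cost, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p] - res.x[p:]


# Illustrative setup (assumed, not taken from the papers):
# n samples, p >> n features, s-sparse ground truth, small label noise.
rng = np.random.default_rng(0)
n, p, s = 20, 100, 3
X = rng.standard_normal((n, p))
w_true = np.zeros(p)
w_true[:s] = 1.0
y = X @ w_true + 0.1 * rng.standard_normal(n)

w_l2 = min_l2_interpolator(X, y)
w_l1 = min_l1_interpolator(X, y)

# Both solutions overfit: the training residual is numerically zero.
print("train residual (L2):", np.linalg.norm(X @ w_l2 - y))
print("train residual (L1):", np.linalg.norm(X @ w_l1 - y))

# For i.i.d. standard Gaussian test features, the expected squared
# prediction error (excluding noise) equals ||w - w_true||_2^2.
print("model error (L2):", np.sum((w_l2 - w_true) ** 2))
print("model error (L1):", np.sum((w_l1 - w_true) ** 2))
```

Sweeping `p` for a fixed `n` and plotting the model error of each solution is one way to look at double-descent curves empirically. The shape of those curves depends on the assumed sparsity and noise level.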
Selected publications: