[Submitted on 3 Mar 2025]
[View PDF] [HTML (experimental)]
Abstract: Deep neural networks are often seen as different from other model classes because they defy conventional notions of generalization. Examples of anomalous generalization behaviour include benign overfitting, double descent, and the success of overparametrization. We argue that these phenomena are not distinct to neural networks, nor are they particularly mysterious. Moreover, this generalization behaviour can be intuitively understood, and rigorously characterized, using generalization frameworks such as PAC-Bayes and countable hypothesis bounds. Soft inductive biases are a key principle in explaining these phenomena: rather than restricting the hypothesis space to avoid overfitting, we embrace a flexible hypothesis space with a soft preference for simpler solutions that are consistent with the data. This principle can be encoded in many model classes, so deep learning is not as mysterious or different as it may seem. However, we also show how deep learning is relatively distinct in other ways, such as its ability to learn representations, phenomena like mode connectivity, and its relative universality.
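To make the idea of a soft inductive bias concrete, here is a minimal sketch (not from the paper; all values are illustrative assumptions). Instead of hard-restricting the hypothesis space by capping the polynomial degree, we keep a flexible degree-15 basis and express a soft preference for simpler (small-coefficient) solutions via ridge regularization:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: noisy samples of a simple underlying function y = x.
x = np.linspace(-1, 1, 20)
y = x + 0.1 * rng.standard_normal(x.size)

# Flexible hypothesis space: degree-15 polynomial features.
degree = 15
Phi = np.vander(x, degree + 1, increasing=True)

# Soft inductive bias: keep the flexible space, but penalize complex
# (large-coefficient) solutions with an L2 (ridge) penalty.
lam = 1e-2
w_soft = np.linalg.solve(Phi.T @ Phi + lam * np.eye(degree + 1),
                         Phi.T @ y)

# No preference at all: plain least squares on the same flexible space,
# which is free to fit the noise exactly.
w_plain, *_ = np.linalg.lstsq(Phi, y, rcond=None)

# Held-out error against the true function on interior points.
x_test = np.linspace(-0.95, 0.95, 50)
Phi_test = np.vander(x_test, degree + 1, increasing=True)
err_soft = np.mean((Phi_test @ w_soft - x_test) ** 2)
err_plain = np.mean((Phi_test @ w_plain - x_test) ** 2)
print(err_soft, err_plain)
```

The hypothesis space is identical in both fits; only the soft preference for simpler solutions differs, which is the distinction the abstract draws between restricting a hypothesis space and softly biasing it.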
Submission History
Andrew Wilson [view email]
[v1] Monday, 3 March 2025 22:56:04 UTC (1,206 KB)