Rethinking Overparameterization: Why the Lottery Ticket Analogy Falls Short
By rbanffy
Summary
This article critiques the popular "lottery ticket" analogy used to explain the success of overparameterized neural networks. The authors argue that the analogy is misleading because it treats subnetworks as if they learned in isolation, whereas perturbing the rest of the network can cause winning tickets to fail. They propose a more accurate explanation based on loss landscape geometry: increasing width adds dimensions along which optimization can proceed, making it easier to escape bad local minima, and bad minima themselves become rarer as width grows. The piece calls for refining foundational analogies as the field matures.
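The geometric picture in the summary can be made concrete with a small toy sketch. The loss below is a hypothetical stand-in chosen for illustration, not something taken from the article: restricted to one parameter `x` it has a good minimum and a bad one, and adding a second parameter `y` (a crude proxy for "more width") turns the bad minimum into a saddle point that gradient descent can leave. The function names, learning rate, and starting points are all assumptions.

```python
# Toy illustration only: this loss is a hypothetical stand-in, not from the article.

def loss(x, y=0.0):
    """Tilted ring-shaped loss; fixing y = 0 gives the 'narrow' 1-D model."""
    return (x * x + y * y - 1.0) ** 2 + 0.5 * x


def grad(x, y=0.0):
    """Analytic gradient of loss with respect to (x, y)."""
    r = x * x + y * y - 1.0
    return 4.0 * x * r + 0.5, 4.0 * y * r


def descend(x, y, use_extra_dim, lr=0.02, steps=20000):
    """Plain gradient descent; y is only updated when the extra dimension is enabled."""
    for _ in range(steps):
        gx, gy = grad(x, y)
        x -= lr * gx
        if use_extra_dim:
            y -= lr * gy
    return x, y


# Narrow model: only x is trainable, so the run is confined to the line y = 0.
narrow = descend(1.5, 0.0, use_extra_dim=False)

# Wider model: same start in x, plus a small nonzero y (as a random init would give).
wide = descend(1.5, 0.01, use_extra_dim=True)

print("narrow:", narrow, "loss:", loss(*narrow))
print("wide:  ", wide, "loss:", loss(*wide))
```

In this setup the narrow run stalls in the higher of the two minima on the positive side of `x`, while the wide run uses the extra coordinate to travel around the barrier and reaches the lower minimum on the negative side. This only illustrates the "more dimensions, fewer traps" intuition on a hand-built function; it says nothing quantitative about real networks.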
Key quotes
"larger networks succeed because they more likely contain a well-initialized subnetwork that can learn the task in isolation, much like buying more tickets increases the chances of winning a lottery."
"We argue that this view is flawed since, among other reasons, winning tickets can be made to fail by perturbing the rest of the network."
"We put forward a more accurate intuitive picture for the success of overparameterization based on the geometry of loss landscapes: increasing width expands the set of available dimensions for optimization, making it easier to escape bad local minima."
"As the field grows mature, it is important to refine the analogies we use to explain foundational phenomena, such as the apparent redundancy of large networks, reconciling practitioners' intuitions with modern theoretical insights."
You might also wanna read
Understanding Scaling Laws in Deep Learning: A Framework for Optimal Compute Allocation
This article provides an in-depth analysis of scaling laws in deep learning — the empirical finding that training loss decreases predictably
Wider Neural Networks with Fewer Parameters Improve Performance by Reducing Feature Interference
This research paper demonstrates that increasing the number of neurons in a neural network without increasing the number of non-zero paramet
Latent learning: How episodic memory could improve machine learning generalization
This article examines why machine learning systems fail to generalize, drawing inspiration from cognitive science. It argues that parametric
New preprint: Using Diffusion Models to Visualize What Self-Supervised Neural Networks Actually Learn
This paper introduces the use of Representation Conditional Diffusion Models (RCDM) to visualize what self-supervised learning (SSL) models
Emergent Hebbian Dynamics in Regularized Learning: A Theoretical Analysis
This research paper investigates whether observed Hebbian/anti-Hebbian plasticity in synaptic updates necessarily implies an underlying Hebb
