Rethinking Overparameterization: Why the Lottery Ticket Analogy Falls Short

rbanffy

1d ago · 2 min read · en · Insight

technology, science, artificial intelligence, machine learning

Summary

This article critiques the popular "lottery ticket" analogy used to explain the success of overparameterized neural networks. The authors argue that the analogy is misleading because it treats subnetworks in isolation, whereas perturbing the rest of the network can cause winning tickets to fail. They propose a more accurate explanation based on loss landscape geometry: increasing width expands optimization dimensions, making it easier to escape bad local minima, and bad minima become rarer as width grows. The piece calls for refining foundational analogies in the field as it matures.
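The geometric claim in the summary can be illustrated with a toy problem. The sketch below is not the authors' construction and involves no neural network. The functions `loss`, `grad`, and `descend`, the tilt constant 0.3, and the learning rate are all assumptions chosen for illustration. With the second parameter `y` frozen at zero, the loss is a tilted double well in `x`, and gradient descent started at `x = 1.0` stays in the shallower well. Once `y` is trainable, that same point becomes a saddle, and descent can travel around the ring-shaped valley to the deeper minimum.

```python
# Toy illustration only: a hypothetical two-parameter loss, not taken from the paper.

def loss(x, y=0.0):
    """Tilted ring-shaped valley. With y fixed at 0 it is a tilted double well in x."""
    return (x * x + y * y - 1.0) ** 2 + 0.3 * x


def grad(x, y):
    """Analytic gradient of loss with respect to (x, y)."""
    r = x * x + y * y - 1.0
    return 4.0 * x * r + 0.3, 4.0 * y * r


def descend(x, y, train_y, lr=0.05, steps=5000):
    """Plain gradient descent; y is only updated when train_y is True."""
    for _ in range(steps):
        gx, gy = grad(x, y)
        x -= lr * gx
        if train_y:
            y -= lr * gy
    return x, y


# "Narrow" model: only x is trainable, the second dimension does not exist (y = 0).
narrow = descend(1.0, 0.0, train_y=False)

# "Wide" model: same start in x, plus one extra trainable dimension with a small
# nonzero initial value, standing in for a random initialization.
wide = descend(1.0, 0.1, train_y=True)

print("narrow:", narrow, "loss:", loss(*narrow))
print("wide:  ", wide, "loss:", loss(*wide))
```

In this toy the extra dimension does not add a new, better solution: the deeper minimum already exists at `y = 0`. What changes is reachability, which is the distinction the article draws between the lottery ticket picture and the loss landscape picture.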

Source

Hacker News · Rethinking Overparameterization: Why the Lottery Ticket Analogy Falls Short · infoscience.epfl.ch

Key quotes

"larger networks succeed because they more likely contain a well-initialized subnetwork that can learn the task in isolation, much like buying more tickets increases the chances of winning a lottery."

"We argue that this view is flawed since, among other reasons, winning tickets can be made to fail by perturbing the rest of the network."

"We put forward a more accurate intuitive picture for the success of overparameterization based on the geometry of loss landscapes: increasing width expands the set of available dimensions for optimization, making it easier to escape bad local minima."

"As the field grows mature, it is important to refine the analogies we use to explain foundational phenomena, such as the apparent redundancy of large networks, reconciling practitioners' intuitions with modern theoretical insights."

Snippet from the RSS feed

Lotteries and tickets are often used as a didactical analogy to explain the success of overparameterized neural networks: “larger networks succeed because they more likely contain a well-initialized subnetwork that can learn the task in isolation, much li
