Proposal: Overparameterized Neural Networks with High Learning Rates Could Bridge AI-Human Intelligence Gap
By Gwern
Summary
This speculative article proposes a major shift in deep learning scaling paradigms by suggesting that the key difference between artificial neural networks (particularly LLMs) and human brains lies in a bias-variance tradeoff. The author argues that LLMs minimize variance while human brains minimize bias, and that human brains achieve this through deep double descent-style overparameterization combined with extremely high learning rates. The proposal suggests that training overparameterized neural networks with high learning rates and regularization could trigger "catapulting" or "grokking" phenomena, potentially leading to artificial neural networks with human-like performance and true generalization capabilities.
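The "catapulting" mentioned above describes training with a learning rate too large for the initial curvature: the loss first spikes, the curvature drops, and training then settles into a flatter region. The sketch below illustrates that effect under loud assumptions. It uses a hypothetical two-parameter model f = u·v with target 0, squared loss, and plain full-batch gradient descent in pure Python. The names `train`, `u`, `v` and `lr` are illustrative. This is a toy illustration of the catapult effect only. It is not the author's proposed training recipe, and it says nothing about grokking or human-like generalization.

```python
# Toy illustration of the "catapult" effect of a large learning rate.
# Assumption (hypothetical model): f = u * v fit to target y = 0 with
# squared loss L = 0.5 * (u*v - y)**2, trained by full-batch gradient descent.
# The local curvature (tangent-kernel eigenvalue) is lam = u**2 + v**2;
# linearized gradient descent is stable only when lr * lam < 2.

def train(u, v, lr, steps, y=0.0):
    """Run gradient descent; return per-step lists of loss and curvature."""
    losses, curvatures = [], []
    for _ in range(steps):
        r = u * v - y                      # residual of the prediction
        losses.append(0.5 * r * r)
        curvatures.append(u * u + v * v)
        # Simultaneous update using dL/du = r*v and dL/dv = r*u
        u, v = u - lr * r * v, v - lr * r * u
    return losses, curvatures


if __name__ == "__main__":
    lr = 0.75
    losses, lams = train(u=2.0, v=0.01, lr=lr, steps=100)
    # Start with lr*lam ~ 3: above the stability limit of 2, below 4.
    print(f"initial lr*lam = {lr * lams[0]:.3f}")
    peak = losses.index(max(losses))
    print(f"loss: start {losses[0]:.2e}, peak {losses[peak]:.2e} "
          f"at step {peak}, end {losses[-1]:.2e}")
    print(f"final lr*lam = {lr * lams[-1]:.3f} (below 2, so stable)")
```

With these settings lr·λ starts near 3, which is past the point where a small step size would converge smoothly. The loss therefore grows by several orders of magnitude within the first ten steps while λ shrinks. Once lr·λ falls below 2, the loss decays toward zero. The choice of a two-parameter model is purely for transparency: the loss and curvature updates can be followed by hand.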
Key quotes
why are artificial neural nets smart in such stupid ways, and biological brains stupid but in smart ways?
the architectural differences between human brains and NNs (particularly LLMs) may be due to a bias-variance tradeoff, where LLMs minimize variance and human brains minimize bias
Human brains do this by deep double descent-style overparameterization, and adopting a scaling strategy of extremely high-learning…