Wider Neural Networks with Fewer Parameters Improve Performance by Reducing Feature Interference
[Submitted on 6 Oct 2025 (v1), last revised 29 May 2026 (this version, v2)]
Summary
This research paper demonstrates that increasing the number of neurons in a neural network without increasing the number of non-zero parameters improves performance. The key insight is that more neurons reduce "polysemanticity" (interference between multiple features that share the same neurons), which is consistent with the superposition hypothesis. Experiments on symbolic Boolean tasks show that splitting neurons into sparser sub-neurons reduces this interference and boosts accuracy, with the largest gains when polysemantic load is high. The findings extend to more realistic models such as CLIP classifiers, CNNs, and deeper networks. This suggests that widening networks while holding the number of non-zero parameters fixed is an effective strategy on modern accelerators, where moving non-zero parameters through memory, rather than raw compute, is often the dominant bottleneck.
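The splitting idea can be illustrated with a small sketch. The summary does not spell out the paper's exact procedure, so the following assumes one simple variant, labeled hypothetical: each hidden neuron's non-zero incoming and outgoing weights are shuffled and dealt round-robin to k sub-neurons. Every original non-zero weight ends up in exactly one sub-neuron, so the layer becomes k times wider while its non-zero parameter count is unchanged. The helper names `split_neurons`, `_deal_nonzeros` and `count_nonzero`, and the per-neuron row layout of `w_out`, are illustrative assumptions.

```python
import random

def count_nonzero(matrix):
    """Number of non-zero entries in a list-of-rows matrix."""
    return sum(1 for row in matrix for x in row if x != 0.0)

def _deal_nonzeros(row, k, rng):
    """Shuffle the indices of the non-zero entries of `row` and deal them
    round-robin into k otherwise-zero copies of the row."""
    nz = [i for i, x in enumerate(row) if x != 0.0]
    rng.shuffle(nz)
    parts = [[0.0] * len(row) for _ in range(k)]
    for slot, i in enumerate(nz):
        parts[slot % k][i] = row[i]
    return parts

def split_neurons(w_in, w_out, k, seed=0):
    """Hypothetical random split: replace each of the h hidden neurons with k
    sparser sub-neurons. w_in[j] holds neuron j's incoming weights and
    w_out[j] its outgoing weights. Returns matrices with h*k rows and the
    same number of non-zero entries as the inputs."""
    rng = random.Random(seed)
    new_in, new_out = [], []
    for row_in, row_out in zip(w_in, w_out):
        # Sub-neurons of neuron j occupy rows j*k .. j*k+k-1 in both outputs.
        new_in.extend(_deal_nonzeros(row_in, k, rng))
        new_out.extend(_deal_nonzeros(row_out, k, rng))
    return new_in, new_out

# Toy layer: h=2 hidden neurons, d_in=4 inputs, d_out=2 outputs.
w_in = [[0.5, -1.0, 0.25, 2.0],
        [1.5, 0.0, -0.5, 1.0]]
w_out = [[1.0, -2.0],
         [0.5, 0.75]]

s_in, s_out = split_neurons(w_in, w_out, k=2)
before = count_nonzero(w_in) + count_nonzero(w_out)
after = count_nonzero(s_in) + count_nonzero(s_out)
print(len(s_in), before, after)  # prints: 4 11 11
```

Dealing weights round-robin after a shuffle keeps the split random but balanced, so each sub-neuron receives at least one incoming and one outgoing weight whenever the original neuron has at least k of each; a fully independent random assignment could leave some sub-neurons disconnected. The wider network computes a different function from the original, since each sub-neuron applies its own nonlinearity; the sketch only shows the bookkeeping of width versus non-zero parameters.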
Key quotes
This work demonstrates how increasing the number of neurons in a network without increasing its total number of non-zero parameters improves performance.
We show that this gain corresponds with a decrease in interference between multiple features that would otherwise share the same neurons.
Notably, even random splits of neuron weights approximate these gains, indicating that reduced collisions, not precise assignment, are a primary driver.
Consistent with the superposition hypothesis, the benefits of this framework grow with increasing interference: when polysemantic load is high, accuracy improvements are the largest.
Such a direction is well matched to modern accelerators, where memory movement of non-zero parameters, rather than raw compute, is often a dominant bottleneck.
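The quotes attribute the gains to fewer collisions rather than precise assignment. A toy model, which is an assumption here and not the paper's experimental setup, makes the arithmetic concrete: route each of m features to one of n neurons uniformly at random and count the feature pairs that land on the same neuron. Each of the m(m-1)/2 pairs collides with probability 1/n, so the expected count is m(m-1)/(2n), and the ratio m/n serves as a rough stand-in for polysemantic load. The function names are hypothetical.

```python
import random
from collections import Counter

def colliding_pairs(num_features, num_neurons, seed=0):
    """Toy model: each feature is routed to one neuron chosen uniformly at
    random; return the number of feature pairs that share a neuron."""
    rng = random.Random(seed)
    load = Counter(rng.randrange(num_neurons) for _ in range(num_features))
    # A neuron holding c features contributes c*(c-1)/2 colliding pairs.
    return sum(c * (c - 1) // 2 for c in load.values())

def expected_colliding_pairs(num_features, num_neurons):
    """Each of the C(m, 2) feature pairs lands on the same neuron with
    probability 1/n, giving m*(m-1)/(2n) expected collisions."""
    return num_features * (num_features - 1) / (2 * num_neurons)

m = 64
for n in (16, 32, 64, 128):
    print(f"neurons={n:4d} load={m / n:5.2f} "
          f"expected={expected_colliding_pairs(m, n):7.2f} "
          f"sampled={colliding_pairs(m, n)}")
```

For 64 features, the expected number of colliding pairs is 126, 63, 31.5 and 15.75 at widths 16, 32, 64 and 128. In this toy model, doubling the width halves the expected collisions even though assignment is random, and the absolute reduction is largest when load is high.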
