Deep Neural Networks Converge to Universal Low-Dimensional Subspaces Across Diverse Tasks

lukeplato

5mo ago · 2 min read · en · Insight

75/100

Toasty

Bagelometer

Properly proved. Has structure, has flavour, has a point.

Score: 75 · Type: analysis · Sentiment: neutral

Summary

This research article presents empirical evidence that deep neural networks trained on diverse tasks converge to remarkably similar low-dimensional parametric subspaces. Through spectral analysis of over 1,100 models, including Mistral-7B LoRAs, Vision Transformers, and LLaMA-8B models, the study identifies universal subspaces that capture the majority of the variance in just a few principal directions. The findings suggest that neural networks systematically exploit shared spectral subspaces regardless of initialization, task, or domain, with implications for model reusability, multi-task learning, model merging, and computational efficiency.
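The summary does not spell out the measurement itself. The following is a minimal sketch of the general idea, not the paper's method. It assumes that each model is flattened into one parameter vector and analysed with ordinary PCA (computed via SVD), and it runs on synthetic data where a 4-direction shared subspace is planted by construction. All sizes, the 99% threshold, and the helper names are hypothetical.

```python
import numpy as np


def explained_variance_ratio(weights: np.ndarray) -> np.ndarray:
    """Share of total variance captured by each principal direction.

    `weights` has shape (n_models, n_params): one flattened weight vector
    (for example, one LoRA update) per row. Hypothetical helper.
    """
    centered = weights - weights.mean(axis=0, keepdims=True)
    # Squared singular values of the centered matrix are proportional to
    # the variance along each principal direction (PCA via SVD).
    s = np.linalg.svd(centered, compute_uv=False)
    var = s ** 2
    return var / var.sum()


def components_for(ratio: np.ndarray, threshold: float) -> int:
    """Smallest k whose top-k directions reach `threshold` of the variance."""
    k = int(np.searchsorted(np.cumsum(ratio), threshold)) + 1
    return min(k, len(ratio))


# Synthetic stand-in for a model zoo: 200 "models" whose 1,000 parameters
# all lie near the same 4 directions (planted here by construction).
rng = np.random.default_rng(0)
n_models, n_params, true_rank = 200, 1_000, 4
basis, _ = np.linalg.qr(rng.standard_normal((n_params, true_rank)))
scales = np.array([10.0, 8.0, 6.0, 4.0])
coeffs = rng.standard_normal((n_models, true_rank)) * scales
weights = coeffs @ basis.T + 0.01 * rng.standard_normal((n_models, n_params))

ratio = explained_variance_ratio(weights)
print("top-4 share:", ratio[:4].sum())
print("k for 99%:", components_for(ratio, 0.99))
```

With real checkpoints, `weights` would be replaced by actual flattened parameters. Whether a few directions dominate then becomes an empirical question rather than something built into the data, as it is here.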

Key quotes


We show that deep neural networks trained across diverse tasks exhibit remarkably similar low-dimensional parametric subspaces.

We provide the first large-scale empirical evidence that demonstrates that neural networks systematically converge to shared spectral subspaces regardless of initialization, task, or domain.

Through mode-wise spectral analysis of over 1100 models - including 500 Mistral-7B LoRAs, 500 Vision Transformers, and 50 LLaMA-8B models - we identify universal subspaces capturing majority variance in just a few principal directions.

Our findings offer new insights into the intrinsic organization of information within deep networks and raise important questions about the possibility of discovering these universal subspaces without the need for extensive data and computational resources.
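The third quote refers to "mode-wise" spectral analysis without defining it here. The sketch below assumes one common reading, which the source does not confirm: stack per-layer weight matrices from many models into a 3-way tensor and run an SVD on each mode unfolding, as in a higher-order SVD. The same sketch touches on the question raised in the last quote. It estimates the shared row and column subspaces from only 5 synthetic models and uses them to compress a held-out one. All data, sizes, and helper names are hypothetical.

```python
import numpy as np


def unfold(tensor: np.ndarray, mode: int) -> np.ndarray:
    """Mode-`mode` unfolding: move that axis first, flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_energy(tensor: np.ndarray, mode: int, k: int) -> float:
    """Fraction of squared singular-value energy in the top-k directions of one mode."""
    s = np.linalg.svd(unfold(tensor, mode), compute_uv=False)
    e = s ** 2
    return float(e[:k].sum() / e.sum())


def shared_basis(tensor: np.ndarray, mode: int, k: int) -> np.ndarray:
    """Top-k left singular vectors of a mode unfolding (an orthonormal basis)."""
    u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
    return u[:, :k]


rng = np.random.default_rng(0)
n_models, d_out, d_in, r = 50, 64, 32, 3
# Hypothetical shared output (row) and input (column) subspaces.
U, _ = np.linalg.qr(rng.standard_normal((d_out, r)))
V, _ = np.linalg.qr(rng.standard_normal((d_in, r)))


def sample_model() -> np.ndarray:
    # Each synthetic "model" mixes the same subspaces with its own r x r core.
    return U @ rng.standard_normal((r, r)) @ V.T


models = np.stack([sample_model() for _ in range(n_models)])  # (n_models, d_out, d_in)

for mode in (1, 2):
    print("mode", mode, "top-r energy:", mode_energy(models, mode, r))

# Estimate both subspaces from only 5 models, then compress a held-out model
# to an r x r coefficient matrix in that shared basis.
U_hat = shared_basis(models[:5], mode=1, k=r)
V_hat = shared_basis(models[:5], mode=2, k=r)
W_new = sample_model()
W_proj = U_hat @ (U_hat.T @ W_new @ V_hat) @ V_hat.T
rel_err = np.linalg.norm(W_new - W_proj) / np.linalg.norm(W_new)
print(f"held-out relative error: {rel_err:.2e}")
```

Because the synthetic models share their subspaces exactly and contain no noise, a handful of them is enough to recover the basis here. Real networks would only approximate this, and how few models suffice in practice is the open question the quote raises.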

