PEFT-Arena: A Benchmark Evaluating Parameter-Efficient Finetuning Through the Stability-Plasticity Dilemma
[Submitted on 27 May 2026]
Summary
This paper introduces PEFT-Arena, a benchmark that evaluates parameter-efficient finetuning (PEFT) methods for large language models through the stability-plasticity dilemma: the trade-off between adapting to new tasks (plasticity) and retaining pretrained capabilities (stability). The authors find that different PEFT methods exhibit distinct stability-plasticity profiles, with orthogonal finetuning achieving the most favorable Pareto frontier under comparable parameter budgets. They analyze PEFT updates geometrically, in weight space (spectral analysis) and in activation space (representation distortion). They also show that final SFT checkpoints often overshoot a better target-retention trade-off, which suggests post-hoc improvements such as path-wise rewinding.
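To make the Pareto-frontier comparison concrete, here is a minimal sketch of how non-dominated configurations could be picked out once each one has a plasticity score and a stability score. The function name, the configuration labels, and all numbers are hypothetical and chosen for illustration; they are not taken from the paper, and the benchmark's actual metrics may be defined differently.

```python
from typing import Dict, List, Tuple


def pareto_frontier(scores: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return the names of configurations that no other configuration dominates.

    Each value is a (plasticity, stability) pair where higher is better on
    both axes. A configuration is dominated if another one is at least as
    good on both axes and strictly better on at least one.
    """
    frontier = []
    for name, (p, s) in scores.items():
        dominated = any(
            (p2 >= p and s2 >= s) and (p2 > p or s2 > s)
            for other, (p2, s2) in scores.items()
            if other != name
        )
        if not dominated:
            frontier.append(name)
    return sorted(frontier)


# Made-up scores for illustration only; not results from the paper.
example = {
    "config_a": (0.80, 0.60),
    "config_b": (0.70, 0.75),
    "config_c": (0.65, 0.70),  # worse than config_b on both axes
    "config_d": (0.50, 0.90),
}

print(pareto_frontier(example))  # ['config_a', 'config_b', 'config_d']
```

Under this reading, a method has a more favorable frontier when its configurations push further toward the top-right of the plasticity-stability plane than those of other methods at a similar parameter budget.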
Key quotes
We argue that PEFT should be assessed through the stability-plasticity dilemma: the trade-off between target-task adaptation and resistance to forgetting.
Across methods, we find distinct stability-plasticity profiles; under comparable parameter budgets, orthogonal finetuning achieves the most favorable Pareto frontier.
In activation space, retention metrics show whether finetuning preserves or distorts general-capability representations, with forgetting linked to non-isometric representation distortion.
An analysis shows that final SFT checkpoints often overshoot a better target-retention operating point.
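The overshoot observation can be illustrated with a small sketch of checkpoint selection along a training trajectory. This is one assumed interpretation of rewinding (pick the saved checkpoint with the best target score among those that still meet a retention floor), not the paper's procedure. The `Checkpoint` type, the `rewind` function, the `min_retention` threshold, and all numbers are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Checkpoint(NamedTuple):
    step: int         # training step at which the checkpoint was saved
    target: float     # target-task score (higher is better)
    retention: float  # general-capability retention score (higher is better)


def rewind(checkpoints: List[Checkpoint], min_retention: float) -> Optional[Checkpoint]:
    """Pick the checkpoint with the best target score whose retention stays
    at or above min_retention. Ties on target score go to the earlier step.
    Returns None if no checkpoint satisfies the retention floor.
    """
    eligible = [c for c in checkpoints if c.retention >= min_retention]
    if not eligible:
        return None
    return max(eligible, key=lambda c: (c.target, -c.step))


# Made-up trajectory for illustration only; not results from the paper.
trajectory = [
    Checkpoint(step=100, target=0.50, retention=0.95),
    Checkpoint(step=200, target=0.68, retention=0.90),
    Checkpoint(step=300, target=0.74, retention=0.82),
    Checkpoint(step=400, target=0.75, retention=0.70),  # final checkpoint
]

chosen = rewind(trajectory, min_retention=0.80)
print(chosen.step)  # 300
```

In this toy trajectory the final checkpoint gains 0.01 on the target task over step 300 while losing 0.12 in retention, which is the kind of overshoot that selecting an earlier point on the path would avoid.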