PEFT-Arena: A Benchmark Evaluating Parameter-Efficient Finetuning Through the Stability-Plasticity Dilemma

[Submitted on 27 May 2026]

Summary

This paper introduces PEFT-Arena, a benchmark for evaluating parameter-efficient finetuning (PEFT) methods for large language models through the lens of the stability-plasticity dilemma — the trade-off between adapting to new tasks (plasticity) and retaining pretrained capabilities (stability). The authors find that different PEFT methods exhibit distinct stability-plasticity profiles, with orthogonal finetuning achieving the best Pareto frontier under comparable parameter budgets. They analyze PEFT updates from geometric perspectives in weight space (spectral analysis) and activation space (representation distortion), and show that final SFT checkpoints often overshoot optimal target-retention trade-offs, suggesting post-hoc improvements like path-wise rewinding.
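To make the "Pareto frontier" idea concrete, here is a minimal sketch of how one could pick out the non-dominated methods from a set of (target score, retention score) pairs. The `Profile` type, the method labels, and every number below are hypothetical and chosen only for illustration; they are not results or code from the paper.

```python
from typing import List, NamedTuple


class Profile(NamedTuple):
    method: str       # hypothetical label, not a method from the paper
    target: float     # plasticity: target-task score, higher is better
    retention: float  # stability: retained general-capability score, higher is better


def pareto_frontier(profiles: List[Profile]) -> List[Profile]:
    """Return the profiles that no other profile dominates, sorted by target score."""
    frontier = []
    for p in profiles:
        # q dominates p if it is at least as good on both axes and strictly better on one.
        dominated = any(
            q.target >= p.target
            and q.retention >= p.retention
            and (q.target > p.target or q.retention > p.retention)
            for q in profiles
        )
        if not dominated:
            frontier.append(p)
    return sorted(frontier, key=lambda p: p.target)


# Made-up numbers for illustration only.
demo = [
    Profile("method_a", 0.70, 0.90),
    Profile("method_b", 0.80, 0.85),
    Profile("method_c", 0.75, 0.80),  # dominated by method_b on both axes
    Profile("method_d", 0.85, 0.60),
]
frontier_names = [p.method for p in pareto_frontier(demo)]
```

A method sits on the frontier when no other method matches or beats it on both adaptation and retention at once, which is why `method_c` drops out in this toy example.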

Source

PEFT-Arena: A Benchmark Evaluating Parameter-Efficient Finetuning Through the Stability-Plasticity Dilemma (arxiv.org)

Key quotes

We argue that PEFT should be assessed through the stability-plasticity dilemma: the trade-off between target-task adaptation and resistance to forgetting.

Across methods, we find distinct stability-plasticity profiles; under comparable parameter budgets, orthogonal finetuning achieves the most favorable Pareto frontier.

In activation space, retention metrics show whether finetuning preserves or distorts general-capability representations, with forgetting linked to non-isometric representation distortion.

An analysis shows that final SFT checkpoints often overshoot a better target-retention operating point.
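The overshoot observation in the last quote can be illustrated with a small sketch of checkpoint selection along a training run. This is an assumption about what "rewinding" to a better operating point might look like in practice, not the paper's procedure: the `rewind_step` helper, the `tolerance` parameter, and the logged scores are all hypothetical.

```python
from typing import List, Tuple

# Each entry: (training step, target-task score, retention score); higher is better.
History = List[Tuple[int, float, float]]


def rewind_step(history: History, tolerance: float = 0.01) -> int:
    """Pick the checkpoint with the best retention among those whose
    target score is within `tolerance` of the final checkpoint's."""
    final_target = history[-1][1]
    candidates = [h for h in history if h[1] >= final_target - tolerance]
    # Prefer higher retention; break ties by higher target score.
    best = max(candidates, key=lambda h: (h[2], h[1]))
    return best[0]


# Made-up training log for illustration only.
history = [
    (100, 0.60, 0.92),
    (200, 0.74, 0.90),
    (300, 0.78, 0.88),
    (400, 0.785, 0.80),  # final checkpoint: tiny target gain, large retention loss
]
chosen = rewind_step(history, tolerance=0.01)
```

In this toy log the final checkpoint gains almost nothing on the target task relative to step 300 while losing a lot of retention, so the earlier checkpoint is selected.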

Snippet from the RSS feed

Parameter-efficient finetuning (PEFT) has become the standard approach for adapting large language models, yet evaluations largely emphasize downstream accuracy while overlooking the retention of pretrained capabilities. We argue that PEFT should be asses
