DecompR: A Method for Reducing Weighting Noise in Multi-Stakeholder LLM Alignment
[Submitted on 26 May 2026]
Properly proved. Has structure, has flavour, has a point.
Summary
This paper addresses the challenge of aligning large language models (LLMs) with multiple stakeholders who have conflicting preferences. It identifies a problem with holistic LLM judges that conflate utility estimation and utility aggregation, leading to unstable implicit weights (termed "weighting noise"). The authors demonstrate both empirically and theoretically that this weighting noise causes large score shifts when stakeholder satisfaction is dispersed, and that these shifts increase with the number of stakeholders. They propose DecompR, a method that uses counterfactual-calibrated weights fixed from query structure before candidate scoring, with per-role utilities estimated independently, to remove candidate-dependent weight drift and reduce estimation noise.
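The effect described above can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the authors' implementation: the Gaussian noise model, the equal base weights, the example utility values, and the function names `aggregate`, `noisy_weights`, and `score_spread` are all hypothetical. It does not attempt the counterfactual calibration step; it only contrasts weights that drift between judge calls with weights held fixed before scoring.

```python
import random
import statistics


def aggregate(utilities, weights):
    """Weighted average of per-role utilities (weights are normalised here)."""
    total = sum(weights)
    return sum(u * w / total for u, w in zip(utilities, weights))


def noisy_weights(base_weights, noise_scale, rng):
    """Hypothetical noise model: each implicit weight drifts by Gaussian noise."""
    return [max(1e-9, w + rng.gauss(0.0, noise_scale)) for w in base_weights]


def score_spread(utilities, base_weights, noise_scale=0.1, trials=2000, seed=0):
    """Standard deviation of the aggregate score when weights drift per call."""
    rng = random.Random(seed)
    scores = [
        aggregate(utilities, noisy_weights(base_weights, noise_scale, rng))
        for _ in range(trials)
    ]
    return statistics.pstdev(scores)


# Three stakeholders with equal nominal weights (an assumption for illustration).
base = [1 / 3, 1 / 3, 1 / 3]
agreed = [0.6, 0.6, 0.6]      # every stakeholder is equally satisfied
dispersed = [0.1, 0.6, 1.0]   # satisfaction differs sharply across stakeholders

spread_agreed = score_spread(agreed, base)
spread_dispersed = score_spread(dispersed, base)

# With weights fixed before scoring, the aggregate is deterministic.
fixed_score = aggregate(dispersed, base)

print(f"spread with agreed utilities:    {spread_agreed:.4f}")
print(f"spread with dispersed utilities: {spread_dispersed:.4f}")
print(f"fixed-weight score (dispersed):  {fixed_score:.4f}")
```

In this toy setting, drifting weights leave the score essentially unchanged when all stakeholders are equally satisfied, because any normalised weighting of identical utilities returns the same value. The score only becomes unstable when utilities are dispersed, which is the regime the summary highlights. Holding the weights fixed removes that source of variation by construction.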
Key quotes
Holistic LLM judges conflate utility estimation and utility aggregation, yielding unstable implicit weights.
This aggregation-specific weighting noise can create large score shifts when stakeholder satisfaction is dispersed.
We propose DecompR: counterfactual-calibrated weights are fixed from query structure before candidate scoring, while per-role utilities are estimated independently, removing candidate-dependent weight drift and reducing estimation noise.
