Study finds LLMs corrupt documents during delegated editing workflows, with frontier models averaging 25% content degradation

[Submitted on 17 Apr 2026]

Summary

This paper introduces DELEGATE-52, a benchmark for evaluating how well Large Language Models (LLMs) handle delegated document-editing tasks across 52 professional domains. Across the 19 LLMs tested, the study finds that current models systematically degrade documents over long workflows: even top-tier frontier models corrupt an average of 25% of document content by the end of extended interactions. Agentic tool use does not improve performance, and degradation worsens with larger documents, longer interactions, and the presence of distractor files. The authors conclude that current LLMs are unreliable delegates: they silently introduce sparse but severe errors that compound over time.
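The compounding effect in the summary can be illustrated with a toy model. The sketch below is hypothetical and is not DELEGATE-52's scoring method. It assumes a degradation score defined as the fraction of the original document's lines that no longer survive verbatim (computed with Python's `difflib`), and a simulated delegate that silently corrupts each line with a fixed, independent probability per round. The names `degradation`, `expected_degradation`, and `simulate` are illustrative only.

```python
import difflib
import random


def degradation(original: str, edited: str) -> float:
    """Fraction of the original's lines that no longer appear verbatim in the edit.

    Hypothetical metric for illustration only (not the paper's); 0.0 means intact.
    """
    orig_lines = original.splitlines()
    if not orig_lines:
        return 0.0
    matcher = difflib.SequenceMatcher(a=orig_lines, b=edited.splitlines(), autojunk=False)
    # Sum the sizes of all matching line blocks (the final dummy block has size 0).
    preserved = sum(block.size for block in matcher.get_matching_blocks())
    return 1.0 - preserved / len(orig_lines)


def expected_degradation(error_rate: float, rounds: int) -> float:
    """Expected damaged fraction when each line is corrupted independently with
    probability error_rate in every round (a line stays intact only if never hit)."""
    return 1.0 - (1.0 - error_rate) ** rounds


def simulate(doc: str, rounds: int, error_rate: float, seed: int = 0) -> list[float]:
    """Run a toy 'delegate' for `rounds` edits and record degradation after each one."""
    rng = random.Random(seed)
    lines = doc.splitlines()
    history = []
    for _ in range(rounds):
        # Each line is silently altered with probability error_rate; once altered,
        # it never matches the original again, so the damage accumulates.
        lines = [
            line + " [corrupted]" if rng.random() < error_rate else line
            for line in lines
        ]
        history.append(degradation(doc, "\n".join(lines)))
    return history


if __name__ == "__main__":
    doc = "\n".join(f"line {i}" for i in range(200))
    for round_no, score in enumerate(simulate(doc, rounds=20, error_rate=0.014), start=1):
        print(f"round {round_no:2d}: {score:.1%} degraded")
    print(f"expected after 20 rounds: {expected_degradation(0.014, 20):.1%}")
```

Under this toy model a small per-round error rate compounds quickly: a hypothetical 1.4% per-line corruption rate leaves roughly a quarter of the lines damaged after 20 rounds, since 1 - 0.986^20 ≈ 0.246. These rates are invented for illustration and are not figures reported in the paper.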

Key quotes

Even frontier models (Gemini 3.1 Pro, Claude 4.6 Opus, GPT 5.4) corrupt an average of 25% of document content by the end of long workflows, with other models failing more severely.
Current LLMs are unreliable delegates: they introduce sparse but severe errors that silently corrupt documents, compounding over long interaction.
Agentic tool use does not improve performance on DELEGATE-52, and degradation severity is exacerbated by document size, length of interaction, or presence of distractor files.

Snippet from the RSS feed
Large Language Models (LLMs) are poised to disrupt knowledge work, with the emergence of delegated work as a new interaction paradigm (e.g., vibe coding). Delegation requires trust - the expectation that the LLM will faithfully execute the task without in…
