Research Shows LLMs Have Coherent Utility Functions and Value Systems
By alexcos
Summary
The article discusses a February 2025 research paper from the Center for AI Safety, 'Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs.' The research shows that modern large language models (LLMs) have coherent and transitive implicit utility functions and world models. A key finding is that larger, more capable LLMs exhibit more coherent and more transitive preferences, where transitivity means that preferring A to B and B to C implies preferring A to C. The article specifically examines how LLMs trade off lives across different categories, referencing Figure 16, which shows how GPT-4o values lives across those categories.
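To make the transitivity condition concrete, here is a minimal Python sketch that lists the triples in a set of pairwise preferences where A is preferred to B and B to C, but A is not preferred to C. This is only an illustration of the definition, not the paper's method: the function name `transitivity_violations` and the example preference sets are hypothetical.

```python
from itertools import permutations


def transitivity_violations(prefers):
    """Return triples (a, b, c) where a > b and b > c hold but a > c does not.

    `prefers` is a set of (x, y) pairs, each meaning "x is preferred to y".
    """
    items = {x for pair in prefers for x in pair}
    return [
        (a, b, c)
        for a, b, c in permutations(sorted(items), 3)
        if (a, b) in prefers and (b, c) in prefers and (a, c) not in prefers
    ]


# Hypothetical example data, not taken from the paper.
consistent = {("A", "B"), ("B", "C"), ("A", "C")}  # A > B > C, fully transitive
cyclic = {("A", "B"), ("B", "C"), ("C", "A")}      # A > B > C > A, a preference cycle

print(transitivity_violations(consistent))
print(transitivity_violations(cyclic))
```

The consistent set yields no violations, while the cyclic set yields one violation for each rotation of the cycle. A set of preferences with fewer such violations is, in this simple sense, more transitive.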
Key quotes
On February 19th, 2025, the Center for AI Safety published 'Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs'
They showed that modern LLMs have coherent and transitive implicit utility functions and world models
Bigger and more capable LLMs had more coherent and more transitive preferences
Figure 16, which showed how GPT-4o valued lives over different categories