Research Shows LLMs Have Coherent Utility Functions and Value Systems

alexcos

7mo ago · 24 min read · en · Insight

Score: 100 · Type: analysis · Sentiment: neutral

Summary

The article discusses a February 2025 research paper from the Center for AI Safety titled 'Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs.' According to the article, the research shows that modern large language models (LLMs) have coherent and transitive implicit utility functions and world models. A key finding is that larger and more capable LLMs exhibit more coherent and more transitive preferences, where transitivity means that preferring A to B and B to C implies preferring A to C. The article specifically examines how LLMs trade off lives between different categories, referencing Figure 16, which shows how GPT-4o values lives across different categories.
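
To make the transitivity property concrete, here is a minimal Python sketch that counts transitivity violations in a set of pairwise preferences. It is an illustration only and not the method used in the paper; the function name transitivity_violations and the toy preference sets are assumptions made up for this example.

```python
from itertools import permutations


def transitivity_violations(prefers):
    """Return ordered triples (a, b, c) where a > b and b > c hold but a > c does not.

    `prefers` is a set of (winner, loser) pairs. This is a hypothetical,
    simplified representation chosen for illustration only.
    """
    items = sorted({x for pair in prefers for x in pair})
    violations = []
    for a, b, c in permutations(items, 3):
        if (a, b) in prefers and (b, c) in prefers and (a, c) not in prefers:
            violations.append((a, b, c))
    return violations


# Toy data (made up): one transitive preference set and one cyclic set.
coherent = {("A", "B"), ("B", "C"), ("A", "C")}
cyclic = {("A", "B"), ("B", "C"), ("C", "A")}

print(transitivity_violations(coherent))     # []
print(len(transitivity_violations(cyclic)))  # 3
```

A preference set with no violations is transitive in the sense described above; the cyclic set fails because each of its three chains lacks the implied third preference.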

Key quotes · 4 pulled

On February 19th, 2025, the Center for AI Safety published 'Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs'

They showed that modern LLMs have coherent and transitive implicit utility functions and world models

Bigger and more capable LLMs had more coherent and more transitive preferences

Figure 16, which showed how GPT-4o valued lives over different categories

Snippet from the RSS feed

How do LLMs trade off lives between different categories?
