
Research Seminar: Benchmarking Cooperation Mechanisms for LLM Agents in Social Dilemmas

21d ago · 2 min read · en · News

Summary

This article announces an EPFL AI Center seminar by Emanuel Tewolde, a CMU PhD student, on benchmarking cooperation-sustaining mechanisms and LLM agents in social dilemmas. The research finds that LLMs with stronger reasoning capabilities behave less cooperatively in mixed-motive games such as the prisoner's dilemma and public goods settings, and that recent models consistently defect in single-shot social dilemmas. It evaluates four game-theoretic mechanisms (repetition, reputation systems, mediators, and contracts) for enabling cooperative outcomes in equilibrium. Contracting and mediation prove most effective, while repetition-induced cooperation deteriorates drastically when co-players vary. The work addresses safety concerns that arise when LLM agents interact with other goal-pursuing agents.

Source

bsky · Research Seminar: Benchmarking Cooperation Mechanisms for LLM Agents in Social Dilemmas · memento.epfl.ch

Key quotes

LLMs with stronger reasoning capabilities behave _less_ cooperatively in mixed-motive games such as the prisoner's dilemma and public goods settings
Our experiments show that recent models, with or without reasoning enabled, consistently defect in single-shot social dilemmas
We present the first comparative study of game-theoretic mechanisms designed to enable cooperative outcomes between rational agents _in equilibrium_
Contracting and mediation are most effective in achieving cooperative outcomes between capable LLM models
Repetition-induced cooperation deteriorates drastically when co-players vary
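
The quotes above turn on two game-theoretic points: defection is the only equilibrium of a single-shot prisoner's dilemma, and a mechanism such as a contract can change the payoffs so that cooperation becomes an equilibrium. The following is a minimal sketch of that reasoning, not the benchmark from the talk. The payoff values, the `with_contract` helper, and the form of the contract (each defector pays a fixed fine to the co-player) are all assumptions chosen for illustration; the mechanisms studied in the research may be defined differently.

```python
from itertools import product

ACTIONS = ("C", "D")  # C = cooperate, D = defect

# Illustrative payoffs (assumed, not taken from the talk): T=5 > R=3 > P=1 > S=0.
# Keys are (row action, column action); values are (row payoff, column payoff).
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def with_contract(payoffs, fine):
    """Hypothetical contract: every defector transfers `fine` to the co-player."""
    adjusted = {}
    for (a, b), (u_a, u_b) in payoffs.items():
        if a == "D":
            u_a, u_b = u_a - fine, u_b + fine
        if b == "D":
            u_a, u_b = u_a + fine, u_b - fine
        adjusted[(a, b)] = (u_a, u_b)
    return adjusted


def pure_nash_equilibria(payoffs):
    """Return action profiles where neither player gains by deviating alone."""
    equilibria = []
    for a, b in product(ACTIONS, repeat=2):
        u_a, u_b = payoffs[(a, b)]
        row_ok = all(payoffs[(alt, b)][0] <= u_a for alt in ACTIONS)
        col_ok = all(payoffs[(a, alt)][1] <= u_b for alt in ACTIONS)
        if row_ok and col_ok:
            equilibria.append((a, b))
    return equilibria


if __name__ == "__main__":
    # Single-shot game: mutual defection is the only pure equilibrium.
    print(pure_nash_equilibria(PAYOFFS))                    # [('D', 'D')]
    # With an assumed fine of 3, mutual cooperation is the only pure equilibrium.
    print(pure_nash_equilibria(with_contract(PAYOFFS, 3)))  # [('C', 'C')]
```

In this toy setting the fine has to exceed the gain from unilateral defection (here 5 - 3 = 2) for cooperation to be stable; a fine of 3 is simply one value that satisfies this.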
Snippet from the RSS feed
The talk is jointly organized by the EPFL AI Center and the DLAB as part of the AI fundamentals seminar series.
Hosting professor: Prof. Robert West (DLAB)
Title: Benchmarking Cooperation-Sustaining Mechanisms and LLM Agents in Social Dilemmas

