Token Consumption Analysis in LLM-Based Multi-Agent Software Engineering Systems

[Submitted on 20 Jan 2026]

Tags: technology, research, programming, AI & machine learning

Summary

This paper analyzes token consumption in LLM-based Multi-Agent (LLM-MA) systems applied to software engineering tasks. Running the ChatDev framework with a GPT-5 reasoning model on 30 software development tasks, the researchers mapped the framework's internal phases to software development life cycle (SDLC) stages: Design, Coding, Code Completion, Code Review, Testing, and Documentation. The iterative Code Review stage consumes the majority of tokens (59.4% on average), and input tokens consistently make up the largest share of consumption (53.9% on average). The authors describe the findings as preliminary. They suggest that the primary cost of agentic software engineering lies in automated refinement and verification rather than in initial code generation, and they point to potentially significant inefficiencies in agentic collaboration.
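The per-stage and per-kind percentages in the summary are shares of a total token count. A minimal sketch of that arithmetic follows, assuming usage has already been logged as (stage, token kind, count) records. The record layout, the `shares` helper, and every number in `RECORDS` are hypothetical and are not taken from the paper or from ChatDev.

```python
from collections import defaultdict

# Hypothetical usage records: (stage, token_kind, token_count).
# These numbers are made up for illustration; they are NOT the paper's data.
RECORDS = [
    ("Design", "input", 1200),
    ("Design", "output", 800),
    ("Coding", "input", 3000),
    ("Coding", "output", 2000),
    ("Code Review", "input", 7000),
    ("Code Review", "output", 4000),
    ("Testing", "input", 1500),
    ("Testing", "output", 500),
]


def shares(records, key_index):
    """Return each key's percentage of the total token count.

    key_index selects the grouping field: 0 for stage, 1 for token kind.
    """
    totals = defaultdict(int)
    for record in records:
        totals[record[key_index]] += record[2]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}  # nothing logged, so there are no shares to report
    return {key: 100.0 * count / grand_total for key, count in totals.items()}


stage_share = shares(RECORDS, 0)
kind_share = shares(RECORDS, 1)

for stage, pct in sorted(stage_share.items(), key=lambda item: -item[1]):
    print(f"{stage}: {pct:.1f}%")
```

Grouping by stage and by token kind through one helper keeps the two reported views consistent, since both are computed from the same records.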

Source

Hacker News · Token Consumption Analysis in LLM-Based Multi-Agent Software Engineering Systems · arxiv.org

Key quotes

Our preliminary findings show that the iterative Code Review stage accounts for the majority of token consumption for an average of 59.4% of tokens.

We observe that input tokens consistently constitute the largest share of consumption for an average of 53.9%, providing empirical evidence for potentially significant inefficiencies in agentic collaboration.

Our results suggest that the primary cost of agentic software engineering lies not in initial code generation but in automated refinement and verification.

Our novel methodology can help practitioners predict expenses and optimize workflows, and it directs future research toward developing more token-efficient agent collaboration protocols.
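The last quote mentions predicting expenses. One way this could look in practice is a simple cost estimate from token counts, sketched below. The prices, the token kinds, and the `estimate_cost` helper are assumptions for illustration only; the paper's own methodology is not reproduced here, and real prices vary by provider and model.

```python
# Hypothetical prices in USD per one million tokens (assumed values,
# not taken from the paper or from any provider's price list).
PRICE_PER_MILLION = {"input": 1.00, "output": 8.00}


def estimate_cost(token_counts, prices=PRICE_PER_MILLION):
    """Estimate the USD cost of a run from token counts per token kind."""
    missing = set(token_counts) - set(prices)
    if missing:
        raise KeyError(f"no price for token kinds: {sorted(missing)}")
    return sum(
        count * prices[kind] / 1_000_000
        for kind, count in token_counts.items()
    )


# Example run with made-up token counts.
run_cost = estimate_cost({"input": 500_000, "output": 100_000})
print(f"Estimated cost: ${run_cost:.2f}")
```

Because input and output tokens are usually priced differently, a per-kind breakdown like the one the paper reports is what makes this kind of estimate possible.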

Snippet from the RSS feed

LLM-based Multi-Agent (LLM-MA) systems are increasingly applied to automate complex software engineering tasks such as requirements engineering, code generation, and testing. However, their operational efficiency and resource consumption remain poorly und

You might also wanna read

AI-Powered Code Review: A Framework for Agentic Workflows in Software Development

This paper examines the evolution of code review practices and proposes a vision for AI-powered, agentic code review workflows. It argues th

arxiv.org·15d ago

Token Budgeting: How Context Engineering Can Slash Your LLM Costs

This article debunks the common misconception that token optimization for LLMs is simply about writing shorter prompts. It reframes token op

dev.to·3d ago

AI's Impact on Software Engineering: Evolution or Replacement?

The article explores the complex relationship between AI tools like ChatGPT and software engineering, examining whether AI represents the en

The Verge·9mo ago

Eureka: An LLM-Driven Framework for Automated Feature Engineering in Enterprise AI

This paper presents Eureka, an LLM-driven framework for automated feature engineering in machine learning. It treats feature engineering as

arxiv.org·27d ago

A Technical Taxonomy of LLM Agent Communication Protocols: Classifying Multi-Agent System Interoperability

This study develops a technical taxonomy to classify and analyze LLM agent communication protocols. Using an established iterative method, t

arxiv.org·1d ago

Cognitive debt: How AI-generated code erodes shared understanding in software teams

This article explores the concept of "cognitive debt" in AI-driven software development, arguing that as generative and agentic AI tools tak

getdx.com·2d ago
