MobilityBench: A New Benchmark for Evaluating LLM-Based Route-Planning Agents Using Real-World Mobility Data

[Submitted on 26 Feb 2026 (v1), last revised 10 Jun 2026 (this version, v2)]

Summary

This paper introduces MobilityBench, a scalable benchmark for evaluating LLM-based route-planning agents in real-world mobility scenarios. Built from anonymized real user queries from Amap (a mapping service), it covers diverse route-planning intents across multiple cities worldwide. The benchmark features a deterministic API-replay sandbox for reproducible evaluation and a multi-dimensional protocol assessing outcome validity, instruction understanding, planning, tool use, and efficiency. Evaluation of multiple LLM-based agents reveals they perform well on basic information retrieval and route planning but struggle significantly with preference-constrained route planning, highlighting gaps in personalized mobility applications.
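
To illustrate what a deterministic API-replay sandbox could look like, here is a minimal sketch. It is not MobilityBench's actual implementation: the class `ReplaySandbox`, the helper `request_key`, the tool name `"route"`, and the response fields are all hypothetical names chosen for illustration. The idea is that responses are recorded once, then served back by a stable key built from the request, so repeated evaluation runs see identical data regardless of what a live mapping service would return.

```python
import hashlib
import json


def request_key(tool_name, params):
    """Build a stable key from a tool name and its parameters (hypothetical helper)."""
    # sort_keys makes the key independent of dict insertion order.
    canonical = json.dumps({"tool": tool_name, "params": params}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ReplaySandbox:
    """Serve recorded API responses instead of calling a live service (illustrative only)."""

    def __init__(self):
        self._recordings = {}
        self.call_log = []  # kept so tool use and efficiency could be scored later

    def record(self, tool_name, params, response):
        """Store a response captured earlier for this exact request."""
        self._recordings[request_key(tool_name, params)] = response

    def call(self, tool_name, params):
        """Return the recorded response; never touch the network."""
        self.call_log.append((tool_name, params))
        key = request_key(tool_name, params)
        if key not in self._recordings:
            # Fail loudly rather than fall back to a live, non-deterministic call.
            raise KeyError(f"no recorded response for {tool_name} with {params}")
        return self._recordings[key]


# Usage: the same request always yields the same answer, whatever the key order.
sandbox = ReplaySandbox()
sandbox.record("route", {"origin": "A", "destination": "B"}, {"duration_min": 25})
first = sandbox.call("route", {"destination": "B", "origin": "A"})
second = sandbox.call("route", {"origin": "A", "destination": "B"})
```

Raising on an unrecorded request, rather than falling back to a live call, is one way to keep runs reproducible. The call log is the kind of trace a protocol scoring tool use and efficiency might consume, though how MobilityBench computes those scores is not described in this summary.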

Key quotes

Route-planning agents powered by large language models (LLMs) have emerged as a promising paradigm for supporting everyday human mobility through natural language interaction and tool-mediated decision making.
Systematic evaluation in real-world mobility settings is hindered by diverse routing demands, non-deterministic mapping services, and limited reproducibility.
We design a deterministic API-replay sandbox that eliminates environmental variance from live services.
Our findings reveal that current models perform competently on Basic information retrieval and Route Planning tasks, yet struggle considerably with Preference-Constrained Route Planning, underscoring significant room for improvement in personalized mobility applications.
We publicly release the benchmark data, evaluation toolkit, and documentation at https://github.com/AMAP-ML/MobilityBench.