MIT researchers use Battleship game to test and improve AI question-asking abilities
By Alex Shipps | MIT CSAIL
Summary
Researchers from MIT CSAIL and SEAS developed "Collaborative Battleship," a game in which a captain asks natural-language questions and a spotter answers them to help locate hidden ships. They collected human gameplay data to build the BattleshipQA dataset, then tested state-of-the-art language models (such as GPT-5) on the task. The AI models struggled to ask informative questions about the hidden ships. However, a Monte Carlo inference strategy helped smaller agents weigh each question carefully and outperform larger systems at a fraction of the cost.
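The summary does not spell out how the Monte Carlo strategy works, so the following is only a rough sketch of one common way such an approach can be set up, not the researchers' implementation. It assumes a toy 4x4 board with a single two-cell ship, samples many hypothetical boards, and scores each candidate yes/no question by the entropy of its answers across the samples (a stand-in for expected information gain when answers are noiseless). All function names, the board size, and the example questions are hypothetical.

```python
import math
import random


def sample_boards(n_samples, size=4, ship_len=2, seed=0):
    """Sample hypothetical boards; each board is the set of cells of one ship."""
    rng = random.Random(seed)
    boards = []
    for _ in range(n_samples):
        if rng.random() < 0.5:  # horizontal placement
            r, c = rng.randrange(size), rng.randrange(size - ship_len + 1)
            cells = frozenset((r, c + i) for i in range(ship_len))
        else:  # vertical placement
            r, c = rng.randrange(size - ship_len + 1), rng.randrange(size)
            cells = frozenset((r + i, c) for i in range(ship_len))
        boards.append(cells)
    return boards


def binary_entropy(p):
    """Entropy in bits of a yes/no answer that is 'yes' with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def expected_information_gain(question, boards):
    """Monte Carlo estimate: entropy of the answer over the sampled boards."""
    p_yes = sum(1 for b in boards if question(b)) / len(boards)
    return binary_entropy(p_yes)


def pick_best_question(questions, boards):
    """Return the name of the question with the highest estimated gain."""
    return max(
        questions,
        key=lambda name: expected_information_gain(questions[name], boards),
    )


# Hypothetical candidate questions, expressed as predicates over a board.
questions = {
    "on_board": lambda b: True,  # always yes, so it tells us nothing
    "at_corner": lambda b: (0, 0) in b,  # rarely yes, so low gain
    "top_half": lambda b: any(r < 2 for r, _ in b),  # closer to a 50/50 split
}

boards = sample_boards(500)
best = pick_best_question(questions, boards)
print(best)
```

In this toy setup, the question that splits the sampled boards most evenly scores highest, which is the intuition behind asking informative questions: an answer the captain could already predict reveals little about where the ships are.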
Key quotes
The researchers first had over 40 humans play the game together, collecting their questions and yes-no answers to build the 'BattleshipQA' dataset.
A Monte Carlo inference strategy helped small agents carefully consider each inquiry to outperform larger systems at a fraction of the cost.
AI models played 'Collaborative Battleship' together and struggled to ask informative questions about hidden ships.
