Alibaba's Tongyi DeepResearch: Open-Source AI Research Agent Matches OpenAI Performance
By meander_water
Summary
Alibaba's Tongyi DeepResearch is presented as the first fully open-source web agent to achieve performance comparable to OpenAI's DeepResearch across multiple benchmarks. The article highlights its state-of-the-art results on academic reasoning (32.9 on Humanity's Last Exam), complex information-seeking tasks (43.4 on BrowseComp and 46.7 on BrowseComp-ZH), and the user-centric xbench-DeepSearch benchmark (75), and claims that it systematically outperforms existing proprietary and open-source deep research agents.
Key quotes
Tongyi DeepResearch, the first fully open‑source Web Agent to achieve performance on par with OpenAI's DeepResearch across a comprehensive suite of benchmarks
Tongyi DeepResearch demonstrates state‑of‑the‑art results, scoring 32.9 on the academic reasoning task Humanity's Last Exam (HLE)
Achieving a score of 75 on the user‑centric xbench‑DeepSearch benchmark, systematically outperforming all existing proprietary and open‑source Deep Research agents