Alibaba's Tongyi DeepResearch: Open-Source AI Research Agent Matches OpenAI Performance

meander_water

7mo ago · 1 min read

Score: 85 · Type: press release · Sentiment: positive

Summary

Alibaba's Tongyi DeepResearch is presented as the first fully open-source web agent to perform on par with OpenAI's DeepResearch across multiple benchmarks. The article reports state-of-the-art results on academic reasoning (32.9 on Humanity's Last Exam), complex information seeking (43.4 on BrowseComp and 46.7 on BrowseComp-ZH), and user-centric tasks (75 on xbench-DeepSearch), and claims that it systematically outperforms existing proprietary and open-source deep research agents.

Key quotes

Tongyi DeepResearch, the first fully open‑source Web Agent to achieve performance on par with OpenAI's DeepResearch across a comprehensive suite of benchmarks

Tongyi DeepResearch demonstrates state‑of‑the‑art results, scoring 32.9 on the academic reasoning task Humanity's Last Exam (HLE)

Achieving a score of 75 on the user‑centric xbench‑DeepSearch benchmark, systematically outperforming all existing proprietary and open‑source Deep Research agents

Snippet from the RSS feed

From Chatbot to Autonomous Agent. We are proud to present Tongyi DeepResearch, the first fully open‑source Web Agent to achieve performance on par with OpenAI’s DeepResearch across a comprehensive suite of bench
