
Datacurve's DeepSWE Benchmark Shows GPT-5.5 Leading AI Coding Models with 70% Pass Rate

4d ago · 9 min read · en · News

Summary

A new benchmark called DeepSWE, released by startup Datacurve, reveals significant performance differences among AI coding models, differences that flawed evaluation standards had previously hidden. OpenAI's GPT-5.5 leads with a 70% pass rate, while Anthropic's Claude Opus 4.7 trails at 54%, and mid-tier models like Claude Haiku 4.5 perform poorly. The report critiques the industry-standard SWE-Bench Pro for masking these divergences.

Key quotes

A new benchmark released by startup Datacurve yesterday, DeepSWE, has revealed a significant divergence in the performance of frontier AI coding models, previously masked by flawed evaluation standards.
OpenAI's GPT-5.5 emerged as the dominant leader with a 70% pass rate, while competing models like Anthropic's Claude Opus 4.7 trailed at 54%.
The report critiques the industry-standard SWE-Bench Pro, identifying it as a flawed benchmark that masked true model capabilities.
