Snowflake benchmark: China's GLM-5.2 nearly matches Claude Opus 4.7 on coding tasks at a fraction of the cost
By
Matthias Bastian
Summary
Snowflake benchmarked Zhipu AI's GLM-5.2 against Anthropic's Claude Opus 4.7 across 103 coding tasks. The two models finished nearly even on overall task completion (66% vs 67%) when given three attempts per task. However, Opus 4.7 showed higher first-attempt accuracy (53.7% vs 47.6%) and greater efficiency, using fewer tokens overall. GLM-5.2's key advantage is cost: at roughly one-fifth the price per output token, it puts pricing pressure on Western AI labs like Anthropic and OpenAI.
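The gap between the three-attempt and first-attempt figures is easier to see with a small example. The sketch below is only an illustration: the exact metric definitions Snowflake used are not given here, so it assumes "first-attempt accuracy" means the share of tasks passed on the first try and "three attempts" means at least one pass among three tries. The function names and the toy data are hypothetical, not benchmark results.

```python
from typing import Dict, List


def first_attempt_accuracy(results: Dict[str, List[bool]]) -> float:
    """Share of tasks whose first recorded attempt passed."""
    return sum(attempts[0] for attempts in results.values()) / len(results)


def solved_within(results: Dict[str, List[bool]], k: int) -> float:
    """Share of tasks with at least one pass among the first k attempts."""
    return sum(any(attempts[:k]) for attempts in results.values()) / len(results)


# Hypothetical outcomes for four tasks, three attempts each (not benchmark data).
toy_results = {
    "task_a": [True, True, True],
    "task_b": [False, True, False],
    "task_c": [False, False, False],
    "task_d": [False, False, True],
}

first = first_attempt_accuracy(toy_results)
within3 = solved_within(toy_results, 3)
print(f"first attempt: {first:.2f}, within three attempts: {within3:.2f}")
```

A model that often fails once and then succeeds on a retry scores well on the second metric and poorly on the first, which is the pattern the summary describes for GLM-5.2.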
Key quotes
The test covered 103 tasks, each run three times, where models had to write code that works on both DuckDB and Snowflake.
When each model got three attempts per task, the two were neck and neck: 66% vs. 67% of tasks solved.
First-attempt accuracy diverges: Opus hit 53.7%, GLM only 47.6%, showing GLM's output is less consistent.
The Chinese model also averaged 99 runs per task versus Opus's 80 and burned through 860 million tokens, nearly double Opus's 439 million.
That pricing gap is putting real pressure on Anthropic and OpenAI, and could rattle the valuations of Western AI labs.
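The token totals and the price gap pull in opposite directions, so a rough back-of-the-envelope calculation helps. The sketch below makes two simplifying assumptions that the source does not confirm: every token is billed at the output rate, and GLM-5.2's price is exactly one-fifth of Opus 4.7's. The constant names are illustrative.

```python
# Token totals quoted above for the full benchmark run.
GLM_TOKENS = 860_000_000
OPUS_TOKENS = 439_000_000

# Assumption: GLM-5.2 costs exactly one-fifth of Opus 4.7 per token,
# and all tokens are billed at that single rate.
PRICE_RATIO = 0.2

token_ratio = GLM_TOKENS / OPUS_TOKENS        # how many more tokens GLM used
relative_cost = token_ratio * PRICE_RATIO     # GLM's bill as a share of Opus's

print(f"token ratio: {token_ratio:.2f}x")
print(f"relative cost: {relative_cost:.0%} of Opus")
```

Under these assumptions GLM-5.2 uses about 1.96 times as many tokens, so its total bill comes to roughly 39% of Opus 4.7's rather than 20%. A real comparison would need the input/output token split and actual list prices.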
