Snowflake benchmark: China's GLM-5.2 nearly matches Claude Opus 4.7 on coding tasks at a fraction of the cost

Matthias Bastian

1h ago · 3 min read · en · News

technology business benchmarking ai models

Summary

Snowflake benchmarked Zhipu AI's GLM-5.2 against Anthropic's Claude Opus 4.7 across 103 coding tasks. Given three attempts per task, the two models were nearly tied on overall task completion (66% vs. 67%). However, Opus 4.7 had higher first-attempt accuracy (53.7% vs. 47.6%) and was more efficient, using about half as many tokens across the benchmark (439 million vs. 860 million). GLM-5.2's key advantage is cost: roughly one-fifth the price per output token, which puts pricing pressure on Western AI labs like Anthropic and OpenAI.
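The summary scores the same runs in two ways: whether a task is solved in any of its three attempts, and how often a single attempt succeeds. Below is a minimal Python sketch of both metrics; the function names and the toy outcome data are hypothetical, and the exact scoring rules Snowflake used are an assumption here. One plausible reading is that first-attempt accuracy is averaged over all runs: 53.7% does not match any whole number of the 103 tasks (55/103 ≈ 53.4%, 56/103 ≈ 54.4%) but does match 166 of 309 runs (103 tasks × 3). That is an inference, not something the source states.

```python
from typing import Sequence

# Hypothetical outcome data: one inner list per task, one bool per attempt
# (True = the generated code worked on both DuckDB and Snowflake).
Outcomes = Sequence[Sequence[bool]]


def solved_within_k(outcomes: Outcomes, k: int) -> float:
    """Share of tasks with at least one passing attempt among the first k."""
    return sum(any(runs[:k]) for runs in outcomes) / len(outcomes)


def pass_at_1(outcomes: Outcomes) -> float:
    """Per-attempt success rate: each task's pass share, averaged over tasks."""
    return sum(sum(runs) / len(runs) for runs in outcomes) / len(outcomes)


toy = [
    [True, True, True],     # solved every time
    [False, True, False],   # solved once
    [False, False, True],   # solved only on the last try
    [False, False, False],  # never solved
]
print(f"pass@1: {pass_at_1(toy):.0%}")                    # 42% (5/12)
print(f"solved within 3: {solved_within_k(toy, 3):.0%}")  # 75%
```

Because a task counts as solved if any attempt passes, a less consistent model can trail on the per-attempt rate and still catch up when given more tries, which is the pattern the benchmark reports.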

Source

Snowflake benchmark: China's GLM-5.2 nearly matches Claude Opus 4.7 on coding tasks at a fraction of the cost · the-decoder.com

Key quotes

The test covered 103 tasks, each run three times, where models had to write code that works on both DuckDB and Snowflake.

When each model got three attempts per task, the two were neck and neck: 66% vs. 67% of tasks solved.

First-attempt accuracy diverges: Opus hit 53.7%, GLM only 47.6%, showing GLM's output is less consistent.

The Chinese model also averaged 99 runs per task versus Opus's 80 and burned through 860 million tokens, nearly double Opus's 439 million.

That pricing gap is putting real pressure on Anthropic and OpenAI, and could rattle the valuations of Western AI labs.

Snippet from the RSS feed

Zhipu AI's GLM-5.2 nearly matches Claude Opus 4.7 in a Snowflake benchmark with 103 coding tasks at one-fifth the cost per output token. But the Chinese model burns through nearly twice as many tokens per task. Still, that pricing gap is putting real pres
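The snippet pairs two numbers that pull in opposite directions: a per-token price about one-fifth of Opus's and roughly twice the token usage. The back-of-the-envelope Python sketch below combines them using the article's totals (860 million vs. 439 million tokens). For simplicity it assumes every counted token is billed at the output-token rate; real bills price input and output tokens separately, and the article does not say how its token totals split between the two, so the result is only a rough estimate. The function name is hypothetical.

```python
def relative_spend(tokens_a: float, price_a: float,
                   tokens_b: float, price_b: float) -> float:
    """Spend of model A as a fraction of model B's: (tokens x price) ratio."""
    return (tokens_a * price_a) / (tokens_b * price_b)


# Token totals from the article.
GLM_TOKENS = 860e6
OPUS_TOKENS = 439e6

# Assumption: only relative prices matter, so Opus is normalized to 1.0 and
# GLM-5.2 is set to the reported one-fifth per output token. Applying this
# rate to all tokens ignores separate input-token pricing.
GLM_PRICE = 1 / 5
OPUS_PRICE = 1.0

ratio = relative_spend(GLM_TOKENS, GLM_PRICE, OPUS_TOKENS, OPUS_PRICE)
print(f"GLM-5.2 spend relative to Opus 4.7: {ratio:.0%}")  # 39%
```

Under these assumptions, the doubled token count eats into the discount without erasing it: GLM-5.2 would still cost roughly two-fifths of what Opus does for the same benchmark. Since both models solved about the same share of tasks, the ratio per solved task would be similar.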

You might also want to read

GLM-5.2 vs Claude Opus: A Head-to-Head Test Building a 3D WebGL Game

A comparison between the new open model GLM-5.2 and Claude Opus 4.8, testing them head-to-head on building a 3D platformer in raw WebGL. Whi

techstackups.com·2d ago

GLM 5.2 matches frontier AI models on cybersecurity benchmarks at half the cost, raising distillation concerns

Z.ai's GLM 5.2, an open weights Chinese AI model, has been benchmarked by Louie.ai researchers on the CyberBT-CTF security agent investigati

graphistry.com·1d ago

GLM-5.2 Open-Weight Model Outperforms Opus 4.8 on AI-Resistant Backend Test

The article presents a detailed technical comparison between GLM-5.2 (open-weight model) and Opus 4.8, demonstrating that GLM-5.2 outperform

southbridge.ai·1d ago

Anthropic Launches Claude Opus 4.8 with Faster Performance and Lower Costs

Anthropic has released Claude Opus 4.8, an upgraded version of their flagship AI model, building on Opus 4.7 with improvements across benchm

anthropic.com·27d ago

Anthropic Releases Claude Opus 4.6 with Enhanced Coding Capabilities and 1M Token Context Window

Anthropic announces Claude Opus 4.6, an upgraded version of their smartest AI model with significant improvements in coding capabilities, in

anthropic.com·4mo ago

Anthropic Releases Claude Opus 4.1 with Enhanced Coding and Reasoning Capabilities

Anthropic has released Claude Opus 4.1, an upgraded version of Claude Opus 4, focusing on agentic tasks, real-world coding, and reasoning. T

anthropic.com·10mo ago