
Snowflake benchmark: China's GLM-5.2 nearly matches Claude Opus 4.7 on coding tasks at a fraction of the cost

By Matthias Bastian

1h ago · 3 min read · en · News

Summary

Snowflake benchmarked Zhipu AI's GLM-5.2 against Anthropic's Claude Opus 4.7 across 103 coding tasks. The two models were neck and neck on overall task completion (66% vs 67%) when given three attempts per task. However, Opus 4.7 showed higher first-attempt accuracy (53.7% vs 47.6%) and greater efficiency, using fewer tokens per task. GLM-5.2's key advantage is cost: at roughly one-fifth the price per output token, it puts pricing pressure on Western AI labs like Anthropic and OpenAI.

Source

Snowflake benchmark: China's GLM-5.2 nearly matches Claude Opus 4.7 on coding tasks at a fraction of the cost (the-decoder.com)

Key quotes

The test covered 103 tasks, each run three times, where models had to write code that works on both DuckDB and Snowflake.
When each model got three attempts per task, the two were neck and neck: 66% vs. 67% of tasks solved.
First-attempt accuracy diverges: Opus hit 53.7%, GLM only 47.6%, showing GLM's output is less consistent.
The Chinese model also averaged 99 runs per task versus Opus's 80 and burned through 860 million tokens, nearly double Opus's 439 million.
That pricing gap is putting real pressure on Anthropic and OpenAI, and could rattle the valuations of Western AI labs.
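
The figures quoted above can be combined into a rough back-of-the-envelope cost comparison. The sketch below is an illustration, not Snowflake's methodology: it assumes the one-fifth price ratio applies uniformly to every token consumed, whereas the article gives that ratio for output tokens only and does not split the totals into input and output. The function name `relative_cost` is hypothetical.

```python
# Figures quoted in the article; the uniform-price assumption is ours.
GLM_TOKENS = 860_000_000   # total tokens consumed by GLM-5.2
OPUS_TOKENS = 439_000_000  # total tokens consumed by Opus 4.7
PRICE_RATIO = 1 / 5        # GLM price relative to Opus (assumed to hold for all tokens)


def relative_cost(glm_tokens: int, opus_tokens: int, price_ratio: float) -> float:
    """Estimate GLM's total spend as a fraction of Opus's total spend."""
    return (glm_tokens / opus_tokens) * price_ratio


token_ratio = GLM_TOKENS / OPUS_TOKENS
cost_ratio = relative_cost(GLM_TOKENS, OPUS_TOKENS, PRICE_RATIO)
first_attempt_gap = 53.7 - 47.6  # percentage points, Opus minus GLM

print(f"Token ratio (GLM / Opus): {token_ratio:.2f}")
print(f"Estimated cost ratio (GLM / Opus): {cost_ratio:.2f}")
print(f"First-attempt accuracy gap: {first_attempt_gap:.1f} points")
```

Under that simplifying assumption, GLM-5.2's higher token consumption shrinks its price advantage from about one-fifth to roughly two-fifths of Opus's spend, so it would still come out cheaper overall.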
Snippet from the RSS feed
Zhipu AI's GLM-5.2 nearly matches Claude Opus 4.7 in a Snowflake benchmark with 103 coding tasks at one-fifth the cost per output token. But the Chinese model burns through nearly twice as many tokens per task. Still, that pricing gap is putting real pres
