Why Did GLM-5.2 Move Away From GRPO?
Source: zhihu.com, via Twitter / X

🌟 Insights from Zhihu contributor 划水的青蛙

TL;DR: GRPO is still a good algorithm, but long-horizon Agentic tasks break the assumptions that made it work well for short, verifiable tasks. The problem is not just theory. It is also reward sparsity, credit assignment, and throughput pressure.

GLM-5.2 moving away from GRPO is more like a practical correction than a rejection of GRPO itself. GRPO is still useful. It is just no longer the best algorithm to carry long-horizon tasks.

Think back to late 2024 and early 2025. Most models were still rough by today’s standards. Models that could really handle long-horizon tasks, such as Claude Sonnet 4.5 and the Opus series with Claude Code, only became truly impressive later. Before that, models like DeepSeek R1 and OpenAI’s O-series reasoning models were still mainly optimized for short tasks: math, coding unit tests, and other problems that were short and verifiable.

But the industry moved extremely fast. Long-horizon coding went from an idea to a real training target in a very short time. If we force GRPO into long-horizon training, two problems become very clear: sparse rewards on the algorithm side, and painful throughput pressure on the engineering side.

⚙️ The Engineering Problem

For long-horizon tasks, the hardest trade-off is throughput vs. sample diversity. If you train short tasks first and long tasks later, the gradient signal may swing violently. If you mix them together, short tasks finish early while long tasks keep running. The system then waits to score the whole group, wasting a lot of compute. So even before the algorithm question, the infrastructure pressure is already real.

🧠 The Algorithm Problem

GRPO originally worked because of three assumptions
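
The sparse-reward and group-wait points above can be made concrete with a small Python sketch. It is an illustration, not GLM-5.2's training code. The function names `group_relative_advantages` and `idle_fraction`, the `eps` value, and the use of the population standard deviation are all assumptions chosen for this example. The first function standardizes each rollout's reward against its own group, which is the core of a GRPO-style advantage. The second estimates how much worker time sits idle when a group is only scored once its slowest rollout finishes.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: standardize each reward within its own group.

    Assumption for this sketch: population std plus a small eps for stability.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def idle_fraction(durations):
    """Fraction of worker time spent waiting, assuming one worker per rollout
    and that the group is scored only after the slowest rollout finishes."""
    wall_clock = max(durations)
    busy = sum(durations)
    return 1.0 - busy / (wall_clock * len(durations))


if __name__ == "__main__":
    # Sparse reward: every rollout in the group failed, so no learning signal.
    print(group_relative_advantages([0.0, 0.0, 0.0, 0.0]))
    # One success in the group: that rollout gets a positive advantage.
    print(group_relative_advantages([1.0, 0.0, 0.0, 0.0]))
    # Three short rollouts grouped with one long one (hypothetical durations).
    print(idle_fraction([1.0, 1.0, 1.0, 9.0]))
```

When all rewards in a group are identical, which is common if a long task only pays out at the very end, every advantage is zero and the group contributes no gradient. With the hypothetical durations above, about two thirds of the worker time is spent waiting on the longest rollout.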