Why Did GLM-5.2 Move Away From GRPO?

🌟 Insights from Zhihu contributor 划水的青蛙

TL;DR: GRPO is still a good algorithm, but long-horizon agentic tasks break the assumptions that made it work well for short, verifiable tasks. The problem is not only theoretical. It also involves reward sparsity, credit assignment, and throughput pressure.

GLM-5.2 moving away from GRPO is a practical correction rather than a rejection of GRPO itself. GRPO remains useful; it is just no longer the best algorithm for carrying long-horizon tasks.

Think back to late 2024 and early 2025. Most models were still rough by today's standards. Models that could really handle long-horizon tasks, such as Claude Sonnet 4.5 and the Opus series with Claude Code, only became truly impressive later. Before that, models like DeepSeek R1 and OpenAI's o-series reasoning models were still mainly optimized for short, verifiable problems: math, coding unit tests, and similar tasks.

But the industry moved extremely fast. Long-horizon coding went from an idea to a real training target in a very short time. If we force GRPO into long-horizon training, two problems become very clear: sparse rewards on the algorithm side, and painful throughput pressure on the engineering side.

⚙️ The Engineering Problem

For long-horizon tasks, the hardest trade-off is throughput vs. sample diversity. If you train on short tasks first and long tasks later, the gradient signal may swing violently. If you mix them together, short tasks finish early while long tasks keep running. The system then has to wait for the slowest rollout before it can score the whole group, which wastes a lot of compute. So even before the algorithm question comes up, the infrastructure pressure is already real.

🧠 The Algorithm Problem

GRPO originally worked because of three assumptions
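
The two pressures named earlier, sparse rewards and the compute wasted while a group waits to be scored, can be made concrete with a small sketch. It is only an illustration under stated assumptions, not GLM-5.2's or any library's implementation: `group_advantages` uses the commonly described group-relative form (reward minus group mean, divided by group standard deviation), with the population standard deviation and the `eps` value chosen here for simplicity; `idle_fraction` assumes one worker per rollout and no refilling of idle slots. Both function names are hypothetical.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Group-relative advantage for one group of rollouts of the same prompt.

    Assumption: normalise by the population std of the group; eps avoids
    division by zero when every rollout gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def idle_fraction(durations):
    """Fraction of worker-time spent idle if the group is scored only after
    its slowest rollout finishes.

    Assumption: one worker per rollout, and a finished worker is not given
    new work until the whole group is done.
    """
    longest = max(durations)
    busy = sum(durations)
    return 1.0 - busy / (longest * len(durations))


if __name__ == "__main__":
    # Sparse reward: every rollout in the group failed, so no rollout is
    # preferred over another and the group contributes no learning signal.
    print(group_advantages([0.0, 0.0, 0.0, 0.0]))

    # One success in the group: that rollout gets a positive advantage,
    # the rest get negative ones.
    print(group_advantages([1.0, 0.0, 0.0, 0.0]))

    # Three short rollouts and one long rollout mixed in the same group.
    print(idle_fraction([10, 10, 10, 100]))
```

Under these assumptions, a group whose rewards are all identical yields all-zero advantages, and a group mixing three 10-step rollouts with one 100-step rollout leaves its workers idle for roughly two thirds of the total worker-time.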


