Why Would GLM-5.2 Move Away From GRPO?
7h ago
Source: Twitter / X, zhihu.com

🌟 Insights from Zhihu contributor 九老师

TL;DR: GLM-5.2 dropping GRPO does not mean GRPO is “bad.” It means the assumptions that made GRPO attractive for short LLM RL tasks may no longer hold for long-horizon agentic tasks. When rollouts get longer, environments get noisier, and credit assignment gets harder, PPO + value modeling starts looking useful again.

The key question is not simply “why did GLM-5.2 stop using GRPO?” A better question is: why did GRPO become useful for LLM RL in the first place? If the reasons that made GRPO attractive no longer hold, then going back to PPO becomes natural.

GRPO can be understood as a sampled-baseline method. Instead of training a separate value model, it samples multiple responses for the same prompt and uses the group average as a baseline. That is elegant. You get a relative reward signal without paying for a separate critic. In short tasks, this is very appealing.

But there is a tradeoff. ⚖️

PPO uses a learned value function, or critic. This critic is expensive and harder to tune. It also has its own problems: the policy keeps changing, so the value model is always trying to follow a moving target. That can introduce bias.

GRPO avoids that by using an up-to-date sampled baseline. It is closer to low-bias, but it tends to have higher variance.

For early LLM RL tasks, that tradeoff made sense
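To make the sampled-baseline idea concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not GLM-5.2's or any library's implementation: the function names, the `value_estimate` argument (a stand-in for a learned critic's prediction), and the optional standard-deviation scaling are all hypothetical choices made for this example. It treats each response as having a single scalar reward, which is a simplification of how PPO is usually applied per token.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, normalize=True, eps=1e-8):
    """GRPO-style advantages for several responses to ONE prompt.

    The baseline is the group's own mean reward, so no critic is needed.
    Dividing by the group standard deviation is an optional scaling step
    assumed here for illustration; the post only mentions the group average.
    """
    baseline = mean(rewards)
    centered = [r - baseline for r in rewards]
    if not normalize:
        return centered
    scale = pstdev(rewards) + eps  # eps avoids division by zero
    return [c / scale for c in centered]


def critic_advantages(rewards, value_estimate):
    """PPO-style advantages with a learned baseline (simplified).

    `value_estimate` is a hypothetical critic output for this prompt.
    If the critic lags behind the current policy, the baseline is stale,
    which is the source of bias described in the post.
    """
    return [r - value_estimate for r in rewards]


if __name__ == "__main__":
    rewards = [1.0, 0.0, 0.0, 1.0]  # four sampled responses, same prompt
    print(group_relative_advantages(rewards, normalize=False))
    print(critic_advantages(rewards, value_estimate=0.4))
```

With only a handful of samples per prompt, the group mean itself is a noisy estimate, which is one way to see the higher-variance side of the tradeoff; the critic version trades that noise for whatever error the learned value carries.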
