Research: LLMs Encode Human-Labeled Problem Difficulty Better Than Model-Derived Difficulty

By stansApprentice · 6mo ago · 2 min read · en · Insight

Summary

This research paper investigates whether large language models (LLMs) internally encode problem difficulty in alignment with human judgment. The study trains linear probes across layers and token positions on 60 models using mathematical and coding subsets of Easy2HardBench. Findings show human-labeled difficulty is strongly linearly decodable and scales with model size, while LLM-derived difficulty is weaker and scales poorly. Steering models toward "easier" representations reduces hallucination and improves accuracy. During GRPO training, human-difficulty probes strengthen and correlate positively with test accuracy, while LLM-difficulty probes degrade and correlate negatively, suggesting human annotations provide stable difficulty signals that reinforcement learning amplifies.
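To make "linearly decodable" concrete, here is a minimal sketch of a linear probe scored with Spearman's ρ. It is not the paper's code: the hidden states are synthetic, the difficulty labels are random numbers standing in for human annotations, and the names `fit_ridge_probe`, `spearman_rho`, and `demo_probe` are hypothetical. Ridge regression is used as one reasonable choice of linear probe; the summary does not say which regressor the authors used.

```python
import numpy as np

def fit_ridge_probe(X, y, alpha=1.0):
    """Closed-form ridge regression probe. Returns (weights, bias)."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    yc = y - y_mean
    d = X.shape[1]
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(d), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b

def spearman_rho(a, b):
    """Spearman rank correlation (assumes no ties, fine for continuous data)."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    return float(np.corrcoef(ra, rb)[0, 1])

def demo_probe(seed=0, n=400, d=32, noise=0.3):
    """Train a probe on synthetic 'hidden states' and score it on held-out data."""
    rng = np.random.default_rng(seed)
    # Assumption: difficulty is encoded along a single direction plus noise.
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    difficulty = rng.uniform(0.0, 1.0, size=n)  # stand-in for human labels
    H = 3.0 * np.outer(difficulty, direction) + rng.normal(scale=noise, size=(n, d))

    split = int(0.75 * n)
    w, b = fit_ridge_probe(H[:split], difficulty[:split])
    predicted = H[split:] @ w + b
    return spearman_rho(predicted, difficulty[split:])

if __name__ == "__main__":
    print(f"held-out Spearman rho: {demo_probe():.2f}")
```

In the study's setting, `H` would be activations taken from a given layer and token position of a real model, and the probe would be refit for each layer and position to see where difficulty is most decodable.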

Key quotes · 5 pulled
Large language models exhibit a puzzling inconsistency: they solve complex problems yet frequently fail on seemingly simpler ones.
We find that human-labeled difficulty is strongly linearly decodable (AMC: $ρ≈0.88$) and exhibits clear model-size scaling, whereas LLM-derived difficulty is substantially weaker and scales poorly.
Steering along the difficulty direction reveals that pushing models toward 'easier' representations reduces hallucination and improves accuracy.
During GRPO training on Qwen2.5-Math-1.5B, the human-difficulty probe strengthens and positively correlates with test accuracy across training steps, while the LLM-difficulty probe degrades and negatively correlates with performance.
These results suggest that human annotations provide a stable difficulty signal that RL amplifies, while automated difficulty estimates derived from model performance become misaligned precisely as models improve.
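The steering result quoted above can be pictured with a small sketch. It assumes the common additive form of activation steering, where a hidden state is shifted along the unit-normalized probe direction; the quotes do not say exactly how the authors apply the intervention, so the functions `steer` and `probe_readout` and the sign convention (negative `alpha` means "easier") are illustrative assumptions.

```python
import numpy as np

def steer(hidden, direction, alpha):
    """Shift a hidden state by alpha along the unit-normalized direction.

    Assumption: negative alpha moves toward 'easier', positive toward 'harder'.
    """
    unit = direction / np.linalg.norm(direction)
    return hidden + alpha * unit

def probe_readout(hidden, w, b=0.0):
    """Difficulty predicted by a linear probe with weights w and bias b."""
    return float(hidden @ w + b)

if __name__ == "__main__":
    w = np.array([3.0, 4.0])   # hypothetical probe weights, norm 5
    h = np.array([1.0, 1.0])   # hypothetical hidden state
    easier = steer(h, w, alpha=-2.0)
    print(probe_readout(h, w), probe_readout(easier, w))
```

Shifting by `alpha` along the unit direction changes the probe readout by exactly `alpha * ||w||`, which is why a linear probe direction gives a direct handle on the encoded difficulty.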
Snippet from the RSS feed
Large language models exhibit a puzzling inconsistency: they solve complex problems yet frequently fail on seemingly simpler ones. We investigate whether LLMs internally encode problem difficulty in a way that aligns with human judgment, and whether this
