
QUBRIC: A Framework for Co-Designing Queries and Rubrics to Extend Reinforcement Learning Beyond Verifiable Rewards

[Submitted on 2 Jun 2026]

Summary

This paper introduces QUBRIC, a framework for rubric-based reinforcement learning (RL) that co-designs queries and rubrics so that RL can be extended beyond verifiable rewards. The authors identify a structural bottleneck: rubric quality is constrained by query structure. Open-ended queries yield vague rubrics, while naively narrowed queries introduce fabricated references that no model can verify. QUBRIC uses teacher-derived key points to rewrite open-ended queries into scenario-based, evaluable questions, generates contrastive rubrics from the gaps between teacher and policy responses, and applies learnability filtering to retain only informative query-rubric pairs for GRPO training. The framework achieves a +5.5 point gain on ArenaHard over the SFT baseline and, trained only on instruction-following data, transfers to three held-out benchmarks (legal, moral, and narrative reasoning) with an average improvement of +6.3 points.
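The summary does not say how learnability filtering is implemented, so the following is only a minimal sketch of one plausible reading: keep a query-rubric pair when the current policy's rubric scores are neither uniformly low nor uniformly high and show some spread. The class `QueryRubricPair`, the function names, and the thresholds `min_mean`, `max_mean`, and `min_spread` are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class QueryRubricPair:
    """Hypothetical container; field names are illustrative assumptions."""
    query: str
    rubric: list[str]
    # Rubric scores in [0, 1] for several responses sampled from the policy.
    policy_scores: list[float]


def is_learnable(pair: QueryRubricPair,
                 min_mean: float = 0.1,
                 max_mean: float = 0.9,
                 min_spread: float = 0.05) -> bool:
    """Assumed criterion: the pair is informative if the policy neither
    always fails nor always passes, and its scores vary across samples."""
    avg = mean(pair.policy_scores)
    spread = pstdev(pair.policy_scores)
    return min_mean <= avg <= max_mean and spread >= min_spread


def filter_learnable(pairs: list[QueryRubricPair]) -> list[QueryRubricPair]:
    """Keep only the pairs that pass the assumed learnability check."""
    return [p for p in pairs if is_learnable(p)]


pairs = [
    QueryRubricPair("q_all_fail", ["cites a source nobody can verify"], [0.0, 0.0, 0.0, 0.0]),
    QueryRubricPair("q_all_pass", ["answers in English"], [1.0, 1.0, 1.0, 1.0]),
    QueryRubricPair("q_mixed", ["addresses the stated scenario"], [0.2, 0.8, 0.5, 0.4]),
]
kept = filter_learnable(pairs)
```

In this toy run only `q_mixed` survives: the all-fail and all-pass pairs carry no contrast between responses, which is the kind of pair the filtering step is described as discarding.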

Key quotes

We identify a structural bottleneck: rubric quality is constrained by query structure.
Open-ended queries yield vague rubrics; naively narrowing them introduces fabricated references that no model can verify, so all responses fail and training receives no reward signal.
QUBRIC achieves a +5.5 point gain on ArenaHard over the SFT baseline.
Trained only on instruction-following data, it further transfers to three held-out benchmarks spanning legal, moral, and narrative reasoning (+6.3 points on average).
These results provide evidence that co-designing queries and rubrics can make rubric-based RL a practical complement to RLVR beyond strictly verifiable tasks.
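The second quote says that when every response fails, training receives no reward signal. A small sketch can make this concrete, assuming the usual GRPO-style group-relative advantage, where each reward is centered on the group mean and scaled by the group standard deviation. The function name `group_advantages`, the `eps` constant, and the use of the population standard deviation are illustrative assumptions, not details from the paper.

```python
from statistics import mean, pstdev


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Group-relative advantages: (r_i - group mean) / (group std + eps).

    Sketch only; eps and the population std are assumptions made here
    to keep the division well defined.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Every response fails the rubric: all advantages are zero, so the
# policy gradient for this query carries no signal.
all_fail = group_advantages([0.0, 0.0, 0.0, 0.0])

# Responses differ in quality: advantages separate better from worse.
mixed = group_advantages([0.0, 1.0, 0.0, 1.0])
```

With identical rewards the numerator is zero for every response, so the query contributes nothing to the update. This is why a narrowed query whose rubric depends on an unverifiable reference is wasted training data under this kind of objective.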
Snippet from the RSS feed
Rubric-based RL is a promising route for extending reinforcement learning beyond verifiable rewards, yet existing methods optimize rubrics while treating the query distribution as fixed. We identify a structural bottleneck: rubric quality is constrained b
