Graders
Source
OpenAI, Graders, openai.com
You might also wanna read
A Teacher Built an AI Grading Assistant—Then Removed Its Most Automated Feature to Keep Humans in Charge
A teacher returns from two days of chaperoning field trips to find 450 ungraded assignments. Rather than simply marking them credit/no credi
Three AI tools all made the same mistake grading student exit tickets — and what it reveals about teaching
A teacher gave three AI tools the task of analyzing 16 student exit tickets from an 8th grade math class on solving systems of linear equati
Grade (YC W2026) Builds API for Performance-Based Payroll Targeting AI Agents and Remote Contractors
Grade (YC W2026) is building an API infrastructure for performance-based payroll, enabling companies to pay AI agents, remote contractors, a
BINEVAL: A Binary Question Framework for Interpretable LLM Evaluation and Self-Improvement
This paper introduces BINEVAL, a framework for evaluating LLM outputs that decomposes evaluation criteria into atomic binary questions. Inst
Systematic evaluation of 21 LLM-as-a-Judge models reveals reliability flaws and position bias across 541,000 judgments
This paper presents the largest systematic evaluation of LLM-as-a-Judge models to date, analyzing 21 judges from nine providers across three
