Getting Started with Evals
10mo ago
Source: OpenAI, Getting Started with Evals (openai.com)
Quickstart for creating and running evaluations (evals).
