
SOB: A Multi-Source Structured Output Benchmark for Evaluating LLM JSON Accuracy

By Yoeven

1mo ago · 7 min read · en · News

Summary

This article introduces SOB (Structured Output Benchmark), a new multi-source benchmark for evaluating LLMs' ability to produce structured JSON data from unstructured and semi-structured sources including text, images, and audio. Unlike existing benchmarks that only check schema compliance or evaluate value correctness within a single domain, SOB measures JSON value accuracy per field across multiple source types. The benchmark tests 20+ models using 7 metrics and provides a full leaderboard, addressing the critical need for deterministic structured output in production workflows like invoice parsing, medical record processing, and PDF conversion.
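To make the distinction between schema compliance and per-field value accuracy concrete, here is a minimal Python sketch. It is not SOB's actual scoring code: the summary does not spell out the seven metrics, so the helper names (`flatten`, `schema_ok`, `field_accuracy`) and the exact-match rule are assumptions chosen purely for illustration.

```python
from typing import Any


def flatten(obj: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested JSON into {path: leaf_value} (hypothetical helper)."""
    out: dict[str, Any] = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            out.update(flatten(value, f"{prefix}.{key}" if prefix else key))
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            out.update(flatten(value, f"{prefix}[{i}]"))
    else:
        out[prefix] = obj
    return out


def schema_ok(pred: Any, expected: Any) -> bool:
    """Shape-only check: same leaf paths and same leaf types, values ignored."""
    p, e = flatten(pred), flatten(expected)
    return p.keys() == e.keys() and all(type(p[k]) is type(e[k]) for k in e)


def field_accuracy(pred: Any, expected: Any) -> float:
    """Fraction of expected leaf fields whose value and type match exactly."""
    p, e = flatten(pred), flatten(expected)
    if not e:
        return 1.0
    correct = sum(
        1 for k, v in e.items()
        if k in p and type(p[k]) is type(v) and p[k] == v
    )
    return correct / len(e)


# Illustrative data only, not taken from the benchmark.
expected = {"invoice_total": 120.50, "dates": ["2024-01-05", "2024-02-01"]}
predicted = {"invoice_total": 102.50, "dates": ["2024-01-05", "2024-02-01"]}

compliant = schema_ok(predicted, expected)      # shape and types match
accuracy = field_accuracy(predicted, expected)  # one of three leaves is wrong
```

In this sketch the prediction passes the shape check while only two of its three leaf values are correct, which is the kind of gap a schema-only benchmark cannot see.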

Key quotes

A hallucinated invoice_total or an array ordered incorrectly because of inaccurate date values silently breaks downstream systems.
Existing benchmarks either check schema compliance alone or evaluate value correctness within a single source domain.
For deterministic output, the next step in a workflow reads a specific key and expects a specific type.
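The quotes above can be illustrated with a small sketch of a downstream step, assuming a hypothetical consumer function `read_invoice_total` that reads one key and expects a number. A type guard catches a wrongly typed value, but a well-typed hallucinated value passes straight through, which is why value accuracy has to be measured separately.

```python
import json


def read_invoice_total(raw: str) -> float:
    """Hypothetical downstream step: read one key and require a numeric type."""
    data = json.loads(raw)
    if "invoice_total" not in data:
        raise KeyError("invoice_total")
    value = data["invoice_total"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError(f"invoice_total must be a number, got {type(value).__name__}")
    return float(value)


# A wrongly typed value is caught by the guard.
try:
    read_invoice_total('{"invoice_total": "120.50"}')
    type_error_raised = False
except TypeError:
    type_error_raised = True

# A well-typed but wrong value is accepted silently (the true total is assumed to be 120.50).
silently_wrong = read_invoice_total('{"invoice_total": 102.50}')
```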
Snippet from the RSS feed
A multi-source LLM benchmark across text, image, and audio that measures JSON value accuracy per field, not just schema compliance. 20+ models, 7 metrics, full leaderboard.
