GLOSSOPETRAE v3.1: Procedural Xenolinguistics Engine — Machine vs. Human Legibility Scoreboard
Summary
GLOSSOPETRAE v3.1 is a procedural xenolinguistics engine that generates artificial languages and writing systems. The article presents a Frontier Scoreboard comparing machine vs. human legibility, measuring accuracy across 30 seeds per model, with programs auto-graded by execution (interpreter as oracle) and human proxy measured via cold-reader legibility scores.
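The grading scheme described above (run each generated program, let the interpreter decide pass or fail, then average over 30 seeds) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy postfix interpreter, the task generator, and the names run_program, make_task, and grade are hypothetical and are not taken from GLOSSOPETRAE itself. Only the seed count of 30 comes from the summary.

```python
import random
from typing import Callable, List, Tuple

N_SEEDS = 30  # the "30 seeds per model" figure quoted in the summary

Task = Tuple[int, int, int]


def run_program(program: str) -> int:
    """Hypothetical toy interpreter: evaluates a postfix expression of ints, + and *."""
    stack: List[int] = []
    for token in program.split():
        if token in ("+", "*"):
            b, a = stack.pop(), stack.pop()  # IndexError if operands are missing
            stack.append(a + b if token == "+" else a * b)
        else:
            stack.append(int(token))  # ValueError on an unknown token
    if len(stack) != 1:
        raise ValueError("malformed program")
    return stack[0]


def make_task(seed: int) -> Task:
    """Derive one task deterministically from a seed (illustrative only)."""
    rng = random.Random(seed)
    return rng.randint(1, 9), rng.randint(1, 9), rng.randint(1, 9)


def expected(task: Task) -> int:
    """Reference answer for the task: (a + b) * c."""
    a, b, c = task
    return (a + b) * c


def grade(model: Callable[[Task], str], n_seeds: int = N_SEEDS) -> float:
    """Fraction of seeds on which the model's program executes to the expected value."""
    passed = 0
    for seed in range(n_seeds):
        task = make_task(seed)
        try:
            ok = run_program(model(task)) == expected(task)
        except (ValueError, IndexError):
            ok = False  # a program that fails to execute counts as a miss
        passed += ok
    return passed / n_seeds


def correct_model(task: Task) -> str:
    a, b, c = task
    return f"{a} {b} + {c} *"  # (a + b) * c in postfix


def precedence_bug_model(task: Task) -> str:
    a, b, c = task
    return f"{a} {b} {c} * +"  # computes a + b * c instead


def malformed_model(task: Task) -> str:
    return "+"  # operator with no operands; never executes


if __name__ == "__main__":
    for model in (correct_model, precedence_bug_model, malformed_model):
        print(model.__name__, grade(model))
```

The point of the design is that no human or model judges the output: a program either executes to the expected value or it does not, so the score is reproducible for a fixed set of seeds. The cold-reader legibility score used as the human proxy is not defined in the summary, so it is left out of this sketch.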
Key quotes
Measured accuracy across 30 seeds per model.
Each program is auto-graded by execution (the interpreter is the oracle).
Human proxy = cold-reader legibility score
