GLOSSOPETRAE v3.1: Procedural Xenolinguistics Engine — Machine vs. Human Legibility Scoreboard

12h ago · 4 min read · en · Insight

Summary

GLOSSOPETRAE v3.1 is a procedural xenolinguistics engine that generates artificial languages and writing systems. The article presents a Frontier Scoreboard comparing machine vs. human legibility, measuring accuracy across 30 seeds per model, with programs auto-graded by execution (interpreter as oracle) and human proxy measured via cold-reader legibility scores.
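To make "auto-graded by execution" concrete, here is a minimal sketch of how a per-model accuracy over a set of seeds could be computed with an interpreter acting as the oracle. The scoreboard's actual harness is not described in this summary, so everything here is an assumption for illustration: `toy_interpreter` (a tiny postfix calculator standing in for the real interpreter), `grade_by_execution`, `accuracy_over_seeds`, and the `generate` / `expected_for` callables are all hypothetical names.

```python
from typing import Callable, Sequence


def toy_interpreter(program: str) -> str:
    """Hypothetical stand-in for the oracle: runs space-separated postfix integer arithmetic."""
    stack: list[int] = []
    for tok in program.split():
        if tok in {"+", "-", "*"}:
            b, a = stack.pop(), stack.pop()  # IndexError if the program is malformed
            stack.append(a + b if tok == "+" else a - b if tok == "-" else a * b)
        else:
            stack.append(int(tok))  # ValueError on a non-integer token
    if len(stack) != 1:
        raise ValueError("malformed program: stack does not reduce to one value")
    return str(stack[0])


def grade_by_execution(
    program: str, expected: str, interpreter: Callable[[str], str]
) -> bool:
    """A program passes only if it runs without error and its output matches exactly."""
    try:
        return interpreter(program) == expected
    except Exception:
        # Any crash in the interpreter counts as a failed attempt (assumed policy).
        return False


def accuracy_over_seeds(
    generate: Callable[[int], str],
    expected_for: Callable[[int], str],
    interpreter: Callable[[str], str],
    seeds: Sequence[int],
) -> float:
    """Fraction of seeds for which the generated program passes execution grading."""
    if len(seeds) == 0:
        raise ValueError("need at least one seed")
    passed = sum(
        grade_by_execution(generate(seed), expected_for(seed), interpreter)
        for seed in seeds
    )
    return passed / len(seeds)
```

In this sketch, `generate(seed)` would wrap a call to the model under test and `expected_for(seed)` would supply the reference output for that seed; calling `accuracy_over_seeds(..., seeds=range(30))` mirrors the "30 seeds per model" setup quoted in the article. The cold-reader legibility score used as the human proxy is a separate measurement and is not modeled here.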

Source

Twitter / X · GLOSSOPETRAE v3.1: Procedural Xenolinguistics Engine — Machine vs. Human Legibility Scoreboard · elder-plinius.github.io

Key quotes

· 3 pulled

Measured accuracy across 30 seeds per model.

Each program is auto-graded by execution (the interpreter is the oracle).

Human proxy = cold-reader legibility score

Snippet from the RSS feed

Frontier Scoreboard — Machine vs. Human Legibility
