
A11y LLM Eval update: Frontier models still fail at accessibility, but new "skills" approach shows promise

By Michael Fairchild

7d ago · 6 min read · Insight

Summary

This article reports the latest findings from the A11y LLM Eval project, a benchmark that measures how accessible the UI code generated by LLMs is. Key findings: frontier models (GPT-5.5, Claude Opus 4.7, Gemini 3.1 Pro Preview) still produce inaccessible code by default; explicit accessibility instructions significantly improve their output; and a new "skills" mechanic shows promise in closing the gap. Manual testing remains essential despite these improvements.

Source

A11y LLM Eval update: Frontier models still fail at accessibility, but new "skills" approach shows promise · dev.to (shared via Bluesky)

Key quotes

The newest frontier models (GPT‑5.5, Claude Opus 4.7, Gemini 3.1 Pro Preview) still default to inaccessible code.
Explicit accessibility instructions can dramatically change that, and manual testing is still essential.
Skills change the game.

Snippet from the RSS feed
A few months ago I shared early results from the A11y LLM Eval project, a benchmark that measures how...
