LEVANTE-bench: Benchmark Reveals Partial Alignment Between Vision-Language Models and Children's Cognitive Abilities

[Submitted on 3 Jun 2026]

Summary

The article introduces LEVANTE-bench, a benchmark for comparing vision-language models (VLMs) with children's cognitive development. Built on data from the Learning Variability Network (LEVANTE), it evaluates VLMs on six cognitive tasks and compares their performance with that of children aged 5-12 (N=1547) in three countries. The alignment between VLMs and children turns out to be heterogeneous. At the level of tasks and items, more capable models align better with humans, but how closely models match children's error distributions varies widely across tasks. On several tasks, smaller models matched younger children's errors better, and even the best-performing VLMs struggled with matrix reasoning and mental rotation. The authors conclude that current VLM architectures align only partially with children's cognitive abilities.
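
The summary does not say which statistics the authors use to measure alignment, so the following is only an illustrative sketch of the two kinds of comparison it describes. It assumes that item-level alignment can be approximated by a Pearson correlation between per-item accuracies, and that the match between error distributions can be approximated by the Jensen-Shannon divergence over counts of chosen answer options. The function names and every number in the usage section are hypothetical and are not taken from LEVANTE-bench.

```python
import math


def normalize(counts):
    """Turn raw response counts into a probability distribution."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one response")
    return [c / total for c in counts]


def js_divergence(p_counts, q_counts):
    """Jensen-Shannon divergence in bits between two count vectors.

    0.0 means the two response distributions are identical; 1.0 means
    they never choose the same option. This metric is an assumption of
    this sketch, not necessarily the one used in the paper.
    """
    if len(p_counts) != len(q_counts):
        raise ValueError("both vectors must cover the same answer options")
    p = normalize(p_counts)
    q = normalize(q_counts)
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability contribute nothing to the sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: how often each wrong option was chosen on one item.
child_error_counts = [30, 10, 5]
model_error_counts = [12, 9, 9]
error_mismatch = js_divergence(child_error_counts, model_error_counts)

# Hypothetical per-item accuracies for children and for one model.
child_item_accuracy = [0.9, 0.7, 0.4, 0.2]
model_item_accuracy = [0.95, 0.8, 0.3, 0.1]
item_alignment = pearson(child_item_accuracy, model_item_accuracy)

print(f"error-distribution mismatch (JSD, bits): {error_mismatch:.3f}")
print(f"item-level alignment (Pearson r): {item_alignment:.3f}")
```

Keeping the two measures separate mirrors the distinction drawn in the summary: a model can track which items are hard for children (high correlation) while still making different kinds of mistakes on them (high divergence).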

Key quotes

Alignment was heterogeneous across scales: at the level of tasks and items, more capable models aligned better with humans.
However, match to human error distributions varied widely across tasks, and for several tasks, smaller models matched younger children's errors better.
In addition, even the best-performing VLMs struggled on matrix reasoning and mental rotation tasks.
Thus, current VLM architectures align only partially with the cognitive abilities of children.
Given the inherently multimodal nature of human experience, vision-language models (VLMs) hold substantial promise for modeling human cognition as it grows and develops with experience.
Snippet from the RSS feed
Given the inherently multimodal nature of human experience, vision-language models (VLMs) hold substantial promise for modeling human cognition as it grows and develops with experience. Realizing their potential requires tools for comparing VLMs with huma
