
AI Model Benchmark: The Evolution from Zero-Shot to Agentic Approaches for Creative Tasks

By todsacerdoti · 6mo ago · 6 min read · Insight

Summary

The article looks at Simon Willison's informal benchmark for AI models: generating an SVG image of a pelican riding a bicycle. The test sounds absurd, yet it has proven revealing about model capabilities, and AI labs even reference it in marketing for new models. The piece contrasts the traditional zero-shot approach (prompt in, SVG out) with agentic systems that run in a loop with tools, iteratively generating, assessing, and improving their output, and asks how that loop might change creative tasks like the pelican-on-bicycle benchmark.

Key quotes

Simon Willison has been running his own informal model benchmark for years: 'Generate an SVG of a pelican riding a bicycle.'
It's delightfully absurd—and surprisingly revealing. Even the model labs channel this benchmark in their marketing campaigns announcing new models.
Simon's traditional approach is zero-shot: throw the prompt at the model, get SVG back. Maybe—if you're lucky—you get something resembling a pelican on a bicycle.
Nowadays everyone is talking about agents. Models running in a loop using tools.
The agentic loop—generate, assess, improve—seems like a natural fit for iterating on pelicans on bicycles.
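
The generate, assess, improve loop described in the quotes can be sketched in a few lines. This is a minimal illustration, not the article's implementation: `agentic_loop`, `toy_generate`, `toy_assess`, and `SHAPES` are hypothetical names, and the two toy functions are deterministic stand-ins for what would in practice be model calls (for example, one call that emits SVG and another that critiques it).

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional, Tuple

PROMPT = "Generate an SVG of a pelican riding a bicycle"

# Hypothetical parts the toy assessor looks for, keyed by element id.
SHAPES = {
    "wheel": '<circle id="wheel" cx="30" cy="80" r="15"/>',
    "frame": '<path id="frame" d="M30 80 L60 50 L90 80"/>',
    "pelican": '<ellipse id="pelican" cx="60" cy="35" rx="14" ry="10"/>',
}


def toy_generate(prompt: str, previous: Optional[str], feedback: str) -> str:
    """Stand-in for a model call: start empty, then add the part named in feedback."""
    if previous is None:
        return '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 120 100"></svg>'
    return previous.replace("</svg>", SHAPES[feedback] + "</svg>")


def toy_assess(svg: str) -> Tuple[float, str]:
    """Stand-in for a critic: score by required ids present, name the first missing one."""
    root = ET.fromstring(svg)  # raises ParseError if the SVG is not well-formed XML
    present = {el.get("id") for el in root.iter()}
    missing = [name for name in SHAPES if name not in present]
    score = 1 - len(missing) / len(SHAPES)
    return score, (missing[0] if missing else "")


def agentic_loop(
    generate: Callable[[str, Optional[str], str], str],
    assess: Callable[[str], Tuple[float, str]],
    prompt: str,
    max_rounds: int = 5,
    good_enough: float = 0.9,
) -> Tuple[str, float]:
    """Generate, assess, improve; keep the best attempt and stop early when good enough."""
    best_svg, best_score = "", -1.0
    previous, feedback = None, ""
    for _ in range(max_rounds):
        previous = generate(prompt, previous, feedback)
        score, feedback = assess(previous)
        if score > best_score:
            best_svg, best_score = previous, score
        if score >= good_enough:
            break
    return best_svg, best_score


svg, score = agentic_loop(toy_generate, toy_assess, PROMPT)
print(score)
```

The zero-shot approach corresponds to `max_rounds=1`: one call to `generate` and no use of feedback. The round cap and the `good_enough` threshold are design choices assumed here to keep the loop from running indefinitely.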