AI Model Benchmark: The Evolution from Zero-Shot to Agentic Approaches for Creative Tasks
By todsacerdoti
Summary
The article discusses Simon Willison's informal benchmark test for AI models: generating an SVG image of a pelican riding a bicycle. This seemingly absurd test has become surprisingly revealing about model capabilities and is even referenced by AI labs in their marketing. The piece explores how the traditional zero-shot approach (direct prompt to SVG output) is evolving with the rise of agentic AI systems that can iteratively generate, assess, and improve their outputs through tool use and loops. The article examines how this agentic approach might transform creative AI tasks like the pelican-on-bicycle benchmark.
Key quotes
Simon Willison has been running his own informal model benchmark for years: 'Generate an SVG of a pelican riding a bicycle.'
It's delightfully absurd—and surprisingly revealing. Even the model labs channel this benchmark in their marketing campaigns announcing new models.
Simon's traditional approach is zero-shot: throw the prompt at the model, get SVG back. Maybe—if you're lucky—you get something resembling a pelican on a bicycle.
Nowadays everyone is talking about agents. Models running in a loop using tools.
The agentic loop—generate, assess, improve—seems like a natural fit for iterating on pelicans on bicycles.
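The generate, assess, improve loop described above can be sketched roughly as follows. This is a toy illustration, not the article's implementation: `generate`, `assess`, `improve`, and `agentic_loop` are hypothetical names, and the three steps are deterministic stand-ins where a real agent would call a model and probably render the SVG to inspect it. The "assessment" here only counts wheel circles, which is an assumption chosen to keep the example runnable.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialize without an ns0: prefix


def generate(prompt: str) -> str:
    # Hypothetical stand-in for a zero-shot model call: a body, no wheels.
    return (
        f'<svg xmlns="{SVG_NS}" viewBox="0 0 200 120">'
        '<ellipse cx="100" cy="40" rx="30" ry="18"/></svg>'
    )


def assess(svg: str) -> tuple[float, str]:
    # Hypothetical stand-in for a critique step: parse the SVG and
    # check that a bicycle has two wheels.
    root = ET.fromstring(svg)  # raises ParseError on malformed SVG
    wheels = root.findall(f"{{{SVG_NS}}}circle")
    if len(wheels) < 2:
        return 0.5, "add a wheel"
    return 1.0, "looks fine"


def improve(svg: str, feedback: str) -> str:
    # Hypothetical stand-in for a revision step: add one wheel per round.
    root = ET.fromstring(svg)
    n = len(root.findall(f"{{{SVG_NS}}}circle"))
    ET.SubElement(root, f"{{{SVG_NS}}}circle",
                  cx=str(60 + 80 * n), cy="95", r="20")
    return ET.tostring(root, encoding="unicode")


def agentic_loop(prompt: str, max_rounds: int = 5,
                 good_enough: float = 0.8) -> tuple[str, list[float]]:
    svg = generate(prompt)
    history = []
    for _ in range(max_rounds):
        score, feedback = assess(svg)
        history.append(score)
        if score >= good_enough:
            break  # stop once the assessment is satisfied
        svg = improve(svg, feedback)
    return svg, history


if __name__ == "__main__":
    final_svg, scores = agentic_loop(
        "Generate an SVG of a pelican riding a bicycle")
    print(scores)
    print(final_svg)
```

The zero-shot approach corresponds to calling `generate` once and stopping; the loop adds a bounded number of assess and improve rounds on top of that first draft.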