AI Model Benchmark: The Evolution from Zero-Shot to Agentic Approaches for Creative Tasks

todsacerdoti

6mo ago · 6 min read · en · Insight

Score: 90 · Type: analysis · Sentiment: positive

Summary

The article discusses Simon Willison's informal benchmark for AI models: generating an SVG image of a pelican riding a bicycle. The seemingly absurd test has proven surprisingly revealing about model capabilities, and AI labs even reference it in their marketing. The piece contrasts the traditional zero-shot approach (prompt in, SVG out) with agentic systems that use tools in a loop to generate, assess, and improve their output, and asks how that loop might change creative tasks like the pelican-on-bicycle benchmark.

Key quotes

Simon Willison has been running his own informal model benchmark for years: 'Generate an SVG of a pelican riding a bicycle.'

It's delightfully absurd—and surprisingly revealing. Even the model labs channel this benchmark in their marketing campaigns announcing new models.

Simon's traditional approach is zero-shot: throw the prompt at the model, get SVG back. Maybe—if you're lucky—you get something resembling a pelican on a bicycle.

Nowadays everyone is talking about agents. Models running in a loop using tools.

The agentic loop—generate, assess, improve—seems like a natural fit for iterating on pelicans on bicycles.
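
The generate, assess, improve loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the article's implementation: `agentic_loop`, `Attempt`, `is_well_formed_svg`, the `max_rounds` and `target` parameters, and both stub functions are hypothetical names introduced here. In a real setup, `generate` would call a model and `assess` would likely render the SVG and have a model or a person critique the result.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Attempt:
    svg: str
    score: float   # 0.0 (unusable) to 1.0 (good enough)
    feedback: str  # critique passed to the next generation round


def is_well_formed_svg(svg: str) -> bool:
    """Cheap tool check: does the text parse as XML with an <svg> root?"""
    try:
        root = ET.fromstring(svg)
    except ET.ParseError:
        return False
    return root.tag == "svg" or root.tag.endswith("}svg")


def agentic_loop(
    prompt: str,
    generate: Callable[[str, str], str],          # (prompt, feedback) -> svg
    assess: Callable[[str], Tuple[float, str]],   # svg -> (score, feedback)
    max_rounds: int = 3,
    target: float = 0.9,
) -> Attempt:
    """Generate, assess, improve; return the best attempt seen."""
    best: Optional[Attempt] = None
    feedback = ""
    for _ in range(max_rounds):
        svg = generate(prompt, feedback)
        if is_well_formed_svg(svg):
            score, feedback = assess(svg)
        else:
            score, feedback = 0.0, "Output was not well-formed SVG."
        attempt = Attempt(svg, score, feedback)
        if best is None or attempt.score > best.score:
            best = attempt
        if best.score >= target:
            break
    return best


# --- Hypothetical stand-ins so the sketch runs without a model ---

def stub_generate(prompt: str, feedback: str) -> str:
    # Pretends to "improve" by drawing more shapes once it has feedback.
    shapes = '<circle cx="30" cy="70" r="15"/>'
    if feedback:
        shapes += '<circle cx="80" cy="70" r="15"/>'
        shapes += '<ellipse cx="55" cy="40" rx="14" ry="9"/>'
    return (
        '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 110 90">'
        f"{shapes}</svg>"
    )


def stub_assess(svg: str) -> Tuple[float, str]:
    # Toy scoring rule: expects two wheels plus a body (three shapes).
    shape_count = len(list(ET.fromstring(svg)))
    score = min(shape_count / 3, 1.0)
    if score >= 1.0:
        return score, ""
    return score, "Add the second wheel and the pelican's body."


result = agentic_loop(
    "Generate an SVG of a pelican riding a bicycle.",
    stub_generate,
    stub_assess,
)
print(result.score)
```

The zero-shot approach corresponds to calling `generate` once and taking whatever comes back. The loop differs only in feeding the assessment into the next round and keeping the best attempt, which is why the quality of the `assess` step matters as much as the generator.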
