Pelicans on a Bicycle
An informal benchmark that asks AI models to generate a drawing of a pelican riding a bicycle — a task designed to test whether a model can translate an unusual, specific idea into coherent output. The test was created in 2024 by developer Simon Willison, whose canonical version asks a text model to produce SVG code that renders as the image; he chose the prompt precisely because no such image was likely to exist in AI training data, making pattern-matching shortcuts useless.
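For readers who want to try it themselves, here is a minimal sketch of how a single run might look, assuming Simon Willison's `llm` Python library (https://llm.datasette.io/) is installed with an API key configured; the model name and output filename below are illustrative choices, not part of the benchmark.

```python
# A minimal sketch of running the test with Simon Willison's `llm`
# Python library. The model alias "gpt-4o-mini" and the output path
# "pelican.svg" are illustrative assumptions, not prescribed anywhere.
import llm

PROMPT = "Generate an SVG of a pelican riding a bicycle"

model = llm.get_model("gpt-4o-mini")  # any model alias you have configured
response = model.prompt(PROMPT)
svg = response.text()  # the raw SVG markup produced by the model

# Save the output; opening the file in a browser renders the drawing,
# which is then judged by eye for pelican, bicycle, and rider pose.
with open("pelican.svg", "w") as f:
    f.write(svg)
```

Because the output is plain markup rather than pixels, the same one-prompt harness works for any text model, and the failure modes described below are visible the moment the file opens in a browser.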
For journalists covering AI capabilities, the pelicans on a bicycle test offers a concrete and visually intuitive way to compare AI models. The task demands that a model understand the anatomy of both a bird and a machine, reason about how they would physically relate to each other in space, and render that understanding into a working output. When a model fails — producing a bicycle with no rider, a blob that resembles neither animal nor vehicle, or an image that simply ignores half the prompt — that failure is immediately visible to anyone, no technical background required.
The benchmark became surprisingly influential as a quick sanity check on new model releases, gaining enough visibility to be referenced in a Google I/O keynote and an Anthropic research paper. It sits alongside more formal benchmarks as evidence that official test scores don't always capture whether a model can handle creative, compositional, or spatially complex requests — the kinds of tasks that often trip up AI in the real world.
Willison said, "I started my pelicans on a bicycle benchmark as a joke, but it's actually starting to become a bit useful," and predicted it would remain a useful way to evaluate models for some time. — GIGAZINE
The pelican on a bicycle test "sounds ridiculous, but it actually tests spatial reasoning, anatomy knowledge, and whether AI can draw a bike frame — apparently very hard." — News1