Multimodal LLMs Demonstrate Ability to Identify Public Figures in Images
By
minimaxir
10mo ago· 9 min readenInsight
100/100
Golden Brown
Bagelometer↗
Hand-rolled, kettle-boiled, baked to perfection. Worth every minute at the bakery.
Score100TypeanalysisSentimentpositive
Summary
The article discusses the use of multimodal LLMs to identify public figures in images, using President Barack Obama as an example. It highlights the capabilities of LLMs like Gemini in recognizing famous individuals, contrasting it with other models like ChatGPT and Claude.
Key quotes
· 3 pulledI’ve been working on a pipeline for representing an image as semantic structured data using multimodal LLMs for better image categorization, tagging, and searching.
It would be weird if an LLM couldn’t identify Obama from this picture.
ChatGPT and Claude won’t, but Gemini will.
ChatGPT and Claude won’t, but Gemini will.
