Developer Enables Vision Capabilities for Local LLMs Using Google Lens and OpenCV
By vkaufmann · 3mo ago · 1 min read · News
Summary
A developer created an MCP server that enables local LLMs like GPT-OSS-120B to perform Google searches and gain vision capabilities without API keys. The system uses OpenCV to detect objects in images, crops them, and sends them to Google Lens for identification. The text-only GPT-OSS-120B model successfully identified specific hardware items (NVIDIA DGX Spark and SanDisk USB drive) from a desk photo using this vision enhancement.
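The detect, crop, and identify flow described in the summary can be sketched roughly as follows. This is a dependency-free illustration under stated assumptions, not the project's code: a flood-fill search over a small grayscale grid stands in for the OpenCV detection step, and `identify` is a hypothetical callback standing in for the Google Lens lookup. All function names here are invented for the example.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]          # grayscale rows; 0 is treated as background
Box = Tuple[int, int, int, int]  # (x, y, width, height)


def find_object_boxes(img: Image, threshold: int = 1, min_area: int = 2) -> List[Box]:
    """Return bounding boxes of 4-connected regions with pixel value >= threshold.

    Stand-in for the OpenCV detection step (assumption: the real tool may use
    a different detection method entirely).
    """
    height = len(img)
    width = len(img[0]) if img else 0
    seen = [[False] * width for _ in range(height)]
    boxes: List[Box] = []
    for y in range(height):
        for x in range(width):
            if seen[y][x] or img[y][x] < threshold:
                continue
            # Flood-fill one region while tracking its extent.
            stack = [(x, y)]
            seen[y][x] = True
            x0 = x1 = x
            y0 = y1 = y
            while stack:
                cx, cy = stack.pop()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and not seen[ny][nx] and img[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        stack.append((nx, ny))
            box_w, box_h = x1 - x0 + 1, y1 - y0 + 1
            if box_w * box_h >= min_area:  # drop tiny specks
                boxes.append((x0, y0, box_w, box_h))
    return boxes


def crop(img: Image, box: Box) -> Image:
    """Cut the boxed region out of the image."""
    x, y, w, h = box
    return [row[x:x + w] for row in img[y:y + h]]


def detect_and_identify(img: Image, identify: Callable[[Image], str]) -> List[Tuple[Box, str]]:
    """Crop every detected region and pass each crop to `identify`.

    `identify` is a hypothetical placeholder for the Google Lens request.
    """
    return [(box, identify(crop(img, box))) for box in find_object_boxes(img)]


if __name__ == "__main__":
    desk = [
        [0, 0, 0, 0, 0, 0],
        [0, 9, 9, 0, 0, 0],
        [0, 9, 9, 0, 7, 0],
        [0, 0, 0, 0, 7, 0],
        [0, 0, 0, 0, 0, 0],
    ]
    # A dummy identifier that just reports the crop size.
    for box, label in detect_and_identify(desk, lambda c: f"{len(c[0])}x{len(c)} crop"):
        print(box, label)
```

The point of the per-object crop is that each region is looked up on its own, so a text-only model receives one label per item rather than a single description of the whole scene.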
Key quotes
I built an MCP server that gives any local LLM real Google search and now vision capabilities - no API keys needed.
The latest feature: google_lens_detect uses OpenCV to find objects in an image, crops each one, and sends them to Google Lens for identification.
GPT-OSS-120B, a text-only model with zero vision support, correctly identified an NVIDIA DGX Spark and a SanDisk USB drive from a desk photo.
