More Like This Search: From Keyword Matching to Embeddings and Vector Search
By Sergey Nikolaev
Summary
This article explores the evolution of "More Like This" (MLT) search functionality, which allows users to find documents similar to a selected starting document rather than starting from an empty query box. It contrasts the classic approach (relying on similar words and term matching) with the modern approach (using embeddings and nearest-vector search). The article explains the use cases for each approach, discusses what production systems need to consider when implementing MLT, and provides guidance on when each method is most appropriate.
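The contrast between the two approaches can be illustrated with a minimal sketch. It is not the article's implementation: the documents, the `more_like_this` helper, and the three-dimensional `toy_embeddings` are all hypothetical and hand-written for illustration. A real system would obtain embeddings from a trained model and would usually weight terms (for example with TF-IDF) rather than use raw counts.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(key, 0.0) for key, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def more_like_this(seed_id, vectors, k=2):
    """Return the ids of the k documents most similar to the seed document."""
    seed = vectors[seed_id]
    scored = [
        (cosine(seed, vec), doc_id)
        for doc_id, vec in vectors.items()
        if doc_id != seed_id
    ]
    # Highest score first; ties broken by document id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical documents, made up for this sketch.
docs = {
    "a": "cheap laptop for students",
    "b": "budget notebook computer for college",
    "c": "cheap laptop bag",
    "d": "garden hose repair kit",
}

# Classic approach: compare documents by the words they share.
# Raw term counts stand in for a weighted term vector.
term_vectors = {doc_id: Counter(text.lower().split()) for doc_id, text in docs.items()}

# Modern approach: compare documents by embedding vectors.
# ASSUMPTION: these numbers are invented by hand to mimic what a trained
# embedding model might produce; they are not real model output.
toy_embeddings = {
    "a": [0.90, 0.10, 0.00],
    "b": [0.85, 0.15, 0.00],
    "c": [0.50, 0.80, 0.00],
    "d": [0.00, 0.10, 0.95],
}
embedding_vectors = {doc_id: dict(enumerate(vec)) for doc_id, vec in toy_embeddings.items()}

by_terms = more_like_this("a", term_vectors)
by_embeddings = more_like_this("a", embedding_vectors)
```

With "a" as the starting document, the term-based ranking puts "c" first because it shares the words "cheap" and "laptop", while the toy embeddings put "b" first even though it shares almost no vocabulary with "a". The exhaustive scan over all vectors is used here only for brevity; a nearest-vector search in production would typically rely on an index rather than compare the seed against every document.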
Key quotes
In many search scenarios, the user does not start from an empty query box, but from an existing result.
The classic approach relies on similar words; the modern approach uses embeddings and nearest-vector search.
This article explains where each approach is useful and what production systems need to consider.