More Like This Search: From Keyword Matching to Embeddings and Vector Search

By Sergey Nikolaev

15h ago · 9 min read

Summary

This article explores the evolution of "More Like This" (MLT) search functionality, which allows users to find documents similar to a selected starting document rather than starting from an empty query box. It contrasts the classic approach (relying on similar words and term matching) with the modern approach (using embeddings and nearest-vector search). The article explains the use cases for each approach, discusses what production systems need to consider when implementing MLT, and provides guidance on when each method is most appropriate.
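To make the contrast concrete, here is a minimal sketch of both approaches on a toy corpus. It is not the article's implementation: the documents, the ids, the TF-IDF weighting, and the hand-written three-dimensional "embeddings" are all assumptions chosen for illustration. A real system would get its vectors from an embedding model and would typically use an index rather than a linear scan.

```python
import math
from collections import Counter

# Toy corpus; documents and ids are invented for illustration.
DOCS = {
    "a": "vector search with embeddings for similar documents",
    "b": "keyword matching finds documents with similar words",
    "c": "recipe for sourdough bread with rye flour",
}

# Hypothetical dense vectors standing in for real model embeddings.
EMBEDDINGS = {
    "a": [0.9, 0.1, 0.0],
    "b": [0.8, 0.3, 0.1],
    "c": [0.0, 0.2, 0.9],
}


def tfidf_vectors(docs):
    """Classic side: weight each term by frequency and rarity in the corpus."""
    tokens = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    doc_freq = Counter(t for words in tokens.values() for t in set(words))
    n_docs = len(docs)
    vectors = {}
    for doc_id, words in tokens.items():
        term_freq = Counter(words)
        vectors[doc_id] = {
            t: tf * (math.log(n_docs / doc_freq[t]) + 1.0)
            for t, tf in term_freq.items()
        }
    return vectors


def cosine_sparse(u, v):
    """Cosine similarity between two {term: weight} dictionaries."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0


def cosine_dense(u, v):
    """Cosine similarity between two equal-length lists of floats."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def more_like_this(seed_id, vectors, similarity, k=2):
    """Rank every other document by similarity to the seed document."""
    seed = vectors[seed_id]
    scored = [
        (similarity(seed, vec), doc_id)
        for doc_id, vec in vectors.items()
        if doc_id != seed_id
    ]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)[:k]]


if __name__ == "__main__":
    print("term-based:", more_like_this("a", tfidf_vectors(DOCS), cosine_sparse))
    print("vector-based:", more_like_this("a", EMBEDDINGS, cosine_dense))
```

Both calls share the same `more_like_this` function; only the document representation and the similarity function change. The term-based variant can only reward documents that share literal words with the seed, while the vector variant depends entirely on the quality of the embeddings it is given.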

Key quotes

In many search scenarios, the user does not start from an empty query box, but from an existing result.
The classic approach relies on similar words; the modern approach uses embeddings and nearest-vector search.
This article explains where each approach is useful and what production systems need to consider.
Snippet from the RSS feed
More Like This lets search start from a selected document instead of a new query. The classic approach relies on similar words; the modern approach uses embeddings and nearest-vector search. This article explains where each approach is useful and what pro
