Empirical Study Finds Grep Outperforms Vector Retrieval in LLM Agentic Search Systems

[Submitted on 14 May 2026]

Summary

This paper presents an empirical study comparing grep-based retrieval with vector retrieval in LLM agentic search systems. Using a 116-question sample from LongMemEval, the study tests both retrieval strategies across multiple agent harnesses (Chronos, Claude Code, Codex, Gemini CLI) and two tool-calling paradigms (inline vs. file-based results). Experiment 1 finds that grep generally yields higher accuracy than vector retrieval, though overall scores depend heavily on the harness and tool-calling style used, even when the underlying conversation data are the same. Experiment 2 examines how performance degrades when irrelevant conversation history is mixed in, comparing grep-only and vector-only retrieval under increasing distraction.

Key quotes


grep generally yields higher accuracy than vector retrieval in our comparisons in experiment 1

overall scores still depend strongly on which harness and tool-calling style is used, even when the underlying conversation data are the same

existing literature lacks a systematic comparison of how retrieval strategy choice interacts with agent architecture and tool-calling paradigm

how tool outputs are presented to the model and how performance changes when searches must cope with more irrelevant surrounding text, remain under-explored in agent loops

Snippet from the RSS feed

Recent advances in Large Language Model (LLM) agents have enabled complex agentic workflows where models autonomously retrieve information, call tools, and reason over large corpora to complete tasks on behalf of users. Despite the growing adoption of ret
