
Direct Corpus Interaction: A New Retrieval Paradigm for Agentic Search Without Embedding Models

By 44za12 · 19d ago · 2 min read

Summary

This research paper introduces Direct Corpus Interaction (DCI), a novel approach to retrieval for agentic search that bypasses traditional embedding models, vector indexes, and retrieval APIs. Instead, DCI allows language agents to interact directly with raw corpora using general-purpose terminal tools like grep, file reads, shell commands, and lightweight scripts. The authors argue that conventional retrieval systems (lexical or semantic) compress access into a single top-k retrieval step before reasoning, which becomes a bottleneck for agentic tasks requiring exact lexical constraints, sparse clue conjunctions, multi-step hypothesis refinement, and intermediate entity discovery. Their experiments across IR benchmarks and end-to-end agentic search tasks show DCI substantially outperforms strong sparse, dense, and reranking baselines on several BRIGHT and BEIR datasets, and achieves strong accuracy on BrowseComp-Plus and multi-hop QA without relying on any conventional semantic retriever.
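As a rough illustration of what direct interaction with a raw corpus could look like, here is a minimal Python sketch. It is not the authors' implementation: the function names (`grep`, `files_matching_all`, `demo`), the toy corpus, and the regex that stands in for the agent's reasoning step are all assumptions made for this example. It shows two of the behaviors the summary names, a conjunction of sparse clues and the discovery of an intermediate entity that drives a follow-up search, with no index or embedding model involved.

```python
import re
import tempfile
from pathlib import Path


def grep(corpus_dir, pattern, ignore_case=True):
    """Return (file name, line number, line) for each line matching the regex."""
    rx = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    hits = []
    for path in sorted(Path(corpus_dir).rglob("*.txt")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, 1):
            if rx.search(line):
                hits.append((path.name, lineno, line.strip()))
    return hits


def files_matching_all(corpus_dir, patterns):
    """Conjunction of sparse clues: names of files that contain every pattern."""
    sets = [{name for name, _, _ in grep(corpus_dir, p)} for p in patterns]
    return sorted(set.intersection(*sets)) if sets else []


def demo():
    """Two-step search over a toy corpus (hypothetical data)."""
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        (root / "a.txt").write_text(
            "The lighthouse was built in 1887 by Mara Voss.\n", encoding="utf-8")
        (root / "b.txt").write_text(
            "A lighthouse on the north shore burned down.\n", encoding="utf-8")
        (root / "c.txt").write_text(
            "Mara Voss later founded the Harbor Trust.\n", encoding="utf-8")

        # Step 1: combine two weak clues with exact lexical matching.
        step1 = files_matching_all(root, [r"lighthouse", r"\b1887\b"])

        # Step 2: read the hit and pull out an intermediate entity.
        # A real agent would reason over the text; this regex is a stand-in.
        text = (root / step1[0]).read_text(encoding="utf-8")
        entity = re.search(r"by ([A-Z]\w+ [A-Z]\w+)", text).group(1)

        # Step 3: search again for the newly discovered entity.
        step2 = [name for name, _, _ in grep(root, re.escape(entity))
                 if name not in step1]
        return step1, entity, step2


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is the shape of the loop: each search result is read in full and can change the next query, rather than everything passing through one fixed top-k call.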

Key quotes

Modern retrieval systems, whether lexical or semantic, expose a corpus through a fixed similarity interface that compresses access into a single top-k retrieval step before reasoning.
Agentic tasks further exacerbate this limitation because they require agents to orchestrate multiple steps, including discovering intermediate entities, combining weak clues, and revising the plan after observing partial evidence.
This approach requires no offline indexing and adapts naturally to evolving local corpora.
Our results indicate that as language agents become stronger, retrieval quality depends not only on reasoning ability but also on the resolution of the interface through which the model interacts with the corpus.
DCI opens a broader interface-design space for agentic search.
