Chroma Context-1: A 20B Parameter Agentic Search Model for Multi-Hop Retrieval

philip1209

2mo ago · 53 min read · en · Insight

Score: 100 · Type: analysis · Sentiment: positive

Summary

Chroma Context-1 is a 20B parameter agentic search model, derived from gpt-oss-20B, designed to improve retrieval-augmented generation (RAG) systems. Unlike traditional single-pass retrieval pipelines, it performs multi-hop search: it decomposes a query into subqueries, iteratively searches a corpus, and selectively edits its own context to free capacity for further exploration. The model achieves retrieval performance comparable to frontier-scale LLMs at a fraction of the cost and with up to 10x faster inference, which makes it suitable as a subagent alongside frontier reasoning models.
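To make the single-pass versus multi-hop contrast concrete, here is a minimal sketch. It is not Chroma's implementation: the corpus is invented, `search` is a toy word-overlap ranker, and reusing the retrieved document as the next query is a crude stand-in for the subquery generation the model is trained to do.

```python
import re

# Toy corpus; the contents are invented purely for illustration.
CORPUS = {
    "doc1": "The Aurora bridge was designed by engineer Mara Lindqvist.",
    "doc2": "Mara Lindqvist was born in the town of Halvik.",
    "doc3": "Halvik is known for its annual herring festival.",
}

def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def search(query, exclude=()):
    """Return the id of the unseen document with the largest word overlap, or None."""
    q = tokenize(query)
    candidates = [d for d in CORPUS if d not in exclude]
    if not candidates:
        return None
    return max(candidates, key=lambda d: len(q & tokenize(CORPUS[d])))

def multi_hop(question, hops=3):
    """Run up to `hops` searches, using each retrieved document as the next query."""
    seen = []
    query = question
    for _ in range(hops):
        doc_id = search(query, exclude=seen)
        if doc_id is None:
            break
        seen.append(doc_id)
        query = CORPUS[doc_id]  # assumption: stand-in for a generated subquery
    return seen

question = "Who designed the Aurora bridge, and what is their hometown known for?"
print(search(question))     # single pass: only the top-ranked document
print(multi_hop(question))  # multi-hop: follows the chain of documents
```

In this toy setup a single pass returns only the bridge document, while the answer to the second half of the question sits in a document that is reached only by following the intermediate one.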

Key quotes

This approach, broadly known as retrieval-augmented-generation (RAG), has traditionally relied on single-stage retrieval pipelines composed of vector search, lexical search, or regular expression matching, optionally followed by a learned reranker.

In practice, many real-world queries require multi-hop retrieval, in which the output of one search informs the next.

We introduce Chroma Context-1, a 20B parameter agentic search model derived from gpt-oss-20B that achieves retrieval performance comparable to frontier-scale LLMs at a fraction of the cost and up to 10x faster inference speed.

The model is trained to decompose queries into subqueries, iteratively search a corpus, and selectively edit its own context to free capacity for further exploration.
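The last quote mentions editing the context to free capacity. The source excerpt does not say how the model decides what to discard, so the following is only a sketch under stated assumptions: each retrieved chunk carries a hypothetical relevance `score`, tokens are counted by whitespace, and the lowest-scoring chunks are dropped until the context fits a budget.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    score: float  # hypothetical relevance score; not specified by the source

def token_count(text):
    """Crude whitespace token count, used only for illustration."""
    return len(text.split())

def prune_context(context, budget):
    """Drop the lowest-scoring chunks until the total token count fits the budget."""
    kept = sorted(context, key=lambda c: c.score, reverse=True)
    while kept and sum(token_count(c.text) for c in kept) > budget:
        kept.pop()  # the last element is the current lowest-scoring chunk
    survivors = {id(c) for c in kept}
    # Preserve the original retrieval order of the surviving chunks.
    return [c for c in context if id(c) in survivors]

context = [
    Chunk("a b c", 0.2),
    Chunk("d e", 0.9),
    Chunk("f g h i", 0.5),
]
pruned = prune_context(context, budget=6)
print([c.text for c in pruned])
```

A learned policy would replace the fixed score-and-threshold rule used here; the point is only that discarding low-value chunks leaves room for results from later searches.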

Snippet from the RSS feed

Retrieval pipelines typically operate in a single pass, which poses a problem when the information required to answer a question is spread across multiple documents or requires intermediate reasoning to locate. In practice, many real-world queries require
