KaLM-Reranker-V1: A Decoupled Encoder-Decoder Reranker for Efficient Document Retrieval

[Submitted on 22 Jun 2026]

Summary

KaLM-Reranker-V1 is a new reranking model for retrieval systems that decouples query and passage computation using an encoder-decoder architecture. Most existing rerankers jointly encode the query and passage, which tightly couples their computation. In this model the encoder pre-encodes passages with Matryoshka embedding pooling, while the decoder handles the query. Cross-attention then captures relevance between the query and passage representations.

The model comes in three sizes: Nano (0.27B parameters), Small (1B), and Large (4B). On BEIR it reports state-of-the-art performance, on par with strong industrial models such as the Qwen3-Reranker series, and the authors attribute its efficiency to the decoupled passage encoding. On LMEB, even the 0.27B Nano model remains competitive with 7-12B embedding models.
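The split between offline passage encoding and query-time cross-attention can be illustrated with a toy sketch. This is not the paper's implementation: the embedding table, the function names (`encode_passage`, `cross_attention_score`, `rerank`), and the single-vector query are all assumptions made for illustration, and a lookup table stands in for the real encoder.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def encode_passage(tokens, table):
    """Offline step: turn passage tokens into vectors (stand-in for the encoder)."""
    return [table[t] for t in tokens]

def cross_attention_score(query_vec, passage_states):
    """Query-time step: attend from one query vector over cached passage states."""
    d = len(query_vec)
    logits = [dot(query_vec, h) / math.sqrt(d) for h in passage_states]
    weights = softmax(logits)
    # Attention-weighted summary of the passage, conditioned on the query.
    context = [sum(w * h[i] for w, h in zip(weights, passage_states))
               for i in range(d)]
    return dot(query_vec, context)

# Hypothetical 2-dimensional embeddings, chosen only to make the toy readable.
TABLE = {"cat": [1.0, 0.0], "dog": [0.0, 1.0], "pet": [0.7, 0.7]}
PASSAGES = {"p_cat": ["cat", "pet"], "p_dog": ["dog", "pet"]}

# Passages are encoded once, before any query arrives.
CACHE = {pid: encode_passage(toks, TABLE) for pid, toks in PASSAGES.items()}

def rerank(query_vec, cache):
    """Score every cached passage against the query and sort best-first."""
    scores = {pid: cross_attention_score(query_vec, states)
              for pid, states in cache.items()}
    return sorted(scores, key=scores.get, reverse=True)

print(rerank([1.0, 0.0], CACHE))
```

The point of the sketch is only the control flow: `CACHE` is built without seeing any query, and `rerank` touches nothing but the cached states and the query vector.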

Source

KaLM-Reranker-V1: A Decoupled Encoder-Decoder Reranker for Efficient Document Retrieval (arxiv.org)

Key quotes

We present KaLM-Reranker-V1, a fast but not late-interaction (FBNL) reranker that decouples query and passage computation while retaining expressive relevance modeling.
This design makes KaLM-Reranker-V1 efficient through decoupled passage encoding, yet not late interaction, by preserving rich relevance modeling through cross-attention.
On BEIR, KaLM-Reranker-V1 achieves state-of-the-art performance, on par with strong industrial models such as the Qwen3-Reranker series.
On LMEB, reranking models demonstrate a clear advantage, with even the 0.27B Nano model remaining competitive with 7-12B embedding models.
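The efficiency claim in these quotes can be made concrete with a back-of-the-envelope count of how often passage text has to pass through a model. The sketch below is an assumption-laden illustration, not a figure from the paper: the function names and the workload numbers are hypothetical, and it counts only passage encodings, ignoring the per-pair cross-attention that a decoupled reranker still performs at query time.

```python
def joint_passage_encodings(num_queries, candidates_per_query):
    """A joint reranker re-encodes each candidate passage together with each query."""
    return num_queries * candidates_per_query

def decoupled_passage_encodings(corpus_size):
    """A decoupled reranker encodes each passage once, independent of any query."""
    return corpus_size

# Hypothetical workload: 1,000 queries, 100 candidates each, 10,000-passage corpus.
joint = joint_passage_encodings(1_000, 100)
decoupled = decoupled_passage_encodings(10_000)
print(joint, decoupled)
```

Under these assumed numbers the joint design performs 100,000 passage encodings at query time, while the decoupled design performs 10,000 ahead of time. The advantage depends on the workload: for a large corpus that receives few queries, pre-encoding everything can cost more than encoding candidates on demand.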
Snippet from the RSS feed
As retrieval systems scale, high-quality reranking becomes increasingly important. However, most existing rerankers, whether encoder-based or decoder-based, jointly encode the query and passage, tightly coupling their computation and limiting deployment e
