UniFormer: A Unified Model-Centric Scaling Framework for Industrial Recommendation Systems

[Submitted on 25 Jun 2026]

Summary

This paper introduces UniFormer, a unified model-centric scaling framework from Kuaishou for industrial recommender systems. Unlike prior component-centric approaches, which scale individual modules independently, UniFormer decomposes the modeling space into a feature space and a task space, each handled by its own stack of interaction modules. A semantic-based tokenization scheme decouples user and item inputs, which enables request-level inference acceleration. To prevent preference collapse, multi-sequence cross-attention captures heterogeneous behavior patterns separately, and self-attention then strengthens interaction modeling. Multi-view FFNs support flexible parameter scaling across modeling components. Online A/B testing in two production scenarios (Kuaishou and Kuaishou Lite) showed consistent improvements in user engagement metrics, including +0.101%/+0.260% in App Stay Time and +0.729%/+1.113% in Watch Time.
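
The request-level acceleration that user-item decoupling makes possible can be pictured with a minimal sketch. It assumes the user-side computation does not depend on any candidate item, so it can run once per request and be reused for every candidate. The names `encode_user`, `score_item`, and `rank_request`, the normalization step, and the dot-product scoring are all hypothetical stand-ins, not the paper's architecture.

```python
import math


def encode_user(user_features):
    """Hypothetical user-side encoder: here just L2 normalization."""
    norm = math.sqrt(sum(x * x for x in user_features)) or 1.0
    return [x / norm for x in user_features]


def score_item(user_vec, item_vec):
    """Hypothetical lightweight item-side scoring: a dot product."""
    return sum(u * v for u, v in zip(user_vec, item_vec))


def rank_request(user_features, candidates):
    """Rank all candidate items for one request.

    The user representation is computed once and shared by every
    candidate, instead of being recomputed per (user, item) pair.
    """
    user_vec = encode_user(user_features)  # once per request
    scores = {item_id: score_item(user_vec, vec)
              for item_id, vec in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    items = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.6, 0.8]}
    print(rank_request([3.0, 4.0], items))
```

With N candidates per request, the saving in this sketch comes from paying the user-side cost once rather than N times; how UniFormer's semantic-based tokenization achieves the decoupling is not described in the summary.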

Source

UniFormer: A Unified Model-Centric Scaling Framework for Industrial Recommendation Systems (arxiv.org)

Key quotes

UniFormer decomposes the overall modeling space into feature and task spaces, which are modeled by stacked Feature-space Interaction Modules and Task-space Interaction Modules, respectively.
UniFormer introduces semantic-based tokenization scheme to enable user-item decoupling, thereby achieving request-level inference acceleration.
To prevent preference collapse, UniFormer employs multi-sequence cross-attention to separately capture heterogeneous behavior patterns, followed by the self-attention to enhance interaction modeling.
Extensive online A/B testing in two production scenarios, Kuaishou and Kuaishou Lite, shows that UniFormer consistently improves user engagement and interaction metrics.
Dedicated multi-view FFNs are introduced to support flexible and scalable parameter scaling across different modeling components.
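
The quoted pattern of multi-sequence cross-attention followed by self-attention can be sketched roughly as below. This is an illustration under stated assumptions, not the paper's module: it uses plain scaled dot-product attention with no learned projections or multiple heads, and concatenating the per-sequence outputs before self-attention is an assumption. The function names and the example sequence names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention without learned projections."""
    d = len(keys[0])
    out = []
    for q in queries:
        logits = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(logits)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def multi_sequence_block(query_tokens, behavior_sequences):
    """Cross-attend to each behavior sequence separately, then mix.

    Each sequence gets its own attention pass, so a long or dominant
    sequence cannot absorb the attention weight of the others.
    """
    per_sequence = []
    for seq in behavior_sequences.values():
        per_sequence.extend(attend(query_tokens, seq, seq))
    # Assumed combination step: self-attention over all outputs.
    return attend(per_sequence, per_sequence, per_sequence)


if __name__ == "__main__":
    sequences = {
        "clicks": [[1.0, 0.0], [0.0, 1.0]],
        "likes": [[0.0, 1.0]],
    }
    print(multi_sequence_block([[1.0, 0.0]], sequences))
```

The design point in this sketch is that the softmax is taken within each sequence rather than across one merged history, which is one way to read "separately capture heterogeneous behavior patterns".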
Snippet from the RSS feed
Recently, substantial progress has been made in industrial recommendation through component-centric model scaling, where individual components such as behavior modeling, feature interaction, or task modeling are independently scaled to improve model capac
