LongCat-Flash: Meituan's 560B-Parameter MoE Language Model with Dynamic Computation and Open-Source Release

[Submitted on 1 Sep 2025 (v1), last revised 19 Sep 2025 (this version, v2)]

Summary

Meituan's LongCat Team introduces LongCat-Flash, a 560-billion-parameter Mixture-of-Experts (MoE) language model designed for computational efficiency and advanced agentic capabilities. Key innovations include Zero-computation Experts (dynamic computational budget allocation activating 18.6B-31.3B parameters per token) and Shortcut-connected MoE (improving computation-communication overlap for inference efficiency). The model was trained on over 20 trillion tokens within 30 days, achieves over 100 tokens per second inference at $0.70 per million output tokens, and is open-sourced. It demonstrates competitive performance as a non-thinking foundation model with particular strengths in agentic tasks.

Source

Twitter / X: LongCat-Flash: Meituan's 560B-Parameter MoE Language Model with Dynamic Computation and Open-Source Release (arxiv.org)

Key quotes

LongCat-Flash adopts two novel designs: (a) Zero-computation Experts, which enables dynamic computational budget allocation and activates 18.6B-31.3B (27B on average) per token depending on contextual demands, optimizing resource usage.
We complete model training on more than 20 trillion tokens within 30 days, while achieving over 100 tokens per second (TPS) for inference at a cost of $0.70 per million output tokens.
As a non-thinking foundation model, LongCat-Flash delivers highly competitive performance among other leading models, with exceptional strengths in agentic tasks.
The model checkpoint of LongCat-Flash is open-sourced to foster community research.
We develop a comprehensive scaling framework for large models that combines hyperparameter transfer, model-growth initialization, a multi-pronged stability suite, and deterministic computation to achieve stable and reproducible training.
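The figures quoted above lend themselves to some quick back-of-the-envelope arithmetic. The sketch below uses only numbers from the quotes (560B total parameters, 18.6B to 31.3B activated with 27B on average, 100 tokens per second, $0.70 per million output tokens). It assumes, as a simplification, that throughput is exactly 100 tokens per second for a single generation stream; the function names are hypothetical.

```python
TOTAL_PARAMS_B = 560.0          # total parameters, in billions
ACTIVE_PARAMS_B = {"min": 18.6, "avg": 27.0, "max": 31.3}  # activated per token
TOKENS_PER_SECOND = 100         # lower bound from the quote ("over 100 TPS")
USD_PER_MILLION_TOKENS = 0.70   # quoted output-token price


def active_fraction(active_b, total_b=TOTAL_PARAMS_B):
    """Share of all parameters that are activated for one token."""
    return active_b / total_b


def tokens_per_hour(tps):
    """Tokens generated in one hour at a constant rate of tps tokens per second."""
    return tps * 3600


def cost_usd(num_tokens, usd_per_million=USD_PER_MILLION_TOKENS):
    """Price of num_tokens output tokens at the quoted per-million rate."""
    return num_tokens / 1_000_000 * usd_per_million


if __name__ == "__main__":
    for label, active in ACTIVE_PARAMS_B.items():
        print(f"{label}: {active_fraction(active):.2%} of parameters active per token")
    hourly_tokens = tokens_per_hour(TOKENS_PER_SECOND)
    print(f"{hourly_tokens} tokens per hour, costing ${cost_usd(hourly_tokens):.3f}")
```

Under these assumptions, roughly 3.3% to 5.6% of the parameters (about 4.8% on average) are active for any given token, and one hour of continuous generation yields 360,000 tokens at a cost of about $0.25.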
