LongCat-Flash: Meituan's 560B-Parameter MoE Language Model with Dynamic Computation and Open-Source Release
By
[Submitted on 1 Sep 2025 (v1), last revised 19 Sep 2025 (this version, v2)]
Summary
Meituan's LongCat Team introduces LongCat-Flash, a 560-billion-parameter Mixture-of-Experts (MoE) language model designed for computational efficiency and advanced agentic capabilities. Key innovations include Zero-computation Experts (dynamic computational budget allocation activating 18.6B-31.3B parameters per token) and Shortcut-connected MoE (improving computation-communication overlap for inference efficiency). The model was trained on over 20 trillion tokens within 30 days, achieves over 100 tokens per second inference at $0.70 per million output tokens, and is open-sourced. It demonstrates competitive performance as a non-thinking foundation model with particular strengths in agentic tasks.
Source
Key quotes
· 5 pulledLongCat-Flash adopts two novel designs: (a) Zero-computation Experts, which enables dynamic computational budget allocation and activates 18.6B-31.3B (27B on average) per token depending on contextual demands, optimizing resource usage.
We complete model training on more than 20 trillion tokens within 30 days, while achieving over 100 tokens per second (TPS) for inference at a cost of $0.70 per million output tokens.
As a non-thinking foundation model, LongCat-Flash delivers highly competitive performance among other leading models, with exceptional strengths in agentic tasks.
The model checkpoint of LongCat-Flash is open-sourced to foster community research.
We develop a comprehensive scaling framework for large models that combines hyperparameter transfer, model-growth initialization, a multi-pronged stability suite, and deterministic computation to achieve stable and reproducible training.
You might also wanna read
DeepSeek-V4 Series Preview: Million-Token Context MoE Models with 1.6T Parameters
DeepSeek introduces the V4 series of Mixture-of-Experts (MoE) language models, including DeepSeek-V4-Pro (1.6T parameters, 49B activated) an
StepFun Releases Step 3.5 Flash: 196B Sparse MoE Model for OpenClaw Agents
StepFun has released Step 3.5 Flash, a 196B sparse Mixture of Experts (MoE) model that activates only 11B parameters per token for high effi
GLM-4.7-Flash: Z.ai's 30B-A3B MoE Model for Lightweight AI Deployment
GLM-4.7-Flash is a 30B-A3B Mixture of Experts (MoE) model developed by Z.ai, positioned as the strongest model in the 30B parameter class. T
Consistency Diffusion Language Models Achieve 14x Faster Inference Through KV Caching and Step Reduction
Consistency Diffusion Language Models (CDLM) represent a breakthrough in language model architecture that addresses key limitations of stand
Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity

Building high-performance expert-parallel dispatch and combine kernels for MoE LLM inference
This article provides a deep technical deep-dive into the architecture and implementation of high-performance Expert Parallelism (EP) kernel

Comments
Sign in to join the conversation.
No comments yet. Be the first.