LongCat-Flash: Meituan's 560B-Parameter MoE Language Model with Dynamic Computation and Open-Source Release

[Submitted on 1 Sep 2025 (v1), last revised 19 Sep 2025 (this version, v2)]

Summary

Meituan's LongCat Team introduces LongCat-Flash, a 560-billion-parameter Mixture-of-Experts (MoE) language model designed for computational efficiency and advanced agentic capabilities. Its two key design innovations are Zero-computation Experts, which allocate a dynamic computational budget by activating 18.6B-31.3B parameters per token (27B on average) depending on contextual demands, and Shortcut-connected MoE, which improves computation-communication overlap for inference efficiency. The model was trained on more than 20 trillion tokens within 30 days and runs inference at over 100 tokens per second at a cost of $0.70 per million output tokens. The model checkpoint is open-sourced. As a non-thinking foundation model, LongCat-Flash performs competitively with other leading models and is particularly strong on agentic tasks.
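To make the idea of a per-token computational budget concrete, here is a minimal sketch in plain Python. It rests on an assumption the summary does not spell out: that a zero-computation expert simply returns its input unchanged, so a token routed to one costs no FFN compute. The names `route_token` and `make_ffn_expert`, the toy "experts" that scale a vector, and the router logits are all hypothetical illustrations, not LongCat-Flash's actual implementation.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def make_ffn_expert(scale):
    """Hypothetical stand-in for a real FFN expert: scales every element."""
    def expert(x):
        return [scale * v for v in x]
    return expert

def route_token(x, router_logits, ffn_experts, num_zero_experts, top_k):
    """Route one token to its top-k experts.

    Slots [0, len(ffn_experts)) are FFN experts; the remaining
    num_zero_experts slots are zero-computation experts, ASSUMED here
    to be identity maps. Returns (output, number of FFN experts run).
    """
    n_ffn = len(ffn_experts)
    assert len(router_logits) == n_ffn + num_zero_experts
    probs = softmax(router_logits)
    # Pick the k highest-probability slots for this token.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    out = [0.0] * len(x)
    ffn_used = 0
    for i in chosen:
        if i < n_ffn:
            y = ffn_experts[i](x)  # real compute happens only here
            ffn_used += 1
        else:
            y = x  # zero-computation expert: pass the input through
        for d in range(len(x)):
            out[d] += probs[i] * y[d]
    return out, ffn_used

# Four toy FFN experts plus two zero-computation slots, top-2 routing.
ffn_experts = [make_ffn_expert(s) for s in (0.5, 1.5, 2.0, -1.0)]
token = [1.0, 2.0, 3.0]
easy_logits = [0.1, 0.0, 0.2, 0.1, 3.0, 2.5]   # router prefers the zero slots
mixed_logits = [3.0, 0.0, 0.1, 0.0, 2.5, 0.1]  # one FFN expert, one zero slot
hard_logits = [3.0, 2.5, 0.1, 0.0, 0.2, 0.1]   # router prefers FFN experts

for name, logits in [("easy", easy_logits), ("mixed", mixed_logits), ("hard", hard_logits)]:
    _, used = route_token(token, logits, ffn_experts, num_zero_experts=2, top_k=2)
    print(f"{name}: FFN experts run = {used}")
```

Under these assumptions the three tokens run 0, 1, and 2 FFN experts respectively, even though each selects exactly two slots. That variation is the kind of mechanism that would let activated parameters range between a lower and an upper bound, as in the 18.6B-31.3B figure quoted in the summary.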

Source

arxiv.org

Key quotes

LongCat-Flash adopts two novel designs: (a) Zero-computation Experts, which enables dynamic computational budget allocation and activates 18.6B-31.3B (27B on average) per token depending on contextual demands, optimizing resource usage.

We complete model training on more than 20 trillion tokens within 30 days, while achieving over 100 tokens per second (TPS) for inference at a cost of $0.70 per million output tokens.

As a non-thinking foundation model, LongCat-Flash delivers highly competitive performance among other leading models, with exceptional strengths in agentic tasks.

The model checkpoint of LongCat-Flash is open-sourced to foster community research.

We develop a comprehensive scaling framework for large models that combines hyperparameter transfer, model-growth initialization, a multi-pronged stability suite, and deterministic computation to achieve stable and reproducible training.
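The headline figures in these quotes can be turned into a few back-of-the-envelope numbers. The sketch below only does arithmetic on the quoted values. Because the source says "more than 20 trillion tokens", "within 30 days", and "over 100 tokens per second", the results should be read as rough bounds. Treating the 100 TPS figure as the speed of a single generation stream is an assumption, and the function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def training_tokens_per_second(total_tokens, days):
    """Average aggregate training throughput over the whole run."""
    return total_tokens / (days * SECONDS_PER_DAY)

def output_cost_per_stream_hour(tokens_per_second, usd_per_million_tokens):
    """Cost of the output one stream emits in an hour at the quoted price.

    ASSUMPTION: tokens_per_second is the rate of a single generation stream.
    """
    tokens_per_hour = tokens_per_second * 3600
    return tokens_per_hour / 1_000_000 * usd_per_million_tokens

def active_fraction(active_params_b, total_params_b):
    """Share of total parameters activated per token (both in billions)."""
    return active_params_b / total_params_b

throughput = training_tokens_per_second(20e12, 30)
hourly_cost = output_cost_per_stream_hour(100, 0.70)

print(f"Training throughput: at least ~{throughput / 1e6:.1f}M tokens/s")
print(f"Output cost per stream-hour at 100 TPS: ~${hourly_cost:.3f}")
for label, active in [("min", 18.6), ("avg", 27.0), ("max", 31.3)]:
    print(f"Active parameters ({label}): {active_fraction(active, 560):.1%} of 560B")
```

With the quoted values, the average training throughput works out to roughly 7.7 million tokens per second, one stream at 100 TPS emits about $0.25 of output per hour, and the average of 27B activated parameters is just under 5% of the 560B total.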
