All Topics
All Topics
Technology
Technology
AI
AI
Business
Business
Entertainment
Entertainment
News
News
Programming
Programming
Security
Security
Science
Science
Design
Design
Environment
Environment
Finance
Finance
Crypto
Crypto
Politics
Politics
Sports
Sports
Education
Education
Gaming
Gaming
Art
Art
Music
Music
Health
Health
Books
Books
Food
Food
Travel
Travel
Personal
Personal
Bluesky
Twitter

Ultralytics YOLO26: A Unified Real-Time Vision Model Family with NMS-Free Inference and Advanced Training Pipeline

By

[Submitted on 2 Jun 2026]

1d ago· 2 min readen

Summary

Ultralytics YOLO26 is a new family of real-time vision models that addresses key limitations of prior YOLO detectors. It introduces a dual-head design for NMS-free end-to-end inference, removes Distribution Focal Loss (DFL) for a lighter head, and incorporates a hybrid Muon-SGD optimizer (MuSGD), Progressive Loss training, and a Small Target Aware Label assignment strategy (STAL) to ensure small objects get positive label coverage. The model family spans five scales (n/s/m/l/x) and supports detection, instance segmentation, pose estimation, classification, and oriented detection in a single pipeline. YOLO26 achieves 40.9-57.5 mAP on COCO at 1.7-11.8 ms T4 TensorRT latency, advancing the accuracy-latency Pareto frontier. An open-vocabulary extension, YOLOE-26, enables text-, visual-, and prompt-free inference, reaching 40.6 AP on LVIS minival.

Source

Hacker NewsUltralytics YOLO26: A Unified Real-Time Vision Model Family with NMS-Free Inference and Advanced Training Pipelinearxiv.org

Key quotes

· 5 pulled
Real-time vision demands models that are accurate, efficient, and simple to deploy across diverse hardware.
YOLO26 uses a dual-head design for native NMS-free end-to-end inference and removes DFL entirely, yielding a lighter head with unconstrained regression range.
Its training pipeline combines MuSGD, a hybrid Muon-SGD optimizer adapted from large language model training; Progressive Loss, which shifts supervision toward the inference-time head; and STAL, a label assignment strategy that guarantees positive coverage for small objects.
Across all scales, YOLO26 achieves 40.9-57.5 mAP on COCO at 1.7-11.8 ms T4 TensorRT latency, advancing the accuracy-latency Pareto front over prior real-time detectors.
The family spans five scales (n/s/m/l/x) and supports detection, instance segmentation, pose estimation, classification, and oriented detection in a single pipeline, with an open-vocabulary extension, YOLOE-26, for text-, visual-, and prompt-free inference.
Snippet from the RSS feed
Real-time vision demands models that are accurate, efficient, and simple to deploy across diverse hardware. The YOLO family has become widely deployed for this reason, yet most YOLO detectors still rely on non-maximum suppression at inference, carry heavy

You might also wanna read

Lumos-Nexus: A Training-Efficient Two-Stage Framework for High-Fidelity Video Generation with Reasoning Capabilities

Lumos-Nexus is a training-efficient unified video generation framework that addresses the computational challenge of integrating large high-

arxiv.org·23d ago

LoomVideo: A 5B-Parameter Unified Model for Efficient Video Generation and Editing

LoomVideo is a new 5-billion parameter unified architecture for video generation and editing that addresses computational bottlenecks in exi

arxiv.org·18d ago

RTP-LLM: Alibaba's High-Performance Inference Engine for Large Language Model Deployment

This paper presents RTP-LLM, a high-performance inference engine developed by Alibaba for industrial-scale deployment of Large Language Mode

arxiv.org·25d ago

ReMoT: A Reinforcement Learning Framework Using Motion Contrast Triplets to Improve VLM Spatio-Temporal Reasoning

ReMoT (Reinforcement Learning with Motion Contrast Triplets) is a unified training paradigm designed to address spatio-temporal consistency

arxiv.org·12d ago

LK Losses: A New Training Objective to Optimize Acceptance Rate in Speculative Decoding for LLMs

This paper introduces LK losses, a novel training objective for speculative decoding in large language models (LLMs). Speculative decoding a

arxiv.org·22d ago

PromptEmbedder: A Dual-LLM Framework for Efficient, Architecture-Agnostic Text Embedding

The article presents PromptEmbedder, a novel dual-LLM framework for efficient and transferable text embedding. It addresses the bottleneck o

arxiv.org·27d ago

Comments

Sign in to join the conversation.

No comments yet. Be the first.