
Unsloth Enables Reinforcement Learning for OpenAI gpt-oss with 3x Faster Inference

By vinhnx

8mo ago · 7 min read · en · News

Summary

Unsloth has released reinforcement learning (RL) support for OpenAI's gpt-oss model. Compared with other implementations, it reports 3x faster inference (about 21 tokens/s, and about 30 tokens/s for BF16), 50% less VRAM usage, and 8x longer context length, with no accuracy degradation. Because RL on gpt-oss isn't yet compatible with vLLM, the team rewrote the Transformers inference code, and it plans to support its 50% weight sharing feature once vLLM becomes compatible with RL.

Key quotes

Unsloth now offers the fastest inference (3x faster), lowest VRAM usage (50% less) and longest context (8x longer) for gpt-oss RL vs. any implementation - with no accuracy degradation.
Since reinforcement learning (RL) on gpt-oss isn't yet vLLM compatible, we had to rewrite the inference code from Transformers code to deliver 3x faster inference for gpt-oss at ~21 tokens/s.
For BF16, Unsloth also achieves the fastest inference (~30 tokens/s), especially relative to VRAM usage, using 50% less VRAM vs. any other RL implementation.
We plan to support our 50% weight sharing feature once vLLM becomes compatible with RL.
Snippet from the RSS feed
You can now train OpenAI gpt-oss with RL and GRPO via Unsloth. Unsloth now offers the fastest inference (3x faster), lowest VRAM usage (50% less) and longest context (8x longer) for gpt-oss RL vs. any implementation - with no accuracy degradation. Since
