Unsloth Enables Reinforcement Learning for OpenAI gpt-oss with 3x Faster Inference

vinhnx

8mo ago · 7 min read · en · News

Summary

Unsloth has released reinforcement learning (RL) support for OpenAI's gpt-oss model. It claims 3x faster inference (~21 tokens/s, or ~30 tokens/s in BF16), 50% less VRAM usage, and 8x longer context than any other gpt-oss RL implementation, with no accuracy degradation. Because RL on gpt-oss isn't yet compatible with vLLM, the team rewrote the inference code from the Transformers implementation. It plans to support its 50% weight sharing feature once vLLM becomes compatible with RL.
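To put the throughput claim in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes "3x faster" means three times the token throughput, which would imply a baseline of roughly 7 tokens/s against the quoted ~21 tokens/s. The rollout size (8 completions of 512 tokens, generated one after another) is a hypothetical example rather than a figure from the article, and the helper names are illustrative.

```python
# Back-of-the-envelope arithmetic for the quoted throughput figures.
# Assumption: "3x faster" is read as 3x the tokens-per-second throughput.

def implied_baseline(tokens_per_s: float, speedup: float) -> float:
    """Throughput the comparison implementation would need for the claimed speedup."""
    return tokens_per_s / speedup


def generation_seconds(num_tokens: int, tokens_per_s: float) -> float:
    """Wall-clock seconds to generate num_tokens at a steady throughput."""
    return num_tokens / tokens_per_s


UNSLOTH_TOKENS_PER_S = 21.0  # "~21 tokens/s" from the article
CLAIMED_SPEEDUP = 3.0        # "3x faster" from the article

# Hypothetical rollout: 8 completions x 512 tokens, generated sequentially.
ROLLOUT_TOKENS = 8 * 512

baseline = implied_baseline(UNSLOTH_TOKENS_PER_S, CLAIMED_SPEEDUP)
fast = generation_seconds(ROLLOUT_TOKENS, UNSLOTH_TOKENS_PER_S)
slow = generation_seconds(ROLLOUT_TOKENS, baseline)

print(f"implied baseline: {baseline:.1f} tokens/s")
print(f"rollout at quoted speed: {fast:.0f} s, at implied baseline: {slow:.0f} s")
```

Since generation typically dominates each RL step, a throughput gap of this size would carry over almost directly into training wall-clock time under these assumptions. Batched generation would change the absolute numbers.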

Key quotes

Unsloth now offers the fastest inference (3x faster), lowest VRAM usage (50% less) and longest context (8x longer) for gpt-oss RL vs. any implementation - with no accuracy degradation.

Since reinforcement learning (RL) on gpt-oss isn't yet vLLM compatible, we had to rewrite the inference code from Transformers code to deliver 3x faster inference for gpt-oss at ~21 tokens/s.

For BF16, Unsloth also achieves the fastest inference (~30 tokens/s), especially relative to VRAM usage, using 50% less VRAM vs. any other RL implementation.

We plan to support our 50% weight sharing feature once vLLM becomes compatible with RL.

Snippet from the RSS feed

You can now train OpenAI gpt-oss with RL and GRPO via Unsloth. Unsloth now offers the fastest inference (3x faster), lowest VRAM usage (50% less) and longest context (8x longer) for gpt-oss RL vs. any implementation - with no accuracy degradation. Since
