Unsloth Enables Reinforcement Learning for OpenAI gpt-oss with 3x Faster Inference
By vinhnx
8mo ago · 7 min read · en · News
Summary
Unsloth has released reinforcement learning (RL) support for OpenAI's gpt-oss model, offering significant performance improvements including 3x faster inference (21-30 tokens/s), 50% less VRAM usage, and 8x longer context length compared to other implementations, with no accuracy degradation. The team rewrote the inference code from Transformers since gpt-oss RL isn't yet compatible with vLLM, and plans to add 50% weight sharing once vLLM compatibility is achieved.
Key quotes
Unsloth now offers the fastest inference (3x faster), lowest VRAM usage (50% less) and longest context (8x longer) for gpt-oss RL vs. any implementation - with no accuracy degradation.
Since reinforcement learning (RL) on gpt-oss isn't yet vLLM compatible, we had to rewrite the inference code from Transformers code to deliver 3x faster inference for gpt-oss at ~21 tokens/s.
For BF16, Unsloth also achieves the fastest inference (~30 tokens/s), especially relative to VRAM usage, using 50% less VRAM vs. any other RL implementation.
We plan to support our 50% weight sharing feature once vLLM becomes compatible with RL.
You can now train OpenAI gpt-oss with RL and GRPO via Unsloth. Unsloth now offers the fastest inference (3x faster), lowest VRAM usage (50% less) and longest context (8x longer) for gpt-oss RL vs. any implementation - with no accuracy degradation.
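For readers unfamiliar with GRPO (Group Relative Policy Optimization): it scores each sampled completion relative to the other completions generated for the same prompt, rather than against a separate value model. The snippet below is a minimal sketch of that normalization step only, assuming one scalar reward per completion. The function name, the `eps` constant, and the choice of population standard deviation are illustrative assumptions, not Unsloth's API or its exact formula.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one prompt's group of sampled completions.

    Hypothetical helper for illustration; real implementations may use a
    different standard-deviation estimator or epsilon.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation over the group
    # eps avoids division by zero when every completion gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions sampled for one prompt, scored by some reward function
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Completions that beat the group average get a positive advantage and the rest get a negative one. Because every training step has to sample a whole group of completions first, generation throughput plausibly accounts for much of the RL wall-clock time, which is why the inference numbers above matter.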
Since
