OpenAI Launches WebSocket Mode for Responses API to Reduce Latency by Up to 40%
By Rohan Chaubey
Summary
OpenAI has introduced WebSocket Mode for its Responses API, which keeps a persistent connection open and cuts end-to-end latency by up to 40% on AI agent workflows with heavy tool-call use. Instead of resending the full context on every agent turn, the client sends only incremental inputs, so the per-turn overhead no longer grows as the conversation gets longer.
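To make the "full context versus incremental inputs" difference concrete, here is a minimal offline sketch that only counts uploaded bytes. It does not call the Responses API or open a WebSocket, and it does not model latency. The message shape, the turn count, and the helper names `full_resend_bytes` and `incremental_bytes` are all assumptions made for illustration.

```python
import json


def full_resend_bytes(turn_inputs):
    """Bytes uploaded when every turn resends the whole history so far."""
    total = 0
    history = []
    for item in turn_inputs:
        history.append(item)
        total += len(json.dumps(history).encode("utf-8"))
    return total


def incremental_bytes(turn_inputs):
    """Bytes uploaded when each turn sends only its own new input."""
    return sum(len(json.dumps([item]).encode("utf-8")) for item in turn_inputs)


# Hypothetical workload: 20 tool-call turns, each adding about 500 bytes.
turns = [
    {"type": "tool_output", "turn": i, "output": "x" * 500}
    for i in range(20)
]

full = full_resend_bytes(turns)
incr = incremental_bytes(turns)
print(f"full resend: {full} bytes")
print(f"incremental: {incr} bytes")
print(f"ratio: {full / incr:.1f}x")
```

In this toy model the resend cost grows roughly with the square of the number of turns, while the incremental cost grows linearly, which is the sense in which the overhead "compounds". How much of that translates into latency savings depends on the real workload and is not something this sketch measures.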
Key quotes
Every agent turn, you're resending the full context. Again. That overhead compounds fast.
WebSocket Mode for the Responses API keeps a persistent connection, sends only incremental inputs, and cuts end-to-end latency by up to 40% on heavy tool-call workflows.