Agents, Workers - Agents SDK improves browser automation, code execution, and recovery
18d ago
Source
Cloudflare (cloudflare.com)

The latest release of the Agents SDK makes it easier to build agents that can safely interact with real systems and keep working through interruptions. Agents can now browse websites through Browser Run, write code against external tools through Code Mode, use client-provided tools when delegating to Think sub-agents, and recover more reliably from deploys, Durable Object evictions, and connection churn.

Safer browser automation

Agents can now use Browser Run through a single durable browser_execute tool. Instead of choosing from a fixed list of actions, the model writes code against the Chrome DevTools Protocol (CDP) and can inspect pages, capture screenshots, read rendered content, debug frontend behavior, and interact with live browser sessions.

JavaScript / TypeScript

    const browserTools = createBrowserTools({
      ctx: this.ctx,
      browser: this.env.BROWSER,
      loader: this.env.LOADER,
      session: { mode: "dynamic" },
    });

Browser sessions can be one-time, reused, or promoted from one-time to persistent during a run. This is useful when an agent needs a human to log in, complete MFA, or approve a sensitive action. The run can pause, keep the same tabs and cookies, and resume after approval.

The browser tools also add Live View URLs, optional session recording, and quick actions such as browser_markdown, browser_extract, browser_links, and browser_scrape for one-shot browsing tasks.

Resumable code execution with approvals

Code Mode now uses createCodemodeRuntime, connectors, and a durable execution log. This lets you give a model one codemode tool instead of a large prompt full of tool definitions.
The model can discover the capabilities it needs, write code against typed globals, and reuse saved snippets.

JavaScript / TypeScript

    const runtime = createCodemodeRuntime({
      ctx: this.ctx,
      executor: new DynamicWorkerExecutor({ loader: this.env.LOADER }),
      connectors: [new GithubConnector(this.ctx, this.env, connection)],
    });

    const result = streamText({
      model,
      messages,
      tools: { codemode: runtime.tool() },
    });

When the code reaches an approval-gated action, the runtime pauses execution and returns a pending approval. After approval, completed calls replay from the durable log, the approved action runs, and the same code continues. This makes it practical to build agents that create issues, update external systems, or perform other side effects without custom pause-and-resume logic for every tool.

Better Think delegation

Think sub-agents can now use client-defined tools over the RPC chat() path. A parent agent can pass tool schemas with clientTools and resolve tool calls through onClientToolCall. This lets delegated agents use caller-provided capabilities without requiring a browser WebSocket.

JavaScript / TypeScript

    await child.chat(message, callback, {
      signal,
      clientTools: [
        {
          name: "get_user_timezone",
          description: "Get the caller's timezone",
          parameters: { type: "object" },
        },
      ],
      onClientToolCall: async ({ toolName, input }) => {
        return runClientTool(toolName, input);
      },
    });

Think Workflows also improve step.prompt(). A prompt step now runs a full agentic turn before returning structured output, so the agent can call tools before producing the typed result. This makes Workflow steps more useful for durable triage, research, and approval flows.

The unified Think execute tool can also include cdp.* browser capabilities alongside state.* and tools.* when Browser Run is bound.

Voice output device selection

Voice clients can route assistant audio to a specific output device. Use outputDeviceId with useVoiceAgent, or call client.setOutputDevice() from the framework-agnostic client.

JavaScript / TypeScript

    const voice = useVoiceAgent({
      agent: "MyVoiceAgent",
      outputDeviceId: selectedSpeakerId,
    });

Browsers without speaker-selection support continue playing through the default output device and report a non-fatal outputDeviceError.

Reliability fixes

This release includes several fixes for production agents:

- useAgent and AgentClient handle WebSocket replacement more reliably during reconnects and configuration changes.
- Chat stream replay is more reliable after reconnects, deploys, and provider errors.
- Fiber recovery continues across multi-pass scans and backs off when recovery hooks keep failing.
- Agent teardown continues even when the request that started teardown is canceled.
- Large session histories use byte-budgeted reads to reduce memory pressure during startup.
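The list above says fiber recovery backs off when recovery hooks keep failing, but the release notes do not describe the actual policy. As a rough illustration of the general idea only, a capped exponential backoff might look like the sketch below. The function name backoffDelayMs and the baseMs and capMs defaults are hypothetical and are not part of the Agents SDK.

```typescript
// Hypothetical sketch of capped exponential backoff.
// This is NOT the Agents SDK's recovery policy; names and defaults are assumptions.
function backoffDelayMs(
  consecutiveFailures: number,
  baseMs: number = 1000,
  capMs: number = 60000,
): number {
  // No failures yet: retry immediately.
  if (consecutiveFailures <= 0) return 0;
  // Double the delay for each additional consecutive failure.
  const delay = baseMs * 2 ** (consecutiveFailures - 1);
  // Never wait longer than the cap, however many failures pile up.
  return Math.min(delay, capMs);
}

// Delays for the first few consecutive failures with the default settings.
const delays = [1, 2, 3, 10].map((n) => backoffDelayMs(n));
console.log(delays); // [1000, 2000, 4000, 60000]
```

The cap matters for a hook that fails indefinitely: without it the wait grows without bound, while with it recovery is still attempted at a steady interval.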
Upgrade

To update to the latest version:

    npm i agents@latest @cloudflare/think@latest @cloudflare/codemode@latest @cloudflare/ai-chat@latest @cloudflare/voice@latest
    yarn add agents@latest @cloudflare/think@latest @cloudflare/codemode@latest @cloudflare/ai-chat@latest @cloudflare/voice@latest
    pnpm add agents@latest @cloudflare/think@latest @cloudflare/codemode@latest @cloudflare/ai-chat@latest @cloudflare/voice@latest
    bun add agents@latest @cloudflare/think@latest @cloudflare/codemode@latest @cloudflare/ai-chat@latest @cloudflare/voice@latest

Refer to the Code Mode documentation, Browser tools documentation, Think tools documentation, and Voice documentation for more information.
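For readers who want a mental model of the approval flow described under "Resumable code execution with approvals", the toy sketch below shows one way log-based replay can work: completed calls are answered from a log, a gated call pauses the run, and a later run with approval continues from the same point. It is a simplified, synchronous illustration and not the @cloudflare/codemode implementation. Every name in it (ReplayRuntime, PendingApproval, runProgram, createIssueFlow) is hypothetical, and an in-memory array stands in for durable storage.

```typescript
// Toy illustration of log-based replay; NOT the @cloudflare/codemode implementation.
// All names here are hypothetical, and the array stands in for durable storage.
type LogEntry = { name: string; result: unknown };

class PendingApproval extends Error {
  action: string;
  constructor(action: string) {
    super(`approval required: ${action}`);
    this.name = "PendingApproval";
    this.action = action;
  }
}

class ReplayRuntime {
  private cursor = 0;
  private log: LogEntry[];
  private approved: Set<string>;

  constructor(log: LogEntry[], approved: Set<string>) {
    this.log = log;
    this.approved = approved;
  }

  call<T>(name: string, fn: () => T, needsApproval: boolean = false): T {
    const entry = this.log[this.cursor++];
    if (entry !== undefined) {
      // Completed in an earlier run: return the recorded result
      // without executing the side effect again.
      return entry.result as T;
    }
    if (needsApproval && !this.approved.has(name)) {
      // Pause before the gated side effect runs.
      throw new PendingApproval(name);
    }
    const result = fn();
    this.log.push({ name, result });
    return result;
  }
}

type RunResult<T> =
  | { status: "paused"; action: string }
  | { status: "done"; value: T };

function runProgram<T>(
  program: (rt: ReplayRuntime) => T,
  log: LogEntry[],
  approved: Set<string>,
): RunResult<T> {
  try {
    return { status: "done", value: program(new ReplayRuntime(log, approved)) };
  } catch (err) {
    if (err instanceof PendingApproval) {
      return { status: "paused", action: err.action };
    }
    throw err;
  }
}

const effects: string[] = []; // records which side effects actually executed
const log: LogEntry[] = [];

function createIssueFlow(rt: ReplayRuntime): string {
  const title = rt.call("draftTitle", () => {
    effects.push("draftTitle");
    return "Fix login bug";
  });
  return rt.call("createIssue", () => {
    effects.push("createIssue");
    return `created: ${title}`;
  }, true);
}

// First run pauses at the gated call; the second run replays draftTitle from the log.
const first = runProgram(createIssueFlow, log, new Set<string>());
const second = runProgram(createIssueFlow, log, new Set<string>(["createIssue"]));
console.log(first.status, second.status, effects); // paused done [ 'draftTitle', 'createIssue' ]
```

The same program function runs twice, yet draftTitle executes only once, which is the property that makes it safe to resume code containing side effects.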
