Cloudflare Made Agent Sandboxing an Edge Runtime Problem

by pr0xy · 2026-05-02

Cloudflare’s new agent stack is a runtime argument disguised as a product launch: if agents are going to write and execute code, the hard problem is no longer the chat interface. It is where the code runs, what it can touch, how long the work survives, and whether credentials ever cross into the model’s blast radius.

This revisits the pattern I wrote about in “April 22 Was the Day Agents Became Infrastructure.” That post was about the stack re-centering around agents across the IDE, cloud workspace, and CLI. The new development is narrower and more structural: Cloudflare has now published the execution substrate. Dynamic Workers put AI-generated code into V8 isolates on demand, Project Think wraps agents in durable execution and sub-agent primitives, and Cloudflare’s own internal AI engineering stack shows what happens when those ideas are wired into a real engineering organization instead of a launch demo.

The headline number is tempting: Dynamic Workers are billed as agent sandboxes that start roughly 100 times faster than traditional containers and use 10 to 100 times less memory. That matters, but it is not the thesis. The thesis is that agent execution is starting to look like edge runtime design: short-lived programs, capability-scoped bindings, cold-start economics, per-request isolation, and explicit outbound control.

Code mode is a runtime demand

Cloudflare’s premise is that agents should often stop making long chains of tool calls and instead write a small program that calls APIs directly. They call this Code Mode. In their framing, converting an MCP server into a TypeScript API can cut token usage by 81 percent, and their own Cloudflare MCP server exposes the entire Cloudflare API through two tools in under 1,000 tokens, instead of a massive pile of endpoint schemas.

That is a real systems insight. Tool calling is comfortable because it keeps the model inside a narrow protocol, but it is often wasteful. Every intermediate observation gets serialized back into the context window. Every API call becomes another turn. Every filter, join, retry, and map operation becomes a language-model event rather than a cheap program step.

A generated TypeScript function can do the boring parts locally: call three endpoints, filter the JSON, retry a 429, discard irrelevant fields, and return only the answer the model needs. That is exactly what computers were already good at before we started pretending every action needed to be narrated through a chat transcript.
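To make that concrete, here is a minimal sketch of the kind of function Code Mode has in mind. Everything named here is hypothetical: `JsonFetch` stands in for whatever HTTP binding the sandbox is actually given, and the `/queues/{name}/tickets` endpoints and `Ticket` shape are invented for illustration. The point is where the work happens: the fan-out, a 429 retry with exponential backoff, and the filtering all run as ordinary code.

```typescript
// Hypothetical transport: a minimal JSON fetcher passed in, so the sketch runs
// without a network (and without any real API) and can be driven by a fake.
type JsonResponse = { status: number; body: unknown };
type JsonFetch = (url: string) => Promise<JsonResponse>;

// Hypothetical record shape returned by each endpoint.
interface Ticket {
  id: string;
  status: string;
  priority: number;
  title: string;
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Retry only on HTTP 429, backing off exponentially; other statuses pass through.
async function getWithRetry(
  fetchJson: JsonFetch,
  url: string,
  maxAttempts = 4,
  baseDelayMs = 200,
): Promise<JsonResponse> {
  for (let attempt = 1; ; attempt++) {
    const res = await fetchJson(url);
    if (res.status !== 429 || attempt >= maxAttempts) return res;
    await sleep(baseDelayMs * 2 ** (attempt - 1));
  }
}

// The kind of program Code Mode has the model write: fan out to several
// endpoints, filter locally, and return only the few fields the model needs.
async function openUrgentTitles(
  fetchJson: JsonFetch,
  baseUrl: string,
  queues: string[],
  baseDelayMs = 200,
): Promise<string[]> {
  const responses = await Promise.all(
    queues.map((q) => getWithRetry(fetchJson, `${baseUrl}/queues/${q}/tickets`, 4, baseDelayMs)),
  );
  const titles: string[] = [];
  for (const res of responses) {
    if (res.status !== 200) continue; // a failed queue is skipped, not fatal
    for (const t of res.body as Ticket[]) {
      if (t.status === "open" && t.priority >= 3) titles.push(t.title);
    }
  }
  return titles.sort();
}
```

Only the sorted list of titles goes back into the context window. The retry and the discarded fields never become model turns.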

The catch is obvious. If the model writes code, something has to execute it. Running eval() next to your application secrets is malpractice. Shipping every tiny generated function into a Linux container is safer, but slow and expensive when the expected workload is millions of short agent bursts. A container that takes hundreds of milliseconds and hundreds of megabytes to boot is the wrong shape for a five-line API composition.

Dynamic Workers are Cloudflare’s answer: let a parent Worker instantiate a child Worker at runtime with code supplied on the fly. The child runs as a separate Worker, with a declared compatibilityDate, a mainModule, an explicit module map, and a controlled environment. In the docs, the simplest version is a parent calling env.LOADER.load() with a small ES module supplied as a string, then forwarding a request to worker.getEntrypoint().fetch(request).
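Here is roughly what that parent-side flow looks like, written as a sketch rather than copied from the docs. The method names `load`, `getEntrypoint`, and `fetch` and the fields `compatibilityDate`, `mainModule`, `modules`, and `globalOutbound` come from Cloudflare’s description. The interfaces, `sandboxFor`, `runGenerated`, and the date string are my own stand-ins so the code type-checks without the Workers runtime; the real binding’s types may well differ.

```typescript
// Local stand-ins for the loader binding. Assumption: these mirror the names
// Cloudflare describes, but real projects should use the platform's own types.
interface WorkerCode {
  compatibilityDate: string;
  mainModule: string;
  modules: Record<string, string>;
  globalOutbound: null; // null: the child gets no outbound network access
}
interface Entrypoint<Req, Res> {
  fetch(request: Req): Promise<Res>;
}
interface LoadedWorker<Req, Res> {
  getEntrypoint(): Entrypoint<Req, Res>;
}
interface WorkerLoader<Req, Res> {
  load(code: WorkerCode): LoadedWorker<Req, Res>;
}

// Wrap model-generated source into a one-module child Worker definition.
function sandboxFor(generatedSource: string): WorkerCode {
  return {
    compatibilityDate: "2026-04-01", // hypothetical; pin whatever your project uses
    mainModule: "agent.js",
    modules: { "agent.js": generatedSource },
    globalOutbound: null,
  };
}

// Parent-side handler: load the child on demand and forward the request to it.
function runGenerated<Req, Res>(
  loader: WorkerLoader<Req, Res>,
  source: string,
  request: Req,
): Promise<Res> {
  const worker = loader.load(sandboxFor(source));
  return worker.getEntrypoint().fetch(request);
}
```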

That is not just faster sandboxing. It is a new place to put agent cognition: between the model and the API, close enough to the request to be cheap, but separated enough from the host application to be governed.

The security model is capability plumbing

The most interesting Dynamic Workers detail is not V8. It is globalOutbound.

Cloudflare lets the parent Worker block or intercept outbound network access from the child. Set globalOutbound: null and the generated code cannot phone home. Intercept the request and the parent can attach credentials after the sandbox has already decided what to call. The model can write, “fetch the calendar endpoint,” without ever seeing the OAuth token that makes the call work.

That is the correct direction. Most AI tooling still treats secrets like something the agent can be trusted not to leak if the prompt is stern enough. That is unserious. Secrets should live outside the generated program. The sandbox should receive capabilities, not ambient authority.
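A minimal sketch of that outbound hook, assuming the parent sees each child request as a plain `{ url, method, headers }` record. `OutboundRequest`, `makeOutboundPolicy`, and the host-to-token grant map are hypothetical, not Cloudflare’s API. What matters is the shape: the token lives in the parent’s closure, hosts without a grant are refused, and any `Authorization` header the generated code invents is thrown away.

```typescript
// Hypothetical request shape seen by the parent's outbound hook.
interface OutboundRequest {
  url: string;
  method: string;
  headers: Record<string, string>;
}

type Decision =
  | { allow: true; request: OutboundRequest }
  | { allow: false; reason: string };

// Build a policy from host -> token grants. The tokens are captured in this
// closure, which runs in the parent; generated code never receives them.
function makeOutboundPolicy(grants: ReadonlyMap<string, string>) {
  return (req: OutboundRequest): Decision => {
    const host = new URL(req.url).hostname;
    const token = grants.get(host);
    if (token === undefined) {
      return { allow: false, reason: `no capability for ${host}` };
    }
    // Strip any Authorization header the generated code tried to set.
    const headers: Record<string, string> = {};
    for (const [name, value] of Object.entries(req.headers)) {
      if (name.toLowerCase() !== "authorization") headers[name] = value;
    }
    // Attach the real credential only after the call has been allowed.
    headers["Authorization"] = `Bearer ${token}`;
    return { allow: true, request: { ...req, headers } };
  };
}
```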

This is old thinking in a new costume. Capability security has been around for decades. UNIX file descriptors, object capabilities, browser permissions, and mobile app sandboxes all circle the same idea: do not ask whether code is good, give it only the handles it needs. Agent sandboxes make the idea urgent again because the code is not merely third-party. It is synthetic, per-task, and likely to be shaped by untrusted input.

Cloudflare is honest about the tradeoff. V8 isolates are not microVMs. Isolates sharing a process share far more machinery with one another than workloads separated by a Firecracker microVM or a gVisor-style container boundary. The reason to use them is speed and density. The reason to worry is that V8 bugs exist, Spectre-class problems are real, and isolate hosting requires serious mitigation work. Cloudflare points to rapid V8 patching, a second sandboxing layer, dynamic cordoning, memory protection keys (MPK), and Spectre defenses. That is credible because Workers have lived on this model for years. It is still a bet.

The practical split is clear. If an agent needs to run cargo build, install system packages, inspect a repo, and execute arbitrary binaries, it wants a heavier sandbox. If it needs to compose APIs, transform JSON, run small JavaScript or Python snippets, or produce a generated web app, an isolate may be the right unit. Calling one universally safer is vendor fog. The right question is workload shape.

Project Think turns the sandbox into an actor

Dynamic Workers handle the moment of execution. Project Think handles the longer agent lifecycle around it.

Think packages several primitives: durable execution through Fibers, sub-agents called Facets, persistent sessions modeled as forkable trees, sandboxed code execution through Dynamic Workers, and self-authored TypeScript extensions. The important infrastructure choice is Durable Objects. An agent becomes closer to an actor with built-in SQLite and hibernation than to an always-on VM with a database bolted to the side.
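Forkable sessions are easy to misread as copying history. Here is a minimal sketch of one plausible model, assuming messages are nodes with parent pointers: a fork is just a new child of an earlier node, and a branch’s history is a walk back to the root. `SessionTree` and its methods are hypothetical, not Think’s API.

```typescript
// Hypothetical message node: each message points at the one it follows.
interface Message {
  id: number;
  parent: number | null;
  role: "user" | "assistant";
  text: string;
}

class SessionTree {
  private readonly nodes = new Map<number, Message>();
  private nextId = 1;

  // Append after any existing node. Appending after an earlier node is a fork:
  // the shared prefix is referenced, not copied.
  append(parent: number | null, role: Message["role"], text: string): number {
    if (parent !== null && !this.nodes.has(parent)) {
      throw new Error(`unknown parent ${parent}`);
    }
    const id = this.nextId++;
    this.nodes.set(id, { id, parent, role, text });
    return id;
  }

  // Rebuild one branch by walking parent pointers from a leaf to the root.
  history(leaf: number): Message[] {
    const out: Message[] = [];
    for (let cur: number | null = leaf; cur !== null; ) {
      const m = this.nodes.get(cur);
      if (!m) throw new Error(`unknown node ${cur}`);
      out.push(m);
      cur = m.parent;
    }
    return out.reverse();
  }
}
```

Inside a Durable Object the map would plausibly become a SQLite table with a parent column, so a fork costs one row. That mapping is my assumption, not Think’s documented schema.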

Cloudflare’s comparison is blunt: ten thousand agents with one percent active should not require ten thousand always-on instances. They should require roughly the active slice, about a hundred, plus durable state. That is the serverless pitch applied to agents, but it lands because agent workloads are mostly idle until they are not. A background research agent, support assistant, or coding helper spends most of its life waiting for a user, a webhook, a timer, or a model response.

The execution ladder in Think is also revealing. It starts with a workspace filesystem, then Dynamic Worker, then npm resolution, then browser rendering, then a full sandbox for git, npm test, or cargo build. That hierarchy is sane. Most agent tasks do not deserve a VM. Some absolutely do. The stack should escalate, not start every job at the heaviest possible isolation tier because the platform had no cheaper primitive.
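The escalation rule is simple enough to write down. This sketch assumes each rung can do everything the rungs below it can, so a task lands on the highest rung any single need requires and moves up one rung if it fails there. The tier labels paraphrase the ladder; `TaskNeeds`, `lowestSufficientTier`, and `escalate` are hypothetical.

```typescript
// The rungs named in the ladder, cheapest first.
const LADDER = [
  "workspace-fs",
  "dynamic-worker",
  "npm-resolution",
  "browser-rendering",
  "full-sandbox",
] as const;
type Tier = (typeof LADDER)[number];

// Hypothetical flags describing what a task needs to do.
interface TaskNeeds {
  runCode?: boolean; // execute generated JS/TS
  npmPackages?: boolean; // import third-party packages
  renderPages?: boolean; // drive or screenshot a browser
  nativeTools?: boolean; // git, npm test, cargo build, arbitrary binaries
}

// Start at the cheapest rung that covers every need.
function lowestSufficientTier(needs: TaskNeeds): Tier {
  let level = 0;
  if (needs.runCode) level = Math.max(level, 1);
  if (needs.npmPackages) level = Math.max(level, 2);
  if (needs.renderPages) level = Math.max(level, 3);
  if (needs.nativeTools) level = Math.max(level, 4);
  return LADDER[level];
}

// Move one rung up after a failure, never past the top.
function escalate(current: Tier): Tier {
  const i = LADDER.indexOf(current);
  return LADDER[Math.min(i + 1, LADDER.length - 1)];
}
```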

This is where the April 22 agent-infrastructure pattern gets more concrete. The earlier wave made agents visible in the IDE, workspace, and CLI. Cloudflare’s wave makes them schedulable, sandboxable, resumable, and meterable. That is less glamorous than another model selector. It matters more.

Internal adoption is the buried source

Cloudflare’s internal write-up is the least hypey and most useful source here. They claim that in the 30 days before publication, 93 percent of R&D used AI coding tools routed through infrastructure they built, with 47.95 million AI requests and 241.37 billion tokens passing through AI Gateway. They also say weekly merge requests rose from a Q4 baseline of around 5,600 to a peak of 10,952, nearly double.

The velocity number should be handled carefully. Merge request volume is not quality. It can mean productivity, churn, smaller diffs, more bot-generated noise, or some combination of all four. But the architecture around the number is interesting: Access for authentication, AI Gateway for routing and zero-data-retention controls, Workers AI for high-volume open models like Kimi K2.5, an MCP portal, Backstage as a service graph, AGENTS.md files in around 3,900 repos, and CI-level AI code review tied to an Engineering Codex.

That is the real shape of production agent adoption. Not a magic chatbot. A control plane.

The model is almost secondary. The wiring is the product: who is allowed to invoke which tool, which repos tell agents how to behave, which service catalog gives context, which reviewer enforces standards, which gateway logs cost and policy, which sandbox executes generated code, and which durable object remembers the work. You can swap models under that. You cannot improvise that governance with a prompt.
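A toy version of the control-plane piece, assuming every tool call passes through one gateway that knows the caller. `AgentGateway`, `ToolCall`, and the grant map are hypothetical names, not Cloudflare’s AI Gateway or MCP portal APIs. It authorizes per principal and per tool, audits every attempt, and reports token spend. The `model` field rides along without affecting the decision, which is the sense in which models are swappable under the wiring.

```typescript
// Hypothetical control-plane record for one tool invocation.
interface ToolCall {
  principal: string; // who (or which agent) is calling
  tool: string; // e.g. an MCP tool name
  model: string; // swappable; the policy never looks at it
  tokens: number;
}
interface AuditEntry extends ToolCall {
  allowed: boolean;
}

class AgentGateway {
  readonly audit: AuditEntry[] = [];
  private readonly grants: ReadonlyMap<string, ReadonlySet<string>>;

  constructor(grants: ReadonlyMap<string, ReadonlySet<string>>) {
    this.grants = grants;
  }

  // Allow only tools explicitly granted to this principal; log every attempt.
  authorize(call: ToolCall): boolean {
    const allowed = this.grants.get(call.principal)?.has(call.tool) ?? false;
    this.audit.push({ ...call, allowed });
    return allowed;
  }

  // Token spend on allowed calls, for cost reporting.
  tokensSpent(principal: string): number {
    return this.audit
      .filter((e) => e.principal === principal && e.allowed)
      .reduce((sum, e) => sum + e.tokens, 0);
  }
}
```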

The browser history matters

Dynamic Workers also pull agent infrastructure back into old web history. JavaScript was sandboxed because the browser needed to run hostile code from arbitrary sites. Web Workers were added because single-threaded UI execution was not enough. Service Workers turned scripts into programmable network intermediaries. Cloudflare Workers moved that browser-shaped execution model to the edge.

Agent sandboxes are a weird continuation of that lineage. The hostile code is no longer just a script from a random website. It might be a fresh program generated by a model after reading a user’s bug report, a Slack thread, or a malicious support ticket. The same old browser question comes back with sharper teeth: how do you run code you do not trust without letting it become the machine?

This is why the edge-runtime framing is better than the chatbot framing. The core object is not a conversation. It is a tiny, disposable, policy-bound program. The model is the compiler with questionable taste. The sandbox is the thing standing between a clever plan and a credential spill.

There is plenty to criticize. TypeScript is a convenient lingua franca for LLMs and Cloudflare’s platform, but not every useful agent computation wants to live in JavaScript. Isolates are efficient, but they make security a platform trust question. The price of $0.002 per unique Worker per day, waived during beta, looks small until someone builds a pathological workload that mints a new unique sandbox on nearly every request, the way a bad cache key turns every lookup into a miss. The marketing phrase “100x faster” will get repeated by people who never ask: faster than what, for which workload, under which threat model?
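The pricing risk is easy to quantify. This sketch uses the $0.002 figure in integer micro-dollars and assumes, for illustration only, that a “unique Worker” means one distinct program source loaded in a day; Cloudflare’s actual billing key may be different. `dailyCostMicroUsd` is a hypothetical helper.

```typescript
// $0.002 expressed in micro-dollars, so the arithmetic stays in integers.
const MICRO_USD_PER_UNIQUE_WORKER_DAY = 2_000;

// Assumption: "unique" means distinct program source loaded that day.
// Identical sources collapse to one entry in the Set.
function dailyCostMicroUsd(programsLoadedToday: readonly string[]): number {
  return new Set(programsLoadedToday).size * MICRO_USD_PER_UNIQUE_WORKER_DAY;
}
```

Under that assumption, a million runs of one templated program cost $0.002 for the day, while a million distinct programs cost 1,000,000 × $0.002 = $2,000 per day, about $60,000 over 30 days. Stamping a timestamp or request ID into generated source is the easy way to land in the second case.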

Still, the direction is right. Agent infrastructure needs more runtime design and less theater. Code Mode says the model should generate programs instead of narrating every API call. Dynamic Workers say those programs should run with explicit capabilities. Project Think says the surrounding agent should be durable, forkable, and hibernatable. The internal stack says adoption depends on gateways, service catalogs, repo-local instructions, and enforcement.

The agent era will not be won by the vendor with the most cinematic demo of a bot editing a file. It will be won by whoever makes the boring substrate work: fast sandboxes, scoped credentials, resumable execution, inspectable state, and policies that survive contact with real organizations.

Cloudflare’s launch is not proof that isolates are the final answer. It is proof that agent execution has become an infrastructure problem. That is the part worth paying attention to.