Anthropic Published the Missing Manual for AI-Assisted Exploits

This is a follow-up to The N-Day That Should Have Been Dead. The new part is not another round of vague claims about AI in security. The new part is that Anthropic published a technical post that reverse engineers one of Claude’s own working browser exploits, then placed it next to a larger warning shot about where the curve is going next.

That matters because the conversation can stop pretending this is still a benchmark parlor game.

In Reverse engineering Claude’s CVE-2026-2796 exploit, Anthropic lays out how Claude Opus 4.6 turned a patched Firefox WebAssembly bug into a working exploit in a test environment. In Assessing Claude Mythos Preview’s cybersecurity capabilities, the company then says the successor class is already far beyond that point. The delta between those two documents is the real story. Opus 4.6 needed a harness, a verifier, hundreds of shots, and a softened target. Mythos Preview, according to Anthropic, turns that whole process into a much more autonomous overnight operation.

The usual lazy take is that this is just Anthropic marketing wrapped in scary language. That is too easy. Marketing departments do not usually publish exploit anatomy detailed enough to walk through Function.prototype.call.bind(...), Wasm import typing, interop boundaries, and the optimization bug that let a wrapped function smuggle the wrong kind of object across a safety boundary. This is not launch-event vapor. This is a lab notebook with the company logo still attached.

The Firefox bug in question, CVE-2026-2796, lives in an optimization path around WebAssembly imports. Under normal conditions, Wasm and JavaScript have an interop layer that converts values at the boundary so mismatched types do not get reinterpreted as raw bits. The interesting crack opened when Firefox tried to optimize a call.bind wrapper around an imported function. Anthropic’s write-up says the engine treated a wrapped Wasm function like a JavaScript function in the import path, which bypassed the usual conversion safety. That is where type confusion becomes useful instead of theoretical.
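For readers who have not met the idiom: Function.prototype.call.bind(f) builds a wrapper whose first argument becomes this inside f. The TypeScript sketch below shows that plain JavaScript behavior. It then contrasts converting a number for an i32 parameter, which is what a checked JS-to-Wasm boundary does via the spec’s ToInt32, with reinterpreting the number’s raw IEEE-754 bits. This is a conceptual illustration under my own assumptions, not SpiderMonkey code and not the actual CVE-2026-2796 mechanics. The names greet, toInt32, and reinterpretHighWord are hypothetical.

```typescript
// Conceptual sketch in plain TypeScript/JavaScript. This is NOT SpiderMonkey code
// and NOT the CVE-2026-2796 bug; all names here are hypothetical.

// Part 1: what Function.prototype.call.bind(f) produces.
// The wrapper's first argument becomes `this` inside f; the rest pass through.
function greet(this: { name: string }, greeting: string): string {
  return `${greeting}, ${this.name}`;
}

const callGreet = Function.prototype.call.bind(greet) as (
  thisArg: { name: string },
  greeting: string,
) => string;

// Part 2: converting vs. reinterpreting at a typed boundary.
// A checked JS-to-Wasm boundary converts an i32 argument with the spec's ToInt32;
// in JavaScript, `x | 0` applies that same conversion to a number.
function toInt32(x: number): number {
  return x | 0;
}

// What "reinterpreting raw bits" means instead: read the upper 32 bits of the
// number's float64 encoding as if they were an int32.
function reinterpretHighWord(x: number): number {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, x); // DataView is big-endian by default: bytes 0-3 are the high word
  return view.getInt32(0);
}

console.log(callGreet({ name: "wasm" }, "hello")); // hello, wasm
console.log(toInt32(1.5));             // 1: the value is converted
console.log(reinterpretHighWord(1.5)); // 1073217536: the bits are reinterpreted
```

The gap between 1 and 1073217536 is the whole reason the conversion layer exists: once a value crosses a boundary without it, the receiving side is working with bits it never agreed to interpret.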

This level of detail is the important shift. The public argument about AI-assisted exploitation has spent too much time on theater: did the model do it “autonomously,” did a human babysit it, does it count if the sandbox was disabled, is this just a glorified fuzzer, and so on. Those questions are not irrelevant, but they are also where people hide when they do not want to face the operational point. Attackers do not need perfect autonomy. They need a system that makes exploit development cheaper, faster, more iterative, and more legible. Anthropic just published evidence for exactly that.

The company is careful with caveats. Opus 4.6 did not produce a full real-world browser escape chain. The exploit only worked in a testing environment where some modern browser protections were intentionally absent, most importantly the sandbox. Anthropic says Opus 4.6 turned bugs into working exploits in only two cases, across roughly 350 attempts with a VM and a task verifier. Good. Keep those caveats. They matter.
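The 2-in-350 figure looks like failure until you do the retry arithmetic. The sketch below is a back-of-the-envelope model that I am assuming for illustration. It treats every attempt as an independent trial with the same success rate, which real exploit runs are not. The helpers successRate and pAtLeastOne are hypothetical, and the second one simply computes 1 - (1 - p)^n.

```typescript
// Back-of-the-envelope model, assuming independent attempts with a constant
// per-attempt success rate. Real exploit runs are neither, so treat as illustration.

/** Empirical per-attempt success rate. */
function successRate(successes: number, attempts: number): number {
  if (attempts <= 0) throw new RangeError("attempts must be positive");
  return successes / attempts;
}

/** Probability that at least one of n independent attempts succeeds: 1 - (1 - p)^n. */
function pAtLeastOne(p: number, n: number): number {
  return 1 - Math.pow(1 - p, n);
}

// Figures as reported in the post: 2 working exploits across roughly 350 attempts.
const perAttempt = successRate(2, 350);

for (const n of [1, 50, 350, 1000]) {
  const pct = (100 * pAtLeastOne(perAttempt, n)).toFixed(1);
  console.log(`${n} attempts -> ${pct}% chance of at least one working exploit`);
}
```

Under this toy model a single attempt succeeds well under 1% of the time, yet a 350-attempt budget yields at least one success a bit over 86% of the time. A low per-attempt rate is not reassuring when attempts are cheap and can run in parallel.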

But the caveats do not cancel the trend. They define the baseline.

That is what I think too many people missed in the earlier coverage, including some of the hand-wavy panic around Mythos. The point is not that current models have already replaced elite exploit developers. The point is that the exploit pipeline is getting decomposed into parts that machines can increasingly handle: triage, hypothesis generation, testcase minimization, exploit iteration, root-cause reasoning, patch reading, and eventually chain construction. Mozilla engineers said in the Hacker News discussion around Anthropic’s Firefox work that the bug reports were excellent, with minimal test cases, detailed proofs of concept, and even candidate patches. That is not just a party trick. That is labor displacement inside one of the most expensive parts of software security.
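Testcase minimization is the most mechanical stage on that list, which is part of why it automates well. As a point of reference, the sketch below is a simplified, complement-only variant of the classic ddmin delta-debugging algorithm. It is my illustration of the technique, not a claim about how Claude or Anthropic’s harness minimizes testcases. The names minimize and stillFails are hypothetical. stillFails stands in for an oracle that reruns the target and reports whether the bug still reproduces.

```typescript
// Simplified, complement-only variant of ddmin delta debugging.
// `stillFails` is a hypothetical oracle: true if the candidate still triggers the bug.
// Precondition: stillFails(input) is true.
function minimize<T>(input: T[], stillFails: (candidate: T[]) => boolean): T[] {
  let current = input;
  let chunks = 2; // granularity: how many pieces to split `current` into

  while (current.length >= 2) {
    const size = Math.ceil(current.length / chunks);
    let reduced = false;

    for (let start = 0; start < current.length; start += size) {
      // Drop one chunk and keep everything else.
      const candidate = [...current.slice(0, start), ...current.slice(start + size)];
      if (candidate.length > 0 && stillFails(candidate)) {
        current = candidate;
        chunks = Math.max(chunks - 1, 2); // the survivor is made of chunks - 1 pieces
        reduced = true;
        break;
      }
    }

    if (!reduced) {
      // No chunk could be dropped: refine granularity, or stop at one element per chunk.
      if (chunks >= current.length) break;
      chunks = Math.min(chunks * 2, current.length);
    }
  }
  return current;
}

// Toy example: the "bug" reproduces whenever lines 3 and 6 are both present.
const lines = [1, 2, 3, 4, 5, 6, 7, 8];
const reproduces = (c: number[]) => c.includes(3) && c.includes(6);
console.log(minimize(lines, reproduces).join(", ")); // 3, 6
```

The loop stops only when removing any single remaining element makes the bug disappear. That is the kind of "minimal test case" maintainers want to receive, and producing it needs patience and an oracle, not insight.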

The bigger tell is what changed between Opus 4.6 and Mythos Preview. In the Firefox benchmark Anthropic cites, Opus 4.6 reportedly produced only 2 successful shell exploits after several hundred tries. Mythos Preview reportedly produced 181 working exploits plus 29 more cases with register control in a harness modeled on a Firefox 147 content process. That harness still stripped away major defenses. Fine. But going from 2 to 181 is not noise. That is the kind of jump that turns a curiosity into a planning problem.

This is also why I think my earlier post on the Discord Chrome chain still holds up, but needs an update. In The N-Day That Should Have Been Dead, the core argument was that the structural weak point was update propagation, not some mystical leap to machine superpowers. That remains true. Public patch lands, vulnerable downstream bundle lags, attacker gets a widening window. What Anthropic added this month is a much clearer picture of how the attacker side of that equation is being industrialized.
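The propagation problem is simple enough to check mechanically, which is part of why it keeps biting. The sketch below flags a downstream bundle whose embedded engine predates the first fixed upstream version. It also measures the window between the public fix and the downstream ship date. The app name, versions, and dates are invented placeholders. The helpers compareVersions, isExposed, and exposureDays are hypothetical, and the comparison assumes purely numeric dotted version components.

```typescript
// Hypothetical check for the "patched upstream, lagging downstream" window.
// Names, versions, and dates below are invented placeholders, not real release data.

/** Compare dotted numeric versions like "120.0.3". Returns <0, 0, or >0. */
function compareVersions(a: string, b: string): number {
  const pa = a.split(".").map(Number); // assumes every component is numeric
  const pb = b.split(".").map(Number);
  const len = Math.max(pa.length, pb.length);
  for (let i = 0; i < len; i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0); // missing components count as 0
    if (diff !== 0) return diff;
  }
  return 0;
}

interface Bundle {
  name: string;
  engineVersion: string;
}

/** A downstream bundle is exposed if its embedded engine predates the first fixed version. */
function isExposed(bundle: Bundle, firstFixed: string): boolean {
  return compareVersions(bundle.engineVersion, firstFixed) < 0;
}

/** Whole days between the public upstream fix and the downstream ship date. */
function exposureDays(upstreamFix: Date, downstreamShip: Date): number {
  const MS_PER_DAY = 24 * 60 * 60 * 1000;
  const days = (downstreamShip.getTime() - upstreamFix.getTime()) / MS_PER_DAY;
  return Math.max(0, Math.round(days));
}

const app: Bundle = { name: "example-desktop-app", engineVersion: "120.0.1" }; // hypothetical
const firstFixed = "120.0.3"; // hypothetical first patched upstream version
console.log(isExposed(app, firstFixed)); // true
console.log(exposureDays(new Date("2026-01-10"), new Date("2026-02-09"))); // 30
```

Running a check like this against every bundled runtime in a release pipeline turns the lag from an unknown into a number someone has to own.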

The missing manual is no longer missing.

Anthropic’s exploit write-up gives defenders a perverse gift. It translates “AI can help exploit bugs” into concrete engineering questions. What harnesses can a model use? How many attempts does it need? What kinds of bugs are easiest to weaponize first? Which mitigations still force a human bottleneck? Which subsystems, like Wasm, JITs, browser graphics, and media stacks, expose enough weirdness that iterative search plus strong reasoning can outperform old workflows? If you maintain a browser engine, runtime, media parser, virtualization layer, or network-facing parser, these are now budgeting questions, not thought experiments.

There is another uncomfortable lesson here. Security people spent years saying that open source has an advantage because everyone can inspect the code and the patches. True. But the same visibility that helps defenders also gives models a beautifully structured corpus of bug classes, fixes, regressions, and exploit-adjacent reasoning. “Given enough eyeballs, all bugs are shallow” was always missing a clause: some of those eyeballs can now be synthetic, tireless, and cheap enough to run in parallel.

That does not mean closed source is safe. Anthropic explicitly says Mythos can reverse engineer stripped binaries and find bugs in closed-source software too. It just means the old comfort blanket around patch transparency is thinner than people want to admit.

If you want the most honest reading of these documents, it is this: Opus 4.6 still looks like a powerful apprentice. Mythos Preview looks like the point where the apprentice starts eating chunks of the senior workflow. Not all of it. Not reliably. Not without guardrails or verifier infrastructure. But enough of it that defenders should stop centering the debate on whether the model deserves the word autonomous.

Autonomy is not the threshold that matters. Throughput is.

If a model can generate exploit ideas, refine them against a verifier, explain root cause in plain English, produce a minimal testcase for a maintainer, and keep trying hundreds of times while a human sleeps, the security labor market has already changed. The fact that the final real-world chain may still require a specialist is cold comfort when the specialist’s expensive hours are now focused only on the last mile.
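The loop that paragraph describes is structurally simple. The sketch below shows its shape under my own assumptions. propose and verify are hypothetical stand-ins: in a real harness, propose might be a model call and verify an automated check run in a disposable VM. The only knob is the attempt budget. Nothing here is Anthropic’s harness or any real tool’s API.

```typescript
// Shape of a verifier-gated generate-and-test loop. `propose` and `verify` are
// hypothetical stand-ins supplied by the caller; this is not any real harness API.

interface Attempt<C> {
  index: number;
  candidate: C;
  passed: boolean;
}

function searchWithVerifier<C>(
  propose: (history: readonly Attempt<C>[]) => C,
  verify: (candidate: C) => boolean,
  budget: number,
): { winners: Attempt<C>[]; history: Attempt<C>[] } {
  const history: Attempt<C>[] = [];
  for (let index = 0; index < budget; index++) {
    // Each proposal may condition on every earlier attempt, including failures.
    const candidate = propose(history);
    history.push({ index, candidate, passed: verify(candidate) });
  }
  return { winners: history.filter((a) => a.passed), history };
}

// Toy usage: candidates are multiples of 7; the "verifier" accepts multiples of 5.
const result = searchWithVerifier<number>(
  (history) => history.length * 7,
  (c) => c % 5 === 0,
  20,
);
console.log(result.winners.map((a) => a.candidate).join(", ")); // 0, 35, 70, 105
```

The point of the sketch is the last argument. Nothing in the loop needs a human between iterations, so raising the budget is a cost decision rather than a staffing decision. That is the throughput argument in code form.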

And yes, this is a revisit. It should be. Earlier this year I wrote about Zero-Day CSS because browser bugs keep showing up in the places normal users think of as harmless plumbing. This Anthropic material sharpens the same lesson from a different angle: the renderer is not just a bug farm, it is now a training ground for machine-assisted exploit development.

The right response is not panic and it is not cope. It is to compress patch windows, harden the boring boundaries, build internal red-team workflows that use the same class of tooling, and stop treating exploit development as an artisanal activity that only a small priesthood can scale.

Anthropic did not just publish a scary story. It published process.

That is why this one matters.