Apr 15, 2026 · 6 min read · Gilles Seghaier

Claude Mythos Offensive Capabilities: What the Numbers Say, and What They Mean for the Ecosystem

Anthropic has opened Claude Mythos, its new model with expert-level cyberattack capabilities, to a handful of players dominating cloud and cybersecurity.

The numbers behind this release are worth sitting with carefully. Not because they are surprising (anyone tracking AI capabilities over the past 18 months could see this coming), but because of what they imply for the entire software ecosystem, and for how unevenly the benefits of this moment will be distributed.

The performance leap: not incremental, structural

The benchmark data published by Anthropic and independently verified by the UK's AI Security Institute (AISI) tells a clear story.

On a Firefox 147 exploit benchmark, Claude Opus 4.6, the previous frontier model, produced working shell exploits twice across several hundred attempts. Mythos Preview succeeded 181 times, with register control achieved in 29 additional cases. That is a factor of roughly 90.
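The factor quoted above is simple arithmetic on the two success counts. A minimal sketch, assuming both models were given the same attempt budget (the text says only "several hundred attempts", so no success rate is computed here); the function name is illustrative:

```python
# Working-exploit counts on the Firefox 147 benchmark, as quoted in the text.
OPUS_SUCCESSES = 2
MYTHOS_SUCCESSES = 181


def improvement_factor(new_count: int, old_count: int) -> float:
    """Ratio of success counts; only meaningful if attempt budgets match."""
    if old_count <= 0:
        raise ValueError("old_count must be positive")
    return new_count / old_count


factor = improvement_factor(MYTHOS_SUCCESSES, OPUS_SUCCESSES)
print(f"{factor:.1f}x")  # 90.5x
```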

Across roughly 7,000 entry points in open-source repositories, Opus 4.6 achieved a single tier-3 crash on a five-tier severity scale. Mythos Preview reached tier 5 (full control-flow hijack) on ten separate, fully patched targets.

On expert-level Capture the Flag tasks, which no model could complete before April 2025, Mythos Preview now succeeds 73% of the time.

These are not marginal improvements. They represent a structural shift in what AI can do autonomously in an offensive security context.

Decades-old vulnerabilities, found without human involvement

What makes Mythos qualitatively different from its predecessors is not just raw performance; it is the nature of what it finds.

Documented findings include a 27-year-old bug in OpenBSD (one of the most security-hardened operating systems in the world), a 16-year-old flaw in FFmpeg's H.264 codec in a line of code that automated testing tools had hit five million times without catching, and a guest-to-host memory corruption vulnerability in a production hypervisor.

The 17-year-old FreeBSD vulnerability CVE-2026-4747 was identified and fully exploited end-to-end without any human involvement after the initial prompt, granting unauthenticated remote root access.

The implication is significant: the long tail of vulnerabilities that survived decades of human review and automated scanning is now systematically explorable. The safety margin that complexity provided is collapsing.

Chained attacks: from isolated flaws to end-to-end compromise

Mythos does not stop at a single bug. It autonomously chains 3 to 5 vulnerabilities into multi-stage attack sequences: initial access, privilege escalation, credential harvesting, and lateral movement.

The clearest proof point is “The Last Ones” (TLO), a 32-step corporate network attack simulation that AISI estimates would take human experts 20 hours to complete. Mythos Preview is the first model to solve it end-to-end, succeeding in 3 out of 10 attempts. Claude Opus 4.6, the next-best-performing model, completed an average of 16 out of 32 steps. Mythos Preview averaged 22.
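Expressed as fractions of the 32-step chain, the TLO figures look like this. A small sketch using only the numbers quoted above; the variable names are illustrative:

```python
TOTAL_STEPS = 32

# Average steps completed per model, as quoted in the text.
avg_steps = {"Claude Opus 4.6": 16, "Mythos Preview": 22}

# Share of the attack chain each model completes on average.
completion = {model: steps / TOTAL_STEPS for model, steps in avg_steps.items()}

# Mythos Preview's end-to-end solve rate: 3 successes in 10 attempts.
full_solve_rate = 3 / 10

for model, share in completion.items():
    print(f"{model}: {share:.1%} of steps on average")
print(f"Mythos Preview full solves: {full_solve_rate:.0%}")
```

On these figures, the previous best model stalls halfway through the chain on average, while Mythos Preview clears a little over two thirds of it and finishes the whole sequence in roughly one attempt out of three.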

This is the moment where AI-augmented attacks stop being a theoretical risk and start being an operational planning assumption.

The compression of attack timelines and the widening patch gap

The broader threat landscape context makes this more urgent. Average time-to-exploit has dropped to 5 days, down from 30 days in 2022. In the first half of 2025, 32% of CVEs were exploited on or before the day they were disclosed. Average lateral movement time after initial access is 29 minutes; the fastest observed is 27 seconds.

On the defensive side, the median organizational patch window remains stuck at 70 days. The share of organizations deploying critical patches within 30 days has actually declined, from 45% to 30% since 2022.
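One way to read these figures together is as an exposure window: the number of days a typical vulnerability is being exploited while a typical organization has not yet patched it. A rough sketch using only the numbers quoted above; comparing an average time-to-exploit with a median patch window is a simplifying assumption, as is holding the patch window at 70 days for 2022, and the function name is illustrative:

```python
def exposure_window_days(patch_window: float, time_to_exploit: float) -> float:
    """Days between typical exploitation and typical patching (0 if patched first)."""
    return max(0.0, patch_window - time_to_exploit)


MEDIAN_PATCH_WINDOW_DAYS = 70  # described in the text as "stuck" at this level
TIME_TO_EXPLOIT_2022_DAYS = 30
TIME_TO_EXPLOIT_NOW_DAYS = 5

gap_2022 = exposure_window_days(MEDIAN_PATCH_WINDOW_DAYS, TIME_TO_EXPLOIT_2022_DAYS)
gap_now = exposure_window_days(MEDIAN_PATCH_WINDOW_DAYS, TIME_TO_EXPLOIT_NOW_DAYS)

print(f"2022: {gap_2022:.0f} days exposed")  # 2022: 40 days exposed
print(f"now:  {gap_now:.0f} days exposed")   # now:  65 days exposed
```

Under these assumptions the gap has widened from about 40 to about 65 days without defenders getting any slower, purely because attackers got faster.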

Mythos does not create this gap. It makes it impossible to ignore.

An opportunity, but only if we move fast enough

This is a real opportunity to collectively raise the security baseline of the software ecosystem. Mythos-class capabilities can find vulnerabilities at scale, autonomously, across codebases that have resisted human review for years. Used defensively, that is a powerful tool.

But there is one condition: we need to be capable of patching our systems, our software supply chains, and our operational maintenance tools as fast as the wave of CVEs that is about to break. Those who have not laid their foundations block by block before nightfall will know what it costs them.

AISI has already documented the model's offensive capabilities and its current limits. The window for defenders to act before these techniques proliferate further is short.

The distribution problem: who actually benefits first

Here is what I find genuinely troubling about this moment.

Anthropic has announced Project Glasswing, which brings together AWS, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks to use Mythos Preview for defensive security work. Alongside these launch partners, access has been extended to a group of over 40 additional organizations that build or maintain critical software infrastructure.

Anthropic is committing $100M in usage credits for Mythos Preview across these efforts, as well as $4M in direct donations to open-source security organizations.

Let us be clear about what this means in practice: the overwhelming majority of these resources are earmarked for the roughly one hundred organizations already selected into the Glasswing program. Large incumbent software vendors will be the first to use Mythos-level capabilities to scan and patch their own products. They will fix their own software before the rest of the market even has access to the tool.

Smaller software publishers and large European enterprises will, for now, have to make do with applying patches decided elsewhere, on their dependencies, and largely at their own expense.

The $100M figure sounds substantial. Distributed across a hundred organizations with complex, large-scale codebases, it is a starting point, not a solution. The open-source donation envelope ($4M) is the only mechanism through which the broader ecosystem gets meaningful access in this first wave, and it is modest relative to the scale of the problem.
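For a sense of scale, here is the back-of-the-envelope arithmetic behind that reading. It is a sketch only: the even split across organizations is an assumption made for illustration (no allocation is given in the text), and the count of one hundred is this article's own rough figure:

```python
USAGE_CREDITS_USD = 100_000_000
OPEN_SOURCE_DONATIONS_USD = 4_000_000
ORGANIZATIONS = 100  # "roughly one hundred", per this article

# Hypothetical even split of usage credits; real allocations are not stated.
credits_per_org = USAGE_CREDITS_USD / ORGANIZATIONS

# Share of the total commitment that goes to open-source security donations.
total_commitment = USAGE_CREDITS_USD + OPEN_SOURCE_DONATIONS_USD
donation_share = OPEN_SOURCE_DONATIONS_USD / total_commitment

print(f"${credits_per_org:,.0f} in credits per organization")  # $1,000,000 ...
print(f"{donation_share:.1%} of the total goes to open source")  # 3.8% ...
```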

What Anthropic says about where these capabilities came from

One detail worth noting: Anthropic has stated that Mythos's offensive capabilities were not explicitly trained into the model. They emerged as a downstream consequence of broader improvements in code reasoning and agentic autonomy. The same improvements that make the model more effective at patching vulnerabilities also make it more effective at exploiting them.

This matters because it forecloses the idea that capability containment is a long-term strategy. The next generation of models will be more capable still, not because anyone chose to make them more dangerous, but because general intelligence improvements in code reasoning are dual-use by nature.

What organizations should do now

The AISI's guidance is direct: prioritize cybersecurity fundamentals. Regular application of security updates, robust access controls, secure configuration, and comprehensive logging are not glamorous, but they are the difference between being exploitable by Mythos-class tools and not.

The window between when capabilities like this exist and when they become accessible to a wider range of actors, including adversarial ones, is measured in months, not years. Investment in cyber defense now is not optional.

Project Glasswing is described by Anthropic as a starting point. That framing is accurate. No single initiative, however well-resourced, can solve a problem of this structural depth. Frontier AI developers, software vendors, security researchers, open-source maintainers, and governments all have essential roles to play.

The question is whether the ecosystem as a whole will move fast enough to be ahead of the threat when access to these capabilities is no longer restricted to a hundred pre-selected organizations.

Sources
AISI evaluation of Claude Mythos Preview: aisi.gov.uk
Project Glasswing announcement: anthropic.com/glasswing
Anthropic Frontier Red Team blog: red.anthropic.com