
Anthropic Built Project Glasswing With Google and Apple to Test Whether AI Can Be Kept From Breaking Everything

Anthropic's Project Glasswing brought together Apple, Google, and 45+ organizations to test whether AI can be secured collectively — the labs racing to build AGI just admitted they can't contain it alone.

More than 45 organizations — including Apple, Google, and a roster of AI labs that spend most of their time competing for dominance — just agreed to work together on Project Glasswing, Anthropic's new initiative to test whether advanced AI models can be prevented from hacking critical infrastructure. The collaboration uses Claude Mythos Preview, Anthropic's latest model, as the test subject. The premise is straightforward: if the companies building the most powerful AI systems can't figure out how to secure them collectively, nobody's individual defenses will matter.

The timing is not subtle. AI labs have spent the last two years in a capabilities arms race, each one promising their model is smarter, faster, and more aligned than the competition. Now those same labs are admitting — through action, if not press releases — that the technology they're racing to deploy could be weaponized faster than any single company can patch the vulnerabilities. Project Glasswing is what happens when the people building the future realize they might have built something they can't individually control.

Cybersecurity has always been a collective action problem, but AI accelerates it. A model sophisticated enough to write code, analyze systems, and identify exploits doesn't need a human to point it at a target. It just needs access and an objective; for a jailbroken or misaligned system, a poorly designed guardrail is all that stands in the way. The difference between a helpful coding assistant and a tool that can probe hospital networks or financial infrastructure is a matter of context and constraints, not capability. Anthropic's decision to build Glasswing as a multi-organization stress test is an acknowledgment that those constraints need to be tested at scale, not just in-house.

What makes this collaboration notable is who's involved. Apple and Google are platform-level infrastructure companies with their own AI ambitions. They're not just testing Anthropic's model out of goodwill — they're stress-testing the ecosystem they're all building on. If Claude Mythos Preview can identify vulnerabilities in critical systems, every other frontier model will eventually be able to do the same. The question isn't whether AI will be used offensively; it's whether the defenses can be built faster than the exploits. Glasswing is the first large-scale attempt to find out.

The AI industry has a habit of treating safety as a post-deployment concern. Models get released, vulnerabilities get discovered, patches get issued, and the cycle repeats. Glasswing flips that logic: test the attack vectors before the model is widely available, involve the companies whose infrastructure is most at risk, and build defenses collaboratively rather than reactively. It's a pragmatic approach, but it also exposes how fragile the current system is. If 45 organizations need to coordinate just to test one model's cybersecurity risks, the operational reality of securing hundreds of models across thousands of use cases looks impossible.

This also clarifies what the AI platform economy actually looks like when the stakes move beyond content generation. Creative tools like Sora could be shut down when the business case collapsed. Critical infrastructure tools can't be. Once AI models are embedded in systems that control energy grids, transportation networks, or healthcare databases, the cost of failure isn't a PR crisis — it's material harm. Glasswing is an admission that the industry needs to figure out how to secure these systems before they're too embedded to patch.

The collaboration also highlights a tension the AI industry hasn't resolved: competition accelerates innovation, but it also fragments defenses. Every lab wants to be first to market with the most capable model, but that race creates blind spots. A vulnerability discovered in one model might exist in others, but companies don't share that information freely. Glasswing creates a structure where competitive pressures are temporarily suspended in favor of collective security. Whether that structure can scale beyond a single project is the real test.

What's missing from the announcement is any indication of what happens if Glasswing finds vulnerabilities that can't be patched. AI models are probabilistic systems, not deterministic ones. You can reduce the likelihood of a harmful output, but you can't eliminate it entirely. If Claude Mythos Preview identifies an exploit that's inherent to how large language models process information, the fix might require rethinking the architecture — not just adding another guardrail. That's a problem the industry has been deferring, and Glasswing might force it into the open.
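To put a rough number on why "reduce but not eliminate" matters, here's a minimal back-of-the-envelope sketch. The failure rates are hypothetical, not anything from the Glasswing announcement; the point is only that a small per-query failure probability compounds with exposure.

```python
# Hypothetical sketch: probability of at least one guardrail failure
# across n independent adversarial queries, given a per-query failure
# probability p. Numbers are illustrative, not measured.

def breach_probability(p: float, n: int) -> float:
    """Chance of at least one failure in n independent attempts."""
    return 1 - (1 - p) ** n

# Even a one-in-a-million per-query failure rate becomes near-certain
# at the query volumes critical infrastructure would see.
for n in (1_000, 1_000_000, 100_000_000):
    print(f"p=1e-6, n={n:>11,}: {breach_probability(1e-6, n):.4f}")
# p=1e-6, n=      1,000: 0.0010
# p=1e-6, n=  1,000,000: 0.6321
# p=1e-6, n=100,000,000: 1.0000
```

The exact values don't matter; the shape of the curve does. Any nonzero failure rate approaches certainty as exposure grows, which is why a fix that merely lowers the probability is a different kind of answer than one that changes the architecture.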

The project also raises questions about who gets access to the findings. If Glasswing identifies critical vulnerabilities in widely used systems, does that information stay within the participating organizations, or does it get disclosed publicly? The AI safety community has been debating responsible disclosure for years, but there's no consensus. Some argue that transparency is the only way to build trust; others warn that publishing vulnerabilities before patches are ready just arms bad actors. Anthropic hasn't said which approach Glasswing will take, but the decision will set a precedent.

For now, Project Glasswing is a signal that the AI industry knows it has a problem. Whether it can solve that problem collectively — while still competing for market share, racing to deploy new models, and managing investor expectations — is the open question. The companies that built the technology now have to prove they can contain it. If they can't, the rest of us will find out the hard way.
