Project Glasswing

And hours after posting about how ill-prepared we all are for the cybersecurity implications of agentic AI, here comes Project Glasswing.

Anthropic partnered with eleven additional companies “to secure the world’s most critical software.” Anthropic continues:

We formed Project Glasswing because of capabilities we’ve observed in a new frontier model trained by Anthropic that we believe could reshape cybersecurity. Claude Mythos Preview is a general-purpose, unreleased frontier model that reveals a stark fact: AI models have reached a level of coding capability where they can surpass all but the most skilled humans at finding and exploiting software vulnerabilities.

Mythos Preview has already found thousands of high-severity vulnerabilities, including some in every major operating system and web browser. Given the rate of AI progress, it will not be long before such capabilities proliferate, potentially beyond actors who are committed to deploying them safely. The fallout—for economies, public safety, and national security—could be severe. Project Glasswing is an urgent attempt to put these capabilities to work for defensive purposes.

Echoing and building on yesterday’s post:

Ten years after the first DARPA Cyber Grand Challenge, frontier AI models are now becoming competitive with the best humans at finding and exploiting vulnerabilities. Without the necessary safeguards, these powerful cyber capabilities could be used to exploit the many existing flaws in the world’s most important software. This could make cyberattacks of all kinds much more frequent and destructive, and empower adversaries of the United States and its allies. Addressing these issues is therefore an important security priority for democratic states.

An intentional decision to slow down the script kiddies for a bit while the world tries to clean up code. Interesting times, indeed. A frontier model company is withholding a model over safety concerns. Those with long memories will remember OpenAI briefly doing something similar with GPT-2 in 2019. However, in 2019, the concern was the danger of people being fooled by LLMs constructing fake news stories about unicorns:

The scientist named the population, after their distinctive horn, Ovid’s Unicorn. These four-horned, silver-white unicorns were previously unknown to science. Now, after almost two centuries, the mystery of what sparked this odd phenomenon is finally solved. Dr. Jorge Pérez, an evolutionary biologist from the University of La Paz, and several companions, were exploring the Andes Mountains when they found a small valley, with no other animals or humans. Pérez noticed that the valley had what appeared to be a natural fountain, surrounded by two peaks of rock and silver snow. Pérez and the others then ventured further into the valley. “By the time we reached the top of one peak, the water looked blue, with some crystals on top,” said Pérez.

Seven years later, we have push-button superhuman cyber attacks.

Of course, the rest of the world isn’t slowing down: China’s Z.ai has launched GLM-5.1 and specifically called out its cybersecurity performance (though it is still substantially behind what is claimed for Mythos).

Interesting times, indeed.