Cloudflare tested Anthropic's Mythos Preview, a security-focused AI model, across more than 50 of its own code repositories as part of Project Glasswing. According to the findings, Mythos Preview identified exploit chains that earlier frontier models failed to catch.

Mythos Preview represents a specialized approach to AI-driven security research. Rather than deploying a general-purpose large language model for vulnerability detection, Anthropic built this model specifically for cybersecurity tasks. The distinction matters because security work requires different reasoning patterns than typical language tasks.

Cloudflare's testing scope is substantial. The company runs critical infrastructure serving millions of websites, so its repositories contain real production code with genuine security implications. Testing across more than 50 repositories gives a meaningful sample for evaluating model performance on diverse codebases.

The key finding concerns exploit chains. A single vulnerability often creates risk only when combined with other weaknesses. Detecting individual bugs is simpler than recognizing how multiple issues chain together to create actual attack paths. Mythos Preview spotted these composite vulnerabilities where earlier models saw only isolated problems.
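
To make the idea of chaining concrete, here is a minimal sketch, not a description of how Mythos Preview or Cloudflare's evaluation works. It assumes each individual finding can be modeled as a step that requires one attacker capability and grants another, and then searches for a sequence of findings that leads from an initial foothold to a target. The `Finding` type, the `find_chain` function, the capability names, and the three example bugs are all hypothetical and chosen only for illustration.

```python
from collections import deque
from typing import NamedTuple, Optional


class Finding(NamedTuple):
    name: str      # hypothetical label for an individual bug
    requires: str  # attacker capability needed to trigger it
    grants: str    # attacker capability gained by exploiting it


def find_chain(findings: list[Finding], start: str, goal: str) -> Optional[list[str]]:
    """Breadth-first search for a shortest chain of findings from start to goal.

    Returns the finding names in order, or None if no chain exists.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        capability, path = queue.popleft()
        if capability == goal:
            return path
        for finding in findings:
            if finding.requires == capability and finding.grants not in seen:
                seen.add(finding.grants)
                queue.append((finding.grants, path + [finding.name]))
    return None


# Illustrative data only: three low-severity issues that matter once combined.
findings = [
    Finding("verbose error leaks internal hostname", "network_access", "internal_hostname"),
    Finding("SSRF in URL fetcher", "internal_hostname", "internal_request"),
    Finding("unauthenticated internal admin endpoint", "internal_request", "admin_access"),
]

chain = find_chain(findings, "network_access", "admin_access")
print(chain)
```

In this toy model, none of the three findings reaches `admin_access` on its own, but the search returns all three in order. Removing the middle finding makes `find_chain` return `None`, which mirrors the point above: a tool that evaluates bugs one at a time can rate each as minor and still miss the attack path they form together.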

This has direct implications for security teams. Current vulnerability scanning typically relies on static analysis tools and manual code review. An AI model that identifies exploit chains could accelerate threat detection and shorten the window during which unknown vulnerabilities remain exploitable. However, the test results apply specifically to Cloudflare's codebases, and performance may vary across different code styles and architectures.

Project Glasswing appears to be Anthropic's broader initiative for demonstrating Mythos Preview's practical utility in real-world security contexts. Rather than publishing synthetic benchmarks, Anthropic is testing against actual infrastructure. This approach builds credibility but also highlights that security AI remains in active development.

The implication for the industry is that frontier models may be reaching the point where specialized versions outperform generalist alternatives on specific