Anthropic Finds 22 Firefox Vulnerabilities Using Claude Opus 4.6 AI Model

Estimated reading time: 6 minutes

Key Takeaways:

  • Claude Opus 4.6 successfully identified 22 unique security vulnerabilities in the Firefox codebase within a two-week period.
  • AI models are significantly lowering the cost and technical friction of vulnerability discovery compared to exploit development.
  • OpenAI’s Codex Security has scaled code analysis to find thousands of critical and high-severity issues across major open-source projects.
  • Threat actors, including North Korean groups, are actively utilizing generative AI to create fraudulent personas and automate infrastructure deployment.
  • Defense-in-depth now requires AI-integrated DevSecOps and enhanced identity verification for remote personnel.

The speed and scale of vulnerability discovery have shifted markedly as large language models (LLMs) transition from general-purpose assistants to specialized security agents. In January 2026, a collaborative security engagement between Anthropic and Mozilla produced a significant result: Claude Opus 4.6 identified 22 confirmed vulnerabilities in the Firefox codebase. The finding represents a quantifiable change in how browser security is maintained and how security teams must evaluate the pace of code analysis in modern software development.

Anthropic Finds 22 Firefox Vulnerabilities Using Claude Opus 4.6 AI Model: Technical Breakdown

The two-week engagement in January 2026 utilized the Claude Opus 4.6 model to analyze the Firefox codebase. The model scanned nearly 6,000 C++ files, resulting in 112 unique reports submitted to Mozilla. Of these, 22 were confirmed as distinct security vulnerabilities. The severity breakdown included 14 high-severity bugs, seven moderate-severity bugs, and one low-severity bug.
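
To make the workflow concrete, the following is a minimal sketch of how such a scan might be orchestrated over a C++ tree using the Anthropic Python SDK. The model identifier, prompt, and report handling are illustrative assumptions; Anthropic has not published the harness used in this engagement.

```python
# Minimal sketch of an LLM-driven triage pass over a C++ tree, assuming
# the "anthropic" Python SDK and an ANTHROPIC_API_KEY in the environment.
# The model ID, prompt, and report handling are illustrative guesses,
# not the harness Anthropic actually used.
import pathlib

import anthropic

client = anthropic.Anthropic()

PROMPT = (
    "You are a security reviewer. Identify concrete memory-safety issues "
    "(use-after-free, out-of-bounds access) in this C++ file. "
    "Reply with the single word NONE if nothing stands out.\n\n{code}"
)

def review_file(path: pathlib.Path) -> str:
    source = path.read_text(errors="ignore")[:30_000]  # stay within context
    message = client.messages.create(
        model="claude-opus-4-6",  # hypothetical model ID for illustration
        max_tokens=1024,
        messages=[{"role": "user", "content": PROMPT.format(code=source)}],
    )
    return message.content[0].text

def scan_tree(root: str) -> dict[str, str]:
    """Collect non-empty findings, keyed by file path, for human review."""
    findings = {}
    for path in sorted(pathlib.Path(root).rglob("*.cpp")):
        report = review_file(path)
        if report.strip() != "NONE":
            findings[str(path)] = report
    return findings
```

In practice, each candidate report would then be queued for human validation in an isolated environment, mirroring the triage step Anthropic describes below.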

To provide context for the efficiency of this automated approach, the 14 high-severity bugs identified by the model constitute approximately 20% of all high-severity vulnerabilities patched in the Firefox browser during the entirety of 2025. One specific use-after-free bug in the browser’s JavaScript engine was identified after 20 minutes of model exploration. This finding was later validated by human researchers in a virtualized environment to eliminate false positives.

[Image: Security analyst assesses Firefox vulnerability reports with AI assistance on multiple monitors]

A notable component of this research involved the model’s ability to transition from discovery to exploitation. Anthropic tasked Claude Opus 4.6 with developing functional exploits for the vulnerabilities it identified. While the model attempted this several hundred times, consuming roughly $4,000 in API credits, it succeeded in producing a crude exploit in only two instances. One of these exploits targeted CVE-2026-2796, a just-in-time (JIT) miscompilation vulnerability in the JavaScript engine’s WebAssembly component, which carries a CVSS score of 9.8.

The data suggests a current disparity in AI capabilities: the cost and technical friction of identifying vulnerabilities are significantly lower than those associated with creating stable exploits.

Scaling Vulnerability Discovery via OpenAI Codex Security

The trend of AI-driven code analysis extends beyond browser security. OpenAI recently introduced Codex Security, an agentic tool designed to identify and propose remediation for vulnerabilities at scale. During its beta phase, the tool scanned 1.2 million commits across various external repositories. This scan identified 792 critical findings and 10,561 high-severity findings.

The tool’s analysis covered several high-profile open-source projects, including:

  • GnuPG: CVE-2026-24881 and CVE-2026-24882.
  • GnuTLS: CVE-2025-32988 and CVE-2025-32989.
  • Thorium: Multiple vulnerabilities ranging from CVE-2025-35430 to CVE-2025-35436.
  • Other projects: OpenSSH, GOGS, libssh, PHP, and Chromium.

The operational methodology of Codex Security involves three distinct stages. First, it analyzes the repository to build a project-specific threat model. Second, it identifies vulnerabilities and pressure-tests them in a sandboxed environment to validate findings. Third, it proposes patches designed to align with the existing system behavior. OpenAI reported that this grounded approach reduced false positive rates by over 50% during the beta period.
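
The sketch below makes that three-stage control flow concrete. It is not OpenAI’s implementation: every helper is a hypothetical stub, shown only to illustrate how candidate findings flow from threat modeling through sandbox reproduction to patch proposal.

```python
# Conceptual sketch of the three-stage scan -> validate -> patch loop.
# Every helper below is a hypothetical stub, not OpenAI's Codex Security
# API; the point is the control flow, not the internals.
from dataclasses import dataclass

@dataclass
class Finding:
    location: str
    description: str
    validated: bool = False

def build_threat_model(repo_path: str) -> dict:
    # Stage 1: map entry points and trust boundaries from the repo layout.
    # A real system would parse build files and call graphs; stubbed here.
    return {"repo": repo_path, "entry_points": []}

def pressure_test(finding: Finding, sandbox: str) -> bool:
    # Stage 2: try to reproduce the candidate bug inside an isolated
    # sandbox. Discarding anything unreproducible is what drives down
    # the false positive rate. Stubbed to "not reproduced".
    return False

def propose_patch(finding: Finding) -> str:
    # Stage 3: draft a fix constrained to preserve existing behavior.
    return f"patch for {finding.location}"

def run_pipeline(repo_path: str, candidates: list[Finding]) -> list[str]:
    threat_model = build_threat_model(repo_path)  # informs later stages
    patches = []
    for finding in candidates:
        if pressure_test(finding, sandbox="ephemeral-vm"):
            finding.validated = True
            patches.append(propose_patch(finding))
    return patches
```

The key design choice is that stage two discards anything that cannot be reproduced, which is consistent with the greater-than-50% reduction in false positives OpenAI reported.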

Adversarial Utilization of AI in the Cyberattack Lifecycle

While Anthropic and OpenAI focus on defensive applications, Microsoft Threat Intelligence has documented the increasing use of generative AI by threat actors to accelerate the cyberattack lifecycle. These actors utilize AI for reconnaissance, phishing, and infrastructure development.

North Korean Operations (Jasper Sleet and Coral Sleet)
The North Korean group tracked as Jasper Sleet (Storm-0287) has integrated AI into remote IT worker schemes. The group uses generative AI to create realistic digital personas, including culturally appropriate name lists and tailored resumes. These identities are used to gain employment at Western companies, facilitating long-term access to corporate networks.

Another group, Coral Sleet (Storm-1877), uses AI to rapidly provision infrastructure. This includes generating fake company websites and troubleshooting deployment scripts. When built-in AI safeguards attempt to block these activities, these groups employ jailbreaking techniques to bypass restrictions.

Geopolitical Cyber Escalation
The regional conflict involving Iran, Israel, and the United States has led to increased cyber mobilization. Groups such as Cyber Islamic Resistance and the 313 Team have announced intentions to target infrastructure. Reported activities include attempts to disrupt drone detection infrastructure in Israel and coordinated attacks targeting government and financial platforms in the Gulf Cooperation Council (GCC) countries.

Tactical Deception: The Red Alert Spyware Campaign

A specific instance of regionally focused cyber activity involves the distribution of a malicious version of the Red Alert mobile application in Israel. Threat actors, believed to be Arid Viper (APT-C-23), utilize SMS messages that mimic official alerts to trick users into downloading a “technical update.”

The malicious app retains the functionality of the original version but executes background code to perform deep data theft. The actors utilized certificate spoofing to bypass Android security measures and spoofed the installation source to appear as if the app originated from the Google Play Store. This campaign highlights the necessity for brand leak alerting to identify fraudulent versions of essential services.

Strategic Implications for Security Teams

The findings from Anthropic and Mozilla demonstrate that traditional manual code review and fuzzing are no longer sufficient to keep pace with AI-driven discovery. The ability of a model to find 14 high-severity bugs in two weeks requires a fundamental shift in how organizations manage their security backlogs.

For technical teams, the integration of AI agents into the DevSecOps pipeline is becoming a requirement for maintaining parity with external researchers and adversaries. These tools can identify logic errors that traditional fuzzers miss, such as assertion failures and complex use-after-free conditions.

Practical Takeaways for Technical Personnel

  • Automated Validation: Deploy task verifiers in CI/CD pipelines to validate AI-generated patches before they merge (see the sketch after this list).
  • Infrastructure Hardening: Use network segmentation to limit the lateral movement of malware developed with AI assistance.
  • Anomaly Detection: Implement Endpoint Detection and Response (EDR) to monitor for abuse of legitimate tools such as PowerShell by AI-driven agents seeking persistence.
  • Fuzzing Augmentation: Complement existing fuzzing tools with AI-driven analysis to identify complex architectural flaws.
  • API Management: Monitor for unusual spikes in API credit usage, which may indicate unauthorized code analysis activity.
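
As referenced in the first item above, here is a minimal sketch of a task verifier that gates an AI-generated patch: it dry-runs the patch, rebuilds under AddressSanitizer, and runs the test suite. The build commands and patch format are assumptions for illustration and would need to match a real project’s tooling.

```python
# Minimal sketch of a CI "task verifier" gate for AI-generated patches.
# Assumptions: a git checkout, a make-based build that honors CXXFLAGS,
# and a unified-diff patch file passed as the first argument.
import subprocess
import sys

def run(cmd: list[str]) -> bool:
    """Run a command in the working tree; return True on exit code 0."""
    return subprocess.run(cmd, check=False).returncode == 0

def verify_patch(patch_file: str) -> bool:
    # 1. Dry-run the patch first so a malformed diff fails fast.
    if not run(["git", "apply", "--check", patch_file]):
        return False
    if not run(["git", "apply", patch_file]):
        return False
    # 2. Rebuild with AddressSanitizer so memory-safety regressions
    #    (e.g., a reintroduced use-after-free) surface during testing.
    if not run(["make", "clean"]):
        return False
    if not run(["make", "CXXFLAGS=-fsanitize=address -g"]):
        return False
    # 3. Gate the merge on the existing test suite.
    return run(["make", "test"])

if __name__ == "__main__":
    sys.exit(0 if verify_patch(sys.argv[1]) else 1)
```

Exiting nonzero fails the CI job, so an unverified patch never merges automatically.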

Practical Takeaways for Business Leaders

  • Identity Verification: Enhance vetting for remote IT workers. Use multi-factor authentication (MFA) and conduct video interviews to verify personnel legitimacy.
  • Supply-Chain Oversight: Conduct regular audits of open-source dependencies using supply-chain risk monitoring tools.
  • Phishing Education: Update training to reflect the increased quality of AI-generated phishing lures that lack typical grammatical errors.
  • Incident Response: Prepare for accelerated attack timelines as AI reduces technical friction for adversaries.
  • Data Protection: Maintain immutable, offline backups to protect against automated encryption scripts.

Intelligence and Monitoring Requirements

Effective defense now requires the integration of diverse intelligence sources. Utilizing a cyber threat intelligence platform allows organizations to aggregate data on emerging AI-discovered vulnerabilities and track the tactics of groups like Jasper Sleet.

At PurpleOps, we provide the technical expertise and infrastructure required to navigate this shifting environment. Our services include comprehensive penetration testing and red team operations that simulate both human-led and AI-augmented attack methodologies. By leveraging our dark web monitoring and cyber threat intelligence, organizations can identify exposures before they are exploited by automated agents.

The discovery of 22 vulnerabilities in Firefox via Claude Opus 4.6 serves as a baseline for the future of application security. As AI models become more adept at identifying complex flaws in C++ and other languages, the volume of identified vulnerabilities will continue to rise. Organizations must prioritize automated remediation and identity-first security to mitigate the risks posed by the democratization of high-level vulnerability research.

Frequently Asked Questions

How many vulnerabilities did Claude Opus 4.6 find in Firefox?
The model identified 22 confirmed vulnerabilities, including 14 high-severity bugs, during a two-week security engagement.

What is OpenAI Codex Security?
It is an agentic tool designed to scan repositories at scale. During its beta, it identified over 11,000 high and critical severity findings across 1.2 million commits.

Can AI models create exploits for the vulnerabilities they find?
While discovery is becoming highly efficient, automated exploit generation is still difficult. Claude Opus 4.6 successfully created only two crude exploits out of several hundred attempts.

How are North Korean threat actors using AI?
Groups like Jasper Sleet use AI to generate realistic resumes and digital personas to secure remote IT positions at Western companies for corporate access.

What should businesses prioritize to defend against these threats?
Key priorities include stricter identity verification for remote workers, automated CI/CD security validation, and advanced supply-chain risk monitoring.