Anthropic and Mozilla ran a live‑fire experiment: could an AI model find serious, previously unknown vulnerabilities in one of the most tested browsers on earth?

In a focused two‑week sprint in January 2026, Claude Opus 4.6 uncovered 22 new Firefox vulnerabilities, all confirmed by Mozilla and assigned CVEs. Fourteen were rated high severity, seven moderate, and one low, with most fixes shipped to hundreds of millions of users in Firefox 148.[5][6][8]

💡 Why this matters: Those 14 high‑severity bugs represent almost a fifth of all high‑severity Firefox vulnerabilities remediated in 2025, concentrated into a single AI‑augmented engagement.[5][6][8] That discovery rate forces security leaders to rethink how software will be attacked—and defended—over the next few years.


1. What Anthropic and Mozilla Actually Achieved

Instead of testing Claude Opus 4.6 on synthetic benchmarks, Anthropic embedded it into a real security partnership with Mozilla, giving it targeted access to the Firefox codebase as a production‑grade testbed.[6][8]

Over two weeks in January 2026, Claude identified 22 previously unknown Firefox vulnerabilities that Mozilla triaged, confirmed, and assigned CVEs.[5][6][8]

  • 14 high‑severity
  • 7 moderate
  • 1 low[5][8]

📊 Impact in context

  • 22 new Firefox CVEs in 14 days
  • 14 high‑severity issues—almost 20% of all high‑severity Firefox bugs patched in 2025[3][5][6][8]
  • More Firefox vulnerabilities reported by Claude in February 2026 than in any single month of 2025 from all sources combined[6][7][9]

Mozilla shipped fixes for most of these vulnerabilities in Firefox 148, with the rest scheduled for upcoming releases, showing that AI‑found bugs can move rapidly from discovery to remediation at internet scale.[5][6][8]

💼 Strategic takeaway: In a codebase that has been fuzzed, audited, and hardened for decades, one AI‑augmented sprint matched a substantial fraction of a full year’s human‑driven high‑severity discovery.[3][5][6][9] For most enterprise software, the untouched backlog is likely much larger.


2. How Claude Opus 4.6 Actually Found the Bugs

Claude’s first major win came quickly: within 20 minutes, it identified a use‑after‑free vulnerability in Firefox’s SpiderMonkey JavaScript engine, later confirmed and patched by Mozilla.[3][5][8] That early success evolved into a systematic pipeline.

```mermaid
flowchart LR
    A[Firefox codebase] --> B[Claude Opus 4.6]
    B --> C[Crash inputs & hypotheses]
    C --> D[Human validation & VMs]
    D --> E[Bugzilla reports]
    E --> F[Mozilla triage]
    F --> G[Firefox patches]
    style B fill:#22c55e,color:#fff
    style G fill:#22c55e,color:#fff
```

Over the two‑week engagement, Claude:[3][5][8]

  • Scanned nearly 6,000 C++ files
  • Generated dozens of crashing inputs during early triage
  • Contributed to 112 unique bug reports
  • Proposed candidate patches that Mozilla engineers sometimes used as starting points[3][6][9]
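The scanning stage of a pipeline like this can be sketched in a few lines. The snippet below is a hypothetical illustration, not Anthropic's actual tooling: it walks a source tree, collects C++ files, and groups them into batches small enough for individual review requests. The extension set and batch size are assumptions.

```python
from pathlib import Path

# Extensions treated as C++ sources (an assumption for this sketch).
CPP_EXTENSIONS = {".cpp", ".cc", ".cxx", ".h", ".hpp"}

def collect_cpp_files(root: str) -> list[Path]:
    """Recursively gather C++ files under root, as a scan on the
    scale of the Firefox engagement (~6,000 files) would need to."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.suffix.lower() in CPP_EXTENSIONS
    )

def batch(files: list[Path], size: int = 20) -> list[list[Path]]:
    """Group files into fixed-size batches so each review request
    stays within a model's context budget."""
    return [files[i:i + size] for i in range(0, len(files), size)]
```

Batching by file count is the simplest possible policy; a production pipeline would more likely batch by token count and group related files together.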

💡 Quality, not just quantity

Mozilla engineers noted that, unlike most low‑quality AI bug reports, Claude’s submissions typically included:[6][8][9]

  • Minimized test cases
  • Detailed, step‑by‑step proofs of concept
  • Candidate fixes mapped to specific files and functions

This sharply reduced validation workload, letting the Firefox security team reproduce and assess issues far faster than with typical external reports.[6][8][9]
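A triage-ready report containing those three elements can be modeled as a simple record. The structure below is purely illustrative (Mozilla's actual Bugzilla schema differs), and every field name is an assumption.

```python
from dataclasses import dataclass

@dataclass
class VulnReport:
    """Hypothetical shape of a triage-ready bug report carrying the
    elements Mozilla highlighted in Claude's submissions."""
    title: str
    minimized_test_case: str       # smallest input that triggers the bug
    poc_steps: list[str]           # step-by-step reproduction instructions
    candidate_fix_file: str        # file the proposed patch touches
    candidate_fix_function: str    # function the proposed patch touches

    def is_triage_ready(self) -> bool:
        # A report is immediately actionable only when every element
        # is present; missing pieces push validation work onto triagers.
        return bool(self.minimized_test_case and self.poc_steps
                    and self.candidate_fix_file and self.candidate_fix_function)
```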

Some vulnerabilities overlapped with issues reachable by existing fuzzers, while others were new classes of logic errors that fuzzing had failed to expose—even in heavily fuzzed code paths.[3][6][9]

⚠️ Key implication: If Firefox, one of the most continuously fuzzed and reviewed browser codebases, still harbored this many serious issues, then typical enterprise applications—with less rigorous testing—almost certainly contain a larger, AI‑discoverable vulnerability backlog.[3][6][8][9]


3. What the Firefox Results Signal for Software Security

Anthropic positions the Firefox collaboration as evidence that modern AI models can independently identify high‑severity vulnerabilities in mature, complex software at speeds that exceed traditional techniques.[6][9][10]

Mozilla’s data supports this: Claude’s 22 CVEs in February 2026 exceeded the Firefox vulnerability count of any single month in 2025, across all human and automated sources.[6][7][9] PCMag noted that Claude effectively found more high‑severity bugs in two weeks than human teams typically uncover over much longer periods.[9][10]

📊 Beyond Firefox

Anthropic reports that, beyond this project, Claude has surfaced more than 500 zero‑day vulnerabilities in other well‑tested open‑source software, focusing on complex, security‑sensitive components.[6][8]

Mashable notes that open‑source projects are particularly well‑suited to AI analysis because models can correlate:[2][4][6]

  • Full source code
  • Rich version history
  • Historical CVEs and patches

That combination lets AI learn patterns of insecure coding and configuration that static tools miss.[2][4][6]
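One simple form of that correlation is linking commits to the CVEs they fix. The sketch below scans commit messages for CVE identifiers; in a real repository the (hash, message) pairs would come from `git log`, which is assumed here rather than invoked.

```python
import re

# CVE identifiers follow the pattern CVE-YYYY-NNNN (four or more digits).
CVE_RE = re.compile(r"CVE-\d{4}-\d{4,}")

def map_cves_to_commits(log_entries: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Given (commit_hash, message) pairs, return a CVE -> commit-hash map.
    This code/patch/CVE linkage is the kind of signal an AI reviewer can
    use when learning patterns of insecure code and their fixes."""
    mapping: dict[str, list[str]] = {}
    for sha, message in log_entries:
        for cve in CVE_RE.findall(message):
            mapping.setdefault(cve, []).append(sha)
    return mapping
```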

Mozilla engineers observed that Claude’s Firefox findings fell into two categories:[3][6][9]

  • Bugs overlapping with fuzzing‑accessible paths
  • Novel logic and state‑handling errors beyond fuzzer coverage

💡 Mini‑conclusion: AI analysis is emerging as a powerful complement to fuzzing, SAST, and manual review—not a replacement. Security programs that orchestrate these methods together will gain the most from AI‑accelerated discovery.[3][6][9][10]


4. Limits, Exploit Tests, and Dual‑Use Concerns

After confirming the 22 CVEs, Anthropic asked a harder question: could Claude also weaponize its own discoveries? Researchers provided Claude with details of the new Firefox bugs and asked it to craft practical exploits, including attempts to read and write local files on a target system, emulating a real attacker.[2][5][9]

```mermaid
flowchart TB
    A[Discovered CVEs] --> B[Claude exploit attempts]
    B --> C[Task verifier]
    C --> D{Working exploit?}
    D -->|No| B
    D -->|Yes| E[Exploit sample]
    style C fill:#f59e0b,color:#000
    style E fill:#ef4444,color:#fff
```

Despite several hundred exploit‑generation trials and about $4,000 in API credits, Claude produced reliable, end‑to‑end exploits in only two cases.[5][8][9] One targeted CVE‑2026‑2796, a JIT miscompilation in the WebAssembly component of Firefox’s JavaScript engine.[5][9]

Mashable highlighted this asymmetry: Claude was highly effective at finding vulnerabilities but comparatively weak at automating full exploit chains, suggesting that—for now—AI is more beneficial for defense than for fully automated offensive operations.[2][4][5]

⚠️ But not purely defensive

InfoQ stresses the dual‑use reality: with enough steering and iteration, Anthropic did obtain working exploits for some bugs.[5][8][9] The same discovery capabilities that help defenders can also accelerate attacker workflows.

Anthropic situates the Firefox work within its Frontier Red Team efforts and policy commitments, arguing that:[6][8][9]

  • AI‑assisted research should go through responsible disclosure channels
  • Partnerships with maintainers (like Mozilla) are essential to keep net impact positive
  • Safety controls such as task verifiers and rate limits must evolve with capability

Security commentators expect that as models improve, adversaries will adopt similar tools for vulnerability triage and exploit research, making time‑to‑patch and continuous monitoring even more critical risk metrics.[5][9][10]


5. How Security Teams Can Operationalize These Lessons

The Firefox experiment is not a stunt. Anthropic is productizing its techniques through Claude Code Security, a service that scans code for vulnerabilities and suggests targeted fixes for human review.[6][8][10]

💼 Partnership as a pattern

The Mozilla engagement showcased an operating model security teams can emulate:[6][8]

  • Tight collaboration between AI researchers and maintainers
  • Shared criteria for what counts as a reportable bug
  • Rapid patch deployment once issues are confirmed

This pattern can be replicated inside enterprises by pairing AI tools with in‑house AppSec engineers and clear triage rules.

```mermaid
flowchart LR
    A[Codebase] --> B[AI security scan]
    B --> C[Findings list]
    C --> D[Human triage]
    D --> E[Patch dev]
    E --> F[CI/CD & release]
    F --> G[Monitoring]
    style B fill:#22c55e,color:#fff
    style D fill:#0ea5e9,color:#fff
```

Given that Firefox is far more fuzzed and reviewed than typical enterprise applications, Anthropic’s results imply that internal codebases—especially legacy C/C++ and complex JavaScript—are prime candidates for AI‑assisted review.[3][5][8]

Practical first steps for organizations include:[6][8][9]

  • Targeting high‑risk components (parsers, auth flows, memory‑unsafe modules) for AI‑assisted audits
  • Using Claude‑style tools to generate minimized test cases and candidate patches
  • Integrating AI findings into CI pipelines and secure coding playbooks
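In a CI pipeline, that integration can start as a simple severity gate: block the build when AI findings above a threshold lack human sign-off. The sketch below is a minimal illustration; the severity labels and finding format are assumptions, not any tool's real schema.

```python
# Rank severity labels so they can be compared against a threshold.
SEVERITY_RANK = {"low": 0, "moderate": 1, "high": 2, "critical": 3}

def ci_gate(findings: list[dict], threshold: str = "high") -> list[dict]:
    """Return the findings that should block the pipeline: those at or
    above the threshold severity that no human triager has acknowledged."""
    bar = SEVERITY_RANK[threshold]
    return [
        f for f in findings
        if SEVERITY_RANK[f["severity"]] >= bar and not f.get("triaged", False)
    ]
```

Starting with a high threshold and tightening it over time keeps early false positives from eroding developer trust in the gate.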

⚠️ Triage remains essential

AI‑generated bug reports can still include false positives or low‑impact issues. The Firefox case underlines the need for:[6][8][9][10]

  • A human triage layer staffed by experienced security engineers
  • Severity scoring aligned with business risk
  • Governance that treats AI as an accelerator, not a replacement, for secure development practices
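Severity scoring aligned with business risk can be as simple as weighting a technical score by asset criticality and exposure. The multiplicative form and the 0.5 floor below are placeholder choices an organization would tune, not a standard formula.

```python
def risk_score(technical_severity: float, asset_criticality: float,
               exposure: float) -> float:
    """Combine a technical severity (0-10, e.g. a CVSS base score), the
    business criticality of the affected asset (0-1), and its network
    exposure (0-1) into a single triage-ordering score. The 0.5 floor
    keeps even low-criticality assets from scoring zero."""
    return technical_severity * (0.5 + 0.5 * asset_criticality) * (0.5 + 0.5 * exposure)
```

With this weighting, the same high-severity bug ranks above an identical one on a less critical, less exposed system, which is the ordering a business-aligned triage queue needs.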

💡 Mini‑conclusion: Mozilla’s experience suggests that embedding AI into established security workflows can dramatically expand coverage and speed without forcing wholesale changes to governance or release processes.[6][9][10]


Conclusion: A Blueprint for AI‑Augmented Defense

The Anthropic–Mozilla experiment shows that Claude Opus 4.6 can uncover high‑severity vulnerabilities in a world‑class, heavily tested browser at speeds humans cannot match: 22 Firefox CVEs, including 14 high‑severity issues, found in two weeks and rapidly patched for hundreds of millions of users.[3][5][6][8][9]

Security leaders should treat this as a blueprint. Pilot AI‑assisted code review on your most critical applications. Embed model findings into existing triage and patch workflows. Establish strong disclosure channels with vendors and open‑source maintainers. As AI makes vulnerability discovery cheaper and faster for everyone—including adversaries—organizations that operationalize these capabilities now will be positioned to benefit before attackers do.[5][6][9][10]

Sources & References (10)
