The AI Coding Agent Security Crisis: Why 87% of AI-Generated Pull Requests Contain Vulnerabilities
Your AI coding assistant just shipped a critical vulnerability to production - and you do not even know it yet.
That is not a hypothetical scenario. It is the reality facing development teams in 2026. According to groundbreaking research from DryRun Security released this week, AI coding agents introduce security vulnerabilities in a staggering 87% of pull requests. The study tested Claude Code, OpenAI Codex, and Google Gemini as they built real applications - and none of them produced fully secure code.
If your team is using AI to accelerate development, you are almost certainly shipping security flaws. The question is not whether you have vulnerabilities - it is how many, and how long until attackers find them.
The Shocking Truth About AI-Generated Code Security
DryRun Security's "Agentic Coding Security Report," published March 11, 2026, represents one of the most comprehensive analyses of AI coding agent security to date. Researchers tasked three leading agents - Claude Code with Sonnet 4.6, OpenAI Codex with GPT 5.2, and Google Gemini with 2.5 Pro - with building two complete applications from scratch.
The results should alarm every engineering leader:
- 87% of pull requests contained at least one vulnerability
- 143 security issues identified across just 38 security scans
- Zero fully secure applications produced by any agent
- The same vulnerability classes appeared repeatedly across all three agents
"AI coding agents can produce working software at incredible speed, but security is not part of their default thinking," said James Wickett, CEO of DryRun Security. "In our usage and experience, AI coding agents often missed adding security components or created authentication logic flaws. These mistakes and gaps are exactly where attackers win."
What the Researchers Built - and Broke
To ensure realistic results, DryRun designed two genuine applications with practical use cases:
FaMerAgen: A web application for tracking children's allergies and family contacts - the kind of sensitive health data application that demands robust security.
Road Fury: A browser-based racing game with backend API, high score system, and multiplayer functionality - representing complex real-time applications with authentication and state management.
Neither application was a contrived security test. Both were built from realistic product specifications with no security guidance added to the prompts. This mirrors exactly how many development teams actually use AI coding agents today - focusing on features, assuming security will somehow take care of itself.
Each agent built features through sequential pull requests, just like real engineering teams implement functionality over time. Every PR was scanned as submitted, with full codebase scans before development began and after all features were merged.
The Winners and Losers: How Each Agent Performed
While all three agents failed to produce secure applications, their performance varied significantly:
Claude: Most Unresolved High-Severity Vulnerabilities
Anthropic's Claude Code finished with the most unresolved high-severity vulnerabilities in the final applications. In the web app, Claude ended with 13 issues and introduced a particularly dangerous 2FA-disable bypass not found in the other agents' work.
In the game app, Claude carried an insecure direct object reference from PR 2 and an unauthenticated destructive endpoint from PR 1 all the way to project completion - the longest-lived unresolved findings of any agent in the study.
Codex: Best Overall Security Posture
OpenAI's Codex ultimately finished with the fewest vulnerabilities and demonstrated stronger remediation behavior during development. In the web app, Codex finished with 8 issues - one fewer than the baseline scan. In the game app, Codex had the cleanest final result at 6 issues.
However, Codex still had gaps in JWT revocation and rate limiting. No agent achieved anything close to "secure by default."
Gemini: Early Issues, Partial Recovery
Google's Gemini introduced the most issues overall, accumulating multiple security problems early in development. Notably, later modifications removed some of those issues - but despite that partial recovery, Gemini still ended with several high-severity findings.
In the game app, Gemini had the most high-severity findings in the final scan.
The Ten Vulnerability Classes Haunting AI-Generated Code
Ten vulnerability categories appeared consistently enough across agents and tasks to be treated as structural patterns. These are not exotic edge cases - they are fundamental security failures that appear in nearly every AI-built application.
1. Broken Access Control (Universal Across All Agents)
The most universal vulnerability, appearing across all three agents in both applications. Unauthenticated endpoints on destructive and sensitive operations were the primary manifestation. AI agents consistently failed to properly restrict access to critical functionality.
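The fix is conceptually simple: every destructive or sensitive endpoint must pass through an authentication check before its handler runs. The sketch below is a minimal, framework-agnostic illustration of that pattern - the session store, endpoint name, and roles are invented for the example, not taken from the report's codebases.

```python
# Hedged sketch: gate destructive operations behind an auth check.
# SESSIONS and the endpoint name are hypothetical stand-ins.
import functools

SESSIONS = {"token-abc": {"user_id": 1, "role": "admin"}}  # stand-in session store

class AuthError(Exception):
    pass

def require_auth(role=None):
    """Decorator that rejects requests lacking a valid session token."""
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(request, *args, **kwargs):
            session = SESSIONS.get(request.get("token"))
            if session is None:
                raise AuthError("unauthenticated")
            if role is not None and session["role"] != role:
                raise AuthError("forbidden")
            return handler(request, *args, **kwargs)
        return wrapper
    return decorator

@require_auth(role="admin")
def delete_all_scores(request):
    # Destructive operation - exactly the kind the agents left open.
    return "deleted"
```

The point is not the decorator syntax but the default: in the study, agents shipped handlers like `delete_all_scores` with no wrapper at all.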
2. Business Logic Failures
Appeared in the game app across all three agents. Scores, balances, and unlock states were accepted from the client without server-side validation. The agents trusted client input when they should have verified everything server-side.
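The remedy is to recompute or bound every game-state change on the server. A hedged sketch follows - the unlock costs and score limits are invented for illustration, not the report's actual rules:

```python
# Sketch: validate client-reported state server-side. UNLOCK_COSTS and
# the max-score bound are hypothetical examples of server-known truth.

UNLOCK_COSTS = {"turbo_car": 500, "night_track": 1200}

def apply_unlock(server_balance: int, item: str) -> int:
    """Validate an unlock purchase against server-side state; return new balance."""
    cost = UNLOCK_COSTS.get(item)
    if cost is None:
        raise ValueError("unknown item")
    if server_balance < cost:
        raise ValueError("insufficient balance")
    return server_balance - cost

def accept_score(reported: int, max_possible: int) -> int:
    """Reject client-reported scores outside server-known limits."""
    if not (0 <= reported <= max_possible):
        raise ValueError("implausible score")
    return reported
```

The vulnerable versions the agents wrote skipped both checks and simply stored whatever integer the client sent.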
3. OAuth Implementation Failures
Appeared in the web app from all three agents. Missing state parameters and insecure account linking were present in every social login implementation. AI agents understand OAuth exists but consistently implement it incorrectly.
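The missing `state` parameter is the classic OAuth CSRF defense: generate an unguessable value, bind it to the user's session before redirecting to the provider, and verify it on the callback. A minimal stdlib sketch of that check (session handling here is a plain dict for illustration):

```python
# Sketch of the `state` check the agents omitted from every social login.
import secrets

def start_oauth(session: dict) -> str:
    """Generate a one-time state value and bind it to the session."""
    state = secrets.token_urlsafe(32)
    session["oauth_state"] = state
    return state  # included in the redirect URL to the provider

def handle_callback(session: dict, returned_state: str) -> None:
    """Verify the provider echoed back the state we issued."""
    expected = session.pop("oauth_state", None)  # one-time use
    if expected is None or not secrets.compare_digest(expected, returned_state):
        raise PermissionError("OAuth state mismatch: possible CSRF")
```

Without this, an attacker can complete the OAuth flow on a victim's behalf and link the victim's session to an attacker-controlled provider account.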
4. WebSocket Authentication Gaps
Missing WebSocket authentication appeared in every final game codebase. The agents built REST authentication middleware correctly, then failed to wire it into the WebSocket upgrade handler. This finding appeared in every final scan regardless of which agent wrote the code.
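Conceptually, the fix is to run the same token check during the WebSocket handshake that the REST middleware already runs, rejecting the upgrade before any game messages flow. A framework-agnostic sketch - `verify_token` stands in for whatever shared check the REST stack uses:

```python
# Sketch: authorize the WebSocket upgrade with the SAME check used by
# the REST middleware. verify_token is a hypothetical placeholder.

def verify_token(token: str) -> dict:
    # Placeholder for the app's real JWT/session verification.
    if token != "valid-token":
        raise PermissionError("invalid token")
    return {"user_id": 1}

def authorize_ws_upgrade(headers: dict) -> dict:
    """Reject the upgrade request before the connection is accepted."""
    auth = headers.get("Authorization", "")
    if not auth.startswith("Bearer "):
        raise PermissionError("missing credentials on upgrade")
    return verify_token(auth.removeprefix("Bearer "))
```

The agents built the `verify_token` equivalent correctly - they just never called it from the upgrade path.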
5. Rate Limiting Neglect
Rate limiting middleware was defined in every codebase, but no agent actually connected it to the application. The agents know rate limiting is important enough to include in the code, but not important enough to make functional.
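The report's point is that defining a limiter is worthless until it is attached to the request path. The sketch below shows a minimal token-bucket limiter plus the one line the agents consistently omitted - actually invoking it in the handler (the handler and rates are invented for illustration):

```python
# Sketch: a token-bucket rate limiter AND the wiring step the agents skipped.
import time

class TokenBucket:
    def __init__(self, rate: float, capacity: int):
        self.rate = rate              # tokens refilled per second
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

def handle_login(bucket: TokenBucket, credentials: dict) -> str:
    # The crucial step: the limiter must actually be consulted per request.
    if not bucket.allow():
        return "429 Too Many Requests"
    return "200 OK"
```

In the studied codebases, the equivalent of `TokenBucket` existed; the equivalent of the `bucket.allow()` call did not.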
6. JWT Secret Management Weaknesses
Hardcoded fallback secrets appeared across all three agents in the game app. These insecure defaults mean an attacker can forge valid tokens without obtaining legitimate credentials - a critical authentication bypass.
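To see why a fallback secret is a full authentication bypass, consider the pattern `os.environ.get("JWT_SECRET", "dev-secret")`: if the deployment forgets to set the variable, every token is signed with a value anyone can read in the source. The stdlib HMAC sketch below (a simplified HS256-style token, with "dev-secret" as the invented fallback) shows an attacker forging a valid token without ever seeing a legitimate one:

```python
# Sketch: why a hardcoded fallback secret lets attackers forge tokens.
# "dev-secret" and the claims are invented; a real JWT library should be
# used in practice - this stdlib version only illustrates the failure.
import base64, hashlib, hmac, json, os

SECRET = os.environ.get("JWT_SECRET", "dev-secret")  # the dangerous pattern

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign(claims: dict, secret: str) -> str:
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    sig = hmac.new(secret.encode(), f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url(sig)}"

def verify(token: str, secret: str) -> bool:
    header, payload, sig = token.split(".")
    expected = hmac.new(secret.encode(), f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return hmac.compare_digest(b64url(expected), sig)

# An attacker who read the fallback in the source forges an admin token:
forged = sign({"sub": "admin"}, "dev-secret")
```

If `JWT_SECRET` is unset, `verify(forged, SECRET)` succeeds and the attacker is "admin".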
7. Insecure JWT Verification
Four authentication-related weaknesses appeared in every final codebase: insecure JWT verification and management, lack of application-level brute force protections, vulnerability to token replay attacks, and insecure defaults for refresh token cookie configurations.
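The last of those four - insecure refresh-token cookie defaults - has a compact fix: set `HttpOnly`, `Secure`, `SameSite`, and a scoped `Path` so the token cannot be read by scripts, sent over plaintext, or attached to cross-site requests. A hedged sketch (the cookie name and path are generic examples, not from the report's codebases):

```python
# Sketch: hardened refresh-token cookie attributes. Names are illustrative.
def refresh_cookie(token: str, max_age: int = 7 * 24 * 3600) -> str:
    """Build a Set-Cookie header value with defensive attributes."""
    return (
        f"refresh_token={token}; Max-Age={max_age}; Path=/auth/refresh; "
        "HttpOnly; Secure; SameSite=Strict"
    )
```

The agents' versions typically omitted some or all of these flags, leaving refresh tokens readable by injected scripts or attachable to cross-site requests.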
8. Temporary Token Bypasses
Codex produced a temporary token bypass that persisted through to the final codebase. These short-lived authentication mechanisms often become permanent security holes.
9. OAuth CSRF Vulnerabilities
Gemini retained OAuth CSRF and invite bypass issues through to the final scan. Cross-site request forgery protection is consistently overlooked by AI agents.
10. Insecure Direct Object References
Claude carried an insecure direct object reference from early in development through to project completion, allowing potential unauthorized access to other users' data.
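The IDOR fix is to authorize the object, not just the request: after authenticating the caller, check that the record they asked for actually belongs to them. A minimal sketch (the record shapes and IDs are invented for illustration):

```python
# Sketch: ownership check that closes an insecure direct object reference.
# SAVES is a hypothetical stand-in for the game's save-slot table.
SAVES = {
    101: {"owner_id": 1, "data": "level-3"},
    102: {"owner_id": 2, "data": "level-7"},
}

def get_save(requesting_user_id: int, save_id: int) -> str:
    save = SAVES.get(save_id)
    if save is None or save["owner_id"] != requesting_user_id:
        # Returning the same error for "missing" and "not yours"
        # avoids leaking which IDs exist.
        raise KeyError("not found")
    return save["data"]
```

The vulnerable version looked up `save_id` directly and never compared `owner_id` against the authenticated user.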
Why Traditional Security Tools Miss AI-Generated Vulnerabilities
Perhaps most concerning, the report reveals that pattern-based security scanners miss the class of bugs AI agents produce most frequently.
Many of the vulnerabilities found were logic and authorization flaws. Regex-based static analysis tools flag known-bad function calls and string patterns. They do not trace whether middleware is mounted, whether authentication policies apply to every connection type, or whether unlock cost validation happens on the server.
This creates a dangerous false sense of security. Your SAST tools might pass with flying colors while fundamental authorization flaws remain exploitable.
The PR 3 Problem: When Login Systems Become Security Nightmares
PR 3 in the game app - which added player login and save game functionality - was the highest-risk task across all three agents. It introduced the largest cluster of findings including:
- JWT secrets mismanagement
- User enumeration vulnerabilities
- Session management failures
- Client-side trust issues
Most of the high-severity findings in the final game scans traced back to design choices made during this single task. When AI agents implement authentication - one of the most security-critical components of any application - they consistently get it wrong.
OpenAI's Response: Codex Security Enters the Chat
The timing of DryRun's report coincides with OpenAI's own acknowledgment of the AI coding security problem. On March 6, 2026, OpenAI introduced Codex Security as a research preview - an application security agent designed to hunt down high-impact software flaws.
Formerly known by the codename "Aardvark," Codex Security builds deep context about projects and creates threat models based on system-specific roles and trusted components. According to OpenAI, in the 30 days before launch, Codex Security was run against 1.2 million commits, identifying nearly 800 critical vulnerabilities and over 10,000 high-severity issues - including flaws in major open-source projects like Chromium, OpenSSL, and PHP.
The tool is now available to ChatGPT Pro, Enterprise, Business, and Edu users, with the first month offered at no cost.
But here is the uncomfortable question: If OpenAI's own coding agent produces vulnerable code, can we trust their security agent to catch it all?
Five Critical Practices for Securing AI-Generated Code
The DryRun report identifies five essential practices for teams using coding agents:
1. Scan Every Pull Request
Do not wait for the final build. Risk compounds across features, and vulnerabilities introduced early often persist through to production. Continuous security review must be integrated into the agentic development workflow.
2. Review Security During Planning
Many issues in the study originated in design decisions that agents then implemented. Security cannot be an afterthought - it must be part of the initial architecture and planning.
3. Use Contextual Security Analysis
Traditional pattern-matching tools miss the logic-level vulnerabilities AI agents consistently introduce. You need security analysis capable of reasoning about data flows and trust boundaries.
4. Pair PR Scanning with Full Codebase Analysis
Each method catches a different class of issue. PR scanning catches incremental risks, while full codebase analysis reveals systemic security gaps that accumulate over time.
5. Check for Recurring Issues
Based on this research, specifically audit for:
- Insecure JWT defaults and state management
- Missing brute force protections and rate limiting
- Non-revocable refresh tokens
These appeared across multiple agents and codebases - they will appear in yours too.
The Bigger Picture: AI Security Is a Shared Responsibility
This research reveals a fundamental tension in AI-assisted development. AI coding agents dramatically accelerate software creation - but they accelerate vulnerability creation just as effectively.
The problem is not that AI agents are malicious or poorly designed. The problem is that security requires contextual understanding, threat modeling, and defensive thinking that current AI systems lack. They can implement OAuth because they have seen OAuth implementations. They cannot recognize when their OAuth implementation is vulnerable because they do not truly understand the attack vectors.
Until AI systems develop genuine security reasoning capabilities - and we are not there yet - human security expertise remains irreplaceable.
What This Means for Your Organization
If you are using AI coding agents - or planning to - here is your immediate action plan:
Audit your current AI-generated codebase. The vulnerabilities are there. You need to find them before attackers do.
Implement continuous security scanning. Not just at build time - at every pull request. The 87% vulnerability rate means almost every AI-generated change needs security review.
Invest in contextual security tools. Pattern matching is not enough. You need tools that understand application logic and trust boundaries.
Train your developers on AI-specific risks. Your team needs to understand the common vulnerability patterns AI agents introduce and how to catch them.
Establish security gates for AI-generated code. Do not let AI-written code reach production without human security review. The speed gains are not worth the breach costs.
Frequently Asked Questions
Are AI coding agents safe to use at all?
AI coding agents can be used safely, but they require significant security oversight. The 87% vulnerability rate means you should assume AI-generated code contains security flaws until proven otherwise. Implement continuous scanning, security gates, and human review for all AI-written code.
Which AI coding agent is most secure?
Based on DryRun's research, OpenAI Codex demonstrated the strongest security posture with the fewest unresolved vulnerabilities. However, no agent produced fully secure code, and all three introduced significant security flaws. The choice of agent matters less than implementing proper security controls.
Why do AI agents keep making the same security mistakes?
AI coding agents learn from patterns in training data, not from security principles. They replicate common implementation patterns without understanding the underlying security requirements. OAuth implementations look correct on the surface but lack critical security controls that require threat modeling to identify.
Can traditional security tools catch AI-generated vulnerabilities?
Many AI-generated vulnerabilities are logic and authorization flaws that traditional pattern-based scanners miss. You need contextual security analysis that can reason about data flows, trust boundaries, and application behavior - not just regex pattern matching.
What is the most dangerous vulnerability AI agents introduce?
Authentication and authorization failures are particularly dangerous because they affect every user and every request. The consistent failures around JWT management, OAuth implementation, and access control create widespread attack surfaces that compromise entire applications.
How can I secure AI-generated authentication code?
Never accept AI-generated authentication code without expert security review. Specifically audit for JWT secret management, token validation, session handling, rate limiting, and access control enforcement. Test authentication flows manually and with security tooling.
Should I stop using AI coding agents?
Not necessarily - but you should use them with eyes wide open. The productivity gains are real, but so are the security risks. Implement proper security controls, continuous scanning, and human review. AI agents are tools, not replacements for security expertise.
What is OpenAI Codex Security and does it work?
Codex Security is OpenAI's new security-focused AI agent designed to find vulnerabilities in code. Early testing shows promising results, but it is still a research preview. It should be part of your security strategy, not your entire security strategy.
How do I convince leadership to invest in AI code security?
The 87% vulnerability rate and 143 security issues across just two applications make the business case clear. One production vulnerability can cost millions in breach response, regulatory fines, and reputation damage. Security investment for AI-generated code is risk mitigation, not overhead.
Will AI agents eventually write secure code?
AI capabilities are advancing rapidly, but genuine security reasoning requires understanding attacker mindsets and threat models - capabilities current AI systems lack. Even as AI improves, human security expertise will remain essential for the foreseeable future.
Conclusion: Speed Without Security Is a Breach Waiting to Happen
AI coding agents represent a paradigm shift in software development. They enable teams to build faster than ever before. But the DryRun Security research makes one thing crystal clear: speed without security is just a faster path to vulnerable production code.
The 87% vulnerability rate is not a temporary growing pain. It is a fundamental characteristic of how current AI systems approach software development. They optimize for functionality, not security. They replicate patterns without understanding principles.
Your mission as a security-conscious engineering leader is clear: embrace AI productivity gains, but never abandon security fundamentals. Scan every PR. Review every design. Audit every AI-generated authentication system. Assume vulnerabilities exist until proven otherwise.
The attackers are not waiting for AI to get better at security. They are exploiting the vulnerabilities AI creates right now.
Will your code be their next target?
Ready to secure your AI-generated codebase? Contact our security experts for a comprehensive AI code security assessment and discover vulnerabilities before attackers do.