AI Agents Outperform Humans in Smart Contract Auditing, But Challenges Remain

The article explores the state of AI-powered smart contract auditing, highlighting recent developments in the EVMbench benchmark and a breakthrough discovery by Anatomist Security's AI agent.

Why it matters

These developments represent a phase transition in how we should think about smart contract security tooling, with AI agents demonstrating capabilities that complement and potentially surpass human auditors in certain tasks.

Key Points

  • EVMbench benchmark evaluates AI agents' ability to detect, patch, and exploit smart contract vulnerabilities
  • GPT-5.3-Codex outperformed previous models in exploit mode, but struggled with detection and patching
  • BlockSec's re-evaluation found EVMbench's exploit mode was biased by scaffold design, but AI detection capabilities are real
  • Anatomist Security's AI agent discovered a critical vulnerability in the Solana blockchain, earning a $400K bounty

Details

The article discusses three events that have shaped AI-powered smart contract auditing as of March 2026.

The first is the launch of EVMbench, a benchmark from OpenAI and Paradigm that evaluates AI agents' ability to detect, patch, and exploit vulnerabilities in smart contracts. On the benchmark, GPT-5.3-Codex outperformed previous models in exploit mode, successfully exploiting 71% of vulnerabilities. However, it struggled with detection and patching, missing more than half of the known vulnerabilities and failing to fix most of the ones it found.

The second is BlockSec's re-evaluation of EVMbench, which raised concerns about the benchmark's methodology. BlockSec found that exploit mode was biased by the scaffold design, which provided agents with deployment scripts, contract ABIs, and partial proof-of-concept code. This essentially turned the exploit task into a coding exercise rather than a true auditing challenge. However, BlockSec confirmed that AI agents such as Claude Opus 4.6 do have real detection capabilities, identifying a significant number of real-world vulnerabilities without scaffold assistance.

The third is the breakthrough by Anatomist Security's AI agent, which autonomously discovered a critical vulnerability in the Solana blockchain itself and earned a $400,000 bounty, the largest ever awarded to an AI. This is a significant achievement because the agent was not given a curated vulnerability to exploit in a sandbox; it found a real-world bug that human researchers had missed.

