Dev.to AI | 2h ago | Research & Papers, Business & Industry

AI Agents Outperform Humans in Smart Contract Auditing, But Challenges Remain

The article explores the state of AI-powered smart contract auditing, highlighting recent developments in the EVMbench benchmark and a breakthrough discovery by Anatomist Security's AI agent.


Why it matters

These developments represent a phase transition in how we should think about smart contract security tooling, with AI agents demonstrating capabilities that complement and potentially surpass human auditors in certain tasks.

Key Points

1. The EVMbench benchmark evaluates AI agents' ability to detect, patch, and exploit smart contract vulnerabilities
2. GPT-5.3-Codex outperformed previous models in exploit mode, but struggled with detection and patching
3. BlockSec's re-evaluation found EVMbench's exploit mode was biased by scaffold design, but AI detection capabilities are real
4. Anatomist Security's AI agent discovered a critical vulnerability in the Solana blockchain, earning a $400K bounty

Details

The article discusses three key events that have shaped the state of AI-powered smart contract auditing as of March 2026.

First, OpenAI and Paradigm launched the EVMbench benchmark, which evaluates AI agents' ability to detect, patch, and exploit vulnerabilities in smart contracts. The benchmark showed that GPT-5.3-Codex outperformed previous models in exploit mode, successfully exploiting 71% of vulnerabilities. However, the agent struggled with detection and patching: it missed more than half of the known vulnerabilities and failed to fix most of the ones it found.

Second, BlockSec re-evaluated EVMbench and raised concerns about the benchmark's methodology. BlockSec found that exploit mode was biased by the scaffold design, which provided agents with deployment scripts, contract ABIs, and partial proof-of-concept code. This essentially turned the exploit task into a coding exercise rather than a true auditing challenge. However, BlockSec confirmed that AI agents such as Claude Opus 4.6 do have real detection capabilities, identifying a significant number of real-world vulnerabilities without scaffold assistance.

Finally, the article highlights the breakthrough by Anatomist Security's AI agent, which autonomously discovered a critical vulnerability in the Solana blockchain itself and earned a $400,000 bounty, the largest ever awarded to an AI. The achievement is significant because the agent was not given a curated vulnerability to exploit in a sandbox; it found a real-world bug that human researchers had missed.
