Dev.to LLM | 2h ago | Research & Papers

Can LLMs Detect Real Vulnerabilities in Real Code?

The article discusses N-Day-Bench, a benchmark that evaluates whether large language models (LLMs) can identify real, known vulnerabilities in production codebases rather than synthetic ones. The results show that LLMs detect some common issues, such as hardcoded credentials, but struggle with more complex vulnerabilities, such as business logic flaws.

Why it matters

This research highlights the limitations of using LLMs for security auditing, despite their potential as a first line of defense against obvious issues.

Key Points

  • N-Day-Bench tests LLMs' ability to find known vulnerabilities (with CVEs) in real codebases
  • LLMs perform reasonably well on classic issues like SQL injection and hardcoded credentials
  • LLMs consistently fail to detect vulnerabilities in business logic, race conditions, and cross-component interactions
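
As an illustration of the "classic issue" category, the sketch below shows a SQL injection that is visible on a single line, next to its parameterized fix. It is not taken from the benchmark: the table, the function names, and the payload are all made up for this example. One plausible reading of the results is that flaws like this are easier to spot because the whole problem sits in one place, whereas business logic or cross-component flaws do not.

```python
import sqlite3

# Hypothetical in-memory database, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", "a1"), ("bob", "b2")])


def find_user_unsafe(name):
    # Vulnerable: user input is spliced directly into the SQL text.
    query = f"SELECT name, secret FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(name):
    # Fixed: the driver binds the value, so it is never parsed as SQL.
    return conn.execute(
        "SELECT name, secret FROM users WHERE name = ?", (name,)
    ).fetchall()


payload = "x' OR '1'='1"
print(len(find_user_unsafe(payload)))  # 2: the injected OR clause matches every row
print(len(find_user_safe(payload)))    # 0: no user is literally named like the payload
```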

Details

N-Day-Bench is a benchmark published in 2025 that evaluates whether LLMs can identify real vulnerabilities, not just synthetic ones, in production codebases. The methodology gives the model the relevant context (the affected files, not the entire repo) and asks it to identify the vulnerability without any hints about the CVE.

The results show that the best models correctly identify 20-35% of the vulnerabilities when directly queried. While this may seem low, the article argues it is not far from what an average developer achieves in manual code review.

The real issue, according to the article, is the gap between "generation mode" and "audit mode": when generating code, LLMs prioritize functionality over security, but when explicitly asked to audit, they can identify issues they did not flag during generation. This suggests the problem lies less in the tool than in the process of using it. The benchmark also shows that LLMs struggle with more complex vulnerabilities, such as business logic flaws, race conditions, and authorization issues that span multiple components.
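
The evaluation loop described above can be sketched roughly as follows. This is a minimal illustration, not the benchmark's actual harness: the `Case` structure, the prompt wording, the substring-based scoring, the placeholder CVE IDs, and the `stub_model` stand-in for a real LLM call are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Case:
    """One hypothetical benchmark case."""
    cve_id: str             # kept for bookkeeping only, never shown to the model
    files: Dict[str, str]   # path -> source text of the affected files
    expected_class: str     # e.g. "sql injection"


def build_audit_prompt(case: Case) -> str:
    """Assemble an audit-mode prompt from the affected files, with no CVE hint."""
    parts = ["Audit the following code and name any security vulnerability you find."]
    for path, source in sorted(case.files.items()):
        parts.append(f"--- {path} ---\n{source}")
    return "\n\n".join(parts)


def detection_rate(cases: List[Case], ask_model: Callable[[str], str]) -> float:
    """Fraction of cases whose answer mentions the expected vulnerability class."""
    if not cases:
        return 0.0
    hits = sum(
        1
        for case in cases
        if case.expected_class.lower() in ask_model(build_audit_prompt(case)).lower()
    )
    return hits / len(cases)


def stub_model(prompt: str) -> str:
    # Stand-in for a real LLM: it only recognizes concatenated SQL.
    if "SELECT" in prompt:
        return "Possible SQL injection: query built by string concatenation."
    return "No issues found."


cases = [
    Case("CVE-0000-0001", {"db.py": 'q = "SELECT * FROM users WHERE id=" + user_id'}, "sql injection"),
    Case("CVE-0000-0002", {"pay.py": "def refund(order): order.balance += order.total"}, "business logic"),
]
print(detection_rate(cases, stub_model))  # 0.5
```

Running the same `build_audit_prompt` step on freshly generated code, as a separate pass after generation, is one way to apply the "audit mode" idea in practice.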
