New AI Models and Benchmarks Announced in November 2025

The article announces the latest updates to the SWE-rebench leaderboard: new AI models such as Devstral 2 and DeepSeek v3.2, plus a comparison mode for benchmarking against external systems like Claude Code.

💡 Why it matters

This news highlights the rapid advancements in AI-powered software engineering tools and the ongoing efforts to benchmark their capabilities on real-world tasks.

Key Points

  • SWE-rebench leaderboard updated with November 2025 runs on 47 fresh GitHub PR tasks
  • New models added include Devstral 2 and DeepSeek v3.2, alongside a comparison mode for external systems like Claude Code
  • A cached-tokens statistic was introduced to improve transparency around cache usage

Details

The article discusses the latest updates to the SWE-rebench leaderboard, a benchmark that evaluates AI models on real-world software engineering tasks. The November 2025 update, run on 47 fresh GitHub PR tasks, includes a substantial batch of new model releases, such as Devstral 2, a strong locally runnable model, and DeepSeek v3.2, a new state-of-the-art open-weight model. A new comparison mode has also been introduced to benchmark these models against external systems like Claude Code. In addition, the leaderboard now reports a cached-tokens statistic to improve transparency around cache usage during the benchmarking process.
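The article does not spell out how the cached-tokens statistic is computed. As a rough illustration only, the sketch below shows one plausible way such a figure could be derived from per-run token usage; the `RunUsage` fields and the aggregation are assumptions modeled on common LLM API usage reports, not the leaderboard's actual method.

```python
from dataclasses import dataclass


@dataclass
class RunUsage:
    """Hypothetical per-run token usage record (field names are assumed)."""
    prompt_tokens: int   # total input tokens consumed by the run
    cached_tokens: int   # portion of prompt tokens served from cache


def cached_token_share(runs: list[RunUsage]) -> float:
    """Fraction of all prompt tokens across runs that were served from cache."""
    total_prompt = sum(r.prompt_tokens for r in runs)
    total_cached = sum(r.cached_tokens for r in runs)
    return total_cached / total_prompt if total_prompt else 0.0


# Example: three task runs with varying cache hit rates
runs = [RunUsage(12_000, 9_000), RunUsage(8_000, 2_000), RunUsage(5_000, 0)]
print(f"cached-token share: {cached_token_share(runs):.1%}")  # 44.0%
```

A statistic of this shape would make runs with heavy prompt caching directly comparable to uncached ones, which is the kind of transparency the update describes.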
