New AI Models and Benchmarks Announced in November 2025
The article announces the latest updates to the SWE-rebench leaderboard, including new AI models like Devstral 2, DeepSeek v3.2, and a comparison mode to benchmark against external systems like Claude Code.
Why it matters
This news highlights the rapid advancements in AI-powered software engineering tools and the ongoing efforts to benchmark their capabilities on real-world tasks.
Key Points
- SWE-rebench leaderboard updated with November 2025 runs on 47 fresh GitHub PR tasks
- New models added include Devstral 2, DeepSeek v3.2, and a comparison mode for external systems like Claude Code
- Introduced a cached-tokens statistic to improve transparency around cache usage
Details
The article discusses the latest updates to the SWE-rebench leaderboard, a benchmark that evaluates AI models on real-world software engineering tasks. The November 2025 update adds a substantial batch of newly released models, including Devstral 2, a strong locally-runnable model, and DeepSeek v3.2, a new state-of-the-art open-weight model. A new comparison mode has also been introduced to benchmark these models against external systems such as Claude Code. Finally, the update adds a cached-tokens statistic to improve transparency around cache usage during benchmarking.