New AI Models and Benchmarks Announced in November 2025
The article announces the latest updates to the SWE-rebench leaderboard, including new AI models like Devstral 2, DeepSeek v3.2, and a comparison mode to benchmark against external systems like Claude Code.
Why it matters
This news highlights the rapid advancements in AI-powered software engineering tools and the ongoing efforts to benchmark their capabilities on real-world tasks.
Key Points
- SWE-rebench leaderboard updated with November 2025 runs on 47 fresh GitHub PR tasks
- New models added include Devstral 2, DeepSeek v3.2, and a comparison mode for external systems like Claude Code
- Introduced a cached-tokens statistic to improve transparency around cache usage
Details
The article discusses the latest updates to the SWE-rebench leaderboard, a benchmark that evaluates AI models on real-world software engineering tasks. The November 2025 update adds a substantial batch of newly released models, including Devstral 2, a strong locally-runnable model, and DeepSeek v3.2, a new state-of-the-art open-weight model. A new comparison mode has also been introduced to benchmark these models against external systems such as Claude Code. Finally, the update adds a cached-tokens statistic to improve transparency around cache usage during benchmarking.