Alumnium MCP Achieves 98.5% on WebVoyager Benchmark for Claude Code
Alumnium, an open-source Model Context Protocol (MCP) server, has set a new state-of-the-art score of 98.5% on WebVoyager, a benchmark for AI web browsing agents, when used with Claude Code.
Why it matters
Alumnium's strong performance on the WebVoyager benchmark demonstrates the benefits of using specialized subagents to handle complex, context-heavy tasks in Claude Code workflows.
Key Points
- Alumnium MCP provides a high-level browser interface for Claude Code, handling complex web tasks internally
- It outperformed the previous record of 97.1% held by Surfer 2 on the WebVoyager benchmark
- Alumnium acts as a specialized subagent, compressing browsing details and keeping Claude Code's context efficient
- Alumnium is open-source and can be installed and configured to work with Claude Code
Details
Alumnium is an open-source Model Context Protocol (MCP) server designed to work with Claude Code. Rather than exposing raw browser primitives, it provides a high-level browser interface built on tools like 'do()', 'get()', and 'check()': Claude Code describes a browsing goal in plain language, and Alumnium handles the execution internally. This architecture is credited as the key to its strong performance, because low-level browsing details stay out of Claude Code's context and token usage stays efficient. On WebVoyager, a standard benchmark for AI web browsing agents, Alumnium MCP used with Claude Code achieved a 98.5% success rate, 1.4 percentage points above the previous record of 97.1% held by Surfer 2. The result supports Alumnium's design choice of acting as a specialized subagent that compresses the messy work of browsing into a single tool call for Claude Code.
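The subagent pattern described above can be illustrated with a small toy sketch. This is not Alumnium's implementation or API: the class `BrowsingSubagent`, its `page` and `internal_log` fields, and the simplified method signatures are all hypothetical, and no real browser or model is involved. It only shows the idea that each high-level call does several low-level steps internally while handing a short result back to the caller.

```python
from dataclasses import dataclass, field


@dataclass
class BrowsingSubagent:
    """Hypothetical stand-in for a server exposing do/get/check tools."""

    # Toy page state; a real server would drive an actual browser.
    page: dict[str, str] = field(default_factory=lambda: {"title": "Example Store"})
    # Low-level steps are recorded here and never returned to the caller.
    internal_log: list[str] = field(default_factory=list)

    def _work(self, goal: str) -> None:
        # Placeholder for locating elements, clicking, typing, and retrying.
        for step in ("inspect page", "pick element", "act"):
            self.internal_log.append(f"{step} ({goal})")

    def do(self, goal: str) -> str:
        # Perform an action and report only a short status.
        self._work(goal)
        return "ok"

    def get(self, key: str) -> str:
        # Read one value from the toy page state.
        self._work(f"read {key}")
        return self.page.get(key, "")

    def check(self, key: str, expected: str) -> bool:
        # Verify one value on the toy page state (simplified signature).
        self._work(f"verify {key}")
        return self.page.get(key) == expected


agent = BrowsingSubagent()

# What the calling agent keeps in its context: one short result per tool call.
caller_context = [
    agent.do("open the store home page"),
    agent.get("title"),
    str(agent.check("title", "Example Store")),
]
```

In this toy run the caller holds three short strings, while the nine low-level step records stay inside `internal_log`. That gap is the context saving the article attributes to the subagent design.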