Open-source computer-use agent: provider-agnostic, cross-platform, 75% OSWorld (> human)
An open-source computer-use agent that supports both OpenAI GPT-5.4 and Anthropic Claude, and runs on multiple platforms. It has achieved 75% on the OSWorld benchmark, exceeding human-level performance.
Why it matters
A single open-source codebase can drive models from different vendors across several operating systems and still score above the human baseline on OSWorld, which suggests that general-purpose computer control need not be tied to one provider's stack.
Key Points
- Provider-agnostic agent that supports OpenAI GPT-5.4 and Anthropic Claude
- Cross-platform support for macOS, Windows, Linux, web, and server environments
- Able to perform real-time computer control tasks like drawing shapes from text prompts
- Working on MCP-first architecture and sandboxed code execution for OS-specific tool integration
Details
The article describes an open-source computer-use agent that is provider-agnostic: it works with both OpenAI's GPT-5.4 and Anthropic's Claude models. It is also cross-platform. The same codebase runs on macOS, Windows, Linux, the web, and server environments, because all input and output go through abstract ports rather than OS-specific calls.
The agent scores 75% on the OSWorld benchmark, which is above the human level for OS control tasks. In the accompanying video, it draws a sun and geometric shapes from a text prompt. The model decides where to click and drag in real time, with no scripted actions.
The developer is now working on an MCP-first (Model Context Protocol) architecture for better OS-specific tool integration, and is exploring sandboxed code execution to handle trust boundaries when the agent needs to run arbitrary commands.
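The provider-agnostic, port-based design described above can be pictured with a small sketch. This is not the project's code: every name here (`Action`, `ModelProvider`, `ScreenPort`, `InputPort`, `run_agent`, and the stand-in classes) is hypothetical, and the real agent's interfaces may look quite different. The idea is only that the control loop depends on abstract interfaces, so either the model vendor or the operating system backend can be swapped without touching the loop.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Action:
    """One step chosen by the model (hypothetical schema)."""
    kind: str  # "click", "drag", or "done"
    x: int = 0
    y: int = 0
    to_x: int = 0
    to_y: int = 0


class ModelProvider(Protocol):
    """Anything that maps a goal plus a screenshot to the next action."""
    def next_action(self, goal: str, screenshot: bytes) -> Action: ...


class ScreenPort(Protocol):
    """Abstract output port: how the agent sees the screen."""
    def capture(self) -> bytes: ...


class InputPort(Protocol):
    """Abstract input port: how the agent moves the pointer."""
    def click(self, x: int, y: int) -> None: ...
    def drag(self, x: int, y: int, to_x: int, to_y: int) -> None: ...


def run_agent(goal: str, provider: ModelProvider, screen: ScreenPort,
              inputs: InputPort, max_steps: int = 20) -> int:
    """Ask the model for one action per step; return the steps executed."""
    for step in range(max_steps):
        action = provider.next_action(goal, screen.capture())
        if action.kind == "done":
            return step
        if action.kind == "click":
            inputs.click(action.x, action.y)
        elif action.kind == "drag":
            inputs.drag(action.x, action.y, action.to_x, action.to_y)
        else:
            raise ValueError(f"unknown action kind: {action.kind}")
    return max_steps


# Stand-ins so the sketch runs without a real model or a real desktop.
class ScriptedProvider:
    """Replays a fixed list of actions in place of a model call."""
    def __init__(self, actions):
        self._actions = list(actions)

    def next_action(self, goal: str, screenshot: bytes) -> Action:
        return self._actions.pop(0) if self._actions else Action("done")


class FakeScreen:
    def capture(self) -> bytes:
        return b""  # a real port would return encoded image bytes


class RecordingInput:
    """Records pointer events instead of sending them to an OS."""
    def __init__(self):
        self.events = []

    def click(self, x: int, y: int) -> None:
        self.events.append(("click", x, y))

    def drag(self, x: int, y: int, to_x: int, to_y: int) -> None:
        self.events.append(("drag", x, y, to_x, to_y))


if __name__ == "__main__":
    recorder = RecordingInput()
    plan = [Action("click", 10, 20), Action("drag", 10, 20, 50, 60), Action("done")]
    run_agent("draw a sun", ScriptedProvider(plan), FakeScreen(), recorder)
    print(recorder.events)
```

In a design like this, supporting another model vendor would mean writing one more `ModelProvider`, and supporting another platform would mean writing one more pair of ports, while `run_agent` stays unchanged.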