Open-source computer-use agent: provider-agnostic, cross-platform, 75% OSWorld (> human)

An open-source computer-use agent that supports both OpenAI GPT-5.4 and Anthropic Claude, and runs on multiple platforms. It has achieved 75% on the OSWorld benchmark, exceeding human-level performance.

Why it matters

This project shows that an open-source agent, tied to neither a single model vendor nor a single operating system, can reportedly exceed human-level performance on OS control tasks, which points toward more capable AI-driven automation of everyday computing environments.

Key Points

  • Provider-agnostic agent that supports OpenAI GPT-5.4 and Anthropic Claude
  • Cross-platform support for macOS, Windows, Linux, web, and server environments
  • Performs real-time computer control tasks, such as drawing shapes from a text prompt
  • The developer is working on an MCP-first architecture and sandboxed code execution for OS-specific tool integration

Details

The open-source computer-use agent is provider-agnostic: it supports both OpenAI's GPT-5.4 and Anthropic's Claude models. It is also cross-platform, with the same codebase running on macOS, Windows, Linux, the web, and server environments through abstract input/output ports. The agent reportedly scores 75% on the OSWorld benchmark, which is above human level for OS control tasks. In the demo video, the agent draws a sun and geometric shapes from a text prompt, with the model deciding where to click and drag in real time and no scripted actions. The developer is currently working on an MCP-first (Model Context Protocol) architecture for better OS-specific tool integration, and is exploring sandboxed code execution to handle trust boundaries when the agent needs to run arbitrary commands.
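To make the "provider-agnostic" and "abstract input/output ports" ideas concrete, here is a minimal Python sketch of how such a design might be structured. It is not the project's actual code: every name in it (`Action`, `ModelProvider`, `ScreenPort`, `InputPort`, `run_agent`, and the stub classes) is hypothetical and chosen only for illustration. The agent loop depends only on interfaces, so a model backend or an operating system could in principle be swapped without touching the loop.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Action:
    """One low-level UI action chosen by the model (hypothetical schema)."""
    kind: str  # "click", "drag", or "done"
    x: int = 0
    y: int = 0
    to_x: int = 0
    to_y: int = 0


class ModelProvider(Protocol):
    """Any model backend: given a goal and a screenshot, pick the next action."""
    def next_action(self, goal: str, screenshot: bytes) -> Action: ...


class ScreenPort(Protocol):
    """Port through which the agent observes the environment."""
    def capture(self) -> bytes: ...


class InputPort(Protocol):
    """Port through which the agent acts on the environment."""
    def click(self, x: int, y: int) -> None: ...
    def drag(self, x: int, y: int, to_x: int, to_y: int) -> None: ...


def run_agent(goal: str, provider: ModelProvider, screen: ScreenPort,
              inputs: InputPort, max_steps: int = 20) -> list[Action]:
    """Observe, ask the model for one action, execute it, repeat."""
    history: list[Action] = []
    for _ in range(max_steps):
        action = provider.next_action(goal, screen.capture())
        if action.kind == "done":
            break
        if action.kind == "click":
            inputs.click(action.x, action.y)
        elif action.kind == "drag":
            inputs.drag(action.x, action.y, action.to_x, action.to_y)
        else:
            raise ValueError(f"unknown action kind: {action.kind!r}")
        history.append(action)
    return history


# --- Stand-ins for demonstration only (assumptions, not real backends) ---

class StubProvider:
    """Replays canned actions; a real provider would call a model API here."""
    def __init__(self, actions: list[Action]) -> None:
        self._actions = iter(actions)

    def next_action(self, goal: str, screenshot: bytes) -> Action:
        return next(self._actions, Action("done"))


class FakeScreen:
    """Returns an empty screenshot instead of capturing a real display."""
    def capture(self) -> bytes:
        return b""


class RecordingInput:
    """Records events instead of moving a real mouse."""
    def __init__(self) -> None:
        self.events: list[tuple] = []

    def click(self, x: int, y: int) -> None:
        self.events.append(("click", x, y))

    def drag(self, x: int, y: int, to_x: int, to_y: int) -> None:
        self.events.append(("drag", x, y, to_x, to_y))


if __name__ == "__main__":
    recorder = RecordingInput()
    stub = StubProvider([Action("click", 10, 20), Action("drag", 10, 20, 110, 20)])
    run_agent("draw a line", stub, FakeScreen(), recorder)
    print(recorder.events)
```

In a design like this, supporting another model vendor would mean writing one more class with a `next_action` method, and supporting another platform would mean one more pair of port implementations. The stub provider replays canned actions purely so the sketch runs offline; in the agent described above, the model itself chooses each click and drag.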

AI Curator - Daily AI News Curation
