Comparing HTML, Markdown, and SOM for AI Agents
This article explores the pros and cons of different formats for representing web pages to AI agents, including raw HTML, Markdown, and the Semantic Object Model (SOM).
Why it matters
The choice of web page representation format is crucial for AI agents that need to understand and interact with web content efficiently.
Key Points
- Raw HTML provides complete fidelity but is expensive and noisy due to styling and scripts
- Markdown is more concise but loses interactive elements and makes navigation tasks difficult
- SOM preserves meaning and interactivity while stripping presentation noise
Details
When an AI agent needs to understand a web page, there are three common approaches: raw HTML, Markdown, and the Semantic Object Model (SOM).
Raw HTML provides complete fidelity to the DOM, but 80-95% of its tokens are noise such as styling and scripts, which makes it expensive and slow to process.
Markdown strips the HTML down to readable text while preserving document structure, so it uses far fewer tokens. The cost is that interactive elements such as buttons and form fields are lost, so the agent has to guess at navigation tasks.
SOM is a structured JSON representation that preserves meaning and interactivity while removing presentation noise. Each element carries its semantic role and the actions available on it, which gives an agent a more compact and more meaningful view of the page.
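To make the contrast concrete, here is a minimal sketch of how a SOM-like representation could be derived from raw HTML. It is an illustration only: the article does not give the SOM schema, so the field names (`role`, `text`, `actions`, `href`), the `ROLES` table, and the `SomSketch` class are all hypothetical. The sketch drops scripts, styles, and presentational attributes, and keeps each element's role and available actions.

```python
import json
from html.parser import HTMLParser

# Hypothetical tag -> (role, actions) table. These names are illustrative
# assumptions, not the actual SOM schema.
ROLES = {
    "h1": ("heading", []),
    "p": ("paragraph", []),
    "a": ("link", ["click"]),
    "button": ("button", ["click"]),
    "input": ("textbox", ["type"]),
}
SKIPPED = {"script", "style"}  # content treated as noise


class SomSketch(HTMLParser):
    """Collects a flat list of role/action records from an HTML string."""

    def __init__(self):
        super().__init__()
        self.elements = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self._open = None     # record currently receiving text

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED:
            self._skip_depth += 1
            return
        if self._skip_depth or tag not in ROLES:
            return
        role, actions = ROLES[tag]
        node = {"role": role, "text": "", "actions": list(actions)}
        attr_map = dict(attrs)
        if "href" in attr_map:
            node["href"] = attr_map["href"]
        if "placeholder" in attr_map:
            node["text"] = attr_map["placeholder"]
        self.elements.append(node)
        # <input> is a void element, so it never receives text content.
        self._open = None if tag == "input" else node

    def handle_endtag(self, tag):
        if tag in SKIPPED:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in ROLES:
            self._open = None

    def handle_data(self, data):
        if self._skip_depth == 0 and self._open is not None:
            self._open["text"] += data.strip()


def to_som(html):
    parser = SomSketch()
    parser.feed(html)
    parser.close()
    return parser.elements


PAGE = """
<html><head><style>.btn { color: red; }</style>
<script>console.log("tracking");</script></head>
<body>
<h1 class="title big">Sign in</h1>
<input type="email" placeholder="Email" class="field">
<button class="btn btn-primary">Continue</button>
<a href="/help" style="color: blue">Need help?</a>
</body></html>
"""

som = to_som(PAGE)
print(json.dumps(som, indent=2))
```

A Markdown conversion of the same page would keep the heading and the link text but would have no natural place for the text box or the button. In this sketch each of those survives as a record with an explicit action. The flat list and the small tag table are simplifications chosen for brevity, and this sketch does not handle nested elements.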