Building a Self-Healing CSS Selector Repair System
This article describes a Python sidecar that automatically fixes broken CSS selectors in a web scraper by using a local language model to propose new selector candidates and validate them against the live HTML.
Why it matters
This system reduces the operational overhead of maintaining fragile web scrapers by automating selector repair, improving reliability and cutting down on manual intervention.
Key Points
- Scraper failures due to fragile CSS selectors are a recurring operational problem
- The sidecar consumes repair jobs from a Redis queue, published when the scraper fails
- A local language model proposes new selector candidates, which are then tested against the live HTML
- Validated selectors are written to the database automatically; no redeploy is required
- Design principles include treating the LLM as a proposer, not a decider, and preferring escalation over hallucination
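The failure-to-repair handoff in the second point can be sketched as a small publish helper. This is a minimal illustration, not the author's actual code: the job schema (`field`, `url`, `old_selector`) and queue name are assumptions, and the client is any object exposing a redis-py-style `lpush`.

```python
import json

def publish_repair_job(queue_client, queue_name: str, field: str,
                       url: str, old_selector: str) -> str:
    """Serialize a selector-repair job and push it onto a Redis list.

    Hypothetical job schema: which field failed, on which page, and the
    selector that stopped matching. Returns the serialized payload.
    """
    job = {
        "field": field,                # name of the field that failed to extract
        "url": url,                    # page where extraction failed
        "old_selector": old_selector,  # selector that no longer matches
    }
    payload = json.dumps(job)
    # With redis-py this is client.lpush(queue_name, payload); the sidecar
    # would consume jobs with a blocking BRPOP on the same key.
    queue_client.lpush(queue_name, payload)
    return payload
```

Using a queue decouples the scraper from the repair path: the scraper fails fast and moves on, while the sidecar works through repairs asynchronously.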
Details
The article discusses the problem of fragile CSS selectors in production web scrapers, where changes to a third-party website's DOM can silently break extraction. The author presents a Python sidecar that repairs these breakages automatically. When the scraper fails to extract a field, it publishes a repair job to a Redis queue. The sidecar picks up the job, fetches the current HTML, and prompts a local language model running on-device via MLX. The LLM proposes new CSS/XPath selector candidates, each with a confidence score and reasoning. Every candidate is then tested against the live HTML using BeautifulSoup and lxml, and the extracted value is validated against a type schema. If a candidate passes, the new selector is written directly to the database, so the next scraper run picks up the updated configuration without a redeploy. The design treats the LLM strictly as a proposer, never a decider, and escalates to a human for any case it cannot resolve automatically.