Cloudflare Workers HTML to Markdown on the Free Plan
This article explores strategies for converting HTML to Markdown within Cloudflare Workers on the free plan, which has strict CPU and bundle size limits.
Why it matters
Converting HTML to Markdown within the free plan's CPU and bundle limits lets a Worker serve cleaner, lower-token content to AI crawlers and language models, which is useful for a variety of AI-powered applications.
Key Points
- Cloudflare Workers' free plan has a 10ms CPU limit and a 1MB compressed bundle limit
- The HTMLRewriter API is a streaming, SAX-style parser that can convert HTML to Markdown efficiently within the free plan limits
- Libraries such as turndown (an HTML-to-Markdown converter) and Readability (a content extractor) have larger bundle sizes and higher CPU usage, making them unsuitable for the free plan
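As a rough illustration of the streaming, SAX-style approach in the second point, the sketch below turns open/text/close events into Markdown. It is not the article's implementation: `MarkdownBuilder`, `htmlToMarkdown`, the selector list, and the small set of supported tags are all hypothetical choices made for this example. Only `HTMLRewriter`, `.on()`, `.transform()`, `tagName`, `getAttribute()`, and `onEndTag()` come from the Workers runtime, and the wiring function runs only there.

```typescript
// Minimal structural types for the slice of the HTMLRewriter API used below,
// so the sketch type-checks without Workers type packages.
interface ElementLike {
  tagName: string;
  getAttribute(name: string): string | null;
  onEndTag(handler: () => void): void;
}
interface TextChunkLike {
  text: string;
}

// Hypothetical helper: turns SAX-style open/text/close events into Markdown.
class MarkdownBuilder {
  private out: string[] = [];
  private hrefs: string[] = [];

  open(tag: string, href: string | null = null): void {
    switch (tag.toLowerCase()) {
      case "h1": this.out.push("# "); break;
      case "h2": this.out.push("## "); break;
      case "h3": this.out.push("### "); break;
      case "li": this.out.push("- "); break;
      case "strong": this.out.push("**"); break;
      case "em": this.out.push("*"); break;
      case "a": this.out.push("["); this.hrefs.push(href ?? ""); break;
    }
  }

  text(chunk: string): void {
    // Collapse HTML whitespace runs; chunks may arrive split at any point.
    this.out.push(chunk.replace(/\s+/g, " "));
  }

  close(tag: string): void {
    switch (tag.toLowerCase()) {
      case "h1": case "h2": case "h3": case "p": this.out.push("\n\n"); break;
      case "li": this.out.push("\n"); break;
      case "strong": this.out.push("**"); break;
      case "em": this.out.push("*"); break;
      case "a": this.out.push(`](${this.hrefs.pop() ?? ""})`); break;
    }
  }

  toString(): string {
    return this.out.join("").trim() + "\n";
  }
}

// Wiring for the Workers runtime (HTMLRewriter is not available in Node).
// Assumption: one handler registered for the whole selector list sees each
// text chunk once, even when matched elements are nested.
async function htmlToMarkdown(upstream: Response): Promise<string> {
  const md = new MarkdownBuilder();
  const Rewriter = (globalThis as any).HTMLRewriter;
  const rewritten: Response = new Rewriter()
    .on("h1, h2, h3, p, li, a, strong, em", {
      element(el: ElementLike) {
        const tag = el.tagName;
        md.open(tag, el.getAttribute("href"));
        el.onEndTag(() => md.close(tag));
      },
      text(t: TextChunkLike) {
        md.text(t.text);
      },
    })
    .transform(upstream);
  await rewritten.text(); // drain the stream so every handler runs
  return md.toString();
}
```

Because the builder only appends strings as events arrive, no DOM is ever constructed, which is the property the free plan's CPU limit rewards. The sketch ignores entities, code blocks, images, and nested lists.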
Details
AI crawlers and language models prefer Markdown to HTML because its structure is cleaner and it costs fewer tokens. For content that is already in Markdown, the solution is simple: negotiate the format and return Markdown when the request sends an `Accept: text/markdown` header. HTML content, however, has to be converted inside the Cloudflare Worker.
The free Workers plan allows 10ms of CPU time and a 1MB compressed bundle, which rules out many popular HTML-to-Markdown libraries that construct a full DOM in memory. The article identifies Cloudflare's built-in HTMLRewriter API as the best fit: it is a streaming, SAX-style parser, so it can convert HTML to Markdown within those constraints. The author's measurements show HTMLRewriter using only 2ms of CPU time with a 3.74KB gzipped bundle, leaving ample headroom under both limits.
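The `Accept: text/markdown` negotiation mentioned above could be decided by a small pure function like the one below. This is a minimal sketch, not the article's code: the name `prefersMarkdown` is hypothetical, and the policy (Markdown wins ties with `text/html`, wildcards such as `*/*` are ignored) is an assumption chosen for illustration.

```typescript
// Hypothetical helper: decide whether a client asked for Markdown.
// Assumed policy: serve Markdown only when text/markdown is listed with a
// non-zero q-value that is at least as high as text/html's.
function prefersMarkdown(accept: string | null): boolean {
  if (!accept) return false;

  let markdownQ = 0;
  let htmlQ = 0;

  for (const part of accept.split(",")) {
    const pieces = part.split(";").map((s) => s.trim());
    const mediaType = (pieces[0] ?? "").toLowerCase();

    // Default weight is 1 when no q parameter is present.
    let q = 1;
    for (const param of pieces.slice(1)) {
      const eq = param.indexOf("=");
      if (eq === -1) continue;
      const key = param.slice(0, eq).trim().toLowerCase();
      const value = Number(param.slice(eq + 1).trim());
      if (key === "q" && !Number.isNaN(value)) q = value;
    }

    if (mediaType === "text/markdown") markdownQ = Math.max(markdownQ, q);
    if (mediaType === "text/html") htmlQ = Math.max(htmlQ, q);
  }

  return markdownQ > 0 && markdownQ >= htmlQ;
}
```

Inside a Worker's `fetch` handler this would be called as `prefersMarkdown(request.headers.get("Accept"))`, falling back to the normal HTML response when it returns `false`.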