Open-Sourcing an Ollama Logging Proxy for Visibility into LLM Usage
The author open-sourced a transparent proxy that logs every inference call to an Ollama LLM instance, providing visibility into token usage, latency, and per-service consumption.
Why it matters
The proxy gives developers and teams running Ollama the observability they need to understand their LLM usage and optimize performance.
Key Points
- Ollama exposes no built-in usage logging when run locally, making it hard to optimize performance
- The proxy logs model usage, token counts, and response times, which let the author identify performance bottlenecks
- It supports multiple storage backends and includes a built-in dashboard and Prometheus metrics
- The author open-sourced the tool to address the lack of observability in the local LLM ecosystem
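To make the token and timing data concrete: Ollama's `/api/generate` endpoint reports usage fields such as `prompt_eval_count`, `eval_count`, and `total_duration` (in nanoseconds) in its final response. A minimal sketch of extracting a log record from such a response; the `extract_usage` helper and the record schema are illustrative assumptions, not the project's actual code:

```python
import json

def extract_usage(response_body: str) -> dict:
    """Pull usage stats from an Ollama /api/generate response.

    Field names follow Ollama's API (durations are nanoseconds);
    the output record shape is a hypothetical logging schema.
    """
    data = json.loads(response_body)
    return {
        "model": data.get("model"),
        "prompt_tokens": data.get("prompt_eval_count", 0),
        "completion_tokens": data.get("eval_count", 0),
        "total_ms": data.get("total_duration", 0) / 1_000_000,
    }

# Example final response from a non-streaming generate call
sample = json.dumps({
    "model": "llama3",
    "prompt_eval_count": 26,
    "eval_count": 112,
    "total_duration": 4_500_000_000,  # 4.5 s, in nanoseconds
})
print(extract_usage(sample))
```

A record like this is cheap to persist per request, which is what makes per-service token accounting and latency dashboards possible downstream.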
Details
The author built a transparent proxy that sits between their services and their Ollama instance, logging every inference call with full token counts and timing data. This provided the visibility into LLM usage that they previously lacked when running Ollama locally. The proxy forwards requests and responses with negligible added latency and supports multiple storage backends, including SQLite, PostgreSQL, and plain JSON logs. It also includes a built-in dashboard and Prometheus metrics, as well as a pre-built Grafana dashboard. The author open-sourced the tool because the observability layer in the local LLM ecosystem is mostly missing, and anyone running Ollama for real workloads will eventually need to know what is happening at the request level.