Open-Sourcing an Ollama Logging Proxy
The author built a transparent proxy to log every inference call made to their local Ollama instance, including token counts and timing, to gain visibility into their LLM usage. They open-sourced the tool to help others running Ollama for real workloads.
Why it matters
Provides a much-needed observability layer for local Ollama deployments, enabling data-driven optimization of LLM infrastructure.
Key Points
- Developed a proxy to log Ollama inference calls with full token counts and timing
- Uncovered insights about model swap overhead, high-consuming workflows, and efficient embedding usage
- Open-sourced the tool with flexible storage backends, a built-in dashboard, and Prometheus metrics
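To illustrate the Prometheus metrics mentioned above, here is a minimal sketch of the text exposition format such a proxy's `/metrics` endpoint could serve. The metric name `ollama_tokens_total` and its labels are illustrative assumptions, not the tool's actual names.

```python
# Render per-model token counters in the Prometheus text exposition format.
# Metric and label names here are hypothetical examples.
def render_metrics(counters: dict) -> str:
    lines = [
        "# HELP ollama_tokens_total Tokens processed, by model and direction.",
        "# TYPE ollama_tokens_total counter",
    ]
    for (model, direction), count in sorted(counters.items()):
        lines.append(
            f'ollama_tokens_total{{model="{model}",direction="{direction}"}} {count}'
        )
    return "\n".join(lines) + "\n"

# Example counters accumulated from logged inference calls.
counters = {
    ("llama3", "prompt"): 1042,
    ("llama3", "output"): 8731,
    ("nomic-embed-text", "prompt"): 5120,
}
print(render_metrics(counters))
```

A real implementation would more likely use the `prometheus_client` library rather than formatting lines by hand; the sketch only shows what the scraped output looks like.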
Details
When running Ollama locally, the author had no visibility into which services were consuming the most tokens or how their model-swapping strategy was performing. They built a transparent proxy that sits between their services and Ollama, logging every inference call with full token counts and timing. This allowed them to measure the overhead of model swapping, identify a high-consuming workflow, and optimize their embedding usage. The open-source tool supports multiple storage backends, includes a built-in dashboard and Prometheus metrics, and can be easily extended to send logs anywhere. The author open-sourced it to help others running Ollama for production workloads gain observability into their LLM usage.
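The core of such a proxy is reading the usage counters Ollama already attaches to each response. A final `/api/generate` response includes `prompt_eval_count`, `eval_count`, `total_duration`, and `eval_duration` (durations in nanoseconds). The sketch below, a simplification rather than the author's actual code, shows how a proxy might turn one response body into a log record:

```python
import json

def extract_usage(body: dict) -> dict:
    """Build a log record from an Ollama /api/generate response body.

    prompt_eval_count / eval_count are token counts; total_duration /
    eval_duration are reported by Ollama in nanoseconds.
    """
    output_tokens = body.get("eval_count", 0)
    eval_s = body.get("eval_duration", 0) / 1e9
    return {
        "model": body.get("model", "unknown"),
        "prompt_tokens": body.get("prompt_eval_count", 0),
        "output_tokens": output_tokens,
        "total_seconds": round(body.get("total_duration", 0) / 1e9, 3),
        # Generation speed; guard against division by zero.
        "tokens_per_second": round(output_tokens / eval_s, 1) if eval_s else 0.0,
    }

# Example response body with plausible values.
sample = {
    "model": "llama3",
    "prompt_eval_count": 26,
    "eval_count": 290,
    "total_duration": 5_589_157_167,  # ~5.59 s total, in nanoseconds
    "eval_duration": 4_709_748_083,   # ~4.71 s spent generating
}
print(json.dumps(extract_usage(sample)))
```

In the full tool this record would be written to the configured storage backend; the transparent part is simply forwarding each request to the local Ollama port (11434 by default) and capturing the response on the way back.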