The Single Best Way to Reduce LLM Costs (It Is Not What You Think)
The article argues that the real cost driver for large language models (LLMs) is making unnecessary calls, not the cost per call. By tracking which LLM outputs users actually acted on, the author was able to cut LLM costs by 40%.
Why it matters
Tracking the value of LLM outputs is a critical first step in optimizing costs, as unnecessary calls are a major driver of LLM expenses for many organizations.
Key Points
1. Most LLM cost optimization advice focuses on reducing cost per call, but the real issue is making unnecessary calls
2. The author found that 50% of LLM outputs were never read or were immediately dismissed by users
3. Adding a simple check to track whether users acted on the LLM output allowed the author to reduce LLM calls and costs by 40%
4. Optimizing prompts, models, and caching should come after understanding which outputs are actually valuable
Details
The article argues that the common advice to reduce LLM costs, such as using caching, cheaper models, or reducing token counts, misses the real issue. The author found that the majority of LLM outputs were being ignored by users, meaning those calls were essentially wasted. By adding tracking to see which outputs drove actual user actions, the author was able to reduce LLM calls and costs by 40% without impacting user satisfaction. The key is to first understand which LLM outputs are valuable before optimizing other aspects of the system. This data-driven approach allows companies to focus their LLM investments on the most impactful use cases.
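The article does not include the author's implementation, but the "track whether users acted on the output" check can be sketched with a minimal tracker: assign each LLM call an ID, record an action event when the user reads, accepts, or copies the output, and compute the fraction of calls whose output was never used. All class and method names below are illustrative assumptions, not the author's actual code.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class OutputTracker:
    """Hypothetical sketch: tracks which LLM outputs users acted on."""

    # Maps call_id -> True if the user acted on that output
    _calls: dict = field(default_factory=dict)

    def record_call(self) -> str:
        """Register a new LLM call; returns an ID to attach to its output."""
        call_id = uuid.uuid4().hex
        self._calls[call_id] = False
        return call_id

    def record_action(self, call_id: str) -> None:
        """Mark that the user read, accepted, or copied this output."""
        if call_id in self._calls:
            self._calls[call_id] = True

    def waste_rate(self) -> float:
        """Fraction of calls whose output was never acted on."""
        if not self._calls:
            return 0.0
        unused = sum(1 for acted in self._calls.values() if not acted)
        return unused / len(self._calls)


tracker = OutputTracker()
ids = [tracker.record_call() for _ in range(4)]
tracker.record_action(ids[0])
tracker.record_action(ids[1])
print(tracker.waste_rate())  # 2 of 4 outputs unused -> 0.5
```

A waste rate like this is the measurement the article says should come first: once you know which calls never produce a used output, you can skip or defer those calls before touching prompts, models, or caching.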