June 18, 2026 · 6 min read
Prompt Caching Is Quietly Breaking Your AI Cost Tracking (Cache Reads vs Writes, and the Numbers That Lie)
Prompt caching is the best per-call cost lever in 2026: up to 90% off repeated context, stackable with batch discounts to ~25% of standard rates. But it quietly breaks cost tracking. A cached request still reports the full input-token count, so any tracker that multiplies total input tokens by the standard rate overstates spend on cache-heavy workloads (by up to ~10x) and hides whether caching is working at all. The bug is real and current: the LiteLLM team logged "Anthropic cost tracking inaccurate for cached usage" (LIT-3771) in its June stability sprint, with an enterprise customer confirming it in production. The fix is an accounting rule, not a discount: meter cache writes, cache reads, and uncached input as three separately priced events. Your dashboard then goes from lying to load-bearing, surfacing both true cost and cache hit ratio.
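
To make the accounting rule concrete, here is a minimal Python sketch of three-way metering next to the naive calculation. Everything in it is an assumption for illustration: the `Rates` and `Usage` types, the function names, and the per-million-token prices (a hypothetical $3.00 base rate, with cache writes at 1.25x and cache reads at 0.1x). The hit-ratio definition, cache-read tokens over all input tokens, is also just one reasonable choice. How your provider reports the three token counts varies, so mapping its usage fields into `Usage` is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical USD prices per million tokens; substitute real rates."""
    uncached_input: float = 3.00
    cache_write: float = 3.75   # assumed 1.25x the base rate
    cache_read: float = 0.30    # assumed 0.1x the base rate


@dataclass(frozen=True)
class Usage:
    """Input-token counts for one request, split by how they were billed."""
    uncached_input_tokens: int
    cache_write_tokens: int
    cache_read_tokens: int

    @property
    def total_input_tokens(self) -> int:
        return (self.uncached_input_tokens
                + self.cache_write_tokens
                + self.cache_read_tokens)


def naive_cost(usage: Usage, rates: Rates) -> float:
    """The broken approach: every input token at the standard rate."""
    return usage.total_input_tokens * rates.uncached_input / 1_000_000


def metered_cost(usage: Usage, rates: Rates) -> float:
    """Price uncached input, cache writes, and cache reads separately."""
    return (usage.uncached_input_tokens * rates.uncached_input
            + usage.cache_write_tokens * rates.cache_write
            + usage.cache_read_tokens * rates.cache_read) / 1_000_000


def cache_hit_ratio(usage: Usage) -> float:
    """Share of input tokens served from cache (0.0 when there is no input)."""
    total = usage.total_input_tokens
    return usage.cache_read_tokens / total if total else 0.0


# A cache-heavy request: 99,000 tokens read from cache, 1,000 new tokens.
rates = Rates()
usage = Usage(uncached_input_tokens=1_000,
              cache_write_tokens=0,
              cache_read_tokens=99_000)

print(f"naive:     ${naive_cost(usage, rates):.4f}")
print(f"metered:   ${metered_cost(usage, rates):.4f}")
print(f"hit ratio: {cache_hit_ratio(usage):.2%}")
```

Under these assumed rates, the naive figure for this request comes out about 9x higher than the metered one, and the hit ratio shows up as a number the dashboard can display instead of something to infer from spend.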