Who Spent the Tokens? Cost Attribution Across Tools, Sub-Agents, and Retries (2026)

A single agent run fans out into tool calls, sub-agents, parallel branches, and silent retries, then returns one opaque token total - and the expensive question (which customer, feature, step, and model burned the spend) cannot be reconstructed from it after the fact. Attribution is a write-time property: every model call has to be tagged with a few dimensions (trace id, customer, feature, step, model) and emitted as a usage event the meter rolls up by any of them. This article explains why provider exports and application logs cannot attribute agent cost, lays out the dimension set that makes a token traceable, and describes the idempotency rule (count each event id once) that stops retries and at-least-once collectors from double-counting an agent's own spend. For AI products, attribution is not a report - it is the billing system.

8 min read

TL;DR (June 2026): A single agent run fans out into tool calls, sub-agents, parallel branches, and silent retries, then hands you back one opaque token total. The expensive question - which customer, which feature, which step, which model burned the spend - cannot be answered from that total, and it cannot be reconstructed after the fact. Attribution has to be captured at the moment of each call, by tagging every LLM request with a few dimensions (trace id, customer, feature, step, model) and emitting a usage event the meter rolls up by any of them. Get the dimensions right and "who spent the tokens" is a query; get them wrong and it is a forensic investigation that retries quietly double-count. Here is the dimension set that makes a token traceable, and the idempotency rule that keeps the roll-up honest.

The runaway-bill stories all share a shape: the number is real, but nobody can say where it came from. A $1,400 hour of agent work is 87 tasks deep, each task a chain of tool calls and model invocations, and the invoice is a single line. You cannot tell which task was the disaster, which customer to bill, or which feature to fix, because the cost arrived pre-blended. That is the attribution problem, and as agents replace one-shot chat it is becoming the central metering challenge - not "how many tokens" but "whose, and for what".

One run, one opaque bill

Consider a single agent request. It plans, calls a search tool, spawns two sub-agents to read results in parallel, retries one of them after a timeout, calls the model again to synthesize, and returns. That is at least five model calls plus a tool call, spread across three logical "actors" (the planner and two sub-agents), with one call duplicated by a retry. The provider bills you the sum. If all you logged was the top-level request's reported usage, you have one number for an event that had six causes - and the retry may or may not be in it depending on where your SDK counts.

This is why per-seat thinking collapses here: a flat price assumes flat cost-to-serve, but one agentic user's run can cost more than their monthly seat, and you cannot even see which run did it. We covered the pricing side in per-seat pricing can't survive agentic users; this is the measurement side of the same problem.

You can't attribute after the fact

The instinct is to reconcile later - take the provider's monthly usage export and try to map it back to customers. It does not work, for structural reasons:

  • Provider exports are aggregated by API key and time, not by your customer or feature. The dimension you need was never recorded.
  • Retries and SDK-internal calls do not appear in your application logs at all, so your reconstruction is missing rows it does not know are missing.
  • Parallel sub-agents interleave in time, so timestamps cannot untangle which branch a call belonged to.

Attribution is a write-time property. If the dimensions are not stamped onto the usage event when the call happens, they are gone.
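
A minimal sketch of what stamping at write time can look like, assuming a thin wrapper around each model call. The names `CallContext`, `metered_call`, `fake_model`, and the event field names are illustrative assumptions, not an API from any particular SDK or gateway:

```python
import uuid
from dataclasses import dataclass, asdict
from typing import Callable


@dataclass(frozen=True)
class CallContext:
    # Dimensions that are only known while the call is in flight (illustrative names).
    trace_id: str
    customer_id: str
    feature: str
    step: str


def metered_call(ctx: CallContext, model: str,
                 call_model: Callable[[str], dict],
                 emit: Callable[[dict], None]) -> dict:
    """Make one model call and emit its usage event before returning."""
    response = call_model(model)
    emit({
        "event_id": str(uuid.uuid4()),  # minted once per provider call
        **asdict(ctx),                  # the dimensions are stamped here or never
        "model": model,
        "tokens_in": response["tokens_in"],
        "tokens_out": response["tokens_out"],
    })
    return response


# Stand-in for a real provider client; the token counts are made up.
def fake_model(model: str) -> dict:
    return {"text": "ok", "tokens_in": 1200, "tokens_out": 300}


events: list[dict] = []
ctx = CallContext(trace_id="run-1", customer_id="acme",
                  feature="research", step="sub-agent-a")
metered_call(ctx, "example-model", fake_model, events.append)
```

The point of the wrapper is only that the context travels with the call: a sub-agent that receives `ctx` from its parent emits events that already say whose run it belongs to.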

The dimensions that make a token traceable

The fix is to emit a structured usage event at every model call - ideally at the gateway, so retries and SDK calls are captured too - carrying the dimensions you will later want to slice by:

| Dimension | Answers | Why it has to be at write time |
| --- | --- | --- |
| trace_id / span_id | Which run, and which step within it | Lets you collapse a fan-out of calls back into one run and drill into the expensive step |
| customer_id / account_id | Whose usage this is | The basis for billing or chargeback; provider exports never have it |
| feature / workflow | What product surface triggered it | Tells you which feature is unprofitable, not just that something is |
| agent / step | Which actor in the chain (planner, sub-agent, tool synth) | Isolates the runaway sub-agent from the cheap planner |
| model + tokens in/out + cache status | The cost itself | Cost differs 100x by model; a blended number hides the mix |
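
The dimension set above could be carried in an event shaped roughly like this. It is a sketch under stated assumptions: the field names follow the table, but the `PRICES` table, the model names, and the cached-input discount are hypothetical placeholders, not real provider prices:

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical prices in USD per million tokens: (input, cached input, output).
PRICES = {
    "small-model": (Decimal("0.10"), Decimal("0.01"), Decimal("0.40")),
    "large-model": (Decimal("10.00"), Decimal("1.00"), Decimal("40.00")),
}


@dataclass(frozen=True)
class UsageEvent:
    event_id: str
    trace_id: str
    span_id: str
    customer_id: str
    feature: str
    step: str
    model: str
    tokens_in: int
    tokens_out: int
    cached_tokens_in: int = 0  # portion of tokens_in served from cache

    def cost_usd(self) -> Decimal:
        """Price the event from its own model and token counts."""
        p_in, p_cached, p_out = PRICES[self.model]
        fresh_in = self.tokens_in - self.cached_tokens_in
        total = (fresh_in * p_in
                 + self.cached_tokens_in * p_cached
                 + self.tokens_out * p_out)
        return total / Decimal(1_000_000)


# The same token counts priced on two models from the hypothetical table.
common = dict(event_id="e1", trace_id="run-1", span_id="s1", customer_id="acme",
              feature="research", step="planner",
              tokens_in=1_000_000, tokens_out=0)
cheap = UsageEvent(model="small-model", **common)
costly = UsageEvent(model="large-model", **common)
```

Because the model and token counts live on the event itself, the cost is computed per call and the model mix never has to be guessed from a blended total.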

With these stamped on each event, "which customer spent the most this month," "what does feature X cost to serve," and "which step in this run blew up" are all the same operation: filter and sum by a dimension. This is the dimensioned roll-up we built in Kata #4: per-customer, per-model cost with dimensions - the agent case just adds trace_id and step so a single run is itself sliceable.
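
The "filter and sum by a dimension" operation can be sketched in a few lines. The `rollup` helper, the sample events, and the `cost_micros` field (cost in millionths of a dollar, kept as an integer) are assumptions made for illustration:

```python
from collections import defaultdict


def rollup(events, by, **filters):
    """Sum cost grouped by one dimension, optionally filtered by others."""
    totals = defaultdict(int)
    for event in events:
        if all(event[key] == value for key, value in filters.items()):
            totals[event[by]] += event["cost_micros"]
    return dict(totals)


# Made-up events; cost_micros is cost in millionths of a dollar.
events = [
    {"trace_id": "run-1", "customer_id": "acme", "feature": "research",
     "step": "planner", "cost_micros": 1_200},
    {"trace_id": "run-1", "customer_id": "acme", "feature": "research",
     "step": "sub-agent", "cost_micros": 48_000},
    {"trace_id": "run-2", "customer_id": "globex", "feature": "summarize",
     "step": "planner", "cost_micros": 900},
]

by_customer = rollup(events, "customer_id")
by_step_in_run = rollup(events, "step", trace_id="run-1")
```

Here `by_customer` answers "who spent the most" and `by_step_in_run` answers "which step in this run blew up", using the same function with a different grouping key.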

Retries and sub-agents double-count if you let them

What makes agents dangerous to meter is the retry. A timed-out sub-agent that is retried makes the call twice; an at-least-once collector may ship the same event twice; a framework that wraps and re-emits can report it again. If your roll-up naively sums every event it sees, an agent that retries aggressively inflates its own attributed cost - and now your "expensive customer" analysis is wrong in the customer's favor or yours, depending on which way it skews.

The guard is idempotency: every usage event carries a stable event_id, and the meter counts each id once no matter how many times it arrives. A genuine second call (a real retry that actually hit the provider) is a distinct event with its own id and should count; a duplicate delivery of the same call must not. Telling those apart is exactly the job covered in idempotent usage metering, and it is why agent attribution cannot live in a plain log table that has no concept of "seen this one already."
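
A minimal sketch of the count-each-id-once rule, assuming the event id is minted once per provider call so that a duplicate delivery reuses it and a genuine retry gets a new one. `IdempotentMeter` and its in-memory set are illustrative; a real meter would need a durable store of seen ids:

```python
class IdempotentMeter:
    """Counts each event_id once, however many times it is delivered."""

    def __init__(self):
        self._seen = set()   # in-memory only; an assumption for this sketch
        self.tokens_by_customer = {}

    def record(self, event) -> bool:
        """Return True if the event was counted, False if it was a duplicate."""
        if event["event_id"] in self._seen:
            return False
        self._seen.add(event["event_id"])
        customer = event["customer_id"]
        self.tokens_by_customer[customer] = (
            self.tokens_by_customer.get(customer, 0) + event["tokens"]
        )
        return True


meter = IdempotentMeter()
first_call = {"event_id": "evt-1", "customer_id": "acme", "tokens": 500}
real_retry = {"event_id": "evt-2", "customer_id": "acme", "tokens": 500}

counted = [
    meter.record(first_call),  # original call: counted
    meter.record(real_retry),  # retry that hit the provider: new id, counted
    meter.record(first_call),  # duplicate delivery of evt-1: ignored
]
```

The retry that actually reached the provider is billed because it has its own id, while the redelivered copy of the first event changes nothing.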

Attribution is the product, not a report

For anyone selling an AI product, attribution stops being an internal nicety and becomes the billing system: you cannot bill a customer on usage, offer per-feature pricing, or honor a prepaid credit draw-down if you cannot say whose tokens these were. The agent era did not create the need to meter - it made the blended bill so opaque that after-the-fact attribution finally became impossible, forcing the dimensions to the write path where they always belonged.

Tag every call. Meter at the gateway. Dedupe by event id. Then the opaque agent bill becomes a table you can group by customer, feature, run, and step - and the next $1,400 hour is a row you can point at, not a mystery you absorb.

Key Topics

  • cost attribution
  • AI agents
  • usage metering
  • dimensions
  • idempotency
  • trace id
  • chargeback
  • sub-agents
  • 2026
