This is a hands-on kata, not a think-piece. The goal: take the invoice your model vendor just sent - Anthropic, OpenAI, whoever - and reconcile it against your own meter line by line, so you know exactly where the gap between what you paid and what you charged customers comes from. By the end you will have your metered total grouped by model, a self-check that proves your own numbers before you blame anyone, a per-model gap against the vendor invoice, a localized cause, and an immutable correction that closes the loop. All over the real metering API, no spreadsheet archaeology.
If you have done Kata #1 (meter an AI usage event to an invoice line), you already have events flowing. This kata is the month-end question that follows: the vendor bill says one thing, your invoices to customers say another, and someone has to explain the delta before margin quietly erodes.
Step 1: pull what you recorded, grouped by model
Start with your own truth. Ask the meter for the period total, split by model_id - this is what you actually metered and (presumably) passed through to customers:
curl "https://api.usagebox.com/v1/accounts/acct_42/usage?from=2026-05-01T00:00:00Z&to=2026-06-01T00:00:00Z&group_by=model_id" \
  -H "Authorization: Bearer $USAGEBOX_KEY"
{
  "source": "rollup",
  "groups": [
    { "model_id": "claude-opus-4-8", "sum": 42180000, "count": 3140 },
    { "model_id": "claude-sonnet-4-5", "sum": 191400000, "count": 28755 },
    { "model_id": "claude-haiku-4-5", "sum": 88060000, "count": 14002 }
  ]
}
That is your billable token quantity per model for May. Note "source": "rollup": completed hours are pre-aggregated, so this read is cheap even over a full month. Keep these three numbers - they are the left-hand column of the reconciliation.
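If you script the month-end job, the first thing to do with that response is flatten it into a lookup you can reuse in later steps. A minimal Python sketch, assuming you have already parsed the JSON body shown above into a dict (the function name `metered_by_model` is ours, not part of the API):

```python
# Response body from the Step 1 grouped read, copied from above.
step1 = {
    "source": "rollup",
    "groups": [
        {"model_id": "claude-opus-4-8", "sum": 42180000, "count": 3140},
        {"model_id": "claude-sonnet-4-5", "sum": 191400000, "count": 28755},
        {"model_id": "claude-haiku-4-5", "sum": 88060000, "count": 14002},
    ],
}


def metered_by_model(response: dict) -> dict[str, int]:
    """Map model_id -> metered token quantity from a grouped usage response."""
    return {group["model_id"]: group["sum"] for group in response["groups"]}


metered = metered_by_model(step1)
metered_total = sum(metered.values())
print(f"{metered_total:,}")  # 321,640,000
```

The total is the number Step 2 has to reproduce from the raw events.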
Step 2: verify your own numbers first (the step everyone skips)
Before you accuse the vendor of overbilling, prove that your fast rollup total is not itself wrong. A rollup that drifted from the raw events would make a perfectly correct vendor invoice look like a discrepancy. UsageBox ships a built-in raw-vs-rollup check:
curl "https://api.usagebox.com/v1/accounts/acct_42/verify?from=2026-05-01T00:00:00Z&to=2026-06-01T00:00:00Z" \
  -H "Authorization: Bearer $USAGEBOX_KEY"
{
  "ok": true,
  "rollup_sum": 321640000,
  "raw_sum": 321640000,
  "drift": 0
}
/verify scans the raw immutable events and compares the total against the rollups you read in Step 1. If drift is anything but zero, stop here - the problem is on your side, not the vendor's, and you would have spent the afternoon arguing the wrong case. Only once your own books reconcile internally do you have standing to compare against an external bill.
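In a scripted reconciliation you probably want this gate to fail loudly rather than rely on someone reading the JSON. A small sketch, assuming the response fields shown above; `check_verify` is a hypothetical helper, and the second check (that the Step 1 groups add up to `rollup_sum`) is our own extra sanity test, not something the API performs:

```python
def check_verify(verify: dict, metered_total: int) -> None:
    """Raise if the rollup drifted from raw events or from the Step 1 total."""
    if not verify["ok"] or verify["drift"] != 0:
        raise RuntimeError(
            f"rollup drift of {verify['drift']} tokens: fix your side first"
        )
    if verify["rollup_sum"] != metered_total:
        raise RuntimeError("Step 1 groups do not add up to the verified rollup sum")


# Response body from /verify, copied from above.
verify = {"ok": True, "rollup_sum": 321640000, "raw_sum": 321640000, "drift": 0}

# Passes silently: books reconcile internally, so the vendor comparison can proceed.
check_verify(verify, metered_total=321640000)
```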
Step 3: lay your total next to the vendor invoice
Now put the two columns side by side. Your meter says 321.64M tokens for May. The Anthropic invoice (converted from dollars back to token quantity at the rate card you are on, per model) says something a little higher. Compute the gap per model:
| model | metered (you) | vendor invoice | gap |
|---|---|---|---|
| claude-opus-4-8 | 42,180,000 | 42,180,000 | 0 |
| claude-sonnet-4-5 | 191,400,000 | 197,920,000 | +6,520,000 |
| claude-haiku-4-5 | 88,060,000 | 88,060,000 | 0 |
| TOTAL | 321,640,000 | 328,160,000 | +6,520,000 |
The gap is not spread evenly - it is concentrated entirely in claude-sonnet-4-5, about 3.4% over what you metered. A flat percentage across every model would suggest a rate-card or currency mismatch; a single-model spike says something specific happened on Sonnet traffic. That localization is the whole game, and the next step does it for you instead of by hand.
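The table is easy to compute rather than type. A sketch in Python, assuming you already hold both columns as token quantities keyed by model; `gap_by_model` and `offenders` are illustrative names:

```python
# Metered quantities from Step 1 and vendor quantities from the table above.
metered = {
    "claude-opus-4-8": 42_180_000,
    "claude-sonnet-4-5": 191_400_000,
    "claude-haiku-4-5": 88_060_000,
}
vendor = {
    "claude-opus-4-8": 42_180_000,
    "claude-sonnet-4-5": 197_920_000,
    "claude-haiku-4-5": 88_060_000,
}


def gap_by_model(
    metered: dict[str, int], vendor: dict[str, int]
) -> dict[str, tuple[int, float]]:
    """Return model_id -> (gap in tokens, gap as a percentage of metered)."""
    rows = {}
    for model in sorted(set(metered) | set(vendor)):
        m = metered.get(model, 0)
        v = vendor.get(model, 0)
        gap = v - m
        # With nothing metered for a model, a percentage is not meaningful.
        pct = gap / m * 100 if m else float("inf")
        rows[model] = (gap, pct)
    return rows


rows = gap_by_model(metered, vendor)
for model, (gap, pct) in rows.items():
    print(f"{model:<20} {gap:>+12,} {pct:>6.1f}%")

# Models that carry any gap at all; a single entry points to a localized cause.
offenders = [model for model, (gap, _) in rows.items() if gap != 0]
print(offenders)  # ['claude-sonnet-4-5']
```

If `offenders` listed every model with roughly the same percentage, you would check the rate card or currency conversion before looking at traffic.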
Step 4: localize the discrepancy with /explain and dimensions
Call /explain to get the breakdown, the segment provenance (which immutable files each number came from), and any corrections already applied. Then slice by your dimensions to find where the Sonnet tokens are - and are not:
curl "https://api.usagebox.com/v1/accounts/acct_42/explain?from=2026-05-01T00:00:00Z&to=2026-06-01T00:00:00Z" \
  -H "Authorization: Bearer $USAGEBOX_KEY"
If you attached a feature or token_type dimension at ingest (Kata #1 covers this), restrict the read to Sonnet and group on that dimension to see what your metered total is made of:
curl "https://api.usagebox.com/v1/accounts/acct_42/usage?from=2026-05-01T00:00:00Z&to=2026-06-01T00:00:00Z&model_id=claude-sonnet-4-5&group_by=token_type" \
  -H "Authorization: Bearer $USAGEBOX_KEY"
{
  "source": "rollup",
  "groups": [
    { "token_type": "input", "sum": 122900000, "count": 28755 },
    { "token_type": "output", "sum": 68500000, "count": 28755 }
  ]
}
You metered 191.4M Sonnet tokens as input plus output. The vendor invoice is 197.92M. The 6.52M you never recorded is the smoking gun: the category your collector did not pass through. The usual suspects are cached reads, retries that hit the vendor but never produced a customer-facing result, and system-prompt tokens you treated as fixed overhead. None show up under token_type because none were ever sent as events - which is itself the finding.
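The arithmetic behind that paragraph is worth pinning down in code, because it also checks that the dimension breakdown accounts for everything you metered. A short sketch using only the figures quoted above:

```python
# token_type groups from the Sonnet read above.
token_type_groups = [
    {"token_type": "input", "sum": 122_900_000, "count": 28755},
    {"token_type": "output", "sum": 68_500_000, "count": 28755},
]
sonnet_metered = 191_400_000  # Step 1
sonnet_vendor = 197_920_000   # vendor invoice, Step 3

recorded = sum(group["sum"] for group in token_type_groups)

# The breakdown must account for the whole metered total; if it does not,
# some events were ingested without a token_type dimension.
assert recorded == sonnet_metered

# Whatever the vendor billed beyond the recorded categories never reached the meter.
unrecorded = sonnet_vendor - recorded
print(f"{unrecorded:,}")  # 6,520,000
```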
Step 5: decide which kind of gap this is
Every reconciliation gap is one of two things, and the fix is completely different for each. Use the dimensions to separate them:
- Metering gap. You actually served the usage and should have charged for it, but your collector failed to record the event. The vendor billed you, the customer used it, and you ate the cost. Fix: instrument the missing path so future events get recorded, and decide whether to bill the customer retroactively.
- Pass-through gap. The vendor charged you for overhead you legitimately should not pass to the customer - speculative retries, internal evals, your own system prompt that every request shares. This is real cost-of-goods, not under-billing. Fix: price it into your margin, do not invoice it as customer usage.
In our case, suppose /explain plus a quick grouped read shows that 6.0M of the 6.52M was prompt caching on a shared system prompt (overhead you chose not to bill) and the remaining 0.52M was a retried agent loop on one account (acct_42) that genuinely served the customer but never emitted an event. So: 6.0M is a pass-through gap to absorb, 0.52M is a metering gap to correct. You now know the number and the cause for both, and you can defend either one in a review.
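One way to keep that split honest is to record each finding with its cause and kind, then confirm the findings add up to the whole gap. A sketch under the scenario above; the `Finding` type, the kind labels, and the cause strings are our own naming, not API fields:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    cause: str
    tokens: int
    kind: str  # "pass_through" (absorb into margin) or "metering" (correct)


SONNET_GAP = 6_520_000  # from Step 3

findings = [
    Finding("prompt_caching_shared_system_prompt", 6_000_000, "pass_through"),
    Finding("untracked_retry_loop", 520_000, "metering"),
]


def split_gap(findings: list[Finding], gap: int) -> tuple[dict[str, int], int]:
    """Total the findings per kind and report how much of the gap is unexplained."""
    totals: dict[str, int] = defaultdict(int)
    for finding in findings:
        totals[finding.kind] += finding.tokens
    unexplained = gap - sum(totals.values())
    return dict(totals), unexplained


totals, unexplained = split_gap(findings, SONNET_GAP)
print(totals)       # {'pass_through': 6000000, 'metering': 520000}
print(unexplained)  # 0
```

A non-zero `unexplained` means the review is not finished: some of the gap still has no named cause.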
Step 6: close the loop with a Correction event
For the 0.52M you under-metered, you do not edit history and you do not silently bump a total. You record a first-class Correction event that nets against the original and carries a correction_ref, so the audit trail proves the reconciliation end to end:
curl -X POST https://api.usagebox.com/v1/usage/batch \
  -H "Authorization: Bearer $USAGEBOX_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "events": [{
      "event_id": "evt_2026-05_recon_acct42_sonnet_0001",
      "account_id": "acct_42",
      "meter_id": "claude_tokens",
      "model_id": "claude-sonnet-4-5",
      "kind": "Correction",
      "correction_ref": "recon_may_2026_vendor_anthropic",
      "quantity": 520000,
      "unit": "tokens",
      "timestamp": "2026-05-31T23:59:59Z",
      "dimensions": { "reason": "untracked_retry_loop", "source": "vendor_reconciliation" }
    }]
  }'
{ "accepted": 1, "duplicates": 0, "conflicts": 0, "rejected": 0 }
If you had over-metered instead - billed a customer for tokens the vendor never actually charged you for - the same call with a negative quantity backs it out. Either way the original events stay untouched; the correction is an additive entry that references the reconciliation by correction_ref. Re-run /explain and the correction appears in the corrections-applied section with its provenance, so six months from now anyone can see not just the number but why it moved.
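If corrections come out of a script rather than a hand-written curl, build the payload in one place so the sign convention cannot drift. A sketch that reproduces the request body above; the `correction_event` helper is hypothetical, and the fixed `meter_id` and `timestamp` simply mirror this example:

```python
import json


def correction_event(event_id: str, account_id: str, model_id: str,
                     tokens: int, correction_ref: str, reason: str) -> dict:
    """Build one Correction event; tokens > 0 adds usage, tokens < 0 backs it out."""
    if tokens == 0:
        raise ValueError("a zero-quantity correction records nothing")
    return {
        "event_id": event_id,
        "account_id": account_id,
        "meter_id": "claude_tokens",
        "model_id": model_id,
        "kind": "Correction",
        "correction_ref": correction_ref,
        "quantity": tokens,
        "unit": "tokens",
        "timestamp": "2026-05-31T23:59:59Z",
        "dimensions": {"reason": reason, "source": "vendor_reconciliation"},
    }


event = correction_event(
    event_id="evt_2026-05_recon_acct42_sonnet_0001",
    account_id="acct_42",
    model_id="claude-sonnet-4-5",
    tokens=520_000,
    correction_ref="recon_may_2026_vendor_anthropic",
    reason="untracked_retry_loop",
)

# Request body for POST /v1/usage/batch.
body = json.dumps({"events": [event]}, indent=2)
print(body)
```

Deriving `event_id` from the period, account, and model means a re-run of the job submits the same ID. The `duplicates` counter in the response suggests repeated IDs are detected rather than double-counted, though this kata does not spell out that behavior, so confirm it before relying on it.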
Production notes before you ship it
- Verify before you accuse. Always run /verify first. A drifted rollup makes a correct vendor invoice look fraudulent, and you only get to cry wolf once.
- Meter the same dimensions the vendor bills on. If your vendor prices cached reads, retries, and system-prompt tokens separately, attach a token_type or kind dimension at ingest so reconciliation is a group_by, not a forensic exercise.
- Corrections, not edits. Every reconciliation adjustment is a Correction event with a correction_ref. History is append-only, which is exactly what makes the next audit short.
- Reconcile before you close the period. Run this kata while the month is still open; once you close it, a stray Usage event is rejected and corrections land as visible adjustments instead of quiet edits.
Kata variations to try
- Per-customer attribution. Re-run Step 4 with group_by=account_id on the Sonnet slice to find which customer's traffic drove the metering gap, then bill or absorb per account.
- Ad-hoc SQL. Hit /v1/query/sql with SELECT model_id, SUM(quantity) FROM usage_events WHERE account_id='acct_42' GROUP BY model_id for a one-off cross-check against the rollup read.
- Structured query. Use /v1/query/json with filters on token_type and metrics of sum and count to script the whole reconciliation into your month-end job.
- Daily drift watch. Run /verify with group_by=day to catch the exact day a collector started dropping events, instead of discovering a month-sized gap at invoice time.
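For the daily drift watch, the useful output is a single date: the first day the rollup and raw totals disagree. A sketch of that scan. The per-day response shape here (a list of entries with `day` and `drift`) is an assumption, since this kata does not show what /verify returns with group_by=day, and the sample values are made up for illustration:

```python
from typing import Optional


def first_drift_day(days: list[dict]) -> Optional[str]:
    """Return the earliest ISO date with non-zero drift, or None if all days agree."""
    drifted = [entry["day"] for entry in days if entry["drift"] != 0]
    # ISO 8601 dates sort chronologically as plain strings.
    return min(drifted) if drifted else None


# Hypothetical per-day results, not real output.
sample_days = [
    {"day": "2026-05-17", "drift": 0},
    {"day": "2026-05-18", "drift": 0},
    {"day": "2026-05-19", "drift": -41200},
    {"day": "2026-05-20", "drift": -39850},
]

print(first_drift_day(sample_days))  # 2026-05-19
```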
Kata FAQ
The vendor invoice is in dollars and my meter is in tokens - how do I compare? Convert the invoice back to quantity per model using your rate card, then compare quantity to quantity. Reconciling on tokens (the thing both sides actually counted) isolates volume gaps from pricing gaps, which is what you want.
Why not just trust whichever total is bigger? Because the two gaps need opposite responses. A metering gap means you under-billed a customer; a pass-through gap means you should absorb vendor overhead into margin. Treating them the same either eats your profit or overcharges a customer.
What if /verify shows drift? Then your own rollup disagrees with your own raw events and the vendor is not the problem yet. Force a source=raw read to get the true total, find the cause of the drift, and only then reopen the vendor comparison.
Do corrections change the number I already invoiced? No. A Correction nets against the original as a separate, immutable event referenced by correction_ref. The original total stays visible; the corrected total is computed on top. The trail shows both.
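The dollars-to-tokens conversion from the first answer is a one-line division per invoice line, but it is worth doing in decimal arithmetic so cents do not pick up float noise. A sketch, assuming the vendor invoice is itemized per model and token type and that your rate card is priced per million tokens; the rates and dollar amounts below are placeholders chosen to convert to the Step 4 figures, not real prices:

```python
from decimal import Decimal, ROUND_HALF_UP

MILLION = Decimal(1_000_000)

# Placeholder rate card: USD per million tokens, keyed by (model_id, token_type).
rate_card = {
    ("claude-sonnet-4-5", "input"): Decimal("2.00"),
    ("claude-sonnet-4-5", "output"): Decimal("10.00"),
}

# Placeholder invoice lines in dollars.
invoice_lines = [
    {"model_id": "claude-sonnet-4-5", "token_type": "input", "amount_usd": Decimal("245.80")},
    {"model_id": "claude-sonnet-4-5", "token_type": "output", "amount_usd": Decimal("685.00")},
]


def tokens_from_invoice(lines: list[dict], rates: dict) -> dict[str, int]:
    """Convert dollar line items to token quantities and total them per model."""
    totals: dict[str, int] = {}
    for line in lines:
        rate = rates[(line["model_id"], line["token_type"])]
        tokens = line["amount_usd"] / rate * MILLION
        # Invoices are rounded to cents, so round to the nearest whole token.
        whole = int(tokens.to_integral_value(rounding=ROUND_HALF_UP))
        totals[line["model_id"]] = totals.get(line["model_id"], 0) + whole
    return totals


print(tokens_from_invoice(invoice_lines, rate_card))
# {'claude-sonnet-4-5': 191400000}
```

A missing key in `rate_card` raises `KeyError`, which is the behavior you want: an invoice line you cannot price is a finding, not something to skip.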
What you just avoided building
In six steps you reconciled an external vendor bill against your own meter: a per-model metered total, a raw-vs-rollup self-check so you trust your own books first, a per-model gap, a dimensional drill-down that named the cause, a clean split between metering gaps and pass-through overhead, and an immutable correction that documents the whole thing. Built in-house, that is a second aggregation path to keep consistent with your raw events, a drift detector, dimensional grouping that does not melt under a month of data, and an append-only correction model that never rewrites history - a real metering database, not a reconciliation spreadsheet. That is why the gap between AI list price and real cost is so easy to lose track of, and why metering became strategic enough to drive the 2026 acquisition wave.
Keep reading: Kata #1 (meter a usage event to an invoice line), Kata #2 (live spend caps on real-time usage), and Kata #4 (per-customer, per-model cost with dimensions) - plus the usage-based billing guide for the bigger picture.