This is a hands-on kata, not a think-piece. The goal: catch and cap an account's spend before the bill lands, using UsageBox, in about 30 minutes. By the end you will read a live month-to-date total that is both fast and current, compute headroom against a budget, run a cheap real-time check loop, and act at the threshold - with one honest constraint baked in: UsageBox meters usage, it does not gate it. The cap lives in your app; the meter tells you the truth in real time.
If you have done Kata #1 (meter a usage event to an invoice line), you already have idempotent ingest and cheap totals. This kata builds the spend guardrail on top of that same store.
Step 1: read the month-to-date total fast
The number you guard against is the account's spend since month-start. Ask for it from the first of the month to right now, grouped by meter so you can see what is driving it. The default source=rollup keeps it cheap:
curl "https://api.usagebox.com/v1/accounts/acct_42/usage?from=2026-06-01T00:00:00Z&to=2026-06-16T14:30:00Z&source=rollup&group_by=meter_id" \
-H "Authorization: Bearer $USAGEBOX_KEY"
{
"source": "rollup",
"groups": [
{ "meter_id": "claude_tokens", "sum": 41280400, "count": 2310 },
{ "meter_id": "tool_calls", "sum": 18900, "count": 18900 }
]
}
Completed hours are pre-aggregated in the background, so this read does not scan millions of raw rows and does not contend with live ingestion. That matters here more than in Kata #1: you are going to call this on an interval, not once at billing time.
Step 2: the open-hour question
A spend cap is only useful if "month-to-date" includes the last few minutes of usage. The trap with many rollup systems is that the current hour is not aggregated yet, so a fast read silently lags reality by up to an hour - exactly when a runaway account does its damage.
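To make the trap concrete, here is a toy model in plain JavaScript. It is not UsageBox's implementation: completed hours are summed into hourly buckets, the open hour is not, and a second read adds the open hour back from raw events. The event shape (`{ ts, quantity }`, with `ts` in epoch milliseconds) and every function name are hypothetical.

```javascript
// Toy model only: illustrates the open-hour gap, not how UsageBox works internally.
const HOUR_MS = 60 * 60 * 1000;

// Sum raw events into completed hourly buckets, as a background rollup job might.
// Events in the still-open hour are skipped because that bucket is not closed yet.
function buildHourlyRollup(events, now) {
  const openHourStart = Math.floor(now / HOUR_MS) * HOUR_MS;
  const buckets = new Map(); // hour start (ms) -> summed quantity
  for (const e of events) {
    if (e.ts >= openHourStart) continue; // open hour: not aggregated yet
    const hour = Math.floor(e.ts / HOUR_MS) * HOUR_MS;
    buckets.set(hour, (buckets.get(hour) || 0) + e.quantity);
  }
  return { buckets, openHourStart };
}

// Naive read: completed buckets only, so it lags reality by up to an hour.
function rollupOnlyTotal(rollup) {
  let total = 0;
  for (const v of rollup.buckets.values()) total += v;
  return total;
}

// Rollup plus open-hour fallback: add raw events from the open hour only.
function rollupWithOpenHour(rollup, events, now) {
  let total = rollupOnlyTotal(rollup);
  for (const e of events) {
    if (e.ts >= rollup.openHourStart && e.ts <= now) total += e.quantity;
  }
  return total;
}
```

With events at 13:10 and 13:50 (150 units) and at 14:05 and 14:25 (650 units), a read at 14:30 returns 150 on the naive path and 800 with the open-hour fallback. The naive total is not wrong about the past; it is just missing the last half hour.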
UsageBox closes that gap automatically. Completed hours come from the rollup; the open (current) hour falls back to raw under the hood, so the returned sum already includes right-now usage. You do not pass a special flag and you do not pay for a full scan - only the open hour is read raw. The total in Step 1 is both fast and current. If you want to prove it to yourself, force a full raw scan and confirm the numbers agree:
curl "https://api.usagebox.com/v1/accounts/acct_42/usage?from=2026-06-01T00:00:00Z&to=2026-06-16T14:30:00Z&source=raw&group_by=meter_id" \
-H "Authorization: Bearer $USAGEBOX_KEY"
{ "source": "raw", "groups": [{ "meter_id": "claude_tokens", "sum": 41280400, "count": 2310 }, { "meter_id": "tool_calls", "sum": 18900, "count": 18900 }] }
Same sums, different source. Use rollup for the loop; keep raw in your pocket for verification.
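If you script that verification, a small comparator makes drift obvious. This sketch assumes both responses use the group shape shown above (`{ meter_id, sum, count }`). `diffGroups` is a hypothetical helper, and it compares sums exactly, which suits integer quantities like these. Pin both queries to the same `to` timestamp, as the curl examples do, so usage arriving between the two calls does not show up as a false mismatch.

```javascript
// Compare two grouped responses (e.g. source=rollup vs source=raw) meter by meter.
// Returns a list of mismatches; an empty list means the sums agree.
function diffGroups(rollupGroups, rawGroups) {
  const toMap = (groups) => new Map(groups.map((g) => [g.meter_id, g.sum]));
  const a = toMap(rollupGroups);
  const b = toMap(rawGroups);
  const meters = new Set([...a.keys(), ...b.keys()]);
  const mismatches = [];
  for (const meter of meters) {
    const rollupSum = a.get(meter) ?? 0; // a meter missing on one side counts as 0
    const rawSum = b.get(meter) ?? 0;
    if (rollupSum !== rawSum) mismatches.push({ meter_id: meter, rollupSum, rawSum });
  }
  return mismatches;
}
```

Group order does not matter, since both sides are keyed by `meter_id` before comparing.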
Step 3: set a budget and compute headroom
The cap itself lives in your system - a per-account budget you store next to the customer record. Convert the metered quantities into spend with your own price book, then measure how close you are. The threshold logic is plain client code:
// groups: the "groups" array from the Step 1 response
const budgetUsd = 500.00; // this account's monthly cap
const price = { claude_tokens: 0.000009, tool_calls: 0.002 }; // USD per unit, from your price book
const spend = groups.reduce((acc, g) => acc + g.sum * (price[g.meter_id] || 0), 0); // unpriced meters count as $0
const pct = spend / budgetUsd; // fraction of budget consumed
const headroom = budgetUsd - spend; // dollars left this month
if (pct >= 1.00) {
  // hard cap territory - stop serving this account
} else if (pct >= 0.80) {
  // soft cap territory - warn
}
UsageBox gives you the quantities; the budget, the price book, and the 80 percent line are yours. Keep them in your app so a customer-specific exception is a config change, not an API call.
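The same math, pulled into a pure function so it can be tested and reused by a polling loop. This is a sketch: `evaluateBudget` and its return shape are hypothetical, and unlike the inline version it reports meters that are missing from the price book instead of quietly pricing them at zero.

```javascript
// Turn grouped quantities into a budget verdict for one account.
// groups: [{ meter_id, sum, count }]; price: USD per unit, keyed by meter_id.
function evaluateBudget(groups, price, budgetUsd, softPct = 0.8) {
  let spend = 0;
  const unpriced = []; // meters we metered but cannot price
  for (const g of groups) {
    const unit = price[g.meter_id];
    if (typeof unit !== "number") {
      unpriced.push(g.meter_id);
      continue;
    }
    spend += g.sum * unit;
  }
  const pct = spend / budgetUsd;
  const level = pct >= 1 ? "hard" : pct >= softPct ? "soft" : "ok";
  return { spend, pct, headroom: budgetUsd - spend, level, unpriced };
}
```

On the Step 1 numbers with this price book, spend is about $409.32 ($371.52 of tokens plus $37.80 of tool calls), roughly 82 percent of the $500 budget: soft-cap territory with about $90.68 of headroom.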
Step 4: the real-time check loop
Run the cheap rollup read on a short interval - every minute or two - and recompute headroom each tick. Because the open hour folds in automatically, each tick reflects the latest usage:
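// Assumes Node 18+ (global fetch) and USAGEBOX_KEY set in the environment.
async function checkSpend(accountId) {
  const now = new Date();
  // Month-start in UTC so the window matches the RFC3339 timestamps on your events.
  const monthStart = new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), 1));
  const url = new URL("https://api.usagebox.com/v1/accounts/" + accountId + "/usage");
  url.searchParams.set("from", monthStart.toISOString());
  url.searchParams.set("to", now.toISOString());
  url.searchParams.set("source", "rollup");
  url.searchParams.set("group_by", "meter_id");
  const res = await fetch(url, { headers: { Authorization: "Bearer " + process.env.USAGEBOX_KEY } });
  if (!res.ok) throw new Error("UsageBox read failed: HTTP " + res.status);
  const { groups } = await res.json();
  return groups; // feed into the Step 3 math
}
// Poll every 90 seconds; log a failed tick instead of leaving the rejection unhandled.
setInterval(() => {
  checkSpend("acct_42").catch((err) => console.error(err));
}, 90000);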
If you also want a live burn rate - dollars per minute, to predict when an account will cross the line - query a tight recent window instead of the whole month. Last ten minutes, grouped by meter:
curl "https://api.usagebox.com/v1/accounts/acct_42/usage?from=2026-06-16T14:20:00Z&to=2026-06-16T14:30:00Z&source=rollup&group_by=meter_id" \
-H "Authorization: Bearer $USAGEBOX_KEY"
{ "source": "rollup", "groups": [{ "meter_id": "claude_tokens", "sum": 612000, "count": 33 }] }
That ten-minute window (14:20 to 14:30) falls entirely inside the open hour, so it is served from raw and stays current to the second. Price the quantities as in Step 3, divide the resulting spend by ten for a per-minute burn rate, and project it against the headroom from Step 3.
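The burn-rate arithmetic, sketched as two small functions (hypothetical names) that price the window with the same per-unit price book as Step 3:

```javascript
// Burn rate in USD per minute from a short recent window of grouped usage.
function burnRatePerMinute(windowGroups, price, windowMinutes) {
  let windowSpend = 0;
  for (const g of windowGroups) windowSpend += g.sum * (price[g.meter_id] || 0);
  return windowSpend / windowMinutes;
}

// Minutes until the headroom is used up at the current burn rate.
function minutesToCap(headroomUsd, usdPerMinute) {
  if (headroomUsd <= 0) return 0; // already at or over the cap
  if (usdPerMinute <= 0) return Infinity; // not burning, never crosses
  return headroomUsd / usdPerMinute;
}
```

With the sample numbers, 612,000 tokens in ten minutes is $5.508, about $0.55 per minute. Against roughly $90.68 of headroom that is about 165 minutes, so this account would hit its cap in under three hours if the pace held.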
Step 5: act at the threshold - soft cap vs hard cap
This is the honest part. UsageBox meters usage; it does not block it. There is no endpoint that says "stop this account" - and that is correct, because the meter should never be in the request path of your product. So the cap has two flavors, and both live in your code:
- Soft cap (warn). At 80 percent, alert and let traffic continue. Post to Slack, email the account owner, raise a flag in your dashboard. Nothing is throttled - you just stop being surprised.
- Hard cap (stop). At 100 percent, your application stops serving further billable requests for that account until the next period or a manual override. UsageBox told you the number; your gateway enforces the consequence.
if (pct >= 1.00) {
await denyFurtherRequests(accountId); // YOUR gateway, not UsageBox
await notify("#billing", accountId + " hit hard cap at $" + spend.toFixed(2));
} else if (pct >= 0.80) {
await notify("#billing", accountId + " at " + Math.round(pct * 100) + "% of budget");
}
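One gap in the snippet above: on a 90-second loop, an account that sits at 85 percent would post the soft-cap message on every tick. A minimal sketch that notifies only when the level changes, assuming an in-memory map of the last level per account (a deployment with several workers would likely keep this in a shared store). `levelFor` and `levelChange` are hypothetical names.

```javascript
// Last level seen per account, so each transition alerts once.
const lastLevel = new Map(); // accountId -> "ok" | "soft" | "hard"

// Map a budget fraction to a level, using the same thresholds as above.
function levelFor(pct) {
  if (pct >= 1.0) return "hard";
  if (pct >= 0.8) return "soft";
  return "ok";
}

// Returns the new level if it differs from the last one seen, else null.
function levelChange(accountId, pct) {
  const next = levelFor(pct);
  const prev = lastLevel.get(accountId) ?? "ok";
  lastLevel.set(accountId, next);
  return next !== prev ? next : null;
}
```

In the loop, call `levelChange(accountId, pct)` each tick and only notify when it returns a non-null level. A drop back to `"ok"` (a new month, or a raised budget) is also reported once.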
Keep the gate fast and local (a cached flag your request handler checks), and treat the UsageBox read as the source of truth that flips that flag. Never put a network call to the meter on the hot path of every request - poll it, cache the verdict, enforce locally.
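A minimal sketch of that split, assuming a single process: the poller writes a verdict into an in-memory Set, and the request handler only does a local lookup. `applyVerdict` and `allowRequest` are hypothetical names; with more than one application instance the verdict would need to live somewhere all instances can read.

```javascript
// Verdict cache: written by the background poller, read on the request path.
const blocked = new Set(); // accountIds currently at or over their hard cap

// Called from the poll loop after each spend check.
function applyVerdict(accountId, pct) {
  if (pct >= 1.0) blocked.add(accountId);
  else blocked.delete(accountId); // e.g. new period or a raised budget
}

// Called on every request: a local Set lookup, no network call to the meter.
function allowRequest(accountId) {
  return !blocked.has(accountId);
}
```

Because `allowRequest` never touches the network, a slow or unavailable meter cannot slow down requests; at worst the verdict is one poll interval stale.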
Step 6: per-meter and per-feature caps
A single dollar cap is blunt. Often one expensive meter - a vision model, a long-context call - is the real risk, and you want to throttle just that without starving the account's cheap traffic. Group by meter_id (you already are) and apply a separate ceiling per meter:
curl "https://api.usagebox.com/v1/accounts/acct_42/usage?from=2026-06-01T00:00:00Z&to=2026-06-16T14:30:00Z&source=rollup&group_by=meter_id" \
-H "Authorization: Bearer $USAGEBOX_KEY"
{ "source": "rollup", "groups": [{ "meter_id": "claude_tokens", "sum": 41280400, "count": 2310 }, { "meter_id": "vision_pages", "sum": 9800, "count": 9800 }] }
Now your decision logic reads each meter against its own budget, so you can hard-cap vision_pages while leaving claude_tokens running. If you instrument a feature dimension on your events, swap group_by=meter_id for group_by=feature and cap a single product feature the same way - no schema change required.
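Per-meter ceilings can be sketched as a lookup against a map of limits expressed in each meter's own units. The `cappedMeters` helper and the limit values in the example are hypothetical; you could just as well price each meter first and compare dollars.

```javascript
// Return the meters whose month-to-date quantity has reached their own ceiling.
// meterCeilings: { meter_id: limit in that meter's units }; meters without an
// entry are never capped by this check.
function cappedMeters(groups, meterCeilings) {
  const capped = [];
  for (const g of groups) {
    const ceiling = meterCeilings[g.meter_id];
    if (typeof ceiling === "number" && g.sum >= ceiling) capped.push(g.meter_id);
  }
  return capped;
}
```

With a hypothetical ceiling of 9,000 vision pages, the response above caps `vision_pages` (9,800 used) and leaves `claude_tokens` alone because it has no ceiling configured.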
Production notes before you ship it
- Poll, do not block. The meter read belongs in a background loop, not in your request handler. Cache the verdict and enforce the cap locally so the cap costs you nothing per request.
- Open hour is included and current. Rollup reads already fold in the open hour, so a one-minute loop on source=rollup is both cheap and live. Reserve source=raw for occasional verification.
- The cap is yours, the truth is theirs. UsageBox never stops a request. Budgets, thresholds, price book, and the actual gate all live in your application.
- Per-meter beats per-dollar. Cap the one expensive meter with group_by=meter_id instead of throttling the whole account at the first dollar sign.
- Time math. Define month-start and "now" in UTC to match the RFC3339 timestamps on your events; otherwise your month boundary, and with it your headroom, shifts by your timezone offset.
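A sketch of the time math in JavaScript: month-start is built with `Date.UTC` from the UTC year and month, and both ends are formatted as RFC3339 with whole seconds to match the curl examples. The function names are hypothetical.

```javascript
// Format a Date as RFC3339 in UTC, trimmed to whole seconds like the curl examples.
function rfc3339(d) {
  return d.toISOString().replace(/\.\d{3}Z$/, "Z");
}

// Month-to-date window for a given instant, with month-start computed in UTC.
// Using local-time getters (getFullYear/getMonth) here instead would shift the
// boundary by the server's timezone offset.
function monthToDateWindowUtc(now = new Date()) {
  const monthStart = new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), 1));
  return { from: rfc3339(monthStart), to: rfc3339(now) };
}
```

For 2026-06-16T14:30:00.789Z this yields from=2026-06-01T00:00:00Z and to=2026-06-16T14:30:00Z.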
Kata variations to try
- Projected overage. Combine the Step 4 burn rate with the Step 3 headroom to estimate the exact hour an account will cross its cap, and alert a day ahead.
- Per-model cap. Re-run Step 6 with group_by=model_id to cap Opus spend separately from Sonnet on the same account.
- Org-wide ceiling. Drop the account_id filter via /v1/query/json and group by account_id to find every account near its cap in one read.
- Ad-hoc burn check. Hit /v1/query/sql with SELECT meter_id, SUM(quantity) FROM usage_events WHERE account_id='acct_42' GROUP BY meter_id for a one-off slice without wiring a new loop.
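The projected-overage variation can be sketched by dividing headroom by the burn rate and checking whether the crossing falls within the next 24 hours. This assumes the budget resets at the start of the next UTC month, so a crossing that lands after that reset does not alert; `projectCrossing` and its defaults are hypothetical.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Estimate when headroom runs out at the current burn rate (USD per minute),
// and whether to alert: the crossing must be within the horizon and before
// the budget resets at the start of the next UTC month.
function projectCrossing(now, headroomUsd, usdPerMinute, horizonMs = DAY_MS) {
  if (headroomUsd <= 0) return { crossesAt: now, alert: true }; // already over
  if (usdPerMinute <= 0) return { crossesAt: null, alert: false }; // not burning
  const crossesAt = new Date(now.getTime() + (headroomUsd / usdPerMinute) * 60000);
  const periodEnd = new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth() + 1, 1));
  const beforeReset = crossesAt < periodEnd;
  const soon = crossesAt.getTime() - now.getTime() <= horizonMs;
  return { crossesAt, alert: beforeReset && soon };
}
```

With the sample numbers at 14:30 UTC on June 16 (about $90.68 of headroom, about $0.55 per minute), the projected crossing is shortly after 17:14 UTC the same afternoon, well inside the alert window. The same numbers at 23:00 UTC on June 30 project past the July 1 reset, so no alert fires.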
Kata FAQ
Does UsageBox stop usage when an account hits its cap? No. UsageBox meters usage; it does not gate it. It gives you a fast, current total in real time - your application decides whether to warn (soft cap) or stop serving the account (hard cap).
Is the live total actually up to date, or does it lag by an hour? It is current. Completed hours come from the rollup and the open current hour automatically falls back to raw, so the returned sum includes right-now usage without a full scan.
How often can I safely poll? Often. The rollup read is cheap and does not contend with ingestion, so a per-minute loop is fine. Cache the verdict and enforce the cap locally rather than calling the meter on every request.
Can I cap one expensive meter without throttling everything? Yes. Read with group_by=meter_id (or a feature dimension) and apply a separate ceiling per meter, so a vision model can be hard-capped while cheap token traffic keeps flowing.
What you just avoided building
In six steps you got a live, fast, always-current month-to-date total, a headroom calculation, a real-time poll loop, a burn-rate window, soft and hard cap decision logic, and per-meter ceilings - without standing up your own real-time aggregation tier. Built in-house, "fast and current at the same time" is the hard part: you would be running a streaming aggregator alongside a batch rollup and reconciling the two on every read, which is precisely the consistency problem that makes a plain SQL usage table buckle under billing load. The meter holds that invariant so your guardrail can be a 90-second loop.
Keep reading: Kata #1 (meter a usage event to an invoice line), Kata #3 (reconcile a vendor bill against your meter), Kata #4 (per-customer, per-model cost with dimensions), and how to instrument AI usage for visibility.