UsageBox Kata #2: Live Spend Caps and Real-Time Usage

Catch and cap AI spend before the bill lands. A hands-on kata against the real metering API: read an account's month-to-date total fast from rollups, understand why the open current hour falls back to raw so the live number is both fast and current, compute headroom against a budget, run a real-time burn-rate check, and act at the threshold - soft caps that alert and hard caps your app enforces (the meter measures, your app gates). Plus per-meter caps with group_by, production notes, variations (Slack alerts, per-model caps, prepaid-credit countdowns), and FAQ.

7 min read

usagebox kata, spend caps, real-time usage, budget alerts, usage-based billing, AI cost control, metering API, rollups, 2026

This is a hands-on kata, not a think-piece. The goal: catch and cap an account's spend before the bill lands, using UsageBox, in about 30 minutes. By the end you will read a live month-to-date total that is both fast and current, compute headroom against a budget, run a cheap real-time check loop, and act at the threshold - with one honest constraint baked in: UsageBox meters usage, it does not gate it. The cap lives in your app; the meter tells you the truth in real time.

If you have done Kata #1, metering a usage event to an invoice line, you already have idempotent ingest and cheap totals. This kata builds the spend guardrail on top of that same store.

Step 1: read the month-to-date total fast

The number you guard against is the account's spend since month-start. Ask for it from the first of the month to right now, grouped by meter so you can see what is driving it. The default source=rollup keeps it cheap:

curl "https://api.usagebox.com/v1/accounts/acct_42/usage?from=2026-06-01T00:00:00Z&to=2026-06-16T14:30:00Z&source=rollup&group_by=meter_id" \
  -H "Authorization: Bearer $USAGEBOX_KEY"
{
  "source": "rollup",
  "groups": [
    { "meter_id": "claude_tokens", "sum": 41280400, "count": 2310 },
    { "meter_id": "tool_calls", "sum": 18900, "count": 18900 }
  ]
}

Completed hours are pre-aggregated in the background, so this read does not scan millions of raw rows and does not contend with live ingestion. That matters here more than in Kata #1: you are going to call this on an interval, not once at billing time.

Step 2: the open-hour question

A spend cap is only useful if "month-to-date" includes the last few minutes of usage. The trap with most rollup systems is that the current hour is not aggregated yet, so a fast read silently lags reality by up to an hour - exactly when a runaway account does its damage.

UsageBox closes that gap automatically. Completed hours come from the rollup; the open (current) hour falls back to raw under the hood, so the returned sum already includes right-now usage. You do not pass a special flag and you do not pay for a full scan - only the open hour is read raw. The total in Step 1 is both fast and current. If you want to prove it to yourself, force a full raw scan and confirm the numbers agree:

curl "https://api.usagebox.com/v1/accounts/acct_42/usage?from=2026-06-01T00:00:00Z&to=2026-06-16T14:30:00Z&source=raw&group_by=meter_id" \
  -H "Authorization: Bearer $USAGEBOX_KEY"
{ "source": "raw", "groups": [{ "meter_id": "claude_tokens", "sum": 41280400, "count": 2310 }, { "meter_id": "tool_calls", "sum": 18900, "count": 18900 }] }

Same sums, different source. Use rollup for the loop; keep raw in your pocket for verification.
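
If you would rather script that comparison than eyeball it, a minimal sketch follows. The helper name diffGroups is hypothetical and not part of UsageBox; it only assumes the groups shape shown in the two responses above, and that both reads used the same from and to so the windows are identical.

```javascript
// Hypothetical helper: compares two `groups` arrays shaped like the
// responses above and lists every meter whose sum or count differs.
function diffGroups(rollupGroups, rawGroups) {
  const byMeter = (groups) => new Map(groups.map((g) => [g.meter_id, g]));
  const rollup = byMeter(rollupGroups);
  const raw = byMeter(rawGroups);
  const mismatches = [];
  for (const meterId of new Set([...rollup.keys(), ...raw.keys()])) {
    const a = rollup.get(meterId);
    const b = raw.get(meterId);
    // A meter missing from one side counts as a mismatch too.
    if (!a || !b || a.sum !== b.sum || a.count !== b.count) {
      mismatches.push({ meter_id: meterId, rollup: a || null, raw: b || null });
    }
  }
  return mismatches; // an empty array means the two sources agree
}

// Sample data copied from the rollup and raw responses above.
const rollupGroups = [
  { meter_id: "claude_tokens", sum: 41280400, count: 2310 },
  { meter_id: "tool_calls", sum: 18900, count: 18900 },
];
const rawGroups = [
  { meter_id: "claude_tokens", sum: 41280400, count: 2310 },
  { meter_id: "tool_calls", sum: 18900, count: 18900 },
];
console.log(diffGroups(rollupGroups, rawGroups).length); // 0
```

A non-empty result is the signal to investigate before trusting the loop; run this occasionally, not on every tick.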

Step 3: set a budget and compute headroom

The cap itself lives in your system - a per-account budget you store next to the customer record. Convert the metered quantities into spend with your own price book, then measure how close you are. The threshold logic is plain client code:

const budgetUsd = 500.00;            // this account's monthly cap
const price = { claude_tokens: 0.000009, tool_calls: 0.002 };

const spend = groups.reduce((acc, g) => acc + g.sum * (price[g.meter_id] || 0), 0);
const pct = spend / budgetUsd;       // fraction of budget consumed
const headroom = budgetUsd - spend;  // dollars left this month

if (pct >= 1.00) {
  // hard cap territory - stop serving this account
} else if (pct >= 0.80) {
  // soft cap territory - warn
}

UsageBox gives you the quantities; the budget, the price book, and the 80 percent line are yours. Keep them in your app so a customer-specific exception is a config change, not an API call.
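
To make the arithmetic concrete, here is a sketch that runs the Step 1 sample response through the price book above. The function name computeHeadroom is hypothetical. One deliberate difference from the snippet above, and an assumption about what you would want: it reports meters that are missing from the price book instead of silently pricing them at zero, since an unpriced meter makes a spend guard under-count.

```javascript
// Hypothetical helper wrapping the Step 3 math.
// Meters absent from the price book are collected in `unpriced`
// rather than being counted as free.
function computeHeadroom(groups, price, budgetUsd) {
  let spend = 0;
  const unpriced = [];
  for (const g of groups) {
    if (!Object.hasOwn(price, g.meter_id)) {
      unpriced.push(g.meter_id);
      continue;
    }
    spend += g.sum * price[g.meter_id];
  }
  return { spend, pct: spend / budgetUsd, headroom: budgetUsd - spend, unpriced };
}

// Sample groups from the Step 1 response, price book from Step 3.
const groups = [
  { meter_id: "claude_tokens", sum: 41280400, count: 2310 },
  { meter_id: "tool_calls", sum: 18900, count: 18900 },
];
const price = { claude_tokens: 0.000009, tool_calls: 0.002 };

const result = computeHeadroom(groups, price, 500.0);
console.log(result.spend.toFixed(2));       // 409.32
console.log((result.pct * 100).toFixed(1)); // 81.9
console.log(result.headroom.toFixed(2));    // 90.68
```

With these sample numbers the account is already past the 80 percent line with about $90 left, so it sits in soft cap territory.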

Step 4: the real-time check loop

Run the cheap rollup read on a short interval - every minute or two - and recompute headroom each tick. Because the open hour folds in automatically, each tick reflects the latest usage:

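async function checkSpend(accountId) {
  const now = new Date();
  // Month-start in UTC, so the loop keeps working after the month rolls over
  const from = new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), 1)).toISOString();
  const url = "https://api.usagebox.com/v1/accounts/" + encodeURIComponent(accountId)
    + "/usage?from=" + from + "&to=" + now.toISOString()
    + "&source=rollup&group_by=meter_id";
  const res = await fetch(url, { headers: { Authorization: "Bearer " + process.env.USAGEBOX_KEY } });
  if (!res.ok) throw new Error("usage read failed: HTTP " + res.status);
  const { groups } = await res.json();
  return groups; // feed into the Step 3 math
}
// Log failures instead of leaving a rejected promise unhandled
setInterval(() => checkSpend("acct_42").catch(console.error), 90000);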

If you also want a live burn rate - dollars per minute, to predict when an account will cross the line - query a tight recent window instead of the whole month. Last ten minutes, grouped by meter:

curl "https://api.usagebox.com/v1/accounts/acct_42/usage?from=2026-06-16T14:20:00Z&to=2026-06-16T14:30:00Z&source=rollup&group_by=meter_id" \
  -H "Authorization: Bearer $USAGEBOX_KEY"
{ "source": "rollup", "groups": [{ "meter_id": "claude_tokens", "sum": 612000, "count": 33 }] }

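In this example the ten-minute window (14:20 to 14:30) sits entirely inside the open hour, so it is served from raw and stays current to the second. A window taken just after the top of the hour would span a completed hour and the open one, and the read combines rollup and raw the same way. The response carries quantities, not dollars: convert them with the Step 3 price book, divide the resulting spend by ten for a per-minute burn rate, and project it against the headroom from Step 3.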
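
A sketch of that projection, using the ten-minute sample above and the headroom of roughly $90.68 that the Step 1 sample yields under the Step 3 price book. The function name projectCapCrossing is hypothetical, and the projection assumes the recent burn rate simply continues, which is a linear guess and not a forecast.

```javascript
// Hypothetical helper: turns a recent usage window into a per-minute
// burn rate and a straight-line estimate of minutes until the cap.
function projectCapCrossing(windowGroups, windowMinutes, price, headroomUsd) {
  const windowSpend = windowGroups.reduce(
    (acc, g) => acc + g.sum * (price[g.meter_id] || 0), 0);
  const burnPerMinute = windowSpend / windowMinutes;
  // No recent spend means no projected crossing.
  const minutesToCap = burnPerMinute > 0
    ? Math.max(headroomUsd, 0) / burnPerMinute
    : Infinity;
  return { burnPerMinute, minutesToCap };
}

const windowGroups = [{ meter_id: "claude_tokens", sum: 612000, count: 33 }];
const price = { claude_tokens: 0.000009, tool_calls: 0.002 };

const projection = projectCapCrossing(windowGroups, 10, price, 90.68);
console.log(projection.burnPerMinute.toFixed(4));  // 0.5508
console.log(Math.round(projection.minutesToCap));  // 165
```

At about 55 cents a minute, the sample account would reach its cap in under three hours, which is a far more actionable alert than "82 percent used".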

Step 5: act at the threshold - soft cap vs hard cap

This is the honest part. UsageBox meters usage; it does not block it. There is no endpoint that says "stop this account" - and that is correct, because the meter should never be in the request path of your product. So the cap has two flavors, and both live in your code:

  • Soft cap (warn). At 80 percent, alert and let traffic continue. Post to Slack, email the account owner, raise a flag in your dashboard. Nothing is throttled - you just stop being surprised.
  • Hard cap (stop). At 100 percent, your application stops serving further billable requests for that account until the next period or a manual override. UsageBox told you the number; your gateway enforces the consequence.

if (pct >= 1.00) {
  await denyFurtherRequests(accountId);     // YOUR gateway, not UsageBox
  await notify("#billing", accountId + " hit hard cap at $" + spend.toFixed(2));
} else if (pct >= 0.80) {
  await notify("#billing", accountId + " at " + Math.round(pct * 100) + "% of budget");
}

Keep the gate fast and local (a cached flag your request handler checks), and treat the UsageBox read as the source of truth that flips that flag. Never put a network call to the meter on the hot path of every request - poll it, cache the verdict, enforce locally.
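
One way to picture "poll it, cache the verdict, enforce locally" is a small in-process map that the poll loop writes and the request handler reads. This is a sketch under assumptions: recordVerdict and isBlocked are hypothetical names, the 0.8 and 1.0 thresholds are the ones from Step 3, and a multi-process deployment would need a shared store in place of the in-memory Map.

```javascript
// Hypothetical in-process verdict cache.
// The poll loop calls recordVerdict; the request handler calls isBlocked.
const verdicts = new Map(); // accountId -> { state, checkedAt }

function recordVerdict(accountId, pct, nowMs = Date.now()) {
  const state = pct >= 1.0 ? "blocked" : pct >= 0.8 ? "warned" : "ok";
  verdicts.set(accountId, { state, checkedAt: nowMs });
  return state;
}

// Cheap local lookup with no network call. Accounts that have never
// been checked are allowed through (fail open), a design choice you
// may want to reverse for untrusted traffic.
function isBlocked(accountId) {
  const v = verdicts.get(accountId);
  return v !== undefined && v.state === "blocked";
}

console.log(recordVerdict("acct_42", 0.8186)); // warned
console.log(isBlocked("acct_42"));             // false
console.log(recordVerdict("acct_42", 1.02));   // blocked
console.log(isBlocked("acct_42"));             // true
```

The checkedAt timestamp is there so the handler or a monitor can notice a stale verdict if the poll loop stops running.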

Step 6: per-meter and per-feature caps

A single dollar cap is blunt. Often one expensive meter - a vision model, a long-context call - is the real risk, and you want to throttle just that without starving the account's cheap traffic. Group by meter_id (you already are) and apply a separate ceiling per meter:

curl "https://api.usagebox.com/v1/accounts/acct_42/usage?from=2026-06-01T00:00:00Z&to=2026-06-16T14:30:00Z&source=rollup&group_by=meter_id" \
  -H "Authorization: Bearer $USAGEBOX_KEY"
{ "source": "rollup", "groups": [{ "meter_id": "claude_tokens", "sum": 41280400, "count": 2310 }, { "meter_id": "vision_pages", "sum": 9800, "count": 9800 }] }

Now your decision logic reads each meter against its own budget, so you can hard-cap vision_pages while leaving claude_tokens running. If you instrument a feature dimension on your events, swap group_by=meter_id for group_by=feature and cap a single product feature the same way - no schema change required.
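
As an illustration of per-meter ceilings, the sketch below checks each group in the response above against its own cap. The function name evaluateMeterCaps and the cap values are hypothetical, and the caps here are expressed in metered quantity (tokens, pages), not dollars; multiply by your price book first if you prefer dollar ceilings.

```javascript
// Hypothetical helper: evaluates each meter against its own ceiling.
// Meters without a configured cap are skipped, not blocked.
function evaluateMeterCaps(groups, capsByMeter, warnAt = 0.8) {
  return groups
    .filter((g) => Object.hasOwn(capsByMeter, g.meter_id))
    .map((g) => {
      const pct = g.sum / capsByMeter[g.meter_id];
      const state = pct >= 1 ? "hard" : pct >= warnAt ? "soft" : "ok";
      return { meter_id: g.meter_id, pct, state };
    });
}

// Groups from the response above; cap values are made up for the example.
const groups = [
  { meter_id: "claude_tokens", sum: 41280400, count: 2310 },
  { meter_id: "vision_pages", sum: 9800, count: 9800 },
];
const caps = { claude_tokens: 60000000, vision_pages: 9000 };

for (const v of evaluateMeterCaps(groups, caps)) {
  console.log(v.meter_id + ": " + v.state);
}
// claude_tokens: ok
// vision_pages: hard
```

Your gateway can then deny only the requests that would hit a meter in the "hard" state and keep serving everything else.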

Production notes before you ship it

  • Poll, do not block. The meter read belongs in a background loop, not in your request handler. Cache the verdict and enforce the cap locally so the cap costs you nothing per request.
  • Open hour is free and current. Rollup reads already fold in the open hour, so a one-minute loop on source=rollup is both cheap and live. Reserve source=raw for occasional verification.
  • The cap is yours, the truth is theirs. UsageBox never stops a request. Budgets, thresholds, price book, and the actual gate all live in your application.
  • Per-meter beats per-dollar. Cap the one expensive meter with group_by=meter_id instead of throttling the whole account at the first dollar sign.
  • Time math. Define month-start and "now" in UTC to match the RFC3339 timestamps on your events, or your headroom will drift by a timezone offset.
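
The time-math note is easy to get wrong near a month boundary, so here is a small sketch of a UTC-only window builder. The name monthToDateWindow is hypothetical; note that toISOString emits millisecond precision, which is valid RFC3339.

```javascript
// Hypothetical helper: builds the month-to-date window in UTC,
// regardless of the server's local timezone.
function monthToDateWindow(now = new Date()) {
  const from = new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), 1));
  return { from: from.toISOString(), to: now.toISOString() };
}

const mid = monthToDateWindow(new Date("2026-06-16T14:30:00Z"));
console.log(mid.from); // 2026-06-01T00:00:00.000Z
console.log(mid.to);   // 2026-06-16T14:30:00.000Z

// 01:00 on July 1 at UTC+2 is still June 30 in UTC,
// so the window must start at June 1, not July 1.
const edge = monthToDateWindow(new Date("2026-07-01T01:00:00+02:00"));
console.log(edge.from); // 2026-06-01T00:00:00.000Z
```

Using the local-time getters (getFullYear, getMonth) in place of the UTC ones would start the second window a month late on a server running at UTC+2.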

Kata variations to try

  • Projected overage. Combine the Step 4 burn rate with the Step 3 headroom to estimate the hour an account will cross its cap, and alert a day ahead when the projection allows it.
  • Per-model cap. Re-run Step 6 with group_by=model_id to cap Opus spend separately from Sonnet on the same account.
  • Org-wide ceiling. Drop the account_id filter via /v1/query/json and group by account_id to find every account near its cap in one read.
  • Ad-hoc burn check. Hit /v1/query/sql with SELECT meter_id, SUM(quantity) FROM usage_events WHERE account_id='acct_42' GROUP BY meter_id for a one-off slice without wiring a new loop.

Kata FAQ

Does UsageBox stop usage when an account hits its cap? No. UsageBox meters usage; it does not gate it. It gives you a fast, current total in real time - your application decides whether to warn (soft cap) or stop serving the account (hard cap).

Is the live total actually up to date, or does it lag by an hour? It is current. Completed hours come from the rollup and the open current hour auto-falls-back to raw, so the returned sum includes right-now usage without a full scan.

How often can I safely poll? Often. The rollup read is cheap and does not contend with ingestion, so a per-minute loop is fine. Cache the verdict and enforce the cap locally rather than calling the meter on every request.

Can I cap one expensive meter without throttling everything? Yes. Read with group_by=meter_id (or a feature dimension) and apply a separate ceiling per meter, so a vision model can be hard-capped while cheap token traffic keeps flowing.

What you just avoided building

In six steps you got a live, fast, always-current month-to-date total, a headroom calculation, a real-time poll loop, a burn-rate window, soft and hard cap decision logic, and per-meter ceilings - without standing up your own real-time aggregation tier. Built in-house, "fast and current at the same time" is the hard part: you would be running a streaming aggregator alongside a batch rollup and reconciling the two on every read, which is precisely the consistency problem that makes a plain SQL usage table buckle under billing load. The meter holds that invariant so your guardrail can be a 90-second loop.

Keep reading: Kata #1, meter a usage event to an invoice line; Kata #3, reconcile a vendor bill against your meter; Kata #4, per-customer per-model cost with dimensions; and how to instrument AI usage for visibility.
