Built for AI products

Track AI usage.
Bill it accurately.
Without rebuilding your stack.

UsageBox meters tokens, GPU minutes, agent runtime, and tool calls per customer. Open-source storage engine. Connects to Stripe or your own invoicing.

Idempotent ingestion
Immutable audit trail
Open-source storage engine
POST /v1/events
curl -X POST https://api.usagebox.com/v1/events \
  -H "Authorization: Bearer $UBX_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "event_id": "req_8x42jk",
    "account_id": "acme-co",
    "meter": "llm_tokens_in",
    "model": "claude-4.5-sonnet",
    "quantity": 12450,
    "timestamp": "2026-05-16T18:42:11Z"
  }'
 
# Idempotent. Retries are safe.
# Rolls up hourly. Invoiced via Stripe.
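
The same call can be sketched from application code. This is a minimal Python example that only builds the request; the `build_event_request` helper and the `Content-Type` header are assumptions for illustration, not part of a published UsageBox SDK. The endpoint and fields are the ones from the curl example.

```python
import json
import urllib.request

API_URL = "https://api.usagebox.com/v1/events"  # endpoint from the curl example


def build_event_request(api_key: str, event: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request for one usage event."""
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: the API expects a JSON content type.
            "Content-Type": "application/json",
        },
    )


event = {
    "event_id": "req_8x42jk",  # reuse the same id on every retry of this request
    "account_id": "acme-co",
    "meter": "llm_tokens_in",
    "model": "claude-4.5-sonnet",
    "quantity": 12450,
    "timestamp": "2026-05-16T18:42:11Z",
}
request = build_event_request("test-key", event)
# To send it: urllib.request.urlopen(request, timeout=10)
```

Because the event id is fixed before the first attempt, a retry after a timeout resends the same id instead of minting a new one.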

Why generic billing breaks on AI workloads

Stripe Billing, Chargebee, Recurly: built for SaaS subscriptions in the 2010s. AI usage is a different shape of problem.

Volume breaks generic billing

Stripe's metered usage assumes thousands of events per customer per month. AI products send millions per day. Generic tools throttle, fail silently, or charge you per event.

Attribution needs a graph

An agent run is N tool calls + M LLM calls + K memory ops, each priced differently. Generic billing tools can't roll those into a single billable unit without engineering work you keep redoing.

Attacks show up in the bill

A user can manipulate prompts to trigger expensive generations. Without per-user spend ceilings and real-time anomaly detection, the first sign of an attack is your AWS or OpenAI invoice next month.

Built for the way AI products actually meter

Six primitives. Each one designed for the AI billing patterns generic tools fight you on.

Token Metering

Meter LLM input + output tokens per request, per agent, per tenant. Idempotent ingestion handles retries without double-billing.
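
Idempotent ingestion can be sketched as "count each event id once." The in-memory `Meter` class below is a hypothetical illustration of that rule, not the usagedb implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Meter:
    seen: set = field(default_factory=set)       # event ids already counted
    totals: dict = field(default_factory=dict)   # (account, meter) -> quantity

    def ingest(self, event: dict) -> bool:
        """Record an event once; return False for a duplicate event_id."""
        if event["event_id"] in self.seen:
            return False
        self.seen.add(event["event_id"])
        key = (event["account_id"], event["meter"])
        self.totals[key] = self.totals.get(key, 0) + event["quantity"]
        return True


meter = Meter()
event = {
    "event_id": "req_8x42jk",
    "account_id": "acme-co",
    "meter": "llm_tokens_in",
    "quantity": 12450,
}
first = meter.ingest(event)   # counted
retry = meter.ingest(event)   # same event_id, ignored
```

The retry leaves the total untouched, which is what makes client-side retries safe to bill.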

GPU Minute Pricing

Bill inference time, fine-tuning runtime, or per-job GPU usage. Catalog-driven pricing rules; no code changes to update rates.
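
A catalog-driven rule keeps rates in data rather than code. The sketch below assumes a plain dictionary as the catalog; the meter names and rates are made up for illustration.

```python
from decimal import Decimal

# Hypothetical catalog: dollars per GPU minute, keyed by meter name.
CATALOG = {
    "gpu_minutes_a100": Decimal("0.05"),
    "gpu_minutes_h100": Decimal("0.12"),
}


def price(meter: str, quantity: Decimal, catalog: dict) -> Decimal:
    """Look up the rate for a meter and return the cost rounded to cents."""
    return (catalog[meter] * quantity).quantize(Decimal("0.01"))


cost = price("gpu_minutes_h100", Decimal("90"), CATALOG)

# A rate change is a catalog edit; the pricing function stays the same.
new_catalog = {**CATALOG, "gpu_minutes_h100": Decimal("0.10")}
new_cost = price("gpu_minutes_h100", Decimal("90"), new_catalog)
```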

Per-Agent Cost Attribution

Track cost for each agent run across N tool calls + M LLM calls + memory ops. Roll up to tenant invoices automatically.
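
One way to picture the rollup: price every step of a run, sum to a run cost, then sum runs per tenant. The unit prices and field names below are assumptions for illustration only.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical unit prices in dollars.
UNIT_PRICE = {
    "llm_tokens": Decimal("0.000003"),
    "tool_call": Decimal("0.002"),
    "memory_op": Decimal("0.0001"),
}


def run_cost(steps: list) -> Decimal:
    """Sum the cost of every step in one agent run."""
    return sum((UNIT_PRICE[s["kind"]] * s["quantity"] for s in steps), Decimal("0"))


def tenant_totals(runs: list) -> dict:
    """Roll run costs up to one total per tenant."""
    totals = defaultdict(Decimal)
    for run in runs:
        totals[run["tenant"]] += run_cost(run["steps"])
    return dict(totals)


runs = [
    {"tenant": "acme-co", "run_id": "run_1", "steps": [
        {"kind": "llm_tokens", "quantity": 10000},
        {"kind": "tool_call", "quantity": 3},
        {"kind": "memory_op", "quantity": 20},
    ]},
    {"tenant": "acme-co", "run_id": "run_2", "steps": [
        {"kind": "tool_call", "quantity": 5},
    ]},
]
totals = tenant_totals(runs)
```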

Real-Time Anomaly Detection

Cost-amplification attacks land in your usage data first. Per-user spend ceilings, alerting on token surges, kill-switches.
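
A per-user spend ceiling with a kill-switch can be sketched in a few lines. `SpendGuard` is a hypothetical name, and the single fixed ceiling is a simplifying assumption; real policies would likely be windowed and per plan.

```python
from decimal import Decimal


class SpendGuard:
    def __init__(self, ceiling: Decimal):
        self.ceiling = ceiling
        self.spent = {}        # user_id -> dollars spent so far
        self.blocked = set()   # users whose kill-switch has tripped

    def record(self, user_id: str, cost: Decimal) -> bool:
        """Add a cost; return True while the user may keep sending requests."""
        if user_id in self.blocked:
            return False  # already cut off, nothing more is recorded
        self.spent[user_id] = self.spent.get(user_id, Decimal("0")) + cost
        if self.spent[user_id] > self.ceiling:
            self.blocked.add(user_id)  # trip the kill-switch
            return False
        return True


guard = SpendGuard(ceiling=Decimal("5.00"))
ok_first = guard.record("user_1", Decimal("3.00"))    # under the ceiling
ok_second = guard.record("user_1", Decimal("2.50"))   # crosses the ceiling
ok_third = guard.record("user_1", Decimal("0.01"))    # blocked
ok_other = guard.record("user_2", Decimal("1.00"))    # unaffected user
```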

Hourly Rollups

Raw events feed hourly aggregates. Invoice generation reads those aggregates, so its cost per account does not grow with raw event volume; it is not a scan over millions of rows.
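
The rollup idea can be sketched as truncating each timestamp to its hour and summing quantities per bucket. This is an in-memory illustration with assumed function names; a real engine would index rollups by account rather than filter a dictionary.

```python
from collections import defaultdict
from datetime import datetime


def hour_bucket(timestamp: str) -> str:
    """Truncate an ISO 8601 UTC timestamp to the start of its hour."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.strftime("%Y-%m-%dT%H:00:00Z")


def rollup(events: list) -> dict:
    """Aggregate raw events into (account, meter, hour) -> quantity."""
    buckets = defaultdict(int)
    for e in events:
        key = (e["account_id"], e["meter"], hour_bucket(e["timestamp"]))
        buckets[key] += e["quantity"]
    return dict(buckets)


def invoice_quantity(rollups: dict, account_id: str, meter: str) -> int:
    """Sum hourly buckets for one account and meter; raw events are not read."""
    return sum(q for (a, m, _), q in rollups.items() if a == account_id and m == meter)


events = [
    {"account_id": "acme-co", "meter": "llm_tokens_in", "quantity": 12450,
     "timestamp": "2026-05-16T18:42:11Z"},
    {"account_id": "acme-co", "meter": "llm_tokens_in", "quantity": 550,
     "timestamp": "2026-05-16T18:59:59Z"},
    {"account_id": "acme-co", "meter": "llm_tokens_in", "quantity": 1000,
     "timestamp": "2026-05-16T19:00:00Z"},
]
rollups = rollup(events)
billable = invoice_quantity(rollups, "acme-co", "llm_tokens_in")
```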

Stripe + Manual Invoicing

Plug into Stripe for self-serve. Or generate finance-ready invoices for enterprise contracts. Same metering pipeline.

The storage engine is open source

usagedb is the Rust storage engine UsageBox runs on. It is append-only and idempotent, keeps an immutable audit trail of raw events, and maintains hourly rollups for invoice queries. Apache 2.0 on GitHub.

Read the code that produces every invoice line. Fork it. Self-host the ingestion layer while still using UsageBox for the platform side. The right answer to “is your billing math correct” is “read the code yourself.”

pbudzik/usagedb

Or read the architecture overview in our usagedb article, then go deep with the 10-part engine internals series: ingest, dedupe, columnar segments, rollups, the query engine, and how it is tested.

Notes on AI billing

Practical writing on metering patterns, AI cost attribution, and what we learn from production billing systems.

The LLM Gateway Is Your Cheapest Cost Lever: Token Quotas, Per-Key Budgets, and Where Metering Lives (2026)

The cheapest place to control AI cost is not your application code - it is the LLM gateway every model call already passes through. One proxy gives you four levers in a single chokepoint: per-key and per-team token quotas, hard spend budgets, model routing to cheaper models, and response caching - plus a metering point that sees every call, including the retries and SDK-internal requests app-level tracking misses. This breaks down what a gateway buys you, the build-vs-buy options (LiteLLM, Portkey, Helicone, OpenRouter, Cloudflare AI Gateway), why per-key virtual keys are the lever most teams skip, and why the gateway should be your enforcement point while a real meter behind it is the accounting point - because a request log is not a billing ledger.

Read →

Self-Hosting Open-Weight Models vs the API Bill: Where the Cost Actually Crosses Over (2026)

"You don't need Opus" is the loudest cost take of 2026 - open-weight models handle most production work at a fraction of frontier price. But "so just self-host and stop paying the API" hides a break-even most teams get wrong: self-hosting swaps a per-token bill for a per-hour GPU bill, and a per-hour bill is only cheap if the GPU stays busy. The crossover is a utilization problem - effective dollars-per-million-tokens equals GPU hourly cost divided by tokens served per hour - so the same hardware is cheaper or far more expensive than an API depending only on how saturated you keep it. This lays out the math, the three honest options (self-host, hosted open-model inference, frontier API), the hidden costs of self-hosting (idle time, ops, cold starts), when self-host genuinely wins, and why you cannot pick a side without measuring cost per task.

Read →
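
The crossover arithmetic from the self-hosting piece above, as a small worked sketch. The $2.00 hourly GPU price and the throughput figures are made-up inputs for illustration, not benchmarks.

```python
def dollars_per_million_tokens(gpu_hourly_cost: float, tokens_per_hour: float) -> float:
    """Effective price: hourly GPU cost divided by millions of tokens served per hour."""
    return gpu_hourly_cost * 1_000_000 / tokens_per_hour


GPU_HOURLY_COST = 2.00  # hypothetical dollars per GPU hour

# Same hardware, two utilization levels (hypothetical throughput).
saturated = dollars_per_million_tokens(GPU_HOURLY_COST, tokens_per_hour=4_000_000)
mostly_idle = dollars_per_million_tokens(GPU_HOURLY_COST, tokens_per_hour=400_000)
```

With these inputs the effective price moves by a factor of ten on identical hardware, which is why the comparison against an API price depends on measured throughput.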

Who Spent the Tokens? Cost Attribution Across Tools, Sub-Agents, and Retries (2026)

A single agent run fans out into tool calls, sub-agents, parallel branches, and silent retries, then returns one opaque token total - and the expensive question (which customer, feature, step, and model burned the spend) cannot be reconstructed from it after the fact. Attribution is a write-time property: every model call has to be tagged with a few dimensions (trace id, customer, feature, step, model) and emitted as a usage event the meter rolls up by any of them. This explains why provider exports and application logs cannot attribute agent cost, the exact dimension set that makes a token traceable, and the idempotency rule (count each event id once) that stops retries and at-least-once collectors from double-counting an agent's own spend. For AI products, attribution is not a report - it is the billing system.

Read →
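
Write-time attribution from the piece above can be sketched as tagged events that roll up by any dimension, counting each event id once. The `rollup_by` helper, the `tokens` field, and the sample values are assumptions for illustration.

```python
from collections import defaultdict


def rollup_by(events: list, dimension: str) -> dict:
    """Sum tokens by one dimension, counting each event_id only once."""
    seen = set()
    totals = defaultdict(int)
    for e in events:
        if e["event_id"] in seen:
            continue  # retry or at-least-once redelivery
        seen.add(e["event_id"])
        totals[e[dimension]] += e["tokens"]
    return dict(totals)


events = [
    {"event_id": "e1", "trace_id": "t1", "customer": "acme-co", "feature": "search",
     "step": "plan", "model": "model-a", "tokens": 1200},
    {"event_id": "e2", "trace_id": "t1", "customer": "acme-co", "feature": "search",
     "step": "tool:web", "model": "model-b", "tokens": 300},
    # Redelivered copy of e2; it must not be counted again.
    {"event_id": "e2", "trace_id": "t1", "customer": "acme-co", "feature": "search",
     "step": "tool:web", "model": "model-b", "tokens": 300},
    {"event_id": "e3", "trace_id": "t2", "customer": "globex", "feature": "summarize",
     "step": "draft", "model": "model-a", "tokens": 800},
]
by_customer = rollup_by(events, "customer")
by_model = rollup_by(events, "model")
by_trace = rollup_by(events, "trace_id")
```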

Start metering AI usage in 5 minutes

Free tier. No credit card. Open-source storage engine.

Get started free →