Built for AI products

Track AI usage.
Bill it accurately.
Without rebuilding your stack.

UsageBox meters tokens, GPU minutes, agent runtime, and tool calls per customer. Open-source storage engine. Connects to Stripe or your own invoicing.

Idempotent
Ingestion
Immutable
Audit trail
Open source
Storage engine
POST /v1/events
curl -X POST https://api.usagebox.com/v1/events \
-H "Authorization: Bearer $UBX_KEY" \
-d '{
"event_id": "req_8x42jk",
"account_id": "acme-co",
"meter": "llm_tokens_in",
"model": "claude-4.5-sonnet",
"quantity": 12450,
"timestamp": "2026-05-16T18:42:11Z"
}'
 
# Idempotent. Retries are safe.
# Rolls up hourly. Invoiced via Stripe.

Why generic billing breaks on AI workloads

Stripe Billing, Chargebee, Recurly: built for SaaS subscriptions in the 2010s. AI usage is a different shape of problem.

Volume breaks generic billing

Stripe's metered usage assumes thousands of events per customer per month. AI products send millions per day. Generic tools throttle, fail silently, or charge you per-event.

Attribution needs a graph

An agent run is N tool calls + M LLM calls + K memory ops. Each costs different amounts. Generic billing tools can't roll those into a single billable unit without engineering work you keep redoing.

Attacks show up in the bill

A user can manipulate prompts to trigger expensive generations. Without per-user spend ceilings and real-time anomaly detection, the first sign of an attack is your AWS or OpenAI invoice next month.

Built for the way AI products actually meter

Six primitives. Each one designed for the AI billing patterns generic tools fight you on.

Token Metering

Meter LLM input + output tokens per request, per agent, per tenant. Idempotent ingestion handles retries without double-billing.

GPU Minute Pricing

Bill inference time, fine-tuning runtime, or per-job GPU usage. Catalog-driven pricing rules; no code changes to update rates.

Per-Agent Cost Attribution

Track cost for each agent run across N tool calls + M LLM calls + memory ops. Roll up to tenant invoices automatically.

Real-Time Anomaly Detection

Cost-amplification attacks land in your usage data first. Per-user spend ceilings, alerting on token surges, kill-switches.

Hourly Rollups

Raw events feed hourly aggregates. Invoice generation is O(1) per account, not a scan over millions of rows.

Stripe + Manual Invoicing

Plug into Stripe for self-serve. Or generate finance-ready invoices for enterprise contracts. Same metering pipeline.

The storage engine is open source

usagedb is the Rust storage engine UsageBox runs on. Append-only, idempotent, immutable raw event audit trail, hourly rollups for invoice queries. Apache 2.0 on GitHub.

Read the code that produces every invoice line. Fork it. Self-host the ingestion layer while still using UsageBox for the platform side. The right answer to “is your billing math correct” is “read the code yourself.”

pbudzik/usagedb

Or read the architecture overview in our usagedb article, then go deep with the 10-part engine internals series: ingest, dedupe, columnar segments, rollups, the query engine, and how it is tested.

Notes on AI billing

Practical writing on metering patterns, AI cost attribution, and what we learn from production billing systems.

Prompt Caching Is Quietly Breaking Your AI Cost Tracking (Cache Reads vs Writes, and the Numbers That Lie)

Prompt caching is the best per-call cost lever in 2026 - up to 90% off repeated context, stackable with batch discounts to ~25% of standard rates - but it quietly breaks cost tracking. A cached request still reports the full input-token count, so any tracker that multiplies total input tokens by the standard rate overstates spend on cache-heavy workloads (up to ~10x) and hides whether caching is working at all. The bug is real and current: the LiteLLM team logged "Anthropic cost tracking inaccurate for cached usage" (LIT-3771) in its June stability sprint, with an enterprise customer confirming it in production. The fix is an accounting rule, not a discount: meter cache writes, cache reads, and uncached input as three separately-priced events, and your dashboard goes from lying to load-bearing - surfacing both true cost and cache hit ratio.

Read →

Per-Seat Pricing Can't Survive Agentic Users: The SaaS Margin Math That Breaks in One Loop

If you sell software at a flat per-seat price and your product calls an LLM that bills per token, your margin is a bet that no seat ever runs an agent - and that bet is now losing in public. An agentic task consumes roughly 1,000x more tokens than a one-shot chat, so a single power user can burn more cost-to-serve in a week than their annual seat price. Per-seat pricing assumes flat cost-to-serve; agentic usage turns that into a power law, and a flat price cannot straddle a power law. Raising the seat price overcharges the light-usage majority while still failing to cap the heavy tail. The escape is to meter consumption per account first, then pick a model that survives the curve - usage-based, hybrid seat-plus-overage, or prepaid credits - and gate runaway accounts with hard spend caps. Meter first, price second.

Read →

The Token Count Isn't the Bill: Why Tokenizer Differences Break Your LLM Cost Comparisons

The price-per-million-token number on a pricing page is not comparable across providers, because the token is not a standard unit. OpenAI tokenizes with tiktoken; Anthropic and Google use proprietary schemes, and the same prompt yields a different token count on each. So a model with a lower sticker rate can produce a higher bill for the identical text if its tokenizer splits that text into more tokens - Claude Fable 5 carried a ~35% tokenizer tax versus a naive token-for-token comparison. Code, JSON, and non-English text tokenize differently enough to flip a "cheaper" pick. The only honest comparison is $/task, not $/token: run a representative real task through each model, read the token counts each API actually reports, multiply by real rates (including long-context and cache pricing), and rank by cost-per-task at your quality bar.

Read →

Start metering AI usage in 5 minutes

Free tier. No credit card. Open-source storage engine.

Get started free →