Built for AI products

Track AI usage.
Bill it accurately.
Without rebuilding your stack.

UsageBox meters tokens, GPU minutes, agent runtime, and tool calls per customer. Open-source storage engine. Connects to Stripe or your own invoicing.

Idempotent ingestion
Immutable audit trail
Open-source storage engine
POST /v1/events
curl -X POST https://api.usagebox.com/v1/events \
  -H "Authorization: Bearer $UBX_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "event_id": "req_8x42jk",
    "account_id": "acme-co",
    "meter": "llm_tokens_in",
    "model": "claude-4.5-sonnet",
    "quantity": 12450,
    "timestamp": "2026-05-16T18:42:11Z"
  }'

# Idempotent. Retries are safe.
# Rolls up hourly. Invoiced via Stripe.

Why generic billing breaks on AI workloads

Stripe Billing, Chargebee, Recurly: built for SaaS subscriptions in the 2010s. AI usage is a different shape of problem.

Volume breaks generic billing

Stripe's metered usage assumes thousands of events per customer per month. AI products send millions per day. Generic tools throttle, fail silently, or charge you per event.

Attribution needs a graph

An agent run is N tool calls + M LLM calls + K memory ops. Each has a different cost. Generic billing tools can't roll those into a single billable unit without engineering work you keep redoing.

Attacks show up in the bill

A user can manipulate prompts to trigger expensive generations. Without per-user spend ceilings and real-time anomaly detection, the first sign of an attack is your AWS or OpenAI invoice next month.

Built for the way AI products actually meter

Six primitives. Each one designed for the AI billing patterns generic tools fight you on.

Token Metering

Meter LLM input + output tokens per request, per agent, per tenant. Idempotent ingestion handles retries without double-billing.
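One way to picture idempotent ingestion is as deduplication on event_id. The sketch below is a toy in-memory model, not the UsageBox implementation; the EventStore class and its ingest method are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class EventStore:
    """Toy in-memory store. Hypothetical; not the UsageBox API."""
    seen_ids: set = field(default_factory=set)
    totals: dict = field(default_factory=dict)  # (account_id, meter) -> quantity

    def ingest(self, event: dict) -> bool:
        """Count the event once. Returns False if event_id was already seen."""
        if event["event_id"] in self.seen_ids:
            return False  # a retry of an event that was already counted
        self.seen_ids.add(event["event_id"])
        key = (event["account_id"], event["meter"])
        self.totals[key] = self.totals.get(key, 0) + event["quantity"]
        return True


store = EventStore()
event = {
    "event_id": "req_8x42jk",
    "account_id": "acme-co",
    "meter": "llm_tokens_in",
    "quantity": 12450,
}
first = store.ingest(event)
retry = store.ingest(event)  # same event_id, e.g. resent after a client timeout
```

The retry is acknowledged but not counted, so the billed total stays at 12450 tokens.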

GPU Minute Pricing

Bill inference time, fine-tuning runtime, or per-job GPU usage. Catalog-driven pricing rules; no code changes to update rates.
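As a rough illustration of catalog-driven pricing, rates can live in data that the pricing function looks up at billing time. The catalog layout, meter names, and rates below are assumptions made up for this sketch, not UsageBox's actual catalog format.

```python
from decimal import Decimal

# Hypothetical catalog: meter name -> USD price per unit. Rates are invented.
CATALOG = {
    "gpu_minutes": Decimal("0.04"),
    "llm_tokens_in": Decimal("0.000003"),
}


def price(meter: str, quantity: int, catalog: dict = CATALOG) -> Decimal:
    """Look the rate up in the catalog and multiply by the metered quantity."""
    return catalog[meter] * quantity


gpu_charge = price("gpu_minutes", 90)

# A rate change is a data edit: pass an updated catalog, leave the code alone.
new_catalog = {**CATALOG, "gpu_minutes": Decimal("0.05")}
new_gpu_charge = price("gpu_minutes", 90, new_catalog)
```

Decimal is used instead of float so that small per-unit rates do not pick up binary rounding error.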

Per-Agent Cost Attribution

Track cost for each agent run across N tool calls + M LLM calls + K memory ops. Roll up to tenant invoices automatically.
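A minimal sketch of that rollup, assuming every cost record carries a run ID and a tenant. The record fields and dollar amounts are hypothetical and chosen only to show the shape of the aggregation.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical cost records emitted during agent runs; amounts are illustrative.
records = [
    {"tenant": "acme-co", "run_id": "run_1", "kind": "llm_call", "cost": Decimal("0.012")},
    {"tenant": "acme-co", "run_id": "run_1", "kind": "tool_call", "cost": Decimal("0.002")},
    {"tenant": "acme-co", "run_id": "run_1", "kind": "memory_op", "cost": Decimal("0.001")},
    {"tenant": "acme-co", "run_id": "run_2", "kind": "llm_call", "cost": Decimal("0.030")},
]


def rollup(records):
    """Sum costs per agent run and per tenant in a single pass."""
    per_run = defaultdict(Decimal)     # Decimal() defaults to 0
    per_tenant = defaultdict(Decimal)
    for r in records:
        per_run[r["run_id"]] += r["cost"]
        per_tenant[r["tenant"]] += r["cost"]
    return dict(per_run), dict(per_tenant)


per_run, per_tenant = rollup(records)
```

Here run_1 becomes one billable unit made of three differently priced operations, and both runs land on the same tenant total.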

Real-Time Anomaly Detection

Cost-amplification attacks land in your usage data first. Per-user spend ceilings, alerting on token surges, kill-switches.
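A per-user spend ceiling can be sketched as a check that runs before each charge is accepted. SpendGuard and its allow method are hypothetical names, and the $5.00 ceiling is an arbitrary example value.

```python
from decimal import Decimal


class SpendGuard:
    """Toy per-user spend ceiling. Hypothetical; not the UsageBox API."""

    def __init__(self, ceiling: Decimal):
        self.ceiling = ceiling
        self.spent = {}  # user_id -> running spend

    def allow(self, user_id: str, cost: Decimal) -> bool:
        """Accept the charge only if it keeps the user at or under the ceiling."""
        current = self.spent.get(user_id, Decimal("0"))
        if current + cost > self.ceiling:
            return False  # the kill-switch path: reject instead of billing
        self.spent[user_id] = current + cost
        return True


guard = SpendGuard(ceiling=Decimal("5.00"))
decisions = [guard.allow("user_42", Decimal("2.00")) for _ in range(3)]
```

The third request would push the user to $6.00, so it is rejected and the recorded spend stays at $4.00.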

Hourly Rollups

Raw events feed hourly aggregates. Invoice generation reads a fixed number of hourly rows per account and meter (about 720 for a 30-day month), not a scan over millions of raw events.
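The rollup idea can be sketched by truncating each event timestamp to the hour and summing into that bucket. This is an illustration only, not how usagedb stores data; the function names and the event values other than the first are assumptions.

```python
from collections import defaultdict
from datetime import datetime


def hour_bucket(ts: str) -> str:
    """Truncate an ISO 8601 UTC timestamp to the start of its hour."""
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    return dt.replace(minute=0, second=0, microsecond=0).isoformat()


def roll_up(events):
    """Aggregate raw events into (account, meter, hour) -> quantity."""
    rollups = defaultdict(int)
    for e in events:
        key = (e["account_id"], e["meter"], hour_bucket(e["timestamp"]))
        rollups[key] += e["quantity"]
    return dict(rollups)


events = [
    {"account_id": "acme-co", "meter": "llm_tokens_in", "quantity": 12450,
     "timestamp": "2026-05-16T18:42:11Z"},
    {"account_id": "acme-co", "meter": "llm_tokens_in", "quantity": 550,
     "timestamp": "2026-05-16T18:59:59Z"},
    {"account_id": "acme-co", "meter": "llm_tokens_in", "quantity": 1000,
     "timestamp": "2026-05-16T19:00:00Z"},
]
rollups = roll_up(events)

# An invoice query sums the hourly rows instead of rescanning raw events.
invoice_total = sum(q for (acct, _, _), q in rollups.items() if acct == "acme-co")
```

Three raw events collapse into two hourly rows, and the invoice total is computed from those rows alone.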

Stripe + Manual Invoicing

Plug into Stripe for self-serve. Or generate finance-ready invoices for enterprise contracts. Same metering pipeline.

The storage engine is open source

usagedb is the Rust storage engine UsageBox runs on. Append-only, idempotent, immutable raw event audit trail, hourly rollups for invoice queries. Apache 2.0 on GitHub.

Read the code that produces every invoice line. Fork it. Self-host the ingestion layer while still using UsageBox for the platform side. The right answer to “is your billing math correct” is “read the code yourself.”

pbudzik/usagedb

Or read the architecture overview in our usagedb article, then go deep with the 10-part engine internals series: ingest, dedupe, columnar segments, rollups, the query engine, and how it is tested.

Notes on AI billing

Practical writing on metering patterns, AI cost attribution, and what we learn from production billing systems.

The Tokenpocalypse: AI Coding's Flat-Rate Era Ended in 2026 (and What Survives the Meter)

June 2026 is when AI coding stopped being a flat subscription and became a metered utility. GitHub Copilot flipped every plan to usage-based AI Credits on June 1 and heavy users reported bills jumping 25x or more, from $29 to nearly $750 and from $50 to $3,000. Uber burned a full year of AI budget in four months and capped engineers at $1,500/month; Microsoft dropped Claude Code by June 30. Developers called it a "rug pull." The timeline, three charts (bill-shock, the Uber budget burn, the 2026 usage-based timeline), why the VC subsidy collapsed, and the one capability that actually survives the meter: per-developer, per-model metering with real-time caps.

Read →

Cost Per Task Is the New AI Benchmark: Composer 2.5 and the Workhorse-Model Economics of 2026

The benchmark that decides your AI bill is not score or price per token; it is cost per task. On Artificial Analysis's Coding Agent Index, Cursor Composer 2.5 lands third (index 62) at about $0.07 per task on its standard tier, while the two models above it, Claude Opus 4.7 (66) and GPT-5.5 (65), cost $4.10 and $4.82 per task, roughly 60 to 70 times more for three to four index points. But cost per task is a property of your traffic, not a launch slide: Composer is locked inside one editor with no API, and the cheap tier is not uniformly getting cheaper (Gemini 3.5 Flash shipped at six times the output price of Flash-Lite). Verified pricing table, a cost-per-task bar chart, a capability-vs-cost scatter, the Gemini price-jump chart, and why routing, enforced spend caps, and continuous per-task metering are the only way to control the bill.

Read →

Cheaper Than Gemini Flash-Lite? DeepSeek, GLM, Qwen and Kimi as Agentic Workhorses

On raw capability-per-dollar, several Chinese models beat Gemini 3.1 Flash-Lite (index 34, $0.25/$1.50): DeepSeek V4 Flash is smarter (Artificial Analysis index 47) at ~5x cheaper output ($0.28), with MiniMax M3 and DeepSeek V4 Pro also dominant. But the production deciders for an agentic/support workhorse are not IQ: tool-call serialization reliability, data residency (open-weight self-hosting as the escape hatch), and API stability. Provider-sourced price + capability table, a cost-vs-capability chart, and why you meter tool-call success rate before switching.

Read →

Start metering AI usage in 5 minutes

Free tier. No credit card. Open-source storage engine.

Get started free →