usageDb: Open-Source Rust Database for AI Usage & Billing

usageDb is the open-source Rust storage engine behind UsageBox: append-only, immutable, idempotent, with hourly rollups built in. Apache-licensed on GitHub.

usageDb is the open-source storage engine that grew out of running UsageBox in production. It's an embedded, append-only usage database written in Rust, purpose-built for the kind of workload that breaks general-purpose databases the moment your AI product starts metering real customers. The source is on GitHub at github.com/pbudzik/usagedb, Apache-licensed, ready to fork.

Want the full tour of how it works inside? This article is the overview. The usageDb internals series is a 10-part deep dive, with links to the actual source, covering ingest and durability, idempotent dedupe, the columnar segment format, the manifest and crash recovery, hourly rollups, the query engine, compaction, the period lifecycle, and how it is all tested. The full list is at the bottom of this page.

Why a Purpose-Built Database for Usage Tracking

If you've ever tried to put billable usage events into Postgres or DynamoDB and watched the table grow into something nobody wants to operate, you already know the problem. AI usage data has a shape that's awkward for general-purpose engines:

  • Write-heavy. Writes outnumber reads by an order of magnitude: a single chatty agent can emit dozens of events per second, per customer.
  • Append-only. You never update a usage event. You correct it with a new event that points at the old one.
  • Read patterns are narrow. 95% of reads are "what does account X owe for this month", answered fast with rollups, not raw scans.
  • Audit requirements are absolute. Every invoice line must be reconcilable to the raw events that produced it, years later, even after the schema changed twice.

Postgres can do it. ClickHouse can do it. Both make you build the billing invariants on top, by hand, every time. usageDb bakes those invariants in.

What's Built In

Idempotency, Without the Lecture

Every event carries a stable event_id. If the same event arrives twice (retries, double-fires from an upstream client, reprocessing a Kafka offset), the second copy is dropped silently. If a duplicate event_id arrives with a different payload, that's a conflict and you get told about it instead of having two truths coexisting in your billing data.
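The three outcomes above (accept, drop, conflict) can be sketched in a few lines. This is a minimal illustration, not usageDb's actual code: the names DedupeIndex, IngestOutcome, and offer are hypothetical, and the fingerprint uses the standard library hasher to stay dependency-free, whereas the internals series describes blake3 payload hashing.

```rust
use std::collections::hash_map::{DefaultHasher, Entry};
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Outcome of offering one event to the dedupe index (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum IngestOutcome {
    /// First time this event_id was seen; the event is recorded.
    Accepted,
    /// Same event_id, same payload: dropped silently.
    Duplicate,
    /// Same event_id, different payload: surfaced to the caller.
    Conflict,
}

/// Hypothetical in-memory index: event_id -> payload fingerprint.
#[derive(Default)]
pub struct DedupeIndex {
    seen: HashMap<String, u64>,
}

/// Stand-in 64-bit fingerprint. An assumption for this sketch only:
/// a real engine would use a wider cryptographic hash.
fn fingerprint(payload: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    payload.hash(&mut hasher);
    hasher.finish()
}

impl DedupeIndex {
    /// Classify an incoming event against everything seen so far.
    pub fn offer(&mut self, event_id: &str, payload: &[u8]) -> IngestOutcome {
        let fp = fingerprint(payload);
        match self.seen.entry(event_id.to_string()) {
            Entry::Vacant(slot) => {
                slot.insert(fp);
                IngestOutcome::Accepted
            }
            Entry::Occupied(slot) if *slot.get() == fp => IngestOutcome::Duplicate,
            Entry::Occupied(_) => IngestOutcome::Conflict,
        }
    }
}
```

Storing only a fingerprint per event_id keeps the index small, at the price of trusting the hash to tell payloads apart, which is why a 64-bit stand-in would be too narrow for real billing data.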

Immutable Raw Audit Trail

Raw segments are written once and never modified in place. Compaction merges micro-segments into larger ones, but the events are copied unchanged and keep their original timestamps and quantities. Three years from now, when a customer disputes a 2026-Q2 invoice, the events are bit-for-bit what they were the day they landed.
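As a rough illustration of compaction that leaves events untouched, the sketch below merges several small segments into one larger sorted segment without modifying any input. RawEvent and compact are hypothetical names, and the in-memory Vec stands in for an on-disk segment; the real engine's segment format and merge strategy are covered in the internals series.

```rust
/// Hypothetical raw event; derived ordering is (timestamp, event_id, quantity).
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct RawEvent {
    pub ts_unix_secs: u64,
    pub event_id: String,
    pub quantity: i64,
}

/// Merge micro-segments into one larger segment.
/// Inputs are borrowed immutably, so the source segments cannot change;
/// every event is copied as is, then the result is sorted.
pub fn compact(segments: &[Vec<RawEvent>]) -> Vec<RawEvent> {
    let mut merged: Vec<RawEvent> = segments.iter().flatten().cloned().collect();
    merged.sort();
    merged
}
```

The merged segment holds exactly the events of its inputs, only in a different physical layout, which is the property an audit depends on.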

Hourly Rollups for Invoice Queries

Raw events feed a rollup builder that maintains hour-bucket aggregates per account, product, meter, and model. Invoice queries hit the rollups, not the raw segments, so the cost of generating a monthly invoice for an account is bounded by the number of hour buckets in the period (at most 744 per dimension combination in a 31-day month), not by how many events that account produced.
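To make the bucketing concrete, here is a minimal sketch of an hourly rollup keyed by account, product, meter, model, and hour. The struct and field names are assumptions for illustration, not usageDb's schema, and the query does a plain scan over buckets for brevity.

```rust
use std::collections::HashMap;

const SECS_PER_HOUR: u64 = 3_600;

/// Hypothetical raw event shape; field names are illustrative.
#[derive(Debug, Clone)]
pub struct UsageEvent {
    pub account: String,
    pub product: String,
    pub meter: String,
    pub model: String,
    pub ts_unix_secs: u64,
    pub quantity: i64,
}

/// One aggregate per dimension combination per hour.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct RollupKey {
    pub account: String,
    pub product: String,
    pub meter: String,
    pub model: String,
    pub hour_start: u64,
}

#[derive(Default)]
pub struct HourlyRollups {
    buckets: HashMap<RollupKey, i64>,
}

impl HourlyRollups {
    /// Fold one raw event into its hour bucket.
    pub fn apply(&mut self, e: &UsageEvent) {
        let key = RollupKey {
            account: e.account.clone(),
            product: e.product.clone(),
            meter: e.meter.clone(),
            model: e.model.clone(),
            // Truncate the timestamp down to the start of its hour.
            hour_start: e.ts_unix_secs - e.ts_unix_secs % SECS_PER_HOUR,
        };
        *self.buckets.entry(key).or_insert(0) += e.quantity;
    }

    pub fn bucket_count(&self) -> usize {
        self.buckets.len()
    }

    /// Total for one account over [from, to), reading only rollup buckets.
    pub fn account_total(&self, account: &str, from: u64, to: u64) -> i64 {
        self.buckets
            .iter()
            .filter(|(k, _)| k.account == account && k.hour_start >= from && k.hour_start < to)
            .map(|(_, q)| *q)
            .sum()
    }
}
```

However many events land in an hour, they collapse into one number per key, so a monthly total touches buckets rather than events.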

Columnar Encoding Without an OLAP Engine in the Way

Segments use dictionary encoding for strings, delta encoding for timestamps, zigzag encoding for quantities, and Zstd or LZ4 compression at the block level. Per-block metadata enables pruning on common predicates (account, product, time range), so queries skip blocks that cannot contain matching rows.
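For readers who have not met these encodings, the sketch below shows the textbook versions of dictionary, delta, and zigzag encoding. The function names are ours and the code says nothing about usageDb's actual on-disk layout; it only illustrates why the encoded columns compress well.

```rust
use std::collections::HashMap;

/// Zigzag maps signed integers to unsigned so small magnitudes stay small:
/// 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
pub fn zigzag_encode(n: i64) -> u64 {
    ((n << 1) ^ (n >> 63)) as u64
}

pub fn zigzag_decode(z: u64) -> i64 {
    ((z >> 1) as i64) ^ -((z & 1) as i64)
}

/// Delta encoding: the first value is stored as is, every later value
/// as the difference from its predecessor.
pub fn delta_encode(values: &[i64]) -> Vec<i64> {
    let mut prev = 0i64;
    values
        .iter()
        .map(|&v| {
            let delta = v.wrapping_sub(prev);
            prev = v;
            delta
        })
        .collect()
}

pub fn delta_decode(deltas: &[i64]) -> Vec<i64> {
    let mut acc = 0i64;
    deltas
        .iter()
        .map(|&d| {
            acc = acc.wrapping_add(d);
            acc
        })
        .collect()
}

/// Dictionary encoding: each distinct string is stored once, and the
/// column itself becomes a list of small integer codes.
pub fn dictionary_encode(values: &[&str]) -> (Vec<String>, Vec<u32>) {
    let mut dict: Vec<String> = Vec::new();
    let mut index: HashMap<&str, u32> = HashMap::new();
    let mut codes = Vec::with_capacity(values.len());
    for &v in values {
        let code = *index.entry(v).or_insert_with(|| {
            dict.push(v.to_string());
            (dict.len() - 1) as u32
        });
        codes.push(code);
    }
    (dict, codes)
}
```

Timestamps that arrive close together turn into a column of tiny deltas, and repeated account or model names turn into repeated small codes. Both are the kind of input a block compressor like Zstd or LZ4 handles well.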

Late-Correction Handling

Stripe sends a refund three weeks after the original charge. Your AI provider issues a credit two days after the run that consumed it. usageDb treats these as first-class events: the original raw record stays untouched, a correction event is recorded, and the rollup for the affected hour bucket is rebuilt deterministically.
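One way to picture the correction flow: the log only ever grows, a correction is a new record that points at the original, and the affected hour bucket is recomputed from the log. The sketch below makes several assumptions that the article does not state: corrections carry a signed adjustment, they are attributed to the original event's hour, and the names Record, corrects, and rebuild_bucket are hypothetical.

```rust
use std::collections::HashMap;

const SECS_PER_HOUR: u64 = 3_600;

/// Hypothetical append-only log record.
#[derive(Debug, Clone)]
pub struct Record {
    pub event_id: String,
    pub ts_unix_secs: u64,
    /// Signed quantity; for a correction this is the adjustment (assumption).
    pub quantity: i64,
    /// Set on correction records: the event_id of the original event.
    pub corrects: Option<String>,
}

fn hour_start(ts: u64) -> u64 {
    ts - ts % SECS_PER_HOUR
}

/// Recompute one hour bucket from scratch. The result depends only on the
/// set of records in the log, not on the order they arrived in.
pub fn rebuild_bucket(log: &[Record], bucket_start: u64) -> i64 {
    // Timestamps of original (non-correction) events, by event_id.
    let original_ts: HashMap<&str, u64> = log
        .iter()
        .filter(|r| r.corrects.is_none())
        .map(|r| (r.event_id.as_str(), r.ts_unix_secs))
        .collect();

    log.iter()
        .filter_map(|r| {
            // A correction counts toward the hour of the event it corrects.
            // Corrections pointing at an unknown event are skipped here;
            // a real engine would more likely report them.
            let effective_ts = match &r.corrects {
                None => r.ts_unix_secs,
                Some(target) => *original_ts.get(target.as_str())?,
            };
            (hour_start(effective_ts) == bucket_start).then_some(r.quantity)
        })
        .sum()
}
```

Because the bucket is a pure function of the log, rebuilding it after a late correction gives the same answer on every run, and the original record is never edited.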

What It Isn't

usageDb is a small tool that does one thing. Things it deliberately is not:

  • Not a general-purpose OLAP engine. If you need arbitrary analytics across years of mixed dimensions, ClickHouse and DuckDB will outclass it.
  • Not a time-series database. InfluxDB, Prometheus, and TimescaleDB are still the right tools for monitoring and observability data.
  • Not a document store. No nested JSON, no graph traversal.
  • Not a Postgres replacement. Your application data (customers, plans, invoices, subscriptions) still belongs in a relational database. usageDb sits next to it and owns the usage event firehose.

Comparison Against the Alternatives

usageDb vs Postgres for Usage Tracking

  • Throughput: usageDb sustains millions of events per minute on a single node. Postgres caps out an order of magnitude lower without partitioning gymnastics.
  • Storage cost: Columnar compression typically gives 10x to 20x smaller on-disk footprint vs row-stored Postgres usage tables.
  • Invoice generation: A bounded number of hourly rollup buckets per account (effectively O(1) in event count) vs O(events) range scans.
  • Operational simplicity: Embedded, no separate service to operate.

usageDb vs ClickHouse

  • Footprint: Single Rust binary vs multi-node distributed system.
  • Billing invariants: Idempotency and immutability are first-class in usageDb. In ClickHouse you build them in the application layer.
  • Query power: ClickHouse wins for arbitrary analytical SQL. usageDb intentionally limits to a SQL subset focused on usage and invoice queries.

usageDb vs InfluxDB

  • Use case: InfluxDB is for monitoring time series; usageDb is for billable events. Different retention, different audit requirements.
  • Cardinality: InfluxDB struggles with high-cardinality tags (one per customer); usageDb is designed for it.
  • Corrections: First-class in usageDb, awkward in InfluxDB.

How UsageBox Uses It

UsageBox handles the billing API surface: meters, products, subscriptions, invoice generation, customer dashboards, and the integration layer between your application and Stripe. usageDb sits behind it as the storage layer that handles raw event ingestion and the rollups that invoice generation reads from.

Splitting the engine from the platform was a deliberate choice. It means:

  • The storage engine is auditable by anyone, not just our team.
  • Customers who want to self-host the ingestion side can, while still using UsageBox's invoice and customer-facing UI.
  • The billing-invariant primitives (idempotency, immutability, deterministic rollups) get pressure-tested by people other than us.

Quickstart

Clone and run via Docker Compose:

git clone https://github.com/pbudzik/usagedb
cd usagedb
docker compose up

The HTTP API accepts JSON events on a POST endpoint and serves both a SQL subset and a structured JSON query interface for reads. The README walks through the event schema and the available query patterns.

Status and Roadmap

usageDb is at 0.1.0. It's usable for the workloads above, but the public API may shift before 1.0. Things in flight:

  • Snapshot export for cold-storage offload
  • Multi-tenant isolation primitives
  • Native gRPC ingest endpoint alongside the existing HTTP one
  • Property-based test coverage expansion around dedupe and compaction edge cases

Issues, benchmarks against your real workload, and design feedback are all welcome via GitHub Issues.

When to Use usageDb vs Something Else

Use usageDb when:

  • You're metering AI usage (tokens, credits, tool calls, agent runtime).
  • Write volume exceeds what a single Postgres instance handles comfortably.
  • You need a clean audit trail that survives schema changes.
  • You want a small embedded engine instead of operating a distributed cluster.

Use something else when:

  • Your usage volume is small (a few thousand events per day) and Postgres is fine.
  • You need arbitrary analytical SQL across years of mixed dimensions (ClickHouse, DuckDB).
  • Your data is monitoring time series, not billable events (Prometheus, InfluxDB, TimescaleDB).

Adjacent Concerns

If your billing pipeline meters LLM-generated content (agents, chatbots, support bots) or persistent agent memory, the storage layer has neighbors:

  • LLM security audit: cost-amplification attacks against per-user metering show up first in the usage data. Austa's 2026 LLM Security Checklist covers the per-user-cost and anomaly-detection controls explicitly (L-3, L-5).
  • Agent memory architecture: persistent agent state (vector store, episodic memory) generates billable events differently from one-shot LLM calls. memnode documents the memory-architecture side; usageDb handles the per-event billing trail for it.

The usageDb Internals Series

This overview stays high level on purpose. If you want to see how each guarantee is actually implemented, with the real structs, functions, and file links, the internals series walks the engine end to end:

  1. Why a purpose-built usage database: the billing invariants, the architecture, and the data model.
  2. The ingest path and durability contract: WAL, memtable, and Strict vs Fast fsync.
  3. Idempotency and deduplication: event_ids, blake3 hashing, conflicts, and cross-restart dedupe.
  4. The columnar segment format: dictionary, delta, zigzag, and run-length encodings.
  5. The manifest and crash recovery: atomic commits and generation rollback.
  6. Hourly rollups and the watermark: fast monthly totals that never cross unflushed data.
  7. The query engine: segment pruning, a strict SQL subset, and provenance.
  8. Compaction: merging segments behind an atomic manifest swap.
  9. Period lifecycle and frozen snapshots: corrections and stable invoices.
  10. Property tests and simulation testing: how the billing invariants are proven.

The Bigger Picture

Usage-based billing platforms have spent a decade telling teams to build their own metering layer on whatever database they already have. The result is a long tail of broken billing systems where idempotency was an afterthought, invoices can't be reconciled to events, and the storage layer hits a wall at the worst possible time. usageDb is what we think the storage layer for that work should look like, and it's open source because the right answer to "is your billing math correct" is "read the code yourself."

If you want the storage engine without the platform, grab it from GitHub. If you want the full billing platform with the engine already wired up, UsageBox ships it as the production default.

Key Topics

  • usageDb
  • open source
  • Rust
  • AI billing
  • usage tracking
  • append-only database
