The short version: usageDb's query engine is intentionally narrow. It exposes two paths, a structured JSON query and a deliberately strict SQL subset, and it refuses queries that a general SQL database would happily run. SUM only accepts quantity, COUNT only accepts *, range operators distinguish < from <=, and OR / HAVING / aliases / SELECT * are rejected with an explicit error rather than silently mapped to something plausible. The executor prunes most segments before opening a single file, uses half-open [from, to) time ranges so adjacent months never double-count, and ships an explain endpoint that names exactly which segments produced each invoice line.
This is Part 7 of the usageDb internals series, a walk through the open-source Rust storage engine (github.com/pbudzik/usagedb) that backs UsageBox. Earlier parts built the durable ingest path, the columnar segments, and the hourly rollups. This part is about reading that data back. The guiding idea: a billing query engine should do less than a general SQL database on purpose, because the worst failure mode here is not a slow query, it is a query that silently returns the wrong number and that number becomes an invoice.
Series navigation: ← Part 6: Hourly rollups and the watermark · Part 8: Compaction →
Two query paths
There are two ways to ask usageDb a question, and they compile down to the same internal QueryPlan (src/query/plan.rs). The first is a structured JSON body at POST /v1/query/json with explicit fields: source, account_id, from, to, group_by, filters, and metrics. No parsing ambiguity, easy to generate from code.
POST /v1/query/json
{
"source": "usage_rollup_hourly",
"account_id": "acc_8f1c",
"from": "2026-05-01T00:00:00Z",
"to": "2026-06-01T00:00:00Z",
"group_by": ["meter_id", "model_id"],
"filters": { "product_id": ["ai_gateway"] },
"metrics": { "quantity": "sum", "count": "count" }
}
The second path is POST /v1/query/sql, which takes a SQL string and runs it through the strict subset parser. It targets the same two logical tables, usage_events (raw) and usage_rollup_hourly (rollups), and supports a bounded grammar: a projection of group columns and aggregates, an AND-only WHERE, and a GROUP BY.
POST /v1/query/sql
{
"query": "SELECT meter_id, SUM(quantity) FROM usage_events
WHERE account_id = 'acc_8f1c'
AND timestamp_ms >= 1746057600000
AND timestamp_ms < 1748736000000
GROUP BY meter_id"
}
Why the SQL parser is strict on purpose
The parser lives in src/query/sql.rs. An earlier permissive version did what a lot of homegrown SQL layers do: it mapped SUM(any_column) to SUM(quantity), treated < and <= as interchangeable, quietly dropped OR / HAVING / aliases, and tolerated AST nodes it did not actually implement. Every one of those shortcuts is wrong-answer territory when the answer is a bill. The current parser inverts the default: anything it does not fully understand is rejected with a precise message.
Concretely, the parser refuses:
SUMof anything butquantity.SUM(tokens)returns the error "SUM only supports thequantitycolumn", instead of guessing what you meant.COUNTof anything but*. Column-counting is not implemented, soCOUNT(meter_id)errors rather than silently behaving likeCOUNT(*).ORinWHERE. The executor only models a conjunction of equalities and ranges. A silently-mishandledORcould match nothing, so it is rejected explicitly.SELECT *, column aliases (AS),HAVING,DISTINCT,JOIN,ORDER BY,LIMIT, CTEs, set operations. Each gets its own targeted error string.
The range handling is the subtle part. The parser distinguishes the four comparison operators and normalizes every bound into a half-open [from_ms, to_ms) window, converting inclusive bounds to exclusive by adjusting by one millisecond:
match op {
BinaryOperator::Gt => plan.from_ms = plan.from_ms.max(v.saturating_add(1)),
BinaryOperator::GtEq => plan.from_ms = plan.from_ms.max(v),
BinaryOperator::Lt => plan.to_ms = plan.to_ms.min(v),
BinaryOperator::LtEq => plan.to_ms = plan.to_ms.min(v.saturating_add(1)),
_ => unreachable!(),
}
The philosophy is that an explicit error costs an engineer a few minutes to fix their query, while a wrong number can cost a customer an over-charge and the vendor its credibility. Strictness is a billing safety feature, not pedantry.
How the executor runs a plan
The executor (src/query/executor.rs) collects events from immutable raw segments plus a snapshot of the live memtable, applies the plan's filters, groups by the requested keys, and computes the metrics. Supported group keys are the column names (account_id, product_id, meter_id, model_id, source, unit), the time buckets hour_start_ms and day, or any dimension key from the event's variable-axis map. You can also filter or group by kind to isolate corrections and retractions for forensics. Supported metrics are sum (of quantity, accumulated in i128) and count.
Segment pruning before any file is opened
The expensive part of a query is reading segment files, so the executor decides which segments could possibly contribute before opening any of them. It does this from SegmentMeta in the manifest, which carries a min/max timestamp, a bucket label, account-ID min/max bounds, and per-segment sets of product_ids / meter_ids / model_ids. The pruning predicate segment_overlaps_plan combines all of them:
// time range (half-open): segment in range iff some ts could be in [from, to)
if !(s.min_timestamp_ms < to_ms && s.max_timestamp_ms >= from_ms) { return false; }
// bucket from account_id: blake3(account_id) % bucket_count
if let Some(account) = &plan.account_id {
let target = bucket_for_account(&AccountId(account.clone()), bucket_count);
if s.bucket != target { return false; }
// account name within this segment's [min, max] range
if account < &min.0 || account > &max.0 { return false; }
}
// product / meter / model filters via the segment's ID sets
if !filter_intersects(&plan.filters, "product_id", &s.product_ids, ..) { return false; }
The bucket field is the workhorse. Segments are partitioned by bucket = blake3(account_id) % bucket_count at flush time, so a single-account monthly query touches only the one bucket that account hashes into, plus the timestamp and account-name bounds prune most segments inside even that bucket. For a database holding thousands of accounts, this turns a full scan into a handful of file opens. The same bucket check is applied to rollup segments. Pruning is segment-level only today; there is no block-level metadata for finer skipping inside a segment yet.
Half-open time ranges, everywhere
Every query path uses half-open [from, to) semantics: the from bound is included, the to bound is excluded. This is enforced both in the pruning predicate and in the per-event check inside matches_plan:
if e.timestamp_ms < plan.from_ms || e.timestamp_ms >= plan.to_ms {
return false;
}
This is a correctness detail worth dwelling on. Consider two adjacent monthly queries, May as [May 1, Jun 1) and June as [Jun 1, Jul 1). An event landing at exactly midnight on June 1 belongs to June and June only. If the convention were inclusive on both ends, that boundary event would be counted in both months and the customer would be billed for it twice. Half-open intervals make month boundaries tile perfectly with no gaps and no overlaps, and because the rule is applied identically across the JSON path, the SQL path, and the account-usage GET, there is no path where it can drift.
Rollup source vs raw source
The source field selects the read strategy, and it ties directly back to Part 6. With source = usage_rollup_hourly (the default for the account-usage GET) the executor reads pre-aggregated hourly rollup rows for the sealed part of the range, that is up to the rollup watermark, then falls back to a raw scan only for the open-period tail above the watermark. This is the fast path for monthly totals: most of the month is already summed into compact rollup rows. With source = usage_events (or ?source=raw on the GET) the executor forces a full raw scan over the time range. Both feed the same aggregator, so SUM(quantity) is identical regardless of source.
One documented caveat: COUNT means different things per source. Over rollups each rollup row counts as one, not as the number of underlying events it aggregates. For an exact event count you must query the raw source. The README calls this out as a known difference rather than papering over it.
Provenance: explain and verify
The two endpoints that make the engine auditable are GET /v1/accounts/{id}/explain and GET /v1/accounts/{id}/verify (both in src/api/http_server.rs). They exist because when a dashboard total and an invoice total disagree, you need to find out why fast.
explain returns the breakdown for an account over a range, grouped by (product_id, meter_id, model_id, source, unit) so each row is a distinct invoice line, plus the corrections and retractions in the range as raw rows, plus the segment provenance: the rollup watermark, the IDs of every rollup and raw segment that overlaps the range (already bucket-filtered to match the executor's pruning), and for each rollup segment the input_segment_ids of the raw segments that fed it. That last piece means you can name exactly which raw segments produced a given rollup line, which is what lets an invoice snapshot reference a watermark plus a concrete source-segment set rather than a number with no paper trail.
verify computes the same SUM(quantity) two ways, once through the rollup path and once through a pure raw scan, and reports both totals plus drift = raw_total - rollup_total and a matches boolean. On a fully sealed period (to <= watermark_ms) the invariant is zero drift. A non-zero result points at a real problem: a rollup bug, a late event that landed below the watermark, or a missing rollup segment, any of which the operator-facing rebuild_rollups can repair by rewinding the watermark and refilling from raw events. There is also an inspect-segment admin command and a verify-period CLI subcommand for the same checks off-server.
The point of doing less
None of this engine would impress a SQL benchmark. It does not join, it does not sort, it does not do arbitrary aggregates. That narrowness is the feature. Because the scope is small and the schema is fixed, usageDb can prune aggressively from cheap per-segment metadata, refuse ambiguous queries that could silently produce a wrong bill, and explain exactly where each number came from. For billing, an engine you can fully reason about beats a general one you cannot.
usageDb internals: the full series
- Why a purpose-built usage database
- The ingest path and durability contract
- Idempotency and deduplication
- The columnar segment format
- The manifest and crash recovery
- Hourly rollups and the watermark
- The query engine
- Compaction
- Period lifecycle and frozen snapshots
- Property tests and simulation testing
usageDb is the open-source Rust storage engine behind UsageBox. Browse the code at github.com/pbudzik/usagedb; the query engine described here lives under src/query/.