The short version: Every event in usageDb carries a stable event_id. The ingest path classifies each one into four buckets: accepted, duplicates, conflicts, and rejected. Duplicates are silently swallowed so a retrying collector never double-charges a customer. Conflicts (same id, different payload) are reported back rather than hidden, because they are the symptom of a buggy or non-deterministic upstream. The hard part is making this survive a process restart, which usageDb does by re-registering recently committed events from raw segments on recovery.
This is Part 3 of the usageDb internals series. Part 2 covered the ingest path and durability contract: validate, append to the WAL, fsync, then commit to memory. This part covers the correctness property built on top of that contract. In a usage database for AI billing, the worst failure mode is counting a billable event twice, so usageDb treats idempotency as a first-class part of the ingest API and goes out of its way to prevent double-counting within the dedupe window, even across crashes.
usageDb is the open-source Rust storage engine behind UsageBox. The source is public at github.com/pbudzik/usagedb, and this article links to the exact files involved so you can read along.
Why at-least-once pipelines force the issue
Usage events do not arrive in a clean, exactly-once stream. They come off message queues, retried HTTP calls, batch exporters, and edge collectors that lose their network mid-flush and replay from a checkpoint. Every one of those layers is built for at-least-once delivery: it would rather send an event twice than drop it, because a dropped event is a billing gap that is hard to detect later.
The cost is that duplicates land at the database constantly. They are not anomalies; they are normal traffic. If the meter naively summed everything it received, a single retry storm could inflate a customer's invoice, and you would only find out in a dispute. So the database, as the last hop before the numbers become money, has to enforce exactly-once counting. The contract usageDb offers upstream is simple: send the same event as many times as you like within the 7-day dedupe window, keep its event_id stable, and it will be billed exactly once.
The four buckets
The ingest endpoint accepts a batch and returns counts, not a pass/fail. The response struct in src/api/http.rs is just four numbers:
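```rust
pub struct IngestBatchResponse {
    pub accepted: usize,   // new events: persisted and counted
    pub duplicates: usize, // exact retries: silently swallowed
    pub conflicts: usize,  // same event_id, different payload
    pub rejected: usize,   // failed validation or period policy
}
```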
Each bucket exists for a different operational reason:
| Bucket | Meaning | Why it matters |
|---|---|---|
| accepted | New event_id never seen inside the TTL window. | The only events that get counted and persisted. |
| duplicates | Same event_id and identical payload. | A safe retry. Swallowed silently so collectors can retry freely. |
| conflicts | Same event_id, but a different payload. | A collector bug or non-deterministic id. Surfaced, not hidden. |
| rejected | Validation failure. | Malformed or out-of-policy events that never reach storage. |
Rejection happens first, before any dedupe work. The validate_event check in the HTTP handler enforces non-empty event_id, account_id, product_id and meter_id, a positive timestamp, a dimension cap, and a correction_ref on any Correction or Retraction event. There is one more rejection path that is policy rather than shape: a Usage event that lands in a period already closed for that account is rejected, because the invoice for that period is frozen. Correction and Retraction events are still accepted in a closed period and tracked as pending adjustments, which Part 9 covers. Rejected events never touch the WAL or the dedupe cache.
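The shape checks can be sketched as a single function. This is a minimal illustration, not the actual validate_event: the UsageEvent fields, the EventKind enum, and MAX_DIMENSIONS (including its value) are assumptions, and the closed-period policy check is left out because it needs per-account period state.

```rust
use std::collections::BTreeMap;

// Hypothetical event shape: field names follow the article, but the real
// types in src/model/event.rs may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
enum EventKind {
    Usage,
    Correction,
    Retraction,
}

#[derive(Debug, Clone)]
struct UsageEvent {
    event_id: String,
    account_id: String,
    product_id: String,
    meter_id: String,
    timestamp_ms: i64,
    kind: EventKind,
    correction_ref: Option<String>,
    dimensions: BTreeMap<String, String>,
}

// Assumed value; the article does not state the real dimension cap.
const MAX_DIMENSIONS: usize = 32;

// Shape checks only. The closed-period policy check is omitted because it
// needs per-account period state.
fn validate_event(ev: &UsageEvent) -> Result<(), String> {
    let required = [
        ("event_id", &ev.event_id),
        ("account_id", &ev.account_id),
        ("product_id", &ev.product_id),
        ("meter_id", &ev.meter_id),
    ];
    for &(name, value) in required.iter() {
        if value.is_empty() {
            return Err(format!("{} must be non-empty", name));
        }
    }
    if ev.timestamp_ms <= 0 {
        return Err("timestamp must be positive".to_string());
    }
    if ev.dimensions.len() > MAX_DIMENSIONS {
        return Err(format!("more than {} dimensions", MAX_DIMENSIONS));
    }
    let needs_ref = matches!(ev.kind, EventKind::Correction | EventKind::Retraction);
    if needs_ref && ev.correction_ref.is_none() {
        return Err("Correction and Retraction events need a correction_ref".to_string());
    }
    Ok(())
}
```

The sketch returns on the first failure to stay short; whichever way a real handler reports reasons, a rejected event never reaches the WAL or the dedupe cache.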
Why conflicts are a feature, not an error
The conflict bucket earns its keep for billing integrity. A duplicate is a retry of the same event, so dropping it is obviously correct. A conflict is two events that claim to be the same event (same event_id) but disagree on what happened (different payload). That should be impossible if the upstream is well-behaved, which is exactly why usageDb reports it instead of silently picking a winner.
A conflict means one of two things, both bugs you want to know about. Either a collector builds events non-deterministically, so a retry carries the same event_id but a different payload (for example, a client-side send timestamp or a recomputed quantity that changes on replay), or two genuinely different events have been assigned the same id by a buggy allocator (for example, a per-process request counter that resets when the collector restarts). An id that changes on every retry is a different and quieter bug: the retry looks like a brand-new event and is billed twice, which is why the contract insists on a stable event_id. Silently deduping a conflict would make the second value vanish from the bill with no trace; silently overwriting would make the audit trail lie. By counting conflicts and returning them in the response, usageDb turns a silent revenue leak into a visible, alertable signal. A non-zero conflict count is a page-the-on-call event, not a routine retry.
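To make the failure modes concrete, the sketch below contrasts an event_id derived only from replay-stable facts with a per-process counter. Both functions and their id formats are hypothetical, not part of usageDb. The counter restarts at zero after a collector restart, so a second, different event can be handed an id that was already sent, which is exactly the shape of a payload conflict.

```rust
// Replay-stable id: a pure function of facts that do not change when the
// collector re-sends the same record. The format is hypothetical; any
// deterministic encoding of stable fields works.
fn stable_event_id(source: &str, request_id: &str, line: u32) -> String {
    format!("{}/{}/{}", source, request_id, line)
}

// Anti-pattern: a per-process counter. It restarts at zero when the
// collector restarts, so a later, different event can reuse an id that was
// already sent, which the database sees as a payload conflict.
struct CounterIds {
    next: u64,
}

impl CounterIds {
    fn new() -> Self {
        CounterIds { next: 0 }
    }

    fn next_id(&mut self, source: &str) -> String {
        let id = format!("{}-{}", source, self.next);
        self.next += 1;
        id
    }
}
```

A stable id is what lets a retry classify as an exact duplicate; the counter's reused id is what produces a conflict once the new event's payload differs from the old one.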
How dedupe classifies an event
The in-memory dedupe lives in src/ingest/dedupe.rs. It is a HotDedupe struct holding a HashMap keyed by a 128-bit hash of the event_id, plus a FIFO insertion queue used for capacity eviction and TTL expiry. Each entry stores the event's payload hash and the time it was first seen. Classification is a pure read with no mutation:
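```rust
pub fn classify(&self, event_id_hash: EventHash, payload_hash: EventHash) -> DedupeResult {
    // Look up by event_id hash: retries of one logical event share a key.
    if let Some(existing) = self.cache.get(&event_id_hash) {
        // Same id seen before: the payload hash decides retry vs conflict.
        if existing.payload_hash == payload_hash {
            DedupeResult::ExactDuplicate
        } else {
            DedupeResult::PayloadConflict
        }
    } else {
        DedupeResult::NewEvent
    }
}
```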
The two-hash design is the key. The lookup key is the event_id hash, so retries of the same logical event collide in the map. The stored value is the payload hash, so once two events share an id, comparing payload hashes is what distinguishes a harmless duplicate from a dangerous conflict.
Splitting classify from commit is deliberate and ties back to the durability contract. The hot path classifies an event against the cache without mutating it, appends to the WAL and fsyncs, and only then calls commit to insert the dedupe entry and the memtable row. If the WAL append fails, no dedupe state has been touched, so a client retry is correctly seen as a new event rather than a false duplicate. The cache also enforces a maximum capacity, evicting the oldest entries in FIFO insertion order, and a 7-day TTL measured from each entry's first-seen timestamp.
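A simplified, self-contained version of the classify-then-commit flow shows why the ordering matters. The HotDedupe below is a stand-in, not the code in src/ingest/dedupe.rs: the field names, the ingest_one helper, and the closure that stands in for the WAL append plus fsync are all hypothetical, and timestamps are plain milliseconds supplied by the caller.

```rust
use std::collections::{HashMap, VecDeque};

type EventHash = u128;

#[derive(Debug, PartialEq)]
enum DedupeResult {
    NewEvent,
    ExactDuplicate,
    PayloadConflict,
}

struct Entry {
    payload_hash: EventHash,
    first_seen_ms: u64,
}

// Stand-in for HotDedupe: a map for lookups plus a FIFO queue of ids in
// insertion order, used for both capacity eviction and TTL expiry.
struct HotDedupe {
    cache: HashMap<EventHash, Entry>,
    order: VecDeque<EventHash>,
    capacity: usize,
    ttl_ms: u64,
}

impl HotDedupe {
    fn new(capacity: usize, ttl_ms: u64) -> Self {
        HotDedupe { cache: HashMap::new(), order: VecDeque::new(), capacity, ttl_ms }
    }

    // Pure read: never mutates the cache.
    fn classify(&self, id: EventHash, payload: EventHash) -> DedupeResult {
        match self.cache.get(&id) {
            Some(e) if e.payload_hash == payload => DedupeResult::ExactDuplicate,
            Some(_) => DedupeResult::PayloadConflict,
            None => DedupeResult::NewEvent,
        }
    }

    // Only called for NewEvent results, after the WAL append succeeded,
    // so each id appears in `order` at most once.
    fn commit(&mut self, id: EventHash, payload: EventHash, now_ms: u64) {
        self.cache.insert(id, Entry { payload_hash: payload, first_seen_ms: now_ms });
        self.order.push_back(id);
        // FIFO eviction: drop the oldest ids while over capacity.
        while self.cache.len() > self.capacity {
            match self.order.pop_front() {
                Some(oldest) => {
                    self.cache.remove(&oldest);
                }
                None => break,
            }
        }
    }

    // TTL expiry. Assumes commit receives non-decreasing now_ms, so the
    // queue is also ordered by first-seen time and we can stop early.
    fn expire(&mut self, now_ms: u64) {
        while let Some(&oldest) = self.order.front() {
            let seen = self.cache[&oldest].first_seen_ms;
            if now_ms.saturating_sub(seen) < self.ttl_ms {
                break;
            }
            self.order.pop_front();
            self.cache.remove(&oldest);
        }
    }
}

// Hypothetical ingest step: classify, then append to the WAL, then commit.
// `wal_append` stands in for the WAL append plus fsync.
fn ingest_one<F>(d: &mut HotDedupe, id: EventHash, payload: EventHash, now_ms: u64, wal_append: F)
    -> Result<DedupeResult, String>
where
    F: FnOnce() -> Result<(), String>,
{
    let result = d.classify(id, payload);
    if result == DedupeResult::NewEvent {
        wal_append()?; // on failure, return before touching the cache
        d.commit(id, payload, now_ms);
    }
    Ok(result)
}
```

Because commit only runs after the append closure succeeds, an injected WAL failure leaves the cache exactly as it was, and the retry that follows is classified as NewEvent rather than as a false duplicate.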
Computing the hashes
Identity hashing uses blake3 truncated to 128 bits (u128). The code notes that this is stable across Rust versions and that at billing scale, roughly 10^9 events, the birthday collision probability is about 2^-67, which is far below anything that would matter to a bill. The id types themselves are simple newtype wrappers around String in src/model/ids.rs, and blake3 also backs the account-to-bucket mapping there for the same stability reason.
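The quoted figure can be sanity-checked with the standard birthday approximation for n uniformly random b-bit hashes, p ≈ n²/2^(b+1). This is the textbook approximation, not a calculation taken from the code; with b = 128 and n = 10^9:

```latex
p \approx \frac{n^2}{2 \cdot 2^{128}} = \frac{n^2}{2^{129}},
\qquad n = 10^9 \approx 2^{29.9}
\;\Longrightarrow\;
p \approx \frac{2^{59.8}}{2^{129}} = 2^{-69.2} \approx 1.5 \times 10^{-21}
```

That lands at roughly 2^-69, so the 2^-67 in the code comment errs on the conservative side; either way the risk is negligible for billing.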
The two hashes are computed in compute_event_hashes in src/runtime/recovery.rs. The event id hash is blake3 over the raw event_id bytes. The payload hash is blake3 over a bincode serialization of the whole UsageEvent, with one important adjustment:
```rust
// Identity key: blake3 over the raw event_id bytes, truncated to u128.
let event_id_hash = blake3_u128(event.event_id.0.as_bytes());
// Payload identity: hash a copy of the event, not the original.
let mut ev_clone = event.clone();
ev_clone.ingested_at_ms = 0; // zero the server stamp first
let payload_hash = match bincode::serialize(&ev_clone) {
    Ok(bytes) => blake3_u128(&bytes),
    Err(_) => 0, // degrade to conflict-prone, never silent dedupe
};
```
Zeroing ingested_at_ms before hashing is what makes a retry hash identical to the original. That field is stamped server-side at ingest, so the same logical event arriving a second time will carry a different ingest timestamp. Excluding it from the payload hash means a genuine retry is recognized as an exact duplicate rather than misclassified as a conflict. Everything else, including the account, product, meter, quantity, unit, source, and the dimensions map, is part of the payload identity. The dimensions are a BTreeMap in src/model/event.rs, so their key order is canonical and a re-serialized retry produces byte-identical output. If bincode serialization fails, the payload hash degrades to zero, which makes the event conflict-prone rather than letting it slip through as a silent duplicate: the safer direction to fail.
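The effect of zeroing the ingest stamp can be shown without blake3 or bincode. In this sketch, fnv1a_128 is a dependency-free stand-in for blake3_u128 (it is not collision-resistant and is not what usageDb uses), encode is a hypothetical canonical encoding in place of bincode, and UsageEvent is trimmed to a few fields. event_hashes mirrors the shape of compute_event_hashes but is not that function.

```rust
use std::collections::BTreeMap;

// Dependency-free stand-in for blake3_u128: 128-bit FNV-1a. Not
// collision-resistant and not what usageDb uses; illustration only.
fn fnv1a_128(bytes: &[u8]) -> u128 {
    const OFFSET: u128 = 0x6c62272e07bb014262b821756295c58d;
    const PRIME: u128 = 0x0000000001000000000000000000013b;
    let mut h = OFFSET;
    for &b in bytes {
        h ^= b as u128;
        h = h.wrapping_mul(PRIME);
    }
    h
}

// Trimmed, hypothetical event: enough fields to show the idea.
#[derive(Clone)]
struct UsageEvent {
    event_id: String,
    account_id: String,
    meter_id: String,
    quantity: u64,
    dimensions: BTreeMap<String, String>,
    ingested_at_ms: u64, // stamped server-side on every arrival
}

fn put_str(out: &mut Vec<u8>, s: &str) {
    out.extend_from_slice(&(s.len() as u64).to_le_bytes());
    out.extend_from_slice(s.as_bytes());
}

// Hypothetical canonical encoding in place of bincode: length-prefixed
// strings, little-endian integers, dimensions in BTreeMap (sorted) order.
fn encode(ev: &UsageEvent) -> Vec<u8> {
    let mut out = Vec::new();
    put_str(&mut out, &ev.event_id);
    put_str(&mut out, &ev.account_id);
    put_str(&mut out, &ev.meter_id);
    out.extend_from_slice(&ev.quantity.to_le_bytes());
    out.extend_from_slice(&(ev.dimensions.len() as u64).to_le_bytes());
    for (k, v) in &ev.dimensions {
        put_str(&mut out, k);
        put_str(&mut out, v);
    }
    out.extend_from_slice(&ev.ingested_at_ms.to_le_bytes());
    out
}

// Id hash over the raw id bytes; payload hash over the encoding of a copy
// whose ingested_at_ms has been zeroed.
fn event_hashes(ev: &UsageEvent) -> (u128, u128) {
    let event_id_hash = fnv1a_128(ev.event_id.as_bytes());
    let mut ev_clone = ev.clone();
    ev_clone.ingested_at_ms = 0;
    let payload_hash = fnv1a_128(&encode(&ev_clone));
    (event_id_hash, payload_hash)
}
```

Because dimensions live in a BTreeMap, two events whose dimensions were inserted in different orders encode to the same bytes, so a retry with a new ingest stamp hashes identically while a changed quantity does not.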
The hard part: correctness across a restart
An in-memory dedupe cache is fast, but it is rebuilt from scratch on every process start, while the WAL files that fed it get sealed and deleted once their events are flushed into durable columnar segments. Consider the sequence: an event is accepted, fsynced to the WAL, committed to the memtable, then flushed to a segment, after which its WAL file is deleted. The process restarts. The collector, which never got a clean ack because the network dropped, retries that same event. A naive in-memory dedupe would have no record of it, classify it as NewEvent, and bill it twice. That is the silent double-count a billing engine cannot afford.
usageDb closes this gap during recovery. The recovery routine in src/runtime/recovery.rs does three things, in order:
1. It loads the manifest and cleans up WAL files at or below last_sealed_wal_id, whose contents are already durable in segments.
2. It replays the unsealed WAL files (those past last_sealed_wal_id) into both the dedupe cache and the memtable, so events that were accepted but not yet flushed are visible again and will be re-flushed.
3. It scans every raw segment whose maximum timestamp falls inside the 7-day dedupe TTL window and re-registers each event into the dedupe cache.
Step 3 defeats the cross-restart double-count. By walking back over recently committed segments and calling insert_known for each event, usageDb rebuilds a cache that knows about everything it could plausibly be asked to dedupe, namely anything recent enough that the upstream might still be retrying it. Segments older than the TTL are skipped, because the contract says the pipeline should not be retrying events that old. The billing-safety test makes the property explicit: commit a segment, drop the process state, run recovery, then retry the same event and assert it returns ExactDuplicate rather than NewEvent. A companion test asserts that an event 30 days old, beyond the TTL, is correctly treated as new.
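The three recovery steps, and the two billing-safety properties, can be modeled in memory. Everything here is hypothetical: Manifest, WalFile, RawSegment, and recover are simplified stand-ins for the file-based code in src/runtime/recovery.rs, and a plain HashMap plays the role of the dedupe cache and insert_known.

```rust
use std::collections::HashMap;

// Hypothetical, in-memory stand-ins for the manifest, WAL files and raw
// segments. Each event is (event_id_hash, payload_hash).
type EventHash = u128;

struct Manifest {
    last_sealed_wal_id: u64,
}

struct WalFile {
    id: u64,
    events: Vec<(EventHash, EventHash)>,
}

struct RawSegment {
    max_ts_ms: u64,
    events: Vec<(EventHash, EventHash)>,
}

#[derive(Default)]
struct Recovered {
    dedupe: HashMap<EventHash, EventHash>, // event_id_hash -> payload_hash
    memtable: Vec<(EventHash, EventHash)>,
    deleted_wal_ids: Vec<u64>,
}

fn recover(manifest: &Manifest, wals: Vec<WalFile>, segments: &[RawSegment], now_ms: u64, ttl_ms: u64) -> Recovered {
    let mut out = Recovered::default();
    for wal in wals {
        if wal.id <= manifest.last_sealed_wal_id {
            // Step 1: contents are already durable in segments.
            out.deleted_wal_ids.push(wal.id);
        } else {
            // Step 2: accepted but not yet flushed; rebuild dedupe and memtable.
            for (id, payload) in wal.events {
                out.dedupe.insert(id, payload);
                out.memtable.push((id, payload));
            }
        }
    }
    // Step 3: re-register events from segments still inside the TTL window
    // (the role insert_known plays in usageDb). Older segments are skipped.
    for seg in segments {
        if now_ms.saturating_sub(seg.max_ts_ms) > ttl_ms {
            continue;
        }
        for &(id, payload) in &seg.events {
            out.dedupe.insert(id, payload);
        }
    }
    out
}
```

In a run with one flushed event in a day-old segment and another in a 30-day-old segment, the flushed event stays known after its WAL file is deleted, so its retry would be a duplicate, while the 30-day-old event is not re-registered and would be treated as new.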
Making the recovery scan cheap
Rescanning raw segments on every restart would be slow if it meant decompressing and bincode-decoding the full event column. usageDb avoids that with a sidecar index, implemented in src/storage/dedupe_index.rs. Each raw segment raw_<uuid>.seg gets a companion raw_<uuid>.idx file holding the (event_id_hash, payload_hash, ingested_at_ms) triples in segment row order, behind a magic header and a blake3 checksum. Recovery reads the sidecar to rebuild dedupe entries directly, a 10 to 100x speedup over a full segment scan on payload-heavy segments.
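A sidecar along these lines can be sketched in a few functions. The byte layout and the DDIX magic are assumptions, not the real format in src/storage/dedupe_index.rs, and FNV-1a 128 stands in for the blake3 checksum so the example needs no crates. read_index returns None for anything the caller should treat as missing or corrupt.

```rust
use std::convert::TryInto;

// Hypothetical sidecar layout (not the real .idx format): 4-byte magic,
// u64 row count, then per row event_id_hash (16 bytes), payload_hash
// (16 bytes) and ingested_at_ms (8 bytes), all little-endian, then a
// 16-byte checksum over everything before it.
const MAGIC: &[u8; 4] = b"DDIX";
const ROW_LEN: usize = 40;

type Row = (u128, u128, u64);

// FNV-1a 128 as a dependency-free stand-in for the blake3 checksum.
fn fnv1a_128(bytes: &[u8]) -> u128 {
    let mut h: u128 = 0x6c62272e07bb014262b821756295c58d;
    for &b in bytes {
        h ^= b as u128;
        h = h.wrapping_mul(0x0000000001000000000000000000013b);
    }
    h
}

fn write_index(rows: &[Row]) -> Vec<u8> {
    let mut out = Vec::with_capacity(12 + rows.len() * ROW_LEN + 16);
    out.extend_from_slice(MAGIC);
    out.extend_from_slice(&(rows.len() as u64).to_le_bytes());
    for &(id, payload, ts) in rows {
        out.extend_from_slice(&id.to_le_bytes());
        out.extend_from_slice(&payload.to_le_bytes());
        out.extend_from_slice(&ts.to_le_bytes());
    }
    let sum = fnv1a_128(&out);
    out.extend_from_slice(&sum.to_le_bytes());
    out
}

// None means "fall back to scanning the segment": bad magic, truncated
// data, checksum mismatch, or a row count that does not fit the body.
fn read_index(bytes: &[u8]) -> Option<Vec<Row>> {
    if bytes.len() < 12 + 16 || bytes[..4] != MAGIC[..] {
        return None;
    }
    let (body, sum) = bytes.split_at(bytes.len() - 16);
    if fnv1a_128(body) != u128::from_le_bytes(sum.try_into().ok()?) {
        return None;
    }
    let count = u64::from_le_bytes(body[4..12].try_into().ok()?) as usize;
    if count.checked_mul(ROW_LEN).and_then(|n| n.checked_add(12)) != Some(body.len()) {
        return None;
    }
    let mut rows = Vec::with_capacity(count);
    for chunk in body[12..].chunks_exact(ROW_LEN) {
        let id = u128::from_le_bytes(chunk[..16].try_into().ok()?);
        let payload = u128::from_le_bytes(chunk[16..32].try_into().ok()?);
        let ts = u64::from_le_bytes(chunk[32..].try_into().ok()?);
        rows.push((id, payload, ts));
    }
    Some(rows)
}
```

Reading fixed-width triples straight out of the sidecar is what avoids decompressing and decoding the event column; a single flipped byte anywhere in the file fails the checksum and sends the caller down the fallback path.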
The sidecar is strictly an optimization. If it is missing (an older segment from a build that did not write it) or corrupt (checksum mismatch), read_dedupe_index signals the caller to fall back to opening the segment with the raw reader and recomputing the hashes. Recovery tracks how many segments took the fast path versus the fallback, so a sudden spike in fallbacks is a visible signal that sidecars went missing. Either way the dedupe cache ends up complete; the only difference is how long recovery takes.
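The fast-path-or-fallback loop and its counters might look like this. SidecarRead, RecoveryStats, and rebuild are hypothetical names, and slow_scan is a closure standing in for opening the segment with the raw reader and recomputing hashes.

```rust
use std::collections::HashMap;

// Hypothetical result of trying to read one segment's sidecar index.
enum SidecarRead {
    Entries(Vec<(u128, u128)>), // (event_id_hash, payload_hash)
    Missing,                    // older segment written without a sidecar
    Corrupt,                    // checksum mismatch
}

#[derive(Debug, Default, PartialEq)]
struct RecoveryStats {
    fast_path: usize,
    fallback: usize,
}

// Rebuild dedupe entries segment by segment. `slow_scan` stands in for
// opening segment `i` with the raw reader and recomputing its hashes.
fn rebuild<F>(sidecars: Vec<SidecarRead>, mut slow_scan: F, dedupe: &mut HashMap<u128, u128>) -> RecoveryStats
where
    F: FnMut(usize) -> Vec<(u128, u128)>,
{
    let mut stats = RecoveryStats::default();
    for (i, read) in sidecars.into_iter().enumerate() {
        let entries = match read {
            SidecarRead::Entries(e) => {
                stats.fast_path += 1;
                e
            }
            SidecarRead::Missing | SidecarRead::Corrupt => {
                stats.fallback += 1;
                slow_scan(i)
            }
        };
        // Both paths end in the same dedupe state; only the cost differs.
        for (id, payload) in entries {
            dedupe.insert(id, payload);
        }
    }
    stats
}
```

Exporting fast_path and fallback as metrics is one way to turn the "sudden spike in fallbacks" into an alert.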
Putting it together
The classification logic in the ingest critical section, in src/api/http_server.rs, also dedupes within a single batch using a seen_in_batch map before consulting the persistent cache, so two copies of the same event in one request are handled the same way as two copies across requests. Combined with the cross-restart re-registration, the result is a single, consistent rule applied everywhere: an event is counted exactly once per stable event_id within the TTL window, no matter how many times the network, the queue, or a process restart causes it to be re-sent. That is what lets the rest of the engine, the columnar segments in Part 4 and the hourly rollups in Part 6, treat every stored row as a real, billable, once-and-only-once event.
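The in-batch rule can be sketched as follows. classify_batch and its Option-based input are assumptions for illustration (None marks an event that failed validation), IngestBatchResponse is redeclared with derives so the sketch compiles on its own, and accepted events are not committed to the persistent cache here; the real path does that only after the WAL fsync.

```rust
use std::collections::HashMap;

// Redeclared with derives so the sketch compiles on its own.
#[derive(Debug, Default, PartialEq)]
struct IngestBatchResponse {
    accepted: usize,
    duplicates: usize,
    conflicts: usize,
    rejected: usize,
}

// Each item is Some((event_id_hash, payload_hash)) for a valid event, or
// None for one that failed validation. `known` stands in for the
// persistent dedupe cache (event_id_hash -> payload_hash).
fn classify_batch(batch: &[Option<(u128, u128)>], known: &HashMap<u128, u128>) -> IngestBatchResponse {
    let mut resp = IngestBatchResponse::default();
    let mut seen_in_batch: HashMap<u128, u128> = HashMap::new();
    for item in batch {
        let (id, payload) = match *item {
            Some(pair) => pair,
            None => {
                resp.rejected += 1;
                continue;
            }
        };
        // Check earlier copies in this batch first, then the cache.
        let prior = seen_in_batch.get(&id).or_else(|| known.get(&id)).copied();
        match prior {
            Some(p) if p == payload => resp.duplicates += 1,
            Some(_) => resp.conflicts += 1,
            None => {
                seen_in_batch.insert(id, payload);
                resp.accepted += 1;
            }
        }
    }
    resp
}
```

Every item lands in exactly one bucket, so the four counts always sum to the batch size, which is a cheap invariant for a client to check.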
The property tests in tests/properties.rs exercise dedupe idempotence under random retry orderings and payload-conflict detection across many generated cases per run, which Part 10 covers in detail.
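As a rough, crate-free imitation of that kind of property (not the actual tests in tests/properties.rs), the sketch below generates events with random numbers of retries, shuffles the retry order with a small xorshift PRNG, and checks that every distinct event_id is accepted exactly once. XorShift, ingest_all, and check_idempotence are hypothetical helpers.

```rust
use std::collections::HashMap;

// Tiny xorshift64 PRNG (seed must be non-zero) so the sketch needs no
// crates; a real suite would use a property-testing framework.
struct XorShift(u64);

impl XorShift {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    // Value in 0..n; n must be non-zero. Slight modulo bias is fine here.
    fn below(&mut self, n: u64) -> u64 {
        self.next() % n
    }
}

// Ingest events in order against a simple dedupe map and return
// (accepted, duplicates). Conflicts cannot occur: payloads are per-id.
fn ingest_all(events: &[(u128, u128)]) -> (usize, usize) {
    let mut cache: HashMap<u128, u128> = HashMap::new();
    let (mut accepted, mut duplicates) = (0, 0);
    for &(id, payload) in events {
        match cache.get(&id).copied() {
            Some(p) if p == payload => duplicates += 1,
            Some(_) => panic!("unexpected payload conflict"),
            None => {
                cache.insert(id, payload);
                accepted += 1;
            }
        }
    }
    (accepted, duplicates)
}

// Property: however the retries are interleaved, each distinct event_id is
// accepted exactly once and every extra copy counts as a duplicate.
fn check_idempotence(seed: u64, distinct: u128, max_copies: u64) -> bool {
    let mut rng = XorShift(seed);
    let mut stream = Vec::new();
    for id in 0..distinct {
        let copies = 1 + rng.below(max_copies);
        for _ in 0..copies {
            stream.push((id, id * 31 + 7)); // payload is a fixed function of id
        }
    }
    // Fisher-Yates shuffle randomizes the retry ordering.
    for i in (1..stream.len()).rev() {
        let j = rng.below(i as u64 + 1) as usize;
        stream.swap(i, j);
    }
    let (accepted, duplicates) = ingest_all(&stream);
    accepted == distinct as usize && accepted + duplicates == stream.len()
}
```

Running check_idempotence over a few hundred seeds exercises many different interleavings; a framework such as proptest would add shrinking, so a failing case is reduced to a minimal retry sequence.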
usageDb internals: the full series
- Why a purpose-built usage database
- The ingest path and durability contract
- Idempotency and deduplication
- The columnar segment format
- The manifest and crash recovery
- Hourly rollups and the watermark
- The query engine
- Compaction
- Period lifecycle and frozen snapshots
- Property tests and simulation testing
Previous: Part 2: The ingest path and durability contract
Next: Part 4: The columnar segment format
The code in this article is real and links to the public repository at github.com/pbudzik/usagedb. usageDb is an MVP scaffold: the ingest and dedupe paths described here are implemented end-to-end, while some other spec items remain stubbed.