PaddySpeaks · Systems at the Whiteboard · Nº 11

The Metering Problem

Design the billing backend for a usage-metered SaaS — flat seats plus hourly metering, mid-cycle plan changes, overages, free trials, multi-currency. One prompt, two revenue streams of opposite shape, and a single decision that keeps the invoice honest: a complete working-through of data flow, schema, the append-only ledger, streaming Python, the MRR roll-up, and the dashboard that proves the books balance.

§ 01 — THE QUESTIONOne invoice, two clocks

Every data engineer who touches a revenue system eventually meets this question. It sounds like a CRUD app for invoices. It is two billing systems running on different clocks, joined only at the moment money is owed.

Interview Prompt

"Design the data model for a SaaS with mixed pricing — flat-rate seats plus usage-based metering, per-query, per-GB, per-API-call. Handle mid-cycle plan changes, prorations, overages, free trials, and multi-currency invoicing. How would you keep the bill correct?"

LEVEL · SENIOR / STAFFDURATION · 45 MINFORMAT · WHITEBOARD

The trap is to reach for one fact table and a column called amount. The two revenue streams the prompt names have opposite shapes and opposite failure modes. Subscription is predictable, low-volume, and changes a handful of times per customer per year — a Pro seat is a flat number until someone clicks upgrade. Usage is volatile, arrives by the billion, and is worthless to restate after the fact — a per-query meter ticks tens of thousands of times a second across the fleet. A model that stores them at the same grain either crushes the subscription logic under usage volume or smears usage into a coarse monthly figure that finance can never audit. Keep them apart at the fact layer; join them only at invoice time.

A weak answer treats the invoice as a mutable running total it edits as charges land. A strong answer notices that the invoice is the wrong place to hold truth at all — truth lives in an append-only log of charges, and the invoice is a SUM over a window. So before any boxes and arrows, the working frame for the whole session:

THE SLOW STREAM
Subscription state. Plan, status, MRR — one row per subscription per day. A handful of state transitions a year, snapshotted daily so finance can ask "what was true on the 14th" without replaying a ledger. Bounded, cheap, queryable.
THE FAST STREAM
Usage meter. Raw events rolled up to one row per subscription × meter × hour. Billions of raw ticks per day collapse to the hour at ingest. High volume, append-only, the integration point three teams read from.
THE LEDGER
Billing events. One immutable row per charge, credit, proration, refund, or dunning step, keyed by an idempotent source_event_id. The invoice is not stored authority — it is a derived roll-up of these rows over a billing period.
The invoice is a SUM, never a recompute. Money that has happened is appended, never edited — because an edited charge is an argument you will lose with an auditor.

Scoping out loud

Scope is the first scored dimension, and most candidates skip it. State what you build, what you ignore, and the numbers that shape every later choice. Out of scope here, said explicitly: the payment gateway itself (treated as a callable service that returns success/failure with its own idempotency key), tax engines and tax jurisdiction logic (a separate line-item service), revenue recognition under ASC 606 (downstream of this ledger), and fraud. In scope: how a charge becomes a row, how a row becomes an invoice, how a mid-cycle change stays correct to the penny, and how MRR is read in one query.

Then the envelope math, volunteered rather than extracted. Snowflake-shaped numbers — a usage warehouse where compute is metered by the credit:

QuantityEstimateConsequence
Active subscriptions2,000,000Sets the daily snapshot size: 2 M rows/day, trivial
Raw meter events/day≈ 5 BPer-query, per-call — cannot live at raw grain forever
Hourly meter rows/day≈ 2 M subs × 4 meters × 24 ≈ 190 MThe grain that shapes the whole architecture
Billing events/day~3 MOne charge stream; the audit spine, append-only
Invoices/month~2 MDerived rollup; never the source of a number
MRR querySUM over 2 M rows, 1 daySub-second on a partitioned snapshot — a single SELECT
FX rates~30 pairs × hourlyLocked at usage time; never restated when rates drift

Notice the asymmetry. The raw meter firehose is three orders of magnitude above everything else, and the subscription state is so small it would be silly to model it as a slowly-changing dimension. That spread is the whole design: roll the firehose to the hour at the edge, snapshot the slow stream daily, and let an immutable ledger be the one place a dollar figure is ever authoritative. The rest of this article follows the money.


§ 02 — DATA FLOWFollowing a dollar through the building

One ledger, two tributaries. The spine of the design is an append-only billing-event log fed by a metering pipeline on one side and a subscription state machine on the other — and read, at month-end, by a roll-up that mints invoices without ever holding a balance of its own.

FAST STREAM · ~5B RAW EVENTS/DAY → HOURLY SLOW STREAM · ~2M SUBSCRIPTIONS · DAILY SNAPSHOT PRODUCT EDGE per-query / per-call emit METER AGGREGATOR roll to hour · lock FX apply free allowance fct_usage_meter grain: sub × meter × hour billable = max(0, raw − free) USAGE ROLL-UP JOB period sum → usage_charge writes one ledger event PRODUCT ANALYTICS same source · daily rollups UPSELL SIGNAL (>85%) usage × plan cap · CS team SUBSCRIPTION API upgrade / cancel / pause SUBSCRIPTION SVC state machine · proration emits 2 events on change fct_subscription_state daily snapshot · MRR fct_billing_events append-only · idempotent INVOICE ROLL-UP SUM events in period · holds $0 PAYMENT GATEWAY charge · webhook → ledger usage_charge → ledger paid webhook SOLID — sustained data flow · DASHED — period-close & settlement writes · Two streams meet only in the ledger.
FIG. 1 — End-to-end flow. The ledger is the spine; the meter and the snapshot are tributaries; the invoice is a query, not a store.

Three properties of this picture do most of the interview's work. First, the firehose never reaches the ledger at raw grain — five billion daily ticks collapse to the hour at the aggregator, and only one usage_charge per subscription per period ever becomes a billing event. Second, authority is deliberately one-directional: every dollar that will appear on an invoice is first an append-only row in fct_billing_events, written exactly once, and the invoice roll-up holds no balance it could disagree with. Third, the two streams are joined only at the rightmost box — usage and subscription stay separate facts, at separate grains, until the moment finance asks "what does this customer owe," and the answer is a SUM with a GROUP BY.

The Failure Philosophy, In One Rule

The meter may lose a tick; the ledger may never lose a charge. A dropped raw usage event costs a fraction of a cent and self-heals at the next reconciliation pass against the warehouse's own audit logs — so the metering path optimizes for throughput and tolerates at-least-once duplication, made safe by idempotent upserts on the hour bucket. A lost billing event — a proration, a refund — is never acceptable: it lands through a durable, exactly-once-keyed write, and if the producer retries, UNIQUE(source_event_id) swallows the duplicate silently. Approximate the usage; never approximate the bill.


§ 03 — THE GRAINTwo facts, a snapshot, and a ledger

The schema falls out of the grain question. Volatile, high-volume truth: the hourly meter. Bounded, slow truth: the daily subscription snapshot. Immutable money: the billing-event ledger. The invoice is a view, derived and re-derivable.

The high-volume fact — the hourly meter

The meter is the integration point, so its grain is chosen to serve three masters at once: fine enough for product's adoption analytics, coarse enough that the table does not explode, and aligned to the unit finance bills on. One row per (subscription, meter, hour). The decisive columns are billable_quantity — already net of the free allowance, so no consumer re-derives the allowance and they all agree — and the FX trio, locked at the hour the usage occurred and never restated.

DDL · FAST FACT — HOURLY USAGE METER
-- One row per subscription × meter × hour. Raw per-query/per-call events -- roll up to the hour AT INGEST; this table never sees raw grain. -- Partitioned by hour_bucket_ts; clustered on subscription_id. CREATE TABLE fct_usage_meter ( account_id BIGINT NOT NULL, subscription_id BIGINT NOT NULL, meter_id TEXT NOT NULL, -- compute_credits | storage_gb | api_calls hour_bucket_ts TIMESTAMPTZ NOT NULL, -- truncated to the hour raw_quantity NUMERIC(18,4) NOT NULL, free_allowance NUMERIC(18,4) NOT NULL DEFAULT 0, billable_quantity NUMERIC(18,4) NOT NULL, -- = max(0, raw − free), materialized unit_price_local NUMERIC(12,6) NOT NULL, -- the rate, locked at this hour currency_code CHAR(3) NOT NULL, amount_local NUMERIC(18,6) NOT NULL, -- billable × unit_price_local fx_rate_to_usd NUMERIC(18,8) NOT NULL, -- locked at hour, never restated amount_usd NUMERIC(18,6) NOT NULL, PRIMARY KEY (subscription_id, meter_id, hour_bucket_ts) -- idempotent upsert key ); CREATE INDEX idx_meter_sub_time ON fct_usage_meter (subscription_id, hour_bucket_ts DESC);

The bounded fact — the daily subscription snapshot

A subscription changes a handful of times a year, so the temptation is to model it as a slowly-changing dimension and reconstruct any day's state with an as-of join. Resist it. There is enough daily flux — trials converting, accounts pausing, MRR ticking on plan change — that a daily snapshot is both simpler and faster to query than an SCD2 walk. The snapshot is bounded at two million rows a day and answers the question finance asks most: what was true on this day.

DDL · SLOW FACT — DAILY SUBSCRIPTION STATE
CREATE TABLE fct_subscription_state ( subscription_id BIGINT NOT NULL, day DATE NOT NULL, account_id BIGINT NOT NULL, plan_id TEXT NOT NULL, -- SCD2 surrogate, effective on this day status TEXT NOT NULL CHECK (status IN ('trialing','active','paused', 'canceled','past_due')), mrr_usd NUMERIC(14,2) NOT NULL DEFAULT 0, -- 0 while trialing — trials bill nothing currency_code CHAR(3) NOT NULL, days_into_cycle SMALLINT NOT NULL, days_remaining SMALLINT NOT NULL, -- the proration denominator's numerator PRIMARY KEY (subscription_id, day) ); CREATE INDEX idx_substate_day ON fct_subscription_state (day, status); -- MRR is one scan

The ledger — append-only money

This is the table everything else exists to feed. Every movement of money is a row, never an update: a subscription_charge at the cycle boundary, a usage_charge at period close, a proration credit and a proration charge on a plan change, a refund, a credit, a dunning step. The source_event_id is the idempotency key — the producer's stable identifier for the thing that happened — and UNIQUE on it is what makes the whole pipeline replay-safe. A retried plan-change webhook cannot double-charge, because the second insert is a no-op.

DDL · THE LEDGER — APPEND-ONLY BILLING EVENTS
CREATE TABLE fct_billing_events ( event_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, account_id BIGINT NOT NULL, subscription_id BIGINT NOT NULL, event_type TEXT NOT NULL CHECK (event_type IN ('subscription_charge','usage_charge', 'proration','refund','credit','dunning')), plan_id TEXT, -- SCD2 plan effective at event_ts event_ts TIMESTAMPTZ NOT NULL DEFAULT now(), period_start DATE NOT NULL, -- the billing window this row belongs to period_end DATE NOT NULL, amount_local NUMERIC(14,2) NOT NULL, -- signed: credits and refunds are negative currency_code CHAR(3) NOT NULL, amount_usd NUMERIC(14,2) NOT NULL, status TEXT NOT NULL DEFAULT 'pending', source_event_id TEXT NOT NULL, UNIQUE (source_event_id) -- idempotent + audit-defensible ); CREATE INDEX idx_be_invoice ON fct_billing_events (subscription_id, period_start);

And the invoice — deliberately last, deliberately derived. It is a materialization of a SUM over the ledger for one subscription and one period, refreshed at close and re-derivable from the events at any time. It stores split totals (subscription_total, usage_total, overage, proration, credits) for presentation, but it is never the place a figure originates. If the invoice and the ledger ever disagree, the ledger is right and the invoice is rebuilt.

DDL · DERIVED — INVOICE ROLL-UP
CREATE TABLE fct_invoices ( invoice_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, subscription_id BIGINT NOT NULL, period_start DATE NOT NULL, period_end DATE NOT NULL, subscription_total NUMERIC(14,2) NOT NULL, usage_total NUMERIC(14,2) NOT NULL, overage_total NUMERIC(14,2) NOT NULL, proration_net NUMERIC(14,2) NOT NULL, credits_total NUMERIC(14,2) NOT NULL, total_due_usd NUMERIC(14,2) NOT NULL, -- = SUM(all the above) = SUM(ledger) currency_code CHAR(3) NOT NULL, status TEXT NOT NULL DEFAULT 'open', UNIQUE (subscription_id, period_start) );

§ 04 — THE INVARIANTProration is two events, not one

The whole correctness of this system lives in one rule about mutation: a plan change does not edit a charge, it appends two. Everything else — invoices, MRR, NRR — is arithmetic over an immutable log, and arithmetic over an immutable log is auditable by construction.

Consider the canonical hard case. An account on Pro at $100/month upgrades to Enterprise at $300/month on day 15 of a 30-day cycle. The naive instinct is to find the existing $100 charge and change it. That instinct is the bug: it destroys the history a dispute needs, and it makes the invoice a recompute that can drift from what was actually billed. The correct move is to leave the original $100 charge untouched and append two proration rows — a credit for the unused remainder of the old plan, and a charge for the remainder of the new plan.

TRIALING ACTIVE PAST_DUE PAUSED CANCELED

Each transition of the subscription lifecycle is itself a fact: a trial converting to active emits a subscription_charge and bumps mrr_usd in tomorrow's snapshot; a plan change inside ACTIVE emits the two proration events; a PAST_DUE step emits a dunning row; a cancel freezes MRR to zero from the effective day forward. The state machine never overwrites — it appends, and the daily snapshot reads the latest state forward. Trials are the quiet edge case the prompt plants: they look exactly like active subscriptions in every column except that mrr_usd = 0, so they appear in adoption analytics and trial-conversion cohorts without ever inflating revenue.

SQL · THE ATOMIC HEART — A PLAN CHANGE APPENDS TWO ROWS
-- Pro ($100) → Enterprise ($300), day 15 of 30. days_remaining = 15. -- credit_old = old_fee × (days_remaining / days_in_cycle) = 100 × 15/30 = −50 -- charge_new = new_fee × (days_remaining / days_in_cycle) = 300 × 15/30 = +150 -- Net delta = +100. Two rows, both idempotent, the invoice picks both up. INSERT INTO fct_billing_events (account_id, subscription_id, event_type, plan_id, event_ts, period_start, period_end, amount_local, currency_code, amount_usd, status, source_event_id) VALUES (:acct, :sub, 'proration', 'PLAN_PRO', :ts, :pstart, :pend, -50.00, 'USD', -50.00, 'applied', :sub||':chg42:credit'), (:acct, :sub, 'proration', 'PLAN_ENTERPRISE', :ts, :pstart, :pend, 150.00, 'USD', 150.00, 'paid', :sub||':chg42:charge') ON CONFLICT (source_event_id) DO NOTHING; -- replay-safe: retries are no-ops

The deterministic identifiers :sub||':chg42:credit' and :chg42:charge are doing the heavy lifting. They are derived from the change request, not generated fresh, so the same plan-change event arriving twice — a webhook retry, a replayed Kafka offset, a backfill — produces the same two keys and the second attempt collides harmlessly. This is the difference between a billing system that is merely correct on the happy path and one that is correct under the duplicate-delivery realities of every distributed queue.

Never edit money. Append it with a stable key, and let the invoice be a SUM. An edited charge is a number you cannot defend; an appended one is a number you can.METERING RULE Nº 1

§ 05 — INGESTION & STREAMSPython on the meter

Three programs carry the data. The aggregator that tames the firehose into the hour, the proration calculator that turns a plan change into exactly two events, and the gateway-webhook handler that records settlement idempotently. Each is small; the judgment is in what they refuse to do.

1 · The aggregator — roll to the hour, lock the rate

The reasoning is physical: at five billion raw events a day, storing raw grain is paying for resolution no one will ever query. The product team wants the hour; finance bills on the hour. So the aggregator folds raw ticks into hour buckets, computes billable_quantity against the plan's free allowance once, and — critically — locks the FX rate to the hour the usage occurred. A rate that moves tomorrow must never change a charge that already happened. The write is an idempotent upsert keyed on (subscription, meter, hour), so an at-least-once stream that replays a window simply re-converges to the same totals.

PYTHON · METER AGGREGATOR — RAW → HOURLY, FX LOCKED
import time from collections import defaultdict from decimal import Decimal class HourlyMeterAggregator: """Folds raw per-call events into (sub, meter, hour) buckets. The write is an UPSERT, not an INSERT: replaying a Kafka window re-converges to the same total instead of double-counting. Usage is allowed to be at-least-once because the meter may lose a tick; the bucket key makes duplicates harmless.""" def __init__(self, plans, fx): self.plans, self.fx = plans, fx # caches: allowances + hourly rates self.buckets = defaultdict(lambda: Decimal(0)) def offer(self, ev) -> None: hour = ev.ts - (ev.ts % 3600) # truncate to the hour key = (ev.subscription_id, ev.meter_id, hour) self.buckets[key] += Decimal(ev.quantity) # accumulate raw def flush(self, sink) -> None: for (sub, meter, hour), raw in self.buckets.items(): plan = self.plans.effective(sub, hour) free = plan.free_allowance(meter) billable = max(Decimal(0), raw - free) # the allowance, applied ONCE rate = plan.unit_price(meter) # local-currency rate fx = self.fx.rate(plan.currency, hour) # LOCKED at this hour amt_local = billable * rate sink.upsert_usage_meter( # PK (sub, meter, hour): idempotent sub, meter, hour, raw_quantity=raw, free_allowance=free, billable_quantity=billable, unit_price_local=rate, currency_code=plan.currency, amount_local=amt_local, fx_rate_to_usd=fx, amount_usd=(amt_local * fx).quantize(Decimal("0.000001"))) self.buckets.clear()

One carve-out, always stated: the aggregator never emits a billing event. It fills the meter and stops. Turning a month of meter rows into a single usage_charge is the roll-up job's responsibility at period close — separating "record the usage" from "bill the usage" is what lets product read the meter all month without any of those reads being mistaken for charges.

2 · The proration calculator — exactly two events

PYTHON · PRORATION — A PLAN CHANGE BECOMES TWO LEDGER ROWS
from decimal import Decimal, ROUND_HALF_UP CENTS = Decimal("0.01") def prorate_plan_change(change) -> list[dict]: """A mid-cycle plan change is NEVER an edit to the original charge. It is two new rows: a credit for the unused old plan, and a charge for the remainder of the new plan. source_event_id is derived from the change so retries collapse to no-ops.""" ratio = Decimal(change.days_remaining) / Decimal(change.days_in_cycle) credit_old = (-change.old_fee * ratio).quantize(CENTS, ROUND_HALF_UP) charge_new = ( change.new_fee * ratio).quantize(CENTS, ROUND_HALF_UP) base = f"{change.subscription_id}:{change.change_id}" # STABLE, not random return [ {"event_type": "proration", "plan_id": change.old_plan, "amount_usd": credit_old, "status": "applied", "source_event_id": base + ":credit"}, {"event_type": "proration", "plan_id": change.new_plan, "amount_usd": charge_new, "status": "paid", "source_event_id": base + ":charge"}, ] # Note what is absent: no UPDATE, no DELETE, no read of the old charge. # The downgrade case falls out for free — new_fee < old_fee makes the # net delta negative, a credit the customer carries to next period.

3 · The settlement handler — record paid, idempotently

PYTHON · GATEWAY WEBHOOK → LEDGER STATUS
async def on_payment_webhook(db, hook) -> str: """The gateway is at-least-once too — it will redeliver the same 'charge.succeeded' webhook on timeout. We never trust the network's delivery count; we trust the gateway's own idempotency key.""" if await db.seen_webhook(hook.id): # dedupe table on the gateway's id return "duplicate_ignored" async with db.transaction(): await db.mark_webhook(hook.id) if hook.outcome == "succeeded": await db.update_invoice_status(hook.invoice_id, "paid", paid_amount=hook.amount, paid_ts=hook.ts) elif hook.outcome == "failed": # Do not edit the charge. Append a dunning event; the # subscription state machine moves the sub to past_due. await db.append_billing_event(event_type="dunning", subscription_id=hook.subscription_id, amount_usd=Decimal(0), source_event_id=f"dun:{hook.id}") return "ok"

§ 06 — THE CLOSEPeriod-end: the invoice as a query

Month-end is where the two streams finally meet. The close job does three things in order — turn the meter into one usage charge, sum the ledger into an invoice, and hand a balance to the gateway — and it holds no money of its own at any step.

The first act collapses a subscription's entire month of meter rows into a single usage_charge billing event. This is the one place usage crosses into the ledger, and it crosses as a SUM: all the hourly amount_usd for the period, plus the overage split that customer success watches, condensed to one immutable row with a stable key derived from (subscription, period). Because the key is stable, re-running the close — after a late-arriving meter correction, say — does not double-bill; it either inserts the missing charge or collides on the one already there.

SQL · CLOSE STEP 1 — METER → ONE usage_charge PER PERIOD
-- Collapse a month of hourly meter rows into a single ledger event. -- Idempotent: source_event_id is derived, ON CONFLICT DO NOTHING guards. INSERT INTO fct_billing_events (account_id, subscription_id, event_type, event_ts, period_start, period_end, amount_local, currency_code, amount_usd, status, source_event_id) SELECT m.account_id, m.subscription_id, 'usage_charge', now(), :period_start, :period_end, sum(m.amount_local), max(m.currency_code), sum(m.amount_usd), 'pending', 'usage:' || m.subscription_id || ':' || :period_start -- stable key FROM fct_usage_meter m WHERE m.subscription_id = :sub AND m.hour_bucket_ts >= :period_start AND m.hour_bucket_ts < :period_end GROUP BY m.account_id, m.subscription_id ON CONFLICT (source_event_id) DO NOTHING;

The second act is the invoice itself — and it is, satisfyingly, just a pivoted SUM over the ledger. Every event type that landed in the period folds into its column; the grand total is the sum of all of them, which is identically the sum of the raw ledger rows. There is no balance being maintained, no running total to reconcile. The invoice is a photograph of the ledger taken at close, and if the photograph is ever doubted, you re-take it from the same rows and get the same picture.

SQL · CLOSE STEP 2 — LEDGER → INVOICE (CONDITIONAL AGGREGATION)
INSERT INTO fct_invoices (subscription_id, period_start, period_end, subscription_total, usage_total, overage_total, proration_net, credits_total, total_due_usd, currency_code, status) SELECT subscription_id, :period_start, :period_end, sum(amount_usd) FILTER (WHERE event_type = 'subscription_charge') AS subscription_total, sum(amount_usd) FILTER (WHERE event_type = 'usage_charge') AS usage_total, coalesce(sum(amount_usd) FILTER (WHERE event_type = 'usage_charge' AND amount_usd > 0), 0) AS overage_total, sum(amount_usd) FILTER (WHERE event_type = 'proration') AS proration_net, sum(amount_usd) FILTER (WHERE event_type IN ('credit','refund')) AS credits_total, sum(amount_usd) AS total_due_usd, max(currency_code), 'open' FROM fct_billing_events WHERE subscription_id = :sub AND period_start = :period_start GROUP BY subscription_id ON CONFLICT (subscription_id, period_start) DO UPDATE SET total_due_usd = EXCLUDED.total_due_usd, -- re-derive, never drift usage_total = EXCLUDED.usage_total, proration_net = EXCLUDED.proration_net;
Eventual consistency everywhere in the pipeline; determinism at the moment of the invoice. The meter may lag, the FX feed may stutter — but the close is a pure function of the ledger, and a pure function is reproducible.METERING RULE Nº 2 — THE CLOSE IS A FUNCTION

§ 07 — ANALYTICS SQLInterrogating the revenue

The snapshot and the ledger are where the business explains itself. Three queries an interviewer loves, because each one carries a classic SQL pattern on its back — the MRR scan, the upsell threshold, and the cohort retention join.

MRR & ARR — the one-scan snapshot read

The reason the daily snapshot earns its place: monthly recurring revenue is a single filtered SUM over today's partition. No ledger replay, no SCD2 walk, no window functions — active subscriptions, today, summed. Trials contribute nothing because their mrr_usd is zero by construction, so the number is clean without a special case.

SQL · MRR / ARR — FILTERED AGGREGATION OVER ONE DAY
SELECT count(*) AS active_subs, round(sum(mrr_usd), 2) AS mrr_usd, round(sum(mrr_usd) * 12, 2) AS arr_usd FROM fct_subscription_state WHERE day = CURRENT_DATE AND status = 'active'; -- trials are mrr_usd = 0, excluded for free

The upsell signal — threshold over a rolling window

Customer success wants accounts about to blow past their plan tier — the expansion trigger. This is a rolling-window aggregation of the meter joined to the plan cap, filtered at the 85% line. The shape is the classic "current usage versus an allowance" pattern: aggregate the fast fact over a trailing window, join the slow fact for the limit, and surface the ratio.

SQL · ACCOUNTS APPROACHING THEIR CAP — UPSELL CANDIDATES
WITH usage_30d AS ( SELECT subscription_id, meter_id, sum(billable_quantity) AS used FROM fct_usage_meter WHERE hour_bucket_ts >= CURRENT_DATE - INTERVAL '30 days' GROUP BY subscription_id, meter_id ), plan_caps AS ( SELECT s.subscription_id, p.meter_id, p.included_quantity FROM fct_subscription_state s JOIN dim_plan_meter_limits p USING (plan_id) WHERE s.day = CURRENT_DATE AND s.status = 'active' ) SELECT u.subscription_id, u.meter_id, u.used, c.included_quantity, round(100.0 * u.used / nullif(c.included_quantity, 0), 1) AS pct_of_cap FROM usage_30d u JOIN plan_caps c USING (subscription_id, meter_id) WHERE u.used > c.included_quantity * 0.85 ORDER BY pct_of_cap DESC; -- >85% of cap = upsell trigger; >100% = overage already hitting next invoice.

Net revenue retention — cohort retention join

Sales lives on NRR: does a signup cohort's revenue grow as it ages, expansion outrunning churn? The pattern is cohort retention — anchor each subscription to its first-seen quarter and its starting MRR with a window function, join to its current MRR, and ratio the cohort sums. Above 100% is the SaaS holy grail: the cohort is worth more today than the day it arrived.

SQL · NRR BY SIGNUP COHORT — FIRST_VALUE + COHORT JOIN
WITH cohort_baseline AS ( SELECT subscription_id, date_trunc('quarter', min(day) OVER w) AS cohort_q, first_value(mrr_usd) OVER w AS starting_mrr FROM fct_subscription_state WINDOW w AS (PARTITION BY subscription_id ORDER BY day ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) ), current_state AS ( SELECT subscription_id, mrr_usd AS current_mrr FROM fct_subscription_state WHERE day = CURRENT_DATE ) SELECT cb.cohort_q, count(DISTINCT cb.subscription_id) AS cohort_size, round(sum(cb.starting_mrr), 0) AS starting_mrr, round(sum(coalesce(cs.current_mrr, 0)), 0) AS current_mrr, round(100.0 * sum(coalesce(cs.current_mrr, 0)) / nullif(sum(cb.starting_mrr), 0), 1) AS nrr_pct FROM cohort_baseline cb LEFT JOIN current_state cs USING (subscription_id) GROUP BY cb.cohort_q ORDER BY cb.cohort_q; -- NRR > 100% = expansion outpaces churn. The number boards live and die on.

§ 08 — THE DASHBOARDProving the books balance

A senior design ends with observability, because every guarantee above is invisible without it. The billing dashboard watches three things a healthy system keeps true: revenue is growing, money is flowing, and the ledger reconciles to the gateway to the cent.

REVENUE HEALTH
MRR and net-new MRR (expansion − churn), trial→paid conversion by cohort, NRR trend — the numbers that say the business is compounding, read straight off the daily snapshot.
BILLING PIPELINE
meter ingest lag (raw → hour), unbilled usage sitting in the meter before close, proration events/day, dunning queue depth — a spike in unbilled usage means the close job is behind, not that customers stopped using the product.
RECONCILIATION
ledger SUM vs gateway settled (must match to the cent), duplicate webhooks suppressed, FX restatement attempts (should be zero — a non-zero count is a bug), invoice re-derivation drift.
Billing Ops — Global MON 00:10 UTC · MONTH CLOSE +1 · 60s REFRESH
MRR
8.42M$
Net-New MRR
+182K$
Meter Ingest Lag
6.8min
Dunning Queue
412
Revenue by Stream — May close · subscription vs usage (USD, daily accrual)
— subscription (flat, predictable) — usage (volatile, billions of ticks)
Trial→Paid
28%
Unbilled Usage
1.9M$
Ledger vs Gateway
$0.00
Dup Webhooks Killed
3.1k/h
FX Restatements
0
NRR (TTM)
118%
FIG. 2 — The story a healthy close tells: MRR compounding, NRR above 100, ledger reconciling to the gateway at exactly $0.00 — and the two amber tiles, ingest lag and unbilled usage, climbing together because the close job is mid-run, not because anything is wrong.

Read the amber tiles together and the dashboard narrates a month-end close in progress: usage is being rolled to charges, so unbilled usage is draining while ingest lag breathes — and the reconciliation row stays at zero throughout, because the invoice is a SUM of an immutable ledger and there is nothing for it to disagree with. That is what a designed billing system looks like from the controller's chair: the numbers move, but they never stop tying out.


§ 09 — THE RUBRICWhat was actually being tested

Strip the billing details away and the question was testing five judgments, each of which generalizes far beyond SaaS:

GRAIN
Seeing two facts where the prompt says one — and letting each stream's volume and volatility, not a shared schema, pick its own resolution. Hourly for the firehose, daily snapshot for the slow state.
IMMUTABILITY
Refusing to edit money. A plan change appends two rows; a failed payment appends a dunning row; nothing is ever overwritten — so the invoice is a SUM and the ledger is an audit trail by construction.
IDEMPOTENCY
Deriving stable keys from the event, not the network. UNIQUE(source_event_id) turns every retry, replay, and backfill into a harmless no-op — the only honest stance toward at-least-once delivery.
SEPARATION
Recording usage all month without billing it; joining the two streams only at close. Three teams read the same meter, and none of their reads are mistaken for charges.
HONESTY
Engineering the promise: FX locked at usage time, the invoice a pure function of the ledger, eventual consistency in the pipeline and determinism at the statement. The usage may be approximate; the bill may not.
Usage is approximate; the invoice is exact. The art is keeping them in separate tables until the single instant they must agree — and making that agreement a SUM, not a recompute.— CLOSING ARGUMENT