The Metering Problem — PaddySpeaks

§ 01 — THE QUESTIONOne invoice, two clocks

Every data engineer who touches a revenue system eventually meets this question. It sounds like a CRUD app for invoices. It is two billing systems running on different clocks, joined only at the moment money is owed.

Interview Prompt

"Design the data model for a SaaS with mixed pricing — flat-rate seats plus usage-based metering, per-query, per-GB, per-API-call. Handle mid-cycle plan changes, prorations, overages, free trials, and multi-currency invoicing. How would you keep the bill correct?"

LEVEL · SENIOR / STAFFDURATION · 45 MINFORMAT · WHITEBOARD

The trap is to reach for one fact table and a column called amount. The two revenue streams the prompt names have opposite shapes and opposite failure modes. Subscription is predictable, low-volume, and changes a handful of times per customer per year — a Pro seat is a flat number until someone clicks upgrade. Usage is volatile, arrives by the billion, and is worthless to restate after the fact — a per-query meter ticks tens of thousands of times a second across the fleet. A model that stores them at the same grain either crushes the subscription logic under usage volume or smears usage into a coarse monthly figure that finance can never audit. Keep them apart at the fact layer; join them only at invoice time.

A weak answer treats the invoice as a mutable running total it edits as charges land. A strong answer notices that the invoice is the wrong place to hold truth at all — truth lives in an append-only log of charges, and the invoice is a SUM over a window. So before any boxes and arrows, the working frame for the whole session:

THE SLOW STREAM: Subscription state. Plan, status, MRR — one row per subscription per day. A handful of state transitions a year, snapshotted daily so finance can ask "what was true on the 14th" without replaying a ledger. Bounded, cheap, queryable.
THE FAST STREAM: Usage meter. Raw events rolled up to one row per subscription × meter × hour. Billions of raw ticks per day collapse to the hour at ingest. High volume, append-only, the integration point three teams read from.
THE LEDGER: Billing events. One immutable row per charge, credit, proration, refund, or dunning step, keyed by an idempotent source_event_id. The invoice is not stored authority — it is a derived roll-up of these rows over a billing period.

The invoice is a SUM, never a recompute. Money that has happened is appended, never edited — because an edited charge is an argument you will lose with an auditor.

Scoping out loud

Scope is the first scored dimension, and most candidates skip it. State what you build, what you ignore, and the numbers that shape every later choice. Out of scope here, said explicitly: the payment gateway itself (treated as a callable service that returns success/failure with its own idempotency key), tax engines and tax jurisdiction logic (a separate line-item service), revenue recognition under ASC 606 (downstream of this ledger), and fraud. In scope: how a charge becomes a row, how a row becomes an invoice, how a mid-cycle change stays correct to the penny, and how MRR is read in one query.

Then the envelope math, volunteered rather than extracted. Snowflake-shaped numbers — a usage warehouse where compute is metered by the credit:

Quantity	Estimate	Consequence
Active subscriptions	2,000,000	Sets the daily snapshot size: 2 M rows/day, trivial
Raw meter events/day	≈ 5 B	Per-query, per-call — cannot live at raw grain forever
Hourly meter rows/day	≈ 2 M subs × 4 meters × 24 ≈ 190 M	The grain that shapes the whole architecture
Billing events/day	~3 M	One charge stream; the audit spine, append-only
Invoices/month	~2 M	Derived rollup; never the source of a number
MRR query	SUM over 2 M rows, 1 day	Sub-second on a partitioned snapshot — a single SELECT
FX rates	~30 pairs × hourly	Locked at usage time; never restated when rates drift

Notice the asymmetry. The raw meter firehose is three orders of magnitude above everything else, and the subscription state is so small it would be silly to model it as a slowly-changing dimension. That spread is the whole design: roll the firehose to the hour at the edge, snapshot the slow stream daily, and let an immutable ledger be the one place a dollar figure is ever authoritative. The rest of this article follows the money.

§ 02 — DATA FLOWFollowing a dollar through the building

One ledger, two tributaries. The spine of the design is an append-only billing-event log fed by a metering pipeline on one side and a subscription state machine on the other — and read, at month-end, by a roll-up that mints invoices without ever holding a balance of its own.

FIG. 1 — End-to-end flow. The ledger is the spine; the meter and the snapshot are tributaries; the invoice is a query, not a store.

Three properties of this picture do most of the interview's work. First, the firehose never reaches the ledger at raw grain — five billion daily ticks collapse to the hour at the aggregator, and only one usage_charge per subscription per period ever becomes a billing event. Second, authority is deliberately one-directional: every dollar that will appear on an invoice is first an append-only row in fct_billing_events, written exactly once, and the invoice roll-up holds no balance it could disagree with. Third, the two streams are joined only at the rightmost box — usage and subscription stay separate facts, at separate grains, until the moment finance asks "what does this customer owe," and the answer is a SUM with a GROUP BY.

The Failure Philosophy, In One Rule

The meter may lose a tick; the ledger may never lose a charge. A dropped raw usage event costs a fraction of a cent and self-heals at the next reconciliation pass against the warehouse's own audit logs — so the metering path optimizes for throughput and tolerates at-least-once duplication, made safe by idempotent upserts on the hour bucket. A lost billing event — a proration, a refund — is never acceptable: it lands through a durable, exactly-once-keyed write, and if the producer retries, UNIQUE(source_event_id) swallows the duplicate silently. Approximate the usage; never approximate the bill.

§ 03 — THE GRAINTwo facts, a snapshot, and a ledger

The schema falls out of the grain question. Volatile, high-volume truth: the hourly meter. Bounded, slow truth: the daily subscription snapshot. Immutable money: the billing-event ledger. The invoice is a view, derived and re-derivable.

The high-volume fact — the hourly meter

The meter is the integration point, so its grain is chosen to serve three masters at once: fine enough for product's adoption analytics, coarse enough that the table does not explode, and aligned to the unit finance bills on. One row per (subscription, meter, hour). The decisive columns are billable_quantity — already net of the free allowance, so no consumer re-derives the allowance and they all agree — and the FX trio, locked at the hour the usage occurred and never restated.

DDL · FAST FACT — HOURLY USAGE METER

-- One row per subscription × meter × hour. Raw per-query/per-call events
-- roll up to the hour AT INGEST; this table never sees raw grain.
-- Partitioned by hour_bucket_ts; clustered on subscription_id.
CREATE TABLE fct_usage_meter (
    account_id        BIGINT       NOT NULL,
    subscription_id   BIGINT       NOT NULL,
    meter_id          TEXT         NOT NULL,   -- compute_credits | storage_gb | api_calls
    hour_bucket_ts    TIMESTAMPTZ  NOT NULL,   -- truncated to the hour
    raw_quantity      NUMERIC(18,4) NOT NULL,
    free_allowance    NUMERIC(18,4) NOT NULL DEFAULT 0,
    billable_quantity NUMERIC(18,4) NOT NULL,  -- = max(0, raw − free), materialized
    unit_price_local  NUMERIC(12,6) NOT NULL,  -- the rate, locked at this hour
    currency_code     CHAR(3)      NOT NULL,
    amount_local      NUMERIC(18,6) NOT NULL,  -- billable × unit_price_local
    fx_rate_to_usd    NUMERIC(18,8) NOT NULL,  -- locked at hour, never restated
    amount_usd        NUMERIC(18,6) NOT NULL,
    PRIMARY KEY (subscription_id, meter_id, hour_bucket_ts)  -- idempotent upsert key
);
CREATE INDEX idx_meter_sub_time ON fct_usage_meter (subscription_id, hour_bucket_ts DESC);

The bounded fact — the daily subscription snapshot

A subscription changes a handful of times a year, so the temptation is to model it as a slowly-changing dimension and reconstruct any day's state with an as-of join. Resist it. There is enough daily flux — trials converting, accounts pausing, MRR ticking on plan change — that a daily snapshot is both simpler and faster to query than an SCD2 walk. The snapshot is bounded at two million rows a day and answers the question finance asks most: what was true on this day.

DDL · SLOW FACT — DAILY SUBSCRIPTION STATE

CREATE TABLE fct_subscription_state (
    subscription_id BIGINT      NOT NULL,
    day             DATE        NOT NULL,
    account_id      BIGINT      NOT NULL,
    plan_id         TEXT        NOT NULL,        -- SCD2 surrogate, effective on this day
    status          TEXT        NOT NULL
                    CHECK (status IN ('trialing','active','paused',
                                      'canceled','past_due')),
    mrr_usd         NUMERIC(14,2) NOT NULL DEFAULT 0,  -- 0 while trialing — trials bill nothing
    currency_code   CHAR(3)     NOT NULL,
    days_into_cycle SMALLINT    NOT NULL,
    days_remaining  SMALLINT    NOT NULL,        -- the proration denominator's numerator
    PRIMARY KEY (subscription_id, day)
);
CREATE INDEX idx_substate_day ON fct_subscription_state (day, status);  -- MRR is one scan

The ledger — append-only money

This is the table everything else exists to feed. Every movement of money is a row, never an update: a subscription_charge at the cycle boundary, a usage_charge at period close, a proration credit and a proration charge on a plan change, a refund, a credit, a dunning step. The source_event_id is the idempotency key — the producer's stable identifier for the thing that happened — and UNIQUE on it is what makes the whole pipeline replay-safe. A retried plan-change webhook cannot double-charge, because the second insert is a no-op.

DDL · THE LEDGER — APPEND-ONLY BILLING EVENTS

CREATE TABLE fct_billing_events (
    event_id        BIGINT       GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    account_id      BIGINT       NOT NULL,
    subscription_id BIGINT       NOT NULL,
    event_type      TEXT         NOT NULL
                    CHECK (event_type IN ('subscription_charge','usage_charge',
                          'proration','refund','credit','dunning')),
    plan_id         TEXT,                       -- SCD2 plan effective at event_ts
    event_ts        TIMESTAMPTZ  NOT NULL DEFAULT now(),
    period_start    DATE         NOT NULL,       -- the billing window this row belongs to
    period_end      DATE         NOT NULL,
    amount_local    NUMERIC(14,2) NOT NULL,      -- signed: credits and refunds are negative
    currency_code   CHAR(3)      NOT NULL,
    amount_usd      NUMERIC(14,2) NOT NULL,
    status          TEXT         NOT NULL DEFAULT 'pending',
    source_event_id TEXT         NOT NULL,
    UNIQUE (source_event_id)                  -- idempotent + audit-defensible
);
CREATE INDEX idx_be_invoice ON fct_billing_events (subscription_id, period_start);

And the invoice — deliberately last, deliberately derived. It is a materialization of a SUM over the ledger for one subscription and one period, refreshed at close and re-derivable from the events at any time. It stores split totals (subscription_total, usage_total, overage, proration, credits) for presentation, but it is never the place a figure originates. If the invoice and the ledger ever disagree, the ledger is right and the invoice is rebuilt.

DDL · DERIVED — INVOICE ROLL-UP

CREATE TABLE fct_invoices (
    invoice_id        BIGINT       GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    subscription_id   BIGINT       NOT NULL,
    period_start      DATE         NOT NULL,
    period_end        DATE         NOT NULL,
    subscription_total NUMERIC(14,2) NOT NULL,
    usage_total       NUMERIC(14,2) NOT NULL,
    overage_total     NUMERIC(14,2) NOT NULL,
    proration_net     NUMERIC(14,2) NOT NULL,
    credits_total     NUMERIC(14,2) NOT NULL,
    total_due_usd     NUMERIC(14,2) NOT NULL,   -- = SUM(all the above) = SUM(ledger)
    currency_code     CHAR(3)      NOT NULL,
    status            TEXT         NOT NULL DEFAULT 'open',
    UNIQUE (subscription_id, period_start)
);

§ 04 — THE INVARIANTProration is two events, not one

The whole correctness of this system lives in one rule about mutation: a plan change does not edit a charge, it appends two. Everything else — invoices, MRR, NRR — is arithmetic over an immutable log, and arithmetic over an immutable log is auditable by construction.

Consider the canonical hard case. An account on Pro at $100/month upgrades to Enterprise at $300/month on day 15 of a 30-day cycle. The naive instinct is to find the existing $100 charge and change it. That instinct is the bug: it destroys the history a dispute needs, and it makes the invoice a recompute that can drift from what was actually billed. The correct move is to leave the original $100 charge untouched and append two proration rows — a credit for the unused remainder of the old plan, and a charge for the remainder of the new plan.

TRIALING→ ACTIVE→ PAST_DUE→ PAUSED→ CANCELED

Each transition of the subscription lifecycle is itself a fact: a trial converting to active emits a subscription_charge and bumps mrr_usd in tomorrow's snapshot; a plan change inside ACTIVE emits the two proration events; a PAST_DUE step emits a dunning row; a cancel freezes MRR to zero from the effective day forward. The state machine never overwrites — it appends, and the daily snapshot reads the latest state forward. Trials are the quiet edge case the prompt plants: they look exactly like active subscriptions in every column except that mrr_usd = 0, so they appear in adoption analytics and trial-conversion cohorts without ever inflating revenue.

SQL · THE ATOMIC HEART — A PLAN CHANGE APPENDS TWO ROWS

-- Pro ($100) → Enterprise ($300), day 15 of 30. days_remaining = 15.
-- credit_old  = old_fee × (days_remaining / days_in_cycle) = 100 × 15/30 = −50
-- charge_new  = new_fee × (days_remaining / days_in_cycle) = 300 × 15/30 = +150
-- Net delta = +100. Two rows, both idempotent, the invoice picks both up.
INSERT INTO fct_billing_events
  (account_id, subscription_id, event_type, plan_id, event_ts,
   period_start, period_end, amount_local, currency_code, amount_usd,
   status, source_event_id)
VALUES
  (:acct, :sub, 'proration', 'PLAN_PRO',        :ts,
   :pstart, :pend, -50.00, 'USD', -50.00, 'applied', :sub||':chg42:credit'),
  (:acct, :sub, 'proration', 'PLAN_ENTERPRISE', :ts,
   :pstart, :pend, 150.00, 'USD', 150.00, 'paid',    :sub||':chg42:charge')
ON CONFLICT (source_event_id) DO NOTHING;   -- replay-safe: retries are no-ops

The deterministic identifiers :sub||':chg42:credit' and :chg42:charge are doing the heavy lifting. They are derived from the change request, not generated fresh, so the same plan-change event arriving twice — a webhook retry, a replayed Kafka offset, a backfill — produces the same two keys and the second attempt collides harmlessly. This is the difference between a billing system that is merely correct on the happy path and one that is correct under the duplicate-delivery realities of every distributed queue.

Never edit money. Append it with a stable key, and let the invoice be a SUM. An edited charge is a number you cannot defend; an appended one is a number you can.METERING RULE Nº 1

§ 05 — INGESTION & STREAMSPython on the meter

Three programs carry the data. The aggregator that tames the firehose into the hour, the proration calculator that turns a plan change into exactly two events, and the gateway-webhook handler that records settlement idempotently. Each is small; the judgment is in what they refuse to do.

1 · The aggregator — roll to the hour, lock the rate

The reasoning is physical: at five billion raw events a day, storing raw grain is paying for resolution no one will ever query. The product team wants the hour; finance bills on the hour. So the aggregator folds raw ticks into hour buckets, computes billable_quantity against the plan's free allowance once, and — critically — locks the FX rate to the hour the usage occurred. A rate that moves tomorrow must never change a charge that already happened. The write is an idempotent upsert keyed on (subscription, meter, hour), so an at-least-once stream that replays a window simply re-converges to the same totals.

PYTHON · METER AGGREGATOR — RAW → HOURLY, FX LOCKED

import time
from collections import defaultdict
from decimal import Decimal

class HourlyMeterAggregator:
    """Folds raw per-call events into (sub, meter, hour) buckets.
    The write is an UPSERT, not an INSERT: replaying a Kafka window
    re-converges to the same total instead of double-counting. Usage
    is allowed to be at-least-once because the meter may lose a tick;
    the bucket key makes duplicates harmless."""

    def __init__(self, plans, fx):
        self.plans, self.fx = plans, fx          # caches: allowances + hourly rates
        self.buckets = defaultdict(lambda: Decimal(0))

    def offer(self, ev) -> None:
        hour = ev.ts - (ev.ts % 3600)            # truncate to the hour
        key  = (ev.subscription_id, ev.meter_id, hour)
        self.buckets[key] += Decimal(ev.quantity)  # accumulate raw

    def flush(self, sink) -> None:
        for (sub, meter, hour), raw in self.buckets.items():
            plan = self.plans.effective(sub, hour)
            free = plan.free_allowance(meter)
            billable = max(Decimal(0), raw - free)            # the allowance, applied ONCE
            rate = plan.unit_price(meter)                     # local-currency rate
            fx   = self.fx.rate(plan.currency, hour)          # LOCKED at this hour
            amt_local = billable * rate
            sink.upsert_usage_meter(                          # PK (sub, meter, hour): idempotent
                sub, meter, hour, raw_quantity=raw,
                free_allowance=free, billable_quantity=billable,
                unit_price_local=rate, currency_code=plan.currency,
                amount_local=amt_local, fx_rate_to_usd=fx,
                amount_usd=(amt_local * fx).quantize(Decimal("0.000001")))
        self.buckets.clear()

One carve-out, always stated: the aggregator never emits a billing event. It fills the meter and stops. Turning a month of meter rows into a single usage_charge is the roll-up job's responsibility at period close — separating "record the usage" from "bill the usage" is what lets product read the meter all month without any of those reads being mistaken for charges.

2 · The proration calculator — exactly two events

PYTHON · PRORATION — A PLAN CHANGE BECOMES TWO LEDGER ROWS

from decimal import Decimal, ROUND_HALF_UP

CENTS = Decimal("0.01")

def prorate_plan_change(change) -> list[dict]:
    """A mid-cycle plan change is NEVER an edit to the original charge.
    It is two new rows: a credit for the unused old plan, and a charge
    for the remainder of the new plan. source_event_id is derived from
    the change so retries collapse to no-ops."""
    ratio = Decimal(change.days_remaining) / Decimal(change.days_in_cycle)
    credit_old = (-change.old_fee * ratio).quantize(CENTS, ROUND_HALF_UP)
    charge_new = ( change.new_fee * ratio).quantize(CENTS, ROUND_HALF_UP)
    base = f"{change.subscription_id}:{change.change_id}"     # STABLE, not random
    return [
        {"event_type": "proration", "plan_id": change.old_plan,
         "amount_usd": credit_old, "status": "applied",
         "source_event_id": base + ":credit"},
        {"event_type": "proration", "plan_id": change.new_plan,
         "amount_usd": charge_new, "status": "paid",
         "source_event_id": base + ":charge"},
    ]
    # Note what is absent: no UPDATE, no DELETE, no read of the old charge.
    # The downgrade case falls out for free — new_fee < old_fee makes the
    # net delta negative, a credit the customer carries to next period.

3 · The settlement handler — record paid, idempotently

PYTHON · GATEWAY WEBHOOK → LEDGER STATUS

async def on_payment_webhook(db, hook) -> str:
    """The gateway is at-least-once too — it will redeliver the same
    'charge.succeeded' webhook on timeout. We never trust the network's
    delivery count; we trust the gateway's own idempotency key."""
    if await db.seen_webhook(hook.id):       # dedupe table on the gateway's id
        return "duplicate_ignored"
    async with db.transaction():
        await db.mark_webhook(hook.id)
        if hook.outcome == "succeeded":
            await db.update_invoice_status(hook.invoice_id, "paid",
                                          paid_amount=hook.amount,
                                          paid_ts=hook.ts)
        elif hook.outcome == "failed":
            # Do not edit the charge. Append a dunning event; the
            # subscription state machine moves the sub to past_due.
            await db.append_billing_event(event_type="dunning",
                      subscription_id=hook.subscription_id, amount_usd=Decimal(0),
                      source_event_id=f"dun:{hook.id}")
    return "ok"

§ 06 — THE CLOSEPeriod-end: the invoice as a query

Month-end is where the two streams finally meet. The close job does three things in order — turn the meter into one usage charge, sum the ledger into an invoice, and hand a balance to the gateway — and it holds no money of its own at any step.

The first act collapses a subscription's entire month of meter rows into a single usage_charge billing event. This is the one place usage crosses into the ledger, and it crosses as a SUM: all the hourly amount_usd for the period, plus the overage split that customer success watches, condensed to one immutable row with a stable key derived from (subscription, period). Because the key is stable, re-running the close — after a late-arriving meter correction, say — does not double-bill; it either inserts the missing charge or collides on the one already there.

SQL · CLOSE STEP 1 — METER → ONE usage_charge PER PERIOD

-- Collapse a month of hourly meter rows into a single ledger event.
-- Idempotent: source_event_id is derived, ON CONFLICT DO NOTHING guards.
INSERT INTO fct_billing_events
  (account_id, subscription_id, event_type, event_ts, period_start, period_end,
   amount_local, currency_code, amount_usd, status, source_event_id)
SELECT
   m.account_id, m.subscription_id, 'usage_charge', now(),
   :period_start, :period_end,
   sum(m.amount_local), max(m.currency_code), sum(m.amount_usd),
   'pending',
   'usage:' || m.subscription_id || ':' || :period_start   -- stable key
FROM fct_usage_meter m
WHERE m.subscription_id = :sub
  AND m.hour_bucket_ts >= :period_start
  AND m.hour_bucket_ts <  :period_end
GROUP BY m.account_id, m.subscription_id
ON CONFLICT (source_event_id) DO NOTHING;

The second act is the invoice itself — and it is, satisfyingly, just a pivoted SUM over the ledger. Every event type that landed in the period folds into its column; the grand total is the sum of all of them, which is identically the sum of the raw ledger rows. There is no balance being maintained, no running total to reconcile. The invoice is a photograph of the ledger taken at close, and if the photograph is ever doubted, you re-take it from the same rows and get the same picture.

SQL · CLOSE STEP 2 — LEDGER → INVOICE (CONDITIONAL AGGREGATION)

INSERT INTO fct_invoices
  (subscription_id, period_start, period_end, subscription_total, usage_total,
   overage_total, proration_net, credits_total, total_due_usd, currency_code, status)
SELECT
   subscription_id, :period_start, :period_end,
   sum(amount_usd) FILTER (WHERE event_type = 'subscription_charge') AS subscription_total,
   sum(amount_usd) FILTER (WHERE event_type = 'usage_charge')        AS usage_total,
   coalesce(sum(amount_usd) FILTER (WHERE event_type = 'usage_charge'
                                AND amount_usd > 0), 0)               AS overage_total,
   sum(amount_usd) FILTER (WHERE event_type = 'proration')          AS proration_net,
   sum(amount_usd) FILTER (WHERE event_type IN ('credit','refund'))   AS credits_total,
   sum(amount_usd)                                                     AS total_due_usd,
   max(currency_code), 'open'
FROM fct_billing_events
WHERE subscription_id = :sub
  AND period_start = :period_start
GROUP BY subscription_id
ON CONFLICT (subscription_id, period_start)
DO UPDATE SET total_due_usd = EXCLUDED.total_due_usd,  -- re-derive, never drift
              usage_total   = EXCLUDED.usage_total,
              proration_net = EXCLUDED.proration_net;

Eventual consistency everywhere in the pipeline; determinism at the moment of the invoice. The meter may lag, the FX feed may stutter — but the close is a pure function of the ledger, and a pure function is reproducible.METERING RULE Nº 2 — THE CLOSE IS A FUNCTION

§ 07 — ANALYTICS SQLInterrogating the revenue

The snapshot and the ledger are where the business explains itself. Three queries an interviewer loves, because each one carries a classic SQL pattern on its back — the MRR scan, the upsell threshold, and the cohort retention join.

MRR & ARR — the one-scan snapshot read

The reason the daily snapshot earns its place: monthly recurring revenue is a single filtered SUM over today's partition. No ledger replay, no SCD2 walk, no window functions — active subscriptions, today, summed. Trials contribute nothing because their mrr_usd is zero by construction, so the number is clean without a special case.

SQL · MRR / ARR — FILTERED AGGREGATION OVER ONE DAY

SELECT
    count(*)                       AS active_subs,
    round(sum(mrr_usd), 2)        AS mrr_usd,
    round(sum(mrr_usd) * 12, 2)   AS arr_usd
FROM   fct_subscription_state
WHERE  day = CURRENT_DATE
  AND  status = 'active';          -- trials are mrr_usd = 0, excluded for free

The upsell signal — threshold over a rolling window

Customer success wants accounts about to blow past their plan tier — the expansion trigger. This is a rolling-window aggregation of the meter joined to the plan cap, filtered at the 85% line. The shape is the classic "current usage versus an allowance" pattern: aggregate the fast fact over a trailing window, join the slow fact for the limit, and surface the ratio.

SQL · ACCOUNTS APPROACHING THEIR CAP — UPSELL CANDIDATES

WITH usage_30d AS (
    SELECT subscription_id, meter_id,
           sum(billable_quantity) AS used
    FROM   fct_usage_meter
    WHERE  hour_bucket_ts >= CURRENT_DATE - INTERVAL '30 days'
    GROUP BY subscription_id, meter_id
),
plan_caps AS (
    SELECT s.subscription_id, p.meter_id, p.included_quantity
    FROM   fct_subscription_state s
    JOIN   dim_plan_meter_limits p USING (plan_id)
    WHERE  s.day = CURRENT_DATE AND s.status = 'active'
)
SELECT u.subscription_id, u.meter_id, u.used, c.included_quantity,
       round(100.0 * u.used / nullif(c.included_quantity, 0), 1) AS pct_of_cap
FROM   usage_30d u
JOIN   plan_caps c USING (subscription_id, meter_id)
WHERE  u.used > c.included_quantity * 0.85
ORDER BY pct_of_cap DESC;
-- >85% of cap = upsell trigger; >100% = overage already hitting next invoice.

Net revenue retention — cohort retention join

Sales lives on NRR: does a signup cohort's revenue grow as it ages, expansion outrunning churn? The pattern is cohort retention — anchor each subscription to its first-seen quarter and its starting MRR with a window function, join to its current MRR, and ratio the cohort sums. Above 100% is the SaaS holy grail: the cohort is worth more today than the day it arrived.

SQL · NRR BY SIGNUP COHORT — FIRST_VALUE + COHORT JOIN

WITH cohort_baseline AS (
    SELECT subscription_id,
           date_trunc('quarter', min(day) OVER w)                AS cohort_q,
           first_value(mrr_usd) OVER w                           AS starting_mrr
    FROM   fct_subscription_state
    WINDOW w AS (PARTITION BY subscription_id ORDER BY day
                ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
),
current_state AS (
    SELECT subscription_id, mrr_usd AS current_mrr
    FROM   fct_subscription_state
    WHERE  day = CURRENT_DATE
)
SELECT cb.cohort_q,
       count(DISTINCT cb.subscription_id)                       AS cohort_size,
       round(sum(cb.starting_mrr), 0)                            AS starting_mrr,
       round(sum(coalesce(cs.current_mrr, 0)), 0)               AS current_mrr,
       round(100.0 * sum(coalesce(cs.current_mrr, 0))
                   / nullif(sum(cb.starting_mrr), 0), 1)         AS nrr_pct
FROM   cohort_baseline cb
LEFT JOIN current_state cs USING (subscription_id)
GROUP BY cb.cohort_q
ORDER BY cb.cohort_q;
-- NRR > 100% = expansion outpaces churn. The number boards live and die on.

§ 08 — THE DASHBOARDProving the books balance

A senior design ends with observability, because every guarantee above is invisible without it. The billing dashboard watches three things a healthy system keeps true: revenue is growing, money is flowing, and the ledger reconciles to the gateway to the cent.

REVENUE HEALTH: MRR and net-new MRR (expansion − churn), trial→paid conversion by cohort, NRR trend — the numbers that say the business is compounding, read straight off the daily snapshot.
BILLING PIPELINE: meter ingest lag (raw → hour), unbilled usage sitting in the meter before close, proration events/day, dunning queue depth — a spike in unbilled usage means the close job is behind, not that customers stopped using the product.
RECONCILIATION: ledger SUM vs gateway settled (must match to the cent), duplicate webhooks suppressed, FX restatement attempts (should be zero — a non-zero count is a bug), invoice re-derivation drift.

Billing Ops — Global MON 00:10 UTC · MONTH CLOSE +1 · 60s REFRESH

MRR

8.42M$

Net-New MRR

+182K$

Meter Ingest Lag

6.8min

Dunning Queue

412

Revenue by Stream — May close · subscription vs usage (USD, daily accrual)

Trial→Paid

28%

Unbilled Usage

1.9M$

Ledger vs Gateway

$0.00

Dup Webhooks Killed

3.1k/h

FX Restatements

NRR (TTM)

118%

FIG. 2 — The story a healthy close tells: MRR compounding, NRR above 100, ledger reconciling to the gateway at exactly $0.00 — and the two amber tiles, ingest lag and unbilled usage, climbing together because the close job is mid-run, not because anything is wrong.

Read the amber tiles together and the dashboard narrates a month-end close in progress: usage is being rolled to charges, so unbilled usage is draining while ingest lag breathes — and the reconciliation row stays at zero throughout, because the invoice is a SUM of an immutable ledger and there is nothing for it to disagree with. That is what a designed billing system looks like from the controller's chair: the numbers move, but they never stop tying out.

§ 09 — THE RUBRICWhat was actually being tested

Strip the billing details away and the question was testing five judgments, each of which generalizes far beyond SaaS:

GRAIN: Seeing two facts where the prompt says one — and letting each stream's volume and volatility, not a shared schema, pick its own resolution. Hourly for the firehose, daily snapshot for the slow state.
IMMUTABILITY: Refusing to edit money. A plan change appends two rows; a failed payment appends a dunning row; nothing is ever overwritten — so the invoice is a SUM and the ledger is an audit trail by construction.
IDEMPOTENCY: Deriving stable keys from the event, not the network. UNIQUE(source_event_id) turns every retry, replay, and backfill into a harmless no-op — the only honest stance toward at-least-once delivery.
SEPARATION: Recording usage all month without billing it; joining the two streams only at close. Three teams read the same meter, and none of their reads are mistaken for charges.
HONESTY: Engineering the promise: FX locked at usage time, the invoice a pure function of the ledger, eventual consistency in the pipeline and determinism at the statement. The usage may be approximate; the bill may not.

Usage is approximate; the invoice is exact. The art is keeping them in separate tables until the single instant they must agree — and making that agreement a SUM, not a recompute.— CLOSING ARGUMENT