The Royalty Problem — PaddySpeaks

§ 01 — THE QUESTIONOne stream, two ledgers

It sounds like an analytics question — store the plays, count them. It is really an accounting question wearing a product's clothes, and the moment money attaches to an event, every modeling shortcut becomes a future lawsuit.

Interview Prompt

"Model Spotify's listening history to power recommendations and Year in Review, and to pay per-stream royalties to multiple rights holders — artist, label, publisher, songwriter — under the pool model, where each country-period's revenue is divided by its total qualified streams. How would you scope it?"

LEVEL · SENIOR / STAFFDURATION · 45 MINFORMAT · WHITEBOARD

The trap is to hear one verb — "store the plays" — and reach for one table. But a single play is consumed by two organizations that disagree about what it even is. To Product, a play is behavioral signal: every tap counts, and a three-second skip is the most useful data point of all, because it teaches the recommender what not to do. To Finance, that same skip does not exist: industry custom pays royalty only on streams of thirty seconds or more, so a skip is a non-event with a dollar value of exactly zero. The fact table has to satisfy both readings without forking into two sources of truth that drift apart by the next quarter.

Then the pool model adds a genuinely strange dependency. A stream's payout is not a price you can look up when it happens. It is revenue ÷ total qualified streams in the country-period — and the denominator is unknowable until the period ends and the last late-arriving offline play has landed. The rate a stream earns in April is only computable in May. That single fact reorders the whole architecture.

THE EVENT GRAIN: Plays. Billions of rows a day, append-only, every tap including skips. Owned by Product for recommendations; the raw material Finance filters down. Carries the qualified flag that is the entire bridge between the two readings.
THE POOL GRAIN: Country-period. One row per (country × month). Holds the frozen pool size and — the prize — the per_stream_rate, which is derived exactly once at close and then is immutable forever. The denominator made durable.
THE PAYOUT GRAIN: Attribution. One row per (qualified play × rights holder). About four rows per eligible stream. Generated in a single bulk write at period close, then read-only. The defensible per-play line item a label's auditor can subpoena.

A play is behavior to Product and a liability to Finance. The thirty-second flag is the seam where those two readings are sewn into one row.

Scoping out loud

Scope is the first thing scored, and it is mostly about what you refuse to build. In scope: the play event, the rights bridge, the pool close, the attribution output, and the queries each stakeholder lives on. Explicitly out of scope, said aloud: the recommender model itself (it consumes the play fact, it does not live here), the audio pipeline, subscriber billing internals (we take collected revenue as an input), and tax withholding on payouts. One deliberate non-goal worth naming: I am not hard-coding pro-rata. The schema must also express user-centric payout (UCPS) — where a subscriber's fee flows only to artists they personally played — without a rewrite, because that is the live policy debate in the industry and an interviewer will probe it.

Then the envelope math, volunteered rather than dragged out. Spotify-shaped numbers:

Quantity	Estimate	Consequence
Monthly active listeners	~600 M	Sets the play-event firehose and the user dimension size
Plays per day	≈ 4 B	Append-only event grain; partition by play_date or it is unqueryable
Qualified-stream fraction	~55–70%	The flag that splits Product's denominator from Finance's
Rights holders per track	~4	Attribution rows ≈ 4 × qualified plays — linear, but 4× the firehose
Country pools / month	~180	Each freezes one revenue figure and one per-stream rate at close
Payout pool share of revenue	≈ 70%	The ~30% Spotify retains; pool = collected revenue × 0.70
Late offline streams	days after play	Attribution must key on month-of-event, never month-of-ingest

Notice the asymmetry. The play firehose is enormous and cheap to be wrong about — a dropped skip costs nobody anything. The pool table is tiny, a few thousand rows a year, and catastrophic to be wrong about, because every payout in the country divides by its numbers. That inversion — vast-but-forgiving against tiny-but-sacred — is the spine of everything below. The architecture follows the dollars, not the bytes.

§ 02 — DATA FLOWFrom a tap to a line item

Two tempos share one spine. The hot path is a continuous play stream that fans out to recommendations the instant it lands. The cold path is a once-a-month batch that freezes a pool, joins a temporal bridge, and mints payouts that never change again.

FIG. 1 — Two paths off one event. Recommendations are eventual and forgiving; payouts are monthly and frozen. The play lake is the shared spine both consume.

Three properties of this picture carry the interview. First, the hot path never waits for money: a play reaches the recommender in milliseconds and lands in fct_play_events with its qualified flag already computed, so Product is never blocked on Finance's monthly cadence. Second, the cold path is a pure function of frozen inputs — collected revenue, the qualified-stream count, and the rights bridge as it stood at play time — which is what lets an auditor replay any month from scratch and land on the identical number. Third, fraud is a side-flag, never a delete: the anti-fraud Spark job sets is_fraudulent on offending rows so the close excludes them, but the original play survives, because "what was claimed versus what was paid" is itself a reportable, litigable fact.

The Failure Philosophy, In One Rule

The hot path may forget; the cold path may never lie. A lost or duplicated play barely moves a recommendation and is invisible against billions of events, so the stream is at-least-once and best-effort. A wrong payout is a breach of a rights contract, so the close is deterministic, idempotent, and reproducible from immutable inputs. When the two philosophies conflict, the money wins — the recommender can tolerate noise the ledger cannot.

§ 03 — DATA MODELOne flag, one bridge, two frozen facts

The schema falls out of who owns what. Product owns the event grain. Finance owns three derived facts: the pool, the attribution, and the rights bridge that decides who gets paid. The cleverness is concentrated in two columns — a generated boolean and a pair of effective dates.

The event grain — every tap, qualified or not

A play is a small, immutable record at colossal volume. The decisive choices are the generated is_royalty_eligible column — the thirty-second boundary computed once at write so no downstream query can forget it — and context_type, which tells the recommender which surface drove the play. Note what is not here: no payout amount. A play does not know what it earns until its country-period closes, so storing a dollar figure on this row would be a lie with a timestamp.

DDL · EVENT GRAIN (LAKE / COLUMNAR)

-- The "Listen" grain. One row per play event, skips included.
-- Append-only. Partitioned by play_date so a single month of a single
-- country is a partition prune, not a full scan of four billion rows/day.
CREATE TABLE fct_play_events (
    play_id            BIGINT       NOT NULL,   -- snowflake: time-ordered
    user_id            BIGINT       NOT NULL,
    track_id           BIGINT       NOT NULL,
    country_id         CHAR(2)      NOT NULL,   -- ISO country; the pool key
    played_at          TIMESTAMPTZ  NOT NULL,   -- event time, NOT ingest time
    duration_played_s  INTEGER      NOT NULL,
    -- The seam between Product and Finance, materialized once at write:
    is_royalty_eligible BOOLEAN
        GENERATED ALWAYS AS (duration_played_s >= 30) STORED,
    is_fraudulent      BOOLEAN      NOT NULL DEFAULT false,  -- side-flag
    context_type       TEXT         NOT NULL    -- playlist|algorithmic|search|radio
                       CHECK (context_type IN
                          ('playlist','algorithmic','search','radio')),
    source_playlist_id BIGINT,
    device_id          BIGINT
)
PARTITION BY RANGE (played_at);

That one generated column does heavy lifting. Product queries the table raw — every row is signal. Finance adds WHERE is_royalty_eligible AND NOT is_fraudulent and is instantly looking at the payable universe, from the same source of truth, with no parallel "finance plays" table to drift out of sync. The thirty-second rule lives in the schema, not in a hundred analysts' memories.

The rights bridge — a ledger of deals, not a lookup

Here is where most candidates lose the plot. The split of a stream — how much to artist, label, publisher, songwriter — is not a property of the track. It is a property of the deal that was in force when the stream happened. Deals renegotiate mid-quarter; a label might cede 5% to an artist effective the 15th. So the bridge is Slowly-Changing-Dimension Type 2: every share is bounded by effective_from and effective_to, and history is never overwritten.

DDL · THE DEAL-TERMS LEDGER (SCD2)

-- bridge_track_rights: who owns what share of a track, and WHEN.
-- The contract: SUM(share_pct) per (track, rights_type) over any instant
-- = 1.00. Rows are closed, never updated; a renegotiation closes the old
-- row (sets effective_to) and inserts a new one.
CREATE TABLE bridge_track_rights (
    track_id        BIGINT       NOT NULL,
    rights_holder_id BIGINT      NOT NULL,
    rights_type     TEXT         NOT NULL
                    CHECK (rights_type IN
                       ('artist','label','publisher','songwriter')),
    share_pct       NUMERIC(6,5) NOT NULL      -- 0.45000 = 45%
                    CHECK (share_pct >= 0 AND share_pct <= 1),
    effective_from  TIMESTAMPTZ  NOT NULL,
    effective_to    TIMESTAMPTZ  NOT NULL DEFAULT '9999-12-31',
    PRIMARY KEY (track_id, rights_holder_id, rights_type, effective_from)
);
-- Half-open intervals: [from, to). played_at >= from AND played_at < to.
-- This makes the boundary instant unambiguous — a play at exactly the
-- renegotiation timestamp belongs to the NEW deal, never both.
CREATE INDEX idx_bridge_asof
    ON bridge_track_rights (track_id, effective_from, effective_to);

The two frozen facts — pool and payout

The pool table is the smallest and most important object in the design. It exists to make the denominator durable. At month-close the job computes pool_usd as collected revenue times the payout share (~70%), counts qualified non-fraud streams, divides, and writes per_stream_rate_usd — then sets frozen_at and never touches the row again. The attribution table is the output: one defensible line per (play × holder), produced in a single bulk write.

DDL · THE FROZEN POOL & THE PAYOUT LINE ITEM

-- The "Pool" grain. One row per (country × period). The denominator,
-- made into a stored fact so it can be replayed and audited.
CREATE TABLE fct_royalty_periods (
    period_id           TEXT         PRIMARY KEY,  -- 'US_2025_04'
    country_id          CHAR(2)      NOT NULL,
    period_start        DATE         NOT NULL,
    period_end          DATE         NOT NULL,
    collected_rev_usd   NUMERIC(18,2) NOT NULL,
    pool_usd            NUMERIC(18,2) NOT NULL,   -- ≈ collected_rev × 0.70
    total_qualified_plays BIGINT     NOT NULL,
    per_stream_rate_usd NUMERIC(14,10) NOT NULL,  -- pool ÷ qualified plays
    frozen_at           TIMESTAMPTZ,                -- null until close; then immutable
    CHECK (period_end > period_start)
);

-- The "Payout" grain. One row per (qualified play × rights holder),
-- generated at close. ~4 rows per eligible stream. Read-only after write.
CREATE TABLE fct_royalty_attributions (
    play_id          BIGINT       NOT NULL,
    period_id        TEXT         NOT NULL REFERENCES fct_royalty_periods,
    rights_holder_id BIGINT       NOT NULL,
    rights_type      TEXT         NOT NULL,
    share_pct        NUMERIC(6,5) NOT NULL,    -- snapshot of the as-of-play share
    attributed_payout_usd NUMERIC(14,10) NOT NULL, -- rate × share_pct
    PRIMARY KEY (play_id, rights_holder_id, rights_type)
);

One subtlety worth saying out loud: the attribution row stores share_pct as a snapshot, not a foreign key back into the bridge. The bridge is the source of truth at compute time, but freezing the resolved share onto the payout row means a future bridge correction — even a legitimate one — cannot silently restate a payout that has already been remitted and reported to a tax authority. The line item is a photograph, not a live view.

§ 04 — THE CORE INVARIANTThe rate that only exists at close

Every correctness property of this system reduces to one discipline: the per-stream rate is computed once, from frozen inputs, after the period is sealed — and never again. Get the timing of that single division right and the rest is bookkeeping.

The pool model has a chicken-and-egg shape that breaks naive designs. You cannot pay a stream when it happens, because its rate is pool ÷ total_qualified_plays, and the total is unknown until the last play of the period — including offline plays uploaded days late — has arrived. A junior model computes a "running rate" daily and pays against it; by month-end that rate has drifted, early payouts are wrong, and Finance is reconciling against a number that was never real. The senior move is to refuse to compute the rate until the denominator is final, and to make the period's transition through close an explicit, observable lifecycle.

The period lifecycle

OPEN→ CUTOFF→ RECONCILING→ FROZEN→ PAID

OPEN — plays land freely against the month-of-event. CUTOFF — a grace window closes (say, period_end + 7 days) to admit late offline uploads that still belong to the period. RECONCILING — fraud flags are finalized, revenue is confirmed, the qualified count is locked. FROZEN — the single division runs, per_stream_rate_usd and frozen_at are written, and the row becomes immutable. PAID — attribution is generated and remitted. A late chargeback or a discovered fraud cluster after FROZEN is never an edit to this period; it is a correcting entry in a subsequent period's pool, exactly as a double-entry ledger appends a reversal rather than rewriting history.

The rate is not a number you read. It is a number you mint, once, when the period is sealed — and then defend forever.ROYALTY RULE Nº 1 — FREEZE BEFORE YOU DIVIDE

The division itself is trivial; the discipline is in everything that must be true before it runs. Below is the atomic close — note it reads only frozen, period-bounded inputs and writes the rate exactly once, guarded so it can never double-run.

SQL · THE ATOMIC POOL CLOSE

-- Compute and freeze the per-stream rate for one country-period.
-- Idempotent: the WHERE frozen_at IS NULL guard means a retry after a
-- partial failure is a no-op, never a second (different) rate.
UPDATE fct_royalty_periods rp
SET    total_qualified_plays = q.cnt,
       pool_usd              = round(rp.collected_rev_usd * 0.70, 2),
       per_stream_rate_usd   = round(rp.collected_rev_usd * 0.70, 2)
                               / nullif(q.cnt, 0),
       frozen_at             = now()
FROM (
    SELECT count(*) AS cnt
    FROM   fct_play_events
    WHERE  country_id = :country
      AND  played_at >= :period_start
      AND  played_at <  :period_end_plus_grace   -- month-of-event window
      AND  is_royalty_eligible                       -- the 30-second boundary
      AND  NOT is_fraudulent                        -- side-flagged bots excluded
) q
WHERE rp.period_id = :period_id
  AND rp.frozen_at IS NULL;            -- freeze-once guard IS the lock

Once that row is frozen, the per-stream rate is a constant the rest of the pipeline multiplies against. Attribution becomes a join: each qualified play, times its as-of-play shares, times the frozen rate. The hard part is over before the first dollar is split.

§ 05 — INGESTION & STREAMSPython on the play firehose

Three small programs carry the hot path: the ingest that normalizes a play and stamps the boundary, the consumer that fans plays out without blocking, and the fraud pass that flags without deleting. As always, the judgment is in what each refuses to do.

1 · Play ingest — stamp event time, derive the boundary

The single most consequential rule in ingestion is invisible: trust the play's own clock, not the wall clock at the server. Offline plays — a subway commute cached on the phone — arrive in a batch days later but belong to the month they actually happened. If ingest stamped played_at with arrival time, those streams would attribute to the wrong pool and quietly corrupt two months' payouts at once. The client's timestamp is authoritative; the server only validates it is sane.

PYTHON · PLAY INGEST → STREAM

import time

ROYALTY_THRESHOLD_S = 30        # the industry boundary, in one place
MAX_CLOCK_SKEW_S    = 90 * 86400  # reject plays "from" >90d ago/future

def normalize_play(raw: dict, now_s: float) -> dict | None:
    """Take a raw client event, return a stream record — or None to drop.
    We DO NOT invent played_at: an offline play that happened Tuesday must
    bind to Tuesday's pool even if it uploads Friday. The client clock is
    authoritative; we only sanity-bound it."""
    played_at = raw["played_at_ms"] / 1000.0
    if abs(now_s - played_at) > MAX_CLOCK_SKEW_S:
        return None                      # implausible clock: drop, alarm on rate

    dur = int(raw["duration_played_s"])
    return {
        "play_id":   raw["play_id"],         # client-minted, for dedup
        "user_id":   raw["user_id"],
        "track_id":  raw["track_id"],
        "country_id": raw["country_id"],     # where the listener was
        "played_at": played_at,
        "duration_played_s": dur,
        # Derive the boundary at the edge too, so the recommender and
        # the lake agree before the generated column ever runs:
        "qualified": dur >= ROYALTY_THRESHOLD_S,
        "context_type": raw.get("context_type", "radio"),
    }

The play_id is minted on the client, which makes the stream idempotent: a phone that retries an upload after a flaky connection produces the same play_id, and the lake's de-duplication on that key collapses the retry. At-least-once delivery plus a client key equals effectively-once payouts — the only acceptable guarantee when each row is money.

2 · The fan-out consumer — Product never waits on Finance

PYTHON · STREAM CONSUMER → RECOMMENDER + LAKE

async def consume(stream, recommender, lake) -> None:
    """Every play feeds the recommender immediately (skips are signal) and
    is appended to the lake. The two sinks are independent: a slow lake
    write must never delay a recommendation, and a recommender hiccup must
    never drop a payable play. Different durability, different urgency."""
    async for batch in stream.read():
        plays = [p for p in (normalize_play(r, time.time()) for r in batch)
                 if p is not None]

        # Fire-and-forget to the recommender: best-effort, low-latency.
        recommender.observe(plays)            # skips included — the whole point

        # Durable append to the lake: at-least-once, deduped on play_id.
        await lake.append_idempotent(plays, key="play_id")
        # NOTE: we never compute payout here. A play's dollar value does not
        # exist yet — it is born only when its country-period is frozen.

The comment at the bottom is the carve-out an interviewer is listening for. The temptation, having a play in hand, is to look up "the rate" and write a payout immediately. There is no rate yet. Resisting that — keeping the hot path entirely free of money — is what keeps the two tempos from contaminating each other.

3 · The fraud pass — flag, never delete

PYTHON · ANTI-FRAUD SIDE-FLAGGER (BATCH)

def flag_fraud_clusters(spark, period_id: str) -> None:
    """Bot farms inflate small artists with 30-to-35-second loops on repeat.
    We detect and SIDE-FLAG — we never delete the rows. Legal needs the
    'claimed vs paid' trail; deleting destroys the ability to report
    'we caught X% of fraud'. is_fraudulent is set by THIS job, not the
    player, and the close excludes flagged rows from the denominator."""
    plays = spark.table("fct_play_events").filter(
        f"period_id = '{period_id}' AND is_royalty_eligible")

    # Heuristic: same (user, track) replayed many times at near-threshold
    # duration is the canonical loop-attack signature.
    suspects = (plays
        .filter("duration_played_s BETWEEN 30 AND 35")
        .groupBy("user_id", "track_id")
        .count()
        .filter("count > 200"))          # tune per market; this is a sketch

    spark.sql(f"""
        UPDATE fct_play_events SET is_fraudulent = true
        WHERE play_id IN (SELECT play_id FROM suspect_plays)
    """)
    # The only UPDATE in the design — and it sets a flag, never erases a fact.

§ 06 — AGGREGATIONThe period close, end to end

The monthly close is the system's heartbeat. It freezes a pool, resolves every qualified play against the rights bridge as it stood at play time, multiplies by the frozen rate, and writes a few billion immutable line items in one bulk pass. Run twice, it produces the identical result.

The close is deliberately a batch, not a stream, because it requires a property streams cannot give: a sealed input. Only after CUTOFF and RECONCILING is the qualified count final and the revenue confirmed; only then can the rate be honest. The whole job is structured as a pure function of frozen inputs so that auditors, finance, and a future re-run all land on the same dollar.

PYTHON · THE CLOSE ORCHESTRATION

def close_period(engine, country: str, period_id: str) -> None:
    """Deterministic, idempotent month-close for one country-period.
    Ordering is the contract: you cannot attribute before the rate is
    frozen, and you cannot freeze the rate before fraud is finalized."""
    # 1. Gate: the period must be reconciled. Refuse otherwise — better to
    #    fail loudly than to pay against a denominator still in motion.
    assert engine.period_state(period_id) == "RECONCILING", \
        f"{period_id} not ready to close"

    # 2. Freeze the denominator and mint the rate (the atomic SQL, §04).
    engine.execute(POOL_CLOSE_SQL, country=country, period_id=period_id)
    rate_row = engine.fetch_period(period_id)
    assert rate_row.frozen_at is not None

    # 3. Generate attribution: each qualified play × as-of-play shares ×
    #    frozen rate. A single INSERT...SELECT, partition-pruned to the
    #    period. Re-runnable: it writes into a fresh partition we swap in.
    engine.execute(ATTRIBUTION_SQL, period_id=period_id)

    # 4. Prove it before paying: every qualified play's shares summed
    #    across holders must reconstruct the rate to the penny.
    drift = engine.scalar(ATTRIBUTION_RECON_SQL, period_id=period_id)
    if drift > 0.01:
        engine.execute("ROLLBACK_PARTITION", period_id=period_id)
        raise PayoutReconciliationError(period_id, drift)

    # 5. Only now: advance to PAID and hand off to remittance.
    engine.set_period_state(period_id, "PAID")

The attribution generation itself is the join the whole schema was built to make cheap. It reads the frozen rate, fans each eligible play across its rights holders using the bridge bounded by play time, and computes the per-line payout. The half-open interval predicate is what makes a play at the exact renegotiation instant resolve to one deal, never two.

SQL · ATTRIBUTION GENERATION (ONE BULK WRITE)

INSERT INTO fct_royalty_attributions
    (play_id, period_id, rights_holder_id, rights_type, share_pct,
     attributed_payout_usd)
SELECT
    p.play_id,
    rp.period_id,
    b.rights_holder_id,
    b.rights_type,
    b.share_pct,                                   -- snapshot the share
    round(rp.per_stream_rate_usd * b.share_pct, 10) AS attributed_payout_usd
FROM fct_play_events p
JOIN fct_royalty_periods rp
  ON rp.country_id = p.country_id
 AND p.played_at >= rp.period_start
 AND p.played_at <  rp.period_end + INTERVAL '7 days'   -- grace window
JOIN bridge_track_rights b
  ON b.track_id = p.track_id
 AND p.played_at >= b.effective_from
 AND p.played_at <  b.effective_to            -- as-of-play, half-open
WHERE rp.period_id = :period_id
  AND rp.frozen_at IS NOT NULL                -- never attribute an unfrozen pool
  AND p.is_royalty_eligible
  AND NOT p.is_fraudulent;

Pro-rata and user-centric are not two schemas. They are two queries over the same frozen pool and the same temporal bridge — which is the entire reason the rate and the bridge are stored facts.ROYALTY RULE Nº 2 — THE POLICY IS A JOIN, NOT A TABLE

That last rule is the senior flourish. Because the pool, the rate, and the rights bridge are all durable facts rather than logic baked into a pipeline, switching from pro-rata (a stream earns rate × share) to user-centric payout (a subscriber's fee splits only among artists they played) is a different aggregation over identical inputs — no migration, no restatement of closed months. The schema deliberately does not know which distribution algorithm it serves, which is exactly what lets the business change its mind without rewriting history.

§ 07 — ANALYTICS SQLInterrogating the catalogue

The frozen facts are where the system explains itself to three different audiences. Three queries an interviewer loves, because each carries a named pattern on its back: an as-of-time temporal join, conditional aggregation, and window-ranked cohort share.

Royalty attribution as-of-play — the SCD2 temporal join

This is the defensible line item: for one track in one period, who earned what, using the splits that were in force when each play happened. The pattern is the as-of join — a range predicate against the bridge's half-open validity interval. It is the answer to "pay me as of the deal that was active when my song was streamed," which is the only answer that survives a courtroom.

SQL · WHO EARNED WHAT, BY THE DEAL THAT WAS LIVE

-- Per-play, per-holder payout for one track in one period, using the
-- bridge row that was effective at played_at. Produces ~4 rows per play.
SELECT
    p.play_id,
    p.played_at,
    b.rights_type,
    b.rights_holder_id,
    b.share_pct,
    rp.per_stream_rate_usd,
    round(rp.per_stream_rate_usd * b.share_pct, 6) AS payout_usd
FROM fct_play_events p
JOIN fct_royalty_periods rp
  ON rp.country_id = p.country_id
 AND p.played_at >= rp.period_start
 AND p.played_at <  rp.period_end + INTERVAL '7 days'
JOIN bridge_track_rights b
  ON b.track_id = p.track_id
 AND p.played_at >= b.effective_from
 AND p.played_at <  b.effective_to              -- the as-of-play predicate
WHERE p.track_id = :track_id
  AND rp.period_id = :period_id
  AND p.is_royalty_eligible
  AND NOT p.is_fraudulent
ORDER BY p.played_at, b.rights_type;

Skip rate by surface — conditional aggregation

Product's question: is the recommender any good? Skip rate by context_type answers it, and the pattern is conditional aggregation — counting a sub-population with FILTER rather than a second pass. A high skip rate on algorithmic is a recommender-quality alarm; the same rate on radio is expected lean-back behavior. The thirty-second flag does double duty: a non-qualified play is a skip.

SQL · IS THE RECOMMENDER EARNING ITS SURFACE?

SELECT
    context_type,
    count(*)                                              AS plays,
    count(*) FILTER (WHERE NOT is_royalty_eligible)   AS skips,
    round(100.0 * count(*) FILTER (WHERE NOT is_royalty_eligible)
                / nullif(count(*), 0), 2)               AS skip_rate_pct,
    round(avg(duration_played_s), 1)                    AS avg_listen_s
FROM fct_play_events
WHERE played_at >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY context_type
ORDER BY skip_rate_pct DESC;
-- High on 'algorithmic' = a recommender problem. High on 'radio' =
-- lean-back listening, expected. The surface tells you which.

The long tail — windowed cohort share

The pool model's central equity question: how concentrated are payouts? This ranks holders by earnings within a period and uses a window to compute each one's running share of the total — the gateway to "the top 1% of artists take N% of the pool," which is the headline pro-rata's critics aim at and the chart UCPS exists to change. The pattern is the windowed cumulative share, a cohort-concentration staple.

SQL · PAYOUT CONCENTRATION ACROSS THE CATALOGUE

WITH holder_take AS (
    SELECT rights_holder_id,
           sum(attributed_payout_usd) AS earned_usd
    FROM   fct_royalty_attributions
    WHERE  period_id = :period_id
    GROUP BY rights_holder_id
),
ranked AS (
    SELECT rights_holder_id, earned_usd,
           row_number() OVER (ORDER BY earned_usd DESC) AS rnk,
           sum(earned_usd) OVER (ORDER BY earned_usd DESC
               ROWS UNBOUNDED PRECEDING)              AS cum_usd,
           sum(earned_usd) OVER ()                     AS pool_total
    FROM   holder_take
)
SELECT rnk, rights_holder_id,
       round(earned_usd, 2)                              AS earned_usd,
       round(100.0 * cum_usd / nullif(pool_total, 0), 2) AS cum_pct_of_pool
FROM   ranked
WHERE  rnk <= 1000
ORDER BY rnk;

§ 08 — THE DASHBOARDWatching a close land

A royalty system is judged on a few days a month, when a period closes and money moves. The dashboard watches two things at once: the always-on play firehose, and the once-a-month close that must reconcile to the penny before a single payout is remitted.

THE FIREHOSE: play ingest rate against expected, qualified-stream fraction (a sudden drop means a client-clock bug or a skip surge), late-arrival lag (offline uploads still landing against last period), and dedup collision rate on play_id — the proof that retries are collapsing.
THE CLOSE: period state across countries, reconciliation drift (attribution summed back to the pool — must be ~$0), fraud-excluded % of streams, and per-stream rate versus prior period — a rate that swings sharply flags a revenue or denominator anomaly before payout.
THE EQUITY LENS: top-1% pool share, long-tail count earning below a floor, and pro-rata vs UCPS delta — the simulation that tells policy what switching models would cost which artists.

Royalty Ops — US · 2025-04 close MON 06:10 UTC · STATE: RECONCILING → FROZEN · 5m REFRESH

Play Ingest

46.2k/s

Qualified Fraction

63%

Recon Drift

$0.004

Per-Stream Rate

0.00417$

Pool Concentration — US 2025-04 · cumulative % of pool by holder rank

Fraud Excluded

0.7%

Late Arrival Lag

31h

Countries Frozen

168/180

Attribution Rows

7.9B

Dedup Collisions

0.04%

UCPS Δ (sim)

+12% tail

FIG. 2 — A close in flight. 168 of 180 pools are frozen; recon drift is four-tenths of a cent, well under the penny gate; the per-stream rate ticked up on confirmed revenue. Late-arrival lag is amber because offline uploads are still trickling against the grace window — exactly why CUTOFF exists.

Read the amber tiles together and the dashboard narrates the close from §06. Revenue firmed up, nudging the rate; the last twelve countries are still admitting late offline plays before they freeze; and the UCPS simulation quietly shows the long tail would gain twelve percent if the policy switched — the schema's whole point, surfaced as a number a product executive can act on. Recon drift flat near zero is the load-bearing tile: nothing gets paid until that reads green.

§ 09 — THE RUBRICWhat was actually being tested

Strip away the music and the question was probing five judgments, each of which generalizes far past streaming royalties:

ONE FACT, TWO READERS: Serving Product and Finance from a single event grain via a generated flag, rather than forking into two tables that silently disagree by quarter-end. The seam is a column, not a copy.
DERIVE-AT-CLOSE: Recognizing that the per-stream rate is unknowable until the denominator is sealed, and modeling it as a frozen fact computed once — not a running number that drifts and forces restatement.
TEMPORAL TRUTH: Paying by the deal that was live at play time through an SCD2 bridge and an as-of-time join, turning a renegotiation from an audit nightmare into a range predicate.
APPEND, NEVER EDIT: Side-flagging fraud and posting late corrections to a subsequent period instead of rewriting a closed one — the same immutability discipline a payments ledger lives by.
POLICY AS QUERY: Keeping the distribution algorithm out of the schema so pro-rata and user-centric are two joins over identical frozen inputs. The model that doesn't bake in the answer is the one that survives the business changing its mind.

A play is an event the moment it happens, but it does not become money until a country's month is sealed. The whole design is the discipline of keeping those two truths apart until exactly the instant they must meet.— CLOSING ARGUMENT