PaddySpeaks · Systems at the Whiteboard · Nº 08

The Royalty Problem

Model Spotify's listening history so that one fact serves recommendations and pays four rights holders per stream — under a pool where the per-stream rate doesn't exist until the month closes. One decision separates the senior answer: the rate is a stored fact, not a running number. A full working-through — data flow, schema, the 30-second boundary, the SCD2 rights bridge, the period-close pipeline, and the SQL that splits a fraction of a cent four ways.

§ 01 — THE QUESTIONOne stream, two ledgers

It sounds like an analytics question — store the plays, count them. It is really an accounting question wearing a product's clothes, and the moment money attaches to an event, every modeling shortcut becomes a future lawsuit.

Interview Prompt

"Model Spotify's listening history to power recommendations and Year in Review, and to pay per-stream royalties to multiple rights holders — artist, label, publisher, songwriter — under the pool model, where each country-period's revenue is divided by its total qualified streams. How would you scope it?"

LEVEL · SENIOR / STAFFDURATION · 45 MINFORMAT · WHITEBOARD

The trap is to hear one verb — "store the plays" — and reach for one table. But a single play is consumed by two organizations that disagree about what it even is. To Product, a play is behavioral signal: every tap counts, and a three-second skip is the most useful data point of all, because it teaches the recommender what not to do. To Finance, that same skip does not exist: industry custom pays royalty only on streams of thirty seconds or more, so a skip is a non-event with a dollar value of exactly zero. The fact table has to satisfy both readings without forking into two sources of truth that drift apart by the next quarter.

Then the pool model adds a genuinely strange dependency. A stream's payout is not a price you can look up when it happens. It is revenue ÷ total qualified streams in the country-period — and the denominator is unknowable until the period ends and the last late-arriving offline play has landed. The rate a stream earns in April is only computable in May. That single fact reorders the whole architecture.

THE EVENT GRAIN
Plays. Billions of rows a day, append-only, every tap including skips. Owned by Product for recommendations; the raw material Finance filters down. Carries the qualified flag that is the entire bridge between the two readings.
THE POOL GRAIN
Country-period. One row per (country × month). Holds the frozen pool size and — the prize — the per_stream_rate, which is derived exactly once at close and then is immutable forever. The denominator made durable.
THE PAYOUT GRAIN
Attribution. One row per (qualified play × rights holder). About four rows per eligible stream. Generated in a single bulk write at period close, then read-only. The defensible per-play line item a label's auditor can subpoena.
A play is behavior to Product and a liability to Finance. The thirty-second flag is the seam where those two readings are sewn into one row.

Scoping out loud

Scope is the first thing scored, and it is mostly about what you refuse to build. In scope: the play event, the rights bridge, the pool close, the attribution output, and the queries each stakeholder lives on. Explicitly out of scope, said aloud: the recommender model itself (it consumes the play fact, it does not live here), the audio pipeline, subscriber billing internals (we take collected revenue as an input), and tax withholding on payouts. One deliberate non-goal worth naming: I am not hard-coding pro-rata. The schema must also express user-centric payout (UCPS) — where a subscriber's fee flows only to artists they personally played — without a rewrite, because that is the live policy debate in the industry and an interviewer will probe it.

Then the envelope math, volunteered rather than dragged out. Spotify-shaped numbers:

QuantityEstimateConsequence
Monthly active listeners~600 MSets the play-event firehose and the user dimension size
Plays per day≈ 4 BAppend-only event grain; partition by play_date or it is unqueryable
Qualified-stream fraction~55–70%The flag that splits Product's denominator from Finance's
Rights holders per track~4Attribution rows ≈ 4 × qualified plays — linear, but 4× the firehose
Country pools / month~180Each freezes one revenue figure and one per-stream rate at close
Payout pool share of revenue≈ 70%The ~30% Spotify retains; pool = collected revenue × 0.70
Late offline streamsdays after playAttribution must key on month-of-event, never month-of-ingest

Notice the asymmetry. The play firehose is enormous and cheap to be wrong about — a dropped skip costs nobody anything. The pool table is tiny, a few thousand rows a year, and catastrophic to be wrong about, because every payout in the country divides by its numbers. That inversion — vast-but-forgiving against tiny-but-sacred — is the spine of everything below. The architecture follows the dollars, not the bytes.


§ 02 — DATA FLOWFrom a tap to a line item

Two tempos share one spine. The hot path is a continuous play stream that fans out to recommendations the instant it lands. The cold path is a once-a-month batch that freezes a pool, joins a temporal bridge, and mints payouts that never change again.

HOT PATH · ~4B PLAYS/DAY · CONTINUOUS COLD PATH · ONCE PER MONTH · DETERMINISTIC PLAYERS tap · skip · offline cache PLAY INGEST stamp played_at + dur derive qualified ≥30s PLAY STREAM key: hash(user_id) partition by play_date RECOMMENDER · REAL-TIME every play, skips included fct_play_events · LAKE append-only · partitioned ANTI-FRAUD (SPARK) side-flags bots · never deletes SUBSCRIPTION REV collected $ per user·country PERIOD CLOSE JOB pool = rev × 70% rate = pool ÷ Σ qualified fct_royalty_periods FROZEN rate · denominator bridge_track_rights SCD2 · as-of played_at fct_royalty_attributions ~4 rows / qualified play bulk write · then read-only Σ qualified plays (month-of-event) · the denominator each play SOLID — sustained / batch flow · DASHED — the play lake feeding the close · The stream pays attention; the close pays money.
FIG. 1 — Two paths off one event. Recommendations are eventual and forgiving; payouts are monthly and frozen. The play lake is the shared spine both consume.

Three properties of this picture carry the interview. First, the hot path never waits for money: a play reaches the recommender in milliseconds and lands in fct_play_events with its qualified flag already computed, so Product is never blocked on Finance's monthly cadence. Second, the cold path is a pure function of frozen inputs — collected revenue, the qualified-stream count, and the rights bridge as it stood at play time — which is what lets an auditor replay any month from scratch and land on the identical number. Third, fraud is a side-flag, never a delete: the anti-fraud Spark job sets is_fraudulent on offending rows so the close excludes them, but the original play survives, because "what was claimed versus what was paid" is itself a reportable, litigable fact.

The Failure Philosophy, In One Rule

The hot path may forget; the cold path may never lie. A lost or duplicated play barely moves a recommendation and is invisible against billions of events, so the stream is at-least-once and best-effort. A wrong payout is a breach of a rights contract, so the close is deterministic, idempotent, and reproducible from immutable inputs. When the two philosophies conflict, the money wins — the recommender can tolerate noise the ledger cannot.


§ 03 — DATA MODELOne flag, one bridge, two frozen facts

The schema falls out of who owns what. Product owns the event grain. Finance owns three derived facts: the pool, the attribution, and the rights bridge that decides who gets paid. The cleverness is concentrated in two columns — a generated boolean and a pair of effective dates.

The event grain — every tap, qualified or not

A play is a small, immutable record at colossal volume. The decisive choices are the generated is_royalty_eligible column — the thirty-second boundary computed once at write so no downstream query can forget it — and context_type, which tells the recommender which surface drove the play. Note what is not here: no payout amount. A play does not know what it earns until its country-period closes, so storing a dollar figure on this row would be a lie with a timestamp.

DDL · EVENT GRAIN (LAKE / COLUMNAR)
-- The "Listen" grain. One row per play event, skips included. -- Append-only. Partitioned by play_date so a single month of a single -- country is a partition prune, not a full scan of four billion rows/day. CREATE TABLE fct_play_events ( play_id BIGINT NOT NULL, -- snowflake: time-ordered user_id BIGINT NOT NULL, track_id BIGINT NOT NULL, country_id CHAR(2) NOT NULL, -- ISO country; the pool key played_at TIMESTAMPTZ NOT NULL, -- event time, NOT ingest time duration_played_s INTEGER NOT NULL, -- The seam between Product and Finance, materialized once at write: is_royalty_eligible BOOLEAN GENERATED ALWAYS AS (duration_played_s >= 30) STORED, is_fraudulent BOOLEAN NOT NULL DEFAULT false, -- side-flag context_type TEXT NOT NULL -- playlist|algorithmic|search|radio CHECK (context_type IN ('playlist','algorithmic','search','radio')), source_playlist_id BIGINT, device_id BIGINT ) PARTITION BY RANGE (played_at);

That one generated column does heavy lifting. Product queries the table raw — every row is signal. Finance adds WHERE is_royalty_eligible AND NOT is_fraudulent and is instantly looking at the payable universe, from the same source of truth, with no parallel "finance plays" table to drift out of sync. The thirty-second rule lives in the schema, not in a hundred analysts' memories.

The rights bridge — a ledger of deals, not a lookup

Here is where most candidates lose the plot. The split of a stream — how much to artist, label, publisher, songwriter — is not a property of the track. It is a property of the deal that was in force when the stream happened. Deals renegotiate mid-quarter; a label might cede 5% to an artist effective the 15th. So the bridge is Slowly-Changing-Dimension Type 2: every share is bounded by effective_from and effective_to, and history is never overwritten.

DDL · THE DEAL-TERMS LEDGER (SCD2)
-- bridge_track_rights: who owns what share of a track, and WHEN. -- The contract: SUM(share_pct) per (track, rights_type) over any instant -- = 1.00. Rows are closed, never updated; a renegotiation closes the old -- row (sets effective_to) and inserts a new one. CREATE TABLE bridge_track_rights ( track_id BIGINT NOT NULL, rights_holder_id BIGINT NOT NULL, rights_type TEXT NOT NULL CHECK (rights_type IN ('artist','label','publisher','songwriter')), share_pct NUMERIC(6,5) NOT NULL -- 0.45000 = 45% CHECK (share_pct >= 0 AND share_pct <= 1), effective_from TIMESTAMPTZ NOT NULL, effective_to TIMESTAMPTZ NOT NULL DEFAULT '9999-12-31', PRIMARY KEY (track_id, rights_holder_id, rights_type, effective_from) ); -- Half-open intervals: [from, to). played_at >= from AND played_at < to. -- This makes the boundary instant unambiguous — a play at exactly the -- renegotiation timestamp belongs to the NEW deal, never both. CREATE INDEX idx_bridge_asof ON bridge_track_rights (track_id, effective_from, effective_to);

The two frozen facts — pool and payout

The pool table is the smallest and most important object in the design. It exists to make the denominator durable. At month-close the job computes pool_usd as collected revenue times the payout share (~70%), counts qualified non-fraud streams, divides, and writes per_stream_rate_usd — then sets frozen_at and never touches the row again. The attribution table is the output: one defensible line per (play × holder), produced in a single bulk write.

DDL · THE FROZEN POOL & THE PAYOUT LINE ITEM
-- The "Pool" grain. One row per (country × period). The denominator, -- made into a stored fact so it can be replayed and audited. CREATE TABLE fct_royalty_periods ( period_id TEXT PRIMARY KEY, -- 'US_2025_04' country_id CHAR(2) NOT NULL, period_start DATE NOT NULL, period_end DATE NOT NULL, collected_rev_usd NUMERIC(18,2) NOT NULL, pool_usd NUMERIC(18,2) NOT NULL, -- ≈ collected_rev × 0.70 total_qualified_plays BIGINT NOT NULL, per_stream_rate_usd NUMERIC(14,10) NOT NULL, -- pool ÷ qualified plays frozen_at TIMESTAMPTZ, -- null until close; then immutable CHECK (period_end > period_start) ); -- The "Payout" grain. One row per (qualified play × rights holder), -- generated at close. ~4 rows per eligible stream. Read-only after write. CREATE TABLE fct_royalty_attributions ( play_id BIGINT NOT NULL, period_id TEXT NOT NULL REFERENCES fct_royalty_periods, rights_holder_id BIGINT NOT NULL, rights_type TEXT NOT NULL, share_pct NUMERIC(6,5) NOT NULL, -- snapshot of the as-of-play share attributed_payout_usd NUMERIC(14,10) NOT NULL, -- rate × share_pct PRIMARY KEY (play_id, rights_holder_id, rights_type) );

One subtlety worth saying out loud: the attribution row stores share_pct as a snapshot, not a foreign key back into the bridge. The bridge is the source of truth at compute time, but freezing the resolved share onto the payout row means a future bridge correction — even a legitimate one — cannot silently restate a payout that has already been remitted and reported to a tax authority. The line item is a photograph, not a live view.


§ 04 — THE CORE INVARIANTThe rate that only exists at close

Every correctness property of this system reduces to one discipline: the per-stream rate is computed once, from frozen inputs, after the period is sealed — and never again. Get the timing of that single division right and the rest is bookkeeping.

The pool model has a chicken-and-egg shape that breaks naive designs. You cannot pay a stream when it happens, because its rate is pool ÷ total_qualified_plays, and the total is unknown until the last play of the period — including offline plays uploaded days late — has arrived. A junior model computes a "running rate" daily and pays against it; by month-end that rate has drifted, early payouts are wrong, and Finance is reconciling against a number that was never real. The senior move is to refuse to compute the rate until the denominator is final, and to make the period's transition through close an explicit, observable lifecycle.

The period lifecycle

OPEN CUTOFF RECONCILING FROZEN PAID

OPEN — plays land freely against the month-of-event. CUTOFF — a grace window closes (say, period_end + 7 days) to admit late offline uploads that still belong to the period. RECONCILING — fraud flags are finalized, revenue is confirmed, the qualified count is locked. FROZEN — the single division runs, per_stream_rate_usd and frozen_at are written, and the row becomes immutable. PAID — attribution is generated and remitted. A late chargeback or a discovered fraud cluster after FROZEN is never an edit to this period; it is a correcting entry in a subsequent period's pool, exactly as a double-entry ledger appends a reversal rather than rewriting history.

The rate is not a number you read. It is a number you mint, once, when the period is sealed — and then defend forever.ROYALTY RULE Nº 1 — FREEZE BEFORE YOU DIVIDE

The division itself is trivial; the discipline is in everything that must be true before it runs. Below is the atomic close — note it reads only frozen, period-bounded inputs and writes the rate exactly once, guarded so it can never double-run.

SQL · THE ATOMIC POOL CLOSE
-- Compute and freeze the per-stream rate for one country-period. -- Idempotent: the WHERE frozen_at IS NULL guard means a retry after a -- partial failure is a no-op, never a second (different) rate. UPDATE fct_royalty_periods rp SET total_qualified_plays = q.cnt, pool_usd = round(rp.collected_rev_usd * 0.70, 2), per_stream_rate_usd = round(rp.collected_rev_usd * 0.70, 2) / nullif(q.cnt, 0), frozen_at = now() FROM ( SELECT count(*) AS cnt FROM fct_play_events WHERE country_id = :country AND played_at >= :period_start AND played_at < :period_end_plus_grace -- month-of-event window AND is_royalty_eligible -- the 30-second boundary AND NOT is_fraudulent -- side-flagged bots excluded ) q WHERE rp.period_id = :period_id AND rp.frozen_at IS NULL; -- freeze-once guard IS the lock

Once that row is frozen, the per-stream rate is a constant the rest of the pipeline multiplies against. Attribution becomes a join: each qualified play, times its as-of-play shares, times the frozen rate. The hard part is over before the first dollar is split.


§ 05 — INGESTION & STREAMSPython on the play firehose

Three small programs carry the hot path: the ingest that normalizes a play and stamps the boundary, the consumer that fans plays out without blocking, and the fraud pass that flags without deleting. As always, the judgment is in what each refuses to do.

1 · Play ingest — stamp event time, derive the boundary

The single most consequential rule in ingestion is invisible: trust the play's own clock, not the wall clock at the server. Offline plays — a subway commute cached on the phone — arrive in a batch days later but belong to the month they actually happened. If ingest stamped played_at with arrival time, those streams would attribute to the wrong pool and quietly corrupt two months' payouts at once. The client's timestamp is authoritative; the server only validates it is sane.

PYTHON · PLAY INGEST → STREAM
import time ROYALTY_THRESHOLD_S = 30 # the industry boundary, in one place MAX_CLOCK_SKEW_S = 90 * 86400 # reject plays "from" >90d ago/future def normalize_play(raw: dict, now_s: float) -> dict | None: """Take a raw client event, return a stream record — or None to drop. We DO NOT invent played_at: an offline play that happened Tuesday must bind to Tuesday's pool even if it uploads Friday. The client clock is authoritative; we only sanity-bound it.""" played_at = raw["played_at_ms"] / 1000.0 if abs(now_s - played_at) > MAX_CLOCK_SKEW_S: return None # implausible clock: drop, alarm on rate dur = int(raw["duration_played_s"]) return { "play_id": raw["play_id"], # client-minted, for dedup "user_id": raw["user_id"], "track_id": raw["track_id"], "country_id": raw["country_id"], # where the listener was "played_at": played_at, "duration_played_s": dur, # Derive the boundary at the edge too, so the recommender and # the lake agree before the generated column ever runs: "qualified": dur >= ROYALTY_THRESHOLD_S, "context_type": raw.get("context_type", "radio"), }

The play_id is minted on the client, which makes the stream idempotent: a phone that retries an upload after a flaky connection produces the same play_id, and the lake's de-duplication on that key collapses the retry. At-least-once delivery plus a client key equals effectively-once payouts — the only acceptable guarantee when each row is money.

2 · The fan-out consumer — Product never waits on Finance

PYTHON · STREAM CONSUMER → RECOMMENDER + LAKE
async def consume(stream, recommender, lake) -> None: """Every play feeds the recommender immediately (skips are signal) and is appended to the lake. The two sinks are independent: a slow lake write must never delay a recommendation, and a recommender hiccup must never drop a payable play. Different durability, different urgency.""" async for batch in stream.read(): plays = [p for p in (normalize_play(r, time.time()) for r in batch) if p is not None] # Fire-and-forget to the recommender: best-effort, low-latency. recommender.observe(plays) # skips included — the whole point # Durable append to the lake: at-least-once, deduped on play_id. await lake.append_idempotent(plays, key="play_id") # NOTE: we never compute payout here. A play's dollar value does not # exist yet — it is born only when its country-period is frozen.

The comment at the bottom is the carve-out an interviewer is listening for. The temptation, having a play in hand, is to look up "the rate" and write a payout immediately. There is no rate yet. Resisting that — keeping the hot path entirely free of money — is what keeps the two tempos from contaminating each other.

3 · The fraud pass — flag, never delete

PYTHON · ANTI-FRAUD SIDE-FLAGGER (BATCH)
def flag_fraud_clusters(spark, period_id: str) -> None: """Bot farms inflate small artists with 30-to-35-second loops on repeat. We detect and SIDE-FLAG — we never delete the rows. Legal needs the 'claimed vs paid' trail; deleting destroys the ability to report 'we caught X% of fraud'. is_fraudulent is set by THIS job, not the player, and the close excludes flagged rows from the denominator.""" plays = spark.table("fct_play_events").filter( f"period_id = '{period_id}' AND is_royalty_eligible") # Heuristic: same (user, track) replayed many times at near-threshold # duration is the canonical loop-attack signature. suspects = (plays .filter("duration_played_s BETWEEN 30 AND 35") .groupBy("user_id", "track_id") .count() .filter("count > 200")) # tune per market; this is a sketch spark.sql(f""" UPDATE fct_play_events SET is_fraudulent = true WHERE play_id IN (SELECT play_id FROM suspect_plays) """) # The only UPDATE in the design — and it sets a flag, never erases a fact.

§ 06 — AGGREGATIONThe period close, end to end

The monthly close is the system's heartbeat. It freezes a pool, resolves every qualified play against the rights bridge as it stood at play time, multiplies by the frozen rate, and writes a few billion immutable line items in one bulk pass. Run twice, it produces the identical result.

The close is deliberately a batch, not a stream, because it requires a property streams cannot give: a sealed input. Only after CUTOFF and RECONCILING is the qualified count final and the revenue confirmed; only then can the rate be honest. The whole job is structured as a pure function of frozen inputs so that auditors, finance, and a future re-run all land on the same dollar.

PYTHON · THE CLOSE ORCHESTRATION
def close_period(engine, country: str, period_id: str) -> None: """Deterministic, idempotent month-close for one country-period. Ordering is the contract: you cannot attribute before the rate is frozen, and you cannot freeze the rate before fraud is finalized.""" # 1. Gate: the period must be reconciled. Refuse otherwise — better to # fail loudly than to pay against a denominator still in motion. assert engine.period_state(period_id) == "RECONCILING", \ f"{period_id} not ready to close" # 2. Freeze the denominator and mint the rate (the atomic SQL, §04). engine.execute(POOL_CLOSE_SQL, country=country, period_id=period_id) rate_row = engine.fetch_period(period_id) assert rate_row.frozen_at is not None # 3. Generate attribution: each qualified play × as-of-play shares × # frozen rate. A single INSERT...SELECT, partition-pruned to the # period. Re-runnable: it writes into a fresh partition we swap in. engine.execute(ATTRIBUTION_SQL, period_id=period_id) # 4. Prove it before paying: every qualified play's shares summed # across holders must reconstruct the rate to the penny. drift = engine.scalar(ATTRIBUTION_RECON_SQL, period_id=period_id) if drift > 0.01: engine.execute("ROLLBACK_PARTITION", period_id=period_id) raise PayoutReconciliationError(period_id, drift) # 5. Only now: advance to PAID and hand off to remittance. engine.set_period_state(period_id, "PAID")

The attribution generation itself is the join the whole schema was built to make cheap. It reads the frozen rate, fans each eligible play across its rights holders using the bridge bounded by play time, and computes the per-line payout. The half-open interval predicate is what makes a play at the exact renegotiation instant resolve to one deal, never two.

SQL · ATTRIBUTION GENERATION (ONE BULK WRITE)
INSERT INTO fct_royalty_attributions (play_id, period_id, rights_holder_id, rights_type, share_pct, attributed_payout_usd) SELECT p.play_id, rp.period_id, b.rights_holder_id, b.rights_type, b.share_pct, -- snapshot the share round(rp.per_stream_rate_usd * b.share_pct, 10) AS attributed_payout_usd FROM fct_play_events p JOIN fct_royalty_periods rp ON rp.country_id = p.country_id AND p.played_at >= rp.period_start AND p.played_at < rp.period_end + INTERVAL '7 days' -- grace window JOIN bridge_track_rights b ON b.track_id = p.track_id AND p.played_at >= b.effective_from AND p.played_at < b.effective_to -- as-of-play, half-open WHERE rp.period_id = :period_id AND rp.frozen_at IS NOT NULL -- never attribute an unfrozen pool AND p.is_royalty_eligible AND NOT p.is_fraudulent;
Pro-rata and user-centric are not two schemas. They are two queries over the same frozen pool and the same temporal bridge — which is the entire reason the rate and the bridge are stored facts.ROYALTY RULE Nº 2 — THE POLICY IS A JOIN, NOT A TABLE

That last rule is the senior flourish. Because the pool, the rate, and the rights bridge are all durable facts rather than logic baked into a pipeline, switching from pro-rata (a stream earns rate × share) to user-centric payout (a subscriber's fee splits only among artists they played) is a different aggregation over identical inputs — no migration, no restatement of closed months. The schema deliberately does not know which distribution algorithm it serves, which is exactly what lets the business change its mind without rewriting history.


§ 07 — ANALYTICS SQLInterrogating the catalogue

The frozen facts are where the system explains itself to three different audiences. Three queries an interviewer loves, because each carries a named pattern on its back: an as-of-time temporal join, conditional aggregation, and window-ranked cohort share.

Royalty attribution as-of-play — the SCD2 temporal join

This is the defensible line item: for one track in one period, who earned what, using the splits that were in force when each play happened. The pattern is the as-of join — a range predicate against the bridge's half-open validity interval. It is the answer to "pay me as of the deal that was active when my song was streamed," which is the only answer that survives a courtroom.

SQL · WHO EARNED WHAT, BY THE DEAL THAT WAS LIVE
-- Per-play, per-holder payout for one track in one period, using the -- bridge row that was effective at played_at. Produces ~4 rows per play. SELECT p.play_id, p.played_at, b.rights_type, b.rights_holder_id, b.share_pct, rp.per_stream_rate_usd, round(rp.per_stream_rate_usd * b.share_pct, 6) AS payout_usd FROM fct_play_events p JOIN fct_royalty_periods rp ON rp.country_id = p.country_id AND p.played_at >= rp.period_start AND p.played_at < rp.period_end + INTERVAL '7 days' JOIN bridge_track_rights b ON b.track_id = p.track_id AND p.played_at >= b.effective_from AND p.played_at < b.effective_to -- the as-of-play predicate WHERE p.track_id = :track_id AND rp.period_id = :period_id AND p.is_royalty_eligible AND NOT p.is_fraudulent ORDER BY p.played_at, b.rights_type;

Skip rate by surface — conditional aggregation

Product's question: is the recommender any good? Skip rate by context_type answers it, and the pattern is conditional aggregation — counting a sub-population with FILTER rather than a second pass. A high skip rate on algorithmic is a recommender-quality alarm; the same rate on radio is expected lean-back behavior. The thirty-second flag does double duty: a non-qualified play is a skip.

SQL · IS THE RECOMMENDER EARNING ITS SURFACE?
SELECT context_type, count(*) AS plays, count(*) FILTER (WHERE NOT is_royalty_eligible) AS skips, round(100.0 * count(*) FILTER (WHERE NOT is_royalty_eligible) / nullif(count(*), 0), 2) AS skip_rate_pct, round(avg(duration_played_s), 1) AS avg_listen_s FROM fct_play_events WHERE played_at >= CURRENT_DATE - INTERVAL '30 days' GROUP BY context_type ORDER BY skip_rate_pct DESC; -- High on 'algorithmic' = a recommender problem. High on 'radio' = -- lean-back listening, expected. The surface tells you which.

The long tail — windowed cohort share

The pool model's central equity question: how concentrated are payouts? This ranks holders by earnings within a period and uses a window to compute each one's running share of the total — the gateway to "the top 1% of artists take N% of the pool," which is the headline pro-rata's critics aim at and the chart UCPS exists to change. The pattern is the windowed cumulative share, a cohort-concentration staple.

SQL · PAYOUT CONCENTRATION ACROSS THE CATALOGUE
WITH holder_take AS ( SELECT rights_holder_id, sum(attributed_payout_usd) AS earned_usd FROM fct_royalty_attributions WHERE period_id = :period_id GROUP BY rights_holder_id ), ranked AS ( SELECT rights_holder_id, earned_usd, row_number() OVER (ORDER BY earned_usd DESC) AS rnk, sum(earned_usd) OVER (ORDER BY earned_usd DESC ROWS UNBOUNDED PRECEDING) AS cum_usd, sum(earned_usd) OVER () AS pool_total FROM holder_take ) SELECT rnk, rights_holder_id, round(earned_usd, 2) AS earned_usd, round(100.0 * cum_usd / nullif(pool_total, 0), 2) AS cum_pct_of_pool FROM ranked WHERE rnk <= 1000 ORDER BY rnk;

§ 08 — THE DASHBOARDWatching a close land

A royalty system is judged on a few days a month, when a period closes and money moves. The dashboard watches two things at once: the always-on play firehose, and the once-a-month close that must reconcile to the penny before a single payout is remitted.

THE FIREHOSE
play ingest rate against expected, qualified-stream fraction (a sudden drop means a client-clock bug or a skip surge), late-arrival lag (offline uploads still landing against last period), and dedup collision rate on play_id — the proof that retries are collapsing.
THE CLOSE
period state across countries, reconciliation drift (attribution summed back to the pool — must be ~$0), fraud-excluded % of streams, and per-stream rate versus prior period — a rate that swings sharply flags a revenue or denominator anomaly before payout.
THE EQUITY LENS
top-1% pool share, long-tail count earning below a floor, and pro-rata vs UCPS delta — the simulation that tells policy what switching models would cost which artists.
Royalty Ops — US · 2025-04 close MON 06:10 UTC · STATE: RECONCILING → FROZEN · 5m REFRESH
Play Ingest
46.2k/s
Qualified Fraction
63%
Recon Drift
$0.004
Per-Stream Rate
0.00417$
Pool Concentration — US 2025-04 · cumulative % of pool by holder rank
top 1% = 54% long tail
Fraud Excluded
0.7%
Late Arrival Lag
31h
Countries Frozen
168/180
Attribution Rows
7.9B
Dedup Collisions
0.04%
UCPS Δ (sim)
+12% tail
FIG. 2 — A close in flight. 168 of 180 pools are frozen; recon drift is four-tenths of a cent, well under the penny gate; the per-stream rate ticked up on confirmed revenue. Late-arrival lag is amber because offline uploads are still trickling against the grace window — exactly why CUTOFF exists.

Read the amber tiles together and the dashboard narrates the close from §06. Revenue firmed up, nudging the rate; the last twelve countries are still admitting late offline plays before they freeze; and the UCPS simulation quietly shows the long tail would gain twelve percent if the policy switched — the schema's whole point, surfaced as a number a product executive can act on. Recon drift flat near zero is the load-bearing tile: nothing gets paid until that reads green.


§ 09 — THE RUBRICWhat was actually being tested

Strip away the music and the question was probing five judgments, each of which generalizes far past streaming royalties:

ONE FACT, TWO READERS
Serving Product and Finance from a single event grain via a generated flag, rather than forking into two tables that silently disagree by quarter-end. The seam is a column, not a copy.
DERIVE-AT-CLOSE
Recognizing that the per-stream rate is unknowable until the denominator is sealed, and modeling it as a frozen fact computed once — not a running number that drifts and forces restatement.
TEMPORAL TRUTH
Paying by the deal that was live at play time through an SCD2 bridge and an as-of-time join, turning a renegotiation from an audit nightmare into a range predicate.
APPEND, NEVER EDIT
Side-flagging fraud and posting late corrections to a subsequent period instead of rewriting a closed one — the same immutability discipline a payments ledger lives by.
POLICY AS QUERY
Keeping the distribution algorithm out of the schema so pro-rata and user-centric are two joins over identical frozen inputs. The model that doesn't bake in the answer is the one that survives the business changing its mind.
A play is an event the moment it happens, but it does not become money until a country's month is sealed. The whole design is the discipline of keeping those two truths apart until exactly the instant they must meet.— CLOSING ARGUMENT