PaddySpeaks · Systems at the Whiteboard · Nº 14

The Bingewatch Problem

Model the analytics backend for a global streaming service. Fifteen billion play events a day, a content tree four levels deep, and a behavior — bingewatching — that is never recorded and must be derived. The one decision that separates a senior answer: a play event is not allowed to wait for its metadata. A complete working-through — data flow, schema, the defensive join, streaming Python, the nightly derivation, and the dashboard.

§ 01 — THE QUESTIONOne sentence, four time horizons

It sounds like a reporting task. It is really a question about time — three different definitions of it, fighting over the same fact table.

Interview Prompt

"Design the data model behind a streaming service like Netflix — play events, series and episodes, and bingewatching. Two hundred and fifty million subscribers, fifteen billion plays a day. How would you scope it?"

LEVEL · SENIOR / STAFFDURATION · 45 MINFORMAT · WHITEBOARD

The trap is that all three sub-problems hide inside the word "play." A play event is dense, immutable, and arrives in a torrent. But the moment you ask anything interesting of it — what was watched, by whom, as part of what behavior — you discover three separate clocks. Content metadata propagates on its own schedule, regionally, over minutes to hours. The content hierarchy must roll up without double-counting, so that a subscriber who watches eight episodes is one series-bingewatcher and not eight series-views. And bingewatching itself is not a column you can write; it is a shape in the play stream that only exists once you have stitched a night's worth of sessions together. A junior answer reaches for a star schema and a risk_score-shaped is_binge flag. A senior answer notices that the fact arrives before its dimension, that the dimension changes underneath the fact, and that the behavior is downstream of both.

So before any boxes and arrows, the working frame for the session — three structural challenges that the schema has to hold simultaneously:

LATE-ARRIVING DIM
Metadata trails the event. A Netflix Original drops at midnight Pacific. Plays fire before the title row propagates to every regional shard. An INNER JOIN silently drops those plays, and Content Strategy stares at "zero views in the first three minutes" of the most-anticipated launch of the quarter.
THE GRAIN STACK
Four levels, one count. "Stranger Things S5 E1" is an episode, of a season, of a series, of a franchise. Aggregations must roll up cleanly: eight episodes watched is one bingewatcher, not eight series-views — and a re-titled show must not rewrite five years of history.
DERIVED BEHAVIOR
Binge is computed, not logged. No event says "the user binged." It is reconstructed from the play stream — and there are two definitions that must coexist: per-session (for the auto-play algorithm) and per-day (for the marketing rollup), neither blocking the other.
Keep every play, even the orphans. A view you dropped on launch night is a number that never comes back — and launch night is the only night that matters.

Scoping out loud

Scope is the first scored dimension, and most candidates skip it. State what you are building and what you are deliberately ignoring. Out of scope, said explicitly: the recommendation model itself (a consumer of these facts, not part of this design), the CDN and adaptive-bitrate delivery path, billing and payments, and the encode pipeline. In scope: the play fact and its companions, the content and subscriber dimensions, the late-arriving-dimension defense, the bingewatch and streak derivations, and the analytics surface those feed. One caveat, volunteered: the design should not preclude the password-sharing analysis that Trust & Safety will ask for later — which means the account and the profile must both live on every play row from day one.

Then the envelope math, offered rather than extracted. Netflix-shaped numbers:

QuantityEstimateConsequence
Subscribers250,000,000Up to 5 profiles each — the dual-key population
Play events≈ 15 B / day → ~175 K/sThe number that shapes the whole architecture
Heartbeats (every 60 s)~120 B / day8× the plays — split to its own table, downsample
Launch-night metadata lagseconds to hoursThe orphan window every join must survive
Orphan-rate SLO< 0.5%Above 1% pages the data-platform on-call
Binge derivationnightly batchMaterialized so analyst queries never re-derive
Raw plays, 5-yr title historytens of PBEpisode-only fact + bridge — never backfill on re-title

Notice the asymmetry. The heartbeat stream is an order of magnitude heavier than the play stream, and the play stream is itself heavy enough that no per-event synchronous lookup against a regional dimension can be on its critical path. That single fact dictates the join strategy, the partition layout, and the failure philosophy. The rest of this article follows the play.


§ 02 — DATA FLOWFollowing a play through the building

One pipeline, two clocks. Plays land immediately and unconditionally; metadata propagates on its own schedule and is reconciled at read time, never at write time. The fact never waits for the dimension.

EVENT PLANE · ~175K PLAYS/S · ARRIVES UNCONDITIONALLY METADATA PLANE · PROPAGATES IN SECONDS-TO-HOURS CLIENT DEVICES TV · mobile · web · offline EVENT COLLECTOR validates shape only never resolves the title PLAY STREAM key: (profile_id) · by play_date append-only · ordered per profile fct_play_event · RAW episode_id only · no title partitioned by play_date fct_play_heartbeat QoE · downsampled to 5-min DERIVED FACTS (NIGHTLY) binge · streak · QoE summary CONTENT CMS titles · seasons · franchises REGIONAL FAN-OUT propagates per region eventually consistent dim_episode (SCD2) effective_from / _to dim_series · _season parent FKs · SCD2 titles LAD-DEFENSIVE READ LEFT JOIN · rn=1 · COALESCE BI / DASHBOARDS "Pending Sync" surfaces orphans resolved at read time, never write time SOLID — sustained data flow · DASHED — read-time resolution · The fact lands first; the title catches up.
FIG. 1 — End-to-end flow. The play fact is durable and title-free; the content dimension is eventually consistent; the join that marries them lives at read time, where lateness is survivable.

Three properties of this picture do most of the interview's work. First, the collector validates shape and nothing else — it never calls the content service to confirm the episode exists, because a regional metadata miss at 175 K events per second would either block the firehose or drop the launch. Second, the fact stores episode_id and only episode_id — the immutable identity — never the title, never the series name; every human-readable attribute is resolved by joining the dimension at query time. Third, the dashed line is dashed on purpose: the marriage of fact and dimension is deferred to the read path, where a missing or stale title degrades into a labeled placeholder rather than a dropped row.

The Failure Philosophy, In One Rule

The fact plane never blocks on the metadata plane. When the dimension is missing, the play is still written, still counted, and still attributed — to a searchable placeholder ("Pending Sync") that the BI tool can filter on. The orphan is visible, measurable against a < 0.5% SLO, and self-heals the instant metadata propagates, because the join re-resolves on the next query. The failure mode is a labeled unknown, never a silent zero.


§ 03 — DATA MODELA content tree, a dual key, and an append-only firehose

The schema falls out of two questions: what is immutable, and what changes underneath us. Immutable identity goes on the fact. Mutable, re-titleable, late-arriving description goes in SCD2 dimensions, resolved at read time.

The content tree

Four dimensions, each pointing at its parent: dim_franchisedim_seriesdim_seasondim_episode. The titles are SCD2 — Marvel acquires a show and renames it; a series gets re-genre'd — so each carries effective_from and effective_to. The fact never stores the title; it stores the leaf identity and walks up the tree at query time.

DDL · THE CONTENT HIERARCHY (SCD2)
-- Each level links to its parent. Titles are SCD2: a re-title opens a new -- version row rather than rewriting history. The fact stores episode_id ONLY. CREATE TABLE dim_franchise ( franchise_id BIGINT PRIMARY KEY, name TEXT NOT NULL ); CREATE TABLE dim_series ( series_sk BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, series_id BIGINT NOT NULL, -- stable natural key franchise_id BIGINT NOT NULL REFERENCES dim_franchise, series_title TEXT NOT NULL, -- SCD2-tracked: re-titled / re-genre'd genre TEXT, effective_from TIMESTAMPTZ NOT NULL, effective_to TIMESTAMPTZ, -- NULL = current version UNIQUE (series_id, effective_from) ); CREATE TABLE dim_season ( season_sk BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, season_id BIGINT NOT NULL, series_id BIGINT NOT NULL, season_number INT NOT NULL ); -- The late-arriving one. On launch night, rows for new episodes may not -- have propagated to this region yet; the join must survive their absence. CREATE TABLE dim_episode ( episode_sk BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, episode_id BIGINT NOT NULL, -- immutable identity, lives on the fact series_id BIGINT NOT NULL, season_number INT NOT NULL, episode_number INT NOT NULL, title TEXT NOT NULL, runtime_sec INT, effective_from TIMESTAMPTZ NOT NULL, effective_to TIMESTAMPTZ, UNIQUE (episode_id, effective_from) ); CREATE INDEX idx_episode_id ON dim_episode (episode_id, effective_from DESC);

The subscriber: account above, profile below

Netflix's identity model is unusual and load-bearing: one billing account, up to five profiles. Billing, plan tier, and payment events are account-grain — they cross profile boundaries. Every play, search, browse, and rating is profile-grain. The rule that makes everything downstream tractable is that both keys live on every play row — never just one. Roll up "minutes per account" and you simply sum without re-joining; ask "which profiles on this account watched from incompatible geographies" and the password-sharing analysis is a GROUP BY, not a join puzzle.

DDL · ACCOUNT / PROFILE DUAL KEY (SCD2)
CREATE TABLE dim_account ( account_sk BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, account_id BIGINT NOT NULL, plan_tier TEXT NOT NULL, -- basic | standard | premium country TEXT NOT NULL, primary_region TEXT NOT NULL, -- home geo, for sharing analysis effective_from TIMESTAMPTZ NOT NULL, effective_to TIMESTAMPTZ, UNIQUE (account_id, effective_from) ); CREATE TABLE dim_profile ( profile_id BIGINT PRIMARY KEY, account_id BIGINT NOT NULL, -- a profile belongs to one account is_kids BOOLEAN NOT NULL DEFAULT false, age_range TEXT ); -- ≤ 5 profiles per account

The facts: one raw firehose, one companion, two derivations

The grain of fct_play_event is one row per playback session start. It is append-only and partitioned by play_date so that launch night is one hot partition and last year is cold Parquet. Critically it carries account_id and profile_id, the bare episode_id (no resolved title), and the wall-clock played_at — which is the timestamp every temporal join will pivot on. Quality-of-experience telemetry — rebuffering, bitrate, startup latency — is an order of magnitude heavier, so it splits into fct_play_heartbeat and is downsampled in the warehouse. The two derived facts, fct_binge_session and fct_series_streak, are computed nightly and materialized so that analyst queries read them rather than re-deriving them from fifteen billion rows.

DDL · FACT TABLES & DERIVED FACTS
-- Grain: one row per playback session start. Append-only. 15B/day. -- Stores BOTH account_id and profile_id, and the bare episode_id. CREATE TABLE fct_play_event ( play_id BIGINT NOT NULL, account_id BIGINT NOT NULL, -- dual key, always both profile_id BIGINT NOT NULL, episode_id BIGINT NOT NULL, -- identity only; title resolved at read played_at TIMESTAMPTZ NOT NULL, -- the pivot for every temporal join watch_minutes NUMERIC(8,2) NOT NULL, device_type TEXT NOT NULL, country TEXT NOT NULL, play_date DATE NOT NULL, -- partition key PRIMARY KEY (play_date, play_id) ) PARTITION BY RANGE (play_date); -- Grain: one row per 60-sec sample, downsampled to 5-min in the warehouse. -- Split off because it is ~8x the volume of the play fact. CREATE TABLE fct_play_heartbeat ( play_id BIGINT NOT NULL, heartbeat_at TIMESTAMPTZ NOT NULL, position_sec INT NOT NULL, bitrate_kbps INT, rebuffer_ms INT ); -- DERIVED nightly: 3+ same-series episodes, <30-min gaps. Algorithm input. CREATE TABLE fct_binge_session ( account_id BIGINT NOT NULL, profile_id BIGINT NOT NULL, series_id BIGINT NOT NULL, session_id BIGINT NOT NULL, session_start TIMESTAMPTZ NOT NULL, session_end TIMESTAMPTZ NOT NULL, episodes_in_session INT NOT NULL CHECK (episodes_in_session >= 3) ); -- DERIVED nightly: consecutive-day engagement per (account, series). Loyalty. CREATE TABLE fct_series_streak ( account_id BIGINT NOT NULL, series_id BIGINT NOT NULL, streak_start DATE NOT NULL, streak_end DATE NOT NULL, streak_days INT NOT NULL );

One bridge completes the model. Temporal correctness — attributing a play to the title that was current at the time of the event, not now — is handled by an SCD2 bridge keyed on episode_id with effective_from/effective_to. It is the same SCD2 mechanism the dimensions use, exposed as a thin join surface for the read path.


§ 04 — THE CORE INVARIANTThe play is never dropped, and never mis-titled

Everything in this design protects one sentence: every play is counted exactly once, attributed to the version of the truth that was current when it happened. Two defenses enforce it — a defensive join for lateness, and a temporal join for change.

The lifecycle of a single play is short but it crosses both clocks. It is collected, streamed, and written with episode_id intact. At that moment the episode's dimension row may not exist in this region yet. Later — seconds, or hours — it propagates. Later still the show may be re-titled. The invariant says none of these later events is allowed to change the count or corrupt the attribution of the play that already happened.

PLAYED COLLECTED STREAMED WRITTEN (episode_id) RESOLVED @ read ATTRIBUTED

The defensive read is where correctness lives. It is three moves working as one. LEFT JOIN keeps every play, orphan or not. ROW_NUMBER() … rn = 1 picks the most-recent SCD2 version of the dimension when several exist. COALESCE with a searchable placeholder turns a missing title into a visible, filterable "Pending Sync" rather than a NULL that silently vanishes from a GROUP BY. This is the make-or-break pattern of the whole system, and an interviewer is watching for it by name.

SQL · THE LAD-DEFENSIVE READ (THE MAKE-OR-BREAK PATTERN)
-- Every play-to-content join is defensive. Keep the play, take the newest -- dim version, and label the orphan so the BI dropdown can surface it. WITH current_episode AS ( SELECT episode_id, title, series_id, season_number, episode_number, row_number() OVER (PARTITION BY episode_id ORDER BY effective_from DESC) AS rn FROM dim_episode ) SELECT p.play_id, p.account_id, p.profile_id, coalesce(e.title, 'Episode ID: ' || p.episode_id || ' (Pending Sync)') AS title, coalesce(e.series_id, -1) AS series_id_safe, coalesce(e.season_number, -1) AS season_number_safe, p.watch_minutes FROM fct_play_event p LEFT JOIN current_episode e ON e.episode_id = p.episode_id AND e.rn = 1; -- Orphan-rate SLO: < 0.5%. Above 1% pages the data-platform on-call.

The cousin of lateness is change, and it is handled by the same machinery pointed the other way. A user watches an episode offline on the subway; the device uploads the play four hours later, by which time the dimension has moved on. The temporal join binds the play to the version that was effective at played_at, not the version that is current now.

SQL · TEMPORAL ATTRIBUTION (AS-OF THE EVENT)
JOIN dim_episode e ON e.episode_id = p.episode_id AND p.played_at BETWEEN e.effective_from AND coalesce(e.effective_to, '9999-12-31'); -- Plays during the OLD title attribute to the OLD title; new plays to the new. -- No backfill of 5 years of history when a show is re-titled.
A play that lost its title is a row with a placeholder. A play that lost its row is a number that never existed. Choose the first failure, always.BINGEWATCH RULE Nº 1

§ 05 — INGESTION & STREAMSPython on the firehose

Three programs carry the event plane: the collector that refuses to resolve titles, the consumer that lands plays idempotently, and the orphan monitor that turns the SLO into a number. Each is small; the judgment is in what they refuse to do.

1 · The collector — validate shape, never resolve identity

The collector's one temptation is to be helpful: to look up the episode and stamp the title onto the event before it streams, so downstream joins are simpler. It must refuse. A synchronous content-service call at 175 K events per second couples the firehose to a regional cache that, by design, is sometimes behind. The collector validates that the event has the fields it needs — both keys, an episode_id, a timestamp — and streams it. Resolution is somebody else's job, later, where lateness is cheap.

PYTHON · EVENT COLLECTOR: SHAPE-ONLY VALIDATION
import orjson, time REQUIRED = ("account_id", "profile_id", "episode_id", "played_at") class PlayCollector: """Validates the SHAPE of a play event and nothing else. It does NOT call the content service to confirm the episode exists: a regional metadata miss must never block ingestion or drop a launch-night play. The title is resolved downstream, at read time, where lateness is cheap.""" def __init__(self, producer): self.producer = producer self.rejected = 0 def accept(self, raw: bytes) -> bool: try: ev = orjson.loads(raw) except orjson.JSONDecodeError: self.rejected += 1 return False if any(ev.get(k) is None for k in REQUIRED): self.rejected += 1 # malformed, not orphan: drop loudly return False # Partition by profile_id so one viewer's plays stay ordered, while # 250M profiles spread evenly — no hot key on a viral launch night. ev["play_date"] = ev["played_at"][:10] self.producer.send(key=ev["profile_id"], value=orjson.dumps(ev)) return True

One distinction, always stated: a malformed event — missing a required field — is dropped loudly and counted as a reject, because it can never be made correct. An orphan event — well-formed but pointing at a title that hasn't propagated — is kept, because it will become correct on its own. The collector handles the first; the read path handles the second. Conflating them is how a launch-night metadata lag turns into thrown-away analytics.

2 · The play consumer — idempotent landing, partition by date

PYTHON · STREAM CONSUMER → fct_play_event
import orjson async def land_plays(db, msgs: list[bytes]) -> None: """Idempotent by construction: (play_date, play_id) is the primary key, so re-delivery from the stream is a no-op. No title resolution happens here either — episode_id goes in raw. The fact is title-free on disk.""" rows = [] for m in msgs: e = orjson.loads(m) rows.append(( e["play_id"], e["account_id"], e["profile_id"], e["episode_id"], e["played_at"], e["watch_minutes"], e["device_type"], e["country"], e["play_date"], )) await db.executemany( """INSERT INTO fct_play_event (play_id, account_id, profile_id, episode_id, played_at, watch_minutes, device_type, country, play_date) VALUES ($1,$2,$3,$4,$5,$6,$7,$8,$9) ON CONFLICT (play_date, play_id) DO NOTHING""", # at-least-once is fine rows, )

3 · The orphan monitor — the SLO as a live number

The defensive join keeps orphans countable; this job makes the count actionable. It measures the fraction of recent plays whose episode_id has no resolvable dimension version, compares it to the SLO, and pages only on a rate — never a raw count, because the raw count spikes mechanically every launch night and then drains as metadata catches up. The signal is the persistence of orphans, not their existence.

PYTHON · ORPHAN-RATE MONITOR (PAGES ON RATE, NOT COUNT)
async def orphan_rate(db, window_min=15) -> float: """An orphan is a well-formed play whose episode_id has not yet resolved to any dim version. Expected to spike at launch and drain as metadata propagates, so we alarm on the RATE crossing the SLO, never the count.""" row = await db.fetchrow( """SELECT count(*) AS plays, count(*) FILTER (WHERE d.episode_id IS NULL) AS orphans FROM fct_play_event p LEFT JOIN LATERAL ( SELECT 1 AS episode_id FROM dim_episode e WHERE e.episode_id = p.episode_id LIMIT 1 ) d ON true WHERE p.played_at >= now() - make_interval(mins => $1)""", window_min, ) rate = row["orphans"] / max(row["plays"], 1) if rate > 0.01: # 1% → page; 0.5% → SLO breach page(f"orphan rate {rate:.2%} over {window_min}m — metadata stuck?") return rate

§ 06 — AGGREGATIONBingewatching, derived two ways

Bingewatching is not an event; it is a pattern in the play stream. There are two definitions, they answer different questions, and the schema must carry both without one re-writing the other. The craft is in the sessionization — and in the discipline of materializing nightly.

The two definitions are not interchangeable. Per-session binge — three or more episodes of the same series within twenty-four hours, with gaps under thirty minutes — is the signal the auto-play algorithm tunes against: it captures a continuous sitting. Per-day binge — any three-plus episodes of the same series in a calendar day — is the marketing rollup: "what percentage of subscribers binged this month." Force them into one definition and you mislead one stakeholder to please the other. So both are derived facts, computed nightly, materialized so analysts never re-derive from the raw firehose.

The per-session derivation is a sessionization — the canonical gaps-and-islands move applied to time. Partition the same-series plays by (account_id, profile_id, series_id), look back one row with LAG, and start a new session every time the gap exceeds thirty minutes. A running sum of those session-start flags becomes the session id; group, count, and keep only the sittings of three or more.

SQL · fct_binge_session DERIVATION (NIGHTLY, SESSIONIZED)
WITH same_series_runs AS ( SELECT account_id, profile_id, series_id, episode_id, played_at, lag(played_at) OVER (PARTITION BY account_id, profile_id, series_id ORDER BY played_at) AS prev_start FROM fct_play_event p JOIN dim_episode e ON e.episode_id = p.episode_id AND e.effective_to IS NULL ), sessioned AS ( SELECT *, sum(CASE WHEN prev_start IS NULL OR played_at - prev_start > INTERVAL '30 min' THEN 1 ELSE 0 END) OVER (PARTITION BY account_id, profile_id, series_id ORDER BY played_at) AS session_id FROM same_series_runs ) INSERT INTO fct_binge_session SELECT account_id, profile_id, series_id, session_id, min(played_at) AS session_start, max(played_at) AS session_end, count(*) AS episodes_in_session FROM sessioned GROUP BY account_id, profile_id, series_id, session_id HAVING count(*) >= 3; -- 3+ episodes = a binge sitting

The per-day rollup is a different shape entirely — and notably simpler, which is itself the point: it counts distinct episodes of a series within a single play_date, no gap logic, because the marketing question does not care whether the sitting was continuous. Two derivations, two grains, one fact stream feeding both.

SQL · PER-DAY BINGE ROLLUP (MARKETING GRAIN)
SELECT account_id, profile_id, e.series_id, p.play_date, count(DISTINCT p.episode_id) AS episodes_that_day FROM fct_play_event p JOIN dim_episode e ON e.episode_id = p.episode_id AND e.effective_to IS NULL GROUP BY account_id, profile_id, e.series_id, p.play_date HAVING count(DISTINCT p.episode_id) >= 3; -- 3+ same-series eps in a day
Materialize the behavior, never re-derive it. The algorithm reads last night's sittings; it does not re-walk fifteen billion plays at query time.BINGEWATCH RULE Nº 2 — DERIVE ONCE, READ MANY

§ 07 — ANALYTICS SQLInterrogating the catalog

The fact stream and its derivations are where the system explains itself. Three queries an interviewer loves, because each one carries a classic pattern on its back: SCD2 as-of attribution, conditional rollup over a content tree, and gaps-and-islands across consecutive days.

Top binged series — SCD2 as-of join

"Top binged series this week" requires going from episode to series without letting a re-title rewrite history. Store only episode_id on the fact; join up through the SCD2 hierarchy with a temporal predicate so each play attributes to the title that was current when it happened. The distinct-viewer count keys on account_id || '|' || series_id so eight episodes by one household are one viewer of one series — the grain discipline made literal.

SQL · TOP SERIES BY UNIQUE VIEWERS (TEMPORAL-CORRECT)
SELECT s.series_title, count(DISTINCT p.account_id || '|' || s.series_id) AS unique_viewers FROM fct_play_event p JOIN dim_episode e USING (episode_id) JOIN dim_series s ON s.series_id = e.series_id AND p.played_at BETWEEN s.effective_from AND coalesce(s.effective_to, '9999-12-31') WHERE p.play_date >= CURRENT_DATE - 7 GROUP BY s.series_title ORDER BY unique_viewers DESC; -- Plays during the OLD title roll up to the OLD title. Re-titling needs -- no backfill: the temporal predicate routes each play to its own era.

Series streaks — gaps and islands across days

"Watched at least one episode every day for five days" is the loyalty signal content acquisition uses for renewal decisions. It is a different shape from binge: same household across consecutive days, not many episodes in one sitting. The pattern is gaps-and-islands by date — subtract a dense row number from the date, and every run of consecutive days collapses to a constant island id.

SQL · CONSECUTIVE-DAY STREAKS PER (USER, SERIES)
WITH active_days AS ( SELECT DISTINCT account_id, e.series_id, p.play_date AS active_date FROM fct_play_event p JOIN dim_episode e ON e.episode_id = p.episode_id AND e.effective_to IS NULL ), islands AS ( SELECT account_id, series_id, active_date, active_date - (row_number() OVER (PARTITION BY account_id, series_id ORDER BY active_date))::int AS island_id -- date minus its rank = run constant FROM active_days ) SELECT account_id, series_id, min(active_date) AS streak_start, max(active_date) AS streak_end, count(*) AS streak_days FROM islands GROUP BY account_id, series_id, island_id HAVING count(*) >= 5; -- 5 days in a row = a hooked viewer

Profile mix per account — the sharing precursor

Because both keys live on every play, the password-sharing question that Trust & Safety will eventually ask is a conditional aggregation, not a join expedition. Per account, count distinct profiles, distinct countries, and distinct devices in a window; the accounts whose plays span geographically incompatible patterns are the candidates. The dual key — not a clever model — is what makes this cheap.

SQL · ACCOUNT FOOTPRINT (PASSWORD-SHARING PRECURSOR)
SELECT account_id, count(DISTINCT profile_id) AS profiles_active, count(DISTINCT country) AS countries_7d, count(DISTINCT device_type) AS devices_7d, count(*) FILTER (WHERE country <> a.primary_region) AS away_plays FROM fct_play_event p JOIN dim_account a ON a.account_id = p.account_id AND a.effective_to IS NULL WHERE p.play_date >= CURRENT_DATE - 7 GROUP BY account_id, a.primary_region HAVING count(DISTINCT country) >= 3; -- 3+ geos in a week → review

§ 08 — THE DASHBOARDProving the launch landed

A senior design ends with observability, because every defensive choice above is invisible without it. The dashboard watches two things at once: the health of the pipeline (is the orphan rate draining?) and the shape of the behavior (did the season get binged?).

PIPELINE HEALTH
play ingest rate vs. expected, orphan rate against the 0.5% SLO, metadata propagation lag per region, reject rate (malformed, never orphans) — a flat-but-nonzero orphan rate hours after launch means metadata is genuinely stuck, not merely racing.
LAUNCH SHAPE
plays in first hour, unique viewers, completion rate from heartbeat position, per-session binge count — the auto-play signal — and episodes-per-sitting p50, which tells Content Strategy whether the season is a sprint or a slow burn.
LOYALTY
active 5-day streaks per series, day-2 retention on the new season, and the binge-to-streak ratio — a show binged once and abandoned looks very different from one watched a little every night.
Launch Ops — "Stranger Things S5" · Day 1 FRI 00:42 PT · GLOBAL · 60s REFRESH
Play Ingest
182K/s
Orphan Rate
0.7%
Metadata Lag p95
41s
Reject Rate
0.01%
Episodes-per-sitting distribution · first 8 hours (per-session binge)
1 2 3 4 5 6 7 mode = 4 eps · the whole season in two sittings
Unique Viewers 1h
8.4M
Completion Rate
87%
Binge Sessions
3.1M
Eps/Sitting p50
4.0
Rebuffer Ratio
0.4%
Day-1 Streaks
FIG. 2 — The story a healthy launch tells: ingest steady at peak, orphan rate climbing as plays outrun metadata — but reject rate flat, so nothing is being lost, only delayed — and the binge distribution already peaking at four episodes a sitting.

Read the amber tiles together and the dashboard narrates the launch from §02: fifteen billion plays a day means the first hour outruns the regional metadata fan-out, so orphans rise while propagation lag is still draining — exactly the designed race. The tell that it is healthy and not broken is the reject rate pinned at one in ten thousand: every play is landing, none malformed, the orphans are well-formed rows wearing "Pending Sync" until their titles arrive. And the season is being devoured four episodes at a time. That is what a designed degradation looks like from the operator's chair on the biggest night of the quarter.


§ 09 — THE RUBRICWhat was actually being tested

Strip the show away and the question was testing five judgments, each of which generalizes far beyond streaming:

LATENESS
Seeing that a fact can arrive before its dimension, and refusing to let the dimension's tardiness drop the fact. LEFT JOIN + ROW_NUMBER + COALESCE is not a flourish; it is the whole defense, and it has a name.
GRAIN
Counting at the right level of a hierarchy — eight episodes is one bingewatcher — and storing immutable identity on the fact so a re-title never costs a five-year backfill.
DERIVATION
Recognizing that a behavior can be a shape in the data rather than a logged event, and that two valid definitions can coexist as two materialized facts instead of one lossy flag.
DUAL KEY
Carrying both account and profile on every row so that rollups and abuse-detection are aggregations, not join gymnastics. The cheapest modeling decision pays the largest dividend later.
HONESTY
Engineering the failure mode you can see: a labeled "Pending Sync" against an SLO beats a silent zero, because the number you never recorded is the number nobody knows is missing.
The play lands first; the title catches up; the behavior is computed after both. Get the order of those three clocks right and the schema writes itself — get it wrong and launch night reports a zero.— CLOSING ARGUMENT