PaddySpeaks · Systems at the Whiteboard · Nº 07

The Feed Problem

Model the engagement behind an Instagram-scale feed — a hundred billion impressions a day, every one stamped with the model that ranked it, mutable likes that may never be deleted, and a single celebrity post that can melt a partition by itself. One prompt, three stakeholders who want contradictory truths, and the one decision that separates a senior answer: the impression is the fact, and the model version rides on every row. A complete working-through of data flow, schema, streaming Python, the aggregation layer, the ranker-lift SQL, and the dashboard that proves it.

§ 01 — THE QUESTIONOne feed, three irreconcilable readers

Every data engineer who works near a social product eventually meets this question. It sounds like "log the likes." It is really "design a fact table that a ranking team, a creator-payments team, and a brand-safety team can all query without lying to any of them" — and those three stakeholders do not want the same number.

Interview Prompt

"Model engagement on the Instagram feed — impressions, dwell time, likes, saves, comments, shares, plus the negative signals: hides and reports. It has to support feed-ranker training and A/B testing, creator analytics, and brand safety, at a hundred billion impressions a day. How do you grain it?"

LEVEL · SENIOR / STAFFDURATION · 45 MINFORMAT · WHITEBOARD

The feed is the most-instrumented surface humans have built, and the trap is not the volume — it is that the cheapest mental model is wrong in a way that quietly poisons everything downstream. The naive design stores a counter: post 001 has 9,412 likes. That number cannot answer a single question the business actually asks. It cannot tell the ranking team whether the new model earned those likes or inherited them. It cannot tell finance which of them were unliked four minutes later — the canary for a low-quality recommendation. It cannot tell brand safety whether the hides clustered in one ugly hour. The moment you collapse engagement to a count, you have thrown away the dimension that every stakeholder is secretly asking about: which impression, ranked by which model, at what position, produced this reaction, and was it taken back?

A weak answer models the post and decrements a like counter on unlike. A strong answer notices that the unit of truth is not the post and not the like — it is the impression: one row every time a viewer is shown a post, stamped with the model that chose it. Engagements then hang off impressions, append-only, an unlike recorded as a second row rather than a deletion. Get that grain right and the ranker A/B becomes a GROUP BY; get it wrong and it becomes a separate ETL pipeline you will rebuild every quarter. So before any boxes and arrows, the working frame for the whole session:

THE SERVED GRAIN
Impressions. One row per (viewer × post × impression), a hundred billion a day, each carrying the ranker_model_id that chose it, the predicted_score it was chosen on, and the feed_position it landed in. This is the spine. It is the most-written table on Earth, and it is the only place model-attribution can possibly live.
THE REACTION GRAIN
Engagements. One row per (viewer × post × event × timestamp) — likes, saves, shares, comments, follows, and the negatives, hides and reports. Append-only. A like at T1 and an unlike at T2 are two rows, the second flagged is_undone, never a DELETE — because the toggle history is itself a signal.
THE TIME-ON-POST GRAIN
Dwell. One row per impression with total dwell_ms, rolled up from 250-millisecond heartbeats at ETL — because the raw heartbeat grain is twelve trillion rows a day and no warehouse should ever see it. The aggregation is not an optimization; it is a precondition for the data existing at all.
A like is mutable; the fact of the like is not. The instant you let an unlike DELETE a row, you have destroyed the one question the ranker most needs answered: how often does it make people regret the tap?

Scoping out loud

Scope is the first scored dimension, and most candidates skip it. State what you build, what you ignore, and the numbers that shape every later choice. Out of scope here, said explicitly: the ranking model itself (treated as a callable service that returns scored posts), the media CDN and transcoding, the social graph and follow-fan-out service, and the feed UI. In scope: how an impression becomes a row at a hundred billion a day, how the model version is stamped so A/B lift is a query rather than a project, how a mutable like is recorded without ever deleting, how dwell collapses from heartbeats to a per-impression number, and — the part that separates the seniors — how the system survives the single post from a celebrity that arrives a million times a second.

Then the envelope math, volunteered rather than extracted. Instagram-shaped numbers:

QuantityEstimateConsequence
Impressions / day≈ 100 B≈ 1.2 M/s average, multiples at peak — the firehose that defines the table
Raw dwell heartbeats≈ 12 T / day250 ms × ~30 s/impression — must die at ETL, never reaches the warehouse
Engagements / day≈ 5 B~5% of impressions react; append-only, so unlikes add rows, not subtract
Concurrent ranker versions5–10Forces ranker_model_id onto every impression as a first-class dimension
Hottest single post~10⁶ impressions/sOne celebrity melts any post-keyed partition — the row that shapes the architecture
Distinct viewers / post / dayup to 10⁸Exact COUNT DISTINCT is unaffordable; HLL sketches are mandatory
Raw event retention30 daysPII; aggregate to durable per-post and per-user rollups, expire the raw

Notice which row does the architectural work. The hundred-billion firehose dictates the partitioning and the sampling and the bill — but the row that dictates the shape is the hottest post. A uniform hundred-billion stream is easy; you hash it and walk away. The cruelty of a feed is that the distribution is violently skewed: the median post is seen a handful of times and a single celebrity post is seen a million times a second, into the same partition, at the same instant. Any design that keys the write path on post_id has already lost — that post is a hot partition the moment it is published. The rest of this article follows that skew.


§ 02 — DATA FLOWFollowing an impression through the building

One feed, two tempos. The hot path logs an impression and freezes the model's prediction in under the serving budget; everything heavy — dwell aggregation, the per-post and per-user rollups, the ranker-lift join — happens offline, downstream of a single ordered log that is partitioned by viewer, never by post.

SERVING PATH · ~1.2M IMPRESSIONS/S · LOG + FREEZE SCORE ANALYTICS PATH · DWELL ROLLUP · RANKER A/B · OFFLINE CLIENT FEED scroll · 250ms heartbeats INGEST GATEWAY stamp model_id + score key by viewer, NOT post EVENT LOG (KAFKA) key: hash(viewer_id) % N celebrity post → spread, not stacked fct_impressions served grain · model_id · score partition by impression_ts (hour) fct_engagements append-only · is_undone for unlikes RAW DWELL (TRANSIENT) 12T heartbeats/day · TTL hours DWELL ROLLUP heartbeats → dwell_ms RANKER A/B LIFT like-rate v3.1 vs v3.2 GROUP BY model_id mart_post_daily HLL reach · per-post rollup mart_user_daily per-viewer affinity rollup BRAND SAFETY hide+report rate · review queue RANKER TRAINING label = engaged? + dwell heartbeats collapse 100× model_id × engagements SOLID — sustained event flow · DASHED — offline aggregation & joins · The post proposes the spike; the viewer key disposes of it.
FIG. 1 — End-to-end flow. The log is the spine; it is keyed on the viewer so no single post owns a partition; the model id rides every impression so lift is a join, not a pipeline.

Three properties of this picture do most of the interview's work. First, the log is partitioned by viewer_id, never by post_id — because impressions of one celebrity post are produced by millions of different viewers, so keying on the viewer scatters that spike across every partition while a post-key would funnel it into one. The hottest object in the system is structurally prevented from being a hot partition. Second, the prediction is frozen on the hot path: the gateway stamps the serving ranker_model_id and its predicted_score onto the impression the instant the post is shown, before any reaction exists — the only moment that prediction is uncontaminated by what later happened. Third, the dashed lines are deliberate: raw dwell heartbeats are transient and collapse a hundred-to-one before any durable table sees them, and the ranker-lift join is offline against the append-only facts — the comparison that licenses promoting a model is a GROUP BY on a column, not a bespoke experiment harness.

The Failure Philosophy, In One Rule

Aggregate before you durably store; append rather than mutate; and never let one object key the write path. The twelve-trillion-row heartbeat stream is summarized to a hundred billion impression-dwell rows at ETL and the raw is thrown away — storing it would be a liability with no buyer. Engagements append: an unlike is a new row, because the warehouse's job is to remember what happened, not to depict the present. And the partition key is the viewer, so a post going viral degrades into more rows spread evenly, never one partition on fire. Each rule trades a tempting shortcut for the property that keeps the system queryable at the 99th percentile of skew.


§ 03 — DATA MODELThree facts, the dimensions that version, and the rollups

The schema falls out of the grain question. The impression is the spine, stamped with the model that chose it. The engagement is the reaction, append-only. The dwell is the time-on-post, pre-aggregated. The dimensions are SCD2 because models, posts, and creators all change underneath the facts — and the per-post and per-user rollups exist because nobody can scan a hundred billion rows for a dashboard.

The spine — one row per post shown

The impression fact is the most-written table in the design, so it is about discipline under skew. One row per (viewer × post × impression), stamped with the serving context the analytics layer will need: the feed_position where position-bias lives, the surface that separates feed from reels from explore, and — the column that makes A/B a query — the ranker_model_id with the predicted_score it chose on. The model id is a foreign key into a versioned dimension, not free text, because "which model" must survive a redeploy.

DDL · THE SPINE — IMPRESSION FACT
-- The "served" grain. ~100B rows/day. Partitioned by impression_ts (hour), -- clustered on viewer_id. ranker_model_id is the bias firewall AND the A/B -- key: without it on the row, a 1% holdout can't be compared to the baseline. CREATE TABLE fct_impressions ( impression_id BIGINT PRIMARY KEY, -- snowflake: time-ordered, shard-aware viewer_id BIGINT NOT NULL, -- the partition/cluster key, never post_id post_id BIGINT NOT NULL, feed_position SMALLINT NOT NULL, -- position bias lives here surface TEXT NOT NULL CHECK (surface IN ('feed','reels','stories','explore')), ranker_model_id BIGINT NOT NULL REFERENCES dim_ranker_models, -- ★ first-class predicted_score NUMERIC(8,6) NOT NULL, -- the score that ranked it, frozen device_id BIGINT, impression_ts TIMESTAMPTZ NOT NULL ); CREATE INDEX idx_imp_post ON fct_impressions (post_id, impression_ts DESC); CREATE INDEX idx_imp_model ON fct_impressions (ranker_model_id, impression_ts DESC);

The reaction — engagement, append-only

Engagement is a separate fact, joined to the impression by impression_id — nullable, because a comment can arrive on a post the viewer reached by deep link, with no prior feed impression. Every event type lives here: the positives the ranker optimizes and the negatives brand-safety polices. The non-negotiable design choice is that it is append-only. A like and its later unlike are two rows; the unlike is the original marked is_undone = TRUE with an undone_ts, never a deletion. That preserves the toggle time-series — which is the only way to answer "what fraction of likes are taken back within five minutes," the sharpest canary for a ranker pushing regret-bait.

DDL · THE REACTION — ENGAGEMENT FACT
-- The "reaction" grain. One row per (viewer × post × event × ts). -- APPEND-ONLY. An unlike is a new row with is_undone=TRUE — never a DELETE. -- impression_id is NULLable: comments can arrive without a prior impression. CREATE TABLE fct_engagements ( engagement_id BIGINT PRIMARY KEY, impression_id BIGINT REFERENCES fct_impressions, -- NULLable on purpose viewer_id BIGINT NOT NULL, post_id BIGINT NOT NULL, event_type TEXT NOT NULL CHECK (event_type IN ('like','save','share','comment', 'follow_creator','hide','report')), event_ts TIMESTAMPTZ NOT NULL, is_undone BOOLEAN NOT NULL DEFAULT false, -- the unlike marker undone_ts TIMESTAMPTZ -- when it was taken back ); -- Active state of any engagement = latest row per (viewer,post,event) -- with is_undone=FALSE. The HISTORY of toggles is the regret signal. CREATE INDEX idx_eng_post ON fct_engagements (post_id, event_ts DESC); CREATE INDEX idx_eng_imp ON fct_engagements (impression_id);

The time-on-post — dwell, already collapsed

Dwell is the third fact, and it exists only because something upstream refused to let the raw grain survive. The client emits a heartbeat every 250 milliseconds a post is on screen; at a hundred billion impressions of ~30 seconds each that is twelve trillion rows a day — a number with no warehouse and no purpose. The streaming job sums those heartbeats into one dwell_ms per impression and discards the rest. What lands is a hundred-billion-row table, the regression target the ranker trains on, with the watch-through signal for reels baked in.

DDL · THE TIME-ON-POST — DWELL FACT
-- The "time-on-post" grain. One row per impression, summed from raw 250ms -- heartbeats AT ETL — collapsing ~100 heartbeats into one row. The raw -- heartbeat stream is transient and never durably stored. CREATE TABLE fct_dwell_per_impression ( impression_id BIGINT PRIMARY KEY REFERENCES fct_impressions, dwell_ms INTEGER NOT NULL, -- the regression target viewport_pct NUMERIC(5,2) NOT NULL, -- how much of the post was visible was_full_view BOOLEAN NOT NULL DEFAULT false );

The dimensions that version, and the rollups that survive scale

Two structural pieces finish the model. The dimensions are SCD2 because the things they describe change while the facts keep flowing: a creator's follower count, a post's caption or swapped audio, and above all the ranker model — semver, deployed percentage, model-card URL — so that a query about a fact from last Tuesday joins to the model as it was last Tuesday. And the rollups exist because no dashboard can scan the spine: mart_post_engagement_daily pre-aggregates per (post × day) with an HLL sketch for distinct reach, and mart_user_affinity_daily rolls per (viewer × day) for personalization, both de-identified enough to keep long after the raw facts expire at thirty days.

DDL · VERSIONED DIMENSION + PER-POST & PER-USER ROLLUPS
-- SCD2: the model that ranked an impression must be reconstructable as it -- was AT impression-time, across redeploys and percentage ramps. CREATE TABLE dim_ranker_models ( ranker_model_id BIGINT PRIMARY KEY, -- surrogate, one per version-row model_semver TEXT NOT NULL, -- 'v3.2.1' feature_set TEXT NOT NULL, deployed_pct NUMERIC(5,2) NOT NULL, -- traffic share at this version model_card_url TEXT, valid_from TIMESTAMPTZ NOT NULL, valid_to TIMESTAMPTZ, -- NULL = current is_current BOOLEAN NOT NULL DEFAULT true ); -- Per-POST rollup. Grain: 1 per (post × day). HLL for distinct reach, because -- exact COUNT DISTINCT over 10^8 viewers/post is unaffordable at this volume. CREATE TABLE mart_post_engagement_daily ( post_id BIGINT NOT NULL, day DATE NOT NULL, impressions BIGINT NOT NULL, reach_hll BYTEA NOT NULL, -- APPROX_COUNT_DISTINCT(viewer_id) sketch like_rate NUMERIC(7,5) NOT NULL, save_rate NUMERIC(7,5) NOT NULL, share_rate NUMERIC(7,5) NOT NULL, negative_rate NUMERIC(7,5) NOT NULL, -- (hide + report) / impressions avg_dwell_ms INTEGER NOT NULL, PRIMARY KEY (post_id, day) ); -- Per-USER rollup. Grain: 1 per (viewer × day). De-identified, kept long; -- feeds personalization while the raw impression rows expire at 30 days. CREATE TABLE mart_user_affinity_daily ( viewer_id BIGINT NOT NULL, day DATE NOT NULL, impressions BIGINT NOT NULL, engagements BIGINT NOT NULL, avg_dwell_ms INTEGER NOT NULL, top_audio_id BIGINT, PRIMARY KEY (viewer_id, day) );

One subtlety worth saying out loud at the whiteboard: the rollups do not store a count of likes — they store a rate, computed from the append-only facts with is_undone = FALSE applied at aggregation time. The raw facts remember every toggle; the marts remember only the settled truth. That separation is what lets the same engagement table serve a regret analysis (which needs the toggles) and a creator dashboard (which needs the net) without either one corrupting the other.


§ 04 — THE CORE INVARIANTThe fact is immutable; the model rides on it

The whole correctness of this system lives in one rule joining two halves: engagement facts are never mutated — an unlike appends — and every impression carries, frozen at serving time, the model version and score that chose it. Together they make two otherwise-impossible questions answerable: "did the new ranker actually lift engagement?" and "how often does the feed make people regret a tap?"

Why this is the invariant and not a nice-to-have. Counters lie by forgetting. A like-count that decrements on unlike can tell you the present and nothing else — it cannot reconstruct that the like happened, that it survived four minutes, that it was then withdrawn. The ranker's most valuable training label is buried in exactly that history: a like taken back inside five minutes is a near-miss the model should learn from, indistinguishable from a like that never happened if you let the DELETE through. So the engagement table appends, always, and "active" is a query — the latest non-undone row — not a stored state. Symmetrically, the model id must be stamped at impression-time, because models redeploy daily and traffic ramps continuously; if you do not capture that this impression was served by v3.2 at the instant it happened, you can never reconstruct it, and the A/B comparison the whole feed team depends on evaporates.

RANKED v3.2 IMPRESSED @ pos LIKED @ T1 UNLIKED @ T2 SETTLED: not liked

Read the lifecycle left to right and notice that nothing is ever overwritten. The model id is frozen between RANKED and IMPRESSED, before any reaction exists. The like at T1 is a row; the unlike at T2 is a second row, not an edit of the first; the settled state is computed, not stored. Because every step is an append, the gap between the impression and the analysis can be arbitrarily long and the comparison stays honest — a post shown ten million times under v3.2 with a high predicted score and a wave of five-minute unlikes is not noise, it is the cleanest possible indictment of that model, and you only have it because nothing in the chain was allowed to mutate.

SQL · THE TWO INSERTS THAT REPLACE A DELETE
-- A like, then an unlike five minutes later. TWO rows, never a DELETE. -- The first row stays; the second records the withdrawal. INSERT INTO fct_engagements (engagement_id, impression_id, viewer_id, post_id, event_type, event_ts, is_undone, undone_ts) VALUES ('ENG_1', 'IMP_A', 'U_42', 'POST_001', 'like', '2025-05-01 09:00:04', false, NULL), ('ENG_4', 'IMP_A', 'U_42', 'POST_001', 'like', '2025-05-01 09:05:00', true, '2025-05-01 09:05:00'); -- the unlike: is_undone=TRUE -- "Active" is never stored — it is derived. The settled state of every -- engagement, and the regret signal, both fall out of the same history: SELECT post_id, count(*) FILTER (WHERE event_type = 'like' AND NOT is_undone) AS active_likes, count(*) FILTER (WHERE event_type = 'like' AND is_undone AND undone_ts - event_ts < INTERVAL '5 minutes') AS fast_regrets FROM fct_engagements GROUP BY post_id;

The ranker_model_id foreign key and the immutability of the facts are doing the quiet work together. Because the model id is on the row and the row never changes, the same impression can be attributed to exactly one served model forever, and a head-to-head between v3.1 and v3.2 is a GROUP BY over a column — no separate experiment store, no shadow pipeline, no reconciliation. That is the whole payoff: the experiment infrastructure is the schema.

Never DELETE an unlike, and never serve an impression without stamping the model that chose it. The first preserves the regret; the second preserves the attribution. Lose either and the feed becomes a number you can no longer interrogate.FEED RULE Nº 1

§ 05 — INGESTION & STREAMSPython on the firehose

Three programs carry the system. The gateway that stamps the model id and partitions by viewer so no post can melt a shard, the heartbeat collapser that turns twelve trillion rows into a hundred billion, and the engagement applier that appends an unlike instead of deleting a like. Each is small; the judgment is in what they refuse to do.

1 · The ingest gateway — stamp the model, key by the viewer

This runs on the serving hot path. It does the minimum and emits asynchronously: it freezes the ranker_model_id and predicted_score onto the impression at the instant the post is shown, and — the load-bearing decision — it chooses the log partition by hashing the viewer_id, never the post_id. A celebrity post is impressed by millions of distinct viewers, so a viewer key scatters that million-per-second spike across all partitions, while a post key would bury it in one. The hottest object in the product is, by construction, incapable of being a hot partition.

PYTHON · INGEST GATEWAY — FREEZE MODEL, SPREAD THE HOT POST
import hashlib, time N_PARTITIONS = 4096 class ImpressionGateway: """Logs one impression per (viewer, post) shown. The partition key is the VIEWER, never the post: a celebrity post is served to millions of different viewers, so hashing the viewer scatters that spike evenly, while hashing the post would funnel a million writes/sec into one partition. Skew is defused at the key, not patched downstream.""" def __init__(self, log, model_registry): self.log, self.models = log, model_registry def log_impression(self, viewer, post, slot, surface, served) -> int: # 'served' is what the ranker returned: (model_id, predicted_score). imp_id = new_snowflake() record = { "impression_id": imp_id, "viewer_id": viewer.id, "post_id": post.id, "feed_position": slot, "surface": surface, "ranker_model_id": served.model_id, # ★ frozen at serve time "predicted_score": served.score, # the score it was chosen on "impression_ts": time.time_ns(), } # Async emit — NEVER block the feed response on the log write: self.log.emit(self._partition(viewer.id), record) return imp_id def _partition(self, viewer_id: int) -> int: h = hashlib.blake2b(f"v:{viewer_id}".encode(), digest_size=8) return int.from_bytes(h.digest()) % N_PARTITIONS # viewer, not post

One carve-out, always stated: the gateway never resolves the model id to a name on the hot path. It writes the surrogate ranker_model_id only; the join to semver and deployed-percentage happens offline against the SCD2 dimension. Resolving on the hot path would couple a 1.2-million-per-second writer to a slowly-changing lookup, for a label nobody reads until analysis time. Stamp the key, defer the meaning.

2 · The heartbeat collapser — twelve trillion into a hundred billion

PYTHON · DWELL ROLLUP — SUM HEARTBEATS, DISCARD THE RAW
def collapse_dwell(impression_id, heartbeats) -> dict: """Raw heartbeats arrive every 250ms a post is on screen — ~12 trillion rows/day across the firehose. This runs in the streaming layer and summarizes them to ONE row per impression, then the raw heartbeats are dropped (TTL of hours). The aggregation is not an optimization; it is the precondition for the dwell data existing at warehouse scale.""" if not heartbeats: return None # no on-screen time → no dwell row beats = sorted(heartbeats, key=lambda h: h.ts) dwell_ms = 250 * len(beats) # each beat = 250ms visible max_viewport = max(h.viewport_pct for h in beats) return { "impression_id": impression_id, "dwell_ms": dwell_ms, "viewport_pct": max_viewport, "was_full_view": max_viewport >= 0.99, # fully on screen at least once } # Note: we do NOT keep the heartbeats. Their only purpose was to be summed; # storing 12T rows/day would be a standing cost with no consumer.

3 · The engagement applier — append the unlike, never delete the like

PYTHON · ENGAGEMENT APPLIER — IDEMPOTENT, APPEND-ONLY
async def apply_engagement(store, ev) -> None: """Every engagement is an INSERT. A toggle (like -> unlike) is a NEW row flagged is_undone, NOT a delete or an update of the original. This is what preserves the regret time-series the ranker trains on. Idempotent on engagement_id so at-least-once delivery from the log is safe.""" if ev.action == "undo": # The user un-liked / un-saved. Do NOT touch the original row. await store.insert_engagement( engagement_id = ev.id, impression_id = ev.impression_id, # NULLable: may be a deep-link reaction viewer_id = ev.viewer_id, post_id = ev.post_id, event_type = ev.event_type, # same type as the act being undone event_ts = ev.original_ts, is_undone = True, # the withdrawal marker undone_ts = ev.ts) return # A fresh positive or negative signal — a plain append. await store.insert_engagement( engagement_id = ev.id, impression_id = ev.impression_id, viewer_id = ev.viewer_id, post_id = ev.post_id, event_type = ev.event_type, # like|save|share|comment|hide|report|follow event_ts = ev.ts, is_undone = False, undone_ts = None) # We never manufacture a row for an impression with no reaction. That # ABSENCE is the swipe-past signal; a LEFT JOIN recovers it at analysis time.

§ 06 — AGGREGATIONSketches, rollups, and the rate that is never a count

The aggregation layer turns the append-only firehose into something a dashboard can read in milliseconds. Two ideas carry it: HLL sketches that count distinct viewers without storing them, and per-post / per-user rollups that materialize the settled rates so nobody ever scans the spine. The craft is in counting drivers, not pings — distinct viewers, not impressions — and in computing rates from the append-only truth.

The intuition first, because it is what an interviewer wants out loud. "How many people saw this post?" sounds trivial until you remember a single post can reach a hundred million distinct viewers in a day, and exact COUNT DISTINCT at that cardinality, across petabytes, is a sort the size of a small country. The answer is a HyperLogLog sketch: a few kilobytes that estimate distinct count to within a couple of percent and — the property that makes them indispensable — merge. A per-hour sketch unions into a per-day sketch with no re-scan; a per-post sketch unions across posts for a creator total. You never store the viewers; you store the sketch, and the sketch composes.

PYTHON · ROLLUP JOB — HLL REACH + SETTLED RATES (SKETCH OF THE GRAPH)
# Materialize mart_post_engagement_daily from the append-only facts. # Distinct reach via HLL (mergeable, ~2% error); rates computed with the # is_undone filter so the mart holds SETTLED truth, not raw toggle counts. impressions = (fct_impressions .filter(lambda i: i.day == target_day) .key_by(lambda i: i.post_id) .aggregate(impressions=count(), reach=hll_sketch("viewer_id"), # distinct viewers, not rows avg_dwell=mean("dwell_ms"))) # joined from the dwell fact engagements = (fct_engagements .filter(lambda e: e.day == target_day and not e.is_undone) # SETTLED only .key_by(lambda e: e.post_id) .aggregate(likes=count_where("event_type == 'like'"), saves=count_where("event_type == 'save'"), shares=count_where("event_type == 'share'"), negatives=count_where("event_type in ('hide','report')"))) def to_rates(post, imp, eng): if imp.impressions == 0: return None # no denominator → no rate return { "post_id": post, "impressions": imp.impressions, "reach_hll": imp.reach.serialize(), # store the sketch, never the ids "like_rate": eng.likes / imp.impressions, "save_rate": eng.saves / imp.impressions, "share_rate": eng.shares / imp.impressions, "negative_rate": eng.negatives / imp.impressions, "avg_dwell_ms": imp.avg_dwell, } mart = (impressions.join(engagements, on="post_id", how="left") .map(to_rates) .sink(mart_post_engagement_daily)) # read by every creator + brand-safety dashboard

The per-user rollup is the mirror image and feeds personalization rather than reporting: per (viewer × day), how many impressions, how many settled engagements, the average dwell, the dominant audio — a compact affinity vector that survives long after the raw impressions expire at thirty days, because it is aggregate and de-identified where the raw is PII. The two marts are why the warehouse stays sane: the spine is written once and scanned by batch jobs, while every interactive query hits a table that is already at the grain the question asks. A dashboard never touches a hundred billion rows; it touches the few million rows of the rollup that already answered.

Count distinct viewers with a sketch, never a sort; compute rates from the settled facts, never from a counter. The mart holds what is true today; the log holds everything that ever happened. Both are necessary, and they are not the same table.FEED RULE Nº 2 — SKETCH THE REACH, SETTLE THE RATE

§ 07 — ANALYTICS SQLInterrogating the feed

The three facts are where the system explains itself. Three queries an interviewer loves, because each one carries a classic pattern on its back — conditional aggregation for the creator funnel, a window LAG for the brand-safety velocity, and the model-versioned GROUP BY that is the entire point of the ranker A/B.

Creator analytics — conditional aggregation

The bread-and-butter question: per post, what are the like / save / share rates, the unique reach, and the average dwell — counting only active engagements, with the undone ones filtered out. The pattern is conditional aggregation: a single scan with FILTER clauses fanning the one stream of engagements into per-type counts, joined to impressions for the denominator and to dwell for the time-on-post.

SQL · PER-POST ENGAGEMENT — COUNT … FILTER
WITH active_eng AS ( SELECT impression_id, post_id, event_type FROM fct_engagements WHERE NOT is_undone -- settled state only; toggles excluded ) SELECT i.post_id, count(*) AS impressions, approx_count_distinct(i.viewer_id) AS unique_viewers, count(e.event_type) FILTER (WHERE e.event_type = 'like') AS likes, count(e.event_type) FILTER (WHERE e.event_type = 'save') AS saves, count(e.event_type) FILTER (WHERE e.event_type = 'share') AS shares, round(100.0 * count(e.event_type) FILTER (WHERE e.event_type = 'like') / nullif(count(*), 0), 3) AS like_rate_pct, round(avg(d.dwell_ms), 0) AS avg_dwell_ms FROM fct_impressions i LEFT JOIN active_eng e ON e.impression_id = i.impression_id LEFT JOIN fct_dwell_per_impression d ON d.impression_id = i.impression_id WHERE i.impression_ts >= CURRENT_DATE - 1 GROUP BY i.post_id ORDER BY like_rate_pct DESC;

Brand safety — hide-rate velocity with a window LAG

A post going wrong is not a level, it is a slope: the negative rate jumping hour-over-hour is the alarm, because a steady 0.3% hide-rate is background noise while a leap from 0.1% to 0.9% in one hour is a post turning toxic in real time. The pattern is a window LAG over the per-post-hour negative rate — compare each hour to the one before and surface the accelerations to the human review queue.

SQL · NEGATIVE-SIGNAL VELOCITY — LAG OVER POST-HOURS
WITH hourly AS ( SELECT i.post_id, date_trunc('hour', i.impression_ts) AS hr, count(*) AS impressions, count(e.event_type) FILTER ( WHERE e.event_type IN ('hide','report') AND NOT e.is_undone) AS negatives FROM fct_impressions i LEFT JOIN fct_engagements e ON e.impression_id = i.impression_id WHERE i.impression_ts >= now() - INTERVAL '24 hours' GROUP BY i.post_id, date_trunc('hour', i.impression_ts) ), rated AS ( SELECT post_id, hr, impressions, negatives::numeric / nullif(impressions, 0) AS neg_rate, lag(negatives::numeric / nullif(impressions, 0)) OVER (PARTITION BY post_id ORDER BY hr) AS prev_rate FROM hourly WHERE impressions > 1000 -- ignore low-volume noise ) SELECT post_id, hr, round(100.0 * neg_rate, 3) AS neg_rate_pct, round(100.0 * (neg_rate - coalesce(prev_rate, 0)), 3) AS rate_jump_pct FROM rated WHERE neg_rate > 0.005 -- over the 0.5% brand-safety floor AND neg_rate - coalesce(prev_rate, 0) > 0.003 -- AND accelerating fast ORDER BY rate_jump_pct DESC; -- the queue: posts turning toxic right now

Ranker A/B lift — the model-versioned GROUP BY

The query that justifies the entire schema, and the one that separates a senior answer. Because ranker_model_id rides every impression, comparing two model versions is a GROUP BY on that column — no separate experiment store, no bespoke pipeline. The senior framing is to compute both the positive and the negative rate per version, because a model that lifts likes while also lifting hides has not won: the new ranker ships only if it raises engagement and does not worsen the negatives.

SQL · ENGAGEMENT-RATE LIFT — v3.2 OVER v3.1
WITH per_model AS ( SELECT m.model_semver, count(*) AS impressions, count(e.event_type) FILTER ( WHERE e.event_type = 'like' AND NOT e.is_undone) AS likes, count(e.event_type) FILTER ( WHERE e.event_type = 'save' AND NOT e.is_undone) AS saves, count(e.event_type) FILTER ( WHERE e.event_type IN ('hide','report')) AS negatives, count(*) FILTER (WHERE e.event_type = 'like' AND e.is_undone AND e.undone_ts - e.event_ts < INTERVAL '5 minutes') AS fast_regrets, avg(d.dwell_ms) AS avg_dwell_ms FROM fct_impressions i JOIN dim_ranker_models m ON m.ranker_model_id = i.ranker_model_id LEFT JOIN fct_engagements e ON e.impression_id = i.impression_id LEFT JOIN fct_dwell_per_impression d ON d.impression_id = i.impression_id WHERE i.impression_ts >= CURRENT_DATE - 7 AND m.model_semver IN ('v3.1', 'v3.2') GROUP BY m.model_semver ) SELECT model_semver, round(100.0 * likes / nullif(impressions, 0), 4) AS like_rate_pct, round(100.0 * saves / nullif(impressions, 0), 4) AS save_rate_pct, round(100.0 * negatives / nullif(impressions, 0), 4) AS negative_rate_pct, round(100.0 * fast_regrets / nullif(likes, 0), 3) AS regret_rate_pct, round(avg_dwell_ms, 0) AS avg_dwell_ms FROM per_model ORDER BY model_semver; -- v3.2 ships iff like_rate AND dwell rise, while negative_rate and regret do NOT.

§ 08 — THE DASHBOARDProving the feed is healthy

A senior design ends with observability, because every safeguard above is invisible without it. The feed dashboard watches three readers at once: is the live ranker lifting without raising regret, is the firehose ingesting cleanly without a partition on fire, and is any post turning toxic in real time.

RANKER HEALTH
like-rate lift (live model over the holdout), 5-min regret rate (likes taken back fast — the quality canary), negative-rate delta vs baseline, avg dwell — a model lifting likes while lifting hides or regret has not won.
INGEST HEALTH
impression rate vs expected, partition skew (max-partition over mean — the celebrity-melt alarm), heartbeat collapse ratio (should hover near 100×), log lag — a skew spike means a post went viral and the viewer-key is earning its keep.
SAFETY HEALTH
posts over hide-rate floor, review-queue depth, fastest-accelerating post (the LAG velocity leader), report-rate p99 — the early warning that a single post is going wrong before it goes everywhere.
Feed Engagement Ops — Global FRI 20:10 UTC · LIVE v3.2 · HOLDOUT v3.1 · 60s REFRESH
Like-Rate Lift
+5.8%
5-min Regret Rate
3.1%
Negative-Rate Δ
-0.4%
Avg Dwell
28.4s
Per-post impression skew — last 60 min (one celebrity post arriving, viewer-key spreading it)
viral post (spread across N partitions, none melts)
Impression Rate
1.34M/s
Partition Skew
1.4×
Heartbeat Collapse
98×
Posts > Hide Floor
23
Review Queue
147
Log Lag
3s
FIG. 2 — The story a healthy-but-watched feed tells: live v3.2 lifting likes and dwell with negatives down — and the amber tiles, regret rate and the review queue, climbing as one viral post drives both engagement and complaints while the partition-skew bars stay flat, the viewer-key absorbing the spike exactly as designed.

Read the amber tiles together and the dashboard narrates the central tension of the whole problem. The skew chart shows a single post arriving at a million impressions a second — the spike that would have melted a post-keyed partition — yet partition skew sits at 1.4×, because the viewer key scattered those writes across thousands of shards. Meanwhile the new ranker is winning on its own terms, likes and dwell up, negatives down; but the regret rate and the review queue climb, the canaries warning that the same viral content driving engagement is also driving complaints. That is exactly why regret and the safety velocity exist: they are the only instruments that see the feed straining underneath metrics that otherwise look excellent.


§ 09 — THE RUBRICWhat was actually being tested

Strip the Instagram details away and the question was testing five judgments, each of which generalizes far beyond a social feed:

GRAIN
Seeing that the unit of truth is the impression, not the post or the count — and that a stamped ranker_model_id on every served row turns a quarterly A/B project into a single GROUP BY. The grain is the architecture.
IMMUTABILITY
Appending an unlike instead of deleting a like, because the warehouse's job is to remember what happened, not to depict the present. The toggle history is the regret signal, and a counter throws it away.
SKEW
Defusing the hot object at the partition key rather than patching it downstream — keying the write path on the viewer, not the post, so the one celebrity who could melt a shard becomes evenly-spread rows instead.
SCALE DISCIPLINE
Killing the twelve-trillion-row heartbeat grain at ETL, counting distinct viewers with a mergeable sketch instead of a sort, and materializing rollups so no dashboard ever scans the spine. Aggregate before you durably store.
STAKEHOLDER HONESTY
Serving three readers who want contradictory truths from one model — the raw facts for the ranker's regret analysis, the settled rates for the creator, the velocity for brand safety — without letting any of their needs corrupt the others' numbers.
The post proposes the spike; the viewer key disposes of it. The model proposes the ranking; the frozen id makes it answerable. Everything else is the patient discipline of remembering exactly what happened — and never, ever overwriting it.— CLOSING ARGUMENT