Est. 2026 · Philosophy · Technology · Wisdom

PaddySpeaks

Where ancient wisdom meets the architecture of tomorrow

technology · interview prep

Ads Data Engineering Interview Prep

A senior / L5 handbook for the rounds that actually decide an ads DE loop — auctions, attribution, identity & consent, ads-specific SQL, platform-level stacks, measurement & visualization, plus ten full mock interview scenarios.

Overview

Senior Ads DE Interview Prep — Master Index

"The goal isn't to memorize answers. It's to build the mental model that generates them."

A production-grade study companion for senior-level Data Engineering interviews at large consumer ads platforms. Six parts, designed to be read in order but self-contained enough for targeted cramming.


The Stack

This guide is written for a generic large consumer ads platform — the kind that serves 50B+ impressions per day across multiple surfaces (feeds, short-form video, stories, publisher network, commerce). Throughout the guide you'll see references to platform-generic concepts like "the ads platform," "the warehouse," "the streaming spine." Substitute the specifics of your target company's stack as you read:

Generic term used here → common real-world equivalents:

  • Event streaming spine: Kafka, Kinesis, Pub/Sub
  • Batch orchestrator: Airflow, Dagster, Prefect
  • Data warehouse: BigQuery, Snowflake, Redshift, Iceberg + object storage
  • Distributed SQL engine: Trino, Presto, Spark SQL, Impala
  • Real-time analytics store: Druid, Pinot, ClickHouse
  • Stateful stream processor: Flink, Spark Structured Streaming, Kafka Streams
  • Aggregated metrics platform: unified metrics layer, LookML, cube layer
  • Conversion pixel: web tracking pixel, advertiser SDK, server CAPI
  • AI-driven performance campaigns: auto-optimized campaign products across the industry

The interview substance — system design, SQL mechanics, growth accounting, behavioral rubrics — transfers cleanly across all of them.


Documents

Part 0: Introduction to Ads Engineering

Foundational understanding: how ads work end-to-end, the data pipeline stack, user information tiers, privacy regimes (GDPR, CCPA, ATT), and the aggregated metrics platform pattern. If ads is new to you, read this first. ~8,500 words.

  • The 7-stage ads lifecycle
  • The DE pipeline stack (event → bronze → silver → gold → serving)
  • The four data tiers and what each is for
  • Privacy-by-design principles
  • The DE mental model: the three questions you'll answer every sprint

Part 1: Mock Interview + 10 Scenarios + Practice

A full senior-level loop walked end-to-end, followed by deep dives on 10 scenarios you should be prepared to answer cold. ~35,000 words.

  • Round 1: System design (unified metrics platform)
  • Round 2: SQL deep dive (attribution windows, multidimensional benchmarks)
  • Round 3: Python exercise (config-driven SQL generator + data validator)
  • Round 4: Streaming vs batch (spend monitoring with reconciliation)
  • 10 scenarios: campaign reporting, real-time delivery monitoring, attribution at scale, audience segmentation, creative A/B testing, DQ frameworks, backfills, SCD for campaign entities, incident response, senior technical leadership

Part 2: Advanced Ads + Growth Accounting + Behavioral

Scenarios 11–22: auction data, cross-platform attribution, budget pacing, creative analytics, growth accounting pipelines (DAU, MAU, lifecycle, revenue decomposition), and the behavioral / conflict scenarios that separate senior from mid-level. ~18,000 words.

  • Ad auction data engineering (100B+ events/day)
  • Cross-platform identity graph and attribution
  • Budget pacing and spend forecasting
  • Creative-level performance analytics
  • Growth accounting: the L28 framework
  • STAR+I behavioral framework
  • Disagreeing with your manager, managing shared-table conflicts, cross-team incidents

Part 3: Deep Dive SQL

Production-ready SQL with senior commentary. Every query is accompanied by "why this and not that." ~12,000 words.

  • Full DAU growth accounting pipeline (incremental history → daily classification → summary → cohort retention → WAU/MAU)
  • DAU drop debugging
  • Frequency capping analytics
  • Advertiser ROAS pipelines
  • Audience overlap analysis
  • Incrementality (holdout-based lift)
  • Ad delivery health monitoring
  • Campaign cannibalization detection
  • Ad load vs user experience

Part 4: Platform Surfaces + A-Z Glossary

Surface-specific scenarios (short-form video, stories, carousel, AI-driven shopping, dynamic product ads, cross-surface pipelines) plus a comprehensive A-Z of ads terminology. ~16,000 words.

  • Short-form video ads engagement pipeline
  • Stories ads tap-through funnels
  • Carousel ads card-level analytics
  • AI-driven automated performance campaigns
  • Dynamic product ads (catalog-driven)
  • Cross-surface unified pipeline
  • A-Z glossary: surfaces, formats, objectives, attribution types, auction mechanics, delivery, measurement, targeting, privacy

Part 5: Visualization & Measurement of Campaign Objectives

How to measure whether a campaign worked, by objective — awareness, traffic, conversions, app installs, lead gen. Scorecards, visualizations, SQL templates. ~9,000 words.

  • Campaign scorecard template
  • Reach & frequency visualizations
  • Conversion funnel and ROAS by attribution window
  • App install funnel (install → event → retention)
  • Lead quality scoring
  • Cross-objective benchmarks

A four-week plan:

Week 1
  • Mon–Tue: Part 0 — foundations. Can you explain the 7-stage lifecycle without notes? Can you draw the pipeline stack?
  • Wed–Thu: Part 3 — SQL deep dive. Write all 5 DAU queries from scratch on a whiteboard.
  • Fri–Sun: Part 4 glossary. Every A-Z term memorized cold. Make flashcards for the ones you miss.

Week 2
  • Mon–Wed: Part 1 — Round 1 (system design). Rewrite the AMP architecture from memory three times, different framings each time.
  • Thu–Fri: Part 1 — Rounds 2 & 3 (SQL, Python). Solve both SQL prompts cold. Code the config-driven validator without looking.
  • Sat–Sun: Part 1 — Round 4 + scenarios 1–5. Record yourself answering each scenario out loud.

Week 3
  • Mon–Wed: Part 1 — scenarios 6–10. Record, review, redo.
  • Thu–Fri: Part 2 — scenarios 11–16 (advanced ads). Can you draw the cross-platform identity graph?
  • Sat–Sun: Part 2 — scenarios 17–22 (growth + behavioral). Prepare 6 STAR+I behavioral stories, one per loop slot.

Week 4
  • Mon–Tue: Part 4 — platform-specific scenarios. Pick the 3 most relevant to your target company; drill.
  • Wed–Thu: Part 5 — measurement & visualization. Sketch 3 dashboards from scratch.
  • Fri: Mock loop (with a friend). Full 5-round simulation.
  • Sat: Weak-area drill. Fix whatever the mock exposed.
  • Sun: Rest. Sleep. Eat protein. Show up.

How to read this guide

  • Study mode: linearly, Part 0 → Part 5.
  • Cramming mode: Part 0's DE mental model section → Part 1's Round 1 → Part 3's DAU pipeline → Part 4's glossary → Part 1's 10 scenarios.
  • Reference mode: jump to the specific scenario you need. Each one is self-contained.
  • Behavioral prep: Part 2's last four sections. Read them a week out.

The writing style is deliberate: short sentences, hard numbers, actual code. Senior-level interviews reward concreteness. Every query, every diagram, every decision framework here is something you should be able to reproduce on a whiteboard under pressure.

Good luck.


Part 0 — Introduction to Ads Engineering

"Before you can build the pipeline, you have to understand what's flowing through it. Ads is not a generic OLAP warehouse — it's a high-throughput, privacy-constrained, financially-regulated, millisecond-sensitive distributed system. Know that, and the rest of the interview follows."

This chapter is the foundation: how ads work end-to-end at a large consumer platform, the data architecture DEs own, the privacy regimes that constrain every decision, and the mental model a senior DE brings to every sprint. If you're new to the ads domain — or if your last ads job was years ago — read this first and read it carefully. Every scenario in Parts 1–4 assumes this vocabulary.

Contents

  1. How ads work — the end-to-end lifecycle
  2. The data architecture: from event to insight
  3. User information, data tiers, and privacy
  4. The aggregated metrics platform pattern
  5. The DE mental model: the three questions
  6. Vocabulary cheat sheet

1. How ads work — the end-to-end lifecycle

1.1 The 60-second version

An advertiser comes to the platform with money and a goal (sell shoes, get app installs, build brand awareness). They create a campaign, define their audience, upload creatives, and set a budget. Every time a user on the platform loads a feed or opens a story, an auction happens: multiple advertisers compete to show their ad to that specific user. The platform picks a winner in milliseconds, shows the ad, and logs the impression. If the user clicks, we log a click. If the user later converts (buys shoes, installs the app), we attribute that conversion back to the ad that drove it. Every one of those events is data a DE pipeline processes, aggregates, and surfaces to advertisers, internal analysts, and ML optimization systems.

At a large platform, the numbers are staggering:

  • 50–100B+ impressions per day across all surfaces.
  • 5–20 TB/day of raw event data for impressions alone, before conversions, auctions, and derived signals.
  • 100k+ concurrent campaigns from millions of advertisers.
  • Sub-second p99 latency for auction decisioning; minutes-to-hours for reporting pipelines.
  • Fractions of a cent per impression matter at this scale — on $50M/day of spend, a 0.1% measurement error is $50K/day.

The DE's job is to turn that firehose into trustworthy numbers for advertisers, analysts, finance, and ML.

1.2 The full lifecycle (7 stages)

The Ads Value Chain — seven stages flowing through three actors (advertiser, platform, user):

  1. Campaign creation → campaign entity store (advertiser)
  2. Audience targeting → audience engine / graph (advertiser)
  3. Auction → impression served (platform)
  4. Impression log (platform)
  5. Delivery → click log; engagement events: click, like, view (user)
  6. Conversion log ← user converts via pixel, SDK, or server (user / conversion)
  7. Reporting & optimization → advertiser dashboards · billing · ML feedback (platform → advertiser)

Seven stages, three actors; each stage notes who holds the data at that step.

Every arrow above is a system. Every system emits events. The DE owns most of the boxes downstream of the first two.

Stage 1 — Campaign creation

The advertiser logs into the ads UI or calls the Marketing API. They define:

  • Campaign: the top-level container with a single objective (e.g., conversions, reach, app_installs).
  • Ad Set (or ad group): targeting + budget + schedule + bidding strategy.
  • Ad: the creative + call-to-action that users see.
Campaign · Ad Set · Ad hierarchy:

  Campaign (objective) → 1:N → Ad Set (targeting · budget · bid) → 1:N → Ad (creative · landing page)

Entities are written to an OLTP store (think Postgres/MySQL/Spanner). They're replicated into the warehouse via CDC, typically with SCD Type 2 so historical analysis can reconstruct what the targeting looked like on any given day.

Stage 2 — Audience targeting

The advertiser specifies who should see the ad:

  • Demographic: age, gender, location.
  • Interest: topics the user has engaged with.
  • Behavioral: has installed similar apps, has converted on similar products.
  • Custom audiences: lists uploaded by the advertiser (hashed emails, hashed phone numbers), matched against platform users.
  • Lookalike audiences: statistical extensions of custom audiences.
  • Retargeting: users who visited the advertiser's site/app.

The audience engine compiles these into a match set that auction candidates can be filtered against in sub-millisecond time. DE pipelines feed this engine — custom audience matching is an identity-resolution problem with strict privacy constraints, and cohort definitions for lookalikes are offline ML pipelines that DEs often co-own.
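The custom-audience match key is worth being able to sketch: identifiers are normalized, then hashed, and only hashes are ever compared. A minimal illustration — real pipelines apply stricter, platform-specific normalization and upload policies:

```python
import hashlib

def normalize_and_hash(email):
    # Match keys are built from normalized identifiers: trim, lowercase, hash.
    # Raw emails never enter the analytics path; only the digests are joined.
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()

# Two spellings of the same address produce the same match key:
key_a = normalize_and_hash(" Jane.Doe@Example.com ")
key_b = normalize_and_hash("jane.doe@example.com")
```

Because hashing happens before upload, the platform can intersect advertiser lists with its own hashed identifiers without either side revealing raw PII.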

Stage 3 — Auction

A user loads a feed. The ads serving stack triggers an auction. In milliseconds:

  1. Candidate retrieval: given the user's ID, their features, and the context (surface, session), retrieve thousands of eligible ads.
  2. Prediction: for each candidate, ML models predict p(click), p(conversion | click), p(retention) — whatever matches the advertiser's objective.
  3. Bid: compute an expected value per impression (for a cost-per-action advertiser: bid_cpm = CPA_bid × p(conversion) × 1000).
  4. Rank: order candidates by expected total value (advertiser expected value + platform utility terms like user experience).
  5. Second-price (or VCG) clearing: the winner pays the minimum they'd have needed to bid to win. Encourages truthful bidding; the clearing price ≠ the winning bid.
  6. Serve: render the ad; log the impression with full context (user, ad, surface, auction_id, bid, price, predictions).
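Steps 3 and 5 are small enough to sketch. A toy expected-value bid and second-price clearing, with illustrative numbers and names:

```python
def bid_cpm(cpa_bid, p_click, p_conv_given_click):
    # Expected value per impression for a CPA advertiser, expressed per
    # 1,000 impressions: CPA_bid × p(conversion per impression) × 1000.
    return cpa_bid * p_click * p_conv_given_click * 1000

def clear_second_price(candidate_cpms):
    # The winner pays the minimum needed to win: the runner-up's bid.
    # Clearing price ≠ winning bid, which encourages truthful bidding.
    top = sorted(candidate_cpms, reverse=True)
    return top[0], top[1]  # (winning bid, clearing price)

winner, price = clear_second_price([bid_cpm(50.0, 0.02, 0.10), 80.0, 60.0])
```

Real rankers add platform utility terms (user experience, quality) to the advertiser value before sorting; this sketch shows only the economic core.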

Impressions are the atomic unit of ads data. Everything downstream joins back to them. A single day's impression log at scale is the largest event table in the warehouse.

Stage 4 — Delivery

Budget pacing, frequency capping, and quality controls govern whether to serve an ad even if it won the auction:

  • Pacing: spend this ad set's daily budget evenly across the day (linear) or weighted toward high-value hours (accelerated).
  • Frequency cap: don't show the same ad to the same user more than N times/week.
  • Quality filter: block ads flagged as misleading, violating policy, or poorly rated by users.

Delivery systems are mostly real-time, but they depend on rolling aggregates (spend-so-far, user-impressions-this-week) that DE pipelines compute and publish to low-latency stores.
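A linear pacing monitor reduces to comparing actual spend against an even schedule; a minimal sketch, where the 24-hour linearization and the interpretation of the ratio are assumptions:

```python
def pacing_ratio(spend_so_far, daily_budget, hours_elapsed):
    # Linear pacing: expected spend accrues evenly across a 24h day.
    # ratio > 1.0 means overdelivery (budget will exhaust early);
    # ratio < 1.0 means underdelivery (budget will go unspent).
    expected = daily_budget * hours_elapsed / 24.0
    return spend_so_far / expected if expected else 0.0
```

The `spend_so_far` input is exactly the kind of rolling aggregate a DE pipeline publishes to a low-latency store for the delivery system to read.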

Stage 5 — Engagement

After serving, the user may engage:

  • Main Feed: reactions, comments, shares, video plays.
  • Short-form video: watch duration, completes, re-watches, mutes.
  • Stories: tap-forward (skip), tap-back (re-watch), reply, sticker tap.
  • Carousel: card swipes, per-card dwell.
  • In-stream video: view durations, skips, completion.

Each surface has its own engagement event schema. DEs are typically the integration point: turning 15 heterogeneous engagement signals into one fact table per ad per user per session.
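One way to picture that integration point: collapse each surface's schema onto a single shared grain. A hedged sketch with invented field names — real schemas are far wider:

```python
def normalize_engagement(event):
    # Map heterogeneous per-surface schemas onto one fact grain:
    # (ad_id, user_id, surface, engagement_type, engagement_value).
    surface = event["surface"]
    if surface == "short_form_video":
        kind, value = "watch_ms", event["watch_duration_ms"]
    elif surface == "stories":
        kind, value = event["tap_type"], 1   # tap_forward / tap_back / reply
    else:
        kind, value = event.get("action", "unknown"), 1
    return {"ad_id": event["ad_id"], "user_id": event["user_id"],
            "surface": surface, "engagement_type": kind,
            "engagement_value": value}
```

The senior-level point is the contract: downstream consumers join one fact table, and every new surface pays its integration cost here, once, instead of in every dashboard.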

Stage 6 — Conversion measurement

A conversion is the advertiser's desired outcome (purchase, install, signup, lead submission) — typically happening off-platform:

  1. User clicks ad → lands on advertiser's site or app.
  2. User performs the action (adds to cart, completes checkout).
  3. Conversion tracking reports the event back:
    • Web pixel: JS tag on advertiser site POSTs event to platform.
    • Mobile SDK: in-app event forwarded to platform.
    • Server-side CAPI: advertiser's backend sends event server-to-server (more reliable post-iOS ATT and cookie restrictions).
    • Offline upload: advertiser batch-uploads purchases (e.g. in-store sales with loyalty ID).
  4. Platform matches the conversion event to the right impression/click using an attribution model (see Part 4 glossary).

The attribution join is the economic heart of ads measurement: it decides which advertiser gets credit for which sale, and therefore how dollars flow. Getting it wrong costs real money.
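A minimal last-click sketch of that join for one user's clicks — the window and tie-breaking rules are simplified assumptions; production models also handle view-through and multi-touch:

```python
from datetime import datetime, timedelta

def last_click_attribution(conv_time, clicks, window=timedelta(days=7)):
    # clicks: iterable of (click_time, ad_id) for the converting user.
    # Only clicks at or before the conversion, within the window, qualify.
    eligible = [(t, ad) for t, ad in clicks
                if timedelta(0) <= conv_time - t <= window]
    if not eligible:
        return None  # unattributed (or falls through to view-through rules)
    return max(eligible)[1]  # most recent qualifying click gets full credit
```

At warehouse scale this becomes a range join between fact_conversion and fact_click, which is why the attribution window shows up in the partitioning strategy of both tables.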

Stage 7 — Reporting and the optimization loop

The conversion event, now attributed, feeds two systems:

  • Advertiser reporting: "your campaign drove 423 conversions for $8.42 CPA yesterday." Dashboards, APIs, automated rules.
  • Platform optimization: the same event becomes training data for the next auction's p(conversion) model. Good data engineering closes this loop cleanly — garbage here means worse auctions and worse advertiser outcomes.

DEs own both paths. The first is the traditional ads reporting stack. The second is the ML training data pipeline — which has its own consistency, latency, and backfill requirements that differ from reporting.


2. The data architecture: from event to insight

2.1 The pipeline stack

Every event in the lifecycle above lands in a pipeline stack roughly shaped like this:

Streaming → Medallion → Serving:

  • Event streaming spine: Kafka / Kinesis / Pub-Sub · JSON/Protobuf events, ~5–50 KB each · ~1M msg/sec aggregate
  • Two real-time consumers read the spine directly:
    • Real-time OLAP: Druid / Pinot / ClickHouse · sub-second dashboards
    • Stateful stream processor: Flink / Kafka Streams · exactly-once semantics · online features · ML serving
  • Bronze: S3 / GCS + Iceberg / Delta · 1:1 with source · partitioned by event_date, hour
  • Silver: dedup · schema standardization · dim joins · SCD2 · source-of-truth facts
  • Gold: business-aggregate marts · daily_ad_performance · weekly_advertiser_roas
  • Serving / product:
    • BI tools (Tableau / Looker) → gold
    • Advertiser APIs → gold
    • Finance / billing → silver (row-level truth)
    • ML training → silver or gold

Typical tools per layer (company-dependent):

  • Streaming spine: Kafka, Kinesis, Pub/Sub.
  • Stream processor: Flink, Spark Structured Streaming, Kafka Streams.
  • Lake storage: S3/GCS/ADLS with Iceberg/Delta/Hudi.
  • Batch compute: Spark (most shops), Trino/Presto for ad-hoc, Dremio.
  • Orchestrator: Airflow, Dagster, Prefect, cloud-native workflows.
  • Real-time OLAP: Druid, Pinot, ClickHouse.
  • Warehouse for gold: Snowflake, BigQuery, Redshift — or the lakehouse itself.

2.2 The key fact tables

Every ads DE ends up owning or touching these. Memorize the grain:

For each table: grain (one row =) · daily volume · typical partitioning · primary use.

  • fact_impression: one ad served to one user · 50–100B/day · event_date, hour · auctions, reach, delivery
  • fact_click: one click on an ad · 1–5B/day · event_date, hour · CTR, traffic quality
  • fact_engagement: one engagement (like, share, watch_complete) · 5–20B/day · event_date, surface · creative quality
  • fact_auction: one auction decision (winner + losers) · 500B–1T/day · event_date, sampled · auction health, market dynamics
  • fact_conversion: one advertiser-reported outcome · 100M–1B/day · event_date, attribution window · ROAS, billing
  • fact_spend: one billing event · ~100M/day · billing_date, campaign · revenue, invoicing

Plus dimension tables:

  • dim_campaign — campaign, ad set, ad (SCD2)
  • dim_advertiser — account, agency, billing
  • dim_creative — asset, format, variant
  • dim_user — pseudonymous user + tier-controlled attributes
  • dim_audience — saved audiences, lookalikes
  • dim_surface — placement (main_feed, short_form_video, etc.)

2.3 The three flows through this stack

At a senior level you should be able to narrate three very different paths data takes:

Flow A: advertiser dashboard "how did my campaign do yesterday?" Impression → stream → bronze → silver (joined with conversions within attribution window) → gold (daily roll-up) → BI. Latency: typically T+3h to T+24h depending on attribution window.

Flow B: real-time delivery "am I pacing my budget?" Impression → stream → stream processor (sliding window over last 10min) → real-time OLAP store → serving. Latency: seconds. Does NOT use the warehouse path.

Flow C: ML training "what's next week's click prediction model?" Impression + click + conversion → silver → feature store → training pipeline → model artifact. Latency: daily. The join between impression and conversion has a strict waiting window (e.g., 7 days) before a training example is considered final.

Three flows. Three latency budgets. Three correctness definitions. The senior DE recognizes which flow a new requirement belongs to before picking a tool.

2.4 Why lambda/kappa vocabulary is alive in ads

Ads is one of the last bastions where "real-time is approximate, batch is authoritative" is still the right answer. You'll hear variants of this:

  • Real-time dashboards show spend-so-far within 30 seconds, but they're allowed to drift 1–2% because exactly-once through a 15-hop pipeline is hard.
  • The T+3h batch re-computes spend from bronze logs with the full attribution join — this is what the advertiser is billed for.
  • Reconciliation: for every campaign, real-time spend ≈ batch spend within ε. When it isn't, an alert fires and a DE investigates.
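The reconciliation check itself is mechanical enough to sketch; assuming batch is the authoritative side (names and the 2% tolerance are illustrative):

```python
def reconcile_spend(realtime, batch, eps=0.02):
    # batch is authoritative; flag campaigns whose real-time spend drifts
    # beyond eps relative error. Missing real-time rows count as zero spend.
    alerts = []
    for campaign_id, truth in batch.items():
        rt = realtime.get(campaign_id, 0.0)
        drift = abs(rt - truth) / truth if truth else abs(rt)
        if drift > eps:
            alerts.append((campaign_id, drift))
    return alerts
```

In production this runs per campaign per hour, and the alert payload links both lineages so the on-call can tell whether real-time dropped events or batch double-counted.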

If someone asks "why don't you make the real-time output the one you bill from?" — that's the whole Part 1 Round 4 conversation. The answer is "correctness guarantees at this scale cost 10–100× more latency, for no measurable advertiser benefit."


3. User information, data tiers, and privacy

3.1 Why user data matters for ads

Ads effectiveness is a function of relevance; relevance requires user data. The platform's economic incentive is to use more data to serve more relevant ads, charge higher CPMs, and earn more. The user's interest is in seeing content that's useful (some relevance is welcome) without being surveilled (too much is creepy, and increasingly illegal). The DE works in the space between.

Four knobs govern the balance:

  1. What data is collected.
  2. What it's joined to.
  3. Who can query it.
  4. How long it's retained.

All four are constrained by regulation, by platform policy, and by the principle of least privilege. Every DE decision — especially partition keys, join keys, and access grants — is a privacy decision whether you realize it or not.

3.2 The data hierarchy (tiers)

Most large ads platforms adopt some variant of a tiered data model:

For each tier: examples · access pattern · retention.

  • T0 — Raw PII: email, phone, IP, device ID, precise location · only the identity service; not joinable in the analytics warehouse · encrypted at rest; minimum retention (often 90–365 days)
  • T1 — Hashed / resolved: user_id_hash, device_id_hash, IP /24 · accessible to identity-resolution and fraud pipelines · 90–730 days
  • T2 — Derived attributes: age bucket, country, inferred interests, engagement scores · accessible to targeting, modeling, reporting · 730+ days
  • T3 — Aggregated: daily country-level impressions, weekly interest-cohort CTR · broadly accessible · multi-year
  • T4 — Synthesized / differentially private: DP-protected exports, k-anonymized audiences · sharable externally under contract · perpetual

The senior DE thinks in tiers. Joining T0 data in a gold-layer table is a P0 privacy incident. Most gold tables live at T2 or T3.
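That tier discipline can be automated. A minimal sketch of the kind of policy check a CI step might run before a gold table ships — the tier-rank mapping and function names are hypothetical:

```python
TIER_RANK = {"T0": 0, "T1": 1, "T2": 2, "T3": 3, "T4": 4}

def check_gold_columns(column_tiers):
    # Gold-layer tables may only carry T2+ columns; raw or hashed
    # identifiers (T0/T1) in gold are a privacy incident by definition.
    offenders = sorted(c for c, t in column_tiers.items() if TIER_RANK[t] < 2)
    if offenders:
        raise ValueError(f"tier violation in gold layer: {offenders}")
    return True
```

Encoding the rule in CI turns "joining T0 data in gold is a P0" from tribal knowledge into a failing build.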

3.3 The privacy regulatory landscape

A partial list you should be able to name and outline:

  • GDPR (EU, 2018) — lawful basis required; right of erasure; data portability; data protection impact assessments; DPO role; up to 4% global revenue fines.
  • CCPA / CPRA (California, 2020/2023) — right to know, delete, opt-out of sale/sharing; sensitive personal information category; private right of action for breaches.
  • LGPD (Brazil, 2020) — GDPR-like regime.
  • PIPL (China, 2021) — consent-heavy; cross-border transfer restrictions.
  • DMA / DSA (EU, 2024) — interoperability + transparency obligations for large platforms.
  • iOS ATT (App Tracking Transparency) (Apple, 2021) — user must opt-in for cross-app tracking via IDFA; opt-in rates 15–30%; massive effect on mobile attribution.
  • SKAdNetwork / Privacy Sandbox (Apple, Google) — platform-enforced attribution windows using aggregated, DP-protected signals instead of user-level joins.
  • Children's privacy: COPPA (US), age-appropriate design codes in EU/UK — users under 13 (or 16 in some regimes) can't be profiled for ads.

These aren't abstract policy. Every one of them changes what a DE can put in a join key. A pipeline that works today can become illegal next year when a new regime kicks in. The senior DE builds pipelines that can be re-keyed and re-aggregated without rewriting — because they will be.

3.4 iOS ATT impact (a concrete case)

Pre-ATT (2021 and earlier): mobile advertisers measured conversions via the device advertising identifier (IDFA), which was unique per device and queryable. Post-ATT: ~70–85% of iOS users don't grant tracking permission. IDFA is blanked for them. Consequences:

  • Attribution accuracy drops by 30–50% on iOS for non-consented traffic.
  • Platform-specific replacements: Apple's SKAdNetwork returns aggregated, delayed, noisy postbacks.
  • Server-side CAPI adoption: advertisers move conversion events off the web pixel (cookie-less) and onto server-to-server APIs with first-party identifiers.
  • Deterministic → probabilistic modeling: platforms fill the gap with modeled conversions trained on consented traffic.

The DE consequence is that a single "conversions" metric is now a sum of: deterministic-matched (from CAPI), probabilistic-modeled (from ML), platform-aggregated (from SKAN). Three very different lineages feeding one number. Reporting that hides the decomposition hides the noise. Senior DEs surface it.

3.5 Privacy-by-design principles (4 rules to memorize)

1. Data minimization — collect only what you need for the stated purpose. If a report doesn't need device ID, the pipeline shouldn't carry it. "We might want it later" is not a stated purpose.

2. Purpose limitation — data collected for one purpose can't be silently repurposed. A pipeline that reads T0 data for fraud detection can't have its output joined into an advertiser-facing dashboard.

3. Retention limits — every data asset has an expiration. Raw PII: weeks. Resolved identifiers: months. Aggregates: years. Automation enforces deletion; manual processes don't scale.

4. Access control by role + purpose — not everyone on the DE team needs access to every column. Sensitive columns gated by access policy; every query audited.

Violations of these principles are how companies end up in the news. The DE is often the last line of defense.


4. The aggregated metrics platform pattern

4.1 The problem

At scale, every team wants their own gold-layer table: "daily campaign performance," "weekly advertiser ROAS," "creative-level engagement by surface." Left alone, you end up with:

  • 200 gold tables, 80% overlapping.
  • Three teams computing "spend" three different ways (with slightly different values).
  • Dashboard numbers don't match across products.
  • The DE team spends 40% of its time reconciling.

4.2 The pattern

A unified metrics platform (UMP) — also called an aggregated metrics platform, a semantic layer, or a metrics hub — centralizes metric definitions so "spend" is computed exactly once from exactly one source and everyone reads that.

Architecture, roughly:

Metric Registry → Pipelines → Consumers

The registry holds one entry per metric, typically YAML:

  spend:
    source: silver.fact_impression
    expr: SUM(charge_cents)/100.0
    dimensions: [campaign_id, ad_set_id, surface, country]
    privacy_tier: T3
    owners: [ads-billing-de]

From that single definition:

  • Batch (Spark) → daily gold marts
  • Streaming (Flink) → real-time feeds
  • Ad-hoc SQL (Trino) → explorer queries

all of which feed the advertiser & analyst surfaces.

All three compute paths read the same metric definitions from the registry. The registry enforces:

  • Definition — what is spend? (source table, formula, grain).
  • Allowed dimensions — what can you slice by? (some dims are T3 only).
  • Privacy tier — can this metric be joined with T0 columns? (no).
  • Ownership — who fixes it when it breaks? (paged on-call).
  • Backfill strategy — how do we recompute history safely?

4.3 The privacy–utility spectrum

The AMP controls a single knob: how much detail a cut exposes. Finer cuts (e.g., spend by age-bucket × country × hour × campaign_id) are more useful to analysts but potentially re-identifiable. Coarser cuts (e.g., spend by country × day) are safe but less useful.

The AMP can enforce this via:

  • k-anonymity thresholds: suppress cells with fewer than k users.
  • Differential privacy: inject calibrated noise into every released number.
  • Cell suppression: hide cuts that fall below privacy thresholds; show "<k users" instead.
  • Per-query privacy budget: limit how many queries a single consumer can run before noise compounds.
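Of the four, the k-anonymity threshold is the simplest to sketch; assuming each cut carries a user count alongside its value (field names illustrative):

```python
def k_suppress(cells, k=50):
    # cells: {cut: {"users": n, "value": v}}. Any cut observed by fewer
    # than k distinct users is replaced with a suppression marker rather
    # than released, so small cells can't re-identify individuals.
    return {cut: (c["value"] if c["users"] >= k else "<k users")
            for cut, c in cells.items()}
```

Suppression is applied at release time, not at compute time, so the same gold table can serve consumers with different k thresholds.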

At senior level, you should know whether your target company uses k-anonymity, DP, or pure access control. The architecture choice has implications for query semantics, which affects DE designs.

4.4 What a DE builds here

  • The metric registry: usually YAML/JSON checked into a repo, with CI validating that every metric compiles against the source schema.
  • The compiler: reads the registry and emits SQL for batch pipelines, streaming DSL for Flink, and a metadata layer for ad-hoc tools.
  • The backfill engine: when a metric definition changes, re-runs the derivation for history with idempotent writes.
  • The observability layer: for every metric, track freshness, row count, value range — alert on drift.
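A toy version of the compiler's batch path, assuming the registry entry has been parsed into a dict shaped like the YAML shown earlier — the field names and the SQL shape are illustrative, not a real registry schema:

```python
def compile_metric(name, spec, dims):
    # Emit the batch SQL for one registry entry, rejecting any requested
    # dimension the registry doesn't allow for this metric.
    bad = sorted(set(dims) - set(spec["dimensions"]))
    if bad:
        raise ValueError(f"dimensions not allowed for {name}: {bad}")
    cols = ", ".join(dims)
    return (f"SELECT {cols}, {spec['expr']} AS {name} "
            f"FROM {spec['source']} GROUP BY {cols}")

SPEND_SPEC = {
    "source": "silver.fact_impression",
    "expr": "SUM(charge_cents)/100.0",
    "dimensions": ["campaign_id", "ad_set_id", "surface", "country"],
}

sql = compile_metric("spend", SPEND_SPEC, ["campaign_id", "country"])
```

Because every consumer's SQL is generated from one definition, "three teams computing spend three different ways" becomes structurally impossible rather than merely discouraged.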

Part 1's Round 1 walks through designing this exact system end-to-end.


5. The DE mental model: the three questions

Every senior DE working in ads answers three questions every time they touch a pipeline:

Question 1 — Whose truth is this?

For every number, name:

  • The authoritative source (the bronze log or SOR — system of record — that the number ultimately traces to).
  • The attribution rules applied to the source (which conversions credit which impressions? under what window?).
  • The reconciliation target (what must this number match? advertiser billing? finance? ML training data?).

If you can't answer all three in two sentences, the number isn't ready to ship. At senior level, "whose truth is this" is the first question in every design review.

Question 2 — What latency does this need?

Three buckets:

  • Real-time (seconds): delivery monitoring, fraud, pacing, basic dashboards for "spending now." Approximate is acceptable.
  • Near-real-time (minutes): advertiser "yesterday's performance" appearing during the day as the attribution window closes. Mostly accurate.
  • Batch-authoritative (hours to T+1): billing, finance, ML training. Exact.

Pick one, then pick the tech. Never build one pipeline aiming to serve all three — you'll get a Frankenstein that's slow where it needs to be fast and approximate where it needs to be exact.

Question 3 — How does this scale, and fail?

  • Happy-path scale: 2× growth in a year? 10× for a big launch? What breaks first?
  • Failure modes: upstream event spike (campaign with a viral creative), upstream event drop (ingest outage), corrupt event (schema bug), late-arriving conversion (server CAPI lag), duplicate event (pixel fires twice).
  • Blast radius: if this pipeline breaks, what depends on it? Just one dashboard? Billing? ML training? Every downstream system?

A senior DE's design doc has a failure-modes section with specific mitigations for each. "We'll monitor it" is not a mitigation.
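Of those failure modes, the duplicate event is the one with a one-screen mitigation: an idempotent dedup pass keyed on a stable event ID. A minimal in-memory sketch (production versions run as a stateful stream operator or a MERGE into the silver table):

```python
def dedupe(events):
    # Pixels double-fire and producers retry; keep the first occurrence
    # of each event_id and drop the rest.
    seen, out = set(), []
    for e in events:
        if e["event_id"] not in seen:
            seen.add(e["event_id"])
            out.append(e)
    return out
```

The design choice that matters is where the event ID is minted: assigned at the client, it survives retries anywhere in the pipeline; assigned mid-pipeline, it only dedups downstream of that hop.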

5.1 The ads data engineering triangle

The core tradeoff — the ads data engineering triangle:

  • Correctness: billable, reconcilable, auditable
  • Latency: real-time, near-real-time, batch
  • Cost: storage, compute, engineering time, operational burden

Pick any two; the third pays for them.

You get two. Every ads pipeline is a defensible position on this triangle. Senior-level interviews test whether you can articulate which two you picked and why, and whether you can recognize when a stakeholder is asking for all three (and coach them back to two).


6. Vocabulary cheat sheet

  • Impression: One ad shown to one user, one time.
  • Reach: Unique users exposed to an ad over a period.
  • Frequency: Impressions per reached user.
  • CPM: Cost per thousand impressions.
  • CPC: Cost per click.
  • CPA: Cost per action (conversion).
  • CTR: Click-through rate = clicks / impressions.
  • CVR: Conversion rate = conversions / clicks (or / impressions for view-through).
  • ROAS: Return on ad spend = revenue attributed / spend.
  • Bid: Max the advertiser is willing to pay for the desired outcome.
  • Clearing price: Actual price charged (second-price or VCG minimum).
  • Pacing: Rate at which a budget is spent through the day/week.
  • Frequency cap: Max impressions per user per period.
  • Attribution window: Lookback from conversion to the qualifying impression/click (e.g., 7-day click, 1-day view).
  • Click-through attribution: Credit to the ad the user clicked.
  • View-through attribution: Credit to the ad the user saw (but didn't click) if they later convert.
  • Multi-touch attribution: Credit distributed across touchpoints (linear, position-based, Shapley).
  • Pixel / SDK / CAPI: Three ways advertisers report conversions (web tag / mobile SDK / server API).
  • Surface: Where the ad appears (main feed, short-form video, stories, etc.).
  • Placement: Finer-grained location within a surface.
  • Creative: The actual asset (image/video/text) shown.
  • Ad set / ad group: Targeting + budget unit within a campaign.
  • Objective: The advertiser's goal (conversions, reach, app installs, etc.).
  • Custom audience: Advertiser-uploaded list matched to platform users.
  • Lookalike: ML-expanded audience based on a seed.
  • Retargeting: Ads served to users who previously visited the advertiser.
  • Incrementality: True causal lift (vs. naive attribution).
  • Holdout: Randomly withheld group for incrementality measurement.
  • Backfill: Recomputing historical data with updated logic or corrections.
  • SCD2: Slowly changing dimension type 2 — preserves history of entity changes.
  • PII: Personally identifiable information.
  • T0–T4: Data tiers from rawest (PII) to safest (DP aggregates).
  • ATT: Apple's App Tracking Transparency (iOS opt-in requirement).
  • SKAN / SKAdNetwork: Apple's privacy-preserving attribution for iOS.
  • AMP / UMP: Aggregated / Unified Metrics Platform — the semantic layer pattern.
  • Bronze / Silver / Gold: Medallion architecture layers (raw / cleaned / business).
  • Airflow / Dagster / Prefect: Common batch orchestrators.
  • Flink / Structured Streaming / Kafka Streams: Common stateful stream processors.
  • Trino / Presto: Distributed SQL query engines.
  • Druid / Pinot / ClickHouse: Common real-time OLAP stores.

Closing

That's the foundation. The rest of this guide assumes you can:

  • Draw the 7-stage lifecycle.
  • Explain the three flows through the pipeline stack.
  • Reason about privacy tiers when designing a new join.
  • Identify which of the three questions a new requirement is really asking.
  • Translate between the generic vocabulary here and whatever your target company calls their equivalent systems.

If any of that feels shaky, come back to this chapter before moving on. Part 1 walks you into a senior interview loop that assumes exactly these foundations.

↑ Back to top

Part 01

Mock Interview + 10 Scenarios

Part 1: Mock Interview + 10 Scenarios + Practice

"Senior interviews aren't about having the answer. They're about having the judgment to pick among three plausible answers and defend the choice."


How to use this part

A full senior-level loop at a large consumer ads platform is typically 5 rounds: system design, SQL, Python / coding, streaming-vs-batch, and behavioral. This part walks the first four as if you were in the room, then runs 10 additional scenarios you should be prepared to answer cold.

Read each round once end-to-end. Then close the page and try to reproduce the architecture, the SQL, or the code from scratch. You'll find the gaps quickly.


The Loop (5 rounds, what each is testing)

Round · Format · What's actually being measured
1 — System Design · Whiteboard / doc, 60 min · Scale thinking, trade-off fluency, pipeline taste
2 — SQL Deep Dive · Shared editor, 45 min · Window functions, join shapes, anti-patterns
3 — Python / Coding · Shared editor, 60 min · Design sense, extensibility, tests
4 — Streaming vs Batch · Whiteboard, 45 min · Reconciliation, semantics, cost modeling
5 — Behavioral · Video, 45 min · Leadership, conflict handling, STAR+I storytelling

Behavioral is covered in Part 2 (scenarios 19–22). This part handles rounds 1–4 plus the technical scenario bank.


Round 1: System Design — Unified Metrics Platform

The prompt

"Design a Unified Metrics Platform (UMP) that serves every ad metric — impressions, clicks, spend, conversions, ROAS — to every downstream consumer: advertiser dashboards, internal analysts, the pacing system, the ML feature store, the exec scorecard. Treat it as greenfield. Scale: 50B impressions/day, 500K advertisers, 2M analysts and engineers running queries."

Clarifying questions you should ask (first 3 minutes)

  • Freshness requirements. Advertiser dashboards: minutes. Pacing: seconds. Analysts: hourly to daily. Exec scorecards: daily.
  • Truth model. Is there one canonical number per metric, or do different consumers get different numbers? (Answer: one canonical, but with explicit latency tiers.)
  • Attribution stance. Does the platform decide attribution, or does each consumer pick? (Answer: platform decides; consumers pick the window.)
  • Privacy tier. Do advertisers see user-level events? (No — only aggregates above a minimum cohort size.)
  • Cost budget. Warehouse spend cap, hot-store footprint.

If you skip these, you're designing blind. A strong opening spends the first two to three minutes clarifying before you touch the whiteboard.

Back-of-envelope sizing

  • Impressions: 50B/day ≈ 580K/sec on average (peaks run higher); add roughly 2× on top for clicks + conversions + auction events.
  • Wide row size at bronze: ~1.5 KB raw. 50B × 1.5 KB = 75 TB/day just for impressions.
  • Storage over 2 years retention: 75 TB × 730 × 3 (fanout to silver + gold) ≈ 165 PB.
  • QPS on the serving tier (advertiser dashboards): ~500K advertisers × ~2 queries / min active window ≈ 20K QPS peak; 99th percentile should be <200 ms.
  • Streaming lag target: <60 sec end-to-end on the pacing path; <5 min on the advertiser path.

Write these numbers on the board before drawing anything. The interviewer wants to see the math.
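If you want to check yourself before the interview, the arithmetic above reproduces in a few lines (all figures are taken from the prompt; the fanout multiplier and row size are the same assumptions as above):

```python
# Back-of-envelope sizing for the UMP prompt (figures from the prompt above).
SECONDS_PER_DAY = 86_400

impressions_per_day = 50e9
events_per_sec = impressions_per_day / SECONDS_PER_DAY     # average, not peak

bronze_row_bytes = 1_500                                   # ~1.5 KB wide row
bronze_tb_per_day = impressions_per_day * bronze_row_bytes / 1e12

retention_days = 730                                       # 2 years
fanout = 3                                                 # bronze + silver + gold
total_pb = bronze_tb_per_day * retention_days * fanout / 1_000

print(f"{events_per_sec:,.0f} events/sec average")   # ≈ 579K/sec
print(f"{bronze_tb_per_day:.0f} TB/day at bronze")   # 75 TB/day
print(f"{total_pb:.0f} PB over retention")           # ≈ 164 PB
```

Being able to redo this math live, with the interviewer changing one input, is worth more than memorizing the totals.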

Architecture

Ads Telemetry Pipeline

Client SDK · browser · server CAPI
  → Event streaming spine (Kafka-style · 1-day retention · ~600K events/sec peak)

Batch path: Bronze (raw · Iceberg · hourly) → Silver (clean · dedup · joined to dims) → daily → Gold (semantic layer · metric definitions)
Streaming path: Stream processor (Flink · micro-batch) → Real-time OLAP (Druid / Pinot · sec-level) → Advertiser UI
Pacing path: Pacing counter KV (Redis · sub-sec) → Auction gate

Gold feeds: metrics registry · dashboards · feature store · exec scorecard

Schema (facts + dims)

-- Bronze: one row per event, immutable
CREATE TABLE bronze.events (
  event_ts        TIMESTAMP,
  event_id        BINARY(16),
  event_name      STRING,          -- impression, click, conversion, bid_win
  user_id         BIGINT,
  device_id       STRING,
  session_id      STRING,
  ad_id           BIGINT,
  campaign_id     BIGINT,
  advertiser_id   BIGINT,
  surface_id      INT,
  placement_id    INT,
  geo_country     STRING,
  device_os       STRING,
  properties      MAP<STRING,STRING>  -- catch-all
)
PARTITIONED BY (dt STRING, event_name STRING);

-- Silver: fact_impression at its natural grain
CREATE TABLE silver.fact_impression (
  dt              DATE,
  impression_ts   TIMESTAMP,
  impression_id   BINARY(16),
  user_id         BIGINT,
  ad_id           BIGINT,
  campaign_id     BIGINT,
  advertiser_id   BIGINT,
  surface_id      INT,
  placement_id    INT,
  geo_country     STRING,
  device_os       STRING,
  clearing_cpm    DECIMAL(12,6),
  spend_cents     BIGINT,
  is_valid        BOOLEAN,          -- after invalid-traffic filter
  experiment_ids  ARRAY<BIGINT>
)
PARTITIONED BY (dt);

-- Gold: aggregated at common consumer grains
CREATE TABLE gold.metric_campaign_daily (
  dt                  DATE,
  campaign_id         BIGINT,
  advertiser_id       BIGINT,
  impressions         BIGINT,
  reach               BIGINT,
  clicks              BIGINT,
  ctr                 DOUBLE,
  spend_usd           DECIMAL(18,4),
  conversions_click_7d BIGINT,
  conversions_view_1d  BIGINT,
  conversion_value_usd DECIMAL(18,4),
  roas                DOUBLE
)
PARTITIONED BY (dt);

-- Dim: campaigns as Type-2 SCD
CREATE TABLE dim.campaign (
  campaign_id     BIGINT,
  advertiser_id   BIGINT,
  name            STRING,
  objective       STRING,
  bid_strategy    STRING,
  daily_budget    DECIMAL(18,4),
  valid_from      TIMESTAMP,
  valid_to        TIMESTAMP,        -- NULL for current version
  is_current      BOOLEAN
)
PARTITIONED BY (is_current);

The aggregation pipeline

The Aggregation Pipeline

Bronze (raw events)
  ↘ streaming: 1-minute windows → Real-time OLAP (serves pacing, near-real-time UI)
  ↘ batch: hourly → silver → daily rollup → gold marts
      metric_campaign_daily · metric_creative_daily · metric_audience_daily · metric_platform_daily (exec)

Cadences:

  • Streaming: 1-minute tumbling windows, watermark = 2 min.
  • Hourly batch: kicks off at five minutes past the hour, processing impressions from the prior hour. Dedup on impression_id. Joins dim.campaign as of event time.
  • Daily rollup: kicks off at 02:00 local time in the advertiser's reporting zone. Produces gold tables. This is the number the advertiser sees.
  • Reconciliation: a 7-day trailing rerun adjusts for late conversions (a conversion today might attribute to a click 6 days ago).

Serving layer

  • Advertiser dashboards: read from gold tables through a caching proxy with 2-minute TTL. P99 target <200 ms.
  • Pacing gate: reads directly from the Redis counter (updated by the streaming path).
  • Analyst queries: Trino/Presto against silver + gold. Optional materialized views for top 50 query shapes.
  • Exec scorecard: reads gold, locked at end-of-day; changes to prior days require a signed-off backfill.

Failure modes you should name

  • Late-arriving conversions. A click today might convert 7 days later. Batch must re-open windows for 7 days. Streaming shows a preview that will shift.
  • Late data quality (DQ) discoveries. You find a bug in a filter 3 days after the fact. You must recompute; the advertiser already saw the wrong number.
  • Idempotency. Retries can't double-count. Use event_id as the natural primary key; dedupe in silver.
  • Clock skew. Client timestamps lie. Use server-receive timestamp for billing-critical metrics; keep client timestamp for UX analysis.
  • Hot-key skew. A single huge advertiser can blow up a shuffle. Salt the campaign_id on the shuffle key.

Trade-offs to call out

Dimension · Cheap / fast · Expensive / correct
Freshness · 24-hour batch only · Streaming + hourly batch + daily locks
Truth model · Each team has own pipeline · Unified platform owns definitions
Schema evolution · Add columns freely · SCD-2 dims + contract tests
Late data · Drop events >1h late · Re-open windows 7 days
Cost · Raw access, no materializations · Pre-aggregate top shapes

Weak vs strong answer

A weak answer draws a single Airflow DAG, mumbles about "the warehouse," and can't say what's streamed vs batched. A mid answer gets the medallion structure right but handwaves the reconciliation story. A strong answer names late-arriving conversions as a first-class concern, quantifies freshness by consumer, and has a clear stance on where truth lives ("batch wins arguments; streaming gives you a preview"). Bonus points: mention the metric registry as the place definitions are versioned, and describe how a change flows from registry to all consumers.


Round 2: SQL Deep Dive — Attribution + Benchmarking

Prompt 1: Multi-touch click attribution

"Given raw click and conversion events, attribute each conversion to the most recent click from the same user within a 7-day window. One row per conversion. Include the campaign and creative of the attributed click."

WITH c AS (
  SELECT user_id, conversion_id, conversion_ts, conversion_value_usd
  FROM events
  WHERE event_name = 'conversion'
    AND conversion_ts >= CURRENT_DATE - INTERVAL '30' DAY
),
k AS (
  SELECT user_id, click_id, click_ts, campaign_id, ad_id, creative_id
  FROM events
  WHERE event_name = 'click'
    AND click_ts >= CURRENT_DATE - INTERVAL '37' DAY   -- wider window for joining
),
joined AS (
  SELECT
    c.conversion_id,
    c.user_id,
    c.conversion_ts,
    c.conversion_value_usd,
    k.click_id,
    k.click_ts,
    k.campaign_id,
    k.ad_id,
    k.creative_id,
    ROW_NUMBER() OVER (
      PARTITION BY c.conversion_id
      ORDER BY k.click_ts DESC
    ) AS rn
  FROM c
  LEFT JOIN k
    ON c.user_id = k.user_id
   AND k.click_ts <= c.conversion_ts
   AND k.click_ts >= c.conversion_ts - INTERVAL '7' DAY
)
SELECT
  conversion_id,
  user_id,
  conversion_ts,
  conversion_value_usd,
  click_id,
  click_ts,
  campaign_id,
  ad_id,
  creative_id,
  DATEDIFF(conversion_ts, click_ts) AS days_from_click
FROM joined
WHERE rn = 1;   -- ROW_NUMBER() is never NULL; unmatched (organic) conversions keep rn = 1 with NULL click columns

Why this and not that. The ROW_NUMBER() approach is stable with ties (two clicks in the same millisecond) because you can add a secondary order key if needed. Alternatives: QUALIFY (cleaner in Snowflake), correlated subqueries (O(n²) on large tables), self-joins with NOT EXISTS (hard to read). The LEFT JOIN preserves unattributed conversions, which you almost always want for a "share of organic" metric.
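If the interviewer pushes on correctness, the last-click rule is easy to sanity-check in plain Python on a handful of in-memory events; this is a sketch with hypothetical field names mirroring the SQL (timestamps as fractional days for brevity):

```python
# Last-click attribution: most recent click by the same user within a
# 7-day lookback; conversions with no qualifying click stay organic.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Click:
    user_id: int
    click_id: str
    ts: float          # fractional days, for brevity
    campaign_id: int

@dataclass
class Conversion:
    user_id: int
    conversion_id: str
    ts: float

def attribute(conv: Conversion, clicks: list[Click],
              window_days: float = 7) -> Optional[Click]:
    candidates = [
        k for k in clicks
        if k.user_id == conv.user_id
        and conv.ts - window_days <= k.ts <= conv.ts
    ]
    # Ties on ts would need a secondary key (e.g. click_id), as noted above.
    return max(candidates, key=lambda k: k.ts) if candidates else None

clicks = [
    Click(1, "k1", ts=10.0, campaign_id=100),
    Click(1, "k2", ts=12.5, campaign_id=200),   # most recent in-window
    Click(2, "k3", ts=1.0,  campaign_id=300),
]
print(attribute(Conversion(1, "c1", ts=14.0), clicks).click_id)  # k2
print(attribute(Conversion(2, "c2", ts=20.0), clicks))           # None (click is 19 days old, so organic)
```

The in-window filter and the max-by-timestamp pick correspond exactly to the join predicate and the ROW_NUMBER() ordering in the SQL.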

Prompt 2: Benchmark creatives against their account's rolling mean

"For each creative × day, compute CTR. Rank it against the trailing 28-day mean of CTRs across all creatives of the same advertiser. Output the top and bottom 5% by z-score per advertiser per day."

WITH daily AS (
  SELECT
    dt,
    creative_id,
    advertiser_id,
    SUM(impressions) AS imps,
    SUM(clicks) AS clicks,
    CAST(SUM(clicks) AS DOUBLE) / NULLIF(SUM(impressions), 0) AS ctr
  FROM fact_creative_perf_daily
  WHERE dt BETWEEN CURRENT_DATE - 60 AND CURRENT_DATE - 1
  GROUP BY 1, 2, 3
),
with_baseline AS (
  SELECT
    dt, creative_id, advertiser_id, imps, clicks, ctr,
    AVG(ctr) OVER (
      PARTITION BY advertiser_id
      ORDER BY dt
      ROWS BETWEEN 28 PRECEDING AND 1 PRECEDING
    ) AS baseline_mean,
    STDDEV_POP(ctr) OVER (
      PARTITION BY advertiser_id
      ORDER BY dt
      ROWS BETWEEN 28 PRECEDING AND 1 PRECEDING
    ) AS baseline_sd
  FROM daily
),
ranked AS (
  SELECT
    *,
    (ctr - baseline_mean) / NULLIF(baseline_sd, 0) AS z_score,
    PERCENT_RANK() OVER (
      PARTITION BY advertiser_id, dt
      ORDER BY (ctr - baseline_mean) / NULLIF(baseline_sd, 0)
    ) AS pct_rank
  FROM with_baseline
  WHERE imps >= 1000                       -- minimum volume gate
    AND baseline_sd IS NOT NULL
)
SELECT *
FROM ranked
WHERE pct_rank <= 0.05 OR pct_rank >= 0.95
ORDER BY advertiser_id, dt, z_score;

Why this and not that. A ROWS frame counts rows, not days: it only matches the trailing-28-day intent when the input is dense (one row per advertiser per day). With gaps, or with several creatives per advertiser per day in the partition, switch to RANGE BETWEEN INTERVAL '28' DAY PRECEDING AND INTERVAL '1' DAY PRECEDING, or pre-aggregate to one row per advertiser-day first. The volume gate (imps >= 1000) is critical — without it, a creative with 10 impressions and 1 click has a meaningless CTR that dominates the z-score. Anti-patterns to avoid: computing the mean via a subquery per row (quadratic), using RANK() without the volume gate, forgetting NULLIF on the stddev (zero-variance advertisers blow up).
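The gate-then-score logic also ports to a few lines of Python for spot-checking; a sketch with illustrative numbers (the function name and thresholds are assumptions, not the platform's):

```python
# Z-score of today's CTR against a trailing baseline, with the same
# minimum-volume gate as the SQL above (numbers are illustrative).
from statistics import mean, pstdev
from typing import Optional

def ctr_z(imps: int, clicks: int, baseline_ctrs: list[float],
          min_imps: int = 1000) -> Optional[float]:
    """None means 'not scorable': too little volume or zero variance."""
    if imps < min_imps or len(baseline_ctrs) < 2:
        return None
    sd = pstdev(baseline_ctrs)
    if sd == 0:                      # the NULLIF(baseline_sd, 0) guard
        return None
    return (clicks / imps - mean(baseline_ctrs)) / sd

baseline = [0.020, 0.022, 0.018, 0.021, 0.019]   # trailing daily CTRs
print(ctr_z(5000, 150, baseline))   # CTR 3% vs ~2% baseline: large positive z
print(ctr_z(10, 1, baseline))       # None: fails the volume gate
```

The None returns are exactly the rows the SQL drops via the WHERE gate and the NULLIF guard.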

Anti-patterns

  • SELECT * on fact tables. Always project columns.
  • DISTINCT as a crutch for broken joins. If you need DISTINCT, your join is wrong.
  • String-concatenated keys (user_id || '-' || ad_id). Compose joins on typed columns; build indexes on the natural keys.
  • DATEDIFF(DAY, ...) when you meant DATEDIFF(HOUR, ...). Dialects disagree on unit syntax; spot-check units.

Round 3: Python Exercise — Config-Driven SQL Generator + Validator

The prompt

"Build a library that takes a YAML metric definition and generates the SQL to compute it. Also include a pluggable data-quality framework that runs checks on the output table."

Module layout

metrics/defs/*.yml: metric definitions (YAML)
metrics/generator.py: parses a definition and compiles it to SQL
dq/base.py: Check ABC, check registry, built-in checks (row_count, null_rate, freshness)

The metric YAML

# metrics/defs/campaign_daily.yml
name: metric_campaign_daily
description: Per-campaign daily aggregates
grain: [dt, campaign_id, advertiser_id]
source: silver.fact_impression
time_column: dt
measures:
  - name: impressions
    expr: COUNT(*)
  - name: reach
    expr: COUNT(DISTINCT user_id)
  - name: spend_usd
    expr: SUM(spend_cents) / 100.0
filters:
  - expr: is_valid = TRUE
dq_checks:
  - check: row_count_min
    params: { min: 100000 }
  - check: null_rate_max
    params: { column: campaign_id, max: 0.0 }
  - check: freshness_max_lag_hours
    params: { column: dt, max: 30 }

The generator

"""generator.py — turn a MetricDef into SQL."""
from __future__ import annotations
from dataclasses import dataclass
from typing import Sequence
import yaml


@dataclass(frozen=True)
class Measure:
    name: str
    expr: str


@dataclass(frozen=True)
class MetricDef:
    name: str
    source: str
    grain: Sequence[str]
    measures: Sequence[Measure]
    filters: Sequence[str] = ()
    time_column: str = "dt"


def load_metric(path: str) -> MetricDef:
    """Parse a metric YAML into a MetricDef."""
    with open(path) as f:
        raw = yaml.safe_load(f)
    measures = [Measure(m["name"], m["expr"]) for m in raw["measures"]]
    filters = tuple(f["expr"] for f in raw.get("filters", []))
    return MetricDef(
        name=raw["name"],
        source=raw["source"],
        grain=tuple(raw["grain"]),
        measures=tuple(measures),
        filters=filters,
        time_column=raw.get("time_column", "dt"),
    )


def compile_sql(m: MetricDef, dt_from: str, dt_to: str) -> str:
    """Compile a MetricDef into a partitioned-insert SQL block."""
    select_cols = list(m.grain) + [f"{msr.expr} AS {msr.name}" for msr in m.measures]
    where = [f"{m.time_column} BETWEEN DATE '{dt_from}' AND DATE '{dt_to}'"]
    where.extend(m.filters)
    group_by = ", ".join(str(i + 1) for i in range(len(m.grain)))
    return (
        f"INSERT OVERWRITE TABLE gold.{m.name}\n"
        f"SELECT\n  " + ",\n  ".join(select_cols) + "\n"
        f"FROM {m.source}\n"
        f"WHERE " + "\n  AND ".join(where) + "\n"
        f"GROUP BY {group_by};"
    )

The DQ framework

"""dq/base.py — check base class + runner."""
from __future__ import annotations
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass
class CheckResult:
    name: str
    passed: bool
    observed: Any
    threshold: Any
    message: str = ""


class Check(ABC):
    """A data-quality check. Subclasses implement run()."""

    def __init__(self, params: Mapping[str, Any]):
        self.params = params

    @abstractmethod
    def run(self, conn, table: str) -> CheckResult:
        raise NotImplementedError


class RowCountMin(Check):
    """Fail if the table has fewer than `min` rows for today's partition."""

    def run(self, conn, table):
        n = conn.scalar(f"SELECT COUNT(*) FROM {table} WHERE dt = CURRENT_DATE")
        ok = n >= self.params["min"]
        return CheckResult(
            name="row_count_min",
            passed=ok,
            observed=n,
            threshold=self.params["min"],
            message="" if ok else f"row count {n} < {self.params['min']}",
        )


class NullRateMax(Check):
    """Fail if fraction of NULLs in `column` exceeds `max`."""

    def run(self, conn, table):
        col = self.params["column"]
        q = (
            f"SELECT SUM(CASE WHEN {col} IS NULL THEN 1 ELSE 0 END) * 1.0 / "
            f"NULLIF(COUNT(*), 0) FROM {table} WHERE dt = CURRENT_DATE"
        )
        rate = conn.scalar(q) or 0.0
        ok = rate <= self.params["max"]
        return CheckResult("null_rate_max", ok, rate, self.params["max"])


class FreshnessMaxLagHours(Check):
    """Fail if newest row is older than `max` hours."""

    def run(self, conn, table):
        col = self.params["column"]
        q = f"SELECT EXTRACT(EPOCH FROM (NOW() - MAX({col}))) / 3600 FROM {table}"
        lag = conn.scalar(q)
        ok = lag <= self.params["max"]
        return CheckResult("freshness_max_lag_hours", ok, lag, self.params["max"])


CHECK_REGISTRY = {
    "row_count_min": RowCountMin,
    "null_rate_max": NullRateMax,
    "freshness_max_lag_hours": FreshnessMaxLagHours,
}


def run_checks(conn, table, dq_cfg) -> list[CheckResult]:
    results = []
    for cfg in dq_cfg:
        cls = CHECK_REGISTRY[cfg["check"]]
        results.append(cls(cfg.get("params", {})).run(conn, table))
    return results

Tests

# tests/test_generator.py
from metrics.generator import MetricDef, Measure, compile_sql


def test_compile_sql_basic():
    m = MetricDef(
        name="m_test",
        source="silver.fact_x",
        grain=("dt", "campaign_id"),
        measures=(Measure("imps", "COUNT(*)"),),
        filters=("is_valid = TRUE",),
    )
    sql = compile_sql(m, "2026-04-01", "2026-04-01")
    assert "INSERT OVERWRITE TABLE gold.m_test" in sql
    assert "COUNT(*) AS imps" in sql
    assert "is_valid = TRUE" in sql
    assert "GROUP BY 1, 2" in sql

What the interviewer is looking for

Not "does it work." They want: explicit types, a registry pattern for extensibility, separation of concerns (parse, compile, run), tests that document behavior, and a DQ framework that cleanly accepts new check types. The ability to talk through "how would I add a new check?" in 30 seconds is the differentiator.
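To make that 30-second answer concrete, here is a sketch of registering a hypothetical duplicate-key check; minimal stand-ins for the Check ABC, CheckResult, and registry from dq/base.py are inlined so the snippet runs standalone, and FakeConn is a made-up connection for the smoke test:

```python
# Sketch: adding a new check type to the DQ framework shown above.
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Mapping

@dataclass
class CheckResult:                       # stand-in for dq/base.py
    name: str
    passed: bool
    observed: Any
    threshold: Any
    message: str = ""

class Check(ABC):                        # stand-in for dq/base.py
    def __init__(self, params: Mapping[str, Any]):
        self.params = params

    @abstractmethod
    def run(self, conn, table: str) -> CheckResult: ...

CHECK_REGISTRY: dict[str, type] = {}

class DuplicateKeyRate(Check):
    """Fail if the fraction of duplicated values in `column` exceeds `max`."""

    def run(self, conn, table):
        col = self.params["column"]
        q = (
            f"SELECT 1.0 - COUNT(DISTINCT {col}) * 1.0 / NULLIF(COUNT(*), 0) "
            f"FROM {table} WHERE dt = CURRENT_DATE"
        )
        rate = conn.scalar(q) or 0.0
        ok = rate <= self.params["max"]
        return CheckResult("duplicate_key_rate", ok, rate, self.params["max"],
                           "" if ok else f"dup rate {rate} > {self.params['max']}")

CHECK_REGISTRY["duplicate_key_rate"] = DuplicateKeyRate

class FakeConn:                          # hypothetical connection for a smoke test
    def scalar(self, query: str) -> float:
        return 0.002                     # pretend 0.2% of keys are duplicated

res = DuplicateKeyRate({"column": "impression_id", "max": 0.01}).run(FakeConn(), "gold.m_test")
print(res.passed)  # True: 0.2% duplicates is under the 1% threshold
```

New check = one subclass + one registry entry; nothing in the runner changes. That is the sentence to say out loud.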


Round 4: Streaming vs Batch — Spend Monitoring with Reconciliation

The prompt

"An advertiser sets a $10,000 daily budget. The platform must stop delivery at 95% of cap in under 60 seconds. At end of day, recompute actual spend from billing-truth data and reconcile. Design the full pipeline; discuss failure modes."

The two-loop design (again, because it's core)

Fast loop (seconds): impression events → stream processor → spend_counter_kv → pacing_gate
Slow loop (hours):   billing pipeline (batch) → fact_spend_billed_hourly → reconciliation job (hourly) → corrects the KV

Fast loop is at-least-once with idempotent writes. Slow loop is exactly-once with immutable outputs.

Streaming pseudocode

# Flink-style stream topology (conceptual)
events = (
    kafka_source(topic="impressions", group="pacing")
    .assign_watermark(max_out_of_order=timedelta(seconds=30))
    .filter(lambda e: e.event_name == "impression")
)

# Dedup on impression_id (bounded state: 24-hour TTL)
deduped = events.key_by("impression_id").process(DedupByKey(ttl=timedelta(hours=24)))

# Windowed spend aggregation (1-min tumbling windows, 15-min allowed lateness)
windowed = (
    deduped
    .key_by("campaign_id")
    .window(Tumbling(duration=timedelta(minutes=1)))
    .allowed_lateness(timedelta(minutes=15))
    .aggregate(SumSpendCents())
)

# Publish to the pacing KV
windowed.sink_to(RedisSink(key_fn=lambda c, w: f"spend:{c.campaign_id}:{w.date}"))

Watermarking

  • Event time is the client-reported or server-received timestamp.
  • Watermark is the maximum event time the processor has observed, minus max_out_of_order. A window fires when the watermark passes the window end.
  • Late events (past watermark) are routed to a side-output, written to a reconciliation topic, and absorbed on the next hourly batch.
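A toy simulation makes the firing rule concrete; this sketches the semantics only (integer-second timestamps, 60-second tumbling windows, 30-second max out-of-order as in the topology above), not a real stream processor:

```python
# Toy event-time watermark: watermark = max(event_ts seen) - max_out_of_order.
# A 60s tumbling window fires once the watermark passes its end; events
# older than the watermark go to a side output for batch reconciliation.
MAX_OUT_OF_ORDER = 30  # seconds

def run(events):  # events: (event_ts, value) in arrival order
    watermark, windows, side_output, fired = float("-inf"), {}, [], []
    for ts, v in events:
        if ts <= watermark:
            side_output.append((ts, v))       # late: reconcile in batch
            continue
        windows.setdefault(ts // 60, []).append(v)
        watermark = max(watermark, ts - MAX_OUT_OF_ORDER)
        for w in [w for w in windows if (w + 1) * 60 <= watermark]:
            fired.append((w, sum(windows.pop(w))))
    return fired, side_output

fired, late = run([(10, 1), (50, 2), (40, 8), (95, 4), (130, 16), (20, 32)])
print(fired)  # window 0 fires with 1 + 2 + 8 = 11 once the watermark passes 60
print(late)   # (20, 32) arrived after the watermark had passed it: side output
```

Note the out-of-order event at ts=40 still lands in window 0 (it beats the watermark), while ts=20 arriving last does not: that distinction is the whole point of allowed lateness versus the side output.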

Batch reconciliation SQL

-- Hourly job: compare ground-truth billed spend to streaming counter
WITH billed AS (
  SELECT
    campaign_id,
    DATE(impression_ts) AS dt,
    DATE_TRUNC('hour', impression_ts) AS hr,
    SUM(spend_cents) AS billed_cents
  FROM silver.fact_impression_billable
  WHERE dt = CURRENT_DATE
    AND impression_ts >= CURRENT_TIMESTAMP - INTERVAL '2' HOUR
  GROUP BY 1, 2, 3
),
counter_snapshot AS (
  SELECT campaign_id, dt, hr, counter_cents
  FROM pacing_counter_snapshot
  WHERE dt = CURRENT_DATE
    AND hr = DATE_TRUNC('hour', CURRENT_TIMESTAMP - INTERVAL '1' HOUR)
)
SELECT
  b.campaign_id, b.dt, b.hr,
  b.billed_cents,
  COALESCE(c.counter_cents, 0) AS counter_cents,
  b.billed_cents - COALESCE(c.counter_cents, 0) AS delta_cents
FROM billed b
LEFT JOIN counter_snapshot c
  USING (campaign_id, dt, hr)
WHERE ABS(b.billed_cents - COALESCE(c.counter_cents, 0)) > 0.01 * b.billed_cents;
-- Any delta > 1% is absorbed into the counter by the next pass

Can you get rid of the batch layer (Kappa)?

An interviewer sometimes pushes: "why do you need the batch layer at all? Can't streaming be the source of truth?" The right response:

  • Yes, theoretically. Kappa architecture has been defensible for 10 years.
  • In ads practice, no. Billing requires provable determinism: same inputs, same outputs, every time. Streaming with late data, out-of-order events, and consumer failover cannot make that guarantee cheaply at 50B/day.
  • So batch is the auditable system of record; streaming is a fast approximation with reconciliation. This is how every large ads platform works in 2026.

Mention the cost too: a pure-Kappa pipeline re-processes the entire history every time a bug is fixed. Batch corrections are cheaper at scale.

Semantics

  • Exactly-once end-to-end is achievable in a single system (Flink + Kafka with transactional sinks) but fragile across system boundaries.
  • At-least-once + idempotent sinks is the norm. Dedupe on impression_id at the sink; every downstream consumer can be replayed without double-counting.
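The idempotent-sink point can be shown in a few lines; a toy counter (names are illustrative) that stays correct under both per-record redelivery and a full replay:

```python
# At-least-once delivery with an idempotent sink: replays must not
# double-count, so the sink remembers which impression_ids it applied.
class SpendCounter:
    def __init__(self):
        self.cents = 0
        self.seen: set[str] = set()   # in production: TTL'd state, not unbounded

    def apply(self, impression_id: str, spend_cents: int) -> None:
        if impression_id in self.seen:
            return                    # duplicate delivery: no-op
        self.seen.add(impression_id)
        self.cents += spend_cents

counter = SpendCounter()
batch = [("i1", 120), ("i2", 80), ("i1", 120)]   # i1 redelivered in-batch
for imp_id, cents in batch * 2:                  # and the whole batch replayed
    counter.apply(imp_id, cents)
print(counter.cents)  # 200: each impression counted exactly once
```

This is "effectively exactly-once": at-least-once transport plus a deduping sink keyed on the natural event ID, which is the norm called out above.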

The 10 Scenarios

Each is written for ~5 minutes of whiteboard time. The pattern is the same: restate the prompt, name the grain, draw the pipeline, call out the 3 biggest pitfalls, end with a follow-up question the interviewer will respect.

Scenario 1: Campaign Reporting Pipeline (batch)

Grain: (dt, campaign_id, audience_id, placement_id). Sources: fact_impression, fact_click, fact_conversion_attributed, dim_campaign. Cadence: hourly + daily locked. Output: gold.metric_campaign_daily, served to advertiser dashboards.

Pitfalls: (a) late conversions re-opening windows for 7 days — every day's number is mutable until D+7; (b) time-zone mismatch between the advertiser's reporting zone and UTC — store in UTC, convert on read; (c) dim_campaign SCD-2 not snapshotted at event time — you'll see "current" campaign names on historical data, which confuses advertisers who renamed their campaign.

Follow-up: "When does a number become immutable?" Answer: D+7 lock, or sooner if you're confident all late data has arrived.

Scenario 2: Real-Time Delivery Monitoring (streaming)

Need: under-delivery alert within 5 minutes. Architecture: streaming job consumes impressions, maintains delivered_so_far per campaign, compares to pacing_curve[hour], emits alert if delivered < 0.7 × expected sustained for 10 minutes.
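The sustained-threshold condition is worth sketching, since "below 0.7× for 10 minutes" is easy to get wrong as a single-point check; a minimal illustration (class and parameter names are hypothetical):

```python
# Under-delivery alert: fire only when delivered/expected stays below the
# threshold for a full sustained window of per-minute observations.
from collections import deque

class UnderDeliveryAlert:
    def __init__(self, threshold: float = 0.7, sustain_minutes: int = 10):
        self.threshold = threshold
        self.ratios = deque(maxlen=sustain_minutes)

    def observe(self, delivered: float, expected: float) -> bool:
        self.ratios.append(delivered / max(expected, 1e-9))
        return (len(self.ratios) == self.ratios.maxlen
                and all(r < self.threshold for r in self.ratios))

alert = UnderDeliveryAlert()
# One healthy minute, one marginal minute, then ten clearly low minutes.
fired = [alert.observe(d, 100) for d in [80, 60] + [50] * 10]
print(fired.index(True))  # alerts only after 10 consecutive low minutes
```

A single dip never fires; the rolling window is also where you would implement the start-of-day suppression from pitfall (b).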

Pitfalls: (a) the pacing curve itself is an assumption; if auction density surprises you, alert fires wrongly — gate on absolute thresholds too; (b) start-of-day noise: first 30 minutes are always noisy, delay alerts; (c) opaque alert: always attach "why under-delivering" (budget exhausted, bid too low, audience too narrow) — an alert without a diagnosis is useless.

Follow-up: "What do you do if the alert fires and the campaign is fine?" Answer: tune thresholds, add suppression, build a playbook for reviewers.

Scenario 3: Attribution at Scale (ID graph + windowing)

Covered in depth in Part 2 Scenario 12, but the short form: resolve each event to an id_graph component; collect clicks + impressions per component; for each conversion, apply the attribution rule (last click 7d, last view 1d); emit one row per attributed conversion.

Pitfalls: (a) component explosions from bad probabilistic edges — cap component size; (b) attribution rules must be versioned, because advertisers will debate your choices; (c) the "unattributed" bucket is first-class — make it visible so growth isn't understated.

Follow-up: "How do you A/B test attribution rules?" Answer: shadow attribution in parallel; log both; let analysts compare.

Scenario 4: Audience Segmentation

Problem: advertisers define audiences ("users interested in running, age 25–40, US"). Two flavors:

  • Static audiences — user list uploaded by the advertiser. Stored as a Bloom filter per audience, refreshed daily.
  • Dynamic audiences — rule-based over user attributes. Recomputed hourly or on-attribute-change.

Pitfalls: (a) audience overlap — a user can be in 50+ audiences, each matching at auction time adds latency; (b) freshness mismatch — a dynamic audience refreshed hourly can lag user state by 58 minutes; (c) privacy — audiences below a minimum size (e.g., 1000 users) must be blocked for targeting to prevent inference attacks.

Schema:

CREATE TABLE audience_membership (
  audience_id  BIGINT,
  user_id      BIGINT,
  added_ts     TIMESTAMP,
  source       STRING       -- 'upload', 'rule', 'lookalike', 'interaction'
) PARTITIONED BY (audience_id);

Follow-up: "How do you measure audience overlap efficiently?" Answer: HyperLogLog sketches per audience, intersect via MinHash estimates.
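That follow-up answer can be demonstrated with a toy MinHash; this pure-Python sketch estimates the Jaccard similarity of two audiences (production systems keep precomputed sketches like this, or HLL, in the OLAP layer rather than hashing raw ID sets on demand):

```python
# MinHash sketch of audience overlap: estimate Jaccard similarity of two
# user-id sets from k per-seed minimum hashes (illustrative sizes).
import hashlib

def minhash(user_ids, k: int = 128) -> list[bytes]:
    def h(seed: int, uid: int) -> bytes:
        return hashlib.sha256(f"{seed}:{uid}".encode()).digest()
    return [min(h(seed, u) for u in user_ids) for seed in range(k)]

def jaccard_estimate(sig_a: list[bytes], sig_b: list[bytes]) -> float:
    # Fraction of positions where the per-seed minima agree.
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

a = set(range(0, 600))            # audience A
b = set(range(300, 900))          # audience B; true Jaccard = 300/900 = 1/3
est = jaccard_estimate(minhash(a), minhash(b))
print(round(est, 2))  # close to 0.33, within MinHash sampling error at k=128
```

The interview-ready summary: signatures are tiny and mergeable, so pairwise overlap across thousands of audiences never touches the raw membership table.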

Scenario 5: Creative A/B Testing

Covered in Part 2 Scenario 14 for the analytics side. Key DE pitfalls:

  • Unit of randomization. User, not impression. Impression-level assignment shows both variants to the same user and contaminates the comparison.
  • Pre-exposure bias. Don't count metrics before the user was first exposed.
  • Novelty effect. Drop the first 3 days of every creative experiment.
  • Multiple comparisons. An experiment with 8 creatives needs corrected p-values.
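The multiple-comparisons point is one line of arithmetic; a Bonferroni sketch with illustrative p-values (Holm or BH-FDR are common, less conservative alternatives):

```python
# Bonferroni correction for 8 creatives tested against a control
# (7 comparisons): each test needs p < alpha / m to hold the
# familywise error rate at alpha.
alpha, m = 0.05, 7
per_test_alpha = alpha / m
p_values = [0.030, 0.004, 0.012, 0.210, 0.049, 0.008, 0.090]
significant = [p for p in p_values if p < per_test_alpha]
print(round(per_test_alpha, 4))  # 0.0071
print(significant)               # only 0.004 survives the correction
```

Without the correction, four of these seven "wins" would look significant at 0.05; that gap is exactly what the interviewer is probing for.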

Follow-up: "If two creatives are statistically tied after two weeks, how do you decide?" Answer: check secondary metrics (frequency, fatigue, halo effects) or declare a tie and rotate.

Scenario 6: DQ Frameworks at Scale

Real DQ has three axes:

  • Integrity. Schema matches contract, no NULLs where forbidden, FKs resolve.
  • Freshness. Last-row timestamp within SLA.
  • Drift. Row count, average metric values, distribution stability versus trailing baseline.

Architecture:

Pipeline Flow

pipeline emits → check runner → dq_result_daily
from dq_result_daily:
  pass → downstream unblocked
  fail → page owner, block downstream

Pitfalls: (a) alert fatigue — every check has a priority and SLA; (b) dq_result table must be first-class — downstream consumers read it to gate their own jobs; (c) checks must be versioned — adding a new check shouldn't retroactively fail yesterday.

Follow-up: "How do you roll out a new, stricter check?" Answer: shadow mode (logs but doesn't block) for 2 weeks, then gate.

Scenario 7: Backfills

Requirements: bounded, idempotent, cheap. A backfill that re-runs 90 days of daily rollups must produce byte-identical output to the original (modulo bug fixes). Key patterns:

  • Partition-aware writes. INSERT OVERWRITE PARTITION (dt = ...), never DELETE then INSERT.
  • Deterministic random. Any random sampling uses a seed derived from (dt, partition_key).
  • Idempotent sources. Bronze partitions must be immutable; if you re-run yesterday, you read the same bronze partition.
  • Cost gate. A 90-day backfill at 50B events/day can cost $50K; require manager approval for any backfill > 30 days.

Follow-up: "How do you detect when a backfill produced different numbers than expected?" Answer: a fingerprint check — compare row count and key metric totals, pre/post, and alert on deltas > 0.1%.
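That fingerprint check is a few lines; a hedged sketch with illustrative metric totals and the 0.1% tolerance from the answer (function and variable names are hypothetical):

```python
# Backfill fingerprint: compare row counts and key metric totals pre/post
# a backfill and flag relative deltas above a tolerance.
def fingerprint_delta(pre: dict, post: dict, tol: float = 0.001) -> list[str]:
    """Return the metrics whose relative change exceeds tol."""
    alerts = []
    for metric, before in pre.items():
        after = post.get(metric, 0)
        rel = abs(after - before) / max(abs(before), 1e-12)
        if rel > tol:
            alerts.append(f"{metric}: {before} -> {after} ({rel:.2%})")
    return alerts

pre  = {"row_count": 50_000_000, "spend_cents": 123_456_789}
post = {"row_count": 50_000_000, "spend_cents": 123_400_000}
post_bad = {**post, "row_count": 49_000_000}    # a 2% row drop

print(fingerprint_delta(pre, post))      # spend moved ~0.05%: no alert
print(fingerprint_delta(pre, post_bad))  # row_count delta breaches 0.1%
```

In practice the pre fingerprint is captured before the overwrite and stored next to the backfill ticket, so the comparison is auditable later.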

Scenario 8: SCD for Campaign Entities

Campaigns change over time. A bid strategy flip at noon matters for attribution.

CREATE TABLE dim.campaign_history (
  campaign_id   BIGINT,
  valid_from    TIMESTAMP,
  valid_to      TIMESTAMP,
  name          STRING,
  objective     STRING,
  bid_strategy  STRING,
  daily_budget  DECIMAL(18,4),
  change_reason STRING
);

-- Query "as-of" a given impression time
SELECT f.impression_id, c.bid_strategy, f.clearing_cpm
FROM fact_impression f
JOIN dim.campaign_history c
  ON f.campaign_id = c.campaign_id
 AND f.impression_ts >= c.valid_from
 AND f.impression_ts <  c.valid_to;

Pitfalls: (a) valid_to must default to far-future, not NULL, to use range joins; (b) changes must be written atomically — a dual-write pattern where old row's valid_to closes and new row opens in the same transaction; (c) time-travel queries must always pass the event time.

Follow-up: "How do you deal with changes that applied retroactively?" Answer: new row with a retroactive_from flag; downstream chooses whether to honor retroactively.

Scenario 9: Incident Response — Spend Spike at 3am

"Page: platform-wide spend is running 40% above forecast. You're on-call. Walk me through the first 30 minutes."

Runbook:

  1. Confirm the signal. Is it a single advertiser, a single surface, or broad? Look at metric_platform_minute sliced by surface, geo, campaign cohort.
  2. Localize. Often it's one mega-advertiser with a bid bug — their campaign suddenly bidding $100 CPM. Or a pacing-system bug that disabled all gates.
  3. Triage — is customer action needed? If single advertiser and their fault: inform account team, no platform action. If platform-wide: consider emergency pacing override.
  4. Inform stakeholders. 15-minute update cadence until stable.
  5. Post-incident: forensic pipeline run, exact delta spend, refund decisions.

Pitfall: the temptation to "fix first, investigate later." Don't. You might kill a legitimate advertiser spending burst (Super Bowl, product launch). Always confirm anomaly vs legitimate.
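Step 1 of the runbook — confirm and localize before acting — can be sketched as a small triage helper. The slice names, the 1.2× threshold, and the 50% "broad" cutoff are all illustrative assumptions:

```python
# Hypothetical triage helper: given per-slice actual vs forecast spend, decide
# whether a spike is broad (platform-wide) or concentrated (likely a single
# advertiser or surface) before taking any platform-level action.

def classify_spike(actual_by_slice, forecast_by_slice, ratio_threshold=1.2):
    """Return ('broad' | 'concentrated' | 'none', offending slices)."""
    hot = [
        s for s, actual in actual_by_slice.items()
        if forecast_by_slice.get(s, 0) > 0
        and actual / forecast_by_slice[s] > ratio_threshold
    ]
    if not hot:
        return "none", []
    share_hot = len(hot) / len(actual_by_slice)
    return ("broad" if share_hot > 0.5 else "concentrated"), hot
```

A concentrated result points you at the account team; a broad result is the case where an emergency pacing override is even on the table.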

Scenario 10: Senior Technical Leadership — Platform Migration

"Your team needs to migrate from an aging warehouse to a new one. 500+ pipelines, 2000+ dashboards, 30+ downstream teams. Walk me through the plan."

Shape:

  • Phase 0 (1 month): inventory. Build the lineage DAG. Identify the 50 tables that feed 80% of consumers.
  • Phase 1 (3 months): dual-write. New warehouse runs in parallel for the top 50 tables. Validate byte-by-byte for 2 weeks per table.
  • Phase 2 (6 months): consumer migration. Each consumer migrates on their schedule. Provide tooling (SQL-dialect translator, schema diff checker).
  • Phase 3 (2 months): deprecation. Old warehouse is read-only. Set an end-of-life date 6 months out.
  • Phase 4: decommission.

Risks to call out: SQL-dialect differences (forget one CASE expression and a dashboard breaks silently), cost spike during dual-write, political cost if one team's blocker delays the whole migration.

Governance: weekly stakeholder review; public dashboard of migration progress; a "red/yellow/green" status per consumer. Transparency is how you stay credible.


Appendix: Loop Preparation Checklist

  • I can draw the UMP architecture from memory.
  • I've written the attribution SQL cold, twice.
  • I've coded the config-driven generator + DQ framework.
  • I have 6 STAR+I behavioral stories prepped (see Part 2).
  • I can name the top 5 pitfalls of each of the 10 scenarios.
  • I know the current numbers cold: impressions/day, storage/day, QPS.
  • I've read Part 4's glossary and don't blink at any term.
  • I've done at least one full mock loop with a friend.
  • Sleep. Eat protein. Show up.

← Master Index · Part 0 · Part 2 →

↑ Back to top

Part 02

Advanced Ads, Growth & Behavioral

Part 2: Advanced Ads + Growth Accounting + Behavioral

"Mid-level candidates are judged on what they can build. Senior candidates are judged on what they'd change, what they'd push back on, and what they'd tell the room nobody else will say. Scenarios 11–22 are where that line gets drawn."


How to read this part

Part 1 walked a full five-round loop plus ten core technical scenarios. Part 2 picks up the scenario stream at number 11 and runs through 22. The first eight are advanced ads engineering — auction mechanics, identity graphs, pacing, creative analytics, growth accounting, experimentation, lineage, cost — all the places a senior should have strong opinions. The final four are behavioral — the STAR+I framework plus three of the most common senior-level prompts.

None of these scenarios are "draw a diagram." They're "here's a messy real-world situation — what do you do." Senior interviewers are looking at three things:

  1. Substance. Do you understand the actual mechanics, not just the buzzwords?
  2. Taste. Do you know what's worth building vs. what's a distraction?
  3. Communication. Can you explain a trade-off in 90 seconds without hand-waving?

The scenarios below are written with all three in mind. The prose is deliberately dense. Read each one twice.


Scenario 11: Ad Auction Data Engineering at 100B+ Events/Day

The prompt

"Walk me through how you'd design the data pipeline for the ad auction system. You're logging auction events at 100B+ per day. What's streamed, what's batched, what's sampled, what's dropped, and why?"

The funnel

Every ad shown to a user is the output of an auction. The funnel is roughly:

Diagram:

bid_request (100%, ~1T/day)
  → candidate_selection (top ~5,000 per request)
  → ranking (top ~50)
  → auction (top ~5)
  → clearing (winner, ~50B/day)
  → impression (rendered, ~50B/day)

A single ad request typically produces:

  • 1 bid_request event — what the user asked for, what slot was available.
  • ~5,000 candidate events — every ad that was eligible before any ranking happened.
  • ~50 ranked events — ads that survived the first ranker pass.
  • ~5 auction events — final auction participants (after floor price, diversity constraints, policy filters).
  • 1 clearing event — who won, at what price, why.
  • 1 impression event — actually rendered (if the client confirmed render).

If a large platform runs 50B impressions/day, the upstream funnel is roughly:

Stage                   Events/day              Storage if logged raw
bid_request             200B–1T                 ~50–250 TB/day
candidates              250T+ (50B × 5,000)     infeasible to log raw
ranked                  2.5T (50B × 50)         ~600 TB/day
auction                 250B (50B × 5)          ~60 TB/day
clearing + impression   50B each                ~12 TB/day each

You can't log the candidate pool raw. You can't even log the ranked pool raw at this scale. That's the core problem.

Schema

The auction fact grain is the ad_request × slot × stage composite. You keep one wide row for the terminal stages (auction + clearing + impression) and a sampled fanout for the upstream stages.

-- Terminal: one row per impression, fully dimensionalized
CREATE TABLE fact_auction_clearing (
  ad_request_id       BINARY(16)  NOT NULL,       -- uuid
  slot_id             STRING      NOT NULL,       -- surface × placement key
  ts_auction_us       BIGINT      NOT NULL,       -- micros since epoch
  user_id             BIGINT,                     -- null if logged-out
  surface_id          INT,                        -- Main Feed / Visual Feed / Short-form Video / Stories / Publisher Network
  placement_id        INT,                        -- feed position, story card, etc.
  geo_country         STRING,
  device_os           STRING,
  winning_campaign_id BIGINT      NOT NULL,
  winning_ad_id       BIGINT      NOT NULL,
  winning_advertiser  BIGINT      NOT NULL,
  winning_bid_cpm     DECIMAL(12,6),
  clearing_cpm        DECIMAL(12,6),              -- second-price or generalized-second-price
  floor_cpm           DECIMAL(12,6),
  num_auction_entrants INT,
  num_ranked           INT,
  num_candidates       BIGINT,                    -- usually estimated, not exact
  auction_format      STRING,                     -- 'vcg', 'gsp', 'first_price'
  was_impressed       BOOLEAN,                    -- did the client render
  impression_ts_us    BIGINT,                     -- null until impression event arrives
  policy_filters_hit  ARRAY<STRING>,              -- audit trail
  experiment_ids      ARRAY<BIGINT>
)
PARTITIONED BY (dt STRING, hour INT)
CLUSTERED BY (winning_advertiser, winning_campaign_id);
-- Sampled upstream: losing bids in the auction pool (kept for counterfactual research)
CREATE TABLE fact_auction_losers_sampled (
  ad_request_id   BINARY(16) NOT NULL,
  slot_id         STRING     NOT NULL,
  ts_auction_us   BIGINT     NOT NULL,
  campaign_id     BIGINT     NOT NULL,
  ad_id           BIGINT     NOT NULL,
  advertiser_id   BIGINT     NOT NULL,
  bid_cpm         DECIMAL(12,6),
  rank_score      DOUBLE,                         -- pre-auction ranker score
  rank_position   INT,                            -- final rank before auction
  filter_reason   STRING,                         -- why did they lose
  sample_weight   DOUBLE                          -- inverse probability — critical for unbiased rollups
)
PARTITIONED BY (dt STRING);

Notice sample_weight. That's not optional. Without it, every downstream aggregate is biased in ways nobody will notice until an advertiser calls their account manager furious.

Sampling strategy

You have three choices at each upstream stage:

Uniform sampling — drop 99 out of every 100 events at random. Simple, but it under-weights the tail. If 0.1% of auctions are for very high-value placements, you lose most of them.

Stratified sampling — different rates per stratum. For example, keep 100% of auctions above $50 CPM, 10% between $5 and $50, 1% below $5. You preserve tail behavior. The cost is carrying the strata definitions as a first-class config. Re-sampling historically when the strata change is a pain.

Tail-biased / weighted sampling — oversample rare events (new campaigns, rare surfaces, rare geographies), undersample common ones. Use inverse-probability weighting downstream. This is what most mature platforms end up doing. The mental model is: "every event carries its own weight; aggregates multiply by SUM(weight) instead of COUNT(*)."

At the candidate-pool stage, most platforms don't log the full pool even sampled. Instead they log "reason for not winning" aggregated counters (policy filter, floor price, budget exhausted, pacing gate, bid too low, diversity constraint). That's enough to answer 90% of research questions and costs ~1000x less to store.
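The inverse-probability-weighting idea is easy to demonstrate: sample at different rates per stratum, carry weight = 1/rate on each kept row, and aggregate with sum(weight). A minimal sketch, with made-up strata rates matching the example above:

```python
# Sketch of stratified sampling with inverse-probability weights. Each kept
# event carries sample_weight = 1/keep_rate; downstream rollups use
# sum(sample_weight) instead of count(*) so estimates stay unbiased.
import random

STRATA_RATES = {        # keep-probability per CPM stratum (illustrative config)
    "high": 1.0,        # bid >= $50 CPM: keep everything
    "mid":  0.10,       # $5-$50
    "low":  0.01,       # < $5
}

def stratum(bid_cpm):
    if bid_cpm >= 50: return "high"
    if bid_cpm >= 5:  return "mid"
    return "low"

def sample(events, rng):
    kept = []
    for e in events:
        rate = STRATA_RATES[stratum(e["bid_cpm"])]
        if rng.random() < rate:
            kept.append({**e, "sample_weight": 1.0 / rate})
    return kept

def weighted_count(kept):
    # The unbiased estimate of the original event count.
    return sum(e["sample_weight"] for e in kept)
```

With 100,000 mid-value events and 1,000 high-value events, you store roughly 11,000 rows but `weighted_count` still estimates ~101,000 — and, crucially, every high-value event survives.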

What's streamed, what's batched

bid_request     → streaming (real-time pacing signals, fraud detection)
candidate pool  → aggregated counters only, streamed
ranked pool     → sampled, streamed (small sample, ~1%)
auction pool    → kept for winners, sampled for losers, streamed
clearing        → streamed + logged to bronze
impression      → streamed from client, needs careful dedup
conversion      → streamed from advertiser pixel / server API, re-joined in batch

Streamed because the pacing system, fraud system, and real-time dashboards all need them fresh. Impressions in particular need to be joined against click and conversion events within minutes.

Batched is where the attribution join, holistic ROAS computation, and advertiser-facing reporting are assembled. Batch owns the source of truth; streaming is the early preview.

Why keep the losing bids?

Three reasons.

Research. "If we'd raised the floor by 10%, how much revenue would we have lost?" Can't answer without counterfactuals.

Floor-price tuning. Reserve prices are tuned per (surface, placement, audience, hour, geography). That tuning needs a deep history of prior auctions to do well. Keep the losing bids at least for training data.

Advertiser debugging. "Why didn't my campaign serve yesterday?" The answer is almost always upstream of the auction — rank score too low, bid too low, filter hit. You need to show that trail. If you only log winners, the answer to that question is "I don't know."

Senior-grade trade-off to mention

A weak answer logs everything and blows up the warehouse. A mid answer samples uniformly and silently biases every rollup. A senior answer logs terminal events fully, samples upstream events stratified by value, carries sample_weight through every aggregate, and builds a counter-based fallback for the ultra-high-cardinality stages. And they explain the bias trade-off in under a minute.


Scenario 12: Cross-Platform Identity Graph and Attribution

The prompt

"Users show up on your ad platform from multiple surfaces, on multiple devices, sometimes logged in, sometimes not. Design the identity graph. How do you attribute a conversion that happens on mobile web to an ad that was shown inside an app three days earlier?"

The graph model

Nodes: every observable identifier.

  • user_id — the platform's internal stable ID, available only when logged in.
  • device_id — platform SDK-generated, stable on a device until the user resets it.
  • idfa / gaid — iOS and Android advertising IDs. Reset-able by the user.
  • cookie_id — browser first-party cookie, surface-specific.
  • hashed_email / hashed_phone — deterministic PII hash, shared with advertisers via server conversion APIs.
  • ip_address — low signal, high privacy cost, used sparingly.
  • user_agent_hash — helps for probabilistic matching.

Edges: observed co-occurrences.

  • Deterministic edges — two identifiers observed on the same authenticated session. user_id ↔︎ device_id when a user logs in. High confidence, low volume.
  • Probabilistic edges — two identifiers observed with the same (ip, user_agent, timezone) within a short window. Low confidence, high volume. These are where privacy teams spend most of their time.

Schema

CREATE TABLE id_graph_edges (
  edge_id        BINARY(16),
  src_type       STRING,          -- 'user', 'device', 'idfa', 'cookie', 'hashed_email'
  src_value      STRING,          -- hashed/salted
  dst_type       STRING,
  dst_value      STRING,
  edge_kind      STRING,          -- 'deterministic', 'probabilistic'
  confidence     DOUBLE,          -- 0-1
  first_seen_ts  BIGINT,
  last_seen_ts   BIGINT,
  evidence_count INT,             -- how many times observed
  source_surface STRING
)
PARTITIONED BY (dt STRING);

CREATE TABLE id_graph_components (
  component_id     BINARY(16),
  member_type      STRING,
  member_value     STRING,
  added_ts         BIGINT,
  confidence_score DOUBLE
)
PARTITIONED BY (dt STRING);

The graph is rebuilt as components — connected subgraphs of identifiers. For a user who's logged into two devices and also visits your publisher network in an incognito browser, you might end up with a component that contains five identifiers. The attribution engine resolves events to a component, not to a single identifier.
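The component build is, at its core, connected components over the edge table. A toy union-find sketch, including the max-component-size guard discussed in the pitfalls below (the threshold is illustrative):

```python
# Minimal union-find sketch of the component build. Edges merge identifiers
# into connected components; a size guard rejects merges that would create a
# mega-component from one low-quality probabilistic edge.

class IdGraph:
    def __init__(self, max_component_size=50):
        self.parent = {}
        self.size = {}
        self.max = max_component_size

    def find(self, node):
        self.parent.setdefault(node, node)
        self.size.setdefault(node, 1)
        while self.parent[node] != node:
            self.parent[node] = self.parent[self.parent[node]]  # path halving
            node = self.parent[node]
        return node

    def add_edge(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return True
        if self.size[ra] + self.size[rb] > self.max:
            return False        # reject: merge would breach the size guard
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra     # union by size
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return True
```

At warehouse scale the same logic runs as an iterative distributed job (e.g., label propagation over partitioned edges), but the semantics — merge, guard, resolve to a component — are exactly these.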

Rebuild cadence

Daily full rebuild — the 90-day edge window is re-clustered nightly. Gives you fresh components with full evidence.

Hourly incremental merge — new edges observed in the past hour are merged into existing components. Used for near-real-time attribution.

Continuous expire — edges older than 90 days (or the local privacy window) are dropped, components re-scored.

Cost at scale: the nightly full rebuild is typically the second- or third-most expensive job in the warehouse. It processes 10s of billions of edges. Optimize by sharding the edge set (e.g., by region or identifier-type pair) so components can be rebuilt independently, shard by shard.

Privacy constraints

Not every edge is legal to keep. Concretely:

  • iOS ATT non-consent — if the user didn't grant app tracking transparency, the IDFA is zeroed out. You have to drop any edge that depended on it.
  • EU (GDPR) opt-out — the user's edges must be severed within 30 days of request. Maintain a "forget list" and re-run component assignment.
  • Minors — no probabilistic edges at all for under-18s. Deterministic-only.
  • Sensitive categories — health, financial, political inference edges banned in most jurisdictions.

This is where a senior DE earns their salary. The engineering work isn't hard; the policy compliance is. Build the forget-list into the daily rebuild so it's impossible to keep forbidden edges by accident.

Attribution on top of the graph

Flow:

Conversion Path

conversion_event (raw, with hashed_email or device_id)
  → resolve against id_graph_components (get component_id)
  → look up impressions / clicks seen by any member of that component
  → apply attribution rules (last-click 7d, last-view 1d, linear, etc.)
  → write attributed_conversion with (campaign_id, ad_id, attribution_path)

The interesting bit is the component lookup. Naive implementation: join conversion to edges, find component, join component to all members, probe impressions by each member. At scale you'll OOM or wait 6 hours.

Better: pre-compute a denormalized user_event_history keyed by component_id, refreshed hourly. Each component carries a bounded list of events. Attribution is then a single join with no fanout.

Post-ATT / post-cookie world

Apple's ATT rollout and browser third-party cookie removal broke the deterministic-dominant graph. The industry response:

  • Server-side Conversion APIs — advertisers send conversion events directly from their server, with hashed email and event metadata. Bypasses the browser entirely. Deterministic when the hashed email matches a known user.
  • SKAdNetwork (SKAN) — Apple's privacy-preserving install attribution. Postbacks are delayed, aggregated, differentially private. You engineer a pipeline that accepts SKAN postbacks, joins them with campaign metadata, and reconciles with the advertiser's measurement.
  • Modeled conversions — for the unobservable fraction, ML models estimate missing conversions using aggregated signals. The DE job is to feed the model clean, consistent training data and join the modeled output back into reporting with the right caveats.
  • Clean rooms — the advertiser and the platform both put hashed identifiers into a neutral environment where a limited set of queries can run. Edges are computed without either side seeing the raw data.

Expect a senior interviewer to ask "How has ATT changed your architecture?" If your answer is "it hasn't," the interview is over.

Common pitfalls

  • Treating a device_id change as a new device — IDFA resets are user-initiated and frequent. Handle them as edge expiry, not as a new node.
  • Letting components get too large — a low-quality probabilistic edge can merge two unrelated components into a mega-component. Put a max_component_size guard (e.g., reject merges that push a component over 50 members).
  • Using the graph for non-attribution purposes without an audit — once a graph exists, every team wants to use it. Put access controls up-front.

Scenario 13: Budget Pacing and Spend Forecasting

The prompt

"An advertiser sets a $10,000 daily budget. How do you ensure the platform spends close to that, without going over? And what happens at the end of the day if you've only spent $7,000 — how do you decide whether to keep serving?"

This is a fan-favorite senior prompt because it exposes how you reason about real-time vs. batch, stochastic systems, and partial observability all at once.

The two-loop system

Fast loop (seconds): real-time spend gate. Every auction win increments a Redis / Pinot / KV-store counter of "spend today." When the counter hits 95% of budget, the campaign is throttled — pacing score decays sharply. When it hits 100%, the campaign is pulled from the auction entirely.

Slow loop (minutes to hours): batch reconciliation. The streaming counter is known to drift from the ground truth because:

  • Client didn't render the impression — you charged but shouldn't have.
  • Impression arrived late from the client (common on mobile with bad network).
  • Invalid traffic filter decided, hours after the fact, that the impression was a bot.
  • Billing rounding rules differ from real-time cost estimates.

Every hour, a batch job reconciles the ground-truth spend with the counter. If the counter over-reported, you credit the campaign back and re-enable it. If it under-reported, you quietly absorb — over-delivery is cheaper than angering the advertiser.

Streaming Pacing ⇆ Batch Ground Truth

Streaming win events
  → real-time spend counter (KV): campaign_id → spent_today
  → read by the auction pacing gate (sub-100 ms lookup)

Ground-truth batch pipeline
  impressions ⋈ conversions ⋈ IVT → (campaign_id, hour) → $spent
  → source of truth for billing
  → hourly correction pushed back into the counter
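The fast loop plus the hourly correction can be sketched in a few lines. The 95%/100% thresholds come from the text; the in-process counter stands in for Redis/Pinot and the linear decay is an illustrative choice:

```python
# Toy sketch of the fast-loop spend gate plus the reconciliation hook.

class SpendGate:
    def __init__(self, daily_budget):
        self.budget = daily_budget
        self.spent = 0.0          # streaming counter (drifts from ground truth)

    def pacing_multiplier(self):
        """1.0 = bid normally, 0.0 = pulled from the auction."""
        frac = self.spent / self.budget
        if frac >= 1.0:
            return 0.0
        if frac >= 0.95:
            return max(0.0, (1.0 - frac) / 0.05)  # decay sharply from 95% to 100%
        return 1.0

    def record_win(self, price):
        self.spent += price

    def reconcile(self, ground_truth_spent):
        """Hourly batch correction: snap the counter to the ground truth."""
        self.spent = ground_truth_spent
```

Note how `reconcile` can re-enable a throttled campaign when the batch pipeline reveals the streaming counter was over-reporting — exactly the loop described above.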

Pacing curves

A daily budget isn't meant to be spent in the first hour. Three patterns:

  • Even pacing — spend at 1/24th of budget per hour, aligned to the campaign's audience time zone. Default for brand campaigns.
  • Accelerated pacing — spend as fast as the auction allows. Default for direct-response campaigns where winning every available impression is the point.
  • Predicted pacing — predict the hour-by-hour auction demand curve, spend proportionally. High-quality predictive pacing can lift campaign delivery by 5–10% over even pacing.

The predicted curve uses:

  • Historical auction density for the campaign's target audience (14-day rolling).
  • Hour-of-day and day-of-week seasonality.
  • Local events (holidays, sports, major product launches).
  • Weather, if applicable (outdoor retail).

The DE pipeline for this is a daily job that produces pacing_curve(campaign_id, hour, expected_spend_share) and publishes it to the real-time gate.
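The core transform of that daily job is just a normalization: convert the predicted hourly demand into spend shares that sum to 1. A minimal sketch (the fallback behavior is an assumption):

```python
# Sketch of the pacing_curve computation: hourly demand forecast in,
# expected_spend_share per hour out.

def pacing_curve(hourly_demand):
    """hourly_demand: 24 predicted auction-density values for the campaign's
    audience. Returns the expected spend share per hour, summing to 1.0."""
    total = sum(hourly_demand)
    if total == 0:
        return [1.0 / 24] * 24   # no forecast signal: fall back to even pacing
    return [d / total for d in hourly_demand]
```

With a flat forecast this degenerates to even pacing (1/24 per hour); with an evening-heavy forecast, the real-time gate lets the campaign spend proportionally more in the evening hours.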

Spend forecast

Separate from pacing is forecasting — predicting what total spend will be for a campaign before it runs. Inputs:

  • Bid + budget + audience size from the campaign setup.
  • Historical CPM for that audience × surface × objective.
  • Current fill rate (what fraction of the audience is actually auctionable today).
  • Competitive density (how many other advertisers target the same audience).

The forecast feeds two things: (1) the advertiser's "reach estimate" in the campaign setup UI, and (2) the delivery-monitoring system's expected-vs-actual alerts.

The reconciliation problem

Over-delivery happens. The streaming counter can't possibly be perfect. Policies:

  • Budget tolerance — most platforms commit to spending no more than ~2% over the daily cap. Advertisers accept this in the ToS.
  • Make-good — if over-delivery is severe (>5%), credit the advertiser.
  • Under-delivery — no credit; the advertiser paid for what they got. But if the campaign was under-delivered because the platform throttled it wrongly (reconciliation showed streaming was over-reporting), you re-enable and try to catch up the remaining delivery over the next few days.

Engineering: the reconciliation job outputs a spend_delta per campaign per hour. A scheduled task consumes deltas and updates the real-time counter. This isn't glamorous — but every senior ads DE has touched this loop.

What happens at budget exhaustion

The user-facing question here is: hard stop or graceful?

Hard stop — when the counter hits 100%, kill the campaign immediately. Risk: if the counter is over-reporting, the campaign is wrongly killed.

Graceful — at 95%, scale pacing down. At 100%, stop bidding but don't fully remove the campaign. At 102% (over-delivery), kill and absorb the loss.

Graceful is what everyone actually does. The cost of wrongly killing a big-budget campaign (advertiser complaint, trust loss, lost revenue in the recovery window) is much higher than the cost of a 2% over-delivery.

Senior-grade pitfalls to name

  • Clock drift across data centers. Your streaming counter has to tolerate clock skew. Use logical time (event sequence numbers) for correctness, wall clock for monitoring.
  • Reconciliation loops creating oscillation. If reconciliation re-enables a campaign, and streaming then wins a big burst of auctions, you can over-deliver worse than before. Dampen the re-enable (enable at 80% of remaining budget, not 100%).
  • Time zones. Budget is daily in the advertiser's time zone. The batch pipeline runs in UTC. One time zone bug can refund a campaign twice or never.

Scenario 14: Creative-Level Performance Analytics

The prompt

"Advertisers run many creatives per campaign. Build me the creative-level performance pipeline. What metrics, what grain, and how do you detect creative fatigue?"

Creative taxonomy

A creative is the actual asset shown — an image, video, carousel, playable, etc. Common taxonomy:

  • Image — single static image, one headline, one CTA.
  • Video — typically 6s / 15s / 30s / 60s versions.
  • Carousel — 2–10 cards, each with its own asset.
  • Dynamic Product Ad (DPA) — template + catalog, the actual rendered creative varies per impression per user.
  • Collection — video or image hero plus a product grid.
  • Playable — mini-interactive experience, common in gaming.
  • Story — vertical 9:16, with interactive elements.
  • Short-form Video — vertical 9:16 video, usually full-screen auto-play.
  • Lead Form — creative + inline form.

Each format has native metrics that the others don't. A video has completion rates; an image doesn't. A carousel has per-card performance; a single image doesn't.

Metric grain

The canonical fact table is at the creative × placement × audience × day grain:

CREATE TABLE fact_creative_perf_daily (
  dt            DATE,
  creative_id   BIGINT,
  ad_id         BIGINT,
  campaign_id   BIGINT,
  placement_id  INT,         -- surface × position
  audience_id   BIGINT,      -- audience segment or 'BAU' for broad

  -- Delivery
  impressions        BIGINT,
  reach              BIGINT,
  frequency          DOUBLE,     -- impressions / reach
  spend_usd          DECIMAL(18,4),

  -- Engagement
  clicks             BIGINT,
  ctr                DOUBLE,     -- clicks / impressions

  -- Video-specific (NULL for non-video)
  video_views_3s     BIGINT,
  video_views_15s    BIGINT,
  video_completes    BIGINT,
  thumbstop_rate     DOUBLE,     -- 3s_views / impressions
  completion_rate    DOUBLE,

  -- Outcomes
  conversions_view_1d  BIGINT,
  conversions_click_7d BIGINT,
  conversion_value_usd DECIMAL(18,4),
  cvr                  DOUBLE,
  cpa                  DOUBLE,
  roas                 DOUBLE
)
PARTITIONED BY (dt);

Notice: nullable video metrics. Don't build separate tables per format; you'll pay for every JOIN in every dashboard.

Creative fatigue

Creative fatigue is measurable. Symptoms:

  • Rising frequency — the average user is seeing this creative 8, 10, 15 times.
  • Falling CTR — engagement drops with repeated exposure.
  • Rising CPA — conversions get more expensive because the click-through rate is dropping.
  • Shrinking reach at constant budget — because the bid price required to win is climbing as the audience becomes less responsive.

Detection:

WITH daily AS (
  SELECT dt, creative_id, SUM(impressions) AS imps, SUM(reach) AS rch,
         SUM(clicks) AS clicks, SUM(spend_usd) AS spend,
         SUM(conversions_click_7d) AS conv
  FROM fact_creative_perf_daily
  WHERE dt BETWEEN CURRENT_DATE - 28 AND CURRENT_DATE - 1
  GROUP BY 1, 2
),
with_trend AS (
  SELECT
    creative_id, dt, imps, clicks, spend, conv,
    imps * 1.0 / NULLIF(rch, 0) AS freq,
    clicks * 1.0 / NULLIF(imps, 0) AS ctr,
    spend / NULLIF(conv, 0) AS cpa,
    -- 7-day baseline CTR from 15-21 days ago (ROWS window assumes contiguous daily rows)
    AVG(clicks * 1.0 / NULLIF(imps, 0)) OVER (
      PARTITION BY creative_id
      ORDER BY dt ROWS BETWEEN 21 PRECEDING AND 15 PRECEDING
    ) AS baseline_ctr
  FROM daily
),
scored AS (
  -- ctr_delta is computed here because a SELECT alias can't be referenced
  -- in that same SELECT's WHERE clause
  SELECT creative_id, dt, ctr, freq, baseline_ctr,
         (ctr - baseline_ctr) / NULLIF(baseline_ctr, 0) AS ctr_delta
  FROM with_trend
)
SELECT creative_id, dt, ctr, baseline_ctr, ctr_delta
FROM scored
WHERE dt >= CURRENT_DATE - 7
  AND ctr_delta < -0.30                -- CTR down >30% from baseline
  AND freq > 6                          -- seen more than 6 times on average
ORDER BY ctr_delta;

The platform-side action is to deprioritize the creative in the ranking, or notify the advertiser via "Advertiser Console" alerts that it's time to refresh the ad.

Creative rotation and experiments

When an advertiser has 10 creatives in a campaign, you have to decide how much traffic each gets. Three strategies:

  • Even rotation — 10% each. Fair but wastes budget on losers.
  • Performance-weighted — exploit the winner. Risks early lock-in on a lucky creative.
  • Contextual bandit (Thompson sampling) — balance explore/exploit per impression. Industry standard now.

The DE pipeline behind this: the bandit needs the posterior over each creative's CVR, updated at least hourly. Build a streaming job that consumes impression + conversion events and materializes creative_posterior(creative_id, alpha, beta) to the ranking store. Keep a 14-day decay on the posterior so old wins don't lock in forever.
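The bandit side of this is compact enough to sketch. A toy Thompson-sampling rotation over the materialized posteriors, with the decay applied as a simple discount toward the prior (the 0.95 factor is an illustrative stand-in for the 14-day decay):

```python
# Toy Thompson-sampling creative rotation over Beta posteriors on CVR.
import random

class CreativePosterior:
    def __init__(self):
        self.alpha = 1.0   # conversions + 1 (uniform Beta prior)
        self.beta = 1.0    # non-converting impressions + 1

    def update(self, impressions, conversions):
        self.alpha += conversions
        self.beta += impressions - conversions

    def decay(self, factor=0.95):
        """Applied daily so stale evidence fades instead of locking in."""
        self.alpha = 1.0 + (self.alpha - 1.0) * factor
        self.beta = 1.0 + (self.beta - 1.0) * factor

def pick_creative(posteriors, rng):
    """Sample a plausible CVR from each posterior; serve the argmax.
    Randomness in the draw is what preserves exploration."""
    draws = {cid: rng.betavariate(p.alpha, p.beta) for cid, p in posteriors.items()}
    return max(draws, key=draws.get)
```

A clearly stronger creative wins most — but not all — of the traffic, which is exactly the explore/exploit balance even rotation and pure exploitation both fail to deliver.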

Creative A/B testing

Advertiser-run: the advertiser wants to know if Creative A beats Creative B on this campaign. Challenges:

  • Between-creative variance is smaller than between-audience variance — random assignment must be at the user level, not the impression level, or the experiment will be contaminated by audience-mix drift.
  • Attribution windows complicate readout — a click today might convert 6 days later. Lock a read-out cutoff and stick to it.
  • Novelty effect — both creatives will look good in the first few days because of novelty. Design for a 7-day minimum runtime.

Scenario 15: Growth Accounting — The L28 Framework

The prompt

"DAU dropped 3% week-over-week. What's the pipeline that lets you explain what happened?"

The answer is growth accounting, specifically the L28 / new-retained-resurrected-dormant framework. This is the single most-asked analytics scenario for senior DE roles at consumer platforms.

The four states

For any user, on any day, they are in exactly one of four states relative to a trailing window:

  • NEW — first day of lifetime activity. Not active any day before today.
  • RETAINED — active today and active yesterday.
  • RESURRECTED — active today but not active yesterday (though active sometime before yesterday).
  • DORMANT — active sometime in the window, not active today.

Equation that falls out:

DAU(today) = NEW(today) + RETAINED(today) + RESURRECTED(today)
DAU(today) - DAU(yesterday) = NEW(today) + RESURRECTED(today) - CHURNED(today)

where CHURNED(today) is the set of users who were active yesterday but not today (they flow into DORMANT). Every DAU change is accountable to user-level flow between states. No more "DAU dropped, dunno why."

Daily user state table

CREATE TABLE daily_user_state (
  dt                 DATE,
  user_id            BIGINT,
  is_active_today    BOOLEAN,           -- did any qualifying event happen
  is_active_yday     BOOLEAN,           -- from LAG
  last_active_before_yday DATE,         -- from previous row's state
  lifetime_active_days INT,             -- cumulative count up to and including today

  state              STRING,            -- 'NEW', 'RETAINED', 'RESURRECTED', 'DORMANT'
  days_since_last_active INT
)
PARTITIONED BY (dt);

state is the classification we compute. The others are inputs.

Daily classification SQL

-- Step 1: raw daily activity (user appears in events that day)
CREATE OR REPLACE TABLE user_activity_daily AS
SELECT DISTINCT user_id, DATE(event_ts) AS dt
FROM events_bronze
WHERE event_name IN ('impression', 'click', 'app_open')   -- "qualifying activity"
  AND DATE(event_ts) BETWEEN CURRENT_DATE - 400 AND CURRENT_DATE;  -- the dt alias can't be used in WHERE

-- Step 2: classify each (user, day)
CREATE OR REPLACE TABLE daily_user_state AS
WITH active_days AS (
  SELECT
    user_id, dt,
    LAG(dt) OVER (PARTITION BY user_id ORDER BY dt) AS prev_active_dt,
    ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY dt) AS lifetime_rank
  FROM user_activity_daily
)
SELECT
  dt,
  user_id,
  TRUE AS is_active_today,
  CASE
    WHEN lifetime_rank = 1 THEN 'NEW'
    WHEN prev_active_dt = dt - INTERVAL '1' DAY THEN 'RETAINED'
    ELSE 'RESURRECTED'
  END AS state,
  DATEDIFF(dt, prev_active_dt) AS days_since_last_active,
  prev_active_dt AS last_active_before_yday,
  lifetime_rank AS lifetime_active_days
FROM active_days;

-- Step 3: attach DORMANT rows for each user who was active recently but not today
-- (they still exist in the daily table, just flipped state)
INSERT INTO daily_user_state
SELECT
  d.dt,
  prev.user_id,
  FALSE AS is_active_today,
  'DORMANT' AS state,
  DATEDIFF(d.dt, MAX(prev.dt)) AS days_since_last_active,
  MAX(prev.dt) AS last_active_before_yday,
  NULL AS lifetime_active_days
FROM (SELECT DISTINCT dt FROM user_activity_daily) d
CROSS JOIN user_activity_daily prev
WHERE prev.dt BETWEEN d.dt - INTERVAL '28' DAY AND d.dt - INTERVAL '1' DAY
  AND NOT EXISTS (
    SELECT 1 FROM user_activity_daily act
    WHERE act.user_id = prev.user_id AND act.dt = d.dt
  )
GROUP BY d.dt, prev.user_id;

In practice this is built incrementally — one day at a time — because a full rebuild at 1B+ users is expensive. The incremental version uses yesterday's daily_user_state as input and only classifies today.
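The incremental version's per-user logic is simple enough to sketch directly. State names match the framework above; the input shape is a hypothetical simplification of yesterday's daily_user_state:

```python
# Sketch of the incremental daily classifier: yesterday's per-user state plus
# today's active set -> today's states, without rescanning full history.
from datetime import date, timedelta

def classify_today(today, yesterday_state, active_today):
    """yesterday_state: user_id -> (last_active_date or None, seen_before)
    active_today: set of user_ids with qualifying activity today."""
    out = {}
    for uid in active_today:
        last_active, seen_before = yesterday_state.get(uid, (None, False))
        if not seen_before:
            out[uid] = "NEW"
        elif last_active == today - timedelta(days=1):
            out[uid] = "RETAINED"
        else:
            out[uid] = "RESURRECTED"
    # Previously-seen users who didn't show up today flow to DORMANT.
    for uid, (last_active, seen_before) in yesterday_state.items():
        if uid not in active_today and seen_before:
            out[uid] = "DORMANT"
    return out
```

The batch job does exactly this as a full-outer join between yesterday's state partition and today's activity, writing one new partition per day.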

The L28 rollup

L28 = "Last 28 days of activity." For any user on any day, their L28 is {0, 1, 2, ..., 28}: the number of days out of the trailing 28 they were active. This is the most predictive retention feature in consumer ads.

CREATE OR REPLACE TABLE user_l28 AS
WITH counted AS (
  SELECT
    dt,
    user_id,
    -- trailing-28-day active-day count, per user, per day
    -- (a GROUP BY dt, user_id would always count exactly 1)
    COUNT(*) OVER (
      PARTITION BY user_id
      ORDER BY dt
      RANGE BETWEEN INTERVAL '27' DAY PRECEDING AND CURRENT ROW
    ) AS active_days_in_28
  FROM daily_user_state
  WHERE is_active_today = TRUE
)
SELECT
  dt,
  user_id,
  active_days_in_28,               -- range [1, 28]
  CASE
    WHEN active_days_in_28 >= 25 THEN 'SUPER_USER'
    WHEN active_days_in_28 >= 15 THEN 'CORE_USER'
    WHEN active_days_in_28 >= 5  THEN 'CASUAL_USER'
    ELSE 'LIGHT_USER'
  END AS l28_segment
FROM counted;

For the DAU drop diagnosis, you pivot this by state:

SELECT
  dt,
  state,
  l28_segment,
  COUNT(DISTINCT user_id) AS users
FROM daily_user_state dus
LEFT JOIN user_l28 l USING (dt, user_id)
WHERE dt BETWEEN CURRENT_DATE - 14 AND CURRENT_DATE
GROUP BY 1, 2, 3
ORDER BY 1, 2, 3;

-- Net daily flow belongs at the day grain, not inside a per-state group:
SELECT
  dt,
  SUM(CASE WHEN state = 'NEW' THEN 1 ELSE 0 END)
    - SUM(CASE WHEN state = 'DORMANT' THEN 1 ELSE 0 END) AS net_new
FROM daily_user_state
WHERE dt BETWEEN CURRENT_DATE - 14 AND CURRENT_DATE
GROUP BY dt;

Wiring into DAU / WAU / MAU

DAU is the count of rows with is_active_today = TRUE on a given day. WAU and MAU are distinct counts over the trailing 7 and 30 days, computed one anchor date at a time:

-- DAU and 7-day WAU as of today
SELECT
  COUNT(DISTINCT user_id) FILTER (WHERE dt = CURRENT_DATE) AS dau,
  COUNT(DISTINCT user_id)                                  AS wau_7d
FROM daily_user_state
WHERE is_active_today
  AND dt BETWEEN CURRENT_DATE - 6 AND CURRENT_DATE;

The gotcha: wau is a rolling window. You can't compute wau(today) = wau(yesterday) + new_today - dropped_today naively because a user active 6 days ago rolls out of the window today. Always recompute from the base table for a 7-day rolling number.
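The roll-out effect is easy to demonstrate on toy data. A minimal Python sketch (illustrative names, not production code) — user 1's only activity sits exactly at the edge of yesterday's 7-day window:

```python
from datetime import date, timedelta

def wau(active_days: dict[int, set[date]], day: date, window: int = 7) -> int:
    """Distinct users active in the trailing `window` days ending at `day`."""
    start = day - timedelta(days=window - 1)
    return sum(1 for days in active_days.values()
               if any(start <= d <= day for d in days))

today = date(2024, 1, 10)
yesterday = today - timedelta(days=1)

# User 1 was active exactly 7 days before today: inside yesterday's window,
# outside today's. User 2 is active both days.
activity = {
    1: {today - timedelta(days=7)},
    2: {yesterday, today},
}

new_today, dropped_today = 0, 0  # nothing the naive bookkeeping would notice
naive = wau(activity, yesterday) + new_today - dropped_today
print(naive)                 # 2 — the naive delta misses the roll-out
print(wau(activity, today))  # 1 — user 1 silently rolled out of the window
```

The recompute-from-base query above is the only version that gets today's number right.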

Revenue-weighted L28

Ads revenue DAU ≠ user DAU. A user who opens the app once and sees zero ads contributes zero revenue. A user who scrolls the feed for 40 minutes and sees 50 ads contributes dollars.

CREATE TABLE revenue_user_state AS
SELECT
  dus.dt,
  dus.user_id,
  dus.state,
  COALESCE(SUM(f.spend_usd), 0) AS revenue_today_usd
FROM daily_user_state dus
LEFT JOIN fact_impression_revenue f
  ON dus.user_id = f.user_id AND dus.dt = f.dt
GROUP BY 1, 2, 3;

The "revenue DAU drop" decomposes into per-state revenue flow: "We lost $X from DORMANT users because our resurrect rate fell; we gained $Y from NEW users because a surface experiment launched." That's a senior-level diagnosis.

What a senior asks next

If the interviewer is good, they'll follow up: "OK, we see the state flows. What's your next step?" The senior answer isn't "look at the data more." It's:

  1. Pick the largest contributing state flow (e.g., "resurrected users dropped 8%").
  2. Decompose along a second dimension (geography, OS, signup cohort, surface).
  3. Find the concentration (e.g., "the entire drop is in iOS 17 users in North America").
  4. Correlate with deploys and experiments (was there a release that touched that slice?).
  5. Recommend a hypothesis-test (e.g., "roll back the experiment on that segment and observe 48-hour recovery").

Scenario 16: Experimentation Data Infrastructure

The prompt

"Design the data infrastructure for running thousands of concurrent A/B tests on an ads platform. What goes wrong at scale?"

Exposure logging: at request time or at session start?

Two schools.

Request-time exposure — every ad request logs the assigned experiment IDs. Every downstream event can be joined to experiment membership. Pros: maximum granularity; you can split any metric by experiment. Cons: massive write volume (every ad request × every active experiment the user is in); hard to manage experiment-end-of-life.

Session-start exposure — a user's assignments are computed once per session (or per day) and cached. Pros: much lower write volume. Cons: you can't experiment on per-request decisions (like "which ranker variant to use for this particular request").

Most platforms end up running both. Session-start is the default; request-time is available opt-in for experiments that need it.

Assignment logging schema

CREATE TABLE fact_experiment_exposure (
  event_ts     TIMESTAMP,
  user_id      BIGINT,
  session_id   STRING,
  request_id   BINARY(16),
  experiment_id BIGINT,
  variant_id   INT,
  assigned_ts  TIMESTAMP,    -- when the assignment first happened
  is_first_exposure BOOLEAN,
  surface_id   INT
)
PARTITIONED BY (dt);

First-exposure matters: that's the moment the user was actually "treated." Metrics measured before first exposure don't count toward the experiment.

The unit of randomization

  • User — most common. assignment = hash(user_id + experiment_salt) mod 100 < treatment_pct. Stable, clean.
  • Device — needed if a user might bounce between logged-in and logged-out states; otherwise same as user.
  • Session — for experiments where the experience shouldn't "stick" (e.g., novelty fatigue studies).
  • Ad request — for pure ranker experiments that don't affect user experience durably.

A senior should be able to name each and a use case.
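The user-level scheme above can be sketched in a few lines. A hedged illustration — MD5 is one common hash choice, not the only one, and the per-experiment salt is what decorrelates assignments across experiments:

```python
import hashlib

def assign_variant(unit_id: str, experiment_salt: str, treatment_pct: int) -> str:
    """hash(unit + salt) mod 100 < treatment_pct -> treatment."""
    digest = hashlib.md5(f"{unit_id}:{experiment_salt}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return "treatment" if bucket < treatment_pct else "control"

# Stable: the same unit always lands in the same arm for a given salt.
assert assign_variant("user_42", "exp_ranker_v2", 50) == \
       assign_variant("user_42", "exp_ranker_v2", 50)

# Roughly balanced across many units (binomial noise only).
arms = [assign_variant(f"user_{i}", "exp_ranker_v2", 50) for i in range(10_000)]
print(arms.count("treatment") / 10_000)  # close to 0.5
```

Swap `unit_id` for a device ID, session ID, or request ID and the same function covers the other three randomization units.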

Exposure vs treatment windows

Exposure window = the time during which a user entering the experiment counts. Typically the full experiment duration.

Treatment window = the time during which metrics are attributed to the treatment. Usually wider than the exposure window, because slow metrics (D7 retention, 28-day revenue) need time to materialize.

Practically: an experiment runs for 14 days (exposure window), but you read out metrics at D14 and D28 (treatment windows of 14 and 28 days).

Variance reduction

CUPED (Controlled-experiment Using Pre-Experiment Data) — for each metric, include a covariate: the same metric measured before the user entered the experiment. This removes the variance explained by stable cross-user differences, often cutting the required sample size roughly in half.

-- Pseudo-SQL: the regression-adjusted estimate
-- (exp_start / exp_end assumed joined in from experiment metadata)
SELECT
  experiment_id, variant_id,
  AVG(metric_during_exp - theta * metric_before_exp) AS adjusted_metric
FROM (
  SELECT e.experiment_id, e.variant_id, e.user_id,
         m_during.metric_value AS metric_during_exp,
         m_before.metric_value AS metric_before_exp
  FROM fact_experiment_exposure e
  JOIN metrics_user_daily m_during
    ON e.user_id = m_during.user_id
   AND m_during.dt BETWEEN e.exp_start AND e.exp_end
  JOIN metrics_user_preperiod m_before
    ON e.user_id = m_before.user_id
) j
GROUP BY 1, 2;

theta is computed as the regression coefficient of metric_during on metric_before over the full sample.
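A small Python sketch of the adjustment on synthetic data (illustrative only). One refinement over the pseudo-SQL above: centering the pre-period covariate at its mean keeps the adjusted metric's mean equal to the raw mean, so only the variance changes:

```python
import random
import statistics

def cuped_adjust(during: list[float], before: list[float]) -> list[float]:
    """adjusted_i = during_i - theta * (before_i - mean(before)),
    with theta = cov(during, before) / var(before)."""
    mean_b = statistics.fmean(before)
    mean_d = statistics.fmean(during)
    n = len(before)
    cov = sum((b - mean_b) * (d - mean_d)
              for b, d in zip(before, during)) / (n - 1)
    theta = cov / statistics.variance(before)
    return [d - theta * (b - mean_b) for d, b in zip(during, before)]

random.seed(0)
before = [random.gauss(10, 3) for _ in range(1000)]       # pre-period metric
during = [0.8 * b + random.gauss(0, 1) for b in before]   # correlated in-experiment metric
adj = cuped_adjust(during, before)

print(statistics.variance(adj) < statistics.variance(during))        # True: variance shrinks
print(abs(statistics.fmean(adj) - statistics.fmean(during)) < 1e-9)  # True: mean preserved
```

The stronger the pre/post correlation, the bigger the variance win — which is why engagement and revenue metrics benefit most.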

Stratified assignment — ensure balance on known-important dimensions (country, OS, user tenure) instead of relying on randomness to balance them.

Multiple comparisons

Running 1000 experiments at once, each with 20 metrics = 20,000 significance tests. If you used p < 0.05 naively, you'd expect 1000 false positives. Mitigations:

  • Pre-register primary + guardrail metrics. Primary metrics use p < 0.05. Everything else gets a stricter threshold.
  • False Discovery Rate (FDR) control. Benjamini-Hochberg for the full metric panel.
  • Sequential testing. Don't read out until the pre-registered sample size is reached; otherwise apply always-valid inference (mSPRT).
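Benjamini–Hochberg is short enough to sketch directly (a minimal version for intuition; real experimentation platforms use vetted stats libraries):

```python
def benjamini_hochberg(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Reject/accept flag per p-value, controlling the false discovery rate at alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= k/m * alpha ...
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k = rank
    # ... and reject the k smallest p-values.
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k:
            reject[idx] = True
    return reject

# Four metrics from one experiment readout:
print(benjamini_hochberg([0.001, 0.008, 0.04, 0.3]))  # [True, True, False, False]
```

Note the third metric (p = 0.04) passes a naive 0.05 threshold but fails BH — that's the correction doing its job.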

Metrics registry

# experiment-metrics-registry.yml
metrics:
  dau:
    kind: north_star
    target_mde: 0.5%
    required_duration_days: 14

  revenue_per_user:
    kind: north_star
    target_mde: 1.0%
    required_duration_days: 14
    variance_reduction: cuped(pre_period_days=14)

  crash_rate:
    kind: guardrail
    fail_on_regression: 2.0%
    severity: high

  ad_load_p95:
    kind: guardrail
    fail_on_regression: 5.0%
    severity: medium

  click_through_rate:
    kind: diagnostic
    # no auto-block; reported but not blocking

North-star = promotion criterion. Guardrail = auto-rollback if regressed. Diagnostic = informational.

The DE job: this registry is the source of truth. The metrics pipeline consumes it, computes every registered metric × every running experiment × every day, and materializes to a readout table. The experimentation UI reads from that table. Changes to the registry flow everywhere.
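A toy sketch of that consumption loop (the registry is shown as an in-memory dict rather than parsed YAML; all names and thresholds are illustrative):

```python
# Hypothetical registry entries mirroring the YAML above; thresholds as fractions.
REGISTRY = {
    "dau":                {"kind": "north_star", "required_duration_days": 14},
    "crash_rate":         {"kind": "guardrail",  "fail_on_regression": 0.02},
    "click_through_rate": {"kind": "diagnostic"},
}

def plan_readout_jobs(registry: dict, running_experiments: list[str], dt: str) -> list[tuple]:
    """Expand registry x experiments x day into the readout tasks the pipeline runs."""
    return [(exp, metric, dt) for exp in running_experiments for metric in registry]

def guardrail_breaches(registry: dict, deltas: dict[str, float]) -> list[str]:
    """Metrics whose regression exceeds the registered threshold (auto-rollback candidates)."""
    return [m for m, spec in registry.items()
            if spec["kind"] == "guardrail"
            and deltas.get(m, 0.0) < -spec["fail_on_regression"]]

jobs = plan_readout_jobs(REGISTRY, ["exp_101", "exp_102"], "2024-01-10")
print(len(jobs))  # 6 = 2 experiments x 3 metrics
print(guardrail_breaches(REGISTRY, {"crash_rate": -0.05, "dau": -0.01}))  # ['crash_rate']
```

The point of the pattern: adding a metric to the registry adds it to every experiment's readout with no pipeline change.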

Gotchas

  • Exposure-pollution. A user sees treatment in experiment A and treatment in experiment B at the same time, and B's treatment changes whether A's treatment actually loaded. Use experiment-conflict registration.
  • Learning effects across sessions. User sees treatment today, it changes their behavior forever, but your read-out window is 14 days. Your measured lift is an underestimate of the long-term lift. Mention this if asked about long-term impact.
  • Novelty effect. Users react to newness, which decays. Always run at least 14 days and drop the first 3.
  • Peeking. Looking at results daily and stopping when significance is reached inflates false-positive rate. Either commit to fixed duration or use sequential tests.

Scenario 17: Data Lineage and Impact Analysis

The prompt

"I'm about to change the dtype of fact_impression.spend_cents from int to bigint. Who breaks? How do you know?"

Lineage, explicitly

Data lineage is a DAG where:

  • Nodes = tables, columns, dashboards, metric definitions, models.
  • Edges = "A uses B." Directed from consumer → producer.

Column-level lineage is what the senior interview question requires. Table-level ("Does anything read from this table?") is too coarse — it can't answer "does anything read spend_cents specifically?"

How you build it

The input is the SQL that defines every pipeline in your orchestrator. Parse each query into an AST, walk the AST, emit (input_table, input_column) → (output_table, output_column) edges for every SELECT. Aggregate across all queries.

Tools that do this at the warehouse level: OpenLineage, Spline (Spark), dbt's manifest, warehouse-specific offerings (BigQuery lineage, Snowflake ACCESS_HISTORY). In-house implementations use ANTLR or a dialect-specific SQL parser.

The catch: column-level lineage across complex expressions (CASE, UDFs, string-aggregation) is lossy. Treat it as "probably used" rather than "exactly used."

Schema for the lineage DAG

CREATE TABLE lineage_edges (
  src_table    STRING,
  src_column   STRING,     -- NULL for table-level only
  dst_table    STRING,
  dst_column   STRING,     -- NULL for table-level only
  pipeline_id  STRING,
  last_seen_dt DATE,
  edge_kind    STRING      -- 'select', 'aggregate', 'derived', 'renamed'
);

CREATE INDEX idx_lineage_src ON lineage_edges (src_table, src_column);

A reverse BFS from (fact_impression, spend_cents) gives you the transitive closure of "everything downstream." That's your impact set.
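That traversal is a plain BFS over an adjacency map. A sketch with toy edges (table and column names invented for illustration; the tuple order here is producer → consumer, i.e. dst reads from src):

```python
from collections import defaultdict, deque

def impact_set(edges, start):
    """Everything transitively downstream of `start`, a (table, column) node."""
    downstream = defaultdict(list)
    for src_t, src_c, dst_t, dst_c in edges:
        downstream[(src_t, src_c)].append((dst_t, dst_c))
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in downstream[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

edges = [
    ("fact_impression", "spend_cents", "agg_spend_daily", "spend_usd"),
    ("agg_spend_daily", "spend_usd",   "dash_revenue",    "revenue"),
    ("fact_impression", "user_id",     "agg_reach_daily", "reach"),  # unaffected branch
]
print(sorted(impact_set(edges, ("fact_impression", "spend_cents"))))
# [('agg_spend_daily', 'spend_usd'), ('dash_revenue', 'revenue')]
```

In production the adjacency map is the `lineage_edges` table, the `seen` set filter drops stale edges, and the result is grouped by owning team.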

Impact analysis

Given the question "I'm about to change fact_impression.spend_cents":

  1. Reverse-BFS from (fact_impression, spend_cents) through lineage_edges.
  2. Filter the result to recently-seen pipelines (last_seen_dt >= CURRENT_DATE - 30). Stale edges produce false positives.
  3. Group the result by owning team (pulled from table metadata).
  4. Send each team a Slack/email summary: "Here's the breaking change, here's what depends on it, here's the migration window."

In a large warehouse, that's easily 200–500 downstream objects for a core fact table column. Without lineage, you're doing this by asking around.

Integration with orchestrator

When the orchestrator deploys a new pipeline, it should register the SQL with the lineage tool before or at deploy time. That gives you:

  • Real-time (within an hour) lineage updates.
  • Blockable changes: "this PR would break 14 downstream dashboards, are you sure?"
  • SLA tracking: "Table X depends on A, B, C; if any of them miss SLA, X will too."

What a senior should say at the end

"Lineage is only as good as its freshness. The real engineering work isn't building the DAG — it's keeping the SQL parser current with every dialect quirk your team uses, and auto-parsing every new pipeline within an hour of its deploy. Otherwise you ship a change that breaks things the lineage said were safe."


Scenario 18: Cost Accounting at Warehouse Scale

The prompt

"Warehouse spend has grown 40% YoY and nobody knows why. Design a cost accounting system."

Cost attribution

Every query that runs in the warehouse should be tagged with:

  • User who ran it (service account or human).
  • Team that owns the user.
  • Pipeline it's part of, if any.
  • Cost in dollars (or slot-seconds / credits / DBU — pick a currency).

Sources:

  • Warehouse query history (Snowflake QUERY_HISTORY, BigQuery INFORMATION_SCHEMA.JOBS, Redshift STL_QUERY).
  • Orchestrator run metadata (which DAG, which task, which batch).
  • Team mapping table maintained by the platform team.

Build a fact_query_cost_daily with all four dimensions.

Chargeback model

Three common models.

  • Informational. Show each team their costs. Embarrass the outliers. Low friction, low impact.
  • Soft budgets. Each team has a quarterly budget. Overruns trigger a conversation with their manager. Moderate friction.
  • Hard chargeback. The team's cost literally debits their org's P&L. Highest friction, highest compliance.

Most data platforms run soft budgets with informational dashboards. Hard chargeback requires an internal finance system to support it and tends to breed gaming (teams hiding costs in infra).

Expensive-query warnings

Real-time during query submission:

def validate_query_cost(query, user):
    est_cost = warehouse_api.dry_run_estimate(query)
    team = lookup_team(user)
    remaining_budget = budget_service.get(team, quarter=current_quarter)

    if est_cost > 100.0:                                    # >$100 single query
        notify_slack(user, f"This query will cost ${est_cost}. Confirm?")
        require_approval = True

    if est_cost > 0.05 * remaining_budget:                  # >5% of remaining budget
        block(user, f"This query would use 5%+ of {team}'s remaining quarter budget.")

    if query_touches_full_scan(query, tables_larger_than_1TB):
        warn(user, "This query does a full scan on a >1TB table. Add a partition filter.")

Killing a runaway query

The mid answer: the engine's kill command — KILL QUERY <id> or the warehouse's equivalent.

The senior answer: "Depends who's running it. If it's a scheduled pipeline and killing it breaks an SLA, I pause the downstream alert, kill the query, investigate, and restart. If it's an analyst's interactive query, I DM them first: 'I'm about to kill this, it's $400 and climbing, do you actually need it.' If it's on a cluster shared with a critical job, I kill without asking but follow up personally within 15 minutes. The rule is never kill silently and never kill something you don't understand."

That answer signals judgment, not just knowledge.

Cost-reduction playbook

Useful to have on hand:

  1. Partition pruning — the single biggest win. Scan WHERE dt = today, not WHERE DATE_TRUNC('day', ts) = today — wrapping the raw timestamp in a function defeats pruning.
  2. Column pruning — SELECT * on a 200-column fact table can cost 40x a SELECT of the 5 columns you need on a columnar store.
  3. Materialize the expensive middle. If 20 dashboards all do the same 5-table join, build a materialized table.
  4. Right-size clusters. Large warehouses running small queries is waste.
  5. Archive cold data to cheap storage. 90 days in the warehouse, 3 years in object storage, >3 years in Glacier-class archive.
  6. Deduplicate metrics. Every platform has 6 definitions of "active user" and pays to compute all six. Pick one, deprecate the others.

Scenario 19: The Behavioral Loop — STAR+I Framework

The prompt

"Tell me about a time you disagreed with your manager."

The mid-level candidate tells a story. The senior candidate tells a structured story that leaves no room for ambiguity about scope, judgment, or outcome. The structure is STAR+I.

STAR+I, defined

  • S — Situation. Where were you, what's the context? 20 seconds max. Who are the other players, what was at stake?
  • T — Task. What specifically was your job here? Not "the team's job." Your job.
  • A — Action. What did you do, concretely? This is usually the longest section. Real specifics: the meeting you called, the document you wrote, the trade-off you named. Senior candidates narrate their reasoning, not just their actions.
  • R — Result. What measurable outcome resulted? Numbers if possible. "Reduced pipeline runtime from 6 hours to 90 minutes. Saved $120K/year in warehouse cost. Unblocked a product launch that shipped 2 weeks ahead of schedule."
  • +I — Impact / Insight. This is what separates senior. After you describe what happened, tell the interviewer what you learned and how it changed how you work. "The thing I took away was that I should have raised this concern in writing instead of in a 1:1, because the decision-maker wasn't my manager — he just didn't realize that at the time."

Adding "+I" is what signals senior. It says: I reflect, I grow, I can teach this to a more junior person.

How to prep stories

You need 6 stories that cover the space:

# Axis Example situation
1 Cross-team conflict Two teams fighting over a shared table's schema
2 Technical disagreement with manager Manager wants a quick fix, you advocate for the proper solution
3 Production incident leadership 3am page, rallying the response, the postmortem
4 Ambiguous scope Exec asks "tell me about our ads health," 6-week project with no spec
5 Bad news to exec Metric you own is broken; you have to tell the VP
6 Mentorship Grew a junior DE's ability to lead their own project

Write each one as a 90-second answer, practice out loud, record yourself. The interviewer will steer them into whatever they need.

Rules of thumb

  • Every story needs a number. "Saved X hours," "shipped 2 weeks faster," "unblocked 5 teams," "reduced error rate from 4% to 0.3%."
  • Every story needs a moment of judgment. A point where you chose between two plausible options and had to decide.
  • Every story names the other people. "I worked with" is fine. "I drove the process" is stronger. Don't steal credit; don't hide behind the team.
  • No hypotheticals. Real, first-person, stuff that actually happened. Interviewers smell hypotheticals from across the room.
  • No bad-mouthing. Never describe a former manager, teammate, or company as "incompetent" or "toxic." You can describe actions and outcomes without judgment.

Scenario 20: Behavioral — Disagreeing With Your Manager

The prompt

"Tell me about a time you disagreed with your manager on something important."

The trap

Two failure modes:

  • The martyr. "I fought to the end against my manager's bad decision because I knew better." Signals poor judgment and inability to commit.
  • The doormat. "I voiced my concern and then did what they asked." Signals weakness; no real disagreement here.

The senior answer sits in between: you disagreed, you made the case, you committed to the decided path, you measured the outcome, you revisited if data supported a change.

Shape of a strong answer

Situation: two years ago, our team was rolling out a new metrics pipeline. My manager wanted to ship a version that used hourly batches with an SLA of 4 hours. I believed our advertiser-facing team needed 15-minute freshness, based on recent calls with their PM, and that 4-hour SLA would be walked back within a quarter.

Task: my job was to lead the architecture. My manager owned the shipping decision.

Action: I wrote a 2-page doc making the case for 15-minute latency, with cost modeling (about 1.5× the hourly version), the expected advertiser-satisfaction lift, and a migration plan that started with hourly and upgraded incrementally. I walked my manager through it. He pushed back — he was worried about the cost blowback from leadership. We agreed on a compromise: ship hourly first, but design the streaming layer so the 15-minute version was a ~2-week migration, not a 6-month rebuild. I committed to the compromise and led the hourly launch.

Result: advertiser-team NPS on freshness was a named regression in the first quarterly review. My manager asked me to execute the 15-minute migration I'd originally proposed. It shipped 4 weeks later. Advertiser-side complaints stopped within a month.

Impact: the insight I took from this: when I'm more convinced of something than my manager is, a written doc with cost modeling and a reversible migration path beats arguing in 1:1s. I've used that pattern four times since, two of which ended with my original plan accepted immediately.

That's ~240 words, ~90 seconds spoken. Situation specific, task scoped, action showing both the assertiveness and the committing, result with numbers, insight that shows growth.

What the interviewer is scoring

  • Did you name what you disagreed with, concretely? (Not "my manager's style.")
  • Did you make a case using data, not just opinion?
  • Did you commit after the decision was made?
  • Did you revisit when data supported it?
  • Did you keep the relationship?

Senior = all five. Mid = three. Anything under three = rehearse harder.


Scenario 21: Behavioral — Managing Shared-Table Conflicts

The prompt

"Two teams depend on a shared table you own. One wants to change the schema in a way that breaks the other. Walk me through how you'd handle it."

This prompt tests your ability to run a small diplomacy + engineering problem at the same time.

Shape of a strong answer

Situation: last year, my team owned fact_campaign_daily, a core table used by both our advertiser-facing analytics team and the ML feature platform. The analytics team proposed adding a new attribution window (click_14d), which would require widening the fact from 8 conversion columns to 10. The feature platform's model trainer read all columns via SELECT * and would fail on the schema change.

Task: I owned the table. My job was to make the change happen without breaking the model pipeline, on both teams' timelines.

Action: I started with a written deprecation policy we didn't have before: all schema changes to shared tables published 2 weeks in advance, column additions allowed immediately, breaking changes require a 30-day migration window with a shadow table. I walked both teams through it in a 30-minute meeting. Because the advertiser team's request was an addition, not a breaking change, it actually didn't need the 30-day window — we just had to fix the feature platform's SELECT * dependency. I paired with the feature platform lead for an afternoon to switch their trainer to explicit columns. We landed both changes the following week.

Result: the deprecation policy became the team standard. Over the next six months, eight schema changes shipped using it, zero cross-team production incidents caused by schema drift.

Impact: what I learned was that schema conflicts are usually process conflicts in disguise. Once the process was written down, the decision about "who gets to change what" stopped requiring a judgment call every time. I apply the same pattern now whenever I see recurring cross-team friction over the same surface — write the rule, then apply it.

Components an interviewer is listening for

  • You recognized the right problem (process, not schema).
  • You wrote something down (doc, policy, agreement).
  • You paired with the other team (not just sent an email).
  • You measured the outcome (6 months, 8 changes, 0 incidents).
  • You generalized the lesson.

Scenario 22: Behavioral — Cross-Team Incident Response

The prompt

"You get paged at 3am. The alert is on a dashboard you own, but after 20 minutes of debugging you realize the root cause is in another team's code. They're not on-call. Walk me through the next hour."

What the interviewer is listening for

Two things: (1) can you lead a response without owning the root cause, and (2) do you understand the political dimension — calling someone at 3am about their code is a fraught act.

Shape of a strong answer

Situation: two years ago, I got paged at about 3:20am on a Saturday. The alert was on the ads-reporting freshness dashboard — hourly spend numbers were stale by 45 minutes and climbing. Ads reporting was mine; the upstream pipeline that fed it belonged to a neighboring team.

Task: keep the customer-facing dashboard accurate on my SLA, diagnose and escalate the root cause, minimize damage to the on-call relationship.

Action: first 10 minutes I ruled out anything in my own pipeline — checked the job status, recent deploys, the warehouse. Clean. At minute 20 I traced the staleness to the neighboring team's streaming job, which was lag-accumulating. I had two choices: page their on-call, or sit on the problem and hope. I paged. Before paging I wrote up everything I'd ruled out, the exact error trace, and a suggested next step in a runbook-style ticket. I added a note: "Sorry for the page; here's the triage so you can go fast." Their on-call acknowledged in ~3 minutes, fixed it in 20. I stayed on the call so they weren't debugging alone and so my dashboard could be marked recovered the moment their job caught up.

Afterward I wrote the postmortem. The root cause was a memory leak triggered by a config change 9 hours earlier. I sent the postmortem draft to both teams' managers before broadcasting it, to avoid blindsiding anyone. The postmortem identified two structural fixes: the memory-leak guard on their side, and a shared alert on the streaming lag that would have fired earlier. Both shipped in the following sprint.

Result: my dashboard was stale for 70 minutes. The postmortem got praised for being blameless and generative — both teams named it as a good template for future incidents. The on-call relationship was fine; I sent a thank-you lunch afterward.

Impact: the takeaway for me was that the pre-page triage doc is the single most important thing you can do at 3am. It halves the responder's mean-time-to-fix and triples their goodwill. I've been religious about it since.

Blameless postmortem components

  • Timeline with timestamps. What happened, in sequence.
  • Root cause, not proximate cause. The memory leak is a root cause; the OOM is a proximate cause.
  • Impact. Duration, users/campaigns affected, revenue (if applicable), customer-facing consequences.
  • What went well. Real. Someone did something good. Name them.
  • What we could do better. Structural, not personal. "The alert fired 45 minutes late" — not "Bob didn't respond."
  • Action items. Owned, dated, tracked.

Appendix: 6-Story Behavioral Matrix Template

Prep this before any senior loop. Fill in one story per row. Practice each out loud for 90 seconds. Pre-assign which stories you'll use for which prompts.

Story # Theme Situation (1 line) What's at stake The judgment moment Numeric result +I takeaway
1 Cross-team conflict
2 Technical disagreement with manager
3 Production incident
4 Ambiguous scope
5 Bad news to exec
6 Mentorship / growing someone else

Print it. Fill it in by hand. You'll remember the stories better that way.


Closing note

The scenarios in this part are the ones that, in my experience, most clearly sort senior candidates from mid-level. Part 3 returns to pure SQL — the DAU pipeline, frequency capping, ROAS, holdout-based incrementality, delivery health. Read it next.


← Master Index · ← Part 1 · Part 3 →

↑ Back to top

Part 03

Deep Dive SQL

Part 3: Deep Dive SQL

"Production SQL is the only sentence where every word is a decision. Senior candidates can defend every word."


How to read this part

Every query below is something you should be able to write cold on a whiteboard in an interview. After each one there's a commentary section: why this and not that. Skim the query, then read the commentary, then close the page and try to reproduce the query from scratch. That's the drill.

All SQL is written in a neutral dialect (Postgres / Snowflake / Trino / BigQuery — you'll need to tweak syntax, but the shape is the same).


Section 1: The DAU Growth Accounting Pipeline (5 steps)

This is the single most-asked SQL scenario at senior consumer ads interviews. Learn it cold.

Step 1: Incremental user activity history

-- Source of truth: one row per (user, day) that user was active
CREATE TABLE user_activity_daily (
  dt       DATE,
  user_id  BIGINT,
  PRIMARY KEY (dt, user_id)
) PARTITIONED BY (dt);

-- Incremental insert: just process today
-- (static partition value, so dt is not repeated in the SELECT list)
INSERT OVERWRITE TABLE user_activity_daily PARTITION (dt = CURRENT_DATE)
SELECT DISTINCT user_id
FROM bronze.events
WHERE dt = CURRENT_DATE
  AND event_name IN ('impression', 'click', 'app_open', 'page_view')
  AND user_id IS NOT NULL;

Why this and not that. DISTINCT on (user_id, dt) avoids the fanout of multiple events per user. Filter to specific "qualifying" events — don't count side-effect events like "ad_blocked" as activity. The INSERT OVERWRITE is idempotent — re-running today produces identical output, which matters for backfills. Partition by dt so queries over a date range scan only the partitions they need.

Step 2: Daily user state classification

CREATE TABLE daily_user_state (
  dt                     DATE,
  user_id                BIGINT,
  state                  STRING,        -- 'NEW', 'RETAINED', 'RESURRECTED', 'DORMANT'
  is_active_today        BOOLEAN,
  prev_active_dt         DATE,
  days_since_last_active INT,
  lifetime_active_days   INT,
  PRIMARY KEY (dt, user_id)
) PARTITIONED BY (dt);

INSERT OVERWRITE TABLE daily_user_state PARTITION (dt = CURRENT_DATE)
WITH active AS (
  SELECT user_id FROM user_activity_daily WHERE dt = CURRENT_DATE
),
prev_state AS (
  -- (dt, user_id) is the PK, so yesterday has exactly one row per user.
  -- A user active yesterday has last_active_dt = yesterday itself;
  -- a dormant user carries forward their earlier prev_active_dt.
  SELECT user_id,
         CASE WHEN is_active_today THEN dt ELSE prev_active_dt END AS last_active_dt,
         lifetime_active_days AS life_days_before_today
  FROM daily_user_state
  WHERE dt = CURRENT_DATE - INTERVAL '1' DAY
),
active_joined AS (
  SELECT a.user_id,
         p.last_active_dt,
         p.life_days_before_today
  FROM active a
  LEFT JOIN prev_state p USING (user_id)
)
SELECT
  CURRENT_DATE AS dt,
  user_id,
  CASE
    WHEN last_active_dt IS NULL                             THEN 'NEW'
    WHEN last_active_dt = CURRENT_DATE - INTERVAL '1' DAY   THEN 'RETAINED'
    ELSE                                                         'RESURRECTED'
  END AS state,
  TRUE AS is_active_today,
  last_active_dt AS prev_active_dt,
  DATEDIFF(CURRENT_DATE, last_active_dt) AS days_since_last_active,
  COALESCE(life_days_before_today, 0) + 1 AS lifetime_active_days
FROM active_joined;

Why this and not that. Using LAG() works for a full rebuild but is expensive at billion-user scale. Incremental classification uses yesterday's state as input — you only process today's active users. The trade-off is you need a correct seed row for each user's first day (NEW state). The COALESCE(life_days_before_today, 0) + 1 handles both first-seen users and returning ones.
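The same incremental classification, sketched in Python for intuition (state held in an in-memory dict here; in production it's yesterday's partition of daily_user_state):

```python
from datetime import date, timedelta

def classify(today: date, active_today: set[int],
             prev_state: dict[int, tuple[date, int]]) -> dict:
    """prev_state: user_id -> (last_active_dt, lifetime_active_days) as of yesterday."""
    out = {}
    for uid in active_today:
        if uid not in prev_state:
            state, life = "NEW", 1
        else:
            last_dt, life_before = prev_state[uid]
            state = "RETAINED" if last_dt == today - timedelta(days=1) else "RESURRECTED"
            life = life_before + 1
        out[uid] = (state, life)
    return out

today = date(2024, 1, 10)
prev = {1: (date(2024, 1, 9), 30),   # active yesterday
        2: (date(2024, 1, 2), 5)}    # last active 8 days ago
print(sorted(classify(today, {1, 2, 3}, prev).items()))
# [(1, ('RETAINED', 31)), (2, ('RESURRECTED', 6)), (3, ('NEW', 1))]
```

Each day's run only touches today's active users plus yesterday's state — which is exactly why it scales where the LAG() rebuild doesn't.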

Step 3: Daily state summary (the DAU decomposition)

SELECT
  dt,
  COUNT(*) FILTER (WHERE state = 'NEW')         AS new_users,
  COUNT(*) FILTER (WHERE state = 'RETAINED')    AS retained_users,
  COUNT(*) FILTER (WHERE state = 'RESURRECTED') AS resurrected_users,
  COUNT(*)                                      AS dau
FROM daily_user_state
WHERE dt BETWEEN CURRENT_DATE - 30 AND CURRENT_DATE
GROUP BY dt
ORDER BY dt;

Commentary. FILTER clauses keep the query flat. Alternative: CASE WHEN state = 'NEW' THEN 1 END — identical semantically, slightly more verbose. The trailing 30 days of this table is what you put on the DAU health dashboard.

Step 4: Cohort retention (the triangular matrix)

WITH first_seen AS (
  SELECT user_id, MIN(dt) AS cohort_dt
  FROM user_activity_daily
  WHERE dt >= CURRENT_DATE - 180
  GROUP BY user_id
),
activity_with_cohort AS (
  SELECT a.user_id,
         f.cohort_dt,
         a.dt,
         DATEDIFF(a.dt, f.cohort_dt) AS days_since_cohort
  FROM user_activity_daily a
  JOIN first_seen f USING (user_id)
  WHERE a.dt BETWEEN f.cohort_dt AND f.cohort_dt + INTERVAL '90' DAY
)
SELECT
  cohort_dt,
  days_since_cohort,
  COUNT(DISTINCT user_id) AS users_active,
  COUNT(DISTINCT user_id) * 1.0
    / FIRST_VALUE(COUNT(DISTINCT user_id)) OVER (
        PARTITION BY cohort_dt
        ORDER BY days_since_cohort
        ROWS UNBOUNDED PRECEDING
      ) AS retention_rate
FROM activity_with_cohort
GROUP BY cohort_dt, days_since_cohort;

Why this and not that. Cohort + days-since-cohort is a triangular table — each cohort has up to 90 rows, one per day since signup. Using FIRST_VALUE() over the window to get "day 0 size" is cleaner than a self-join. Stop at 90 days of retention — past that the signal-to-noise drops and storage explodes.

Step 5: WAU / MAU from the same base

-- Expand each anchor date over its trailing 30-day window, then count distincts
SELECT
  d.dt,
  COUNT(DISTINCT a.user_id) FILTER (
    WHERE a.dt BETWEEN d.dt - INTERVAL '6' DAY AND d.dt
  ) AS wau,
  COUNT(DISTINCT a.user_id) AS mau
FROM (
  SELECT DISTINCT dt FROM user_activity_daily
  WHERE dt BETWEEN CURRENT_DATE - 60 AND CURRENT_DATE
) d
JOIN user_activity_daily a
  ON a.dt BETWEEN d.dt - INTERVAL '29' DAY AND d.dt
GROUP BY d.dt;

Commentary. This is the naïve approach: every anchor date re-joins its entire trailing 30-day window, so cost scales with days × window × daily actives. At 1B users over 60 anchor dates this won't run. The production approach: maintain user_l28(dt, user_id, active_days_last_28) incrementally and do SUM(CASE WHEN active_days_last_28 > 0 THEN 1 END) per day. Mention this trade-off when asked.
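The incremental structure behind user_l28 is often a per-user activity bitmask: 28 bits, shifted one position per day, popcount = active days. A minimal sketch (the bitmask-per-user representation is a common pattern, not a claim about any specific platform):

```python
def roll_l28(prev_mask: int, active_today: bool) -> int:
    """Shift the 28-day activity bitmask one day forward; bit 0 = today."""
    mask = (prev_mask << 1) & ((1 << 28) - 1)  # drop the day that rolled out
    return mask | 1 if active_today else mask

def active_days_in_28(mask: int) -> int:
    return bin(mask).count("1")

mask = 0
for active in [True, True, False, True]:  # four days of history
    mask = roll_l28(mask, active)
print(active_days_in_28(mask))  # 3
print(mask & 1 == 1)            # True: active on the most recent day
```

One integer per user per day replaces a 28-day window scan, and WAU/MAU fall out of the same mask (any bit set in the low 7 / low 28 bits).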


Section 2: DAU Drop Debugging

"DAU dropped 3% yesterday. Walk me through the SQL to find out why."

-- Step 1: confirm the drop and localize by day
SELECT dt, new_users, retained_users, resurrected_users, dau,
       dau - LAG(dau) OVER (ORDER BY dt) AS dau_delta
FROM daily_state_summary
WHERE dt BETWEEN CURRENT_DATE - 14 AND CURRENT_DATE
ORDER BY dt;

-- Step 2: decompose by surface (is it one surface or all?)
SELECT
  dt,
  surface_id,
  COUNT(DISTINCT user_id) AS surface_dau,
  COUNT(DISTINCT user_id) -
    LAG(COUNT(DISTINCT user_id), 1) OVER (PARTITION BY surface_id ORDER BY dt) AS delta
FROM bronze.events
WHERE dt BETWEEN CURRENT_DATE - 3 AND CURRENT_DATE
  AND event_name = 'app_open'
GROUP BY 1, 2
ORDER BY 1, 2;

-- Step 3: decompose by OS
SELECT dt, device_os,
       COUNT(DISTINCT user_id) AS os_dau
FROM bronze.events
WHERE dt BETWEEN CURRENT_DATE - 3 AND CURRENT_DATE
GROUP BY 1, 2
ORDER BY 1, 2;

-- Step 4: geography
SELECT dt, geo_country,
       COUNT(DISTINCT user_id) AS geo_dau,
       LAG(COUNT(DISTINCT user_id), 1) OVER (
         PARTITION BY geo_country ORDER BY dt
       ) AS prev_dau
FROM bronze.events
WHERE dt BETWEEN CURRENT_DATE - 2 AND CURRENT_DATE
GROUP BY 1, 2
HAVING COUNT(DISTINCT user_id) > 10000
ORDER BY 1, (COUNT(DISTINCT user_id) - LAG(COUNT(DISTINCT user_id), 1) OVER (
  PARTITION BY geo_country ORDER BY dt
));

-- Step 5: correlate with recent deploys
SELECT deploy_ts, service, commit_sha, deployer
FROM deploy_log
WHERE deploy_ts BETWEEN (CURRENT_DATE - 2)::TIMESTAMP AND CURRENT_TIMESTAMP
  AND service IN ('ads_ranker', 'feed_renderer', 'app_shell')
ORDER BY deploy_ts;

Commentary. The sequence matters. Each query narrows the slice. Naming the progression out loud ("first confirm, then decompose by the three coarsest dimensions, then correlate with deploys") is how a senior signals process rather than flailing.
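The decomposition steps share one arithmetic fact worth stating out loud: within a single dimension, each segment's contribution to the total delta is simply its own delta, and contributions sum exactly to the total. A minimal sketch (segment names hypothetical):

```python
def delta_contributions(yesterday, today):
    """Rank segments by contribution to the day-over-day DAU delta.

    `yesterday` / `today`: dict of segment -> DAU. A segment's
    contribution is its own delta; contributions sum to the total.
    """
    segs = set(yesterday) | set(today)
    total = sum(today.values()) - sum(yesterday.values())
    contrib = {s: today.get(s, 0) - yesterday.get(s, 0) for s in segs}
    ranked = sorted(contrib.items(), key=lambda kv: kv[1])  # worst first
    return total, ranked
```

If one segment's contribution roughly equals the total delta, you've localized the drop; if every segment moved a little, suspect a platform-wide cause (logging, deploy, holiday).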


Section 3: Frequency Capping Analytics

"Write a query that, for each campaign, computes the distribution of frequency (impressions per reached user) for the past 7 days, and flag campaigns where >20% of their reach is at frequency > 10."

WITH user_freq AS (
  SELECT
    campaign_id,
    user_id,
    COUNT(*) AS impressions_7d
  FROM fact_impression
  WHERE dt BETWEEN CURRENT_DATE - 7 AND CURRENT_DATE - 1
    AND is_valid = TRUE
  GROUP BY campaign_id, user_id
),
freq_buckets AS (
  SELECT
    campaign_id,
    CASE
      WHEN impressions_7d = 1        THEN '01'
      WHEN impressions_7d BETWEEN 2 AND 3   THEN '02_03'
      WHEN impressions_7d BETWEEN 4 AND 6   THEN '04_06'
      WHEN impressions_7d BETWEEN 7 AND 10  THEN '07_10'
      ELSE '11_plus'
    END AS freq_bucket,
    COUNT(*) AS users_in_bucket,
    SUM(impressions_7d) AS impressions_in_bucket
  FROM user_freq
  GROUP BY 1, 2
),
campaign_totals AS (
  SELECT
    campaign_id,
    SUM(users_in_bucket) AS reach,
    SUM(CASE WHEN freq_bucket = '11_plus' THEN users_in_bucket ELSE 0 END) AS high_freq_reach
  FROM freq_buckets
  GROUP BY campaign_id
)
SELECT
  c.campaign_id,
  c.reach,
  c.high_freq_reach,
  c.high_freq_reach * 1.0 / NULLIF(c.reach, 0) AS high_freq_share,
  CASE WHEN c.high_freq_reach * 1.0 / NULLIF(c.reach, 0) > 0.20 THEN 'OVER_CAPPING' END AS flag
FROM campaign_totals c
WHERE c.reach > 10000
ORDER BY high_freq_share DESC;

Commentary. Buckets are inclusive on both ends — BETWEEN is inclusive in SQL. The NULLIF guards against division-by-zero when a campaign's reach is weirdly zero. The volume gate (reach > 10000) excludes noise from tiny campaigns. Ordering by high_freq_share DESC surfaces the worst offenders first.
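The bucketing and the over-capping test are easy to mirror in Python for a unit-testable reference implementation (a sketch, not the production path):

```python
def freq_bucket(n):
    """Mirror of the SQL CASE: bucket a user's 7-day impression count."""
    if n == 1:
        return '01'
    if 2 <= n <= 3:
        return '02_03'
    if 4 <= n <= 6:
        return '04_06'
    if 7 <= n <= 10:
        return '07_10'
    return '11_plus'

def high_freq_share(per_user_impressions):
    """Share of reached users at frequency > 10; None when reach is zero
    (same role as NULLIF(reach, 0) in the SQL)."""
    reach = len(per_user_impressions)
    if reach == 0:
        return None
    return sum(1 for n in per_user_impressions if n > 10) / reach
```

A campaign is flagged OVER_CAPPING when high_freq_share exceeds 0.20 and reach passes the volume gate.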


Section 4: Advertiser ROAS Pipeline

"Build me a daily ROAS pipeline. Conversions can arrive up to 7 days after the impression. Your numbers must match what advertisers see in the Campaign Manager UI."

-- Step 1: attributed conversions (last-click, 7-day)
CREATE OR REPLACE TABLE fact_attributed_conversion AS
WITH c AS (
  SELECT event_id AS conversion_id, user_id, event_ts, properties['value']::DECIMAL(18,4) AS conv_value
  FROM bronze.events
  WHERE event_name = 'conversion'
    AND event_ts >= CURRENT_TIMESTAMP - INTERVAL '8' DAY
),
k AS (
  SELECT event_id AS click_id, user_id, event_ts, campaign_id, ad_id
  FROM bronze.events
  WHERE event_name = 'click'
    AND event_ts >= CURRENT_TIMESTAMP - INTERVAL '15' DAY
),
paired AS (
  SELECT
    c.conversion_id,
    c.event_ts AS conversion_ts,
    c.conv_value,
    k.click_id, k.campaign_id, k.ad_id,
    ROW_NUMBER() OVER (
      PARTITION BY c.conversion_id
      ORDER BY k.event_ts DESC
    ) AS rn
  FROM c
  LEFT JOIN k
    ON c.user_id = k.user_id
   AND k.event_ts BETWEEN c.event_ts - INTERVAL '7' DAY AND c.event_ts
)
SELECT conversion_id, conversion_ts, conv_value, click_id, campaign_id, ad_id
FROM paired
WHERE rn = 1;

-- Step 2: daily ROAS, with D+7 truth lock
SELECT
  DATE(conversion_ts) AS attribution_dt,
  campaign_id,
  COUNT(*) AS conversions,
  SUM(conv_value) AS revenue_usd,
  COALESCE(SUM(conv_value), 0) /
    NULLIF(spend.spend_usd, 0) AS roas
FROM fact_attributed_conversion
LEFT JOIN (
  SELECT dt, campaign_id, SUM(spend_cents) / 100.0 AS spend_usd
  FROM fact_impression
  WHERE dt >= CURRENT_DATE - 14
  GROUP BY 1, 2
) spend
  ON spend.dt = DATE(conversion_ts)
 AND spend.campaign_id = fact_attributed_conversion.campaign_id
WHERE DATE(conversion_ts) >= CURRENT_DATE - 14
GROUP BY attribution_dt, campaign_id, spend.spend_usd
ORDER BY attribution_dt DESC, revenue_usd DESC;

Commentary. The 7-day click-to-conversion window is industry standard. The click scan goes 15 days back because a conversion up to 8 days old can attribute to a click up to 7 days before it. ROW_NUMBER() picks the last click; rn = 1 with LEFT JOIN preserves unattributed conversions (report them separately). The ROAS join is on the conversion date, not the click date — this matches what advertisers see.
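The last-click rule itself is small enough to state as executable pseudocode (an illustrative sketch of the ROW_NUMBER() + rn = 1 logic, not the pipeline code):

```python
from datetime import datetime, timedelta

def last_click(conversion_ts, clicks, window_days=7):
    """Return the click_id of the most recent click inside the lookback
    window, or None for an unattributed conversion (mirrors the LEFT
    JOIN + ROW_NUMBER() ORDER BY event_ts DESC + rn = 1 pattern).

    `clicks`: iterable of (click_id, click_ts).
    """
    lo = conversion_ts - timedelta(days=window_days)
    eligible = [(ts, cid) for cid, ts in clicks if lo <= ts <= conversion_ts]
    if not eligible:
        return None
    return max(eligible)[1]   # latest timestamp wins
```

Note the inclusive upper bound: a click at exactly the conversion timestamp still attributes, matching BETWEEN in the SQL.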


Section 5: Audience Overlap Analysis

"Two audiences can overlap. For each pair of audiences owned by the same advertiser, compute Jaccard similarity. Efficient at 100M+ users per audience."

-- Naïve: exact counts
WITH pairs AS (
  SELECT
    a.advertiser_id,
    a.audience_id AS aud_a,
    b.audience_id AS aud_b
  FROM audience_membership a
  JOIN audience_membership b
    ON a.advertiser_id = b.advertiser_id
   AND a.audience_id < b.audience_id
  WHERE a.user_id = b.user_id
),
intersects AS (
  SELECT advertiser_id, aud_a, aud_b, COUNT(*) AS intersection_size
  FROM pairs
  GROUP BY 1, 2, 3
),
sizes AS (
  SELECT audience_id, COUNT(DISTINCT user_id) AS aud_size
  FROM audience_membership
  GROUP BY audience_id
)
SELECT
  i.advertiser_id, i.aud_a, i.aud_b,
  i.intersection_size,
  sa.aud_size + sb.aud_size - i.intersection_size AS union_size,
  i.intersection_size * 1.0 /
    (sa.aud_size + sb.aud_size - i.intersection_size) AS jaccard
FROM intersects i
JOIN sizes sa ON i.aud_a = sa.audience_id
JOIN sizes sb ON i.aud_b = sb.audience_id
WHERE sa.aud_size + sb.aud_size - i.intersection_size > 0
ORDER BY advertiser_id, jaccard DESC;

Commentary on scale. The naïve version is O(N²) in users per advertiser. It works for small advertisers; for big ones you switch to HyperLogLog / MinHash sketches:

-- Sketch-based: each audience is a HLL sketch, intersect via inclusion-exclusion
SELECT
  aud_a, aud_b,
  HLL_CARDINALITY(HLL_UNION(s.sketch, t.sketch)) AS union_size_est,
  HLL_CARDINALITY(s.sketch) + HLL_CARDINALITY(t.sketch)
    - HLL_CARDINALITY(HLL_UNION(s.sketch, t.sketch)) AS intersection_size_est
FROM audience_hll_sketches s
JOIN audience_hll_sketches t ON s.advertiser_id = t.advertiser_id
 AND s.audience_id < t.audience_id;

Error is ~1% at default precision. Good enough for audience planning.
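MinHash is the other sketch worth naming here: unlike HLL's inclusion-exclusion, it estimates Jaccard directly — the fraction of matching signature slots is an unbiased estimate of the similarity. A self-contained toy version (salted hashing as stand-in for k hash permutations; not a production sketch library):

```python
import hashlib

def minhash_signature(users, k=128):
    """k-slot MinHash signature of a user set via salted 64-bit hashes."""
    return [
        min(int(hashlib.blake2b(f"{i}:{u}".encode(), digest_size=8).hexdigest(), 16)
            for u in users)
        for i in range(k)
    ]

def jaccard_estimate(sig_a, sig_b):
    """Fraction of matching slots ~ Jaccard similarity (std err ~ 1/sqrt(k))."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

At k=128 the standard error is roughly 1/sqrt(128) ≈ 9% relative — fine for audience planning, and signatures union/compare in O(k) regardless of audience size.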


Section 6: Incrementality (Holdout-Based Lift)

"You ran a holdout: 1% of the target audience was excluded from seeing your ads. Compute the incremental lift in conversions."

WITH conv AS (                                 -- ALL conversions, organic included
  SELECT user_id,
         SUM(properties['value']::DECIMAL(18,4)) AS revenue
  FROM bronze.events
  WHERE event_name = 'conversion'
    AND DATE(event_ts) BETWEEN :experiment_start AND :experiment_end
  GROUP BY user_id
),
arms AS (                                      -- denominators = every assigned user
  SELECT
    h.arm,
    COUNT(*) AS n,
    SUM(COALESCE(c.revenue, 0)) AS total_rev,
    AVG(COALESCE(c.revenue, 0)) AS avg_rev     -- zeros for non-converters
  FROM holdout_assignment h
  LEFT JOIN conv c USING (user_id)
  GROUP BY h.arm
)
SELECT
  t.n AS treatment_users,
  c.n AS control_users,
  t.avg_rev AS treatment_avg_revenue,
  c.avg_rev AS control_avg_revenue,
  (t.avg_rev - c.avg_rev) / NULLIF(c.avg_rev, 0) AS incremental_lift,
  t.n * (t.avg_rev - c.avg_rev) AS incremental_revenue
FROM arms t JOIN arms c
  ON t.arm = 'treatment' AND c.arm = 'holdout';

Commentary. The holdout is the cleanest incrementality measure, but expensive — you're paying with foregone revenue. Two key pitfalls: (1) both arms must count all conversions, organic included — comparing the treatment arm's ad-attributed conversions against the holdout's total conversions breaks the symmetry and understates lift; (2) per-user averages must be taken over every assigned user (zeros for non-converters), not just converters. Also: contamination — if a holdout user sees your ad on another surface (Publisher Network, an email retarget), they're contaminated and must be excluded.
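Interviewers often follow up with "is the lift significant?" A minimal sketch of the difference-in-means calculation with a normal-approximation confidence interval (illustrative, not the experimentation platform's stats engine):

```python
import math

def lift_with_ci(treat, control, z=1.96):
    """Relative lift in revenue-per-user plus a CI on the absolute difference.

    `treat` / `control`: per-user revenue lists covering ALL assigned
    users, zeros included for non-converters — comparing converters
    only biases both means.
    """
    def stats(xs):
        n = len(xs)
        mean = sum(xs) / n
        var = sum((x - mean) ** 2 for x in xs) / (n - 1)  # sample variance
        return n, mean, var

    nt, mt, vt = stats(treat)
    nc, mc, vc = stats(control)
    diff = mt - mc
    se = math.sqrt(vt / nt + vc / nc)                     # Welch-style SE
    lift = diff / mc if mc else float('nan')
    return lift, (diff - z * se, diff + z * se)
```

If the CI on the difference excludes zero, the lift is significant at the chosen level; with a 1% holdout the control arm is small, so expect wide intervals.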


Section 7: Ad Delivery Health Monitoring

"For each campaign, alert if today's delivery is <70% of yesterday's for 2+ consecutive hours."

WITH hourly AS (
  SELECT
    DATE_TRUNC('hour', impression_ts) AS hr,
    campaign_id,
    COUNT(*) AS impressions
  FROM fact_impression
  WHERE impression_ts >= CURRENT_TIMESTAMP - INTERVAL '48' HOUR
  GROUP BY 1, 2
),
paired AS (
  SELECT
    today.hr,
    today.campaign_id,
    today.impressions AS today_imps,
    yesterday.impressions AS yday_imps,
    today.impressions * 1.0 / NULLIF(yesterday.impressions, 0) AS ratio
  FROM hourly today
  LEFT JOIN hourly yesterday
    ON yesterday.campaign_id = today.campaign_id
   AND yesterday.hr = today.hr - INTERVAL '24' HOUR
  WHERE today.hr >= CURRENT_TIMESTAMP - INTERVAL '4' HOUR
),
flagged AS (
  SELECT
    hr, campaign_id, ratio,
    SUM(CASE WHEN ratio < 0.70 THEN 1 ELSE 0 END)
      OVER (PARTITION BY campaign_id ORDER BY hr ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS consec_low
  FROM paired
)
SELECT campaign_id, hr, ratio
FROM flagged
WHERE consec_low >= 2
ORDER BY hr DESC, campaign_id;

Commentary. The "consecutive hours" condition uses a rolling window sum — cleaner than self-joins. Always include a volume gate (yday_imps > 100) to avoid noise; omitted here for brevity.
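The consecutive-breach logic generalizes beyond two hours; a run-length counter is the streaming-friendly equivalent of the rolling SUM (a sketch, not the alerting service):

```python
def breached(ratios, threshold=0.70, consecutive=2):
    """Return indices where the low-delivery condition has held for
    `consecutive` hours in a row. A None ratio (no yesterday baseline,
    i.e. NULLIF produced NULL) does not count as low.
    """
    run, hits = 0, []
    for i, r in enumerate(ratios):
        run = run + 1 if (r is not None and r < threshold) else 0
        if run >= consecutive:
            hits.append(i)
    return hits
```

Each hour is O(1) state per campaign, which is why the same pattern ports cleanly to a stateful stream processor.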


Section 8: Campaign Cannibalization Detection

"When advertiser A launches a new campaign, does it steal impressions from their existing campaigns? Detect."

WITH pre AS (
  SELECT campaign_id, AVG(daily_imps) AS pre_avg_imps
  FROM (
    SELECT campaign_id, dt, SUM(impressions) AS daily_imps
    FROM metric_campaign_daily
    WHERE dt BETWEEN :new_campaign_start - 14 AND :new_campaign_start - 1
    GROUP BY campaign_id, dt
  ) d
  GROUP BY campaign_id
),
post AS (
  SELECT campaign_id, AVG(daily_imps) AS post_avg_imps
  FROM (
    SELECT campaign_id, dt, SUM(impressions) AS daily_imps
    FROM metric_campaign_daily
    WHERE dt BETWEEN :new_campaign_start AND :new_campaign_start + 13
    GROUP BY campaign_id, dt
  ) d
  GROUP BY campaign_id
),
new_campaign AS (
  SELECT advertiser_id FROM dim.campaign WHERE campaign_id = :new_campaign_id
)
SELECT
  pre.campaign_id,
  pre.pre_avg_imps,
  post.post_avg_imps,
  (post.post_avg_imps - pre.pre_avg_imps) / NULLIF(pre.pre_avg_imps, 0) AS pct_change
FROM pre
JOIN post USING (campaign_id)
JOIN dim.campaign dc USING (campaign_id)
JOIN new_campaign nc USING (advertiser_id)
WHERE dc.campaign_id != :new_campaign_id
  AND pre.pre_avg_imps > 1000
ORDER BY pct_change ASC;

Commentary. Before/after averages on the same advertiser's other campaigns. A cannibalization signature: existing campaigns drop 20–50% in the 2 weeks after the new one launches. Caveat: this is correlational, not causal — seasonality or audience overlap changes could also explain it. Mention that out loud.


Section 9: Ad Load vs User Experience

"An experiment increases ad load from 1-in-6 feed items to 1-in-4. Write the SQL that measures impact on session length and return rate."

WITH sessions AS (
  SELECT
    e.session_id,
    e.user_id,
    MIN(e.event_ts) AS session_start,
    MAX(e.event_ts) AS session_end,
    COUNT(DISTINCT e.item_id) AS items_viewed,
    COUNT(*) FILTER (WHERE e.event_name = 'impression' AND e.is_ad) AS ads_shown,
    MAX(exp.variant_id) AS variant_id
  FROM bronze.events e
  LEFT JOIN fact_experiment_exposure exp
    ON e.user_id = exp.user_id
   AND exp.experiment_id = :ad_load_experiment_id
  WHERE e.dt BETWEEN :exp_start AND :exp_end
  GROUP BY e.session_id, e.user_id
),
per_variant AS (
  SELECT
    variant_id,
    COUNT(DISTINCT user_id) AS users,
    AVG(EXTRACT(EPOCH FROM (session_end - session_start))) AS avg_session_sec,
    AVG(items_viewed) AS avg_items,
    AVG(ads_shown) AS avg_ads
  FROM sessions
  WHERE variant_id IS NOT NULL
  GROUP BY variant_id
),
return_rate AS (
  SELECT
    variant_id,
    COUNT(DISTINCT user_id) FILTER (
      WHERE user_id IN (
        SELECT user_id FROM bronze.events
        WHERE event_name = 'app_open'
          AND dt BETWEEN :exp_start + 7 AND :exp_end
      )
    ) * 1.0 / COUNT(DISTINCT user_id) AS d7_return_rate
  FROM sessions
  GROUP BY variant_id
)
SELECT v.variant_id, v.users, v.avg_session_sec, v.avg_items, v.avg_ads, r.d7_return_rate
FROM per_variant v JOIN return_rate r USING (variant_id);

Commentary. The ad-load experiment is the single highest-stakes experiment in any consumer ads business. The DE pipeline for it needs to be airtight. Two pitfalls to call out: (1) session length as a metric is gamed by longer but less engaged scrolling — always pair with items_viewed; (2) D7 return rate measures the lagged effect of a bad experience, which an immediate in-session metric won't catch.


Section 10: Senior Commentary — Five Patterns That Show Up Everywhere

  1. Trailing-window joins (WHERE t.event_ts BETWEEN c.event_ts - INTERVAL '7' DAY AND c.event_ts). Attribution, fatigue, cohorts. The joined table's scan range must extend at least one full window beyond the driving table's, or edge events silently fall out of the result.

  2. Row-number for deduplication and "latest-of" (ROW_NUMBER() OVER (PARTITION BY k ORDER BY ts DESC)). Mandatory for attribution and SCD-2 lookups.

  3. Volume gates. Every ratio / rate / percent query needs a floor on the denominator (WHERE imps > 1000). Without it, noise dominates.

  4. Incremental rebuilds. INSERT OVERWRITE PARTITION (dt = ...) is the correct idempotent pattern. Avoid DELETE WHERE dt = ... + INSERT — it's not atomic.

  5. FILTER clauses over CASE WHEN ... THEN 1 END. Modern SQL supports it. Cleaner, better optimized.


← Master Index · ← Part 2 · Part 4 →


Part 4: Platform Surfaces + A–Z Glossary

"Every surface is a little universe with its own physics. You can't design pipelines for them as if they're the same thing."


Section 1: Short-Form Video Ads — Engagement Pipeline

Short-form vertical video is now the dominant growth surface on every major consumer platform. The ad format is full-screen, auto-play, sound-on-by-default, with in-feed placement between organic videos.

What makes it different

  • Time-based engagement matters more than CTR. Did they watch 3 seconds? 15? Complete it?
  • Sound-on is the norm. Audio becomes a dimension: muted rate, sound-on rate.
  • Swipe mechanics make the unit of engagement "scrolled past" vs "paused on." Explicit negative signal.
  • Creators are part of the ecosystem — sponsored content from creators is its own ad format.

Schema

CREATE TABLE fact_shortform_video_engagement (
  dt                   DATE,
  impression_id        BINARY(16),
  user_id              BIGINT,
  ad_id                BIGINT,
  creative_id          BIGINT,
  -- Watch time
  view_started         BOOLEAN,        -- autoplay initiated
  view_duration_ms     INT,            -- actual time spent in view
  completions          INT,            -- 1 if watched to end
  loops                INT,            -- replays
  -- Audio
  sound_on             BOOLEAN,
  sound_on_duration_ms INT,
  -- Interaction
  likes                INT,
  comments             INT,
  shares               INT,
  saves                INT,
  profile_visits       INT,
  -- Negative
  swiped_away_ms       INT,            -- if swiped, how long before
  marked_not_interested BOOLEAN,
  hidden_ad            BOOLEAN
)
PARTITIONED BY (dt);

Key metrics

  • Thumbstop rate = 3-second views / impressions. "Did we stop the scroll?"
  • Hold rate = 15-second views / 3-second views. "Did the story hold them?"
  • Completion rate = completions / impressions.
  • Sound-on rate = sound_on views / total views. Branding signal.
  • Negative signal rate = (not_interested + hidden) / impressions.

Pipeline pattern

Short-form video events are 10–20× higher volume than static ad events because the user scrolls fast. Stream the engagement events, batch the aggregate:

Pipeline Flow
client → streaming spine → stream processor (1-min windows)
stream processor (1-min windows) → real-time creator-facing metrics
streaming spine → bronze (raw) → silver (cleaned) → gold (daily)

The rendering client emits engagement events at 250ms intervals while the video is on-screen. Dedupe aggressively: one impression_id can produce 60+ engagement events.
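A sketch of that collapse step (event shape is hypothetical): heartbeats for one impression_id reduce to a single engagement row by taking the max playback offset, which also dedupes retried heartbeats at the same offset for free.

```python
def rollup(events):
    """Collapse per-250ms heartbeat events into one row per impression.

    `events`: iterable of (impression_id, offset_ms) heartbeats.
    View duration = max offset seen; duplicate or out-of-order
    heartbeats are absorbed by max().
    """
    out = {}
    for imp_id, offset_ms in events:
        out[imp_id] = max(out.get(imp_id, 0), offset_ms)
    return out   # impression_id -> view_duration_ms
```

The same max-reduction runs per key inside a 1-minute stream-processor window; the batch path recomputes it over bronze for reconciliation.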


Section 2: Stories Ads — Tap-Through Funnels

The format

Stories ads are full-screen vertical cards that auto-advance every 5–15 seconds. The ad is one or two cards inserted between organic stories. User actions are limited: tap to advance, hold to pause, swipe up for CTA, swipe left to exit the whole story sequence.

Funnel

Diagram
  entry → ad shown → ad completed → swipe_up (CTA)
    • skipped (tap next)
    • exited (swipe away)

Schema

CREATE TABLE fact_stories_engagement (
  dt                     DATE,
  impression_id          BINARY(16),
  user_id                BIGINT,
  ad_id                  BIGINT,
  card_index             INT,           -- 1 or 2 for multi-card ads
  -- Outcomes
  duration_shown_ms      INT,
  completed              BOOLEAN,
  tapped_next            BOOLEAN,       -- user skipped mid-card
  swiped_up              BOOLEAN,       -- tapped CTA
  swiped_away            BOOLEAN,       -- exited the whole sequence
  paused_duration_ms     INT
)
PARTITIONED BY (dt);

Metrics that matter

  • Completion rate = completed / impressions.
  • Exit rate = swiped_away / impressions. High exit rate means "my ad made them leave" — one of the worst UX signals in ads.
  • Tap-through rate = swiped_up / impressions.

Pitfalls

  • Stories are consumed in rapid bursts. User watches 50 stories in 3 minutes; your ad has maybe 2 seconds of attention.
  • The position in the sequence matters enormously. Ads after story #15 perform much worse than after story #3. Log position as a first-class dimension.

Section 3: Carousel Ads — Per-Card Engagement

The format

Carousel ads are 2–10 swipeable cards, each with its own image/video + headline + CTA. Used heavily in e-commerce and education.

What's unique

Engagement is per-card. Advertiser wants to know: "which card got the click, which was skipped, which converted."

Schema

CREATE TABLE fact_carousel_engagement (
  dt              DATE,
  impression_id   BINARY(16),
  user_id         BIGINT,
  ad_id           BIGINT,
  card_id         BIGINT,       -- PK: unique per card
  card_position   INT,          -- 1..10
  viewed          BOOLEAN,      -- scrolled into view
  view_duration_ms INT,
  clicked         BOOLEAN,
  click_id        BINARY(16),   -- set when clicked; join key for attribution
  converted       BOOLEAN       -- attributed later
)
PARTITIONED BY (dt);

Common query: which card drove conversion?

WITH attributed AS (
  SELECT
    f.conversion_id,
    f.campaign_id,
    c.card_id,
    c.card_position
  FROM fact_attributed_conversion f
  JOIN fact_carousel_engagement c
    ON f.click_id = c.click_id          -- join through click context
)
SELECT
  campaign_id,
  card_position,
  COUNT(*) AS conversions,
  RANK() OVER (PARTITION BY campaign_id ORDER BY COUNT(*) DESC) AS rank_in_campaign
FROM attributed
GROUP BY 1, 2
ORDER BY campaign_id, rank_in_campaign;

Pitfall: card fatigue

The first card is always the most-viewed. Measuring "card engagement rate" without normalizing for position will always make the first card look best. Compute conditional engagement — given that a user viewed the card, did they engage? — so positions are comparable.
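The fix is a ratio conditioned on exposure; a one-liner sketch (card data hypothetical):

```python
def conditional_ctr(cards):
    """Position-comparable card metric: clicks / views, not
    clicks / impressions — so later cards aren't penalized for
    never being scrolled into view.

    `cards`: card_id -> (views, clicks). Returns None for unviewed cards.
    """
    return {cid: (clicks / views if views else None)
            for cid, (views, clicks) in cards.items()}
```

A card in position 3 with a 10% conditional click rate is genuinely outperforming a position-1 card at 5%, even though the position-1 card has far more raw clicks.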


Section 4: AI-Driven Automated Performance Campaigns

The format

The advertiser provides a budget, an objective (conversions, app installs, etc.), a set of creatives, and minimal targeting — often just a country. The platform decides everything else: audience, placement, bid, creative rotation.

This is now the dominant paradigm in direct-response ads. Different platforms call it different things (auto-optimized campaigns, performance-plus, broad-targeted delivery, etc.). The DE challenges are structural.

What's different for data engineering

  • Attribution is fuzzy. Without advertiser-selected audiences, you can't report "your audience of runners converted at 3%." There is no chosen audience. You have to report discovered segments instead.
  • Diagnostic reporting is different. The advertiser can't A/B test between "audience A vs audience B" — the platform does that internally. You report diagnostics, not levers.
  • Platform reporting = platform accountability. If the platform chose everything, the platform has to show why.

Diagnostic schema

CREATE TABLE fact_ai_campaign_diagnostics (
  dt             DATE,
  campaign_id    BIGINT,
  -- What the platform chose
  surface_mix    MAP<STRING, DOUBLE>,   -- e.g., {"main_feed": 0.4, "short_form_video": 0.5, ...}
  audience_mix   MAP<STRING, DOUBLE>,   -- demographics, interest buckets (without user-level info)
  creative_mix   MAP<BIGINT, DOUBLE>,   -- creative_id → share of spend
  placement_mix  MAP<STRING, DOUBLE>,
  -- Performance by choice
  cpa_by_surface MAP<STRING, DOUBLE>,
  cpa_by_creative MAP<BIGINT, DOUBLE>,
  -- Exploration vs exploitation
  exploration_spend_pct DOUBLE
) PARTITIONED BY (dt);

Reporting philosophy

Senior DE insight: "Reports for AI-driven campaigns are advertiser-readable summaries of what the platform explored and why it converged on what it did." Building this well is harder than reporting classic campaigns. You need trust signals — "we spent 12% of your budget exploring, found that Surface X converts 2× better, and shifted 60% of your budget there in week 2."


Section 5: Dynamic Product Ads (Catalog-Driven)

The format

The advertiser uploads a product catalog (CSV, XML feed, API). The platform matches users to products via browse/cart signals and renders a templated creative at auction time. One ad, millions of variants.

Data model

-- Catalog
CREATE TABLE advertiser_catalog (
  advertiser_id   BIGINT,
  product_id      STRING,            -- advertiser's ID
  product_sku     STRING,
  title           STRING,
  description     STRING,
  price_usd       DECIMAL(10,2),
  image_url       STRING,
  category        STRING,
  availability    STRING,             -- 'in_stock', 'out_of_stock'
  updated_ts      TIMESTAMP,
  PRIMARY KEY (advertiser_id, product_id)
);

-- User-product signals
CREATE TABLE user_product_signals (
  user_id         BIGINT,
  advertiser_id   BIGINT,
  product_id      STRING,
  signal_type     STRING,              -- 'view', 'add_to_cart', 'purchase'
  signal_ts       TIMESTAMP,
  decay_score     DOUBLE               -- 1.0 freshest → 0.0 stale
) PARTITIONED BY (DATE(signal_ts));

-- Impressions tagged with the specific product shown
CREATE TABLE fact_dpa_impression (
  dt              DATE,
  impression_id   BINARY(16),
  user_id         BIGINT,
  advertiser_id   BIGINT,
  campaign_id     BIGINT,
  product_id      STRING,               -- which product variant was rendered
  template_id     BIGINT,
  rendered_price  DECIMAL(10,2),
  is_retargeting  BOOLEAN                -- user had prior signal for this product
) PARTITIONED BY (dt);

Key challenges

  • Catalog freshness. An out-of-stock product shown to a user wastes the impression. Ingest catalog updates hourly minimum. Fail the pipeline if a catalog is >24 hours stale.
  • Signal decay. A "view" from 30 days ago shouldn't drive an impression today. Decay signals with a 7-day half-life.
  • Privacy. Catalog-level targeting uses browsing history — subject to all local privacy regimes. Per-user signal must be deletable on request.
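The 7-day half-life mentioned above is a one-line exponential decay; a sketch of the scoring function (parameter names are illustrative):

```python
def decay_score(age_days, half_life_days=7.0):
    """Exponential decay with a configurable half-life: a signal loses
    half its weight every `half_life_days`. 1.0 when fresh, -> 0 as it
    ages — this is the decay_score column in user_product_signals."""
    return 0.5 ** (age_days / half_life_days)
```

A 30-day-old "view" scores about 0.05 under a 7-day half-life — effectively dead, which is the intended behavior.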

Reporting: ROAS by product

SELECT
  i.advertiser_id,
  i.product_id,
  COUNT(*) AS imps,
  SUM(i.spend_cents) / 100.0 AS spend_usd,        -- assumes a spend_cents column, as on fact_impression
  SUM(c.conv_value) AS revenue,
  SUM(c.conv_value) / NULLIF(SUM(i.spend_cents) / 100.0, 0) AS roas
FROM fact_dpa_impression i
LEFT JOIN fact_attributed_conversion c
  USING (impression_id)                            -- assumes attribution rows retain the originating impression_id
WHERE i.dt BETWEEN CURRENT_DATE - 7 AND CURRENT_DATE - 1
GROUP BY 1, 2
HAVING SUM(i.spend_cents) / 100.0 > 100
ORDER BY roas DESC;

Advertisers love per-product ROAS reports. Make this query fast.


Section 6: Cross-Surface Unified Pipeline

The pattern

A user can see the same ad on Main Feed, Visual Feed, Short-form Video, Stories, and Publisher Network — potentially on the same day. Your reporting needs to show:

  • Per-surface performance (diagnostic).
  • Cross-surface reach (total unique users across all surfaces).
  • Frequency capping across surfaces (don't show the same ad 20× by stacking surfaces).

Schema

CREATE TABLE fact_ad_exposure (
  dt              DATE,
  impression_id   BINARY(16),
  user_id         BIGINT,
  ad_id           BIGINT,
  campaign_id     BIGINT,
  surface_id      INT,
  surface_name    STRING,          -- 'main_feed', 'visual_feed', 'short_form_video', ...
  placement_id    INT,
  impression_ts   TIMESTAMP
) PARTITIONED BY (dt);

Cross-surface reach

SELECT
  campaign_id,
  COUNT(DISTINCT user_id) AS cross_surface_reach,
  COUNT(DISTINCT CASE WHEN surface_name = 'main_feed' THEN user_id END) AS main_feed_reach,
  COUNT(DISTINCT CASE WHEN surface_name = 'visual_feed' THEN user_id END) AS visual_feed_reach,
  COUNT(DISTINCT CASE WHEN surface_name = 'short_form_video' THEN user_id END) AS sfv_reach,
  -- Surface-exclusive reach (users seen only on this surface)
  COUNT(DISTINCT CASE WHEN n_surfaces = 1 AND surface_name = 'main_feed' THEN user_id END) AS main_feed_exclusive
FROM (
  SELECT e.*, u.n_surfaces
  FROM fact_ad_exposure e
  JOIN (
    -- COUNT(DISTINCT ...) is not valid as a window function on most
    -- engines, so pre-aggregate surfaces-per-user and join back.
    SELECT campaign_id, user_id,
           COUNT(DISTINCT surface_name) AS n_surfaces
    FROM fact_ad_exposure
    WHERE dt BETWEEN CURRENT_DATE - 7 AND CURRENT_DATE - 1
    GROUP BY 1, 2
  ) u USING (campaign_id, user_id)
  WHERE e.dt BETWEEN CURRENT_DATE - 7 AND CURRENT_DATE - 1
) t
GROUP BY campaign_id;

Cross-surface frequency cap

A frequency cap of "10 per week" must be enforced globally, not per surface. In the streaming path: the pacing KV stores (user_id, campaign_id) → count_this_week and the auction gate reads from this key on every candidate selection. Decay the counter at week-boundary.
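A minimal in-memory sketch of that gate (the real one lives in a shared KV store with atomic increments; every name here is illustrative):

```python
class FrequencyGate:
    """Stand-in for the pacing KV: (user_id, campaign_id) -> count for
    the current week, reset lazily at the week boundary rather than by
    a scheduled job."""

    def __init__(self, cap, week_of=lambda ts: int(ts // (7 * 86400))):
        self.cap = cap
        self.week_of = week_of
        self.state = {}          # (user, campaign) -> (week, count)

    def allow(self, user_id, campaign_id, now_ts):
        """Admit a candidate if the global cross-surface cap permits it."""
        key = (user_id, campaign_id)
        week = self.week_of(now_ts)
        stored_week, count = self.state.get(key, (week, 0))
        if stored_week != week:        # lazy decay at the week boundary
            count = 0
        if count >= self.cap:
            return False
        self.state[key] = (week, count + 1)
        return True
```

Because the key is (user, campaign) with no surface dimension, impressions from Main Feed and Short-form Video draw down the same counter — the global enforcement the text calls for.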


Section 7: A–Z Glossary

Compressed version. Every term a senior ads DE should know cold.

A

  • A/B test — controlled experiment with random assignment; the simplest case is one treatment variant against a control.
  • Ad exchange — a marketplace where ad buyers and sellers meet via real-time bidding.
  • Ad load — fraction of feed items (or minutes of video watched) that are ads. The most consequential dial in the product, and the subject of its highest-stakes experiments.
  • Ad request — when a client asks the platform for an ad to fill a slot.
  • Ad server — the system that selects and delivers the winning ad after the auction.
  • Ads platform — the full system serving ads. Used generically throughout this guide.
  • Advertiser — the entity running the campaign; pays for impressions/clicks/conversions.
  • Aggregated Metrics Platform (AMP) — unified pipeline that materializes every platform metric from a single event source.
  • App install campaign — objective where advertiser wants installs, bid on CPI or CPE.
  • Attribution — assigning credit for a conversion to an ad event (click or view).
  • Attribution window — the time range within which a conversion is credited to a prior ad interaction.
  • Audience — a group of users defined for targeting (static list, rule-based, or lookalike).
  • Audience overlap — the fraction of users in common between two audiences.
  • Auction — the mechanism that chooses which ad wins an ad slot.
  • Auto-optimized campaign — the advertiser lets the platform choose audience, placement, creative rotation, bidding. See Section 4 of this part.

B

  • Backfill — reprocessing historical data to correct or populate a table.
  • Bid — the amount an advertiser is willing to pay for an outcome.
  • Bid request — the upstream signal "a slot is available, who wants it."
  • Bid shading — bid-management technique that, in a first-price auction, reduces the bid toward the expected clearing price to avoid overpaying relative to a second-price outcome.
  • Bidding strategy — how the advertiser controls their bids (manual, target CPA, lowest cost, value-optimization).
  • Bloom filter — probabilistic data structure used to represent audience membership efficiently.
  • Bronze / Silver / Gold — medallion architecture tiers for data warehouses.
  • Brand campaign — objective is reach or awareness; bid on CPM.
  • Budget pacing — controlling the rate of spend through the day.

C

  • CAPI (Server Conversion API) — advertiser sends conversion events directly server-to-server; bypasses browser.
  • Campaign — a group of ads sharing an objective and budget.
  • Carousel — multi-card swipeable ad format.
  • Catalog — advertiser's product feed used for dynamic product ads.
  • CCPA / CPRA — California Consumer Privacy Act; opt-out + deletion rights.
  • Clean room — neutral environment where advertiser + platform join hashed IDs under privacy constraints.
  • Clearing price — what the auction winner actually pays (often second-price).
  • Click-through rate (CTR) — clicks / impressions.
  • Conversion — the advertiser's desired outcome (purchase, signup, install).
  • Conversion window — same as attribution window.
  • Cost-per-thousand (CPM) — cost per thousand impressions.
  • Cost-per-click (CPC) — cost per click.
  • Cost-per-action (CPA) — cost per defined action (conversion, install, etc.).
  • Creative — the actual rendered asset (image, video, copy, card).
  • Creative fatigue — declining engagement with repeated exposure.
  • Cross-surface — touching multiple product surfaces (Main Feed, Visual Feed, etc.).
  • CUPED — Controlled-experiment Using Pre-Experiment Data; variance-reduction technique.

D

  • DAU — Daily Active Users.
  • Deferred deep link — link that remembers its destination through the install flow.
  • Delivery — the act of serving an ad into a slot.
  • Delivery monitoring — real-time alerting on under- or over-delivery.
  • Demand-Side Platform (DSP) — buy-side adtech that bids on behalf of advertisers.
  • Differential privacy (DP) — formal privacy guarantee; add calibrated noise to aggregates.
  • Dim — dimension table. Typically SCD-2.
  • DMA / DSA — EU's Digital Markets Act and Digital Services Act.
  • Dynamic Product Ads (DPA) — catalog-driven template ads; one ad, many product variants.

E

  • Engagement — any user interaction (like, comment, share, save).
  • Event — a logged user action or system action.
  • Experiment — A/B or multi-variant test.
  • Experiment exposure — first moment a user saw the treatment.
  • Exactly-once — processing semantics where each event is processed exactly once end-to-end.

F

  • Fact — fact table; event-grain table of measurable actions.
  • False Discovery Rate (FDR) — multiple-comparison correction method.
  • Fill rate — fraction of ad requests that got a winning ad.
  • First-price auction — winner pays their bid.
  • Flink — example of a stateful stream processor.
  • Floor price (reserve price) — minimum acceptable bid for the auction.
  • Frequency — impressions per unique user.
  • Frequency cap — maximum impressions per user per window.

G

  • GDPR — EU's General Data Protection Regulation.
  • Generalized Second-Price (GSP) — auction format common in sponsored search.
  • Gold — serving-layer / semantic layer of the medallion architecture.
  • Growth accounting — NEW/RETAINED/RESURRECTED/DORMANT framework for DAU/MAU decomposition.

H

  • Holdout — a fraction of users excluded from ads to measure incremental lift.
  • HyperLogLog (HLL) — probabilistic cardinality sketch.

I

  • Iceberg — open table format supporting ACID on object storage.
  • Identity graph — graph of observed identifiers per user across devices/sessions.
  • IDFA / GAID — iOS / Android advertising identifiers.
  • IDFV — identifier for vendor (iOS, per-app).
  • Impression — an ad was rendered and viewable.
  • Incrementality — causal lift attributable to the ad; measured via holdout.
  • Invalid traffic (IVT) — bot/fraud impressions filtered from billable.
  • iOS ATT — App Tracking Transparency; requires user consent for IDFA access.

J

  • Jaccard similarity — size of intersection / size of union. Used for audience overlap.

K

  • Kafka — example of a distributed event streaming spine.
  • Kappa architecture — streaming-only alternative to Lambda.
  • KPI — Key Performance Indicator.

L

  • L28 — "Last 28 days" active. Most predictive retention feature on consumer apps.
  • Lambda architecture — hybrid streaming + batch pipeline with reconciliation.
  • Late-arriving data — events that arrive past the window they should be aggregated in.
  • Lead form — ad format that collects lead information inline.
  • LGPD — Brazil's General Data Protection Law.
  • Lineage — DAG of which data objects feed which.
  • Lookalike audience — computed audience similar to a seed set.

M

  • Main Feed — primary consumption surface (infinite scrolling organic + ads).
  • MAU — Monthly Active Users.
  • Materialized view — cached query result that refreshes on a cadence.
  • Messaging platform — standalone messaging surface.
  • Metric registry — versioned, centrally-defined catalog of metric definitions.
  • MMP (Mobile Measurement Partner) — third-party app-install attribution provider.
  • Multi-touch attribution (MTA) — credit-splitting across multiple ad touches pre-conversion.

N

  • North-star metric — top-level success metric.
  • NPS — Net Promoter Score.

O

  • Objective — what the campaign optimizes for (reach, traffic, conversions, installs, etc.).
  • Optimization goal — the specific event the bidder is trying to maximize.
  • Over-delivery — platform served more than the budget allowed.

P

  • Pacing — rate of spend through the day.
  • Pacing gate — real-time check that prevents further delivery once a threshold is crossed.
  • Partition — unit of data in a table (usually by date).
  • Pinot / Druid / ClickHouse — examples of real-time OLAP stores.
  • Pixel — web tracking tag installed by the advertiser to log conversions.
  • Placement — the specific position within a surface (in-feed, in-stream, top-of-stories).
  • PIPL — China's Personal Information Protection Law.
  • Postback — server-to-server callback, used in SKAdNetwork and MMPs.
  • Privacy regime — GDPR, CCPA, LGPD, PIPL, ATT, etc.
  • Publisher Network — third-party publisher inventory monetized by the ads platform.

Q

  • QPS — Queries Per Second.

R

  • Reach — unique users exposed.
  • Reach and frequency — standard brand-campaign pair of metrics.
  • Real-time bidding (RTB) — auction conducted in ~100ms per ad request.
  • Reconciliation — correcting streaming estimates with batch-truth.
  • Retention — fraction of users from a cohort still active at day N.
  • Retargeting — showing ads to users who previously interacted with the brand.
  • ROAS — Return on Ad Spend = revenue / spend.

S

  • SCD-2 — Slowly Changing Dimension type 2 (history of value changes).
  • Second-price auction — winner pays the second-highest bid.
  • Server conversion API — see CAPI.
  • Session — bounded period of user activity, usually defined by inactivity gap.
  • Shadow mode — new pipeline runs in parallel; output compared, not served.
  • Short-form Video — vertical, full-screen, auto-play video surface.
  • SKAdNetwork (SKAN) — Apple's privacy-preserving install attribution.
  • Slot — a specific ad position in a specific surface.
  • Spend-forecast — predicting what a campaign will spend over its lifetime.
  • Stateful stream processor — stream processor that maintains per-key state (e.g., Flink).
  • Static audience — explicit user list uploaded by advertiser.
  • Stories — time-bounded vertical story card format, auto-advancing.
  • Streaming spine — event bus (Kafka analog).
  • Surface — a product surface (Main Feed, Visual Feed, Short-form Video, Stories, etc.).

T

  • Targeting — audience rules for who can see the ad.
  • Third-party cookie — browser cookie set by a domain other than the current site; deprecated.
  • Thumbstop rate — 3-second views / impressions. Short-form video metric.
  • TOS — Terms of Service.
  • Trino / Presto — distributed SQL engines.
  • Truncation — dropping events or rows to fit storage budgets.

U

  • UMP (Unified Metrics Platform) — same as AMP.
  • Under-delivery — platform served less than the budget allowed.
  • Unit of randomization — the entity on which experiment assignment is done (user, device, session, ad request).

V

  • Variance reduction — techniques (CUPED, stratified assignment) that reduce required sample size.
  • VCG (Vickrey-Clarke-Groves) — truthful auction format.
  • Viewability — fraction of impressions that met the viewability standard (e.g., MRC: 50% of pixels on screen for 1+ second).
  • Visual Feed — image-and-video-first consumption surface.

W

  • WAU — Weekly Active Users.
  • Warehouse — data warehouse (BigQuery, Snowflake, Redshift, etc.).
  • Watermark — in stream processing, the timestamp below which no more events are expected.

X

  • XML feed — structured catalog feed (used in DPA).

Y

  • Yield optimization — the supply-side act of maximizing revenue per ad slot.

Z

  • Zero-party data — data the user deliberately shares (preferences, quiz answers).
  • Z-score — standardized metric deviation from baseline.

← Master Index · ← Part 3 · Part 5 →

↑ Back to top

Part 05

Visualization & Measurement

Part 5: Visualization & Measurement of Campaign Objectives

"A campaign only succeeded if you can prove it. And you can only prove it if the measurement was designed before the campaign launched."


Section 1: The Campaign Scorecard Template

Every campaign, regardless of objective, should roll up to a one-page scorecard. The shape is identical. Only the numerator rows change.


Every scorecard has four blocks: delivery, objective-specific KPIs, benchmarks, health. Every block is non-negotiable. Dropping benchmarks turns a scorecard into a number dump.


Section 2: Reach & Frequency (Awareness Objective)

What matters

  • Reach — unique users exposed at least once.
  • Frequency — impressions per reached user.
  • Frequency distribution — not just the average. 90% of users at freq 2 and 10% at freq 20 is very different from everyone at freq 4.

The key visualization: frequency distribution histogram

Impressions per user (7-day window)
  Freq    Users
  1       8
  2       7
  3       5
  4       4
  5       2
  6       1
  7       0
  8+      0
  11+     0

Overlay: vertical line at the frequency cap (e.g., 8). Count of users above the cap is your "over-cap" rate — should be near zero.

SQL

WITH user_freq AS (
  SELECT
    user_id,
    COUNT(*) AS impressions_7d
  FROM fact_impression
  WHERE campaign_id = :c
    AND dt BETWEEN CURRENT_DATE - 7 AND CURRENT_DATE - 1
    AND is_valid = TRUE
  GROUP BY user_id
),
buckets AS (
  SELECT
    CASE
      WHEN impressions_7d = 1  THEN '01'
      WHEN impressions_7d <= 3 THEN '02_03'
      WHEN impressions_7d <= 6 THEN '04_06'
      WHEN impressions_7d <= 10 THEN '07_10'
      ELSE '11_plus'
    END AS freq_bucket,
    COUNT(*) AS users,
    SUM(impressions_7d) AS impressions
  FROM user_freq
  GROUP BY 1
)
SELECT
  freq_bucket,
  users,
  impressions,
  users * 1.0 / SUM(users) OVER () AS users_share,
  impressions * 1.0 / SUM(impressions) OVER () AS impressions_share
FROM buckets
ORDER BY freq_bucket;

Health check

  • Reach curve that flattens early. If reach stops growing after day 3, you're oversaturating a small audience. Expand targeting.
  • Right tail that's too fat. >10% of users at freq 10+ means your frequency cap isn't working. Investigate pacing.
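The over-cap rate falls out of the same user_freq aggregation used above; a sketch (the cap of 8 is illustrative):

WITH user_freq AS (
  SELECT user_id, COUNT(*) AS impressions_7d
  FROM fact_impression
  WHERE campaign_id = :c
    AND dt BETWEEN CURRENT_DATE - 7 AND CURRENT_DATE - 1
    AND is_valid = TRUE
  GROUP BY user_id
)
SELECT
  AVG(CASE WHEN impressions_7d > 8 THEN 1.0 ELSE 0 END) AS over_cap_user_share,
  -- impressions served beyond the cap, as a share of all impressions
  SUM(CASE WHEN impressions_7d > 8 THEN impressions_7d - 8 ELSE 0 END) * 1.0
    / SUM(impressions_7d) AS wasted_impression_share
FROM user_freq;

If over_cap_user_share is materially above zero, the cap is being enforced post-hoc rather than in the bidder.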

Section 3: Conversion Funnel (Conversions Objective)

The funnel

Dashboard · Conversion Funnel

From impression to return purchase — live drop-off

    Conversion Path:
    Impressions → (CTR) → Clicks → (landing-page view rate) → Landing Page Views →
    (add-to-cart rate) → Add to Cart → (checkout-initiation rate) → Checkout →
    (purchase rate) → Purchase → Return Purchase (30d)

    Each stage's drop-off is diagnostic.

    Visualization: funnel bar chart


      Stage          Count        Step rate
      Impressions    10,000,000       —
      Clicks            150,000   CTR 1.5%
      Page Views        135,000   90% of clicks
      Add to Cart        22,500   16.7% of page views
      Checkout           13,500   60% of add-to-cart
      Purchase            5,400   40% of checkout

      Annotate with industry benchmarks. If your click→purchase rate is 3.6% vs vertical median 4.5%, you know where to focus.

      ROAS by attribution window

      Dashboard · Attribution

      Same campaign. Same reality. Five different ROAS numbers.

      The attribution window is not a number — it's a policy. A senior analyst always answers "which window" before reporting ROAS. Standardize across advertisers; show all three (1d / 7d / 28d) side by side.

      A critical chart. The same campaign shows very different ROAS at 1d-click vs 7d-click vs 28d-click. Advertisers need to see all three.

      Attribution Window    Attributed Revenue    ROAS
        1-day click          $12,000              2.4x
        7-day click          $23,000              4.6x
       28-day click          $27,000              5.4x
        1-day view           $18,000              3.6x
        7-day click + 1v     $29,500              5.9x
      

      SQL

      WITH windows AS (
        SELECT 'click_1d' AS window_label, 1 AS click_d, 0 AS view_d UNION ALL
        SELECT 'click_7d', 7, 0 UNION ALL
        SELECT 'click_28d', 28, 0 UNION ALL
        SELECT 'view_1d', 0, 1 UNION ALL
        SELECT 'click_7d_view_1d', 7, 1
      ),
      attributed AS (
        SELECT
          w.window_label,
          SUM(c.conv_value) AS revenue
        FROM windows w
        CROSS JOIN fact_attributed_conversion c
        WHERE c.campaign_id = :c
          AND (
            (c.source = 'click' AND w.click_d > 0
              AND DATEDIFF(c.conversion_ts, c.click_ts) <= w.click_d)
            OR
            (c.source = 'view'  AND w.view_d > 0
              AND DATEDIFF(c.conversion_ts, c.view_ts) <= w.view_d)
          )
        GROUP BY w.window_label
      ),
      spend AS (
        SELECT SUM(spend_cents) / 100.0 AS spend_usd
        FROM fact_impression
        WHERE campaign_id = :c
      )
      SELECT
        a.window_label,
        a.revenue,
        s.spend_usd,
        a.revenue / NULLIF(s.spend_usd, 0) AS roas
      FROM attributed a CROSS JOIN spend s
      ORDER BY a.window_label;
      

      Section 4: App Install Funnel (App Installs Objective)

      The funnel

      Impressions → Clicks → Install-Page Open → Install → First-Open → Event 1 → Event 2 → Retention D7

      App installs are their own beast. Attribution is through a Mobile Measurement Partner (MMP) in most cases, or SKAdNetwork on iOS.

      Install-to-event funnel

      Dashboard · App Install Funnel

      Install to purchase to D7 retention

      D7 retention is measured against installs, not the previous step. That's the comparable number for LTV modeling.
      Installs               10,000
      First-open              8,500  (85% of installs)
      Tutorial complete       5,100  (60% of first-open)
      Registration            3,400  (40% of first-open)
      Purchase                1,200  (14% of first-open)
      D7 retention            2,800  (28% of installs)
      

      The "event" schema

      Advertiser-defined milestone events flow back via the MMP or server API:

      CREATE TABLE fact_app_event (
        dt            DATE,
        install_id    BINARY(16),         -- from install attribution
        app_id        BIGINT,
        event_name    STRING,              -- 'first_open', 'tutorial_complete', 'purchase', ...
        event_ts      TIMESTAMP,
        value_usd     DECIMAL(18,4)        -- for purchases
      )
      PARTITIONED BY (dt);
      

      Retention curve

      WITH installs AS (
        SELECT install_id, install_ts
        FROM fact_install
        WHERE campaign_id = :c
          AND install_ts BETWEEN :start AND :end
      ),
      opens AS (
        SELECT install_id, DATE(event_ts) AS open_dt
        FROM fact_app_event
        WHERE event_name = 'app_open'
      )
      SELECT
        DATEDIFF(o.open_dt, DATE(i.install_ts)) AS days_since_install,
        COUNT(DISTINCT o.install_id) * 1.0 /
          (SELECT COUNT(DISTINCT install_id) FROM installs) AS retention_rate
      FROM installs i
      JOIN opens o USING (install_id)
      WHERE o.open_dt BETWEEN DATE(i.install_ts) AND DATE(i.install_ts) + INTERVAL '30' DAY
      GROUP BY days_since_install
      ORDER BY days_since_install;
      

      A standard curve for a solid app is roughly D1 ~50%, D7 ~25%, D30 ~15%. If you're well below that, the installs were probably low quality, pointing to poor audience targeting.

      SKAN caveat

      SKAdNetwork postbacks are:

      • Delayed (24h minimum, up to 60h).
      • Conversion-value-constrained (a single 6-bit value, i.e. 64 possible values, on classic SKAN; SKAN 4 adds coarse-grained values and up to three postbacks).
      • Subject to anonymization thresholds (no postback below cohort-size floor).
      • Non-deterministic at granular grain.

      Your SKAN pipeline has to handle all four. Don't mix SKAN-derived metrics with MMP-derived metrics in the same chart without clear labeling.
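      One way to honor that labeling rule in the data model is to carry an explicit measurement_source column all the way to the chart instead of pre-merging; a sketch with illustrative table names:

      SELECT dt, campaign_id, 'mmp' AS measurement_source, COUNT(*) AS installs
      FROM fact_install_mmp
      GROUP BY dt, campaign_id
      UNION ALL
      -- SKAN rows arrive pre-aggregated per postback, so sum rather than count
      SELECT dt, campaign_id, 'skan', SUM(install_count)
      FROM fact_install_skan
      GROUP BY dt, campaign_id;

      The dashboard then facets or colors by measurement_source, and the two methodologies never silently blend.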


      Section 5: Lead Quality Scoring (Lead Gen Objective)

      The format

      Advertiser runs a lead form ad. User fills it out without leaving the platform. Advertiser receives the lead record for follow-up.

      The pipeline

      lead_form_impressionlead_form_openlead_form_submit(advertiser CRM)qualified_leadclosed_deal

      The last two stages happen in the advertiser's CRM, not on the ad platform. So:

      • Volume metrics (submissions, CPL) — measurable on-platform.
      • Quality metrics (qualified rate, close rate, deal value) — require advertiser to feed back.

      Feedback API

      The advertiser's CRM reports back qualified-lead status (and eventually close/value) via server-to-server API. Schema:

      CREATE TABLE lead_outcome_feedback (
        lead_id         BINARY(16),
        campaign_id     BIGINT,
        advertiser_id   BIGINT,
        submitted_ts    TIMESTAMP,
        qualified       BOOLEAN,
        qualified_ts    TIMESTAMP,
        closed_won      BOOLEAN,
        closed_ts       TIMESTAMP,
        deal_value_usd  DECIMAL(18,4)
      );
      

      Lead-quality ratio

      SELECT
        campaign_id,
        COUNT(*) AS leads_submitted,
        SUM(CASE WHEN qualified THEN 1 ELSE 0 END) AS qualified,
        SUM(CASE WHEN qualified THEN 1 ELSE 0 END) * 1.0 / COUNT(*) AS qualified_rate,
        SUM(CASE WHEN closed_won THEN deal_value_usd ELSE 0 END) AS revenue,
        SUM(CASE WHEN closed_won THEN deal_value_usd ELSE 0 END) /
          -- MAX, not SUM: spend_usd repeats on every joined lead row,
          -- so summing it would inflate the denominator by the lead count
          NULLIF(MAX(spend_usd), 0) AS roas_leadgen
      FROM lead_outcome_feedback f
      JOIN (SELECT campaign_id, SUM(spend_cents) / 100.0 AS spend_usd
            FROM fact_impression WHERE dt BETWEEN :start AND :end
            GROUP BY campaign_id) s USING (campaign_id)
      WHERE f.submitted_ts BETWEEN :start AND :end
      GROUP BY campaign_id
      ORDER BY roas_leadgen DESC;
      

      Key chart: lead funnel + cycle time

      Dashboard · Lead Gen

      Leads → qualified → closed · with cycle time and ROAS

      Spend
      $60,000
      Revenue
      $450,000
      ROAS
      7.5×
      Leads submitted
      1,200
      Cost per lead · $50
      60% pass quality gate · 7d
      Qualified
      720
      SQL-to-CRM lag · 7d
      25% close · 30d
      Closed won
      180
      Avg deal · $2,500
      Cycle-time distribution
      1-7d · 8-14d · 15-21d · 22-28d · 29d+
      Avg cycle · 18 days. Do not declare failure at week 2 — majority of closes land between day 15 and 28.
      Leads submitted:        1,200
      Qualified (within 7d):    720 (60%)
      Closed won (within 30d):  180 (15%)
      Avg cycle time:           18 days
      Revenue:              $450,000
      Spend:                 $60,000
      ROAS:                    7.5x
      

      The "cycle time" dimension matters — you can't declare the campaign a failure after 2 weeks if the average lead closes on day 18.
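      The cycle-time distribution in the chart above can be computed straight from the feedback table; a sketch (DATEDIFF argument order is dialect-dependent):

      SELECT
        CASE
          WHEN DATEDIFF(closed_ts, submitted_ts) <= 7  THEN '01_07d'
          WHEN DATEDIFF(closed_ts, submitted_ts) <= 14 THEN '08_14d'
          WHEN DATEDIFF(closed_ts, submitted_ts) <= 21 THEN '15_21d'
          WHEN DATEDIFF(closed_ts, submitted_ts) <= 28 THEN '22_28d'
          ELSE '29d_plus'
        END AS cycle_bucket,
        COUNT(*) AS closed_deals,
        AVG(DATEDIFF(closed_ts, submitted_ts)) AS avg_cycle_days
      FROM lead_outcome_feedback
      WHERE campaign_id = :c
        AND closed_won
      GROUP BY 1
      ORDER BY 1;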


      Section 6: Cross-Objective Benchmark Table

      Most senior dashboards include a benchmark row showing where this campaign sits relative to a reference set.

      WITH this_campaign AS (
        SELECT SUM(impressions) imps, SUM(clicks) clicks, SUM(spend_usd) spend, SUM(revenue) rev
        FROM metric_campaign_daily
        WHERE campaign_id = :c AND dt BETWEEN :start AND :end
      ),
      advertiser_baseline AS (
        SELECT
          PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY clicks * 1.0 / imps) AS median_ctr,
          PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY rev / NULLIF(spend, 0)) AS median_roas
        FROM (
          SELECT campaign_id,
                 SUM(impressions) AS imps,
                 SUM(clicks) AS clicks,
                 SUM(spend_usd) AS spend,
                 SUM(revenue) AS rev
          FROM metric_campaign_daily
          WHERE advertiser_id = (SELECT advertiser_id FROM dim.campaign WHERE campaign_id = :c)
            AND dt BETWEEN CURRENT_DATE - 28 AND CURRENT_DATE - 1
          GROUP BY campaign_id
        ) t
        WHERE imps > 10000
      ),
      vertical_baseline AS (
        SELECT
          PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY clicks * 1.0 / imps) AS median_ctr,
          PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY rev / NULLIF(spend, 0)) AS median_roas
        FROM (...vertical-level aggregation...) t
      )
      SELECT
        tc.clicks * 1.0 / tc.imps AS campaign_ctr,
        ab.median_ctr AS acct_median_ctr,
        (tc.clicks * 1.0 / tc.imps) / NULLIF(ab.median_ctr, 0) - 1.0 AS vs_account_pct,
        vb.median_ctr AS vert_median_ctr,
        (tc.clicks * 1.0 / tc.imps) / NULLIF(vb.median_ctr, 0) - 1.0 AS vs_vertical_pct
      FROM this_campaign tc CROSS JOIN advertiser_baseline ab CROSS JOIN vertical_baseline vb;
      

      Benchmarks should always include at least two references — the advertiser's own trailing performance, and a vertical-wide median. One reference is too easy to game.


      Section 7: Visualization Anti-Patterns

      • Dual-axis charts. Always confusing. Split into two panels.
      • Mixed-metric stacked bars. "Clicks + impressions" stacked makes no sense.
      • Percentages without denominators. "CVR dropped 30%" — from 3% to 2.1% or from 0.1% to 0.07%? Always show the base.
      • Moving-average without the raw. Hides short-term changes.
      • Color-coded-only diff. Roughly 8% of men are color-blind. Always label with a sign or arrow too.
      • Truncated Y-axis. Zoomed axes lie. Start at zero unless there's a reason not to.

      Section 8: The One-Pager the Exec Actually Reads

      At the most senior level, the scorecard shrinks to one screen with five numbers:

      CAMPAIGN HEALTH — <campaign_name>
      
        Spend:         $54,230 (91% of budget)
        Primary KPI:   ROAS 4.2x  (target 4.0x)  ✓
        Guardrail:     IVT 0.8%   (cap 2.0%)     ✓
        Trajectory:    +12% vs. last week
        Next action:   Increase budget 20%, refresh creative
      
        🟢 Running well — no intervention needed.
      

      That's it. If the exec wants more, they'll click through. If they don't, you've already told them everything they need. Senior DE mastery is knowing when to stop adding numbers.


      Section 9: Measurement Checklist (before a campaign launches)

      • Objective is declared and mapped to exactly one primary KPI.
      • At least two guardrail metrics are declared.
      • Attribution window(s) are pre-committed; no post-hoc changes.
      • Benchmark references (account, vertical) are identified.
      • Dashboard is wired to the metric registry, not a hand-rolled SQL.
      • Incrementality measurement plan exists (holdout, or lift study, or forgo).
      • D+7 lock policy is written down — numbers become immutable on day 7.
      • Advertiser approved the measurement plan in writing.

      A campaign without measurement is a spend event. With measurement, it's a learning event. The DE's job is to make sure it's the second one.


      ← Master Index · ← Part 4


      Dashboard · Growth Accounting (L28)

      Where does MAU actually come from — and where does it leak?

      • New — never active before this period
      • Resurrected — dormant ≥ 28d; active again this period
      • Retained — active last period and this period
      • Churned — active last period, not this period
      Formula: Active(t) = Retained + New + Resurrected · Active(t−1) = Retained + Churned. Net = New + Resurrected − Churned.
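      The four states can be derived from period-level activity snapshots; a sketch against a hypothetical user_period_active (user_id, period) table:

      WITH curr AS (SELECT user_id FROM user_period_active WHERE period = :t),
           prev AS (SELECT user_id FROM user_period_active WHERE period = :t - 1),
           ever AS (SELECT DISTINCT user_id FROM user_period_active WHERE period < :t - 1)
      SELECT
        CASE
          WHEN c.user_id IS NOT NULL AND p.user_id IS NOT NULL THEN 'retained'
          WHEN c.user_id IS NOT NULL AND e.user_id IS NOT NULL THEN 'resurrected'
          WHEN c.user_id IS NOT NULL                           THEN 'new'
          ELSE 'churned'   -- active last period, absent this period
        END AS state,
        COUNT(*) AS users
      FROM curr c
      FULL OUTER JOIN prev p ON c.user_id = p.user_id
      LEFT JOIN ever e ON e.user_id = COALESCE(c.user_id, p.user_id)
      GROUP BY 1;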

      Dashboard · Media Waste Waterfall · Where the Money Is Left on the Table

      $142,300 gross spend → $89,720 effective spend. Where does the other $52,580 go?

      Gross spend
      $142,300
      Invalid traffic (IVT, bots, fraud)
      −$9,000
      Pre-bid + post-bid filtering reduces to ~2%; expected loss on open exchange.
      Viewability failures
      −$7,000
      Ads rendered but not measurably visible. Pay-on-view or MRC-viewable pricing would recover this.
      Over-frequency (cap breaches)
      −$12,800
      Largest single leak. 14% of the reached audience saw >10 impressions in 7d. Enforce cap in the bidder, not post-hoc.
      Wrong-geo / wrong-lang spill
      −$3,500
      IP→geo resolver miss rate at the edges. Tighten geo predicate on the bid filter.
      Creative fatigue (CTR decay post-day 10)
      −$10,380
      Hero creative CTR down 22% in 14 days. Rotate at day 10, not day 20.
      Late-funnel attribution loss (iOS SKAN)
      −$5,000
      SKAN postback timeout on iOS installs + privacy thresholding. Supplement with server-side CAPI.
      Duplicate attribution (cross-network)
      −$4,900
      Same install credited to two networks because last-touch windows overlap. Reconcile with MMP.
      Effective spend (working media)
      $89,720
      Recoverable waste
      $52,580
      37% of gross spend
      Top fix by $ (90-day)
      Frequency cap
      Est. recovery: $10,200 / month
      Fastest fix (7-day)
      Creative rotation
      Est. recovery: $6,800 / month

      Dashboard · Campaign Improvement Opportunities · Priority Matrix (Impact × Effort)

      Six moves you can make next. Bubble size = monthly $ recovery.

      • High impact, low effort → Do now
      • High impact, high effort → Schedule
      • Low impact, low effort → Nice-to-have
      • Low impact, high effort → Defer / deprioritize
      Frequency cap $10.2K/mo
      Creative rotation $6.8K/mo
      Geo resolver $3.5K/mo
      Server-side CAPI $5.0K/mo
      Viewable pricing $2.1K/mo
      Cross-net dedupe $1.6K/mo
      Sum of monthly recovery across all six: $29.2K/mo ($350K annualized). Assumes spend mix constant; each fix treated independently.

      ↑ Back to top

      Part 06

      The Room · Real Decisions

      If you've been preparing for an ads data-engineering loop by reading about attribution models, you've been preparing wrong. I've watched strong candidates who knew more about MTA than the interviewer lose the loop inside four minutes — not because their knowledge was weak, but because they answered the wrong question with the right words. This part is not a syllabus. It is a catalogue of the specific moments where candidates go silent, and what should have come out of their mouth instead.

      Everything below is simulation, not explanation. Read it with a timer. Say the answers out loud. If any of them feel uncomfortable to say, that is the signal that you haven't yet rehearsed them — not that the answer is wrong.


      1. The Scar Files

      Each of the transcripts below is a specific moment I have seen end a loop. The candidate's answer is not wrong on the surface. The answer is wrong because the interviewer is scoring something the candidate did not know was being tested. I will tell you what was being scored in each case. You will not forget it.

      Scar 1 — The "we use Kafka" death

      Interviewer: "Walk me through how your current ads pipeline handles impression events."

      Candidate: "Sure — we use Kafka for ingest, Flink for real-time, Spark for batch, Iceberg for storage, and Looker for dashboards."

      Interviewer (silent for 3 seconds, then): "OK. Anything else?"

      Candidate: "Uh… and we have data quality checks in Great Expectations."

      Interviewer: "Got it. Let's move on."

      Loop score at this point: weak hire, trending no hire. The candidate listed tools. The interviewer was asking them to tell a story. What should have come out of their mouth:

      "We carry roughly 800K impressions per second at peak, 300K steady. Kafka is partitioned by advertiser_id to keep hot advertisers' downstream pacing co-located — we learned that the hard way when LinkedIn-scale campaigns knocked over our initial user-id partitioning in April last year. Flink consumes into a pacing KV with 5-second tolerance; batch reconciles it hourly. The painful thing in our stack right now is that Iceberg compaction is bumping against the batch window on quarter-close weeks — I'm six weeks into solving it."

      Notice what the second answer does that the first doesn't: it gives numbers, a war story (the LinkedIn-scale campaign), a pain point you've lived with, and a thing you're actively working on. Interviewers score operational ownership, not tool knowledge. This is the single most common way strong candidates look mid-level.

      Scar 2 — Attribution, confidently wrong

      Interviewer: "How do you think about attribution for a campaign that runs across display and search?"

      Candidate: "We use last-click attribution. It's standard."

      Interviewer: "And your marketing team is OK with that?"

      Candidate: "Yeah, they've always used it."

      Interviewer (neutral tone): "I see. Next question."

      The candidate doesn't know it, but the loop is over. The interviewer was testing whether the candidate has ever been in the room when this was fought over. The correct answer is not a better attribution model. The correct answer is:

      "We default to last-click for operational attribution, but we maintain a parallel MTA lens for budget reallocation decisions. Last-click systematically under-credits upper-funnel — when our marketing director cut display spend 40% last year because last-click showed it was unprofitable, search ROAS fell 30% within six weeks, because search had been catching users that display had primed. We restored display. Now we show both lenses on the same dashboard, labeled by what decision each is fit for. I'd never recommend a team unify on one attribution number, because the number and the decision are coupled."

      What's being scored: whether the candidate has been inside the consequence of last-click. The tell is the six-week lag — you cannot fabricate that specific detail unless you lived it.

      Scar 3 — The "exactly once" lie

      Interviewer: "How do you guarantee impression counts are exact?"

      Candidate: "We use Kafka's exactly-once semantics and Flink's checkpointing, so we're exactly-once end to end."

      Interviewer: "OK. What about the SDK?"

      Candidate: "…it sends to Kafka."

      Interviewer: "Right. And when the user's phone loses network, the SDK buffers and retries?"

      Candidate: "…yes."

      Interviewer: "So you're not exactly-once end to end."

      The candidate never recovers. What they should have said:

      "Exactly-once is a useful guarantee within the Kafka-to-sink path, but the SDK side is at-least-once by design — retries on network drop create duplicates we have to catch at ingest. We run a triple-key dedupe on (user_id, event_name, event_ts rounded to the second) with a 24-hour reconciliation window against a suspected-duplicates sidecar. We accept a 0.1% false-positive loss rate in exchange for ~0.8% duplicate catch. Naming that budget in the contract with the advertiser team was the hardest conversation of that project."

      The signal: senior engineers know where "exactly-once" ends. They know it ends at the SDK. Candidates who claim end-to-end exactly-once reveal they have never been paged at 3 a.m. about a 2% revenue overcount.
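      The triple-key dedupe from that answer can be sketched as a warehouse-side reconciliation pass (names illustrative; the first line of defense runs in the stream processor):

      WITH ranked AS (
        SELECT
          *,
          ROW_NUMBER() OVER (
            PARTITION BY user_id, event_name, DATE_TRUNC('second', event_ts)
            ORDER BY ingest_ts   -- keep the first-ingested copy
          ) AS rn
        FROM raw_impression
        WHERE dt >= CURRENT_DATE - 1   -- 24-hour reconciliation window
      )
      SELECT * FROM ranked
      WHERE rn = 1;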

      Scar 4 — The system design that doesn't scope

      Interviewer: "Design a pipeline for a new ads product that reports per-creative performance."

      Candidate (starts drawing): "OK so we have Kafka here, events go into bronze, then we have a silver layer…"

      Interviewer (four minutes in): "Who consumes this?"

      Candidate: "…the advertiser."

      Interviewer: "How often?"

      Candidate: "Daily?"

      Interviewer: "If it's daily, why did we just spend four minutes on a real-time ingestion layer?"

      I have watched this exchange kill three candidates this year. The cure is uncomfortably simple: do not draw a single box until you have asked who reads it and how often. The first 60 seconds of a system-design answer should be clarifying questions, not architecture.

      Scar 5 — The metric that didn't move

      Interviewer: "Tell me about a project you shipped."

      Candidate: "I built the new impression deduplication pipeline."

      Interviewer: "Great — what did it change?"

      Candidate: "It's faster and more accurate."

      Interviewer: "By how much? And what downstream behavior changed because it was more accurate?"

      Candidate: "…I'd have to check."

      If you do not know what your project changed in dollars, hours, or headcount, you shipped a feature, not an outcome. Senior candidates walk in with three numbers for every project: the number that moved, the number that stayed flat (to show you knew what to watch), and the number you'd still want to improve. "I'd have to check" closes the loop immediately.

      Scar 6 — The monitoring vacuum

      Interviewer: "How do you know your pipeline is healthy?"

      Candidate: "We have alerts on job failures."

      Interviewer: "What about when the job succeeds but the data is wrong?"

      Candidate: "We'd catch that in the downstream dashboards eventually."

      Interviewer (written in the scorecard afterwards): "No data-quality discipline. Weak on production ownership."

      "Eventually" is the word that killed this loop. The correct answer distinguishes pipeline liveness (job finished) from data correctness (output is what downstream expected), names the specific checks that run on every successful completion (row count floors, null rate ceilings, schema stability, freshness), and describes how violations route — blocking downstream until triage vs warning to a channel the owner watches.
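      That distinction translates into assertions that run after every successful load; a sketch (thresholds and names illustrative), where any returned row is a violation that blocks downstream:

      SELECT
        COUNT(*)                                            AS row_count,
        AVG(CASE WHEN user_id IS NULL THEN 1.0 ELSE 0 END)  AS user_id_null_rate,
        MAX(ingest_ts)                                      AS latest_ingest_ts   -- freshness
      FROM fact_impression
      WHERE dt = CURRENT_DATE - 1
      HAVING COUNT(*) < 1000000                             -- row-count floor
          OR AVG(CASE WHEN user_id IS NULL THEN 1.0 ELSE 0 END) > 0.001;  -- null-rate ceiling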

      Scar 7 — The "it depends" that wasn't followed up

      Interviewer: "Should we denormalize this schema for the new reporting layer?"

      Candidate: "It depends."

      (Silence.)

      Interviewer: "On?"

      Candidate: "The tradeoffs."

      This candidate had the knowledge. They had built denormalized reporting tables. They lost the loop because they did not name the axes the decision depends on. "It depends" without what-it-depends-on is indistinguishable from "I don't know." The cure, drilled to reflex: "It depends on X, Y, and Z. If X then denormalize because of tradeoff T. If not, keep normalized because of tradeoff U. I'd lean toward denormalize unless [specific condition], because a denormalized reporting table in gold is a derived view we can rebuild, while the query-performance cost of over-normalization is paid on every read."

      Scar 8 — The candidate who was too junior by three words

      Interviewer: "How would you handle a schema change in the event stream?"

      Candidate: "I'd ask my manager."

      The candidate is a competent senior engineer. But they just signaled to the interviewer that the decision does not feel like theirs to make. Staff+ roles are scored on whether you own the decision, not whether you escalate it. The answer the interviewer needed to hear: "I'd flag it to my manager as context, and I'd bring a proposed path — versioned schema with a 30-day producer notice, backward-compatible field addition, dual-write during transition, consumer migration gated by a coverage metric. If the manager has a better path, great. If not, I execute." Owning the proposal is the signal.
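      The backward-compatible-addition rule in that proposed path can be expressed as a small check. A hypothetical sketch — the `(type, has_default)` field representation is an assumption for illustration, not any real schema registry's API:

```python
def is_backward_compatible(old_schema, new_schema):
    """old_schema / new_schema: dicts mapping field name -> (type, has_default).

    A change is backward-compatible if existing consumers can still read it:
    no field removed, no field retyped, and every new field has a default.
    """
    for field, (ftype, _) in old_schema.items():
        if field not in new_schema:
            return False, f"removed field: {field}"
        if new_schema[field][0] != ftype:
            return False, f"retyped field: {field}"
    for field, (_, has_default) in new_schema.items():
        if field not in old_schema and not has_default:
            return False, f"new field without default: {field}"
    return True, "ok"
```

      Gating producer deploys on a check like this is what makes "versioned schema with a 30-day notice" an enforced contract rather than a slide.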


      2. This Sounds Smart But Gets Rejected

      Every bullet below is an answer I have heard in an interview loop that sounded competent enough that the candidate sat down thinking they did well. In every case the loop was scored down. Read each one with the assumption that your first instinct was probably the answer the candidate gave.

      "We'd use Kafka for exactly-once delivery."

      Gets rejected because: Kafka exactly-once is a guarantee within Kafka, not across the SDK-to-sink path. The interviewer is probing for the candidate's understanding of where the guarantee breaks. The right answer names the breakpoints (SDK retries, consumer side effects outside the transaction, external API calls from the processor) before it names the feature.

      "We run data-quality checks in Great Expectations."

      Gets rejected because: it is a tool, not a discipline. The interviewer wants to know which specific checks run on which specific tables, how thresholds are set (learned or hand-configured), who is paged on violation, and what the ratio of false-positive pages is. If you cannot answer those four questions, "we use Great Expectations" reads as "we installed Great Expectations."

      "I'd partition by user_id for better parallelism."

      Gets rejected because: it reveals the candidate has not thought about cardinality explosion. user_id as a partition key in a file-based store produces tens of millions of files over a retention window, and the metadata overhead kills every engine that reads it. The right instinct is partition by date plus bucketed user_id.
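      The date-plus-bucketed-user_id instinct is easy to make concrete. A minimal sketch — the bucket count, user count, and retention window are illustrative assumptions, not benchmarks:

```python
import hashlib

NUM_BUCKETS = 256  # fixed bucket count caps partitions per day

def partition_for(event_date: str, user_id: str) -> str:
    """Stable hash so the same user always lands in the same bucket."""
    bucket = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % NUM_BUCKETS
    return f"date={event_date}/bucket={bucket:03d}"

# Partition-count comparison over a 90-day retention window:
users = 50_000_000
by_user_id = users * 90        # one partition per user per day: billions
by_bucket = NUM_BUCKETS * 90   # bounded regardless of user count: 23,040
```

      The bucketed scheme still supports user-level joins (both sides hash the same way) while keeping file and partition counts bounded.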

      "We use medallion architecture: bronze, silver, gold."

      Gets rejected because: it is vocabulary. Any candidate can say this. The interviewer wants the rule for what earns a silver table, the criterion for promotion to gold, and a specific example of a row that lives in silver but not in gold (with reason). Without those specifics, "medallion" is a word, not an answer.

      "Let's put a cache in front of it."

      Gets rejected because: caching is the lazy answer to read-heavy performance. The interviewer is scoring whether the candidate thought about invalidation. A cache in front of an ads aggregation without invalidation will serve stale bid-eligibility decisions and cost real money. Naming the invalidation policy (TTL, event-driven, versioned key) is the senior signal.

      "We do idempotent writes using upserts."

      Gets rejected because: idempotent writes are easy; idempotent pipelines are hard. The interviewer probes: what about the side effects — emails sent, counters incremented, webhooks fired? Upserts handle the table. Pipelines need end-to-end idempotency, which often requires an explicit processed-events ledger the pipeline checks before acting.
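      A minimal sketch of the processed-events ledger pattern. The in-memory set stands in for what would be a durable table in production, and the flow ignores the transactional commit a real pipeline needs between running the effect and marking it done:

```python
class ProcessedLedger:
    """Check-before-act ledger: side effects run exactly once per event id."""

    def __init__(self):
        self._seen = set()  # durable, keyed storage in a real system

    def process(self, event_id, side_effect):
        if event_id in self._seen:
            return "skipped"        # duplicate delivery: do nothing
        side_effect()               # the non-idempotent action (email, webhook)
        self._seen.add(event_id)    # mark done; real systems need this atomic
        return "processed"
```

      An upsert makes the table idempotent; the ledger is what makes the email, the counter, and the webhook idempotent.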

      "Schema-on-read gives us flexibility."

      Gets rejected because: flexibility is engineer-speak for "we punted the decision downstream." The interviewer is checking whether the candidate has felt the pain of five downstream consumers each implementing a slightly different schema interpretation of the same raw events. The senior answer says: schema-on-read at bronze; enforced typed contract at bronze-to-silver; if you do it the other way you pay for it for years.

      "We'd scale horizontally."

      Gets rejected because: it is the answer that means "I haven't thought about where the bottleneck is." The interviewer is looking for whether the candidate knows which specific component is constrained: the producer, the broker, the consumer, the sink, or (most often in ads pipelines) the shuffle in the downstream aggregation. Horizontal scaling the wrong component moves the bottleneck without solving it.

      "We use Parquet because it's columnar."

      Gets rejected because: columnar is table stakes. The interviewer wants dictionary encoding, predicate pushdown, min-max statistics, bloom-filter usage, and row-group sizing decisions. If the candidate says "Parquet is columnar" and stops, they are a consumer of formats, not an engineer who has tuned one.

      "We'd denormalize for query speed."

      Gets rejected because: denormalization is a tool, not a default. The interviewer is checking whether the candidate understands the asymmetry: denormalization is easy to add, extremely expensive to remove. Senior candidates default to keeping the grain in silver and denormalizing only in gold, where the denormalization is attached to a specific consumer-facing view.

      "We use Airflow."

      Gets rejected because: "we use Airflow" says nothing about how your team treats Airflow. Who owns DAG code review? How do you test a DAG before production? How do you handle cross-DAG dependencies? How do you prevent DAG sprawl at 500+ DAGs? If the candidate has answers to those, they are senior. If not, they are downstream of someone who set up Airflow for them.

      "I would A/B test it."

      Gets rejected because: A/B testing for DE changes (not product changes) is a specific discipline. The interviewer probes: how do you split traffic? Do you backfill historical data through both pipelines for comparison? What's your minimum detectable effect, and how did you calculate sample size? "I'd A/B test" without these answers reads as a phrase the candidate learned without running one.


      3. The Ads Landmines — Where the Consequences Live

      These are not topics. They are the specific moments when an ads pipeline hurts you financially, loudly, and publicly. Every landmine below has taken someone down in a loop because they could not name the consequence — they could only name the concept.

      Landmine 1 — Last-click paid for itself until it didn't

      The landmine: leadership sees that paid-search is credited with 68% of last-click conversions and concludes that display is not working. They cut display 40%. Paid-search ROAS drops 30% six weeks later. Revenue craters for one quarter before the team reverses. That quarter is on the ads engineering team's scorecard because the attribution report they shipped told leadership display was unprofitable.

      What the interviewer is testing: whether you understand that attribution numbers are not neutral data. They are policy decisions wearing a number's clothes. The senior move is to ship last-click alongside an MTA view, and to label each view with the decision it is fit for — never a single number for a multi-channel decision.

      Landmine 2 — Conversions that arrive tomorrow, attributed to yesterday

      The landmine: your pipeline closes yesterday's books at 6 a.m. today. A conversion that arrives at 10 a.m. today, clicked yesterday, cannot be attributed to yesterday's report without rewriting yesterday's numbers. Rewriting yesterday's numbers means every dashboard that consumed yesterday silently drifts. Finance loses confidence within two quarters.

      What the interviewer is testing: whether you know the pattern — append-only, with event_time (when it happened in the world) and attribution_time (when it was credited in our books) stored separately. Dashboards choose which clock to read. History never mutates. Candidates who propose rerunning yesterday reveal they have never been in a finance-close conversation.
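      The two-clock, append-only pattern fits in a few lines. Dates and amounts below are illustrative:

```python
from datetime import date

ledger = []  # append-only: rows are never updated or deleted

def record_conversion(event_time, attribution_time, amount):
    """event_time: when it happened in the world; attribution_time: when credited."""
    ledger.append({"event_time": event_time,
                   "attribution_time": attribution_time,
                   "amount": amount})

def revenue_by(clock, day):
    """Each dashboard picks its clock; history never mutates underneath it."""
    return sum(r["amount"] for r in ledger if r[clock] == day)
```

      A conversion clicked yesterday that arrives after the 6 a.m. close lands in today's books on the attribution clock, while an analytics view on the event clock still sees it against yesterday — without rewriting anything.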

      Landmine 3 — The duplicate that grew revenue by 3% for two weeks

      The landmine: an SDK update inadvertently retries on transient server errors without marking retries as duplicates. Your revenue numbers quietly grow 3% for two weeks before anyone notices. Legal calls. The advertiser asks for a refund of the overcount. Your team spends a month on the reconciliation, another on writing a post-mortem that names which SDK version introduced the bug and when the dedupe logic failed to catch it.

      What the interviewer is testing: whether you know that duplicates are not a theoretical problem. They are an audit problem. The answer is a suspected-duplicate sidecar and a reconciliation budget negotiated with the advertiser team, so the 0.1% false-positive rate is an acceptable disclosed number rather than a surprise.

      Landmine 4 — Identity stitching that dropped 12% of conversions overnight

      The landmine: the identity team updates its graph with stricter confidence thresholds to address a privacy audit. Your pipeline now fails to stitch 12% of cross-device conversions that were previously counted. Advertiser dashboards show a 12% revenue drop that is not a revenue drop — it is a measurement drop. The distinction is lost on the advertiser, who calls the account manager, who calls your director, who calls you.

      What the interviewer is testing: whether you know that identity is an upstream dependency your pipeline does not own, and whether you have contract tests with the identity team — when they change the graph, you know before production reads it. Without a contract, every identity update is a potential incident.

      Landmine 5 — The pacing counter that ran 90 seconds behind during Black Friday

      The landmine: Kafka lag on peak. The real-time pacing counter that the bidder reads is 90 seconds stale. The bidder thinks the advertiser has $400 left when they actually have $50 left. 9,000 more auctions get won. The advertiser over-spends by $350. Multiply by every advertiser affected, for the duration of the lag spike. Finance's post-incident memo lands in your inbox with a number attached.

      What the interviewer is testing: whether you treat lag as a dollar-cost number. The senior move is a fallback pacing policy — at 30+ seconds of consumer lag, the bidder switches to conservative bidding (cap at 80% of target) automatically. Degradation is written into the design doc before the incident. Candidates who say "we'd scale Kafka" during an active incident reveal they have not lived through one.
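      The fallback policy described — conservative bidding at 30+ seconds of consumer lag, capped at 80% of target — is small enough to sketch. The thresholds are the ones from the text; in production they live in config, not code:

```python
LAG_THRESHOLD_SECONDS = 30   # staleness at which the pacing signal is distrusted
CONSERVATIVE_FACTOR = 0.80   # bid cap relative to target while degraded

def effective_bid(target_bid, consumer_lag_seconds):
    """Degrade automatically when the pacing counter is stale."""
    if consumer_lag_seconds >= LAG_THRESHOLD_SECONDS:
        return target_bid * CONSERVATIVE_FACTOR
    return target_bid
```

      The design-doc point is that this branch exists before Black Friday, so the incident response is "the system degraded as designed," not "we scaled Kafka at 2 a.m."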

      Landmine 6 — Kappa architecture that couldn't sign off on billing

      The landmine: your team migrates to streaming-only (Kappa) because it looks modern. Quarter close comes. Finance refuses to sign off on a revenue number sourced from a Flink checkpoint because it cannot be reproduced if the checkpoint is lost. You spend a weekend rebuilding a batch path. The next quarter's tech-planning meeting is heated.

      What the interviewer is testing: whether you know that Lambda is not dead; it is alive wherever a number has to survive an audit. Kappa for operational metrics, Lambda for billing. Finance will never accept a streaming-only source of truth, and senior engineers know that before they propose the architecture, not after.

      Landmine 7 — The iOS 14 revenue cliff nobody modeled

      The landmine: iOS 14.5 ships with enforced App Tracking Transparency (ATT) prompts. Opt-in rates are 15–30% industry-wide. Your iOS attribution numbers fall 60%+ overnight. Advertisers see "their iOS campaigns stopped working" and pause budget. The revenue impact is real and takes two quarters to explain with a reworked measurement stack that honors SKAN postback limits.

      What the interviewer is testing: whether you see regulatory changes as measurable engineering risks that should be modeled in advance, not as events that happen to you. Staff+ candidates maintain a privacy-impact register — a list of pending regulatory changes and the specific measurement systems each will affect. They do not get surprised.


      4. Thirty-Second Drills — Forced-Choice Tradeoffs

      For each drill below, start a timer. Answer out loud. If you pause more than ten seconds before speaking, you have not rehearsed it enough. If you do not say one of the named sentences below, you have said the wrong thing.

      Drill 1 — The 4 TB query

      Your query is correct but scans 4 TB. The interviewer is watching what you do next.

      What the junior says: "I'd add a partition filter." Why it fails: you jumped to optimization before asking what the query is for. A daily analyst query at 4 TB is fine; a 5-minute dashboard query at 4 TB is a six-figure monthly bill. The interviewer is scoring whether you distinguish between the two.

      What the senior says: "Two questions. Who runs this, and how often? If it is an analyst's one-off, 4 TB is acceptable. If it is a dashboard hitting it every 5 minutes, we materialize the aggregate in gold, not tune the scan. Which case are we in?"

      Drill 2 — SLA or correctness

      Your pipeline is late. You can ship stale-by-two-hours at 6 a.m., or you can slip to 8 a.m. and include everything.

      What the junior says: picks one. Why it fails: there is no engineer's answer — this is a stakeholder call.

      What the senior says: "If it is the CFO's revenue number, I am not making this call — I am paging the VP of Finance with both options and a recommended one in 10 minutes. If it is a team standup dashboard, I ship at 6 a.m. with a banner that says 'incomplete — reconciled at 8 a.m.' The axis is who is paying for the miss."

      Drill 3 — Facts as events or aggregated

      You are designing a click-stream schema. Thirty seconds: facts as events, or facts as aggregated impressions?

      What the senior says: "Events at bronze, always — the grain is irreversible going up and cheap going down. At silver: events unless we have no impression-level consumer, in which case daily aggregates. I ask what the hardest downstream query is; if any of them needs per-impression, the answer is events. Aggregation is an optimization; un-aggregation is a backfill."

      Drill 4 — Push back on a staff engineer's "we don't need Iceberg"

      What the senior says: "You are right that a new format has real cost. I am not proposing Iceberg because it is new — I am proposing it because of a specific pain we have lived through: we lost a day last quarter rebuilding a silver table after a schema change. If we have two more of those this year at eight engineer-days each, Iceberg's migration cost pays back in a year. If you think we will not have those incidents, I should not propose it. Can we look at the last four quarters of schema-change incidents together?"

      Drill 5 — Kafka lag on Black Friday

      What the senior says: "None of the three options in isolation. I route pacing consumption to a pre-provisioned stand-by cluster immediately — a stale pacing signal is the only thing costing money right now. I accept dashboard lag up to 5 minutes silently, page the dashboard owner at 10. I start the broker-cluster scale-up in parallel with a 30-minute mitigation horizon. Dropping data is the last resort and requires an executive decision, because the lost rows may be billable."

      Drill 6 — The conversion 7 days late

      What the senior says: "For which attribution window? Same conversion, four answers. SKAN: out of window, not counted. Paid search 7-day click: counted. MMP 28-day click: counted. Finance billing for a closed month: counted in the current month's revenue, not retroactively. The engineer's job is not to pick — it is to preserve event_time so every consumer applies its own policy."
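      "Preserve event_time so every consumer applies its own policy" can be sketched as a per-consumer window table. The 7-day and 28-day windows are the ones named in the drill; the SKAN cutoff here is an illustrative assumption, since real SKAN postback windows vary by configuration:

```python
from datetime import date, timedelta

WINDOWS = {                              # consumer -> click-to-conversion window
    "skan": timedelta(days=2),           # illustrative SKAN cutoff (assumption)
    "paid_search_7d": timedelta(days=7),
    "mmp_28d": timedelta(days=28),
}

def counted(consumer, click_time, conversion_time):
    """Same stored event, different answer per consumer's own policy."""
    return conversion_time - click_time <= WINDOWS[consumer]

click, conv = date(2024, 1, 1), date(2024, 1, 8)   # conversion 7 days late
```

      One row, three policies, three answers — which is exactly why the pipeline stores the raw event_time instead of baking any single window into the data.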

      Drill 7 — The numbers that don't match

      What the senior says: "Three possibilities, ranked: timezone, refund treatment, attribution-window drift. I reconcile by asking for the exact filters of both sides, not the queries. Both are usually correct for their purpose. The fix is almost always a labeled definition on each dashboard, not a query change."

      Drill 8 — The vendor pitch

      What the senior says: "Shadow mode 60 days. Daily divergence check against our pipeline. Kill criteria written before we sign: if divergence exceeds 2% in any week, or operational cost exceeds in-house, we walk. That reframe usually reveals that sales loves the vendor because of ownership anxiety, not the technology."

      Drill 9 — The identity merge

      What the senior says: "Two-track. Forward-only for attribution; sidecar ledger for historical reconciliation. Billing tables are immutable after monthly close. Identity changes never rewrite published revenue. Anyone proposing retroactive rewrites does not know what quarterly close looks like."

      Drill 10 — The benchmark-resistant peer

      What the senior says: "I don't know whether Postgres hits 500K/sec for our shape until we measure. Three engineer-days for a benchmark beats six months of migration on a hunch. Same schema, same partition key, same class of hardware. I'll have the numbers Friday."

      Drill 11 — Privacy vs fraud

      What the senior says: "Hashed IP with rotating salt, 7-day retention aligned to the fraud team's detection window. Privacy team gets minimization. Fraud team keeps 40% of their signal. The design decision to escalate: is there any case we absolutely need raw IP for, and if so, can we run that case in an enclave with tighter governance?"

      Drill 12 — The 10x scale question

      What the senior says: "At 10x, compaction breaks before Kafka or Spark — our file count per partition goes from 10K to 100K and the rewriter collides with the next day's load. Storage egress costs scale linearly and become the single largest line item. At 100x, the partition scheme itself has to change — hourly plus bucketed sub-partitioning. Kafka and Spark scale with configuration; the scheduling graph and the cloud bill do not."


      5. The Interviewer-Pushback Playbook

      When the interviewer pushes back, they are not saying you are wrong. They are saying "prove you can hold this under pressure." The weak move is to capitulate. The senior move is to restate with more specificity.

      "That's not how we do it here."

      "Fair. What I gave you is from my previous stack. What is different in yours, so I can tell you whether my answer changes?"

      "What if we had 100x the volume?"

      "At 10x, compaction breaks. At 100x, the partition scheme does. But the real decision is whether we plan the migration in advance or emergency-handle it later, because the migration is 3–6 months of work. What is your growth trajectory, so I can tell you whether I am designing for 10x or 100x?"

      "Isn't that over-engineered?"

      "You might be right. Let me name the specific pain I was solving — [pain]. If we don't have that pain, I simplify to [simpler design]. Which side are we on?"

      "Why not just use a vendor?"

      "Buy if three conditions hold: accuracy within 2% of ours, total cost under the equivalent of 18 engineer-months of build effort, no existential dependency that would make the vendor a single point of failure. If any fails, we build. I've seen teams default in either direction and regret it."

      "Convince me."

      "Three reasons, ranked. One — [strongest, addresses biggest risk]. Two — [second]. Three — [weakest, named as weakest]. If you kill it, my concern is [specific downside]. If you approve it, my concern is [specific downside]. I'm making a case for one path with eyes on both costs."

      "What would you regret?"

      "The decision I'd regret most is [specific choice]. It was reversible but had ongoing ops cost I may have under-weighted. Escape hatch: keep schema compatible with [alternative] so we can migrate in 6 months if it bites."

      The silence after your answer

      Junior candidates fill silence with more content, often undermining a good answer. Senior candidates wait five seconds and then say: "Did that answer what you asked, or did I miss the actual question?" This reframe preserves the good answer and surfaces what was actually being tested. Interviewers almost always tell you.


      6. What Changes After You Read This

      If you walked into this part thinking the ads interview was a knowledge test, you now know it is not. It is a pressure test where knowledge is the surface layer and judgment is what gets scored. The candidates who pass have rehearsed not concepts but responses — the pause, the question back, the named tradeoff, the admitted regret.

      Three things to do before your next loop:

      1. Record yourself answering the twelve drills with a timer. Play it back. Count the filler words, the moments you said "it depends" without what-it-depends-on, the places you named a tool instead of a tradeoff. Those are the places the loop will die.
      2. Prepare three numbers for every project on your resume. What moved. What stayed flat. What you would still want to change. If you cannot produce those numbers, you shipped a feature, not an outcome — and that is what the interviewer sees.
      3. Practice the pushback responses out loud. Not the concepts, the exact words. The distance between reading these responses silently and saying them with conviction is the distance between a mid-level offer and a staff offer.

      The uncomfortable truth: every section in this part was built from watching candidates I thought were strong lose loops on questions I thought they should pass. They did not lose on knowledge. They lost on form. You now know the form.

      Good luck.

      ↑ Back to top