Senior Ads DE Interview Prep — Master Index
"The goal isn't to memorize answers. It's to build the mental model that generates them."
A production-grade study companion for senior-level Data Engineering interviews at large consumer ads platforms. Six parts, designed to be read in order but self-contained enough for targeted cramming.
The Stack
This guide is written for a generic large consumer ads platform — the kind that serves 50B+ impressions per day across multiple surfaces (feeds, short-form video, stories, publisher network, commerce). Throughout the guide you'll see references to platform-generic concepts like "the ads platform," "the warehouse," "the streaming spine." Substitute the specifics of your target company's stack as you read:
| Generic term used here | Common real-world equivalents |
|---|---|
| Event streaming spine | Kafka, Kinesis, Pub/Sub |
| Batch orchestrator | Airflow, Dagster, Prefect |
| Data warehouse | BigQuery, Snowflake, Redshift, Iceberg + object storage |
| Distributed SQL engine | Trino, Presto, Spark SQL, Impala |
| Real-time analytics store | Druid, Pinot, ClickHouse |
| Stateful stream processor | Flink, Spark Structured Streaming, Kafka Streams |
| Aggregated metrics platform | Unified metrics layer, LookML, cube layer |
| Conversion pixel | Web tracking pixel, advertiser SDK, server CAPI |
| AI-driven performance campaigns | Auto-optimized campaign products across the industry |
The interview substance — system design, SQL mechanics, growth accounting, behavioral rubrics — transfers cleanly across all of them.
Documents
Part 0: Introduction to Ads Engineering
Foundational understanding: how ads work end-to-end, the data pipeline stack, user information tiers, privacy regimes (GDPR, CCPA, ATT), and the aggregated metrics platform pattern. If ads is new to you, read this first. ~8,500 words.
- The 7-stage ads lifecycle
- The DE pipeline stack (event → bronze → silver → gold → serving)
- The four data tiers and what each is for
- Privacy-by-design principles
- The DE mental model: the three questions you'll answer every sprint
Part 1: Mock Interview + 10 Scenarios + Practice
A full senior-level loop walked end-to-end, followed by deep dives on 10 scenarios you should be prepared to answer cold. ~35,000 words.
- Round 1: System design (unified metrics platform)
- Round 2: SQL deep dive (attribution windows, multidimensional benchmarks)
- Round 3: Python exercise (config-driven SQL generator + data validator)
- Round 4: Streaming vs batch (spend monitoring with reconciliation)
- 10 scenarios: campaign reporting, real-time delivery monitoring, attribution at scale, audience segmentation, creative A/B testing, DQ frameworks, backfills, SCD for campaign entities, incident response, senior technical leadership
Part 2: Advanced Ads + Growth Accounting + Behavioral
Scenarios 11–22: auction data, cross-platform attribution, budget pacing, creative analytics, growth accounting pipelines (DAU, MAU, lifecycle, revenue decomposition), and the behavioral / conflict scenarios that separate senior from mid-level. ~18,000 words.
- Ad auction data engineering (100B+ events/day)
- Cross-platform identity graph and attribution
- Budget pacing and spend forecasting
- Creative-level performance analytics
- Growth accounting: the L28 framework
- STAR+I behavioral framework
- Disagreeing with your manager, managing shared-table conflicts, cross-team incidents
Part 3: Deep Dive SQL
Production-ready SQL with senior commentary. Every query is accompanied by "why this and not that." ~12,000 words.
- Full DAU growth accounting pipeline (incremental history → daily classification → summary → cohort retention → WAU/MAU)
- DAU drop debugging
- Frequency capping analytics
- Advertiser ROAS pipelines
- Audience overlap analysis
- Incrementality (holdout-based lift)
- Ad delivery health monitoring
- Campaign cannibalization detection
- Ad load vs user experience
Part 4: Platform Surfaces + A-Z Glossary
Surface-specific scenarios (short-form video, stories, carousel, AI-driven shopping, dynamic product ads, cross-surface pipelines) plus a comprehensive A-Z of ads terminology. ~16,000 words.
- Short-form video ads engagement pipeline
- Stories ads tap-through funnels
- Carousel ads card-level analytics
- AI-driven automated performance campaigns
- Dynamic product ads (catalog-driven)
- Cross-surface unified pipeline
- A-Z glossary: surfaces, formats, objectives, attribution types, auction mechanics, delivery, measurement, targeting, privacy
Part 5: Visualization & Measurement of Campaign Objectives
How to measure whether a campaign worked, by objective — awareness, traffic, conversions, app installs, lead gen. Scorecards, visualizations, SQL templates. ~9,000 words.
- Campaign scorecard template
- Reach & frequency visualizations
- Conversion funnel and ROAS by attribution window
- App install funnel (install → event → retention)
- Lead quality scoring
- Cross-objective benchmarks
Recommended Study Plan (4 Weeks)
| Week | Days | Focus | Deliverables |
|---|---|---|---|
| 1 | Mon–Tue | Part 0 — foundations | Can you explain the 7-stage lifecycle without notes? Can you draw the pipeline stack? |
| | Wed–Thu | Part 3 — SQL deep dive | Write all 5 DAU queries from scratch on a whiteboard. |
| | Fri–Sun | Part 4 glossary | Every A-Z term memorized cold. Make flashcards for the ones you miss. |
| 2 | Mon–Wed | Part 1 — Round 1 (system design) | Rewrite the AMP architecture from memory three times, different framings each time. |
| | Thu–Fri | Part 1 — Rounds 2 & 3 (SQL, Python) | Solve both SQL prompts cold. Code the config-driven validator without looking. |
| | Sat–Sun | Part 1 — Round 4 + scenarios 1–5 | Record yourself answering each scenario out loud. |
| 3 | Mon–Wed | Part 1 — scenarios 6–10 | Record, review, redo. |
| | Thu–Fri | Part 2 scenarios 11–16 (advanced ads) | Can you draw the cross-platform identity graph? |
| | Sat–Sun | Part 2 scenarios 17–22 (growth + behavioral) | Prepare 6 STAR+I behavioral stories, one per loop slot. |
| 4 | Mon–Tue | Part 4 — platform-specific scenarios | Pick the 3 most relevant to your target company; drill. |
| | Wed–Thu | Part 5 — measurement & visualization | Sketch 3 dashboards from scratch. |
| | Fri | Mock loop (with a friend) | Full 5-round simulation. |
| | Sat | Weak-area drill | Fix whatever the mock exposed. |
| | Sun | Rest | Sleep. Eat protein. Show up. |
How to read this guide
- Study mode: linearly, Part 0 → Part 5.
- Cramming mode: Part 0's DE mental model section → Part 1's Round 1 → Part 3's DAU pipeline → Part 4's glossary → Part 1's 10 scenarios.
- Reference mode: jump to the specific scenario you need. Each one is self-contained.
- Behavioral prep: Part 2's last four sections. Read them a week out.
The written style is deliberate: short sentences, hard numbers, actual code. Senior-level interviews reward concreteness. Every query, every ASCII diagram, every decision framework here is something you should be able to reproduce on a whiteboard under pressure.
Good luck.
Part 0 — Introduction to Ads Engineering
"Before you can build the pipeline, you have to understand what's flowing through it. Ads is not a generic OLAP warehouse — it's a high-throughput, privacy-constrained, financially-regulated, millisecond-sensitive distributed system. Know that, and the rest of the interview follows."
This chapter is the foundation: how ads work end-to-end at a large consumer platform, the data architecture DEs own, the privacy regimes that constrain every decision, and the mental model a senior DE brings to every sprint. If you're new to the ads domain — or if your last ads job was years ago — read this first and read it carefully. Every scenario in Parts 1–4 assumes this vocabulary.
Contents
- How ads work — the end-to-end lifecycle
- The data architecture: from event to insight
- User information, data tiers, and privacy
- The aggregated metrics platform pattern
- The DE mental model: the three questions
- Vocabulary cheat sheet
1. How ads work — the end-to-end lifecycle
1.1 The 60-second version
An advertiser comes to the platform with money and a goal (sell shoes, get app installs, build brand awareness). They create a campaign, define their audience, upload creatives, and set a budget. Every time a user on the platform loads a feed or opens a story, an auction happens: multiple advertisers compete to show their ad to that specific user. The platform picks a winner in milliseconds, shows the ad, and logs the impression. If the user clicks, we log a click. If the user later converts (buys shoes, installs the app), we attribute that conversion back to the ad that drove it. Every one of those events is data a DE pipeline processes, aggregates, and surfaces to advertisers, internal analysts, and ML optimization systems.
At a large platform, the numbers are staggering:
- 50–100B+ impressions per day across all surfaces.
- 5–20 TB/day of raw event data for impressions alone, before conversions, auctions, and derived signals.
- 100k+ concurrent campaigns from millions of advertisers.
- Sub-second p99 latency for auction decisioning; minutes-to-hours for reporting pipelines.
- Cents per impression matters at this scale — at roughly $50M/day of spend, a 0.1% measurement error is $50K/day.
The DE's job is to turn that firehose into trustworthy numbers for advertisers, analysts, finance, and ML.
1.2 The full lifecycle (7 stages)
Each of the seven stages below is a system, and every system emits events. The DE owns most of what sits downstream of the first two stages.
Stage 1 — Campaign creation
The advertiser logs into the ads UI or calls the Marketing API. They define:
- Campaign: the top-level container with a single objective (e.g., `conversions`, `reach`, `app_installs`).
- Ad Set (or ad group): targeting + budget + schedule + bidding strategy.
- Ad: the creative + call-to-action that users see.
Entities are written to an OLTP store (think Postgres/MySQL/Spanner). They're replicated into the warehouse via CDC, typically with SCD Type 2 so historical analysis can reconstruct what the targeting looked like on any given day.
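To make the SCD Type 2 replication concrete, here is a minimal in-memory sketch of the close-and-open step. It is a toy, not any particular CDC tool's API; the function and field names are illustrative:

```python
from datetime import date

def scd2_apply(history: list[dict], entity_id: str,
               new_attrs: dict, effective: date) -> list[dict]:
    """Close the current row for the entity and open a new version,
    so historical queries can reconstruct what the targeting looked
    like on any given day."""
    out, changed = [], False
    for row in history:
        if (row["id"] == entity_id and row["valid_to"] is None
                and row["attrs"] != new_attrs):
            row = {**row, "valid_to": effective}  # close the old version
            changed = True
        out.append(row)
    # Open a new version on change, or on first sight of the entity.
    if changed or not any(r["id"] == entity_id for r in out):
        out.append({"id": entity_id, "attrs": new_attrs,
                    "valid_from": effective, "valid_to": None})
    return out

hist = [{"id": "c1", "attrs": {"budget": 100},
         "valid_from": date(2024, 1, 1), "valid_to": None}]
hist = scd2_apply(hist, "c1", {"budget": 200}, date(2024, 3, 1))
```

Re-applying unchanged attributes is a no-op, which is the property that makes daily CDC merges idempotent.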
Stage 2 — Audience targeting
The advertiser specifies who should see the ad:
- Demographic: age, gender, location.
- Interest: topics the user has engaged with.
- Behavioral: has installed similar apps, has converted on similar products.
- Custom audiences: lists uploaded by the advertiser (hashed emails, hashed phone numbers), matched against platform users.
- Lookalike audiences: statistical extensions of custom audiences.
- Retargeting: users who visited the advertiser's site/app.
The audience engine compiles these into a match set that auction candidates can be filtered against in sub-millisecond time. DE pipelines feed this engine — custom audience matching is an identity-resolution problem with strict privacy constraints, and cohort definitions for lookalikes are offline ML pipelines that DEs often co-own.
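At its core, custom audience matching is hashed-set intersection. A minimal sketch, assuming SHA-256 over normalized emails (the helper names are hypothetical; raw emails stay on the T0 side and never enter the matching function):

```python
import hashlib

def normalize_and_hash(email: str) -> str:
    # Convention: trim + lowercase before hashing, so the same address
    # hashes identically on the advertiser side and the platform side.
    return hashlib.sha256(email.strip().lower().encode()).hexdigest()

def match_custom_audience(uploaded_hashes: set[str],
                          platform_users: dict[str, str]) -> set[str]:
    """Return platform user_ids whose stored email hash (T1 data)
    appears in the advertiser's uploaded hash set."""
    return {uid for uid, h in platform_users.items() if h in uploaded_hashes}

# Advertiser side: hash locally, upload only hashes.
uploads = {normalize_and_hash(e) for e in ["Ana@Example.com", "bob@example.com"]}
# Platform side: user_id -> hash of the email on file.
users = {"u1": normalize_and_hash("ana@example.com"),
         "u2": normalize_and_hash("carol@example.com")}
matched = match_custom_audience(uploads, users)
```

The production version is a distributed join over billions of hashes with match-rate reporting, but the normalization contract is the part interviews probe: if either side skips it, match rates silently crater.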
Stage 3 — Auction
A user loads a feed. The ads serving stack triggers an auction. In milliseconds:
- Candidate retrieval: given the user's ID, their features, and the context (surface, session), retrieve thousands of eligible ads.
- Prediction: for each candidate, ML models predict `p(click)`, `p(conversion | click)`, `p(retention)` — whatever matches the advertiser's objective.
- Bid: compute an expected value per impression (for a cost-per-action advertiser: `bid_cpm = CPA_bid × p(conversion) × 1000`).
- Rank: order candidates by expected total value (advertiser expected value + platform utility terms like user experience).
- Second-price (or VCG) clearing: the winner pays the minimum they'd have needed to bid to win. Encourages truthful bidding; the clearing price ≠ the winning bid.
- Serve: render the ad; log the impression with full context (user, ad, surface, auction_id, bid, price, predictions).
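The bid and clearing steps can be sketched in a few lines. This is a toy: real auctions add platform utility terms and clear marginally above the runner-up, but the shape is the same (`expected_value_cpm` and `run_second_price` are illustrative names):

```python
def expected_value_cpm(cpa_bid: float, p_conversion: float) -> float:
    # bid_cpm = CPA_bid x p(conversion) x 1000, as in the Bid step.
    return cpa_bid * p_conversion * 1000

def run_second_price(candidates: list[tuple[str, float]]) -> tuple[str, float]:
    """candidates: (ad_id, expected value in eCPM). The winner pays the
    runner-up's value (the minimum it needed to bid), not its own bid."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    winner_id = ranked[0][0]
    clearing = ranked[1][1] if len(ranked) > 1 else ranked[0][1]
    return winner_id, clearing

cands = [("ad_a", expected_value_cpm(10.0, 0.002)),   # 20.0 eCPM
         ("ad_b", expected_value_cpm(25.0, 0.0005)),  # 12.5 eCPM
         ("ad_c", expected_value_cpm(5.0, 0.003))]    # 15.0 eCPM
winner, price = run_second_price(cands)
```

Note the interview-relevant detail: a higher CPA bid (ad_b's $25) loses to a lower bid with a better conversion prediction, and the winner is charged the second-highest value, not its own.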
Impressions are the atomic unit of ads data. Everything downstream joins back to them. A single day's impression log at scale is the largest event table in the warehouse.
Stage 4 — Delivery
Budget pacing, frequency capping, and quality controls govern whether to serve an ad even if it won the auction:
- Pacing: spend this ad set's daily budget evenly across the day (linear) or weighted toward high-value hours (accelerated).
- Frequency cap: don't show the same ad to the same user more than N times/week.
- Quality filter: block ads flagged as misleading, violating policy, or poorly rated by users.
Delivery systems are mostly real-time, but they depend on rolling aggregates (spend-so-far, user-impressions-this-week) that DE pipelines compute and publish to low-latency stores.
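A frequency cap is conceptually a sliding-window counter per (user, ad) pair. A minimal in-process sketch; production versions keep this state in a low-latency store fed by the stream processor, and the class name here is illustrative:

```python
from collections import deque

class FrequencyCap:
    """At most `max_impressions` per (user, ad) pair within `window_s`
    seconds. Blocked serves are not recorded, only actual impressions."""
    def __init__(self, max_impressions: int, window_s: int):
        self.max_impressions = max_impressions
        self.window_s = window_s
        self.seen: dict[tuple[str, str], deque] = {}

    def allow(self, user_id: str, ad_id: str, now_s: float) -> bool:
        q = self.seen.setdefault((user_id, ad_id), deque())
        while q and now_s - q[0] > self.window_s:
            q.popleft()                    # expire impressions outside window
        if len(q) >= self.max_impressions:
            return False                   # cap reached: suppress the serve
        q.append(now_s)
        return True

cap = FrequencyCap(max_impressions=2, window_s=7 * 24 * 3600)  # 2 per user-week
```

The DE angle: the `seen` state is exactly the "user-impressions-this-week" rolling aggregate the text mentions, and keeping it fresh within seconds is what forces this onto the streaming path rather than the warehouse.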
Stage 5 — Engagement
After serving, the user may engage:
- Main Feed: reactions, comments, shares, video plays.
- Short-form video: watch duration, completes, re-watches, mutes.
- Stories: tap-forward (skip), tap-back (re-watch), reply, sticker tap.
- Carousel: card swipes, per-card dwell.
- In-stream video: view durations, skips, completion.
Each surface has its own engagement event schema. DEs are typically the integration point: turning 15 heterogeneous engagement signals into one fact table per ad per user per session.
Stage 6 — Conversion measurement
A conversion is the advertiser's desired outcome (purchase, install, signup, lead submission) — typically happening off-platform:
- User clicks ad → lands on advertiser's site or app.
- User performs the action (adds to cart, completes checkout).
- Conversion tracking reports the event back:
- Web pixel: JS tag on advertiser site POSTs event to platform.
- Mobile SDK: in-app event forwarded to platform.
- Server-side CAPI: advertiser's backend sends event server-to-server (more reliable post-iOS ATT and cookie restrictions).
- Offline upload: advertiser batch-uploads purchases (e.g. in-store sales with loyalty ID).
- Platform matches the conversion event to the right impression/click using an attribution model (see Part 4 glossary).
The attribution join is the economic heart of ads measurement: it decides which advertiser gets credit for which sale, and therefore how dollars flow. Getting it wrong costs real money.
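The core of a last-click attribution join, sketched in miniature. The real join runs at billions of rows with late-arriving events; `last_click_attribution` is an illustrative name, and falling back to view-through attribution on a miss is omitted:

```python
from datetime import datetime, timedelta

def last_click_attribution(conversion_ts: datetime,
                           clicks: list[tuple[str, datetime]],
                           window: timedelta = timedelta(days=7)):
    """clicks: (ad_id, click_ts). Credit the most recent click within
    the lookback window; None means nothing qualifies (a real system
    would then try view-through attribution)."""
    eligible = [(ad, ts) for ad, ts in clicks
                if timedelta(0) <= conversion_ts - ts <= window]
    if not eligible:
        return None
    return max(eligible, key=lambda c: c[1])[0]  # latest eligible click wins

conv = datetime(2024, 3, 10, 12, 0)
clicks = [("ad_a", datetime(2024, 3, 1)),   # outside the 7-day window
          ("ad_b", datetime(2024, 3, 5)),
          ("ad_c", datetime(2024, 3, 9))]
credited = last_click_attribution(conv, clicks)
```

Every design decision buried in those five lines (window length, tie-breaking, clicks after the conversion) moves advertiser dollars, which is why attribution logic changes go through the most review of anything a DE ships.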
Stage 7 — Reporting and the optimization loop
The conversion event, now attributed, feeds two systems:
- Advertiser reporting: "your campaign drove 423 conversions for $8.42 CPA yesterday." Dashboards, APIs, automated rules.
- Platform optimization: the same event becomes training data for the next auction's `p(conversion)` model. Good data engineering closes this loop cleanly — garbage here means worse auctions and worse advertiser outcomes.
DEs own both paths. The first is the traditional ads reporting stack. The second is the ML training data pipeline — which has its own consistency, latency, and backfill requirements that differ from reporting.
2. The data architecture: from event to insight
2.1 The pipeline stack
Every event in the lifecycle above lands in a pipeline stack shaped roughly like the medallion flow: event → streaming spine → bronze → silver → gold → serving.
Typical tools per layer (company-dependent):
- Streaming spine: Kafka, Kinesis, Pub/Sub.
- Stream processor: Flink, Spark Structured Streaming, Kafka Streams.
- Lake storage: S3/GCS/ADLS with Iceberg/Delta/Hudi.
- Batch compute: Spark (most shops), Trino/Presto for ad-hoc, Dremio.
- Orchestrator: Airflow, Dagster, Prefect, cloud-native workflows.
- Real-time OLAP: Druid, Pinot, ClickHouse.
- Warehouse for gold: Snowflake, BigQuery, Redshift — or the lakehouse itself.
2.2 The key fact tables
Every ads DE ends up owning or touching these. Memorize the grain:
| Table | Grain (one row =) | Daily volume | Typical partitioning | Primary use |
|---|---|---|---|---|
| `fact_impression` | one ad served to one user | 50–100B | `event_date`, `hour` | auctions, reach, delivery |
| `fact_click` | one click on an ad | 1–5B | `event_date`, `hour` | CTR, traffic quality |
| `fact_engagement` | one engagement (like, share, watch_complete) | 5–20B | `event_date`, `surface` | creative quality |
| `fact_auction` | one auction decision (winner + losers) | 500B–1T | `event_date`, sampled | auction health, market dynamics |
| `fact_conversion` | one advertiser-reported outcome | 100M–1B | `event_date`, attribution window | ROAS, billing |
| `fact_spend` | one billing event | 100M | `billing_date`, `campaign` | revenue, invoicing |
Plus dimension tables:
- `dim_campaign` — campaign, ad set, ad (SCD2)
- `dim_advertiser` — account, agency, billing
- `dim_creative` — asset, format, variant
- `dim_user` — pseudonymous user + tier-controlled attributes
- `dim_audience` — saved audiences, lookalikes
- `dim_surface` — placement (main_feed, short_form_video, etc.)
2.3 The three flows through this stack
At a senior level you should be able to narrate three very different paths data takes:
Flow A: advertiser dashboard "how did my campaign do yesterday?" Impression → stream → bronze → silver (joined with conversions within attribution window) → gold (daily roll-up) → BI. Latency: typically T+3h to T+24h depending on attribution window.
Flow B: real-time delivery "am I pacing my budget?" Impression → stream → stream processor (sliding window over last 10min) → real-time OLAP store → serving. Latency: seconds. Does NOT use the warehouse path.
Flow C: ML training "what's next week's click prediction model?" Impression + click + conversion → silver → feature store → training pipeline → model artifact. Latency: daily. The join between impression and conversion has a strict waiting window (e.g., 7 days) before a training example is considered final.
Three flows. Three latency budgets. Three correctness definitions. The senior DE recognizes which flow a new requirement belongs to before picking a tool.
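Flow C's "strict waiting window" reduces to a simple maturity predicate applied before an example may enter training. A sketch, assuming a 7-day window (function name illustrative):

```python
from datetime import date, timedelta

def is_final(impression_date: date, as_of: date,
             wait_days: int = 7) -> bool:
    """An (impression, conversion) training example is final only once
    the attribution waiting window has fully elapsed; until then a
    late-arriving conversion can still flip its label."""
    return as_of - impression_date > timedelta(days=wait_days)
```

Training on immature examples systematically under-counts conversions and biases the model, which is one of the correctness definitions that makes Flow C different from Flows A and B.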
2.4 Why lambda/kappa vocabulary is alive in ads
Ads is one of the last bastions where "real-time is approximate, batch is authoritative" is still the right answer. You'll hear variants of this:
- Real-time dashboards show spend-so-far within 30 seconds, but they're allowed to drift 1–2% because exactly-once through a 15-hop pipeline is hard.
- The T+3h batch re-computes spend from bronze logs with the full attribution join — this is what the advertiser is billed for.
- Reconciliation: for every campaign, real-time spend ≈ batch spend within ε. When it isn't, an alert fires and a DE investigates.
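The reconciliation check itself is simple; the operational discipline around it is the hard part. A sketch, assuming a 2% relative tolerance ε and illustrative names:

```python
def reconcile_spend(realtime: dict[str, float],
                    batch: dict[str, float],
                    epsilon: float = 0.02) -> list[str]:
    """Return campaign_ids whose real-time spend drifts more than
    epsilon (relative) from the authoritative batch number. These are
    the campaigns that fire an alert for a DE to investigate."""
    flagged = []
    for campaign_id, authoritative in batch.items():
        observed = realtime.get(campaign_id, 0.0)  # missing => 0 observed
        if authoritative == 0:
            drift = abs(observed)
        else:
            drift = abs(observed - authoritative) / authoritative
        if drift > epsilon:
            flagged.append(campaign_id)
    return flagged

rt = {"c1": 1010.0, "c2": 430.0, "c3": 980.0}
bt = {"c1": 1000.0, "c2": 500.0, "c3": 975.0}
alerts = reconcile_spend(rt, bt)
```

Note the direction of authority: batch numbers are the denominator, because they are what the advertiser is billed for; real-time is graded against them, never the reverse.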
If someone asks "why don't you make the real-time output the one you bill from?" — that's the whole Part 1 Round 4 conversation. The answer is "correctness guarantees at this scale cost 10–100× more latency, for no measurable advertiser benefit."
3. User information, data tiers, and privacy
3.1 Why user data matters for ads
Ads effectiveness is a function of relevance; relevance requires user data. The platform's economic incentive is to use more data to serve more relevant ads, charge higher CPMs, and earn more. The user's interest is in seeing content that's useful (some relevance is welcome) without being surveilled (too much is creepy, and increasingly illegal). The DE works in the space between.
Four knobs govern the balance:
- What data is collected.
- What it's joined to.
- Who can query it.
- How long it's retained.
All four are constrained by regulation, by platform policy, and by the principle of least privilege. Every DE decision — especially partition keys, join keys, and access grants — is a privacy decision whether you realize it or not.
3.2 The data hierarchy (tiers)
Most large ads platforms adopt some variant of a tiered data model:
| Tier | Examples | Access pattern | Retention |
|---|---|---|---|
| T0 — Raw PII | Email, phone, IP, device ID, precise location | Only identity service; not joinable in analytics warehouse | Encrypted at rest; minimum retention (often 90–365 days) |
| T1 — Hashed / resolved | `user_id_hash`, `device_id_hash`, IP /24 | Accessible to identity-resolution and fraud pipelines | 90–730 days |
| T2 — Derived attributes | Age bucket, country, inferred interests, engagement scores | Accessible to targeting, modeling, reporting | 730+ days |
| T3 — Aggregated | Daily country-level impressions, weekly interest-cohort CTR | Broadly accessible | Multi-year |
| T4 — Synthesized / differentially private | DP-protected exports, k-anonymized audiences | Sharable externally under contract | Perpetual |
The senior DE thinks in tiers. Joining T0 data in a gold-layer table is a P0 privacy incident. Most gold tables live at T2 or T3.
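Thinking in tiers can be automated. A sketch of a CI-style guard that rejects over-sensitive columns in a gold-layer table; the column catalog, the default tier for unknown columns, and the function name are all illustrative policy choices:

```python
# Hypothetical column -> tier catalog; in practice this comes from a
# governance/metadata service, not a hard-coded dict.
TIER = {"email": "T0", "ip": "T0", "user_id_hash": "T1",
        "age_bucket": "T2", "country": "T2", "impressions_daily": "T3"}

def validate_gold_columns(columns: list[str],
                          min_tier: str = "T2") -> list[str]:
    """Return the columns more sensitive than allowed in a gold table.
    Running this in CI on pipeline configs turns 'joining T0 into gold
    is a P0 incident' into a build failure instead of a postmortem."""
    order = ["T0", "T1", "T2", "T3", "T4"]
    limit = order.index(min_tier)
    return [c for c in columns
            if order.index(TIER.get(c, "T2")) < limit]

violations = validate_gold_columns(["email", "country", "impressions_daily"])
```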
3.3 The privacy regulatory landscape
A partial list you should be able to name and outline:
- GDPR (EU, 2018) — lawful basis required; right of erasure; data portability; data protection impact assessments; DPO role; up to 4% global revenue fines.
- CCPA / CPRA (California, 2020/2023) — right to know, delete, opt-out of sale/sharing; sensitive personal information category; private right of action for breaches.
- LGPD (Brazil, 2020) — GDPR-like regime.
- PIPL (China, 2021) — consent-heavy; cross-border transfer restrictions.
- DMA / DSA (EU, 2024) — interoperability + transparency obligations for large platforms.
- iOS ATT (App Tracking Transparency) (Apple, 2021) — user must opt-in for cross-app tracking via IDFA; opt-in rates 15–30%; massive effect on mobile attribution.
- SKAdNetwork / Privacy Sandbox (Apple, Google) — platform-enforced attribution windows using aggregated, DP-protected signals instead of user-level joins.
- Children's privacy: COPPA (US), age-appropriate design codes in EU/UK — users under 13 (or 16 in some regimes) can't be profiled for ads.
These aren't abstract policy. Every one of them changes what a DE can put in a join key. A pipeline that works today can become illegal next year when a new regime kicks in. The senior DE builds pipelines that can be re-keyed and re-aggregated without rewriting — because they will be.
3.4 iOS ATT impact (a concrete case)
Pre-ATT (2021 and earlier): mobile advertisers measured conversions via the device advertising identifier (IDFA), which was unique per device and queryable. Post-ATT: ~70–85% of iOS users don't grant tracking permission. IDFA is blanked for them. Consequences:
- Attribution accuracy drops by 30–50% on iOS for non-consented traffic.
- Platform-specific replacements: Apple's SKAdNetwork returns aggregated, delayed, noisy postbacks.
- Server-side CAPI adoption: advertisers move conversion events off the web pixel (cookie-less) and onto server-to-server APIs with first-party identifiers.
- Deterministic → probabilistic modeling: platforms fill the gap with modeled conversions trained on consented traffic.
The DE consequence is that a single "conversions" metric is now a sum of: deterministic-matched (from CAPI), probabilistic-modeled (from ML), platform-aggregated (from SKAN). Three very different lineages feeding one number. Reporting that hides the decomposition hides the noise. Senior DEs surface it.
3.5 Privacy-by-design principles (4 rules to memorize)
1. Data minimization — collect only what you need for the stated purpose. If a report doesn't need device ID, the pipeline shouldn't carry it. "We might want it later" is not a stated purpose.
2. Purpose limitation — data collected for one purpose can't be silently repurposed. A pipeline that reads T0 data for fraud detection can't have its output joined into an advertiser-facing dashboard.
3. Retention limits — every data asset has an expiration. Raw PII: weeks. Resolved identifiers: months. Aggregates: years. Automation enforces deletion; manual processes don't scale.
4. Access control by role + purpose — not everyone on the DE team needs access to every column. Sensitive columns gated by access policy; every query audited.
Violations of these principles are how companies end up in the news. The DE is often the last line of defense.
4. The aggregated metrics platform pattern
4.1 The problem
At scale, every team wants their own gold-layer table: "daily campaign performance," "weekly advertiser ROAS," "creative-level engagement by surface." Left alone, you end up with:
- 200 gold tables, 80% overlapping.
- Three teams computing "spend" three different ways (with slightly different values).
- Dashboard numbers don't match across products.
- The DE team spends 40% of its time reconciling.
4.2 The pattern
A unified metrics platform (UMP) — also called an aggregated metrics platform, a semantic layer, or a metrics hub — centralizes metric definitions so "spend" is computed exactly once from exactly one source and everyone reads that.
A metric definition in the registry looks roughly like this:

```yaml
spend:
  source: silver.fact_impression
  expr: SUM(charge_cents)/100.0
  dimensions: [campaign_id, ad_set_id, surface, country]
  privacy_tier: T3
  owners: [ads-billing-de]
```
All three compute paths (batch pipelines, streaming jobs, and ad-hoc query tools) read the same metric definitions from the registry. The registry enforces:
- Definition — what is spend? (source table, formula, grain).
- Allowed dimensions — what can you slice by? (some dims are T3 only).
- Privacy tier — can this metric be joined with T0 columns? (no).
- Ownership — who fixes it when it breaks? (paged on-call).
- Backfill strategy — how do we recompute history safely?
4.3 The privacy–utility spectrum
The AMP controls a single knob: how much detail each cut exposes. Finer cuts (e.g., spend by age-bucket × country × hour × campaign_id) are more useful to analysts but potentially re-identifiable. Coarser cuts (e.g., spend by country × day) are safe but less useful.
The AMP can enforce this via:
- k-anonymity thresholds: suppress cells with fewer than k users.
- Differential privacy: inject calibrated noise into every released number.
- Cell suppression: hide cuts that fall below privacy thresholds; show "<k users" instead.
- Per-query privacy budget: limit how many queries a single consumer can run before noise compounds.
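Cell suppression under a k-anonymity threshold is the easiest of these to sketch. Field names are illustrative, and real systems additionally suppress consistently across related cuts so a consumer can't recover a hidden cell by differencing two queries:

```python
def suppress_small_cells(rows: list[dict], k: int = 50,
                         count_key: str = "distinct_users") -> list[dict]:
    """Apply a k-anonymity threshold to one aggregated cut. Cells
    backed by fewer than k users are masked rather than dropped, so
    consumers can see that data exists but is withheld."""
    out = []
    for row in rows:
        if row[count_key] < k:
            masked = dict(row)              # don't mutate the input row
            masked["spend"] = None          # hide the metric value
            masked[count_key] = f"<{k}"     # show "<k users" instead
            out.append(masked)
        else:
            out.append(row)
    return out

cut = [{"country": "US", "distinct_users": 1200, "spend": 5400.0},
       {"country": "IS", "distinct_users": 12, "spend": 48.0}]
released = suppress_small_cells(cut)
```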
At senior level, you should know whether your target company uses k-anonymity, DP, or pure access control. That architecture choice changes query semantics, which in turn shapes DE designs.
4.4 What a DE builds here
- The metric registry: usually YAML/JSON checked into a repo, with CI validating that every metric compiles against the source schema.
- The compiler: reads the registry and emits SQL for batch pipelines, streaming DSL for Flink, and a metadata layer for ad-hoc tools.
- The backfill engine: when a metric definition changes, re-runs the derivation for history with idempotent writes.
- The observability layer: for every metric, track freshness, row count, value range — alert on drift.
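The compiler's core idea can be sketched against a registry entry like the spend definition in 4.2. This is a toy (real compilers handle joins, filters, and streaming DSL emission), and `compile_metric_sql` is an illustrative name:

```python
def compile_metric_sql(name: str, registry: dict,
                       dimensions: list[str]) -> str:
    """Turn one registry entry into a batch SQL statement. Rejecting
    dimensions the registry doesn't allow is where 'everyone computes
    spend the same way' is actually enforced, not just documented."""
    m = registry[name]
    bad = set(dimensions) - set(m["dimensions"])
    if bad:
        raise ValueError(f"dimensions not allowed for {name}: {sorted(bad)}")
    dims = ", ".join(dimensions)
    return (f"SELECT {dims}, {m['expr']} AS {name}\n"
            f"FROM {m['source']}\n"
            f"GROUP BY {dims}")

registry = {"spend": {"source": "silver.fact_impression",
                      "expr": "SUM(charge_cents)/100.0",
                      "dimensions": ["campaign_id", "ad_set_id",
                                     "surface", "country"]}}
sql = compile_metric_sql("spend", registry, ["campaign_id", "country"])
```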
Part 1's Round 1 walks through designing this exact system end-to-end.
5. The DE mental model: the three questions
Every senior DE working in ads answers three questions every time they touch a pipeline:
Question 1 — Whose truth is this?
For every number, name:
- The authoritative source (the bronze log or SOR — system of record — that the number ultimately traces to).
- The attribution rules applied to the source (which conversions credit which impressions? under what window?).
- The reconciliation target (what must this number match? advertiser billing? finance? ML training data?).
If you can't answer all three in two sentences, the number isn't ready to ship. At senior level, "whose truth is this" is the first question in every design review.
Question 2 — What latency does this need?
Three buckets:
- Real-time (seconds): delivery monitoring, fraud, pacing, basic dashboards for "spending now." Approximate is acceptable.
- Near-real-time (minutes): advertiser "yesterday's performance" appearing during the day as the attribution window closes. Mostly accurate.
- Batch-authoritative (hours to T+1): billing, finance, ML training. Exact.
Pick one, then pick the tech. Never build one pipeline aiming to serve all three — you'll get a Frankenstein that's slow where it needs to be fast and approximate where it needs to be exact.
Question 3 — How does this scale, and fail?
- Happy-path scale: 2× growth in a year? 10× for a big launch? What breaks first?
- Failure modes: upstream event spike (campaign with a viral creative), upstream event drop (ingest outage), corrupt event (schema bug), late-arriving conversion (server CAPI lag), duplicate event (pixel fires twice).
- Blast radius: if this pipeline breaks, what depends on it? Just one dashboard? Billing? ML training? Every downstream system?
A senior DE's design doc has a failure-modes section with specific mitigations for each. "We'll monitor it" is not a mitigation.
5.1 The ads data engineering triangle
Three corners, each a cluster of demands:

- Correctness: billable, reconcilable, auditable.
- Latency: real-time, near-real-time, batch.
- Cost: storage, compute, engineering time, operational burden.
You get two. Every ads pipeline is a defensible position on this triangle. Senior-level interviews test whether you can articulate which two you picked and why, and whether you can recognize when a stakeholder is asking for all three (and coach them back to two).
6. Vocabulary cheat sheet
| Term | Meaning (short) |
|---|---|
| Impression | One ad shown to one user, one time. |
| Reach | Unique users exposed to an ad over a period. |
| Frequency | Impressions per reached user. |
| CPM | Cost per thousand impressions. |
| CPC | Cost per click. |
| CPA | Cost per action (conversion). |
| CTR | Click-through rate = clicks / impressions. |
| CVR | Conversion rate = conversions / clicks (or / impressions for view-through). |
| ROAS | Return on ad spend = revenue attributed / spend. |
| Bid | Max the advertiser is willing to pay for the desired outcome. |
| Clearing price | Actual price charged (second-price or VCG minimum). |
| Pacing | Rate at which a budget is spent through the day/week. |
| Frequency cap | Max impressions per user per period. |
| Attribution window | Lookback from conversion to the qualifying impression/click (e.g., 7-day click, 1-day view). |
| Click-through attribution | Credit to the ad the user clicked. |
| View-through attribution | Credit to the ad the user saw (but didn't click) if they later convert. |
| Multi-touch attribution | Credit distributed across touchpoints (linear, position-based, Shapley). |
| Pixel / SDK / CAPI | Three ways advertisers report conversions (web tag / mobile SDK / server API). |
| Surface | Where the ad appears (main feed, short-form video, stories, etc.). |
| Placement | Finer-grained location within a surface. |
| Creative | The actual asset (image/video/text) shown. |
| Ad set / ad group | Targeting + budget unit within a campaign. |
| Objective | The advertiser's goal (conversions, reach, app installs, etc.). |
| Custom audience | Advertiser-uploaded list matched to platform users. |
| Lookalike | ML-expanded audience based on a seed. |
| Retargeting | Ads served to users who previously visited the advertiser. |
| Incrementality | True causal lift (vs. naive attribution). |
| Holdout | Randomly withheld group for incrementality measurement. |
| Backfill | Recomputing historical data with updated logic or corrections. |
| SCD2 | Slowly changing dimension type 2 — preserves history of entity changes. |
| PII | Personally identifiable information. |
| T0–T4 | Data tiers from rawest (PII) to safest (DP aggregates). |
| ATT | Apple's App Tracking Transparency (iOS opt-in requirement). |
| SKAN / SKAdNetwork | Apple's privacy-preserving attribution for iOS. |
| AMP / UMP | Aggregated / Unified Metrics Platform — the semantic layer pattern. |
| Bronze / Silver / Gold | Medallion architecture layers (raw / cleaned / business). |
| Airflow / Dagster / Prefect | Common batch orchestrators. |
| Flink / Structured Streaming / Kafka Streams | Common stateful stream processors. |
| Trino / Presto | Distributed SQL query engines. |
| Druid / Pinot / ClickHouse | Common real-time OLAP stores. |
Closing
That's the foundation. The rest of this guide assumes you can:
- Draw the 7-stage lifecycle.
- Explain the three flows through the pipeline stack.
- Reason about privacy tiers when designing a new join.
- Identify which of the three questions a new requirement is really asking.
- Translate between the generic vocabulary here and whatever your target company calls their equivalent systems.
If any of that feels shaky, come back to this chapter before moving on. Part 1 walks into a senior interview loop with exactly these assumptions.
Mock Interview + 10 Scenarios
Part 1: Mock Interview + 10 Scenarios + Practice
"Senior interviews aren't about having the answer. They're about having the judgment to pick among three plausible answers and defend the choice."
How to use this part
A full senior-level loop at a large consumer ads platform is typically 5 rounds: system design, SQL, Python / coding, streaming-vs-batch, and behavioral. This part walks the first four as if you were in the room, then runs 10 additional scenarios you should be prepared to answer cold.
Read each round once end-to-end. Then close the page and try to reproduce the architecture, the SQL, or the code from scratch. You'll find the gaps quickly.
The Loop (5 rounds, what each is testing)
| Round | Format | What's actually being measured |
|---|---|---|
| 1 — System Design | Whiteboard / doc, 60 min | Scale thinking, trade-off fluency, pipeline taste |
| 2 — SQL Deep Dive | Shared editor, 45 min | Window functions, join shapes, anti-patterns |
| 3 — Python / Coding | Shared editor, 60 min | Design sense, extensibility, tests |
| 4 — Streaming vs Batch | Whiteboard, 45 min | Reconciliation, semantics, cost modeling |
| 5 — Behavioral | Video, 45 min | Leadership, conflict handling, STAR+I storytelling |
Behavioral is covered in Part 2 (scenarios 19–22). This part handles rounds 1–4 plus the technical scenario bank.
Round 1: System Design — Unified Metrics Platform
The prompt
"Design a Unified Metrics Platform (UMP) that serves every ad metric — impressions, clicks, spend, conversions, ROAS — to every downstream consumer: advertiser dashboards, internal analysts, the pacing system, the ML feature store, the exec scorecard. Treat it as greenfield. Scale: 50B impressions/day, 500K advertisers, 2M analysts and engineers running queries."
Clarifying questions you should ask (first 3 minutes)
- Freshness requirements. Advertiser dashboards: minutes. Pacing: seconds. Analysts: hourly to daily. Exec scorecards: daily.
- Truth model. Is there one canonical number per metric, or do different consumers get different numbers? (Answer: one canonical, but with explicit latency tiers.)
- Attribution stance. Does the platform decide attribution, or does each consumer pick? (Answer: platform decides; consumers pick the window.)
- Privacy tier. Do advertisers see user-level events? (No — only aggregates above a minimum cohort size.)
- Cost budget. Warehouse spend cap, hot-store footprint.
If you skip these, you're designing blind. A strong opening spends those first minutes clarifying before you touch the whiteboard.
Back-of-envelope sizing
- Impressions: 50B/day ≈ 580K/sec average (peak 2–3× that); roughly 2× more volume again from clicks + conversions + auction events on top.
- Wide row size at bronze: ~1.5 KB raw. 50B × 1.5 KB = 75 TB/day just for impressions.
- Storage over 2 years retention: 75 TB × 730 × 3 (fanout to silver + gold) ≈ 165 PB.
- QPS on the serving tier (advertiser dashboards): ~500K advertisers × ~2 queries / min active window ≈ 20K QPS peak; 99th percentile should be <200 ms.
- Streaming lag target: <60 sec end-to-end on the pacing path; <5 min on the advertiser path.
Write these numbers on the board before drawing anything. The interviewer wants to see the math.
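The bullets above can be reproduced as a scratch calculation — a quick Python sanity check using only the assumptions stated in the bullets (1.5 KB rows, 2-year retention, 3× fanout); none of these are measured numbers:

```python
# Back-of-envelope sizing, mirroring the bullets above.
IMPRESSIONS_PER_DAY = 50e9
ROW_BYTES_BRONZE = 1_500     # ~1.5 KB per raw impression row (assumption)
RETENTION_DAYS = 730         # 2 years
FANOUT = 3                   # bronze + silver + gold copies

events_per_sec = IMPRESSIONS_PER_DAY / 86_400
bronze_tb_per_day = IMPRESSIONS_PER_DAY * ROW_BYTES_BRONZE / 1e12
total_pb = bronze_tb_per_day * RETENTION_DAYS * FANOUT / 1_000

print(f"{events_per_sec:,.0f} events/sec")   # 578,704 (average, not peak)
print(f"{bronze_tb_per_day:.0f} TB/day")     # 75 TB/day
print(f"{total_pb:.0f} PB over retention")   # 164 PB (~165 with rounding)
```

Being able to redo this math live, with stated assumptions, is exactly what "write the numbers on the board" means.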
Architecture
Schema (facts + dims)
-- Bronze: one row per event, immutable
CREATE TABLE bronze.events (
event_ts TIMESTAMP,
event_id BINARY(16),
event_name STRING, -- impression, click, conversion, bid_win
user_id BIGINT,
device_id STRING,
session_id STRING,
ad_id BIGINT,
campaign_id BIGINT,
advertiser_id BIGINT,
surface_id INT,
placement_id INT,
geo_country STRING,
device_os STRING,
properties MAP<STRING,STRING> -- catch-all
)
PARTITIONED BY (dt STRING, event_name STRING);
-- Silver: fact_impression at its natural grain
CREATE TABLE silver.fact_impression (
dt DATE,
impression_ts TIMESTAMP,
impression_id BINARY(16),
user_id BIGINT,
ad_id BIGINT,
campaign_id BIGINT,
advertiser_id BIGINT,
surface_id INT,
placement_id INT,
geo_country STRING,
device_os STRING,
clearing_cpm DECIMAL(12,6),
spend_cents BIGINT,
is_valid BOOLEAN, -- after invalid-traffic filter
experiment_ids ARRAY<BIGINT>
)
PARTITIONED BY (dt);
-- Gold: aggregated at common consumer grains
CREATE TABLE gold.metric_campaign_daily (
dt DATE,
campaign_id BIGINT,
advertiser_id BIGINT,
impressions BIGINT,
reach BIGINT,
clicks BIGINT,
ctr DOUBLE,
spend_usd DECIMAL(18,4),
conversions_click_7d BIGINT,
conversions_view_1d BIGINT,
conversion_value_usd DECIMAL(18,4),
roas DOUBLE
)
PARTITIONED BY (dt);
-- Dim: campaigns as Type-2 SCD
CREATE TABLE dim.campaign (
campaign_id BIGINT,
advertiser_id BIGINT,
name STRING,
objective STRING,
bid_strategy STRING,
daily_budget DECIMAL(18,4),
valid_from TIMESTAMP,
valid_to TIMESTAMP, -- NULL for current version
is_current BOOLEAN
)
PARTITIONED BY (is_current);
The aggregation pipeline
Cadences:
- Streaming: 1-minute tumbling windows, watermark = 2 min.
- Hourly batch: kicks at hour+5, processes impressions from the prior hour. Dedup on impression_id. Joins dim.campaign as of event time.
- Daily rollup: kicks at 02:00 local time of the advertiser's reporting zone. Produces gold tables. This is the number the advertiser sees.
- Reconciliation: a 7-day trailing rerun adjusts for late conversions (a conversion today might attribute to a click 6 days ago).
Serving layer
- Advertiser dashboards: read from gold tables through a caching proxy with 2-minute TTL. P99 target <200 ms.
- Pacing gate: reads directly from the Redis counter (updated by the streaming path).
- Analyst queries: Trino/Presto against silver + gold. Optional materialized views for top 50 query shapes.
- Exec scorecard: reads gold, locked at end-of-day; changes to prior days require a signed-off backfill.
Failure modes you should name
- Late-arriving conversions. A click today might convert 7 days later. Batch must re-open windows for 7 days. Streaming shows a preview that will shift.
- Late data quality (DQ) discoveries. You find a bug in a filter 3 days after the fact. You must recompute; the advertiser already saw the wrong number.
- Idempotency. Retries can't double-count. Use `event_id` as the natural primary key; dedupe in silver.
- Clock skew. Client timestamps lie. Use server-receive timestamps for billing-critical metrics; keep client timestamps for UX analysis.
- Hot-key skew. A single huge advertiser can blow up a shuffle. Salt the campaign_id on the shuffle key.
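Key salting, sketched minimally in Python — `NUM_SALTS` and the helper name are illustrative, not any particular engine's API:

```python
import hashlib

NUM_SALTS = 32  # fanout per hot key; a tunable assumption

def salted_shuffle_key(campaign_id: int, event_id: str) -> str:
    """Spread one hot campaign across NUM_SALTS reducers.

    The salt is derived from the event id, so the same event always
    lands on the same reducer (deterministic under retries).
    """
    salt = int(hashlib.md5(event_id.encode()).hexdigest(), 16) % NUM_SALTS
    return f"{campaign_id}:{salt}"

# Partial aggregates are computed per salted key; a second, much
# smaller pass merges the NUM_SALTS partials back per campaign.
keys = {salted_shuffle_key(42, f"evt-{i}") for i in range(10_000)}
assert len(keys) <= NUM_SALTS  # one hot campaign -> at most 32 shards
```

The second-pass merge is cheap because it touches at most `NUM_SALTS` rows per campaign instead of the full event volume.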
Trade-offs to call out
| Dimension | Cheap / fast | Expensive / correct |
|---|---|---|
| Freshness | 24-hour batch only | Streaming + hourly batch + daily locks |
| Truth model | Each team has own pipeline | Unified platform owns definitions |
| Schema evolution | Add columns freely | SCD-2 dims + contract tests |
| Late data | Drop events >1h late | Re-open windows 7 days |
| Cost | Raw access, no materializations | Pre-aggregate top shapes |
Weak vs strong answer
A weak answer draws a single Airflow DAG, mumbles about "the warehouse," and can't say what's streamed vs batched. A mid answer gets the medallion structure right but handwaves the reconciliation story. A strong answer names late-arriving conversions as a first-class concern, quantifies freshness by consumer, and has a clear stance on where truth lives ("batch wins arguments; streaming gives you a preview"). Bonus points: mention the metric registry as the place definitions are versioned, and describe how a change flows from registry to all consumers.
Round 2: SQL Deep Dive — Attribution + Benchmarking
Prompt 1: Multi-touch click attribution
"Given raw
clickandconversionevents, attribute each conversion to the most recent click from the same user within a 7-day window. One row per conversion. Include the campaign and creative of the attributed click."
WITH c AS (
SELECT user_id, conversion_id, conversion_ts, conversion_value_usd
FROM events
WHERE event_name = 'conversion'
AND conversion_ts >= CURRENT_DATE - INTERVAL '30' DAY
),
k AS (
SELECT user_id, click_id, click_ts, campaign_id, ad_id, creative_id
FROM events
WHERE event_name = 'click'
AND click_ts >= CURRENT_DATE - INTERVAL '37' DAY -- wider window for joining
),
joined AS (
SELECT
c.conversion_id,
c.user_id,
c.conversion_ts,
c.conversion_value_usd,
k.click_id,
k.click_ts,
k.campaign_id,
k.ad_id,
k.creative_id,
ROW_NUMBER() OVER (
PARTITION BY c.conversion_id
ORDER BY k.click_ts DESC
) AS rn
FROM c
LEFT JOIN k
ON c.user_id = k.user_id
AND k.click_ts <= c.conversion_ts
AND k.click_ts >= c.conversion_ts - INTERVAL '7' DAY
)
SELECT
conversion_id,
user_id,
conversion_ts,
conversion_value_usd,
click_id,
click_ts,
campaign_id,
ad_id,
creative_id,
DATEDIFF(conversion_ts, click_ts) AS days_from_click
FROM joined
WHERE rn = 1; -- a LEFT JOIN miss still yields one row (rn = 1, NULL click columns), so organic conversions survive
Why this and not that. The ROW_NUMBER() approach is stable with ties (two clicks in the same millisecond) because you can add a secondary order key if needed. Alternatives: QUALIFY (cleaner in Snowflake), correlated subqueries (O(n²) on large tables), self-joins with NOT EXISTS (hard to read). The LEFT JOIN preserves unattributed conversions, which you almost always want for a "share of organic" metric.
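The same last-click rule, sketched in plain Python for intuition — `last_click` and its inputs are illustrative, mirroring the `rn = 1` logic rather than any library API:

```python
from datetime import datetime, timedelta

WINDOW = timedelta(days=7)

def last_click(conversion_ts, clicks):
    """Return the most recent click within the 7-day window, else None.

    `clicks` is a list of (click_ts, campaign_id) for one user.
    """
    eligible = [c for c in clicks
                if conversion_ts - WINDOW <= c[0] <= conversion_ts]
    return max(eligible, key=lambda c: c[0]) if eligible else None

conv = datetime(2026, 4, 10, 12, 0)
clicks = [(datetime(2026, 4, 1), 101),   # outside the 7-day window
          (datetime(2026, 4, 8), 102),
          (datetime(2026, 4, 9), 103)]   # most recent in window -> wins
assert last_click(conv, clicks) == (datetime(2026, 4, 9), 103)
assert last_click(datetime(2026, 3, 1), clicks) is None  # organic
```

Writing this out makes the tie-breaking question concrete: `max` needs a deterministic secondary key if two clicks share a timestamp, just as `ROW_NUMBER()` does.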
Prompt 2: Benchmark creatives against their account's rolling mean
"For each creative × day, compute CTR. Rank it against the trailing 28-day mean of CTRs across all creatives of the same advertiser. Output the top and bottom 5% by z-score per advertiser per day."
WITH daily AS (
SELECT
dt,
creative_id,
advertiser_id,
SUM(impressions) AS imps,
SUM(clicks) AS clicks,
CAST(SUM(clicks) AS DOUBLE) / NULLIF(SUM(impressions), 0) AS ctr
FROM fact_creative_perf_daily
WHERE dt BETWEEN CURRENT_DATE - 60 AND CURRENT_DATE - 1
GROUP BY 1, 2, 3
),
with_baseline AS (
SELECT
dt, creative_id, advertiser_id, imps, clicks, ctr,
AVG(ctr) OVER (
PARTITION BY advertiser_id
ORDER BY dt
ROWS BETWEEN 28 PRECEDING AND 1 PRECEDING
) AS baseline_mean,
STDDEV_POP(ctr) OVER (
PARTITION BY advertiser_id
ORDER BY dt
ROWS BETWEEN 28 PRECEDING AND 1 PRECEDING
) AS baseline_sd
FROM daily
),
ranked AS (
SELECT
*,
(ctr - baseline_mean) / NULLIF(baseline_sd, 0) AS z_score,
PERCENT_RANK() OVER (
PARTITION BY advertiser_id, dt
ORDER BY (ctr - baseline_mean) / NULLIF(baseline_sd, 0)
) AS pct_rank
FROM with_baseline
WHERE imps >= 1000 -- minimum volume gate
AND baseline_sd IS NOT NULL
)
SELECT *
FROM ranked
WHERE pct_rank <= 0.05 OR pct_rank >= 0.95
ORDER BY advertiser_id, dt, z_score;
Why this and not that. A trailing window expressed in ROWS only covers 28 days if there is exactly one row per day in the partition — here the partition is the advertiser, with many creatives per day, so `ROWS BETWEEN 28 PRECEDING AND 1 PRECEDING` counts 28 rows, not 28 days. When that matters (it usually does), switch to `RANGE BETWEEN INTERVAL '28' DAY PRECEDING AND INTERVAL '1' DAY PRECEDING`. The volume gate (`imps >= 1000`) is critical — without it, a creative with 10 impressions and 1 click has a meaningless CTR that dominates the z-score. Anti-patterns to avoid: computing the mean via a subquery per row (quadratic), using `RANK()` without the volume gate, forgetting `NULLIF` on the stddev (zero-variance advertisers blow up).
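The trailing-baseline z-score, sketched in Python with a date filter instead of row offsets — function and variable names are illustrative:

```python
from datetime import date, timedelta
from statistics import mean, pstdev

def zscore(today, ctr_today, history):
    """z-score of today's CTR vs the advertiser's trailing 28-day baseline.

    `history` maps date -> list of creative CTRs for one advertiser.
    Filtering by date (not by row offset) sidesteps the ROWS-vs-RANGE
    pitfall when days are missing or creatives vary per day.
    """
    window = [c
              for d, ctrs in history.items()
              if today - timedelta(days=28) <= d < today
              for c in ctrs]
    if len(window) < 2:
        return None          # not enough baseline; analogous to baseline_sd IS NULL
    sd = pstdev(window)
    return (ctr_today - mean(window)) / sd if sd else None

hist = {date(2026, 4, d): [0.010, 0.020] for d in range(1, 8)}
z = zscore(date(2026, 4, 8), 0.030, hist)
assert round(z, 2) == 3.0  # mean=0.015, sd=0.005 -> (0.030-0.015)/0.005
```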
Anti-patterns
- `SELECT *` on fact tables. Always project columns.
- `DISTINCT` as a crutch for broken joins. If you need `DISTINCT`, your join is wrong.
- String-concatenated keys (`user_id || '-' || ad_id`). Compose joins on typed columns; build indexes on the natural keys.
- `DATEDIFF(day, ...)` when you meant `DATEDIFF('hour', ...)`. Spot-check units.
Round 3: Python Exercise — Config-Driven SQL Generator + Validator
The prompt
"Build a library that takes a YAML metric definition and generates the SQL to compute it. Also include a pluggable data-quality framework that runs checks on the output table."
Module layout
- `metrics/defs/campaign_daily.yml` — metric definitions (YAML)
- `metrics/generator.py` — MetricDef parsing + SQL compilation
- `dq/base.py` — Check base class, registry, runner; built-in checks: row_count, null_rate, freshness
- `tests/test_generator.py` — behavior-documenting tests
The metric YAML
# metrics/defs/campaign_daily.yml
name: metric_campaign_daily
description: Per-campaign daily aggregates
grain: [dt, campaign_id, advertiser_id]
source: silver.fact_impression
time_column: dt
measures:
- name: impressions
expr: COUNT(*)
- name: reach
expr: COUNT(DISTINCT user_id)
- name: spend_usd
expr: SUM(spend_cents) / 100.0
filters:
- expr: is_valid = TRUE
dq_checks:
- check: row_count_min
params: { min: 100000 }
- check: null_rate_max
params: { column: campaign_id, max: 0.0 }
- check: freshness_max_lag_hours
params: { column: dt, max: 30 }
The generator
"""generator.py — turn a MetricDef into SQL."""
from __future__ import annotations
from dataclasses import dataclass
from typing import Sequence
import yaml
@dataclass(frozen=True)
class Measure:
name: str
expr: str
@dataclass(frozen=True)
class MetricDef:
name: str
source: str
grain: Sequence[str]
measures: Sequence[Measure]
filters: Sequence[str] = ()
time_column: str = "dt"
def load_metric(path: str) -> MetricDef:
"""Parse a metric YAML into a MetricDef."""
with open(path) as f:
raw = yaml.safe_load(f)
measures = [Measure(m["name"], m["expr"]) for m in raw["measures"]]
filters = tuple(f["expr"] for f in raw.get("filters", []))
return MetricDef(
name=raw["name"],
source=raw["source"],
grain=tuple(raw["grain"]),
measures=tuple(measures),
filters=filters,
time_column=raw.get("time_column", "dt"),
)
def compile_sql(m: MetricDef, dt_from: str, dt_to: str) -> str:
"""Compile a MetricDef into a partitioned-insert SQL block."""
select_cols = list(m.grain) + [f"{msr.expr} AS {msr.name}" for msr in m.measures]
where = [f"{m.time_column} BETWEEN DATE '{dt_from}' AND DATE '{dt_to}'"]
where.extend(m.filters)
group_by = ", ".join(str(i + 1) for i in range(len(m.grain)))
return (
f"INSERT OVERWRITE TABLE gold.{m.name}\n"
f"SELECT\n " + ",\n ".join(select_cols) + "\n"
f"FROM {m.source}\n"
f"WHERE " + "\n AND ".join(where) + "\n"
f"GROUP BY {group_by};"
)
The DQ framework
"""dq/base.py — check base class + runner."""
from __future__ import annotations
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Mapping
@dataclass
class CheckResult:
name: str
passed: bool
observed: Any
threshold: Any
message: str = ""
class Check(ABC):
"""A data-quality check. Subclasses implement run()."""
def __init__(self, params: Mapping[str, Any]):
self.params = params
@abstractmethod
def run(self, conn, table: str) -> CheckResult:
raise NotImplementedError
class RowCountMin(Check):
"""Fail if the table has fewer than `min` rows for today's partition."""
def run(self, conn, table):
n = conn.scalar(f"SELECT COUNT(*) FROM {table} WHERE dt = CURRENT_DATE")
ok = n >= self.params["min"]
return CheckResult(
name="row_count_min",
passed=ok,
observed=n,
threshold=self.params["min"],
message="" if ok else f"row count {n} < {self.params['min']}",
)
class NullRateMax(Check):
"""Fail if fraction of NULLs in `column` exceeds `max`."""
def run(self, conn, table):
col = self.params["column"]
q = (
f"SELECT SUM(CASE WHEN {col} IS NULL THEN 1 ELSE 0 END) * 1.0 / "
f"NULLIF(COUNT(*), 0) FROM {table} WHERE dt = CURRENT_DATE"
)
rate = conn.scalar(q) or 0.0
ok = rate <= self.params["max"]
return CheckResult("null_rate_max", ok, rate, self.params["max"])
class FreshnessMaxLagHours(Check):
"""Fail if newest row is older than `max` hours."""
def run(self, conn, table):
col = self.params["column"]
q = f"SELECT EXTRACT(EPOCH FROM (NOW() - MAX({col}))) / 3600 FROM {table}"
lag = conn.scalar(q)
ok = lag <= self.params["max"]
return CheckResult("freshness_max_lag_hours", ok, lag, self.params["max"])
CHECK_REGISTRY = {
"row_count_min": RowCountMin,
"null_rate_max": NullRateMax,
"freshness_max_lag_hours": FreshnessMaxLagHours,
}
def run_checks(conn, table, dq_cfg) -> list[CheckResult]:
results = []
for cfg in dq_cfg:
cls = CHECK_REGISTRY[cfg["check"]]
results.append(cls(cfg.get("params", {})).run(conn, table))
return results
Tests
# tests/test_generator.py
from metrics.generator import MetricDef, Measure, compile_sql
def test_compile_sql_basic():
m = MetricDef(
name="m_test",
source="silver.fact_x",
grain=("dt", "campaign_id"),
measures=(Measure("imps", "COUNT(*)"),),
filters=("is_valid = TRUE",),
)
sql = compile_sql(m, "2026-04-01", "2026-04-01")
assert "INSERT OVERWRITE TABLE gold.m_test" in sql
assert "COUNT(*) AS imps" in sql
assert "is_valid = TRUE" in sql
assert "GROUP BY 1, 2" in sql
What the interviewer is looking for
Not "does it work." They want: explicit types, a registry pattern for extensibility, separation of concerns (parse, compile, run), tests that document behavior, and a DQ framework that cleanly accepts new check types. The ability to talk through "how would I add a new check?" in 30 seconds is the differentiator.
Round 4: Streaming vs Batch — Spend Monitoring with Reconciliation
The prompt
"An advertiser sets a $10,000 daily budget. The platform must stop delivery at 95% of cap in under 60 seconds. At end of day, recompute actual spend from billing-truth data and reconcile. Design the full pipeline; discuss failure modes."
The two-loop design (again, because it's core)
- Fast loop (seconds): impression events → stream processor → `spend_counter_kv` → pacing gate.
- Slow loop (hours): billing pipeline (batch) → `fact_spend_billed_hourly` → reconciliation job (hourly) → corrects the KV counter.
Fast loop is at-least-once with idempotent writes. Slow loop is exactly-once with immutable outputs.
Streaming pseudocode
# Flink-style stream topology (conceptual)
events = (
kafka_source(topic="impressions", group="pacing")
.assign_watermark(max_out_of_order=timedelta(seconds=30))
.filter(lambda e: e.event_name == "impression")
)
# Dedup on impression_id (bounded state: 24-hour TTL)
deduped = events.key_by("impression_id").process(DedupByKey(ttl=timedelta(hours=24)))
# Windowed spend aggregation (1-min tumbling windows, 15-min allowed lateness)
windowed = (
deduped
.key_by("campaign_id")
.window(Tumbling(duration=timedelta(minutes=1)))
.allowed_lateness(timedelta(minutes=15))
.aggregate(SumSpendCents())
)
# Publish to the pacing KV
windowed.sink_to(RedisSink(key_fn=lambda c, w: f"spend:{c.campaign_id}:{w.date}"))
Watermarking
- Event time is the client-reported or server-received timestamp.
- Watermark is the maximum event time the processor has observed, minus `max_out_of_order`. A window fires when the watermark passes the window end.
- Late events (behind the watermark) are routed to a side output, written to a reconciliation topic, and absorbed by the next hourly batch.
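The watermark mechanics, sketched as a tiny Python class — a conceptual model of bounded out-of-orderness, not a real Flink API:

```python
from datetime import datetime, timedelta

class WatermarkTracker:
    """Watermark = max observed event time minus the allowed disorder."""
    def __init__(self, max_out_of_order: timedelta):
        self.max_ooo = max_out_of_order
        self.max_seen = datetime.min

    def observe(self, event_ts: datetime) -> None:
        self.max_seen = max(self.max_seen, event_ts)

    @property
    def watermark(self) -> datetime:
        return self.max_seen - self.max_ooo

    def is_late(self, event_ts: datetime) -> bool:
        # Late events go to the side output / reconciliation topic.
        return event_ts < self.watermark

wm = WatermarkTracker(timedelta(seconds=30))
wm.observe(datetime(2026, 4, 1, 12, 0, 45))
assert wm.watermark == datetime(2026, 4, 1, 12, 0, 15)
assert wm.is_late(datetime(2026, 4, 1, 12, 0, 10))      # behind the watermark
assert not wm.is_late(datetime(2026, 4, 1, 12, 0, 20))  # still in order
```

Note the watermark only ever advances: one event with a future timestamp drags it forward and marks everything older as late, which is why clock-skew hygiene matters upstream.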
Batch reconciliation SQL
-- Hourly job: compare ground-truth billed spend to streaming counter
WITH billed AS (
SELECT
campaign_id,
DATE(impression_ts) AS dt,
DATE_TRUNC('hour', impression_ts) AS hr,
SUM(spend_cents) AS billed_cents
FROM silver.fact_impression_billable
WHERE dt = CURRENT_DATE
AND impression_ts >= CURRENT_TIMESTAMP - INTERVAL '2' HOUR
GROUP BY 1, 2, 3
),
counter_snapshot AS (
SELECT campaign_id, dt, hr, counter_cents
FROM pacing_counter_snapshot
WHERE dt = CURRENT_DATE
AND hr = DATE_TRUNC('hour', CURRENT_TIMESTAMP - INTERVAL '1' HOUR)
)
SELECT
b.campaign_id, b.dt, b.hr,
b.billed_cents,
COALESCE(c.counter_cents, 0) AS counter_cents,
b.billed_cents - COALESCE(c.counter_cents, 0) AS delta_cents
FROM billed b
LEFT JOIN counter_snapshot c
USING (campaign_id, dt, hr)
WHERE ABS(b.billed_cents - COALESCE(c.counter_cents, 0)) > 0.01 * b.billed_cents;
-- Any delta > 1% is absorbed into the counter by the next pass
Can you get rid of the batch layer (Kappa)?
An interviewer sometimes pushes: "why do you need the batch layer at all? Can't streaming be the source of truth?" The right response:
- Yes, theoretically. Kappa architecture has been defensible for 10 years.
- In ads practice, no. Billing requires provable determinism: same inputs, same outputs, every time. Streaming with late data, out-of-order events, and consumer failover cannot make that guarantee cheaply at 50B/day.
- So batch is the auditable system of record; streaming is a fast approximation with reconciliation. This is how every large ads platform works in 2026.
Mention the cost too: a pure-Kappa pipeline re-processes the entire history every time a bug is fixed. Batch corrections are cheaper at scale.
Semantics
- Exactly-once end-to-end is achievable in a single system (Flink + Kafka with transactional sinks) but fragile across system boundaries.
- At-least-once + idempotent sinks is the norm. Dedupe on `impression_id` at the sink; every downstream consumer can be replayed without double-counting.
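The at-least-once + idempotent-sink pattern, sketched in Python — class and field names are illustrative; a real sink would bound the seen-set with a TTL, as in the Flink topology above:

```python
class IdempotentSpendSink:
    """At-least-once input, exactly-once effect: dedupe on impression_id
    before incrementing the counter."""
    def __init__(self):
        self.seen: set[str] = set()          # real sinks: TTL'd state, not a set
        self.spend_cents: dict[int, int] = {}

    def write(self, impression_id: str, campaign_id: int, cents: int):
        if impression_id in self.seen:
            return  # retry or replay: drop, don't double-count
        self.seen.add(impression_id)
        self.spend_cents[campaign_id] = self.spend_cents.get(campaign_id, 0) + cents

sink = IdempotentSpendSink()
for _ in range(3):                      # same event delivered 3 times
    sink.write("imp-1", 42, 250)
sink.write("imp-2", 42, 100)
assert sink.spend_cents[42] == 350      # each impression counted once
```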
The 10 Scenarios
Each is written for ~5 minutes of whiteboard time. The pattern is the same: restate the prompt, name the grain, draw the pipeline, call out the 3 biggest pitfalls, end with a follow-up question the interviewer will respect.
Scenario 1: Campaign Reporting Pipeline (batch)
Grain: (dt, campaign_id, audience_id, placement_id). Sources: fact_impression, fact_click, fact_conversion_attributed, dim_campaign. Cadence: hourly + daily locked. Output: gold.metric_campaign_daily, served to advertiser dashboards.
Pitfalls: (a) late conversions re-opening windows for 7 days — every day's number is mutable until D+7; (b) time-zone mismatch between the advertiser's reporting zone and UTC — store in UTC, convert on read; (c) dim_campaign SCD-2 not snapshotted at event time — you'll see "current" campaign names on historical data, which confuses advertisers who renamed their campaign.
Follow-up: "When does a number become immutable?" Answer: D+7 lock, or sooner if you're confident all late data has arrived.
Scenario 2: Real-Time Delivery Monitoring (streaming)
Need: under-delivery alert within 5 minutes. Architecture: streaming job consumes impressions, maintains delivered_so_far per campaign, compares to pacing_curve[hour], emits alert if delivered < 0.7 × expected sustained for 10 minutes.
Pitfalls: (a) the pacing curve itself is an assumption; if auction density surprises you, alert fires wrongly — gate on absolute thresholds too; (b) start-of-day noise: first 30 minutes are always noisy, delay alerts; (c) opaque alert: always attach "why under-delivering" (budget exhausted, bid too low, audience too narrow) — an alert without a diagnosis is useless.
Follow-up: "What do you do if the alert fires and the campaign is fine?" Answer: tune thresholds, add suppression, build a playbook for reviewers.
Scenario 3: Attribution at Scale (ID graph + windowing)
Covered in depth in Part 2 Scenario 12, but the short form: resolve each event to an id_graph component; collect clicks + impressions per component; for each conversion, apply the attribution rule (last click 7d, last view 1d); emit one row per attributed conversion.
Pitfalls: (a) component explosions from bad probabilistic edges — cap component size; (b) attribution rules must be versioned, because advertisers will debate your choices; (c) the "unattributed" bucket is first-class — make it visible so growth isn't understated.
Follow-up: "How do you A/B test attribution rules?" Answer: shadow attribution in parallel; log both; let analysts compare.
Scenario 4: Audience Segmentation
Problem: advertisers define audiences ("users interested in running, age 25–40, US"). Two flavors:
- Static audiences — user list uploaded by the advertiser. Stored as a Bloom filter per audience, refreshed daily.
- Dynamic audiences — rule-based over user attributes. Recomputed hourly or on-attribute-change.
Pitfalls: (a) audience overlap — a user can be in 50+ audiences, each matching at auction time adds latency; (b) freshness mismatch — a dynamic audience refreshed hourly can lag user state by 58 minutes; (c) privacy — audiences below a minimum size (e.g., 1000 users) must be blocked for targeting to prevent inference attacks.
Schema:
CREATE TABLE audience_membership (
audience_id BIGINT,
user_id BIGINT,
added_ts TIMESTAMP,
source STRING -- 'upload', 'rule', 'lookalike', 'interaction'
) PARTITIONED BY (audience_id);
Follow-up: "How do you measure audience overlap efficiently?" Answer: HyperLogLog sketches per audience, intersect via MinHash estimates.
Scenario 5: Creative A/B Testing
Covered in Part 2 Scenario 14 for the analytics side. Key DE pitfalls:
- Unit of randomization. User, not impression. Impression-level assignment contaminates.
- Pre-exposure bias. Don't count metrics before the user was first exposed.
- Novelty effect. Drop the first 3 days of every creative experiment.
- Multiple comparisons. An experiment with 8 creatives needs corrected p-values.
Follow-up: "If two creatives are statistically tied after two weeks, how do you decide?" Answer: check secondary metrics (frequency, fatigue, halo effects) or declare a tie and rotate.
Scenario 6: DQ Frameworks at Scale
Real DQ has three axes:
- Integrity. Schema matches contract, no NULLs where forbidden, FKs resolve.
- Freshness. Last-row timestamp within SLA.
- Drift. Row count, average metric values, distribution stability versus trailing baseline.
Architecture:
Pitfalls: (a) alert fatigue — every check has a priority and SLA; (b) dq_result table must be first-class — downstream consumers read it to gate their own jobs; (c) checks must be versioned — adding a new check shouldn't retroactively fail yesterday.
Follow-up: "How do you roll out a new, stricter check?" Answer: shadow mode (logs but doesn't block) for 2 weeks, then gate.
Scenario 7: Backfills
Requirements: bounded, idempotent, cheap. A backfill that re-runs 90 days of daily rollups must produce byte-identical output to the original (modulo bug fixes). Key patterns:
- Partition-aware writes. `INSERT OVERWRITE PARTITION (dt = ...)`, never `DELETE` then `INSERT`.
- Deterministic randomness. Any random sampling uses a seed derived from `(dt, partition_key)`.
- Idempotent sources. Bronze partitions must be immutable; if you re-run yesterday, you read the same bronze partition.
- Cost gate. A 90-day backfill at 50B events/day can cost $50K; require manager approval for any backfill > 30 days.
Follow-up: "How do you detect when a backfill produced different numbers than expected?" Answer: a fingerprint check — compare row count and key metric totals, pre/post, and alert on deltas > 0.1%.
Scenario 8: SCD for Campaign Entities
Campaigns change over time. A bid strategy flip at noon matters for attribution.
CREATE TABLE dim.campaign_history (
campaign_id BIGINT,
valid_from TIMESTAMP,
valid_to TIMESTAMP,
name STRING,
objective STRING,
bid_strategy STRING,
daily_budget DECIMAL(18,4),
change_reason STRING
);
-- Query "as-of" a given impression time
SELECT f.impression_id, c.bid_strategy, f.clearing_cpm
FROM fact_impression f
JOIN dim.campaign_history c
ON f.campaign_id = c.campaign_id
AND f.impression_ts >= c.valid_from
AND f.impression_ts < c.valid_to;
Pitfalls: (a) valid_to must default to far-future, not NULL, to use range joins; (b) changes must be written atomically — a dual-write pattern where old row's valid_to closes and new row opens in the same transaction; (c) time-travel queries must always pass the event time.
Follow-up: "How do you deal with changes that applied retroactively?" Answer: new row with a retroactive_from flag; downstream chooses whether to honor retroactively.
Scenario 9: Incident Response — Spend Spike at 3am
"Page: platform-wide spend is running 40% above forecast. You're on-call. Walk me through the first 30 minutes."
Runbook:
- Confirm the signal. Is it a single advertiser, a single surface, or broad? Look at `metric_platform_minute` sliced by surface, geo, campaign cohort.
- Localize. Often it's one mega-advertiser with a bid bug — their campaign suddenly bidding $100 CPM. Or a pacing-system bug that disabled all gates.
- Triage — is customer action needed? If single advertiser and their fault: inform account team, no platform action. If platform-wide: consider emergency pacing override.
- Inform stakeholders. 15-minute update cadence until stable.
- Post-incident: forensic pipeline run, exact delta spend, refund decisions.
Pitfall: the temptation to "fix first, investigate later." Don't. You might kill a legitimate advertiser spending burst (Super Bowl, product launch). Always confirm anomaly vs legitimate.
Scenario 10: Senior Technical Leadership — Platform Migration
"Your team needs to migrate from an aging warehouse to a new one. 500+ pipelines, 2000+ dashboards, 30+ downstream teams. Walk me through the plan."
Shape:
- Phase 0 (1 month): inventory. Build the lineage DAG. Identify the 50 tables that feed 80% of consumers.
- Phase 1 (3 months): dual-write. New warehouse runs in parallel for the top 50 tables. Validate byte-by-byte for 2 weeks per table.
- Phase 2 (6 months): consumer migration. Each consumer migrates on their schedule. Provide tooling (SQL-dialect translator, schema diff checker).
- Phase 3 (2 months): deprecation. Old warehouse is read-only. Set an end-of-life date 6 months out.
- Phase 4: decommission.
Risks to call out: SQL-dialect differences (forget one CASE expression and a dashboard breaks silently), cost spike during dual-write, political cost if one team's blocker delays the whole migration.
Governance: weekly stakeholder review; public dashboard of migration progress; a "red/yellow/green" status per consumer. Transparency is how you stay credible.
Appendix: Loop Preparation Checklist
- I can draw the UMP architecture from memory.
- I've written the attribution SQL cold, twice.
- I've coded the config-driven generator + DQ framework.
- I have 6 STAR+I behavioral stories prepped (see Part 2).
- I can name the top 5 pitfalls of each of the 10 scenarios.
- I know the current numbers cold: impressions/day, storage/day, QPS.
- I've read Part 4's glossary and don't blink at any term.
- I've done at least one full mock loop with a friend.
- Sleep. Eat protein. Show up.
Advanced Ads, Growth & Behavioral
Part 2: Advanced Ads + Growth Accounting + Behavioral
"Mid-level candidates are judged on what they can build. Senior candidates are judged on what they'd change, what they'd push back on, and what they'd tell the room nobody else will say. Scenarios 11–22 are where that line gets drawn."
How to read this part
Part 1 walked a full five-round loop plus ten core technical scenarios. Part 2 picks up the scenario stream at number 11 and runs through 22. The first eight are advanced ads engineering — auction mechanics, identity graphs, pacing, creative analytics, growth accounting, experimentation, lineage, cost — all the places a senior should have strong opinions. The final four are behavioral — the STAR+I framework plus three of the most common senior-level prompts.
None of these scenarios are "draw a diagram." They're "here's a messy real-world situation — what do you do." Senior interviewers are looking at three things:
- Substance. Do you understand the actual mechanics, not just the buzzwords?
- Taste. Do you know what's worth building vs. what's a distraction?
- Communication. Can you explain a trade-off in 90 seconds without hand-waving?
The scenarios below are written with all three in mind. The prose is deliberately dense. Read each one twice.
Scenario 11: Ad Auction Data Engineering at 100B+ Events/Day
The prompt
"Walk me through how you'd design the data pipeline for the ad auction system. You're logging auction events at 100B+ per day. What's streamed, what's batched, what's sampled, what's dropped, and why?"
The funnel
Every ad shown to a user is the output of an auction. The funnel is roughly:
- bid_request (100%, ~1T/day) → candidate_selection (top ~5000, candidate pool) → ranking (top ~50, ranked pool) → auction (top ~5, auction pool) → clearing (winner, ~50B/day) → impression (rendered, ~50B/day)
A single ad request typically produces:
- 1 bid_request event — what the user asked for, what slot was available.
- ~5,000 candidate events — every ad that was eligible before any ranking happened.
- ~50 ranked events — ads that survived the first ranker pass.
- ~5 auction events — final auction participants (after floor price, diversity constraints, policy filters).
- 1 clearing event — who won, at what price, why.
- 1 impression event — actually rendered (if the client confirmed render).
If a large platform runs 50B impressions/day, the upstream funnel is roughly:
| Stage | Events/day | Storage if logged raw |
|---|---|---|
| bid_request | 200B–1T | ~50–250 TB/day raw |
| candidates | 250T+ (50B × 5000) | Infeasible to log raw |
| ranked | 2.5T (50B × 50) | ~600 TB/day raw |
| auction | 250B (50B × 5) | ~60 TB/day raw |
| clearing + impression | 50B each | ~12 TB/day each raw |
You can't log the candidate pool raw. You can't even log the ranked pool raw at this scale. That's the core problem.
Schema
The auction fact grain is the ad_request × slot × stage composite. You keep one wide row for the terminal stages (auction + clearing + impression) and a sampled fanout for the upstream stages.
-- Terminal: one row per impression, fully dimensionalized
CREATE TABLE fact_auction_clearing (
ad_request_id BINARY(16) NOT NULL, -- uuid
slot_id STRING NOT NULL, -- surface × placement key
ts_auction_us BIGINT NOT NULL, -- micros since epoch
user_id BIGINT, -- null if logged-out
surface_id INT, -- Main Feed / Visual Feed / Short-form Video / Stories / Publisher Network
placement_id INT, -- feed position, story card, etc.
geo_country STRING,
device_os STRING,
winning_campaign_id BIGINT NOT NULL,
winning_ad_id BIGINT NOT NULL,
winning_advertiser BIGINT NOT NULL,
winning_bid_cpm DECIMAL(12,6),
clearing_cpm DECIMAL(12,6), -- second-price or generalized-second-price
floor_cpm DECIMAL(12,6),
num_auction_entrants INT,
num_ranked INT,
num_candidates BIGINT, -- usually estimated, not exact
auction_format STRING, -- 'vcg', 'gsp', 'first_price'
was_impressed BOOLEAN, -- did the client render
impression_ts_us BIGINT, -- null until impression event arrives
policy_filters_hit ARRAY<STRING>, -- audit trail
experiment_ids ARRAY<BIGINT>
)
PARTITIONED BY (dt STRING, hour INT)
CLUSTERED BY (winning_advertiser, winning_campaign_id);
-- Sampled upstream: losing bids in the auction pool (kept for counterfactual research)
CREATE TABLE fact_auction_losers_sampled (
ad_request_id BINARY(16) NOT NULL,
slot_id STRING NOT NULL,
ts_auction_us BIGINT NOT NULL,
campaign_id BIGINT NOT NULL,
ad_id BIGINT NOT NULL,
advertiser_id BIGINT NOT NULL,
bid_cpm DECIMAL(12,6),
rank_score DOUBLE, -- pre-auction ranker score
rank_position INT, -- final rank before auction
filter_reason STRING, -- why did they lose
sample_weight DOUBLE -- inverse probability — critical for unbiased rollups
)
PARTITIONED BY (dt STRING);
Notice sample_weight. That's not optional. Without it, every downstream aggregate is biased in ways nobody will notice until an advertiser calls their account manager furious.
Sampling strategy
You have three choices at each upstream stage:
Uniform sampling — drop 99 out of every 100 events at random. Simple, but it under-weights the tail. If 0.1% of auctions are for very high-value placements, you lose most of them.
Stratified sampling — different rates per stratum. For example, keep 100% of auctions above $50 CPM, 10% between $5 and $50, 1% below $5. You preserve tail behavior. The cost is carrying the strata definitions as a first-class config. Re-sampling historically when the strata change is a pain.
Tail-biased / weighted sampling — oversample rare events (new campaigns, rare surfaces, rare geographies), undersample common ones. Use inverse-probability weighting downstream. This is what most mature platforms end up doing. The mental model is: "every event carries its own weight; aggregates multiply by SUM(weight) instead of COUNT(*)."
At the candidate-pool stage, most platforms don't log the full pool even sampled. Instead they log "reason for not winning" aggregated counters (policy filter, floor price, budget exhausted, pacing gate, bid too low, diversity constraint). That's enough to answer 90% of research questions and costs ~1000x less to store.
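The weight mechanics are easy to get wrong, so here's a minimal sketch (illustrative Python, not any particular platform's sampler): each kept event carries an inverse-probability sample_weight assigned at log time, and every downstream rollup sums weights instead of counting rows. The stratum names and keep-rates are assumptions.

```python
import random

def sample_event(event, keep_rates):
    """Keep an event with its stratum's keep probability; attach the
    inverse-probability weight so downstream rollups stay unbiased.
    `keep_rates` maps stratum name -> keep probability (assumed config)."""
    p = keep_rates[event["stratum"]]
    if random.random() < p:
        kept = dict(event)
        kept["sample_weight"] = 1.0 / p   # inverse-probability weight
        return kept
    return None  # dropped

def weighted_totals(sampled):
    """Estimate true event count and spend: SUM(weight), not COUNT(*)."""
    est_events = sum(e["sample_weight"] for e in sampled)
    est_spend = sum(e["sample_weight"] * e["bid_cpm"] for e in sampled)
    return est_events, est_spend
```

Note that an event kept at a 1% rate represents ~100 events in every aggregate; forgetting to multiply by the weight is exactly the silent bias the prose warns about.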
What's streamed, what's batched
bid_request → streaming (real-time pacing signals, fraud detection)
candidate pool → aggregated counters only, streamed
ranked pool → sampled, streamed (small sample, ~1%)
auction pool → kept for winners, sampled for losers, streamed
clearing → streamed + logged to bronze
impression → streamed from client, needs careful dedup
conversion → streamed from advertiser pixel / server API, re-joined in batch
Streamed because the pacing system, fraud system, and real-time dashboards all need them fresh. Impressions in particular need to be closed against click and conversion within minutes.
Batched is where the attribution join, holistic ROAS computation, and advertiser-facing reporting are assembled. Batch owns the source of truth; streaming is the early preview.
Why keep the losing bids?
Three reasons.
Research. "If we'd raised the floor by 10%, how much revenue would we have lost?" Can't answer without counterfactuals.
Floor-price tuning. Reserve prices are tuned per (surface, placement, audience, hour, geography). That tuning needs a decade of prior auctions to do well. Keep the losing bids at least for training data.
Advertiser debugging. "Why didn't my campaign serve yesterday?" The answer is almost always upstream of the auction — rank score too low, bid too low, filter hit. You need to show that trail. If you only log winners, the answer to that question is "I don't know."
Senior-grade trade-off to mention
A weak answer logs everything and blows up the warehouse. A mid answer samples uniformly and silently biases every rollup. A senior answer logs terminal events fully, samples upstream events stratified by value, carries sample_weight through every aggregate, and builds a counter-based fallback for the ultra-high-cardinality stages. And they explain the bias trade-off in under a minute.
Scenario 12: Cross-Platform Identity Graph and Attribution
The prompt
"Users show up on your ad platform from multiple surfaces, on multiple devices, sometimes logged in, sometimes not. Design the identity graph. How do you attribute a conversion that happens on mobile web to an ad that was shown inside an app three days earlier?"
The graph model
Nodes: every observable identifier.
- user_id — the platform's internal stable ID, available only when logged in.
- device_id — platform SDK-generated, stable on a device until the user resets it.
- idfa / gaid — iOS and Android advertising IDs. Reset-able by the user.
- cookie_id — browser first-party cookie, surface-specific.
- hashed_email / hashed_phone — deterministic PII hash, shared with advertisers via server conversion APIs.
- ip_address — low signal, high privacy cost, used sparingly.
- user_agent_hash — helps for probabilistic matching.
Edges: observed co-occurrences.
- Deterministic edges — two identifiers observed on the same authenticated session, e.g. user_id ↔ device_id when a user logs in. High confidence, low volume.
- Probabilistic edges — two identifiers observed with the same (ip, user_agent, timezone) within a short window. Low confidence, high volume. These are where privacy teams spend most of their time.
Schema
CREATE TABLE id_graph_edges (
edge_id BINARY(16),
src_type STRING, -- 'user', 'device', 'idfa', 'cookie', 'hashed_email'
src_value STRING, -- hashed/salted
dst_type STRING,
dst_value STRING,
edge_kind STRING, -- 'deterministic', 'probabilistic'
confidence DOUBLE, -- 0-1
first_seen_ts BIGINT,
last_seen_ts BIGINT,
evidence_count INT, -- how many times observed
source_surface STRING
)
PARTITIONED BY (dt STRING);
CREATE TABLE id_graph_components (
component_id BINARY(16),
member_type STRING,
member_value STRING,
added_ts BIGINT,
confidence_score DOUBLE
)
PARTITIONED BY (dt STRING);
The graph is rebuilt as components — connected subgraphs of identifiers. For a user who's logged into two devices and also visits your publisher network in an incognito browser, you might end up with a component that contains five identifiers. The attribution engine resolves events to a component, not to a single identifier.
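Clustering edges into components is plain connected-components. A minimal union-find sketch (illustrative; the confidence threshold is an assumed config, and production rebuilds shard this across many workers):

```python
class DSU:
    """Minimal union-find for clustering identity edges into components."""
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb

def build_components(edges, min_confidence=0.8):
    """edges: (src, dst, confidence) tuples. Low-confidence probabilistic
    edges are dropped before clustering (the 0.8 cutoff is an assumption)."""
    dsu = DSU()
    for src, dst, conf in edges:
        if conf >= min_confidence:
            dsu.union(src, dst)
    components = {}
    for node in list(dsu.parent):
        components.setdefault(dsu.find(node), set()).add(node)
    return list(components.values())
```

The same structure is where a max_component_size guard naturally lives: check the two component sizes before the union and refuse merges that would create a mega-component.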
Rebuild cadence
Daily full rebuild — the 90-day edge window is re-clustered nightly. Gives you fresh components with full evidence.
Hourly incremental merge — new edges observed in the past hour are merged into existing components. Used for near-real-time attribution.
Continuous expire — edges older than 90 days (or the local privacy window) are dropped, components re-scored.
Cost at scale: the nightly full rebuild is typically the second- or third-most expensive job in the warehouse. It processes 10s of billions of edges. Optimize by partitioning the edge set — by region or by identifier-type pair — so independent partitions of the graph can be rebuilt in parallel rather than in one monolithic pass.
Privacy constraints
Not every edge is legal to keep. Concretely:
- iOS ATT non-consent — if the user didn't grant app tracking transparency, the IDFA is zeroed out. You have to drop any edge that depended on it.
- EU (GDPR) opt-out — the user's edges must be severed within 30 days of request. Maintain a "forget list" and re-run component assignment.
- Minors — no probabilistic edges at all for under-18s. Deterministic-only.
- Sensitive categories — health, financial, political inference edges banned in most jurisdictions.
This is where a senior DE earns their salary. The engineering work isn't hard; the policy compliance is. Build the forget-list into the daily rebuild so it's impossible to keep forbidden edges by accident.
Attribution on top of the graph
Flow: conversion event → resolve its identifier to a component → gather the component's impression/click history → apply the attribution window and rules to pick the credited touch.
The interesting bit is the component lookup. Naive implementation: join conversion to edges, find component, join component to all members, probe impressions by each member. At scale you'll OOM or wait 6 hours.
Better: pre-compute a denormalized user_event_history keyed by component_id, refreshed hourly. Each component carries a bounded list of events. Attribution is then a single join with no fanout.
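A toy illustration of that fanout-free shape (hypothetical field names; in production the component-keyed history is a warehouse table refreshed hourly, not an in-memory dict):

```python
from collections import defaultdict

def build_component_history(impressions, member_to_component, max_events=500):
    """Pre-aggregate each identifier's impressions under its component_id.
    `max_events` bounds the per-component list (an assumed cap)."""
    history = defaultdict(list)
    for imp in impressions:
        cid = member_to_component.get(imp["id_value"])
        if cid is not None and len(history[cid]) < max_events:
            history[cid].append(imp)
    return history

def attribute_last_touch(conversion, member_to_component, history,
                         window_s=7 * 86400):
    """Resolve the conversion to a component, then pick the latest
    impression inside the attribution window (last-touch, for brevity)."""
    cid = member_to_component.get(conversion["id_value"])
    in_window = [i for i in history.get(cid, [])
                 if 0 <= conversion["ts"] - i["ts"] <= window_s]
    return max(in_window, key=lambda i: i["ts"], default=None)
```

Because the history is already keyed by component_id, attribution never has to expand component members at query time — that expansion is exactly the fanout that OOMs the naive join.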
Post-ATT / post-cookie world
Apple's ATT rollout and browser third-party cookie removal broke the deterministic-dominant graph. The industry response:
- Server-side Conversion APIs — advertisers send conversion events directly from their server, with hashed email and event metadata. Bypasses the browser entirely. Deterministic when the hashed email matches a known user.
- SKAdNetwork (SKAN) — Apple's privacy-preserving install attribution. Postbacks are delayed, aggregated, differentially private. You engineer a pipeline that accepts SKAN postbacks, joins them with campaign metadata, and reconciles with the advertiser's measurement.
- Modeled conversions — for the unobservable fraction, ML models estimate missing conversions using aggregated signals. The DE job is to feed the model clean, consistent training data and join the modeled output back into reporting with the right caveats.
- Clean rooms — the advertiser and the platform both put hashed identifiers into a neutral environment where a limited set of queries can run. Edges are computed without either side seeing the raw data.
Expect a senior interviewer to ask "How has ATT changed your architecture?" If your answer is "it hasn't," the interview is over.
Common pitfalls
- Treating a device_id change as a new device — IDFA resets are user-initiated and frequent. Handle them as edge expiry, not as a new node.
- Letting components get too large — a low-quality probabilistic edge can merge two unrelated components into a mega-component. Put a max_component_size guard in place (e.g., reject merges that push a component over 50 members).
- Using the graph for non-attribution purposes without an audit — once a graph exists, every team wants to use it. Put access controls up-front.
Scenario 13: Budget Pacing and Spend Forecasting
The prompt
"An advertiser sets a $10,000 daily budget. How do you ensure the platform spends close to that, without going over? And what happens at the end of the day if you've only spent $7,000 — how do you decide whether to keep serving?"
This is a fan-favorite senior prompt because it exposes how you reason about real-time vs. batch, stochastic systems, and partial observability all at once.
The two-loop system
Fast loop (seconds): real-time spend gate. Every auction win increments a Redis / Pinot / KV-store counter of "spend today." When the counter hits 95% of budget, the campaign is throttled — pacing score decays sharply. When it hits 100%, the campaign is pulled from the auction entirely.
Slow loop (minutes to hours): batch reconciliation. The streaming counter is known to drift from the ground truth because:
- Client didn't render the impression — you charged but shouldn't have.
- Impression arrived late from the client (common on mobile with bad network).
- Invalid traffic filter decided, hours after the fact, that the impression was a bot.
- Billing rounding rules differ from real-time cost estimates.
Every hour, a batch job reconciles the ground-truth spend with the counter. If the counter over-reported, you credit the campaign back and re-enable it. If it under-reported, you quietly absorb — over-delivery is cheaper than angering the advertiser.
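The two loops compose into a small state machine. A sketch with the thresholds from the text (the class shape is illustrative, not a real serving component — the production counter lives in a KV store, not a Python object):

```python
class SpendGate:
    """Fast-loop spend counter with slow-loop reconciliation deltas
    applied on top. 95% throttle / 100% stop follow the text."""
    def __init__(self, daily_budget):
        self.budget = daily_budget
        self.counted = 0.0            # streaming estimate of spend today

    def record_win(self, clearing_price):
        """Fast loop: every auction win increments the counter."""
        self.counted += clearing_price

    def apply_reconciliation(self, delta):
        """Slow loop: delta < 0 means streaming over-reported
        (unrendered impressions, invalid traffic) — credit back."""
        self.counted += delta

    def status(self):
        frac = self.counted / self.budget
        if frac >= 1.0:
            return "STOPPED"       # pulled from the auction
        if frac >= 0.95:
            return "THROTTLED"     # pacing score decays sharply
        return "SERVING"
```

The interesting behavior is the reconciliation path: a negative delta can move a campaign from STOPPED back to THROTTLED or SERVING, which is exactly the re-enable loop the oscillation pitfall below warns about.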
Pacing curves
A daily budget isn't meant to be spent in the first hour. Three patterns:
- Even pacing — spend at 1/24th of budget per hour, aligned to the campaign's audience time zone. Default for brand campaigns.
- Accelerated pacing — spend as fast as the auction allows. Default for direct-response campaigns where competing for impression is the point.
- Predicted pacing — predict the hour-by-hour auction demand curve, spend proportionally. High-quality predictive pacing can lift campaign delivery by 5–10% over even pacing.
The predicted curve uses:
- Historical auction density for the campaign's target audience (14-day rolling).
- Hour-of-day and day-of-week seasonality.
- Local events (holidays, sports, major product launches).
- Weather, if applicable (outdoor retail).
The DE pipeline for this is a daily job that produces pacing_curve(campaign_id, hour, expected_spend_share) and publishes it to the real-time gate.
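The core math of that daily job is just blending and normalizing a predicted demand curve. A sketch (the 0.7 blend weight is an assumption, not a platform constant):

```python
def blend_density(history_14d, dow_seasonality, w=0.7):
    """Blend the 14-day rolling hourly auction density with day-of-week
    seasonality; w=0.7 is an assumed blend weight."""
    return [w * h + (1 - w) * s for h, s in zip(history_14d, dow_seasonality)]

def pacing_curve(hourly_density):
    """Normalize an hourly density profile into expected_spend_share
    values that sum to 1.0, so the real-time gate can compare cumulative
    spend share against the curve."""
    total = sum(hourly_density)
    return [d / total for d in hourly_density]
```

The output rows map directly onto pacing_curve(campaign_id, hour, expected_spend_share); even pacing is the degenerate case where every hour's density is identical.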
Spend forecast
Separate from pacing is forecasting — predicting what total spend will be for a campaign before it runs. Inputs:
- Bid + budget + audience size from the campaign setup.
- Historical CPM for that audience × surface × objective.
- Current fill rate (what fraction of the audience is actually auctionable today).
- Competitive density (how many other advertisers target the same audience).
The forecast feeds two things: (1) the advertiser's "reach estimate" in the campaign setup UI, and (2) the delivery-monitoring system's expected-vs-actual alerts.
The reconciliation problem
Over-delivery happens. The streaming counter can't possibly be perfect. Policies:
- Budget tolerance — most platforms commit to spending no more than ~2% over the daily cap. Advertisers accept this in the ToS.
- Make-good — if over-delivery is severe (>5%), credit the advertiser.
- Under-delivery — no credit; the advertiser paid for what they got. But if the campaign was under-delivered because the platform throttled it wrongly (reconciliation showed streaming was over-reporting), you re-enable and try to catch up the remaining delivery over the next few days.
Engineering: the reconciliation job outputs a spend_delta per campaign per hour. A scheduled task consumes deltas and updates the real-time counter. This isn't glamorous — but every senior ads DE has touched this loop.
What happens at budget exhaustion
The user-facing question here is: hard stop or graceful?
Hard stop — when the counter hits 100%, kill the campaign immediately. Risk: if the counter is over-reporting, the campaign is wrongly killed.
Graceful — at 95%, scale pacing down. At 100%, stop bidding but don't fully remove the campaign. At 102% (over-delivery), kill and absorb the loss.
Graceful is what everyone actually does. The cost of wrongly killing a big-budget campaign (advertiser complaint, trust loss, lost revenue in the recovery window) is much higher than the cost of a 2% over-delivery.
Senior-grade pitfalls to name
- Clock drift across data centers. Your streaming counter has to tolerate clock skew. Use logical time (event sequence numbers) for correctness, wall clock for monitoring.
- Reconciliation loops creating oscillation. If reconciliation re-enables a campaign, and streaming then wins a big burst of auctions, you can over-deliver worse than before. Dampen the re-enable (enable at 80% of remaining budget, not 100%).
- Time zones. Budget is daily in the advertiser's time zone. The batch pipeline runs in UTC. One time zone bug can refund a campaign twice or never.
Scenario 14: Creative-Level Performance Analytics
The prompt
"Advertisers run many creatives per campaign. Build me the creative-level performance pipeline. What metrics, what grain, and how do you detect creative fatigue?"
Creative taxonomy
A creative is the actual asset shown — an image, video, carousel, playable, etc. Common taxonomy:
- Image — single static image, one headline, one CTA.
- Video — typically 6s / 15s / 30s / 60s versions.
- Carousel — 2–10 cards, each with its own asset.
- Dynamic Product Ad (DPA) — template + catalog, the actual rendered creative varies per impression per user.
- Collection — video or image hero plus a product grid.
- Playable — mini-interactive experience, common in gaming.
- Story — vertical 9:16, with interactive elements.
- Short-form Video — vertical 9:16 video, usually full-screen auto-play.
- Lead Form — creative + inline form.
Each format has native metrics that the others don't. A video has completion rates; an image doesn't. A carousel has per-card performance; a single image doesn't.
Metric grain
The canonical fact table is at the creative × placement × audience × day grain:
CREATE TABLE fact_creative_perf_daily (
dt DATE,
creative_id BIGINT,
ad_id BIGINT,
campaign_id BIGINT,
placement_id INT, -- surface × position
audience_id BIGINT, -- audience segment or 'BAU' for broad
-- Delivery
impressions BIGINT,
reach BIGINT,
frequency DOUBLE, -- impressions / reach
spend_usd DECIMAL(18,4),
-- Engagement
clicks BIGINT,
ctr DOUBLE, -- clicks / impressions
-- Video-specific (NULL for non-video)
video_views_3s BIGINT,
video_views_15s BIGINT,
video_completes BIGINT,
thumbstop_rate DOUBLE, -- 3s_views / impressions
completion_rate DOUBLE,
-- Outcomes
conversions_view_1d BIGINT,
conversions_click_7d BIGINT,
conversion_value_usd DECIMAL(18,4),
cvr DOUBLE,
cpa DOUBLE,
roas DOUBLE
)
PARTITIONED BY (dt);
Notice: nullable video metrics. Don't build separate tables per format; you'll pay for every JOIN in every dashboard.
Creative fatigue
Creative fatigue is measurable. Symptoms:
- Rising frequency — the average user is seeing this creative 8, 10, 15 times.
- Falling CTR — engagement drops with repeated exposure.
- Rising CPA — conversions get more expensive because the click-through rate is dropping.
- Shrinking reach at constant budget — because the bid price required to win is climbing as the audience becomes less responsive.
Detection:
WITH daily AS (
SELECT dt, creative_id, SUM(impressions) AS imps, SUM(reach) AS rch,
SUM(clicks) AS clicks, SUM(spend_usd) AS spend,
SUM(conversions_click_7d) AS conv
FROM fact_creative_perf_daily
WHERE dt BETWEEN CURRENT_DATE - 28 AND CURRENT_DATE - 1
GROUP BY 1, 2
),
with_trend AS (
SELECT
creative_id, dt, imps, clicks, spend, conv,
imps * 1.0 / NULLIF(rch, 0) AS freq,
clicks * 1.0 / NULLIF(imps, 0) AS ctr,
spend / NULLIF(conv, 0) AS cpa,
-- baseline: 7-day average CTR from 15–21 days earlier, before any recent fatigue
AVG(clicks * 1.0 / NULLIF(imps, 0)) OVER (
PARTITION BY creative_id
ORDER BY dt ROWS BETWEEN 21 PRECEDING AND 15 PRECEDING
) AS baseline_ctr
FROM daily
)
SELECT creative_id, dt, ctr, baseline_ctr, ctr_delta
FROM (
SELECT creative_id, dt, ctr, freq, baseline_ctr,
(ctr - baseline_ctr) / NULLIF(baseline_ctr, 0) AS ctr_delta
FROM with_trend
) t
WHERE dt >= CURRENT_DATE - 7
AND ctr_delta < -0.30 -- CTR down >30% from baseline
AND freq > 6 -- seen more than 6 times on average
ORDER BY ctr_delta;
The platform-side action is to deprioritize the creative in the ranking, or notify the advertiser via "Advertiser Console" alerts that it's time to refresh the ad.
Creative rotation and experiments
When an advertiser has 10 creatives in a campaign, you have to decide how much traffic each gets. Three strategies:
- Even rotation — 10% each. Fair but wastes budget on losers.
- Performance-weighted — exploit the winner. Risks early lock-in on a lucky creative.
- Contextual bandit (Thompson sampling) — balance explore/exploit per impression. Industry standard now.
The DE pipeline behind this: the bandit needs the posterior over each creative's CVR, updated at least hourly. Build a streaming job that consumes impression + conversion events and materializes creative_posterior(creative_id, alpha, beta) to the ranking store. Keep a 14-day decay on the posterior so old wins don't lock in forever.
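A minimal Thompson-sampling sketch over Beta posteriors (illustrative; the decay factor and the exact parameterization of creative_posterior are assumptions):

```python
import random

def choose_creative(posteriors, rng=random):
    """posteriors: {creative_id: (alpha, beta)} Beta posteriors over CVR.
    Thompson sampling: draw one sample per posterior, serve the argmax —
    uncertain creatives still win draws sometimes (exploration)."""
    draws = {cid: rng.betavariate(a, b) for cid, (a, b) in posteriors.items()}
    return max(draws, key=draws.get)

def update_posterior(posteriors, creative_id, converted, decay=1.0):
    """Streaming update from impression + conversion events. decay < 1.0
    down-weights history so old wins don't lock in forever (the text's
    14-day decay; the exact factor here is an assumption)."""
    a, b = posteriors[creative_id]
    a, b = a * decay, b * decay
    posteriors[creative_id] = (a + (1.0 if converted else 0.0),
                               b + (0.0 if converted else 1.0))
```

The hourly streaming job is effectively update_posterior run over the event stream, with the resulting (alpha, beta) pairs materialized to the ranking store.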
Creative A/B testing
Advertiser-run: the advertiser wants to know if Creative A beats Creative B on this campaign. Challenges:
- Between-creative variance is smaller than between-audience variance — random assignment must be at the user level, not the impression level, or the experiment will be contaminated by audience-mix drift.
- Attribution windows complicate readout — a click today might convert 6 days later. Lock a read-out cutoff and stick to it.
- Novelty effect — both creatives will look good in the first few days because of novelty. Design for a 7-day minimum runtime.
Scenario 15: Growth Accounting — The L28 Framework
The prompt
"DAU dropped 3% week-over-week. What's the pipeline that lets you explain what happened?"
The answer is growth accounting, specifically the L28 / new-retained-resurrected-dormant framework. This is the single most-asked analytics scenario for senior DE roles at consumer platforms.
The four states
For any user, on any day, they are in exactly one of four states relative to a trailing window:
- NEW — first day of lifetime activity. Not active any day before today.
- RETAINED — active today and active yesterday.
- RESURRECTED — active today but not active yesterday (though active sometime before yesterday).
- DORMANT — active sometime in the window, not active today.
Equation that falls out:
DAU(today) = NEW(today) + RETAINED(today) + RESURRECTED(today)
DAU(today) - DAU(yesterday) = NEW(today) + RESURRECTED(today) - CHURNED(today)
where CHURNED(today) = users active yesterday but not today — exactly the users flowing into DORMANT with days_since_last_active = 1.
Every DAU change is accountable to user-level flow between states. No more "DAU dropped, dunno why."
Daily user state table
CREATE TABLE daily_user_state (
dt DATE,
user_id BIGINT,
is_active_today BOOLEAN, -- did any qualifying event happen
is_active_yday BOOLEAN, -- from LAG
last_active_before_yday DATE, -- from previous row's state
lifetime_active_days INT, -- cumulative count up to and including today
state STRING, -- 'NEW', 'RETAINED', 'RESURRECTED', 'DORMANT'
days_since_last_active INT
)
PARTITIONED BY (dt);
state is the classification we compute. The others are inputs.
Daily classification SQL
-- Step 1: raw daily activity (user appears in events that day)
CREATE OR REPLACE TABLE user_activity_daily AS
SELECT DISTINCT user_id, DATE(event_ts) AS dt
FROM events_bronze
WHERE event_name IN ('impression', 'click', 'app_open') -- "qualifying activity"
AND DATE(event_ts) BETWEEN CURRENT_DATE - 400 AND CURRENT_DATE; -- can't filter on the alias dt here
-- Step 2: classify each (user, day)
CREATE OR REPLACE TABLE daily_user_state AS
WITH active_days AS (
SELECT
user_id, dt,
LAG(dt) OVER (PARTITION BY user_id ORDER BY dt) AS prev_active_dt,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY dt) AS lifetime_rank
FROM user_activity_daily
)
SELECT
dt,
user_id,
TRUE AS is_active_today,
CASE
WHEN lifetime_rank = 1 THEN 'NEW'
WHEN prev_active_dt = dt - INTERVAL '1' DAY THEN 'RETAINED'
ELSE 'RESURRECTED'
END AS state,
DATEDIFF(dt, prev_active_dt) AS days_since_last_active,
prev_active_dt AS last_active_before_yday,
lifetime_rank AS lifetime_active_days
FROM active_days;
-- Step 3: attach DORMANT rows for each user who was active recently but not today
-- (they still exist in the daily table, just flipped state)
INSERT INTO daily_user_state
SELECT
d.dt,
prev.user_id,
FALSE AS is_active_today,
'DORMANT' AS state,
DATEDIFF(d.dt, MAX(prev.dt)) AS days_since_last_active,
MAX(prev.dt) AS last_active_before_yday,
NULL AS lifetime_active_days
FROM (SELECT DISTINCT dt FROM user_activity_daily) d
CROSS JOIN user_activity_daily prev
WHERE prev.dt BETWEEN d.dt - INTERVAL '28' DAY AND d.dt - INTERVAL '1' DAY
AND NOT EXISTS (
SELECT 1 FROM user_activity_daily act
WHERE act.user_id = prev.user_id AND act.dt = d.dt
)
GROUP BY d.dt, prev.user_id;
In practice this is built incrementally — one day at a time — because a full rebuild at 1B+ users is expensive. The incremental version uses yesterday's daily_user_state as input and only classifies today.
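The incremental version is small enough to sketch directly (illustrative Python; in production this is a daily SQL job reading yesterday's partition):

```python
from datetime import date, timedelta

def classify_today(today, active_today, last_active):
    """One day of incremental growth accounting.
    active_today: set of user_ids with qualifying activity on `today`.
    last_active: {user_id: last_active_date} carried from yesterday's run."""
    states = {}
    for uid in active_today:
        prev = last_active.get(uid)
        if prev is None:
            states[uid] = "NEW"                          # first lifetime day
        elif prev == today - timedelta(days=1):
            states[uid] = "RETAINED"                     # active yesterday too
        else:
            states[uid] = "RESURRECTED"                  # gap of 2+ days
    # Users in the 28-day window who didn't show up today flip to DORMANT
    for uid, prev in last_active.items():
        if uid not in active_today and (today - prev).days <= 28:
            states[uid] = "DORMANT"
    carried = dict(last_active)
    for uid in active_today:
        carried[uid] = today                             # state for tomorrow
    return states, carried
```

The carried dict is the only state the job needs between days, which is what makes the incremental build cheap relative to a 400-day rebuild.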
The L28 rollup
L28 = "Last 28 days of activity." For any user on any day, their L28 is {0, 1, 2, ..., 28}: the number of days out of the trailing 28 they were active. This is the most predictive retention feature in consumer ads.
CREATE OR REPLACE TABLE user_l28 AS
SELECT
d.dt,
s.user_id,
COUNT(*) AS active_days_in_28, -- range [1, 28]
CASE
WHEN COUNT(*) >= 25 THEN 'SUPER_USER'
WHEN COUNT(*) >= 15 THEN 'CORE_USER'
WHEN COUNT(*) >= 5 THEN 'CASUAL_USER'
ELSE 'LIGHT_USER'
END AS l28_segment
FROM (SELECT DISTINCT dt FROM daily_user_state) d
JOIN daily_user_state s
ON s.is_active_today = TRUE
AND s.dt BETWEEN d.dt - INTERVAL '27' DAY AND d.dt
GROUP BY 1, 2;
(The common first attempt — a flat 28-day filter grouped by (dt, user_id) — yields COUNT(*) = 1 per row. The trailing-window self-join is what counts each user's active days in the 28 days ending at each dt.)
For the DAU drop diagnosis, you pivot this by state:
SELECT
dus.dt,
dus.state,
l.l28_segment,
COUNT(DISTINCT dus.user_id) AS users
FROM daily_user_state dus
LEFT JOIN user_l28 l USING (dt, user_id)
WHERE dus.dt BETWEEN CURRENT_DATE - 14 AND CURRENT_DATE
GROUP BY 1, 2, 3
ORDER BY 1, 2, 3;
The day's net change then falls out of the state counts: NEW + RESURRECTED - CHURNED, where CHURNED is the DORMANT rows with days_since_last_active = 1.
Wiring into DAU / WAU / MAU
DAU is just SUM(is_active_today) over the user set. WAU and MAU are the distinct counts over trailing 7 and 30 days:
SELECT
d.dt,
COUNT(DISTINCT CASE WHEN a.dt = d.dt THEN a.user_id END) AS dau,
COUNT(DISTINCT a.user_id) AS wau -- distinct actives over the trailing 7 days
FROM (SELECT DISTINCT dt FROM daily_user_state) d
JOIN daily_user_state a
ON a.is_active_today = TRUE
AND a.dt BETWEEN d.dt - INTERVAL '6' DAY AND d.dt
GROUP BY d.dt;
The gotcha: wau is a rolling window. You can't compute wau(today) = wau(yesterday) + new_today - dropped_today naively because a user active 6 days ago rolls out of the window today. Always recompute from the base table for a 7-day rolling number.
Revenue-weighted L28
Ads revenue DAU ≠ user DAU. A user who opens the app once and sees zero ads contributes zero revenue. A user who scrolls the feed for 40 minutes and sees 50 ads contributes dollars.
CREATE TABLE revenue_user_state AS
SELECT
dus.dt,
dus.user_id,
dus.state,
COALESCE(SUM(f.spend_usd), 0) AS revenue_today_usd
FROM daily_user_state dus
LEFT JOIN fact_impression_revenue f
ON dus.user_id = f.user_id AND dus.dt = f.dt
GROUP BY 1, 2, 3;
The "revenue DAU drop" decomposes into per-state revenue flow: "We lost $X from DORMANT users because our resurrect rate fell; we gained $Y from NEW users because a surface experiment launched." That's a senior-level diagnosis.
What a senior asks next
If the interviewer is good, they'll follow up: "OK, we see the state flows. What's your next step?" The senior answer isn't "look at the data more." It's:
- Pick the largest contributing state flow (e.g., "resurrected users dropped 8%").
- Decompose along a second dimension (geography, OS, signup cohort, surface).
- Find the concentration (e.g., "the entire drop is in iOS 17 users in North America").
- Correlate with deploys and experiments (was there a release that touched that slice?).
- Recommend a hypothesis-test (e.g., "roll back the experiment on that segment and observe 48-hour recovery").
Scenario 16: Experimentation Data Infrastructure
The prompt
"Design the data infrastructure for running thousands of concurrent A/B tests on an ads platform. What goes wrong at scale?"
Exposure logging: at request time or at session start?
Two schools.
Request-time exposure — every ad request logs the assigned experiment IDs. Every downstream event can be joined to experiment membership. Pros: maximum granularity; you can split any metric by experiment. Cons: massive write volume (every ad request × every active experiment the user is in); hard to manage experiment-end-of-life.
Session-start exposure — a user's assignments are computed once per session (or per day) and cached. Pros: much lower write volume. Cons: you can't experiment on per-request decisions (like "which ranker variant to use for this particular request").
Most platforms end up running both. Session-start is the default; request-time is available opt-in for experiments that need it.
Assignment logging schema
CREATE TABLE fact_experiment_exposure (
event_ts TIMESTAMP,
user_id BIGINT,
session_id STRING,
request_id BINARY(16),
experiment_id BIGINT,
variant_id INT,
assigned_ts TIMESTAMP, -- when the assignment first happened
is_first_exposure BOOLEAN,
surface_id INT
)
PARTITIONED BY (dt);
First-exposure matters: that's the moment the user was actually "treated." Metrics measured before first exposure don't count toward the experiment.
The unit of randomization
- User — most common.
assignment = hash(user_id + experiment_salt) mod 100 < treatment_pct. Stable, clean. - Device — needed if a user might bounce between logged-in and logged-out states; otherwise same as user.
- Session — for experiments where the experience shouldn't "stick" (e.g., novelty fatigue studies).
- Ad request — for pure ranker experiments that don't affect user experience durably.
A senior should be able to name each and a use case.
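The user-level assignment formula above is worth seeing concretely — a minimal sketch (hashing with SHA-256 here is an implementation choice, not a claim about any specific platform):

```python
import hashlib

def assign(user_id, experiment_salt, treatment_pct):
    """Deterministic bucketing: hash(user_id + salt) mod 100.
    Stable across calls for the same user, and the per-experiment salt
    makes assignments independent across experiments."""
    h = hashlib.sha256(f"{user_id}:{experiment_salt}".encode()).hexdigest()
    bucket = int(h, 16) % 100
    return "treatment" if bucket < treatment_pct else "control"
```

The salt is the key detail: without it, the same users would land in treatment for every experiment, and cross-experiment effects would be hopelessly confounded.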
Exposure vs treatment windows
Exposure window = the time during which a user entering the experiment counts. Typically the full experiment duration.
Treatment window = the time during which metrics are attributed to the treatment. Usually wider than the exposure window, because slow metrics (D7 retention, 28-day revenue) need time to materialize.
Practically: an experiment runs for 14 days (exposure window), but you read out metrics at D14 and D28 (treatment windows of 14 and 28 days).
Variance reduction
CUPED (Controlled-experiment Using Pre-Experiment Data) — for each metric, include a covariate that's the same metric measured before the user entered the experiment. This removes cross-user variance, typically halving the required sample size.
-- Pseudo: the regression-adjusted estimate
SELECT
experiment_id, variant_id,
AVG(metric_during_exp - theta * metric_before_exp) AS adjusted_metric
FROM (
SELECT e.experiment_id, e.variant_id, e.user_id,
m_during.metric_value AS metric_during_exp,
m_before.metric_value AS metric_before_exp
FROM fact_experiment_exposure e
JOIN metrics_user_daily m_during
ON e.user_id = m_during.user_id
AND m_during.dt BETWEEN e.exp_start AND e.exp_end
JOIN metrics_user_preperiod m_before
ON e.user_id = m_before.user_id
) j
GROUP BY 1, 2;
theta is computed as the regression coefficient of metric_during on metric_before over the full sample.
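A worked sketch of the adjustment on per-user vectors (illustrative; production computes theta in SQL or the stats engine over the full sample):

```python
def cuped_adjust(during, before):
    """during/before: per-user metric values aligned by index.
    theta = Cov(during, before) / Var(before). The adjusted values
    Y - theta * (X - mean(X)) keep the same mean but lower variance."""
    n = len(during)
    mu_d = sum(during) / n
    mu_b = sum(before) / n
    var_b = sum((b - mu_b) ** 2 for b in before) / n
    cov = sum((d - mu_d) * (b - mu_b)
              for d, b in zip(during, before)) / n
    theta = cov / var_b if var_b else 0.0
    adjusted = [d - theta * (b - mu_b) for d, b in zip(during, before)]
    return theta, sum(adjusted) / n
```

Centering the covariate at its mean is what keeps the adjusted estimate unbiased: the correction term has expectation zero, so only the variance shrinks.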
Stratified assignment — ensure balance on known-important dimensions (country, OS, user tenure) instead of relying on randomness to balance them.
Multiple comparisons
Running 1000 experiments at once, each with 20 metrics = 20,000 significance tests. If you used p < 0.05 naively, you'd expect 1000 false positives. Mitigations:
- Pre-register primary + guardrail metrics. Primary metrics use p < 0.05. Everything else gets a stricter threshold.
- False Discovery Rate (FDR) control. Benjamini-Hochberg for the full metric panel.
- Sequential testing. Don't read out until the pre-registered sample size is reached; otherwise apply always-valid inference (mSPRT).
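Benjamini-Hochberg is short enough to sketch outright — the step-up procedure over a metric panel's p-values:

```python
def benjamini_hochberg(pvals, q=0.05):
    """Step-up BH procedure: sort p-values ascending, find the largest
    rank k with p_(k) <= k * q / m, reject the k smallest. Returns the
    indices (into `pvals`) of rejected hypotheses."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k = rank
    return set(order[:k])
```

Note the second test case below: two p-values under 0.05 that a naive threshold would call significant, but BH rejects nothing because neither clears its rank-adjusted bar.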
Metrics registry
# experiment-metrics-registry.yml
metrics:
dau:
kind: north_star
target_mde: 0.5%
required_duration_days: 14
revenue_per_user:
kind: north_star
target_mde: 1.0%
required_duration_days: 14
variance_reduction: cuped(pre_period_days=14)
crash_rate:
kind: guardrail
fail_on_regression: 2.0%
severity: high
ad_load_p95:
kind: guardrail
fail_on_regression: 5.0%
severity: medium
click_through_rate:
kind: diagnostic
# no auto-block; reported but not blocking
North-star = promotion criterion. Guardrail = auto-rollback if regressed. Diagnostic = informational.
The DE job: this registry is the source of truth. The metrics pipeline consumes it, computes every registered metric × every running experiment × every day, and materializes to a readout table. The experimentation UI reads from that table. Changes to the registry flow everywhere.
Gotchas
- Exposure-pollution. A user sees treatment in experiment A and treatment in experiment B at the same time, and B's treatment changes whether A's treatment actually loaded. Use experiment-conflict registration.
- Learning effects across sessions. User sees treatment today, it changes their behavior forever, but your read-out window is 14 days. Your measured lift is an underestimate of the long-term lift. Mention this if asked about long-term impact.
- Novelty effect. Users react to newness, which decays. Always run at least 14 days and drop the first 3.
- Peeking. Looking at results daily and stopping when significance is reached inflates false-positive rate. Either commit to fixed duration or use sequential tests.
Scenario 17: Data Lineage and Impact Analysis
The prompt
"I'm about to change the dtype of `fact_impression.spend_cents` from int to bigint. Who breaks? How do you know?"
Lineage, explicitly
Data lineage is a DAG where:
- Nodes = tables, columns, dashboards, metric definitions, models.
- Edges = "A uses B." Directed from consumer → producer.
Column-level lineage is what the senior interview question requires. Table-level ("Does anything read from this table?") is too coarse — you can't answer "does anything read `spend_cents` specifically?"
How you build it
The input is the SQL that defines every pipeline in your orchestrator. Parse each query into an AST, walk the AST, emit (input_table, input_column) → (output_table, output_column) edges for every SELECT. Aggregate across all queries.
Tools that do this at the warehouse level: OpenLineage, Spline (Spark), dbt's manifest, warehouse-specific offerings (BigQuery lineage, Snowflake ACCESS_HISTORY). In-house implementations use ANTLR or a dialect-specific SQL parser.
The catch: column-level lineage across complex expressions (CASE, UDFs, string-aggregation) is lossy. Treat it as "probably used" rather than "exactly used."
Schema for the lineage DAG
CREATE TABLE lineage_edges (
src_table STRING,
src_column STRING, -- NULL for table-level only
dst_table STRING,
dst_column STRING, -- NULL for table-level only
pipeline_id STRING,
last_seen_dt DATE,
edge_kind STRING -- 'select', 'aggregate', 'derived', 'renamed'
);
CREATE INDEX idx_lineage_src ON lineage_edges (src_table, src_column);
A reverse BFS from (fact_impression, spend_cents) gives you the transitive closure of "everything downstream." That's your impact set.
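A minimal sketch of that BFS in pure Python (function name mine; edge tuples follow the data-flow direction of `lineage_edges`, producer `src` to consumer `dst`):

```python
from collections import deque

def impact_set(edges, start):
    """Transitive closure of everything downstream of `start`.

    edges: iterable of ((src_table, src_col), (dst_table, dst_col)) pairs
    in data-flow direction; start: the (table, column) being changed.
    """
    downstream = {}
    for src, dst in edges:
        downstream.setdefault(src, []).append(dst)
    seen, queue = set(), deque([start])
    while queue:
        for nxt in downstream.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen  # the impact set, excluding the start node itself
```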
Impact analysis
Given the question "I'm about to change fact_impression.spend_cents":
- Reverse-BFS from `(fact_impression, spend_cents)` through `lineage_edges`.
- Filter the result to recently-seen pipelines (`last_seen_dt >= CURRENT_DATE - 30`). Stale edges produce false positives.
- Group the result by owning team (pulled from table metadata).
- Send each team a Slack/email summary: "Here's the breaking change, here's what depends on it, here's the migration window."
In a large warehouse, that's easily 200–500 downstream objects for a core fact table column. Without lineage, you're doing this by asking around.
Integration with orchestrator
When the orchestrator deploys a new pipeline, it should register the SQL with the lineage tool before or at deploy time. That gives you:
- Real-time (within an hour) lineage updates.
- Blockable changes: "this PR would break 14 downstream dashboards, are you sure?"
- SLA tracking: "Table X depends on A, B, C; if any of them miss SLA, X will too."
What a senior should say at the end
"Lineage is only as good as its freshness. The real engineering work isn't building the DAG — it's keeping the SQL parser current with every dialect quirk your team uses, and auto-parsing every new pipeline within an hour of its deploy. Otherwise you ship a change that breaks things the lineage said were safe."
Scenario 18: Cost Accounting at Warehouse Scale
The prompt
"Warehouse spend has grown 40% YoY and nobody knows why. Design a cost accounting system."
Cost attribution
Every query that runs in the warehouse should be tagged with:
- User who ran it (service account or human).
- Team that owns the user.
- Pipeline it's part of, if any.
- Cost in dollars (or slot-seconds / credits / DBU — pick a currency).
Sources:
- Warehouse query history (Snowflake `QUERY_HISTORY`, BigQuery `INFORMATION_SCHEMA.JOBS`, Redshift `STL_QUERY`).
- Orchestrator run metadata (which DAG, which task, which batch).
- Team mapping table maintained by the platform team.
Build a `fact_query_cost_daily` with all four dimensions.
Chargeback model
Three common models.
- Informational. Show each team their costs. Embarrass the outliers. Low friction, low impact.
- Soft budgets. Each team has a quarterly budget. Overruns trigger a conversation with their manager. Moderate friction.
- Hard chargeback. The team's cost literally debits their org's P&L. Highest friction, highest compliance.
Most data platforms run soft budgets with informational dashboards. Hard chargeback requires an internal finance system to support it and tends to breed gaming (teams hiding costs in infra).
Expensive-query warnings
Real-time during query submission:
def validate_query_cost(query, user):
    # Estimate cost before execution via the warehouse's dry-run API.
    est_cost = warehouse_api.dry_run_estimate(query)
    team = lookup_team(user)
    remaining_budget = budget_service.get(team, quarter=current_quarter)

    if est_cost > 100.0:  # single query over $100: require confirmation
        notify_slack(user, f"This query will cost ${est_cost:.2f}. Confirm?")
        require_approval(user, query)

    if est_cost > 0.05 * remaining_budget:  # >5% of remaining quarterly budget: hard block
        block(user, f"This query would use 5%+ of {team}'s remaining quarter budget.")

    if query_touches_full_scan(query, min_table_size_tb=1):
        warn(user, "This query does a full scan on a >1TB table. Add a partition filter.")
Killing a runaway query
The mid answer: run the warehouse's kill command (`KILL QUERY <id>` or the dialect's equivalent).
The senior answer: "Depends who's running it. If it's a scheduled pipeline and killing it breaks an SLA, I pause the downstream alert, kill the query, investigate, and restart. If it's an analyst's interactive query, I DM them first: 'I'm about to kill this, it's $400 and climbing, do you actually need it.' If it's on a cluster shared with a critical job, I kill without asking but follow up personally within 15 minutes. The rule is never kill silently and never kill something you don't understand."
That answer signals judgment, not just knowledge.
Cost-reduction playbook
Useful to have on hand:
- Partition pruning — the single biggest win. Scan `WHERE dt = today`, not `WHERE DATE_TRUNC(ts) = today`.
- Column pruning — `SELECT *` on a 200-column fact table costs 40x a `SELECT` of 5 columns.
- Materialize the expensive middle. If 20 dashboards all do the same 5-table join, build a materialized table.
- Right-size clusters. Large warehouses running small queries is waste.
- Archive cold data to cheap storage. 90 days in the warehouse, 3 years in object storage, >3 years in Glacier-class archive.
- Deduplicate metrics. Every platform has 6 definitions of "active user" and pays to compute all six. Pick one, deprecate the others.
Scenario 19: The Behavioral Loop — STAR+I Framework
The prompt
"Tell me about a time you disagreed with your manager."
The mid-level candidate tells a story. The senior candidate tells a structured story that leaves no room for ambiguity about scope, judgment, or outcome. The structure is STAR+I.
STAR+I, defined
- S — Situation. Where were you, what's the context? 20 seconds max. Who are the other players, what was at stake?
- T — Task. What specifically was your job here? Not "the team's job." Your job.
- A — Action. What did you do, concretely? This is usually the longest section. Real specifics: the meeting you called, the document you wrote, the trade-off you named. Senior candidates narrate their reasoning, not just their actions.
- R — Result. What measurable outcome resulted? Numbers if possible. "Reduced pipeline runtime from 6 hours to 90 minutes. Saved $120K/year in warehouse cost. Unblocked a product launch that shipped 2 weeks ahead of schedule."
- +I — Impact / Insight. This is what separates senior. After you describe what happened, tell the interviewer what you learned and how it changed how you work. "The thing I took away was that I should have raised this concern in writing instead of in a 1:1, because the decision-maker wasn't my manager — he just didn't realize that at the time."
Adding "+I" is what signals senior. It says: I reflect, I grow, I can teach this to a more junior person.
How to prep stories
You need 6 stories that cover the space:
| # | Axis | Example situation |
|---|---|---|
| 1 | Cross-team conflict | Two teams fighting over a shared table's schema |
| 2 | Technical disagreement with manager | Manager wants a quick fix, you advocate for the proper solution |
| 3 | Production incident leadership | 3am page, rallying the response, the postmortem |
| 4 | Ambiguous scope | Exec asks "tell me about our ads health," 6-week project with no spec |
| 5 | Bad news to exec | Metric you own is broken; you have to tell the VP |
| 6 | Mentorship | Grew a junior DE's ability to lead their own project |
Write each one as a 90-second answer, practice out loud, record yourself. The interviewer will steer them into whatever they need.
Rules of thumb
- Every story needs a number. "Saved X hours," "shipped 2 weeks faster," "unblocked 5 teams," "reduced error rate from 4% to 0.3%."
- Every story needs a moment of judgment. A point where you chose between two plausible options and had to decide.
- Every story names the other people. "I worked with" is fine. "I drove the process" is stronger. Don't steal credit; don't hide behind the team.
- No hypotheticals. Real, first-person, stuff that actually happened. Interviewers smell hypotheticals from across the room.
- No bad-mouthing. Never describe a former manager, teammate, or company as "incompetent" or "toxic." You can describe actions and outcomes without judgment.
Scenario 20: Behavioral — Disagreeing With Your Manager
The prompt
"Tell me about a time you disagreed with your manager on something important."
The trap
Two failure modes:
- The martyr. "I fought to the end against my manager's bad decision because I knew better." Signals poor judgment and inability to commit.
- The doormat. "I voiced my concern and then did what they asked." Signals weakness; no real disagreement here.
The senior answer sits in between: you disagreed, you made the case, you committed to the decided path, you measured the outcome, you revisited if data supported a change.
Shape of a strong answer
Situation: two years ago, our team was rolling out a new metrics pipeline. My manager wanted to ship a version that used hourly batches with an SLA of 4 hours. I believed our advertiser-facing team needed 15-minute freshness, based on recent calls with their PM, and that 4-hour SLA would be walked back within a quarter.
Task: my job was to lead the architecture. My manager owned the shipping decision.
Action: I wrote a 2-page doc making the case for 15-minute latency, with cost modeling (about 1.5× the hourly version), the expected advertiser-satisfaction lift, and a migration plan that started with hourly and upgraded incrementally. I walked my manager through it. He pushed back — he was worried about the cost blowback from leadership. We agreed on a compromise: ship hourly first, but design the streaming layer so the 15-minute version was a ~2-week migration, not a 6-month rebuild. I committed to the compromise and led the hourly launch.
Result: advertiser-team NPS on freshness was a named regression in the first quarterly review. My manager asked me to execute the 15-minute migration I'd originally proposed. It shipped 4 weeks later. Advertiser-side complaints stopped within a month.
Impact: the insight I took from this: when I'm more convinced of something than my manager is, a written doc with cost modeling and a reversible migration path beats arguing in 1:1s. I've used that pattern four times since, two of which ended with my original plan accepted immediately.
That's ~240 words, ~90 seconds spoken. Situation specific, task scoped, action showing both assertiveness and commitment, result with numbers, insight that shows growth.
What the interviewer is scoring
- Did you name what you disagreed with, concretely? (Not "my manager's style.")
- Did you make a case using data, not just opinion?
- Did you commit after the decision was made?
- Did you revisit when data supported it?
- Did you keep the relationship?
Senior = all five. Mid = three. Anything under three = rehearse harder.
Scenario 21: Behavioral — Managing Shared-Table Conflicts
The prompt
"Two teams depend on a shared table you own. One wants to change the schema in a way that breaks the other. Walk me through how you'd handle it."
This prompt tests your ability to run a small diplomacy + engineering problem at the same time.
Shape of a strong answer
Situation: last year, my team owned `fact_campaign_daily`, a core table used by both our advertiser-facing analytics team and the ML feature platform. The analytics team proposed adding a new attribution window (`click_14d`), which would require widening the fact from 8 conversion columns to 10. The feature platform's model trainer read all columns via `SELECT *` and would fail on the schema change.

Task: I owned the table. My job was to make the change happen without breaking the model pipeline, on both teams' timelines.

Action: I started with a written deprecation policy we didn't have before: all schema changes to shared tables published 2 weeks in advance, column additions allowed immediately, breaking changes require a 30-day migration window with a shadow table. I walked both teams through it in a 30-minute meeting. Because the advertiser team's request was an addition, not a breaking change, it actually didn't need the 30-day window — we just had to fix the feature platform's `SELECT *` dependency. I paired with the feature platform lead for an afternoon to switch their trainer to explicit columns. We landed both changes the following week.

Result: the deprecation policy became the team standard. Over the next six months, eight schema changes shipped using it, zero cross-team production incidents caused by schema drift.
Impact: what I learned was that schema conflicts are usually process conflicts in disguise. Once the process was written down, the decision about "who gets to change what" stopped requiring a judgment call every time. I apply the same pattern now whenever I see recurring cross-team friction over the same surface — write the rule, then apply it.
Components an interviewer is listening for
- You recognized the right problem (process, not schema).
- You wrote something down (doc, policy, agreement).
- You paired with the other team (not just sent an email).
- You measured the outcome (6 months, 8 changes, 0 incidents).
- You generalized the lesson.
Scenario 22: Behavioral — Cross-Team Incident Response
The prompt
"You get paged at 3am. The alert is on a dashboard you own, but after 20 minutes of debugging you realize the root cause is in another team's code. They're not on-call. Walk me through the next hour."
What the interviewer is listening for
Two things: (1) can you lead a response without owning the root cause, and (2) do you understand the political dimension — calling someone at 3am about their code is a fraught act.
Shape of a strong answer
Situation: two years ago, I got paged at about 3:20am on a Saturday. The alert was on the ads-reporting freshness dashboard — hourly spend numbers were stale by 45 minutes and climbing. Ads reporting was mine; the upstream pipeline that fed it belonged to a neighboring team.
Task: keep the customer-facing dashboard accurate on my SLA, diagnose and escalate the root cause, minimize damage to the on-call relationship.
Action: first 10 minutes I ruled out anything in my own pipeline — checked the job status, recent deploys, the warehouse. Clean. At minute 20 I traced the staleness to the neighboring team's streaming job, which was lag-accumulating. I had two choices: page their on-call, or sit on the problem and hope. I paged. Before paging I wrote up everything I'd ruled out, the exact error trace, and a suggested next step in a runbook-style ticket. I added a note: "Sorry for the page; here's the triage so you can go fast." Their on-call acknowledged in ~3 minutes, fixed it in 20. I stayed on the call so they weren't debugging alone and so my dashboard could be marked recovered the moment their job caught up.
Afterward I wrote the postmortem. The root cause was a memory leak triggered by a config change 9 hours earlier. I sent the postmortem draft to both teams' managers before broadcasting it, to avoid blindsiding anyone. The postmortem identified two structural fixes: the memory-leak guard on their side, and a shared alert on the streaming lag that would have fired earlier. Both shipped in the following sprint.
Result: my dashboard was stale for 70 minutes. The postmortem got praised for being blameless and generative — both teams named it as a good template for future incidents. The on-call relationship was fine; I sent a thank-you lunch afterward.
Impact: the takeaway for me was that the pre-page triage doc is the single most important thing you can do at 3am. It halves the responder's mean-time-to-fix and triples their goodwill. I've been religious about it since.
Blameless postmortem components
- Timeline with timestamps. What happened, in sequence.
- Root cause, not proximate cause. The memory leak is a root cause; the OOM is a proximate cause.
- Impact. Duration, users/campaigns affected, revenue (if applicable), customer-facing consequences.
- What went well. Real. Someone did something good. Name them.
- What we could do better. Structural, not personal. "The alert fired 45 minutes late" — not "Bob didn't respond."
- Action items. Owned, dated, tracked.
Appendix: 6-Story Behavioral Matrix Template
Prep this before any senior loop. Fill in one story per row. Practice each out loud for 90 seconds. Pre-assign which stories you'll use for which prompts.
| Story # | Theme | Situation (1 line) | What's at stake | The judgment moment | Numeric result | +I takeaway |
|---|---|---|---|---|---|---|
| 1 | Cross-team conflict | |||||
| 2 | Technical disagreement with manager | |||||
| 3 | Production incident | |||||
| 4 | Ambiguous scope | |||||
| 5 | Bad news to exec | |||||
| 6 | Mentorship / growing someone else |
Print it. Fill it in by hand. You'll remember the stories better that way.
Closing note
The scenarios in this part are the ones that, in my experience, most clearly sort senior candidates from mid-level. Part 3 returns to pure SQL — the DAU pipeline, frequency capping, ROAS, holdout-based incrementality, delivery health. Read it next.
Deep Dive SQL
Part 3: Deep Dive SQL
"Production SQL is the only sentence where every word is a decision. Senior candidates can defend every word."
How to read this part
Every query below is something you should be able to write cold on a whiteboard in an interview. After each one there's a commentary section: why this and not that. Skim the query, then read the commentary, then close the page and try to reproduce the query from scratch. That's the drill.
All SQL is written in a neutral dialect (Postgres / Snowflake / Trino / BigQuery — you'll need to tweak syntax, but the shape is the same).
Section 1: The DAU Growth Accounting Pipeline (5 steps)
This is the single most-asked SQL scenario at senior consumer ads interviews. Learn it cold.
Step 1: Incremental user activity history
-- Source of truth: one row per (user, day) that user was active
CREATE TABLE user_activity_daily (
dt DATE,
user_id BIGINT,
PRIMARY KEY (dt, user_id)
) PARTITIONED BY (dt);
-- Incremental insert: just process today
INSERT OVERWRITE TABLE user_activity_daily PARTITION (dt = CURRENT_DATE)
SELECT DISTINCT user_id  -- dt is supplied by the static PARTITION clause
FROM bronze.events
WHERE dt = CURRENT_DATE
AND event_name IN ('impression', 'click', 'app_open', 'page_view')
AND user_id IS NOT NULL;
Why this and not that. `DISTINCT` collapses multiple events per user per day into one activity row. Filter to specific "qualifying" events — don't count side-effect events like "ad_blocked" as activity. The `INSERT OVERWRITE` into a static partition is idempotent — re-running today produces identical output, which matters for backfills. Partition by `dt` so queries over a date range scan only the partitions they need.
Step 2: Daily user state classification
CREATE TABLE daily_user_state (
dt DATE,
user_id BIGINT,
state STRING, -- 'NEW', 'RETAINED', 'RESURRECTED', 'DORMANT'
is_active_today BOOLEAN,
prev_active_dt DATE,
days_since_last_active INT,
lifetime_active_days INT,
PRIMARY KEY (dt, user_id)
) PARTITIONED BY (dt);
INSERT OVERWRITE TABLE daily_user_state PARTITION (dt = CURRENT_DATE)
WITH active AS (
SELECT user_id FROM user_activity_daily WHERE dt = CURRENT_DATE
),
prev_state AS (
    SELECT user_id,
           -- If the user was active yesterday, yesterday IS their last active
           -- date; otherwise carry forward the snapshot's prev_active_dt.
           CASE WHEN is_active_today THEN dt ELSE prev_active_dt END AS last_active_dt,
           lifetime_active_days AS life_days_before_today
    FROM daily_user_state
    WHERE dt = CURRENT_DATE - INTERVAL '1' DAY
),
active_joined AS (
SELECT a.user_id,
p.last_active_dt,
p.life_days_before_today
FROM active a
LEFT JOIN prev_state p USING (user_id)
)
SELECT
CURRENT_DATE AS dt,
user_id,
CASE
WHEN last_active_dt IS NULL THEN 'NEW'
WHEN last_active_dt = CURRENT_DATE - INTERVAL '1' DAY THEN 'RETAINED'
ELSE 'RESURRECTED'
END AS state,
TRUE AS is_active_today,
last_active_dt AS prev_active_dt,
DATEDIFF(CURRENT_DATE, last_active_dt) AS days_since_last_active,
COALESCE(life_days_before_today, 0) + 1 AS lifetime_active_days
FROM active_joined;
Why this and not that. Using `LAG()` works for a full rebuild but is expensive at billion-user scale. Incremental classification uses yesterday's state as input — you only process today's active users. Two trade-offs: you need a correct seed row for each user's first day (the NEW state), and `daily_user_state` must carry a row forward for every known user, dormant ones included — otherwise a returning dormant user has no yesterday row and is misclassified as NEW. The `COALESCE(life_days_before_today, 0) + 1` handles both first-seen users and returning ones.
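The classification rule itself is small enough to state as a function (a sketch mirroring the CASE expression above; function name mine):

```python
from datetime import timedelta

def classify(last_active_dt, today):
    """Label today's activity given the user's last active date as of yesterday."""
    if last_active_dt is None:
        return "NEW"            # never seen before
    if last_active_dt == today - timedelta(days=1):
        return "RETAINED"       # also active yesterday
    return "RESURRECTED"        # a gap of 2+ days
```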
Step 3: Daily state summary (the DAU decomposition)
SELECT
dt,
COUNT(*) FILTER (WHERE state = 'NEW') AS new_users,
COUNT(*) FILTER (WHERE state = 'RETAINED') AS retained_users,
COUNT(*) FILTER (WHERE state = 'RESURRECTED') AS resurrected_users,
COUNT(*) AS dau
FROM daily_user_state
WHERE dt BETWEEN CURRENT_DATE - 30 AND CURRENT_DATE
GROUP BY dt
ORDER BY dt;
Commentary. `FILTER` clauses keep the query flat. Alternative: `COUNT(CASE WHEN state = 'NEW' THEN 1 END)` — identical semantically, slightly more verbose. The trailing 30 days of this table is what you put on the DAU health dashboard.
Step 4: Cohort retention (the triangular matrix)
WITH first_seen AS (
SELECT user_id, MIN(dt) AS cohort_dt
FROM user_activity_daily
WHERE dt >= CURRENT_DATE - 180
GROUP BY user_id
),
activity_with_cohort AS (
SELECT a.user_id,
f.cohort_dt,
a.dt,
DATEDIFF(a.dt, f.cohort_dt) AS days_since_cohort
FROM user_activity_daily a
JOIN first_seen f USING (user_id)
WHERE a.dt BETWEEN f.cohort_dt AND f.cohort_dt + INTERVAL '90' DAY
)
SELECT
cohort_dt,
days_since_cohort,
COUNT(DISTINCT user_id) AS users_active,
COUNT(DISTINCT user_id) * 1.0
/ FIRST_VALUE(COUNT(DISTINCT user_id)) OVER (
PARTITION BY cohort_dt
ORDER BY days_since_cohort
ROWS UNBOUNDED PRECEDING
) AS retention_rate
FROM activity_with_cohort
GROUP BY cohort_dt, days_since_cohort;
Why this and not that. Cohort + days-since-cohort is a triangular table — each cohort has up to 90 rows, one per day since signup. Using `FIRST_VALUE()` over the window to get "day 0 size" is cleaner than a self-join. One caveat: computing `first_seen` over only the last 180 days mislabels long-tenured users who return inside the window as new cohorts — in production, take `cohort_dt` from an all-time first-active table. Stop at 90 days of retention — past that the signal-to-noise drops and storage explodes.
Step 5: WAU / MAU from the same base
SELECT
    d.dt,
    COUNT(DISTINCT a.user_id) FILTER (
        WHERE a.dt BETWEEN d.dt - INTERVAL '6' DAY AND d.dt
    ) AS wau,
    COUNT(DISTINCT a.user_id) AS mau
FROM (
    SELECT DISTINCT dt
    FROM user_activity_daily
    WHERE dt BETWEEN CURRENT_DATE - 60 AND CURRENT_DATE
) d
JOIN user_activity_daily a
    ON a.dt BETWEEN d.dt - INTERVAL '29' DAY AND d.dt
GROUP BY d.dt;
Commentary. This is the naïve approach and it's O(N²) in user count because each day scans the preceding window. At 1B users + 60 days this won't run. The production approach: maintain user_l28(dt, user_id, active_days_last_28) incrementally and do SUM(CASE WHEN active_days_last_28 > 0 THEN 1 END) per day. Mention this trade-off when asked.
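A sketch of that incremental L28 pattern (names mine): per user, keep a 28-bit activity bitmask; each day, shift the mask, set today's bit, and recount. O(DAU) work per day instead of a 28-day window scan.

```python
L28_MASK = (1 << 28) - 1  # the low 28 bits, one per day

def roll_l28(prev_mask, active_today):
    """Advance a user's 28-day activity bitmask by one day.

    Returns (new_mask, active_days_last_28). MAU-28 is then just the count
    of users whose active_days_last_28 > 0.
    """
    mask = ((prev_mask << 1) & L28_MASK) | (1 if active_today else 0)
    return mask, bin(mask).count("1")
```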
Section 2: DAU Drop Debugging
"DAU dropped 3% yesterday. Walk me through the SQL to find out why."
-- Step 1: confirm the drop and localize by day
SELECT dt, new_users, retained_users, resurrected_users, dau,
dau - LAG(dau) OVER (ORDER BY dt) AS dau_delta
FROM daily_state_summary
WHERE dt BETWEEN CURRENT_DATE - 14 AND CURRENT_DATE
ORDER BY dt;
-- Step 2: decompose by surface (is it one surface or all?)
SELECT
dt,
surface_id,
COUNT(DISTINCT user_id) AS surface_dau,
COUNT(DISTINCT user_id) -
LAG(COUNT(DISTINCT user_id), 1) OVER (PARTITION BY surface_id ORDER BY dt) AS delta
FROM bronze.events
WHERE dt BETWEEN CURRENT_DATE - 3 AND CURRENT_DATE
AND event_name = 'app_open'
GROUP BY 1, 2
ORDER BY 1, 2;
-- Step 3: decompose by OS
SELECT dt, device_os,
COUNT(DISTINCT user_id) AS os_dau
FROM bronze.events
WHERE dt BETWEEN CURRENT_DATE - 3 AND CURRENT_DATE
GROUP BY 1, 2
ORDER BY 1, 2;
-- Step 4: geography
SELECT dt, geo_country,
COUNT(DISTINCT user_id) AS geo_dau,
LAG(COUNT(DISTINCT user_id), 1) OVER (
PARTITION BY geo_country ORDER BY dt
) AS prev_dau
FROM bronze.events
WHERE dt BETWEEN CURRENT_DATE - 2 AND CURRENT_DATE
GROUP BY 1, 2
HAVING COUNT(DISTINCT user_id) > 10000
ORDER BY 1, (COUNT(DISTINCT user_id) - LAG(COUNT(DISTINCT user_id), 1) OVER (
PARTITION BY geo_country ORDER BY dt
));
-- Step 5: correlate with recent deploys
SELECT deploy_ts, service, commit_sha, deployer
FROM deploy_log
WHERE deploy_ts BETWEEN (CURRENT_DATE - 2)::TIMESTAMP AND CURRENT_TIMESTAMP
AND service IN ('ads_ranker', 'feed_renderer', 'app_shell')
ORDER BY deploy_ts;
Commentary. The sequence matters. Each query narrows the slice. Naming the progression out loud ("first confirm, then decompose by the three coarsest dimensions, then correlate with deploys") is how a senior signals process rather than flailing.
Section 3: Frequency Capping Analytics
"Write a query that, for each campaign, computes the distribution of frequency (impressions per reached user) for the past 7 days, and flag campaigns where >20% of their reach is at frequency > 10."
WITH user_freq AS (
SELECT
campaign_id,
user_id,
COUNT(*) AS impressions_7d
FROM fact_impression
WHERE dt BETWEEN CURRENT_DATE - 7 AND CURRENT_DATE - 1
AND is_valid = TRUE
GROUP BY campaign_id, user_id
),
freq_buckets AS (
SELECT
campaign_id,
CASE
WHEN impressions_7d = 1 THEN '01'
WHEN impressions_7d BETWEEN 2 AND 3 THEN '02_03'
WHEN impressions_7d BETWEEN 4 AND 6 THEN '04_06'
WHEN impressions_7d BETWEEN 7 AND 10 THEN '07_10'
ELSE '11_plus'
END AS freq_bucket,
COUNT(*) AS users_in_bucket,
SUM(impressions_7d) AS impressions_in_bucket
FROM user_freq
GROUP BY 1, 2
),
campaign_totals AS (
SELECT
campaign_id,
SUM(users_in_bucket) AS reach,
SUM(CASE WHEN freq_bucket = '11_plus' THEN users_in_bucket ELSE 0 END) AS high_freq_reach
FROM freq_buckets
GROUP BY campaign_id
)
SELECT
c.campaign_id,
c.reach,
c.high_freq_reach,
c.high_freq_reach * 1.0 / NULLIF(c.reach, 0) AS high_freq_share,
CASE WHEN c.high_freq_reach * 1.0 / NULLIF(c.reach, 0) > 0.20 THEN 'OVER_CAPPING' END AS flag
FROM campaign_totals c
WHERE c.reach > 10000
ORDER BY high_freq_share DESC;
Commentary. Buckets are inclusive on both ends — BETWEEN is inclusive in SQL. The NULLIF guards against division-by-zero when a campaign's reach is weirdly zero. The volume gate (reach > 10000) excludes noise from tiny campaigns. Ordering by high_freq_share DESC surfaces the worst offenders first.
Section 4: Advertiser ROAS Pipeline
"Build me a daily ROAS pipeline. Conversions can arrive up to 7 days after the impression. Your numbers must match what advertisers see in the Campaign Manager UI."
-- Step 1: attributed conversions (last-click, 7-day)
CREATE OR REPLACE TABLE fact_attributed_conversion AS
WITH c AS (
SELECT event_id AS conversion_id, user_id, event_ts, properties['value']::DECIMAL(18,4) AS conv_value
FROM bronze.events
WHERE event_name = 'conversion'
AND event_ts >= CURRENT_TIMESTAMP - INTERVAL '8' DAY
),
k AS (
SELECT event_id AS click_id, user_id, event_ts, campaign_id, ad_id
FROM bronze.events
WHERE event_name = 'click'
AND event_ts >= CURRENT_TIMESTAMP - INTERVAL '15' DAY
),
paired AS (
SELECT
c.conversion_id,
c.event_ts AS conversion_ts,
c.conv_value,
k.click_id, k.campaign_id, k.ad_id,
ROW_NUMBER() OVER (
PARTITION BY c.conversion_id
ORDER BY k.event_ts DESC
) AS rn
FROM c
LEFT JOIN k
ON c.user_id = k.user_id
AND k.event_ts BETWEEN c.event_ts - INTERVAL '7' DAY AND c.event_ts
)
SELECT conversion_id, conversion_ts, conv_value, click_id, campaign_id, ad_id
FROM paired
WHERE rn = 1;
-- Step 2: daily ROAS, with D+7 truth lock
SELECT
    DATE(f.conversion_ts) AS attribution_dt,
    f.campaign_id,
    COUNT(*) AS conversions,
    SUM(f.conv_value) AS revenue_usd,
    COALESCE(SUM(f.conv_value), 0) /
        NULLIF(spend.spend_usd, 0) AS roas
FROM fact_attributed_conversion f
LEFT JOIN (
    SELECT dt, campaign_id, SUM(spend_cents) / 100.0 AS spend_usd
    FROM fact_impression
    WHERE dt >= CURRENT_DATE - 14
    GROUP BY 1, 2
) spend
    ON spend.dt = DATE(f.conversion_ts)
    AND spend.campaign_id = f.campaign_id
WHERE DATE(f.conversion_ts) >= CURRENT_DATE - 14
    AND f.campaign_id IS NOT NULL  -- unattributed conversions reported separately
GROUP BY DATE(f.conversion_ts), f.campaign_id, spend.spend_usd
ORDER BY attribution_dt DESC, revenue_usd DESC;
Commentary. The 7-day click-to-conversion window is industry standard. The query looks 15 days back on clicks because a conversion up to 8 days old can attribute to a click up to 7 days before it. `ROW_NUMBER()` picks the last click; `rn = 1` with the `LEFT JOIN` preserves unattributed conversions (report them separately). The ROAS join is on the conversion date, not the click date — this matches what advertisers see.
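The last-click rule can be stated compactly in plain Python (a sketch of the same logic as the ROW_NUMBER query; names mine):

```python
from datetime import timedelta

def last_click(conversion_ts, clicks, window_days=7):
    """Attribute a conversion to the user's most recent click within the window.

    clicks: list of (click_ts, campaign_id) for the converting user.
    Returns the winning (click_ts, campaign_id), or None if unattributed.
    """
    window_start = conversion_ts - timedelta(days=window_days)
    eligible = [c for c in clicks if window_start <= c[0] <= conversion_ts]
    return max(eligible, key=lambda c: c[0]) if eligible else None
```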
Section 5: Audience Overlap Analysis
"Two audiences can overlap. For each pair of audiences owned by the same advertiser, compute Jaccard similarity. Efficient at 100M+ users per audience."
-- Naïve: exact counts
WITH pairs AS (
SELECT
a.advertiser_id,
a.audience_id AS aud_a,
b.audience_id AS aud_b
FROM audience_membership a
JOIN audience_membership b
ON a.advertiser_id = b.advertiser_id
AND a.audience_id < b.audience_id
WHERE a.user_id = b.user_id
),
intersects AS (
SELECT advertiser_id, aud_a, aud_b, COUNT(*) AS intersection_size
FROM pairs
GROUP BY 1, 2, 3
),
sizes AS (
SELECT audience_id, COUNT(DISTINCT user_id) AS aud_size
FROM audience_membership
GROUP BY audience_id
)
SELECT
i.advertiser_id, i.aud_a, i.aud_b,
i.intersection_size,
sa.aud_size + sb.aud_size - i.intersection_size AS union_size,
i.intersection_size * 1.0 /
(sa.aud_size + sb.aud_size - i.intersection_size) AS jaccard
FROM intersects i
JOIN sizes sa ON i.aud_a = sa.audience_id
JOIN sizes sb ON i.aud_b = sb.audience_id
WHERE sa.aud_size + sb.aud_size - i.intersection_size > 0
ORDER BY advertiser_id, jaccard DESC;
Commentary on scale. The naïve version is O(N²) in users per advertiser. It works for small advertisers; for big ones you switch to HyperLogLog / MinHash sketches:
-- Sketch-based: each audience is a HLL sketch, intersect via inclusion-exclusion
SELECT
aud_a, aud_b,
HLL_CARDINALITY(HLL_UNION(s.sketch, t.sketch)) AS union_size_est,
HLL_CARDINALITY(s.sketch) + HLL_CARDINALITY(t.sketch)
- HLL_CARDINALITY(HLL_UNION(s.sketch, t.sketch)) AS intersection_size_est
FROM audience_hll_sketches s
JOIN audience_hll_sketches t ON s.advertiser_id = t.advertiser_id
AND s.audience_id < t.audience_id;
Error is ~1% at default precision. Good enough for audience planning.
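For intuition, here is a bottom-k MinHash version of the same idea in pure Python (a sketch, not the warehouse's HLL implementation; names mine): each audience's signature is its k smallest hashes, and Jaccard is estimated from the signatures alone.

```python
import hashlib

def minhash(user_ids, k=128):
    """Signature = the k smallest 64-bit hashes of the audience's members."""
    hashes = sorted(
        int(hashlib.blake2b(str(u).encode(), digest_size=8).hexdigest(), 16)
        for u in user_ids
    )
    return set(hashes[:k])

def jaccard_estimate(sig_a, sig_b, k=128):
    """Bottom-k estimator: the share of the union's k smallest hashes
    that appear in both signatures."""
    merged = sorted(sig_a | sig_b)[:k]
    shared = sum(1 for h in merged if h in sig_a and h in sig_b)
    return shared / len(merged)
```

Signatures are fixed-size, so pairwise comparison across thousands of audiences is cheap regardless of audience size.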
Section 6: Incrementality (Holdout-Based Lift)
"You ran a holdout: 1% of the target audience was excluded from seeing your ads. Compute the incremental lift in conversions."
WITH conv AS (
    -- Total conversions in the window, organic included; no ad click required.
    SELECT user_id,
           SUM(properties['value']::DECIMAL(18,4)) AS revenue
    FROM bronze.events
    WHERE event_name = 'conversion'
      AND DATE(event_ts) BETWEEN :experiment_start AND :experiment_end
    GROUP BY user_id
),
arms AS (
    -- Every assigned user counts; non-converters contribute $0.
    SELECT h.arm, h.user_id, COALESCE(c.revenue, 0) AS revenue
    FROM holdout_assignment h
    LEFT JOIN conv c USING (user_id)
),
arm_stats AS (
    SELECT arm, COUNT(*) AS n, SUM(revenue) AS total_rev, AVG(revenue) AS avg_rev
    FROM arms
    GROUP BY arm
)
SELECT
    t.n AS treatment_users,
    c.n AS control_users,
    t.avg_rev AS treatment_avg_revenue,
    c.avg_rev AS control_avg_revenue,
    (t.avg_rev - c.avg_rev) / NULLIF(c.avg_rev, 0) AS incremental_lift,
    t.n * (t.avg_rev - c.avg_rev) AS incremental_revenue
FROM (SELECT * FROM arm_stats WHERE arm = 'treatment') t
CROSS JOIN (SELECT * FROM arm_stats WHERE arm = 'holdout') c;
Commentary. The holdout is the cleanest incrementality measure, but expensive — you're paying with foregone revenue. Key pitfall: both arms must be measured from the same event source, counting total conversions per user, organic included. If the treatment arm counts only attributed conversions while the holdout arm counts everything, the "lift" is an artifact of measurement asymmetry. Done right, you're measuring ad-driven incremental revenue, not "conversions that saw an ad." Also: contamination — if a holdout user sees your ad on another surface (Publisher Network, an email retarget), they're contaminated and must be excluded.
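A back-of-envelope significance check belongs next to that query. A sketch with hypothetical numbers (a two-sample z-test on per-user revenue; real readouts should prefer bootstrap or CUPED-adjusted estimates because per-user revenue is heavy-tailed):

```python
import math

def lift_with_ztest(n_t, mean_t, var_t, n_c, mean_c, var_c):
    """Relative lift of treatment over control, plus a two-sample z-score.
    Inputs are per-arm user counts, mean revenue/user, and variance."""
    lift = (mean_t - mean_c) / mean_c
    se = math.sqrt(var_t / n_t + var_c / n_c)  # standard error of the diff
    z = (mean_t - mean_c) / se
    return lift, z

# Hypothetical readout: 99% treatment, 1% holdout
lift, z = lift_with_ztest(
    n_t=990_000, mean_t=0.52, var_t=9.0,   # avg revenue/user, variance
    n_c=10_000,  mean_c=0.47, var_c=8.5,
)
print(f"lift={lift:.1%} z={z:.2f}")
```

Note the asymmetry: the tiny holdout arm dominates the standard error, which is exactly why 1% holdouts produce wide confidence intervals.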
Section 7: Ad Delivery Health Monitoring
"For each campaign, alert if today's delivery is <70% of yesterday's for 2+ consecutive hours."
WITH hourly AS (
SELECT
DATE_TRUNC('hour', impression_ts) AS hr,
campaign_id,
COUNT(*) AS impressions
FROM fact_impression
WHERE impression_ts >= CURRENT_TIMESTAMP - INTERVAL '48' HOUR
GROUP BY 1, 2
),
paired AS (
SELECT
today.hr,
today.campaign_id,
today.impressions AS today_imps,
yesterday.impressions AS yday_imps,
today.impressions * 1.0 / NULLIF(yesterday.impressions, 0) AS ratio
FROM hourly today
LEFT JOIN hourly yesterday
ON yesterday.campaign_id = today.campaign_id
AND yesterday.hr = today.hr - INTERVAL '24' HOUR
WHERE today.hr >= CURRENT_TIMESTAMP - INTERVAL '4' HOUR
),
flagged AS (
SELECT
hr, campaign_id, ratio,
SUM(CASE WHEN ratio < 0.70 THEN 1 ELSE 0 END)
OVER (PARTITION BY campaign_id ORDER BY hr ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS consec_low
FROM paired
)
SELECT campaign_id, hr, ratio
FROM flagged
WHERE consec_low >= 2
ORDER BY hr DESC, campaign_id;
Commentary. The "consecutive hours" condition uses a rolling window sum — cleaner than self-joins. Always include a volume gate (yday_imps > 100) to avoid noise; omitted here for brevity.
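One subtlety the `ROWS BETWEEN` window hides: it counts adjacent rows, not adjacent hours, so a missing hour of data can fuse two non-consecutive low readings. A small standalone check (hypothetical data shapes) that enforces true hour adjacency:

```python
from datetime import datetime, timedelta

def consecutive_low_hours(ratios, threshold=0.70, run_len=2):
    """ratios: list of (hour_timestamp, ratio) sorted ascending.
    Returns hours that end a run of `run_len` truly adjacent low hours."""
    flagged, run, prev_hr = [], 0, None
    for hr, ratio in ratios:
        adjacent = prev_hr is not None and hr - prev_hr == timedelta(hours=1)
        if ratio is not None and ratio < threshold:
            run = run + 1 if adjacent and run > 0 else 1  # gap resets the run
        else:
            run = 0
        if run >= run_len:
            flagged.append(hr)
        prev_hr = hr
    return flagged

hours = [
    (datetime(2024, 5, 1, 9),  0.95),
    (datetime(2024, 5, 1, 10), 0.60),  # low
    (datetime(2024, 5, 1, 12), 0.55),  # low, but hour 11 is missing: no flag
    (datetime(2024, 5, 1, 13), 0.50),  # low and adjacent to 12:00: flag
]
print(consecutive_low_hours(hours))
```

In SQL you'd get the same guard by also checking `LAG(hr) = hr - INTERVAL '1' HOUR` before counting a row toward the run.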
Section 8: Campaign Cannibalization Detection
"When advertiser A launches a new campaign, does it steal impressions from their existing campaigns? Detect."
WITH pre AS (
SELECT campaign_id, AVG(daily_imps) AS pre_avg_imps
FROM (
SELECT campaign_id, dt, SUM(impressions) AS daily_imps
FROM metric_campaign_daily
WHERE dt BETWEEN :new_campaign_start - 14 AND :new_campaign_start - 1
GROUP BY campaign_id, dt
) d
GROUP BY campaign_id
),
post AS (
SELECT campaign_id, AVG(daily_imps) AS post_avg_imps
FROM (
SELECT campaign_id, dt, SUM(impressions) AS daily_imps
FROM metric_campaign_daily
WHERE dt BETWEEN :new_campaign_start AND :new_campaign_start + 13
GROUP BY campaign_id, dt
) d
GROUP BY campaign_id
),
new_campaign AS (
SELECT advertiser_id FROM dim.campaign WHERE campaign_id = :new_campaign_id
)
SELECT
pre.campaign_id,
pre.pre_avg_imps,
post.post_avg_imps,
(post.post_avg_imps - pre.pre_avg_imps) / NULLIF(pre.pre_avg_imps, 0) AS pct_change
FROM pre
JOIN post USING (campaign_id)
JOIN dim.campaign dc USING (campaign_id)
JOIN new_campaign nc USING (advertiser_id)
WHERE dc.campaign_id != :new_campaign_id
AND pre.pre_avg_imps > 1000
ORDER BY pct_change ASC;
Commentary. Before/after averages on the same advertiser's other campaigns. A cannibalization signature: existing campaigns drop 20–50% in the 2 weeks after the new one launches. Caveat: this is correlational, not causal — seasonality or audience overlap changes could also explain it. Mention that out loud.
Section 9: Ad Load vs User Experience
"An experiment increases ad load from 1-in-6 feed items to 1-in-4. Write a SQL that measures impact on session length and return rate."
WITH sessions AS (
SELECT
e.session_id,
e.user_id,
MIN(e.event_ts) AS session_start,
MAX(e.event_ts) AS session_end,
COUNT(DISTINCT e.item_id) AS items_viewed,
COUNT(*) FILTER (WHERE e.event_name = 'impression' AND e.is_ad) AS ads_shown,
MAX(exp.variant_id) AS variant_id
FROM bronze.events e
LEFT JOIN fact_experiment_exposure exp
ON e.user_id = exp.user_id
AND exp.experiment_id = :ad_load_experiment_id
WHERE e.dt BETWEEN :exp_start AND :exp_end
GROUP BY e.session_id, e.user_id
),
per_variant AS (
SELECT
variant_id,
COUNT(DISTINCT user_id) AS users,
AVG(EXTRACT(EPOCH FROM (session_end - session_start))) AS avg_session_sec,
AVG(items_viewed) AS avg_items,
AVG(ads_shown) AS avg_ads
FROM sessions
WHERE variant_id IS NOT NULL
GROUP BY variant_id
),
return_rate AS (
SELECT
variant_id,
COUNT(DISTINCT user_id) FILTER (
WHERE user_id IN (
SELECT user_id FROM bronze.events
WHERE event_name = 'app_open'
AND dt BETWEEN :exp_start + 7 AND :exp_end
)
) * 1.0 / COUNT(DISTINCT user_id) AS d7_return_rate
FROM sessions
GROUP BY variant_id
)
SELECT v.variant_id, v.users, v.avg_session_sec, v.avg_items, v.avg_ads, r.d7_return_rate
FROM per_variant v JOIN return_rate r USING (variant_id);
Commentary. The ad-load experiment is the single highest-stakes experiment in any consumer ads business. The DE pipeline for it needs to be airtight. Three pitfalls to call out: (1) session length as a metric is gamed by longer but less engaged scrolling — always pair with items_viewed; (2) D7 return rate measures the lagged effect of a bad experience, which an immediate in-session metric won't catch; (3) the exposure join above tags every session in the date range, including sessions that predate the user's first exposure; production versions should keep only sessions starting at or after the exposure timestamp.
Section 10: Senior Commentary — Five Patterns That Show Up Everywhere
- Trailing-window joins (WHERE t.event_ts BETWEEN c.event_ts - INTERVAL '7' DAY AND c.event_ts). Attribution, fatigue, cohorts. The left side's scan range must be broader than the right side's window.
- Row-number for deduplication and "latest-of" (ROW_NUMBER() OVER (PARTITION BY k ORDER BY ts DESC)). Mandatory for attribution and SCD-2 lookups.
- Volume gates. Every ratio / rate / percent query needs a floor on the denominator (WHERE imps > 1000). Without it, noise dominates.
- Incremental rebuilds. INSERT OVERWRITE PARTITION (dt = ...) is the correct idempotent pattern. Avoid DELETE WHERE dt = ... + INSERT — it's not atomic.
- FILTER clauses over CASE WHEN ... THEN 1 END. Modern SQL supports COUNT(*) FILTER (WHERE ...). Cleaner, and often better optimized.
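The row-number dedup pattern is worth being able to demo cold. A runnable miniature using Python's bundled sqlite3 (window functions need SQLite 3.25+; table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clicks (user_id INT, campaign_id INT, ts TEXT);
INSERT INTO clicks VALUES
  (1, 10, '2024-05-01 09:00'),
  (1, 10, '2024-05-01 11:00'),  -- latest click for (1, 10)
  (2, 10, '2024-05-01 08:00');
""")
rows = conn.execute("""
SELECT user_id, campaign_id, ts
FROM (
  SELECT *,
         ROW_NUMBER() OVER (
           PARTITION BY user_id, campaign_id ORDER BY ts DESC
         ) AS rn
  FROM clicks
)
WHERE rn = 1  -- keep only the latest row per key
ORDER BY user_id
""").fetchall()
print(rows)
```

Each (user_id, campaign_id) key survives exactly once, carrying its latest timestamp — the exact shape needed for last-click attribution and SCD-2 "as-of" lookups.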
Part 4: Platform Surfaces + A–Z Glossary
"Every surface is a little universe with its own physics. You can't design pipelines for them as if they're the same thing."
Section 1: Short-Form Video Ads — Engagement Pipeline
Short-form vertical video is now the dominant growth surface on every major consumer platform. The ad format is full-screen, auto-play, sound-on-by-default, with in-feed placement between organic videos.
What makes it different
- Time-based engagement matters more than CTR. Did they watch 3 seconds? 15? Complete it?
- Sound-on is the norm. Audio becomes a dimension: muted rate, sound-on rate.
- Swipe mechanics make the unit of engagement "scrolled past" vs "paused on." Explicit negative signal.
- Creators are part of the ecosystem — sponsored content from creators is its own ad format.
Schema
CREATE TABLE fact_shortform_video_engagement (
dt DATE,
impression_id BINARY(16),
user_id BIGINT,
ad_id BIGINT,
creative_id BIGINT,
-- Watch time
view_started BOOLEAN, -- autoplay initiated
view_duration_ms INT, -- actual time spent in view
completions INT, -- 1 if watched to end
loops INT, -- replays
-- Audio
sound_on BOOLEAN,
sound_on_duration_ms INT,
-- Interaction
likes INT,
comments INT,
shares INT,
saves INT,
profile_visits INT,
-- Negative
swiped_away_ms INT, -- if swiped, how long before
marked_not_interested BOOLEAN,
hidden_ad BOOLEAN
)
PARTITIONED BY (dt);
Key metrics
- Thumbstop rate = 3-second views / impressions. "Did we stop the scroll?"
- Hold rate = 15-second views / 3-second views. "Did the story hold them?"
- Completion rate = completions / impressions.
- Sound-on rate = sound_on views / total views. Branding signal.
- Negative signal rate = (not_interested + hidden) / impressions.
Pipeline pattern
Short-form video events are 10–20× higher volume than static ad events because the user scrolls fast. Stream the engagement events, batch the aggregate:
The rendering client emits engagement events at 250ms intervals while the video is on-screen. Dedupe aggressively: one impression_id can produce 60+ engagement events.
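A sketch of that dedupe/rollup step, assuming hypothetical heartbeat event dicts where view_duration_ms is cumulative; many heartbeats collapse into one fact row per impression_id:

```python
from collections import defaultdict

def rollup_engagement(events):
    """Collapse 250ms heartbeat events into one row per impression_id.
    Durations take the max (heartbeats report cumulative ms); flags OR."""
    rows = defaultdict(lambda: {"view_duration_ms": 0, "sound_on": False,
                                "completed": False})
    for e in events:
        row = rows[e["impression_id"]]
        row["view_duration_ms"] = max(row["view_duration_ms"],
                                      e["view_duration_ms"])
        row["sound_on"] = row["sound_on"] or e["sound_on"]
        row["completed"] = row["completed"] or e.get("completed", False)
    return dict(rows)

events = [  # one impression, three heartbeats
    {"impression_id": "imp1", "view_duration_ms": 250, "sound_on": False},
    {"impression_id": "imp1", "view_duration_ms": 500, "sound_on": True},
    {"impression_id": "imp1", "view_duration_ms": 750, "sound_on": True,
     "completed": True},
]
print(rollup_engagement(events))
```

Max-of-cumulative is also naturally idempotent: replayed or duplicated heartbeats cannot inflate the final duration, which matters on an at-least-once streaming spine.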
Section 2: Stories Ads — Tap-Through Funnels
The format
Stories ads are full-screen vertical cards that auto-advance every 5–15 seconds. The ad is one or two cards inserted between organic stories. User actions are limited: tap to advance, hold to pause, swipe up for CTA, swipe left to exit the whole story sequence.
Funnel
- entry → ad shown → ad completed → swipe_up (CTA)
- skipped (tap next)
- exited (swipe away)
Schema
CREATE TABLE fact_stories_engagement (
dt DATE,
impression_id BINARY(16),
user_id BIGINT,
ad_id BIGINT,
card_index INT, -- 1 or 2 for multi-card ads
-- Outcomes
duration_shown_ms INT,
completed BOOLEAN,
tapped_next BOOLEAN, -- user skipped mid-card
swiped_up BOOLEAN, -- tapped CTA
swiped_away BOOLEAN, -- exited the whole sequence
paused_duration_ms INT
)
PARTITIONED BY (dt);
Metrics that matter
- Completion rate = completed / impressions.
- Exit rate = swiped_away / impressions. High exit rate means "my ad made them leave" — one of the worst UX signals in ads.
- Tap-through rate = swiped_up / impressions.
Pitfalls
- Stories are consumed in rapid bursts. User watches 50 stories in 3 minutes; your ad has maybe 2 seconds of attention.
- The position in the sequence matters enormously. Ads after story #15 perform much worse than after story #3. Log position as a first-class dimension.
Section 3: Carousel Ads — Card-Level Analytics
The format
Carousel ads are 2–10 swipeable cards, each with its own image/video + headline + CTA. Used heavily in e-commerce and education.
What's unique
Engagement is per-card. Advertiser wants to know: "which card got the click, which was skipped, which converted."
Schema
CREATE TABLE fact_carousel_engagement (
dt DATE,
impression_id BINARY(16),
user_id BIGINT,
ad_id BIGINT,
card_id BIGINT, -- PK: unique per card
card_position INT, -- 1..10
viewed BOOLEAN, -- scrolled into view
view_duration_ms INT,
clicked BOOLEAN,
converted BOOLEAN -- attributed later
)
PARTITIONED BY (dt);
Common query: which card drove conversion?
WITH attributed AS (
SELECT
f.conversion_id,
f.campaign_id,
c.card_id,
c.card_position
FROM fact_attributed_conversion f
JOIN fact_carousel_engagement c
ON f.impression_id = c.impression_id -- join through the click's impression
AND c.clicked -- credit the card that captured the click
)
SELECT
campaign_id,
card_position,
COUNT(*) AS conversions,
RANK() OVER (PARTITION BY campaign_id ORDER BY COUNT(*) DESC) AS rank_in_campaign
FROM attributed
GROUP BY 1, 2
ORDER BY campaign_id, rank_in_campaign;
Pitfall: card fatigue
The first card is always the most-viewed. Measuring "card engagement rate" without normalizing for position will always make the first card look best. Compute conditional engagement — given that a user viewed the card, did they engage? — so positions are comparable.
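A toy illustration of why the conditional rate fixes the comparison (hypothetical card-level rows; in production this is the same aggregation in SQL):

```python
def conditional_engagement(cards):
    """cards: dicts with card_position, viewed (0/1), clicked (0/1).
    Naive CTR divides clicks by impressions; conditional CTR divides by
    views, removing the bias that later cards are rarely scrolled into view."""
    by_pos = {}
    for c in cards:
        pos = by_pos.setdefault(c["card_position"],
                                {"imps": 0, "views": 0, "clicks": 0})
        pos["imps"] += 1
        pos["views"] += c["viewed"]
        pos["clicks"] += c["clicked"]
    return {
        p: {"naive_ctr": v["clicks"] / v["imps"],
            "conditional_ctr": v["clicks"] / v["views"] if v["views"] else None}
        for p, v in by_pos.items()
    }

cards = (
    [{"card_position": 1, "viewed": 1, "clicked": 1}] * 10   # card 1: always seen
  + [{"card_position": 1, "viewed": 1, "clicked": 0}] * 90
  + [{"card_position": 3, "viewed": 1, "clicked": 1}] * 2    # card 3: seen 20% of the time
  + [{"card_position": 3, "viewed": 1, "clicked": 0}] * 18
  + [{"card_position": 3, "viewed": 0, "clicked": 0}] * 80
)
print(conditional_engagement(cards))
```

Naively, card 3 looks 5× worse (2% vs 10% CTR); conditioned on being viewed, the two cards engage identically — the gap was pure position.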
Section 4: AI-Driven Automated Performance Campaigns
The format
The advertiser provides a budget, an objective (conversions, app installs, etc.), a set of creatives, and minimal targeting — often just a country. The platform decides everything else: audience, placement, bid, creative rotation.
This is now the dominant paradigm in direct-response ads. Different platforms call it different things (auto-optimized campaigns, performance-plus, broad-targeted delivery, etc.). The DE challenges are structural.
What's different for data engineering
- Attribution is fuzzy. Without advertiser-selected audiences, you can't report "your audience of runners converted at 3%." There is no chosen audience. You have to report discovered segments instead.
- Diagnostic reporting is different. The advertiser can't A/B test between "audience A vs audience B" — the platform does that internally. You report diagnostics, not levers.
- Platform reporting = platform accountability. If the platform chose everything, the platform has to show why.
Diagnostic schema
CREATE TABLE fact_ai_campaign_diagnostics (
dt DATE,
campaign_id BIGINT,
-- What the platform chose
surface_mix MAP<STRING, DOUBLE>, -- e.g., {"main_feed": 0.4, "short_form_video": 0.5, ...}
audience_mix MAP<STRING, DOUBLE>, -- demographics, interest buckets (without user-level info)
creative_mix MAP<BIGINT, DOUBLE>, -- creative_id → share of spend
placement_mix MAP<STRING, DOUBLE>,
-- Performance by choice
cpa_by_surface MAP<STRING, DOUBLE>,
cpa_by_creative MAP<BIGINT, DOUBLE>,
-- Exploration vs exploitation
exploration_spend_pct DOUBLE
) PARTITIONED BY (dt);
Reporting philosophy
Senior DE insight: "Reports for AI-driven campaigns are advertiser-readable summaries of what the platform explored and why it converged on what it did." Building this well is harder than reporting classic campaigns. You need trust signals — "we spent 12% of your budget exploring, found that Surface X converts 2× better, and shifted 60% of your budget there in week 2."
Section 5: Dynamic Product Ads (Catalog-Driven)
The format
The advertiser uploads a product catalog (CSV, XML feed, API). The platform matches users to products via browse/cart signals and renders a templated creative at auction time. One ad, millions of variants.
Data model
-- Catalog
CREATE TABLE advertiser_catalog (
advertiser_id BIGINT,
product_id STRING, -- advertiser's ID
product_sku STRING,
title STRING,
description STRING,
price_usd DECIMAL(10,2),
image_url STRING,
category STRING,
availability STRING, -- 'in_stock', 'out_of_stock'
updated_ts TIMESTAMP,
PRIMARY KEY (advertiser_id, product_id)
);
-- User-product signals
CREATE TABLE user_product_signals (
user_id BIGINT,
advertiser_id BIGINT,
product_id STRING,
signal_type STRING, -- 'view', 'add_to_cart', 'purchase'
signal_ts TIMESTAMP,
decay_score DOUBLE -- 1.0 freshest → 0.0 stale
) PARTITIONED BY (DATE(signal_ts));
-- Impressions tagged with the specific product shown
CREATE TABLE fact_dpa_impression (
dt DATE,
impression_id BINARY(16),
user_id BIGINT,
advertiser_id BIGINT,
campaign_id BIGINT,
product_id STRING, -- which product variant was rendered
template_id BIGINT,
rendered_price DECIMAL(10,2),
is_retargeting BOOLEAN -- user had prior signal for this product
) PARTITIONED BY (dt);
Key challenges
- Catalog freshness. An out-of-stock product shown to a user wastes the impression. Ingest catalog updates hourly minimum. Fail the pipeline if a catalog is >24 hours stale.
- Signal decay. A "view" from 30 days ago shouldn't drive an impression today. Decay signals with a 7-day half-life.
- Privacy. Catalog-level targeting uses browsing history — subject to all local privacy regimes. Per-user signal must be deletable on request.
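The decay_score column can be derived at read time from the signal's age. A minimal sketch, assuming the 7-day half-life stated above:

```python
from datetime import datetime, timezone

HALF_LIFE_DAYS = 7.0

def decay_score(signal_ts, now):
    """Exponential decay: 1.0 when fresh, 0.5 at 7 days, ~0.05 at 30 days."""
    age_days = (now - signal_ts).total_seconds() / 86_400
    return 0.5 ** (age_days / HALF_LIFE_DAYS)

now = datetime(2024, 5, 31, tzinfo=timezone.utc)
for day in (31, 24, 1):  # fresh, 7 days old, 30 days old
    ts = datetime(2024, 5, day, tzinfo=timezone.utc)
    print(day, round(decay_score(ts, now), 3))
```

Computing the score at read time (rather than storing it, as the schema above does) avoids a daily full-table rewrite; store the raw signal_ts and let the ranking query apply the decay.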
Reporting: ROAS by product
SELECT
i.advertiser_id,
i.product_id,
COUNT(DISTINCT i.impression_id) AS imps,
SUM(i.spend_cents) / 100.0 AS spend_usd, -- assumes impression-level spend, as in fact_impression
SUM(c.conv_value) AS revenue,
SUM(c.conv_value) / NULLIF(SUM(i.spend_cents) / 100.0, 0) AS roas
FROM fact_dpa_impression i
LEFT JOIN fact_attributed_conversion c USING (impression_id) -- assumes <=1 conversion per impression
WHERE i.dt BETWEEN CURRENT_DATE - 7 AND CURRENT_DATE - 1
GROUP BY 1, 2
HAVING SUM(i.spend_cents) > 10000 -- volume gate: > $100 spend
ORDER BY roas DESC;
Advertisers love per-product ROAS reports. Make this query fast.
Section 6: Cross-Surface Unified Pipeline
The pattern
A user can see the same ad on Main Feed, Visual Feed, Short-form Video, Stories, and Publisher Network — potentially on the same day. Your reporting needs to show:
- Per-surface performance (diagnostic).
- Cross-surface reach (total unique users across all surfaces).
- Frequency capping across surfaces (don't show the same ad 20× by stacking surfaces).
Schema
CREATE TABLE fact_ad_exposure (
dt DATE,
impression_id BINARY(16),
user_id BIGINT,
ad_id BIGINT,
campaign_id BIGINT,
surface_id INT,
surface_name STRING, -- 'main_feed', 'visual_feed', 'short_form_video', ...
placement_id INT,
impression_ts TIMESTAMP
) PARTITIONED BY (dt);
Cross-surface reach
WITH exposure AS (
SELECT campaign_id, user_id, surface_name
FROM fact_ad_exposure
WHERE dt BETWEEN CURRENT_DATE - 7 AND CURRENT_DATE - 1
),
surface_counts AS (
SELECT campaign_id, user_id, COUNT(DISTINCT surface_name) AS n_surfaces
FROM exposure
GROUP BY 1, 2
)
SELECT
e.campaign_id,
COUNT(DISTINCT e.user_id) AS cross_surface_reach,
COUNT(DISTINCT CASE WHEN e.surface_name = 'main_feed' THEN e.user_id END) AS main_feed_reach,
COUNT(DISTINCT CASE WHEN e.surface_name = 'visual_feed' THEN e.user_id END) AS visual_feed_reach,
COUNT(DISTINCT CASE WHEN e.surface_name = 'short_form_video' THEN e.user_id END) AS sfv_reach,
-- Surface-exclusive reach (users seen only on this surface)
COUNT(DISTINCT CASE WHEN s.n_surfaces = 1 AND e.surface_name = 'main_feed' THEN e.user_id END) AS main_feed_exclusive
FROM exposure e
JOIN surface_counts s USING (campaign_id, user_id)
GROUP BY e.campaign_id;
-- Note: COUNT(DISTINCT ...) OVER (...) would be shorter, but DISTINCT inside
-- window functions is unsupported in most engines (Postgres, Trino, Spark).
Cross-surface frequency cap
A frequency cap of "10 per week" must be enforced globally, not per surface. In the streaming path: the pacing KV stores (user_id, campaign_id) → count_this_week and the auction gate reads from this key on every candidate selection. Decay the counter at week-boundary.
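A minimal in-memory sketch of that global counter (a stand-in for the real pacing KV; the week-keyed reset is illustrative — production stores would use a TTL rather than unbounded keys):

```python
from datetime import date

class WeeklyFrequencyCap:
    """Global (cross-surface) per-user, per-campaign weekly cap."""

    def __init__(self, cap=10):
        self.cap = cap
        self.counts = {}

    def _key(self, user_id, campaign_id, on_date):
        iso = on_date.isocalendar()
        return (user_id, campaign_id, iso[0], iso[1])  # ISO year + week

    def allow(self, user_id, campaign_id, on_date):
        """Auction-time gate: admit the candidate and bump the counter.
        Because the key includes the ISO week, counters implicitly
        reset at the week boundary."""
        k = self._key(user_id, campaign_id, on_date)
        if self.counts.get(k, 0) >= self.cap:
            return False
        self.counts[k] = self.counts.get(k, 0) + 1
        return True

cap = WeeklyFrequencyCap(cap=3)
d = date(2024, 5, 1)
shown = sum(cap.allow(42, 7, d) for _ in range(5))
print(shown)                        # capped at 3 within the week
print(cap.allow(42, 7, date(2024, 5, 8)))  # new ISO week: counter reset
```

The read-modify-write in allow() is the part that gets hard at scale: the real KV needs atomic increments and tolerance for slightly stale counts under replication lag.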
Section 7: A–Z Glossary
Compressed version. Every term a senior ads DE should know cold.
A
- A/B test — controlled experiment with random assignment. Single-treatment variant.
- Ad exchange — a marketplace where ad buyers and sellers meet via real-time bidding.
- Ad load — fraction of feed items (or minutes of video watched) that are ads. The most sensitive dial in the product, and the highest-stakes class of experiment.
- Ad request — when a client asks the platform for an ad to fill a slot.
- Ad server — the system that selects and delivers the winning ad after the auction.
- Ads platform — the full system serving ads. Used generically throughout this guide.
- Advertiser — the entity running the campaign; pays for impressions/clicks/conversions.
- Aggregated Metrics Platform (AMP) — unified pipeline that materializes every platform metric from a single event source.
- App install campaign — objective where advertiser wants installs, bid on CPI or CPE.
- Attribution — assigning credit for a conversion to an ad event (click or view).
- Attribution window — the time range within which a conversion is credited to a prior ad interaction.
- Audience — a group of users defined for targeting (static list, rule-based, or lookalike).
- Audience overlap — the fraction of users in common between two audiences.
- Auction — the mechanism that chooses which ad wins an ad slot.
- Auto-optimized campaign — the advertiser lets the platform choose audience, placement, creative rotation, bidding. See Section 4 of this part.
B
- Backfill — reprocessing historical data to correct or populate a table.
- Bid — the amount an advertiser is willing to pay for an outcome.
- Bid request — the upstream signal "a slot is available, who wants it."
- Bid shading — bid-management technique that reduces a bid to the second-price expected value in first-price auctions.
- Bidding strategy — how the advertiser controls their bids (manual, target CPA, lowest cost, value-optimization).
- Bloom filter — probabilistic data structure used to represent audience membership efficiently.
- Bronze / Silver / Gold — medallion architecture tiers for data warehouses.
- Brand campaign — objective is reach or awareness; bid on CPM.
- Budget pacing — controlling the rate of spend through the day.
C
- CAPI (Server Conversion API) — advertiser sends conversion events directly server-to-server; bypasses browser.
- Campaign — a group of ads sharing an objective and budget.
- Carousel — multi-card swipeable ad format.
- Catalog — advertiser's product feed used for dynamic product ads.
- CCPA / CPRA — California Consumer Privacy Act; opt-out + deletion rights.
- Clean room — neutral environment where advertiser + platform join hashed IDs under privacy constraints.
- Clearing price — what the auction winner actually pays (often second-price).
- Click-through rate (CTR) — clicks / impressions.
- Conversion — the advertiser's desired outcome (purchase, signup, install).
- Conversion window — same as attribution window.
- Cost-per-thousand (CPM) — cost per thousand impressions.
- Cost-per-click (CPC) — cost per click.
- Cost-per-action (CPA) — cost per defined action (conversion, install, etc.).
- Creative — the actual rendered asset (image, video, copy, card).
- Creative fatigue — declining engagement with repeated exposure.
- Cross-surface — touching multiple product surfaces (Main Feed, Visual Feed, etc.).
- CUPED — Controlled-experiment Using Pre-Experiment Data; variance-reduction technique.
D
- DAU — Daily Active Users.
- Deferred deep link — link that remembers its destination through the install flow.
- Delivery — the act of serving an ad into a slot.
- Delivery monitoring — real-time alerting on under- or over-delivery.
- Demand-Side Platform (DSP) — buy-side adtech that bids on behalf of advertisers.
- Differential privacy (DP) — formal privacy guarantee; add calibrated noise to aggregates.
- Dim — dimension table. Typically SCD-2.
- DMA / DSA — EU's Digital Markets Act and Digital Services Act.
- Dynamic Product Ads (DPA) — catalog-driven template ads; one ad, many product variants.
E
- Engagement — any user interaction (like, comment, share, save).
- Event — a logged user action or system action.
- Experiment — A/B or multi-variant test.
- Experiment exposure — first moment a user saw the treatment.
- Exactly-once — processing semantics where each event is processed exactly once end-to-end.
F
- Fact — fact table; event-grain table of measurable actions.
- False Discovery Rate (FDR) — multiple-comparison correction method.
- Fill rate — fraction of ad requests that got a winning ad.
- First-price auction — winner pays their bid.
- Flink — example of a stateful stream processor.
- Floor price (reserve price) — minimum acceptable bid for the auction.
- Frequency — impressions per unique user.
- Frequency cap — maximum impressions per user per window.
G
- GDPR — EU's General Data Protection Regulation.
- Generalized Second-Price (GSP) — auction format common in sponsored search.
- Gold — serving-layer / semantic layer of the medallion architecture.
- Growth accounting — NEW/RETAINED/RESURRECTED/DORMANT framework for DAU/MAU decomposition.
H
- Holdout — a fraction of users excluded from ads to measure incremental lift.
- HyperLogLog (HLL) — probabilistic cardinality sketch.
I
- Iceberg — open table format supporting ACID on object storage.
- Identity graph — graph of observed identifiers per user across devices/sessions.
- IDFA / GAID — iOS / Android advertising identifiers.
- IDFV — identifier for vendor (iOS, per-app).
- Impression — an ad was rendered and viewable.
- Incrementality — causal lift attributable to the ad; measured via holdout.
- Invalid traffic (IVT) — bot/fraud impressions filtered from billable.
- iOS ATT — App Tracking Transparency; requires user consent for IDFA access.
J
- Jaccard similarity — size of intersection / size of union. Used for audience overlap.
K
- Kafka — example of a distributed event streaming spine.
- Kappa architecture — streaming-only alternative to Lambda.
- KPI — Key Performance Indicator.
L
- L28 — "Last 28 days" active. Most predictive retention feature on consumer apps.
- Lambda architecture — hybrid streaming + batch pipeline with reconciliation.
- Late-arriving data — events that arrive past the window they should be aggregated in.
- Lead form — ad format that collects lead information inline.
- LGPD — Brazil's General Data Protection Law.
- Lineage — DAG of which data objects feed which.
- Lookalike audience — computed audience similar to a seed set.
M
- Main Feed — primary consumption surface (infinite scrolling organic + ads).
- MAU — Monthly Active Users.
- Materialized view — cached query result that refreshes on a cadence.
- Messaging platform — standalone messaging surface.
- Metric registry — versioned, centrally-defined catalog of metric definitions.
- MMP (Mobile Measurement Partner) — third-party app-install attribution provider.
- Multi-touch attribution (MTA) — credit-splitting across multiple ad touches pre-conversion.
N
- North-star metric — top-level success metric.
- NPS — Net Promoter Score.
O
- Objective — what the campaign optimizes for (reach, traffic, conversions, installs, etc.).
- Optimization goal — the specific event the bidder is trying to maximize.
- Over-delivery — platform served more than the budget allowed.
P
- Pacing — rate of spend through the day.
- Pacing gate — real-time check that prevents further delivery once a threshold is crossed.
- Partition — unit of data in a table (usually by date).
- Pinot / Druid / ClickHouse — examples of real-time OLAP stores.
- Pixel — web tracking tag installed by the advertiser to log conversions.
- Placement — the specific position within a surface (in-feed, in-stream, top-of-stories).
- PIPL — China's Personal Information Protection Law.
- Postback — server-to-server callback, used in SKAdNetwork and MMPs.
- Privacy regime — GDPR, CCPA, LGPD, PIPL, ATT, etc.
- Publisher Network — third-party publisher inventory monetized by the ads platform.
Q
- QPS — Queries Per Second.
R
- Reach — unique users exposed.
- Reach and frequency — standard brand-campaign pair of metrics.
- Real-time bidding (RTB) — auction conducted in ~100ms per ad request.
- Reconciliation — correcting streaming estimates with batch-truth.
- Retention — fraction of users from a cohort still active at day N.
- Retargeting — showing ads to users who previously interacted with the brand.
- ROAS — Return on Ad Spend = revenue / spend.
S
- SCD-2 — Slowly Changing Dimension type 2 (history of value changes).
- Second-price auction — winner pays the second-highest bid.
- Server conversion API — see CAPI.
- Session — bounded period of user activity, usually defined by inactivity gap.
- Shadow mode — new pipeline runs in parallel; output compared, not served.
- Short-form Video — vertical, full-screen, auto-play video surface.
- SKAdNetwork (SKAN) — Apple's privacy-preserving install attribution.
- Slot — a specific ad position in a specific surface.
- Spend-forecast — predicting what a campaign will spend over its lifetime.
- Stateful stream processor — stream processor that maintains per-key state (e.g., Flink).
- Static audience — explicit user list uploaded by advertiser.
- Stories — time-bounded vertical story card format, auto-advancing.
- Streaming spine — event bus (Kafka analog).
- Surface — a product surface (Main Feed, Visual Feed, Short-form Video, Stories, etc.).
T
- Targeting — audience rules for who can see the ad.
- Third-party cookie — browser cookie set by a domain other than the current site; deprecated.
- Thumbstop rate — 3-second views / impressions. Short-form video metric.
- TOS — Terms of Service.
- Trino / Presto — distributed SQL engines.
- Truncation — dropping events or rows to fit storage budgets.
U
- UMP (Unified Metrics Platform) — same as AMP.
- Under-delivery — platform served less than the budget allowed.
- Unit of randomization — the entity on which experiment assignment is done (user, device, session, ad request).
V
- Variance reduction — techniques (CUPED, stratified assignment) that reduce required sample size.
- VCG (Vickrey-Clarke-Groves) — truthful auction format.
- Viewability — fraction of impressions that met the viewability standard (e.g., MRC: 50% of pixels on screen for 1+ second).
- Visual Feed — image-and-video-first consumption surface.
W
- WAU — Weekly Active Users.
- Warehouse — data warehouse (BigQuery, Snowflake, Redshift, etc.).
- Watermark — in stream processing, the timestamp below which no more events are expected.
X
- XML feed — structured catalog feed (used in DPA).
Y
- Yield optimization — the supply-side act of maximizing revenue per ad slot.
Z
- Zero-party data — data the user deliberately shares (preferences, quiz answers).
- Z-score — standardized metric deviation from baseline.
Visualization & Measurement
Part 5: Visualization & Measurement of Campaign Objectives
"A campaign only succeeded if you can prove it. And you can only prove it if the measurement was designed before the campaign launched."
Section 1: The Campaign Scorecard Template
Every campaign, regardless of objective, should roll up to a one-page scorecard. The shape is identical. Only the numerator rows change.
(Scorecard template: each row pairs a metric with its value and a benchmark percentage; the objective-specific KPI rows come from the per-objective blocks below.)
Every scorecard has four blocks: delivery, objective-specific KPIs, benchmarks, health. Every block is non-negotiable. Dropping benchmarks turns a scorecard into a number dump.
Section 2: Reach & Frequency (Awareness Objective)
What matters
- Reach — unique users exposed at least once.
- Frequency — impressions per reached user.
- Frequency distribution — not just the average. 90% of users at freq 2 and 10% at freq 20 is very different from everyone at freq 4.
The key visualization: frequency distribution histogram
Overlay: vertical line at the frequency cap (e.g., 8). Count of users above the cap is your "over-cap" rate — should be near zero.
SQL
WITH user_freq AS (
SELECT
user_id,
COUNT(*) AS impressions_7d
FROM fact_impression
WHERE campaign_id = :c
AND dt BETWEEN CURRENT_DATE - 7 AND CURRENT_DATE - 1
AND is_valid = TRUE
GROUP BY user_id
),
buckets AS (
SELECT
CASE
WHEN impressions_7d = 1 THEN '01'
WHEN impressions_7d <= 3 THEN '02_03'
WHEN impressions_7d <= 6 THEN '04_06'
WHEN impressions_7d <= 10 THEN '07_10'
ELSE '11_plus'
END AS freq_bucket,
COUNT(*) AS users,
SUM(impressions_7d) AS impressions
FROM user_freq
GROUP BY 1
)
SELECT
freq_bucket,
users,
impressions,
users * 1.0 / SUM(users) OVER () AS users_share,
impressions * 1.0 / SUM(impressions) OVER () AS impressions_share
FROM buckets
ORDER BY freq_bucket;
Health check
- Reach curve that flattens early. If reach stops growing after day 3, you're oversaturating a small audience. Expand targeting.
- Right tail that's too fat. >10% of users at freq 10+ means your frequency cap isn't working. Investigate pacing.
Section 3: Conversion Funnel (Conversions Objective)
The funnel
From impression to return purchase — live drop-off
Each stage's drop-off is diagnostic.
Visualization: funnel bar chart
Annotate with industry benchmarks. If your click→purchase rate is 3.6% vs vertical median 4.5%, you know where to focus.
ROAS by attribution window
Same campaign. Same reality. Five different ROAS numbers.
A critical chart. The same campaign shows very different ROAS at 1d-click vs 7d-click vs 28d-click. Advertisers need to see every window side by side.
| Attribution window | Attributed revenue | ROAS |
|---|---|---|
| 1-day click | $12,000 | 2.4x |
| 7-day click | $23,000 | 4.6x |
| 28-day click | $27,000 | 5.4x |
| 1-day view | $18,000 | 3.6x |
| 7-day click + 1-day view | $29,500 | 5.9x |
SQL
WITH windows AS (
SELECT 'click_1d' AS window_label, 1 AS click_d, 0 AS view_d UNION ALL
SELECT 'click_7d', 7, 0 UNION ALL
SELECT 'click_28d', 28, 0 UNION ALL
SELECT 'view_1d', 0, 1 UNION ALL
SELECT 'click_7d_view_1d', 7, 1
),
attributed AS (
SELECT
w.window_label,
SUM(c.conv_value) AS revenue
FROM windows w
CROSS JOIN fact_attributed_conversion c
WHERE c.campaign_id = :c
AND (
(c.source = 'click' AND w.click_d > 0
AND DATEDIFF(c.conversion_ts, c.click_ts) <= w.click_d)
OR
(c.source = 'view' AND w.view_d > 0
AND DATEDIFF(c.conversion_ts, c.view_ts) <= w.view_d)
)
GROUP BY w.window_label
),
spend AS (
SELECT SUM(spend_cents) / 100.0 AS spend_usd
FROM fact_impression
WHERE campaign_id = :c
)
SELECT
a.window_label,
a.revenue,
s.spend_usd,
a.revenue / NULLIF(s.spend_usd, 0) AS roas
FROM attributed a CROSS JOIN spend s
ORDER BY a.window_label;
Section 4: App Install Funnel (App Installs Objective)
The funnel
App installs are their own beast. Attribution is through a Mobile Measurement Partner (MMP) in most cases, or SKAdNetwork on iOS.
Install-to-event funnel
Install to purchase to D7 retention
| Stage | Users | Rate |
|---|---|---|
| Installs | 10,000 | baseline |
| First-open | 8,500 | 85% of installs |
| Tutorial complete | 5,100 | 60% of first-open |
| Registration | 3,400 | 40% of first-open |
| Purchase | 1,200 | 14% of first-open |
| D7 retention | 2,800 | 28% of installs |
The "event" schema
Advertiser-defined milestone events flow back via the MMP or server API:
CREATE TABLE fact_app_event (
dt DATE,
install_id BINARY(16), -- from install attribution
app_id BIGINT,
event_name STRING, -- 'first_open', 'tutorial_complete', 'purchase', ...
event_ts TIMESTAMP,
value_usd DECIMAL(18,4) -- for purchases
)
PARTITIONED BY (dt);
Retention curve
WITH installs AS (
SELECT install_id, install_ts
FROM fact_install
WHERE campaign_id = :c
AND install_ts BETWEEN :start AND :end
),
opens AS (
SELECT install_id, DATE(event_ts) AS open_dt
FROM fact_app_event
WHERE event_name = 'app_open'
)
SELECT
DATEDIFF(o.open_dt, DATE(i.install_ts)) AS days_since_install,
COUNT(DISTINCT o.install_id) * 1.0 /
(SELECT COUNT(DISTINCT install_id) FROM installs) AS retention_rate
FROM installs i
JOIN opens o USING (install_id)
WHERE o.open_dt BETWEEN DATE(i.install_ts) AND DATE(i.install_ts) + INTERVAL '30' DAY
GROUP BY days_since_install
ORDER BY days_since_install;
A solid app's standard curve is roughly D1 ~50%, D7 ~25%, D30 ~15%. Below that, you're probably buying low-quality installs — usually a symptom of poor audience targeting.
SKAN caveat
SKAdNetwork postbacks are:
- Delayed (24h minimum, up to 60h).
- Conversion-value-constrained (a 6-bit fine-grained value on classic SKAN; SKAN 4 adds a coarse value and up to three postbacks, not more bits).
- Subject to anonymization thresholds (no postback below cohort-size floor).
- Non-deterministic at granular grain.
Your SKAN pipeline has to handle all four. Don't mix SKAN-derived metrics with MMP-derived metrics in the same chart without clear labeling.
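Because the fine-grained conversion value is just 6 bits, teams pack funnel milestones and a revenue bucket into it themselves. A sketch under an assumed layout (3 bits of event flags, 3 bits of revenue bucket — the layout is your design choice, not anything SKAN prescribes):

```python
# Hypothetical 6-bit conversion-value layout: low 3 bits flag funnel events,
# high 3 bits carry a revenue bucket (0-7). SKAN only transmits the integer;
# the encoding scheme below is an illustrative assumption.

EVENT_BITS = {"first_open": 0, "tutorial_complete": 1, "registration": 2}

def encode(events, revenue_bucket):
    value = 0
    for e in events:
        value |= 1 << EVENT_BITS[e]          # set one flag bit per event
    value |= (revenue_bucket & 0b111) << 3   # bucket in the high 3 bits
    assert 0 <= value <= 63                  # fine-grained value is 6 bits
    return value

def decode(value):
    events = {e for e, bit in EVENT_BITS.items() if value & (1 << bit)}
    return events, value >> 3

cv = encode({"first_open", "registration"}, revenue_bucket=5)
events, bucket = decode(cv)
```

The decode side is where the pipeline work lives: postbacks arrive late, thresholded, and sometimes with the value nulled out, so every decoded row needs a measurement-source label before it touches a dashboard.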
Section 5: Lead Quality Scoring (Lead Gen Objective)
The format
Advertiser runs a lead form ad. User fills it out without leaving the platform. Advertiser receives the lead record for follow-up.
The pipeline
Impression → form open → lead submitted → lead delivered to advertiser → qualified → closed-won. The last two stages happen in the advertiser's CRM, not on the ad platform. So:
- Volume metrics (submissions, CPL) — measurable on-platform.
- Quality metrics (qualified rate, close rate, deal value) — require advertiser to feed back.
Feedback API
The advertiser's CRM reports back qualified-lead status (and eventually close/value) via server-to-server API. Schema:
CREATE TABLE lead_outcome_feedback (
lead_id BINARY(16),
campaign_id BIGINT,
advertiser_id BIGINT,
submitted_ts TIMESTAMP,
qualified BOOLEAN,
qualified_ts TIMESTAMP,
closed_won BOOLEAN,
closed_ts TIMESTAMP,
deal_value_usd DECIMAL(18,4)
);
Lead-quality ratio
SELECT
campaign_id,
COUNT(*) AS leads_submitted,
SUM(CASE WHEN qualified THEN 1 ELSE 0 END) AS qualified,
SUM(CASE WHEN qualified THEN 1 ELSE 0 END) * 1.0 / COUNT(*) AS qualified_rate,
SUM(CASE WHEN closed_won THEN deal_value_usd ELSE 0 END) AS revenue,
  SUM(CASE WHEN closed_won THEN deal_value_usd ELSE 0 END) /
    NULLIF(MAX(s.spend_usd), 0) AS ROAS_leadgen  -- MAX, not SUM: one spend row per campaign; SUM would scale with lead count
FROM lead_outcome_feedback f
JOIN (SELECT campaign_id, SUM(spend_cents)/100.0 AS spend_usd
FROM fact_impression WHERE dt BETWEEN :start AND :end
GROUP BY campaign_id) s USING (campaign_id)
WHERE f.submitted_ts BETWEEN :start AND :end
GROUP BY campaign_id
ORDER BY ROAS_leadgen DESC;
Key chart: lead funnel + cycle time
Leads → qualified → closed · with cycle time and ROAS
Leads submitted: 1,200
Qualified (within 7d): 720 (60%)
Closed won (within 30d): 180 (15%)
Avg cycle time: 18 days
Revenue: $450,000
Spend: $60,000
ROAS: 7.5x
The "cycle time" dimension matters — you can't declare the campaign a failure after 2 weeks if the average lead closes on day 18.
Section 6: Cross-Objective Benchmark Table
Most senior dashboards include a benchmark row showing where this campaign sits relative to a reference set.
WITH this_campaign AS (
SELECT SUM(impressions) imps, SUM(clicks) clicks, SUM(spend_usd) spend, SUM(revenue) rev
FROM metric_campaign_daily
WHERE campaign_id = :c AND dt BETWEEN :start AND :end
),
advertiser_baseline AS (
SELECT
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY clicks * 1.0 / imps) AS median_ctr,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY rev / NULLIF(spend, 0)) AS median_roas
FROM (
SELECT campaign_id,
SUM(impressions) AS imps,
SUM(clicks) AS clicks,
SUM(spend_usd) AS spend,
SUM(revenue) AS rev
FROM metric_campaign_daily
WHERE advertiser_id = (SELECT advertiser_id FROM dim.campaign WHERE campaign_id = :c)
AND dt BETWEEN CURRENT_DATE - 28 AND CURRENT_DATE - 1
GROUP BY campaign_id
) t
WHERE imps > 10000
),
vertical_baseline AS (
SELECT
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY clicks * 1.0 / imps) AS median_ctr,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY rev / NULLIF(spend, 0)) AS median_roas
FROM (...vertical-level aggregation...) t
)
SELECT
tc.clicks * 1.0 / tc.imps AS campaign_ctr,
ab.median_ctr AS acct_median_ctr,
(tc.clicks * 1.0 / tc.imps) / NULLIF(ab.median_ctr, 0) - 1.0 AS vs_account_pct,
vb.median_ctr AS vert_median_ctr,
(tc.clicks * 1.0 / tc.imps) / NULLIF(vb.median_ctr, 0) - 1.0 AS vs_vertical_pct
FROM this_campaign tc CROSS JOIN advertiser_baseline ab CROSS JOIN vertical_baseline vb;
Benchmarks should always include at least two references — the advertiser's own trailing performance, and a vertical-wide median. One reference is too easy to game.
Section 7: Visualization Anti-Patterns
- Dual-axis charts. Always confusing. Split into two panels.
- Mixed-metric stacked bars. "Clicks + impressions" stacked makes no sense.
- Percentages without denominators. "CVR dropped 30%" — from 3% to 2.1% or from 0.1% to 0.07%? Always show the base.
- Moving-average without the raw. Hides short-term changes.
- Color-coded-only diff. 8% of users are color-blind. Always label with a sign or arrow too.
- Truncated Y-axis. Zoomed axes lie. Start at zero unless there's a reason not to.
Section 8: The One-Pager the Exec Actually Reads
At the most senior level, the scorecard shrinks to one screen with five numbers:
CAMPAIGN HEALTH — <campaign_name>
Spend: $54,230 (91% of budget)
Primary KPI: ROAS 4.2x (target 4.0x) ✓
Guardrail: IVT 0.8% (cap 2.0%) ✓
Trajectory: +12% vs. last week
Next action: Increase budget 20%, refresh creative
🟢 Running well — no intervention needed.
That's it. If the exec wants more, they'll click through. If they don't, you've already told them everything they need. Senior DE mastery is knowing when to stop adding numbers.
Section 9: Measurement Checklist (before a campaign launches)
- Objective is declared and mapped to exactly one primary KPI.
- At least two guardrail metrics are declared.
- Attribution window(s) are pre-committed; no post-hoc changes.
- Benchmark references (account, vertical) are identified.
- Dashboard is wired to the metric registry, not hand-rolled SQL.
- Incrementality measurement plan exists (holdout, or lift study, or forgo).
- D+7 lock policy is written down — numbers become immutable on day 7.
- Advertiser approved the measurement plan in writing.
A campaign without measurement is a spend event. With measurement, it's a learning event. The DE's job is to make sure it's the second one.
Dashboard · Growth Accounting
Where does MAU actually come from — and where does it leak?
Active_t = Retained + New + Resurrected · Active_{t−1} = Retained + Churned · Net = New + Resurrected − Churned.
Dashboard · Where the Money Is Left on the Table
$142,300 gross spend → $89,720 effective spend. Where does the other $52,580 go?
Dashboard · Campaign Improvement Opportunities
Six moves you can make next. Bubble size = monthly $ recovery.
The Room · Real Decisions
If you've been preparing for an ads data-engineering loop by reading about attribution models, you've been preparing wrong. I've watched strong candidates who knew more about MTA than the interviewer lose the loop inside four minutes — not because their knowledge was weak, but because they answered the wrong question with the right words. This part is not a syllabus. It is a catalogue of the specific moments where candidates go silent, and what should have come out of their mouth instead.
Everything below is simulation, not explanation. Read it with a timer. Say the answers out loud. If any of them feel uncomfortable to say, that is the signal that you haven't yet rehearsed them — not that the answer is wrong.
1. The Scar Files
Each of the transcripts below is a specific moment I have seen end a loop. The candidate's answer is not wrong on the surface. The answer is wrong because the interviewer is scoring something the candidate did not know was being tested. I will tell you what was being scored in each case. You will not forget it.
Scar 1 — The "we use Kafka" death
Interviewer: "Walk me through how your current ads pipeline handles impression events."
Candidate: "Sure — we use Kafka for ingest, Flink for real-time, Spark for batch, Iceberg for storage, and Looker for dashboards."
Interviewer (silent for 3 seconds, then): "OK. Anything else?"
Candidate: "Uh… and we have data quality checks in Great Expectations."
Interviewer: "Got it. Let's move on."
Loop score at this point: weak hire, trending no hire. The candidate listed tools. The interviewer was asking them to tell a story. What should have come out of their mouth:
"We carry roughly 800K impressions per second at peak, 300K steady. Kafka is partitioned by advertiser_id to keep hot advertisers' downstream pacing co-located — we learned that the hard way when LinkedIn-scale campaigns knocked over our initial user-id partitioning in April last year. Flink consumes into a pacing KV with 5-second tolerance; batch reconciles it hourly. The painful thing in our stack right now is that Iceberg compaction is bumping against the batch window on quarter-close weeks — I'm six weeks into solving it."
Notice what the second answer does that the first doesn't: it gives numbers, a war story (the LinkedIn-scale campaign), a pain point you've lived with, and a thing you're actively working on. Interviewers score operational ownership, not tool knowledge. This is the single most common way strong candidates look mid-level.
Scar 2 — Attribution, confidently wrong
Interviewer: "How do you think about attribution for a campaign that runs across display and search?"
Candidate: "We use last-click attribution. It's standard."
Interviewer: "And your marketing team is OK with that?"
Candidate: "Yeah, they've always used it."
Interviewer (neutral tone): "I see. Next question."
The candidate doesn't know it, but the loop is over. The interviewer was testing whether the candidate has ever been in the room when this was fought over. The correct answer is not a better attribution model. The correct answer is:
"We default to last-click for operational attribution, but we maintain a parallel MTA lens for budget reallocation decisions. Last-click systematically under-credits upper-funnel — when our marketing director cut display spend 40% last year because last-click showed it was unprofitable, search ROAS fell 30% within six weeks, because search had been catching users that display had primed. We restored display. Now we show both lenses on the same dashboard, labeled by what decision each is fit for. I'd never recommend a team unify on one attribution number, because the number and the decision are coupled."
What's being scored: whether the candidate has been inside the consequence of last-click. The tell is the six-week lag — you cannot fabricate that specific detail unless you lived it.
Scar 3 — The "exactly once" lie
Interviewer: "How do you guarantee impression counts are exact?"
Candidate: "We use Kafka's exactly-once semantics and Flink's checkpointing, so we're exactly-once end to end."
Interviewer: "OK. What about the SDK?"
Candidate: "…it sends to Kafka."
Interviewer: "Right. And when the user's phone loses network, the SDK buffers and retries?"
Candidate: "…yes."
Interviewer: "So you're not exactly-once end to end."
The candidate never recovers. What they should have said:
"Exactly-once is a useful guarantee within the Kafka-to-sink path, but the SDK side is at-least-once by design — retries on network drop create duplicates we have to catch at ingest. We run a triple-key dedupe on (user_id, event_name, event_ts rounded to the second) with a 24-hour reconciliation window against a suspected-duplicates sidecar. We accept a 0.1% false-positive loss rate in exchange for ~0.8% duplicate catch. Naming that budget in the contract with the advertiser team was the hardest conversation of that project."
The signal: senior engineers know where "exactly-once" ends. They know it ends at the SDK. Candidates who claim end-to-end exactly-once reveal they have never been paged at 3 a.m. about a 2% revenue overcount.
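The triple-key dedupe described in the answer can be sketched in a few lines. A production version would back the seen-keys state with a KV store and a 24-hour reconciliation sidecar; the in-memory dict below is illustrative only:

```python
# Sketch of a triple-key dedupe: (user_id, event_name, event_ts rounded to
# the second), retained for a 24-hour window. The dict stands in for a
# durable KV store; eviction here is naive and inline.

WINDOW_S = 24 * 3600

def make_deduper():
    seen = {}  # dedupe key -> first-seen unix timestamp

    def is_duplicate(user_id, event_name, event_ts):
        key = (user_id, event_name, int(event_ts))  # round to the second
        # Evict keys that have aged out of the reconciliation window.
        for k in [k for k, t in seen.items() if event_ts - t > WINDOW_S]:
            del seen[k]
        if key in seen:
            return True
        seen[key] = event_ts
        return False

    return is_duplicate

dedupe = make_deduper()
first = dedupe("u1", "impression", 1_700_000_000.2)
retry = dedupe("u1", "impression", 1_700_000_000.7)  # SDK retry, same second
```

Rounding to the second is exactly where the quoted 0.1% false-positive budget comes from: two genuinely distinct events in the same second from the same user collapse into one, and that loss rate has to be disclosed, not discovered.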
Scar 4 — The system design that doesn't scope
Interviewer: "Design a pipeline for a new ads product that reports per-creative performance."
Candidate (starts drawing): "OK so we have Kafka here, events go into bronze, then we have a silver layer…"
Interviewer (four minutes in): "Who consumes this?"
Candidate: "…the advertiser."
Interviewer: "How often?"
Candidate: "Daily?"
Interviewer: "If it's daily, why did we just spend four minutes on a real-time ingestion layer?"
I have watched this exchange kill three candidates this year. The cure is uncomfortably simple: do not draw a single box until you have asked who reads it and how often. The first 60 seconds of a system-design answer should be clarifying questions, not architecture.
Scar 5 — The metric that didn't move
Interviewer: "Tell me about a project you shipped."
Candidate: "I built the new impression deduplication pipeline."
Interviewer: "Great — what did it change?"
Candidate: "It's faster and more accurate."
Interviewer: "By how much? And what downstream behavior changed because it was more accurate?"
Candidate: "…I'd have to check."
If you do not know what your project changed in dollars, hours, or headcount, you shipped a feature, not an outcome. Senior candidates walk in with three numbers for every project: the number that moved, the number that stayed flat (to show you knew what to watch), and the number you'd still want to improve. "I'd have to check" closes the loop immediately.
Scar 6 — The monitoring vacuum
Interviewer: "How do you know your pipeline is healthy?"
Candidate: "We have alerts on job failures."
Interviewer: "What about when the job succeeds but the data is wrong?"
Candidate: "We'd catch that in the downstream dashboards eventually."
Interviewer (written in the scorecard afterwards): "No data-quality discipline. Weak on production ownership."
"Eventually" is the word that killed this loop. The correct answer distinguishes pipeline liveness (job finished) from data correctness (output is what downstream expected), names the specific checks that run on every successful completion (row count floors, null rate ceilings, schema stability, freshness), and describes how violations route — blocking downstream until triage vs warning to a channel the owner watches.
Scar 7 — The "it depends" that wasn't followed up
Interviewer: "Should we denormalize this schema for the new reporting layer?"
Candidate: "It depends."
(Silence.)
Interviewer: "On?"
Candidate: "The tradeoffs."
This candidate had the knowledge. They had built denormalized reporting tables. They lost the loop because they did not name the axes the decision depends on. "It depends" without what-it-depends-on is indistinguishable from "I don't know." The cure, drilled to reflex: "It depends on X, Y, and Z. If X then denormalize because of tradeoff T. If not, keep normalized because of tradeoff U. I'd lean toward denormalize unless [specific condition], because in my experience the reversibility cost of denormalization is low and the query-performance cost of over-normalization is high."
Scar 8 — The candidate who was too junior by three words
Interviewer: "How would you handle a schema change in the event stream?"
Candidate: "I'd ask my manager."
The candidate is a competent senior engineer. But they just signaled to the interviewer that the decision does not feel like theirs to make. Staff+ roles are scored on whether you own the decision, not whether you escalate it. The answer the interviewer needed to hear: "I'd flag it to my manager as context, and I'd bring a proposed path — versioned schema with a 30-day producer notice, backward-compatible field addition, dual-write during transition, consumer migration gated by a coverage metric. If the manager has a better path, great. If not, I execute." Owning the proposal is the signal.
2. This Sounds Smart But Gets Rejected
Every bullet below is an answer I have heard in an interview loop that sounded competent enough that the candidate sat down thinking they did well. In every case the loop was scored down. Read each one with the assumption that your first instinct was probably the answer the candidate gave.
"We'd use Kafka for exactly-once delivery."
Gets rejected because: Kafka exactly-once is a guarantee within Kafka, not across the SDK-to-sink path. The interviewer is probing for the candidate's understanding of where the guarantee breaks. The right answer names the breakpoints (SDK retries, consumer side effects outside the transaction, external API calls from the processor) before it names the feature.
"We run data-quality checks in Great Expectations."
Gets rejected because: it is a tool, not a discipline. The interviewer wants to know which specific checks run on which specific tables, how thresholds are set (learned or hand-configured), who is paged on violation, and what the ratio of false-positive pages is. If you cannot answer those four questions, "we use Great Expectations" reads as "we installed Great Expectations."
"I'd partition by user_id for better parallelism."
Gets rejected because: it reveals the candidate has not thought about cardinality explosion. user_id as a partition key in a file-based store produces tens of millions of files over a retention window, and the metadata overhead kills every engine that reads it. The right instinct is partition by date plus bucketed user_id.
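The "date plus bucketed user_id" instinct is simple to make concrete: a stable hash assigns each user to one of a fixed number of buckets, so a day's partition holds at most that many files instead of one per user. A sketch (the path layout and bucket count are illustrative assumptions):

```python
# Sketch of date partitioning with bucketed user_id: 64 buckets caps the
# file count per daily partition at 64, versus millions under raw user_id.

import hashlib

N_BUCKETS = 64

def bucket_for(user_id: str) -> int:
    # md5 rather than built-in hash() so assignment is stable across
    # processes and restarts (Python salts hash() per process).
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % N_BUCKETS

# Hypothetical object-store layout for one row's destination:
path = f"dt=2024-11-01/bucket={bucket_for('user_12345'):02d}/part-0.parquet"
```

Engines that understand the bucketing can still prune to one file per partition when filtering on a single user, which is the parallelism the junior answer was reaching for.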
"We use medallion architecture: bronze, silver, gold."
Gets rejected because: it is vocabulary. Any candidate can say this. The interviewer wants the rule for what earns a silver table, the criterion for promotion to gold, and a specific example of a row that lives in silver but not in gold (with reason). Without those specifics, "medallion" is a word, not an answer.
"Let's put a cache in front of it."
Gets rejected because: caching is the lazy answer to read-heavy performance. The interviewer is scoring whether the candidate thought about invalidation. A cache in front of an ads aggregation without invalidation will serve stale bid-eligibility decisions and cost real money. Naming the invalidation policy (TTL, event-driven, versioned key) is the senior signal.
"We do idempotent writes using upserts."
Gets rejected because: idempotent writes are easy; idempotent pipelines are hard. The interviewer probes: what about the side effects — emails sent, counters incremented, webhooks fired? Upserts handle the table. Pipelines need end-to-end idempotency, which often requires an explicit processed-events ledger the pipeline checks before acting.
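The processed-events ledger is the part candidates can rarely sketch on demand. A minimal version, with sqlite standing in for a durable shared store (names and schema are illustrative):

```python
# Sketch of end-to-end idempotency via a processed-events ledger: record
# the event id first, then fire the side effect; a replayed message hits
# the primary-key constraint and becomes a no-op.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed (event_id TEXT PRIMARY KEY)")

def process_once(event_id, side_effect):
    try:
        conn.execute("INSERT INTO processed VALUES (?)", (event_id,))
    except sqlite3.IntegrityError:
        return False              # already handled — skip the side effect
    side_effect()                 # e.g. send email, fire webhook
    conn.commit()
    return True

sent = []
process_once("evt-1", lambda: sent.append("email"))
process_once("evt-1", lambda: sent.append("email"))  # replayed message
```

Note the ordering choice: inserting into the ledger before the side effect makes the effect at-most-once (a crash in between drops it); inserting after makes it at-least-once. Naming that trade-off out loud is the senior signal the bullet above describes.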
"Schema-on-read gives us flexibility."
Gets rejected because: flexibility is engineer-speak for "we punted the decision downstream." The interviewer is checking whether the candidate has felt the pain of five downstream consumers each implementing a slightly different schema interpretation of the same raw events. The senior answer says: schema-on-read at bronze; enforced typed contract at bronze-to-silver; if you do it the other way you pay for it for years.
"We'd scale horizontally."
Gets rejected because: it is the answer that means "I haven't thought about where the bottleneck is." The interviewer is looking for whether the candidate knows which specific component is constrained: the producer, the broker, the consumer, the sink, or (most often in ads pipelines) the shuffle in the downstream aggregation. Horizontal scaling the wrong component moves the bottleneck without solving it.
"We use Parquet because it's columnar."
Gets rejected because: columnar is table stakes. The interviewer wants dictionary encoding, predicate pushdown, min-max statistics, bloom-filter usage, and row-group sizing decisions. If the candidate says "Parquet is columnar" and stops, they are a consumer of formats, not an engineer who has tuned one.
"We'd denormalize for query speed."
Gets rejected because: denormalization is a tool, not a default. The interviewer is checking whether the candidate understands the asymmetry: denormalization is easy to add, extremely expensive to remove. Senior candidates default to keeping the grain in silver and denormalizing only in gold, where the denormalization is attached to a specific consumer-facing view.
"We use Airflow."
Gets rejected because: "we use Airflow" says nothing about how your team treats Airflow. Who owns DAG code review? How do you test a DAG before production? How do you handle cross-DAG dependencies? How do you prevent DAG sprawl at 500+ DAGs? If the candidate has answers to those, they are senior. If not, they are downstream of someone who set up Airflow for them.
"I would A/B test it."
Gets rejected because: A/B testing for DE changes (not product changes) is a specific discipline. The interviewer probes: how do you split traffic? Do you backfill historical data through both pipelines for comparison? What's your minimum detectable effect, and how did you calculate sample size? "I'd A/B test" without these answers reads as a phrase the candidate learned without running one.
3. The Ads Landmines — Where the Consequences Live
These are not topics. They are the specific moments when an ads pipeline hurts you financially, loudly, and publicly. Every landmine below has taken someone down in a loop because they could not name the consequence — they could only name the concept.
Landmine 1 — Last-click paid for itself until it didn't
The landmine: leadership sees that paid-search is credited with 68% of last-click conversions and concludes that display is not working. They cut display 40%. Paid-search ROAS drops 30% six weeks later. Revenue craters for one quarter before the team reverses. That quarter is on the ads engineering team's scorecard because the attribution report they shipped told leadership display was unprofitable.
What the interviewer is testing: whether you understand that attribution numbers are not neutral data. They are policy decisions wearing a number's clothes. The senior move is to ship last-click alongside an MTA view, and to label each view with the decision it is fit for — never a single number for a multi-channel decision.
Landmine 2 — Conversions that arrive tomorrow, attributed to yesterday
The landmine: your pipeline closes yesterday's books at 6 a.m. today. A conversion that arrives at 10 a.m. today, clicked yesterday, cannot be attributed to yesterday's report without rewriting yesterday's numbers. Rewriting yesterday's numbers means every dashboard that consumed yesterday silently drifts. Finance loses confidence within two quarters.
What the interviewer is testing: whether you know the pattern — append-only, with event_time (when it happened in the world) and attribution_time (when it was credited in our books) stored separately. Dashboards choose which clock to read. History never mutates. Candidates who propose rerunning yesterday reveal they have never been in a finance-close conversation.
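The two-clock pattern is small enough to show in full. Each append-only row carries both timestamps; reports pick their clock and history never mutates (the data below is illustrative):

```python
# Sketch of append-only attribution with two clocks: event_time (when the
# conversion happened in the world) and attribution_time (when it was
# credited in our books). Each consumer chooses which clock to read.

from datetime import date

rows = [
    # (event_time, attribution_time, value_usd)
    (date(2024, 11, 1), date(2024, 11, 1), 100.0),
    (date(2024, 11, 1), date(2024, 11, 2), 40.0),  # landed after the close
]

def daily_revenue(rows, day, clock):
    idx = {"event_time": 0, "attribution_time": 1}[clock]
    return sum(r[2] for r in rows if r[idx] == day)

# Analytics reads event_time and sees yesterday grow as late rows append.
analytics_view = daily_revenue(rows, date(2024, 11, 1), "event_time")
# Finance reads attribution_time; yesterday's closed number never changes.
finance_view = daily_revenue(rows, date(2024, 11, 1), "attribution_time")
```

The late $40 conversion raises the event-time view of Nov 1 without touching the attribution-time view, which is exactly what lets finance trust yesterday's books.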
Landmine 3 — The duplicate that grew revenue by 3% for two weeks
The landmine: an SDK update inadvertently retries on transient server errors without marking retries as duplicates. Your revenue numbers quietly grow 3% for two weeks before anyone notices. Legal calls. The advertiser asks for a refund of the overcount. Your team spends a month on the reconciliation, another on writing a post-mortem that names which SDK version introduced the bug and when the dedupe logic failed to catch it.
What the interviewer is testing: whether you know that duplicates are not a theoretical problem. They are an audit problem. The answer is a suspected-duplicate sidecar and a reconciliation budget negotiated with the advertiser team, so the 0.1% false-positive rate is an acceptable disclosed number rather than a surprise.
Landmine 4 — Identity stitching that dropped 12% of conversions overnight
The landmine: the identity team updates its graph with stricter confidence thresholds to address a privacy audit. Your pipeline now fails to stitch 12% of cross-device conversions that were previously counted. Advertiser dashboards show a 12% revenue drop that is not a revenue drop — it is a measurement drop. The distinction is lost on the advertiser, who calls the account manager, who calls your director, who calls you.
What the interviewer is testing: whether you know that identity is an upstream dependency your pipeline does not own, and whether you have contract tests with the identity team — when they change the graph, you know before production reads it. Without a contract, every identity update is a potential incident.
Landmine 5 — The pacing counter that ran 90 seconds behind during Black Friday
The landmine: Kafka lag on peak. The real-time pacing counter that the bidder reads is 90 seconds stale. The bidder thinks the advertiser has $400 left when they actually have $50. 9,000 more auctions get won. The advertiser over-spends by roughly $350. Multiply by every advertiser affected, for the duration of the lag spike. Finance's post-incident memo lands in your inbox with a number attached.
What the interviewer is testing: whether you treat lag as a dollar-cost number. The senior move is a fallback pacing policy — at 30+ seconds of consumer lag, the bidder switches to conservative bidding (cap at 80% of target) automatically. Degradation is written into the design doc before the incident. Candidates who say "we'd scale Kafka" during an active incident reveal they have not lived through one.
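The fallback policy fits in a few lines, which is part of why writing it into the design doc in advance is cheap. A sketch, with the 30-second threshold and 80% cap taken from the answer above (both are policy choices, not constants anyone prescribes):

```python
# Sketch of lag-aware degraded bidding: past a staleness threshold on the
# pacing consumer, the bidder caps bids at a fraction of target rather
# than trusting a stale budget counter.

CONSERVATIVE_LAG_S = 30   # consumer lag at which we stop trusting pacing
CAP_FRACTION = 0.80       # bid at 80% of target while degraded

def effective_bid(target_bid: float, consumer_lag_s: float) -> float:
    if consumer_lag_s >= CONSERVATIVE_LAG_S:
        return target_bid * CAP_FRACTION   # degraded mode: under-spend risk
    return target_bid                      # healthy: trust the counter

normal = effective_bid(1.00, consumer_lag_s=5)
degraded = effective_bid(1.00, consumer_lag_s=90)
```

The design point is the asymmetry: under-spending during a lag spike costs delivery you can make up later; over-spending costs refunds and a finance memo.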
Landmine 6 — Kappa architecture that couldn't sign off on billing
The landmine: your team migrates to streaming-only (Kappa) because it looks modern. Quarter close comes. Finance refuses to sign off on a revenue number sourced from a Flink checkpoint because it cannot be reproduced if the checkpoint is lost. You spend a weekend rebuilding a batch path. The next quarter's tech-planning meeting is heated.
What the interviewer is testing: whether you know that Lambda is not dead; it is alive wherever a number has to survive an audit. Kappa for operational metrics, Lambda for billing. Finance will never accept a streaming-only source of truth, and senior engineers know that before they propose the architecture, not after.
Landmine 7 — The iOS 14 revenue cliff nobody modeled
The landmine: iOS 14 ships with ATT prompts. Opt-in rates are 15–30% industry-wide. Your iOS attribution numbers fall 60%+ overnight. Advertisers see "their iOS campaigns stopped working" and pause budget. The revenue impact is real and takes two quarters to explain with a reworked measurement stack that honors SKAN postback limits.
What the interviewer is testing: whether you see regulatory changes as measurable engineering risks that should be modeled in advance, not as events that happen to you. Staff+ candidates maintain a privacy-impact register — a list of pending regulatory changes and the specific measurement systems each will affect. They do not get surprised.
4. Thirty-Second Drills — Forced-Choice Tradeoffs
For each drill below, start a timer. Answer out loud. If you pause more than ten seconds before speaking, you have not rehearsed it enough. If you do not say one of the named sentences below, you have said the wrong thing.
Drill 1 — The 4 TB query
Your query is correct but scans 4 TB. The interviewer is watching what you do next.
What the junior says: "I'd add a partition filter." Why it fails: you jumped to optimization before asking what the query is for. A daily analyst query at 4 TB is fine; a 5-minute dashboard query at 4 TB is a six-figure monthly bill. The interviewer is scoring whether you distinguish between the two.
What the senior says: "Two questions. Who runs this, and how often? If it is an analyst's one-off, 4 TB is acceptable. If it is a dashboard hitting it every 5 minutes, we materialize the aggregate in gold, not tune the scan. Which case are we in?"
Drill 2 — SLA or correctness
Your pipeline is late. You can ship stale-by-two-hours at 6 a.m., or you can slip to 8 a.m. and include everything.
What the junior says: picks one. Why it fails: there is no engineer's answer — this is a stakeholder call.
What the senior says: "If it is the CFO's revenue number, I am not making this call — I am paging the VP of Finance with both options and a recommended one in 10 minutes. If it is a team standup dashboard, I ship at 6 a.m. with a banner that says 'incomplete — reconciled at 8 a.m.' The axis is who is paying for the miss."
Drill 3 — Facts as events or aggregated
You are designing a click-stream schema. Thirty seconds: facts as events, or facts as aggregated impressions?
What the senior says: "Events at bronze, always — the grain is irreversible going up and cheap going down. At silver: events unless we have no impression-level consumer, in which case daily aggregates. I ask what the hardest downstream query is; if any of them needs per-impression, the answer is events. Aggregation is an optimization; un-aggregation is a backfill."
Drill 4 — Push back on a staff engineer's "we don't need Iceberg"
What the senior says: "You are right that a new format has real cost. I am not proposing Iceberg because it is new — I am proposing it because of a specific pain we have lived through: we lost a day last quarter rebuilding a silver table after a schema change. If we have two more of those this year at eight engineer-days each, Iceberg's migration cost pays back in a year. If you think we will not have those incidents, I should not propose it. Can we look at the last four quarters of schema-change incidents together?"
Drill 5 — Kafka lag on Black Friday
What the senior says: "None of the three options in isolation. I route pacing consumption to a pre-provisioned stand-by cluster immediately — a stale pacing signal is the only thing costing money right now. I accept dashboard lag up to 5 minutes silently, page the dashboard owner at 10. I start the broker-cluster scale-up in parallel with a 30-minute mitigation horizon. Dropping data is the last resort and requires an executive decision, because the lost rows may be billable."
Drill 6 — The conversion 7 days late
What the senior says: "For which attribution window? Same conversion, four answers. SKAN: out of window, not counted. Paid search 7-day click: counted. MMP 28-day click: counted. Finance billing for a closed month: counted in the current month's revenue, not retroactively. The engineer's job is not to pick — it is to preserve event_time so every consumer applies its own policy."
Drill 7 — The numbers that don't match
What the senior says: "Three possibilities, ranked: timezone, refund treatment, attribution-window drift. I reconcile by asking for the exact filters of both sides, not the queries. Both are usually correct for their purpose. The fix is almost always a labeled definition on each dashboard, not a query change."
Drill 8 — The vendor pitch
What the senior says: "Shadow mode 60 days. Daily divergence check against our pipeline. Kill criteria written before we sign: if divergence exceeds 2% in any week, or operational cost exceeds in-house, we walk. That reframe usually reveals that sales loves the vendor because of ownership anxiety, not the technology."
Drill 9 — The identity merge
What the senior says: "Two-track. Forward-only for attribution; sidecar ledger for historical reconciliation. Billing tables are immutable after monthly close. Identity changes never rewrite published revenue. Anyone proposing retroactive rewrites does not know what quarterly close looks like."
Drill 10 — The benchmark-resistant peer
What the senior says: "I don't know whether Postgres hits 500K/sec for our shape until we measure. Three engineer-days for a benchmark beats six months of migration on a hunch. Same schema, same partition key, same class of hardware. I'll have the numbers Friday."
Drill 11 — Privacy vs fraud
What the senior says: "Hashed IP with rotating salt, 7-day retention aligned to the fraud team's detection window. Privacy team gets minimization. Fraud team keeps 40% of their signal. The design decision to escalate: is there any case we absolutely need raw IP for, and if so, can we run that case in an enclave with tighter governance?"
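A hashed-IP-with-rotating-salt scheme can be sketched by deriving the salt from the day, so hashes from different days cannot be joined once old salts are destroyed. The secret and rotation period here are illustrative assumptions:

```python
import hashlib
from datetime import date, timedelta

def salted_ip_hash(ip: str, day: date, secret: bytes = b"example-secret") -> str:
    """Hash an IP with a day-derived salt so the hash rotates daily.
    `secret` is a hypothetical per-deployment key; destroying old salts
    (or the secret for expired days) enforces the retention window."""
    salt = hashlib.sha256(secret + day.isoformat().encode()).digest()
    return hashlib.sha256(salt + ip.encode()).hexdigest()

today = date(2024, 3, 1)
h_same_day = salted_ip_hash("203.0.113.7", today)
h_next_day = salted_ip_hash("203.0.113.7", today + timedelta(days=1))
print(h_same_day == h_next_day)  # False: same IP, different day, different hash
```

Within a day the fraud team can still count repeated hits from one hash; across days the identifier is unlinkable, which is the minimization the privacy team is asking for.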
Drill 12 — The 10x scale question
What the senior says: "At 10x, compaction breaks before Kafka or Spark — our file count per partition goes from 10K to 100K and the rewriter collides with the next day's load. Storage egress costs scale linearly and become the single largest line item. At 100x, the partition scheme itself has to change — hourly plus bucketed sub-partitioning. Kafka and Spark scale with configuration; the scheduling graph and the cloud bill do not."
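The 100x fix named here, hourly plus bucketed sub-partitioning, amounts to adding a stable hash bucket to the partition key so files per partition stop growing with key skew. A sketch with an illustrative bucket count:

```python
import zlib
from datetime import datetime

def partition_path(event_time: datetime, key: str, n_buckets: int = 64) -> str:
    """Hourly + hash-bucketed sub-partitioning: caps files per partition by
    spreading writes for hot keys across n_buckets (64 is illustrative)."""
    bucket = zlib.crc32(key.encode()) % n_buckets  # stable across runs
    return f"dt={event_time:%Y-%m-%d}/hr={event_time:%H}/bucket={bucket:02d}"

path = partition_path(datetime(2024, 3, 1, 14, 30), "campaign-42")
print(path)
```

The compactor then rewrites one `(dt, hr, bucket)` cell at a time, so the daily rewrite no longer collides with the next day's load.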
5. The Interviewer-Pushback Playbook
When the interviewer pushes back, they are not saying you are wrong. They are saying "prove you can hold this under pressure." The weak move is to capitulate. The senior move is to restate with more specificity.
"That's not how we do it here."
"Fair. What I gave you is from my previous stack. What is different in yours, so I can tell you whether my answer changes?"
"What if we had 100x the volume?"
"At 10x, compaction breaks. At 100x, the partition scheme does. But the real decision is whether we plan the migration in advance or handle it as an emergency later, because the migration is 3–6 months of work. What is your growth trajectory, so I can tell you whether I am designing for 10x or 100x?"
"Isn't that over-engineered?"
"You might be right. Let me name the specific pain I was solving — [pain]. If we don't have that pain, I simplify to [simpler design]. Which side are we on?"
"Why not just use a vendor?"
"Buy if three conditions hold: accuracy within 2% of ours, total cost of ownership under roughly 18 engineer-months of build effort, and no existential dependency that would make the vendor a single point of failure. If any condition fails, we build. I've seen teams default in either direction and regret it."
"Convince me."
"Three reasons, ranked. One — [strongest, addresses biggest risk]. Two — [second]. Three — [weakest, named as weakest]. If you kill it, my concern is [specific downside]. If you approve it, my concern is [specific downside]. I'm making a case for one path with eyes on both costs."
"What would you regret?"
"The decision I'd regret most is [specific choice]. It was reversible but had ongoing ops cost I may have under-weighted. Escape hatch: keep schema compatible with [alternative] so we can migrate in 6 months if it bites."
The silence after your answer
Junior candidates fill silence with more content, often undermining a good answer. Senior candidates wait five seconds and then say: "Did that answer what you asked, or did I miss the actual question?" This reframe preserves the good answer and surfaces what was actually being tested. Interviewers almost always tell you.
6. What Changes After You Read This
If you walked into this part thinking the ads interview was a knowledge test, you now know it is not. It is a pressure test where knowledge is the surface layer and judgment is what gets scored. The candidates who pass have rehearsed not concepts but responses — the pause, the question back, the named tradeoff, the admitted regret.
Three things to do before your next loop:
- Record yourself answering the twelve drills with a timer. Play it back. Count the filler words, the moments you said "it depends" without what-it-depends-on, the places you named a tool instead of a tradeoff. Those are the places the loop will die.
- Prepare three numbers for every project on your resume. What moved. What stayed flat. What you would still want to change. If you cannot produce those numbers, you shipped a feature, not an outcome — and that is what the interviewer sees.
- Practice the pushback responses out loud. Not the concepts, the exact words. The distance between reading these responses silently and saying them with conviction is the distance between a mid-level offer and a staff offer.
The uncomfortable truth: every section in this part was built from watching candidates I thought were strong lose loops on questions I thought they should pass. They did not lose on knowledge. They lost on form. You now know the form.
Good luck.