Interview Studio · Design Pillar

Design — twenty-three industry data models, end to end.

A separate workspace from Q&A practice. The Systems at the Whiteboard long-form series works each design end to end — the question and its back-of-envelope math, the data flow, the schema, the atomic-consistency SQL, the streaming Python, the aggregation layer, and the dashboard that proves it works. Each reads like the whiteboard session a senior loop actually runs. The full twenty-three-scenario dimensional-modeling catalog sits below as a quick reference.

▸ Systems at the Whiteboard · the long-form series
Nº 01 · Ride-share

The Dispatch Problem

Driver matching, the location firehose & surge — availability everywhere except the one compare-and-swap that owns truth.

Nº 02 · Delivery

The Stacked-Delivery Problem

Three-sided dispatch with batched legs; the bridge's payout shares must sum to the dispatch payout.

Nº 03 · Ads

The Auction Problem

Auction → impression → click → conversion threaded on one decision id; re-runnable attribution.

Nº 04 · Ads

The Attribution Problem

Cross-device identity graph, match-at-read, and the structural SKAN wall — credit resolved as-of event time.

Nº 05 · Ads

The Inventory Problem

CTV pacing & frequency capping with unfilled opportunities as first-class rows; make-good liability.

Nº 06 · E-commerce

The Fulfillment Problem

Orders, returns & multi-warehouse inventory as immutable, append-only movements; 1P/3P carve-out.

Nº 07 · Social

The Feed Problem

Impression-as-fact at 100B/day; ranker versioning for replay; the celebrity hot-partition, keyed away.

Nº 08 · Streaming

The Royalty Problem

Pool-model royalties — the per-stream rate doesn't exist until the period closes; SCD2 rights bridge.

Nº 09 · Payments

The Ledger Problem

Double-entry, append-only, SUM=0 per txn; idempotency on source_event_id; FX locked at event time.

Nº 10 · Marketplace

The Booking Problem

Calendar-as-fact with snapshot_date in the PK for price time-travel; refund frozen at cancel time.

Nº 11 · SaaS

The Metering Problem

Subscription snapshot + hourly meter at separate grains; the invoice is a SUM, never a recompute.

Nº 12 · ML / Recsys

The Recommendation Problem

Freeze the prediction before the outcome; the 5% exploration table is the non-negotiable counterfactual.

Nº 13 · CRM

The Hierarchy Problem

Account tree + materialized path + 3-layer tenant isolation; org_id rides every query or it fails.

Nº 14 · Streaming

The Bingewatch Problem

Late-arriving dimensions — the play fact never waits for its title; per-session vs per-day binge.

Nº 15 · M&A

The Integration Problem

Golden records + strangler dual-write; every merge reversible via an append-only crosswalk.

Nº 16 · Trust & Safety

The Fraud Problem

Typed-strength identity graph → connected components; every decision replayable to the byte.

Nº 17 · Analytics

The Growth Problem

DAU/WAU/MAU as a new/retained/resurrected/churned state machine; the quick ratio; cohort LTV.

Nº 18 · Spatial / XR

The Telemetry Problem

A sensor firehose with biometric-consent gates; device + session grains; the attention signal.

Nº 19 · Autonomous

The Autonomy Problem

Disengagements & incident replay; the safety ledger; liability frozen at decision time.

Nº 20 · Google Calendar

The Recurrence Problem

Google Calendar, end to end — recurrence-as-rule + occurrence-as-fact to a rolling horizon; UTC derived per occurrence; free/busy as a bitmap.

▸ Featured · 23 industry data models
▸ Quick jump · scenarios in this deep-dive
Uber / Lyft · ride-share

Trip lifecycle (with multi-driver convoys)

Journey → Trip → Event hierarchy + convoy bridge + 20K mass-event extension + real 148K dataset.

Uber · pricing

Surge pricing & supply/demand

State → Decision → Outcome causal chain. Frozen input_features_json + SCD2 model for replay.

DoorDash · 3-sided

Order & courier dispatch (stacked)

Customer + courier + restaurant grains, with brg_dispatch_orders bridge for batched deliveries.

Google · search ads

Auction → Impression → Click → Conversion

4-fact chain joined by auction_decision_id. Re-runnable attribution via attribution_run_id.

Meta · attribution

Cross-device attribution & identity

Identity graph + SCD2 device bridge + match-at-read. SKAN aggregate carve-out for iOS.

Netflix · CTV ads

Inventory, pacing & frequency capping

Unfilled opportunities as first-class rows. Pacing snapshot + make-good liability fact.

Amazon · e-commerce

Orders, returns & multi-warehouse

Append-only returns + 1P/3P recognized_as carve-out + immutable inventory movements.

Instagram · feed

Engagement at scale (with ranker A/B)

ranker_model_id on every impression. Append-only engagements with is_undone for unlikes.

Spotify · royalties

Listening history & pool-model royalties

Per-stream rate at period close. SCD2 bridge_track_rights for mid-quarter renegotiations.

Stripe · payments

Double-entry ledger

Append-only + SUM=0 per (txn × currency) invariant. UNIQUE(source_event_id) idempotency.

Airbnb · marketplace

Bookings, calendar & reviews

Calendar-as-fact with daily snapshot_date in PK. Refund locked at cancel time.

SaaS · billing

Subscription + hourly usage metering

Daily subscription snapshot + hourly meter + 2-event proration on plan changes.

Spotify · streaming royalties

Royalty pool distribution

SCD2 bridge_track_rightsholder + frozen monthly country pool + 30-sec royalty boundary as denormalized flag.

TikTok · ML feedback loop

For-You-Page recommendation + off-policy eval

5% exploration table for causal lift, predicted_score stored at impression-time, shadow-traffic model rollouts.

Stripe · payments

Double-entry ledger + idempotency

Append-only journal entries, SUM=0 per txn, refunds/chargebacks as new entries not updates, FX lock at event-time.

Salesforce · CRM multi-tenant

Account hierarchy + multi-tenant isolation

Hybrid parent_id + materialized path, 3-layer tenant isolation (partition + index + RLS), stage-changes as fact.

Netflix · streaming + LAD

Series + bingewatching + late-arriving dimensions

LEFT JOIN + ROW_NUMBER + COALESCE defensive query, per-session vs per-day binge derivation, profile + account dual key.

M&A · post-acquisition integration

Golden records + strangler + multi-org metrics

Ordered matching playbook (strict-then-fuzzy) + append-only crosswalk with un-merge support, dual-write CDC with 0.1% drift gate, share-and-bridge warehouse path, versioned dim_metric_definition. Adobe-Figma / Microsoft-Activision / Cisco-Splunk / Salesforce-Slack as live case studies (full deep-dive linked from the dedicated M&A Integration page below).

Fraud · trust & safety

Identity graph, impossible travel, account sharing, model replay

Typed-strength identity graph (strong / medium / weak edges) feeding connected-component clusters; SCD2 on dim_ruleset + dim_risk_model + dim_feature_snapshot for 6-month decision replay; multi-geo / impossible-travel detection via session-event LEAD; account-sharing vs ATO vs multi-account-fraud separation in the schema; feedback loop where analyst labels (hours) + chargebacks (30-180d) reconcile into fact_outcome_label as training-set source of truth. Worked example: a Lagos-Berlin transaction blocked by cluster lookup expands to 17 linked mule accounts.

Growth · engagement analytics

Growth accounting — DAU/WAU/MAU, the new/retained/resurrected/churned state machine, cohort LTV

The Social Capital / Reforge framework: a FULL OUTER self-join offset by one period decomposes every active-user base into new / retained / resurrected / churned (+ expansion / contraction for MRR). Two identities hold by construction — MAU(t) = retained + new + resurrected. One inc_amt column generalizes engagement and revenue. Frozen SCD0 cohorts feed the retention triangle and cumulative-LTV curve. Worked example: a "+100K MAU" headline that the quick ratio exposes as a 24%-churn treadmill. Netflix / Uber / Spotify lens.

Google Calendar · productivity

Recurring events, time zones & free/busy

Recurrence-as-rule in an SCD2 series dim + occurrence-as-fact to a rolling horizon; exceptions append-only by recurrence_id; DST-correct UTC derived per occurrence; a 15-min free/busy slot fact for find-a-time. Full long-form treatment: The Recurrence Problem.

▸ Also live · senior DE prep notebook
▸ Also live · expanded Design pillar
▶ LIVE

Snowflake — Architecture, Cost & CDC

The three-layer architecture, micro-partitions and pruning, the virtual-warehouse credit model, Time Travel and zero-copy cloning, Streams + Tasks + Dynamic Tables — plus an end-to-end project brief and a rapid-fire question bank.

▶ LIVE

System Design — Data Platforms

Lakehouse vs warehouse, batch vs streaming, dbt + Airflow patterns, schema registry, the Kappa architecture. Whiteboard-ready answers with diagrams.

▶ LIVE

Streaming Architecture

Flink + Kafka deep-dive: event-time, watermarks, exactly-once, backpressure, checkpointing, state backends, schema evolution, the 8-step play-event walkthrough.

▶ LIVE

Data Quality & Reliability

Schema, volume, distribution and referential checks. Lineage, freshness SLAs, the on-call playbook. From "tests" to a real DQ framework.

▶ LIVE

ML Engineering Interview Prep

Feature stores, training/serving skew, model versioning, online vs offline metrics, A/B and incrementality. Where DE meets ML.

▶ LIVE

M&A Integration — Data Platform Survival Guide

Entity resolution, schema-strangler dual-write, warehouse consolidation, multi-org metrics. Adobe-Figma, Microsoft-Activision, Cisco-Splunk, Salesforce-Slack as live case studies.

▶ LIVE

Skew & Distributions — The Senior DE Vocabulary

Whales / dolphins / minnows · decile / quartile / percentile · Pareto / long-tail / normal / skewed · mean vs median · LAD gap. The L5+ vocabulary interviewers test for, plus the one-sentence script that signals senior-IC bar in 90 seconds.

▶ LIVE

Hot Shards & Data Skew — Surviving Whales in Production

The operational deep-dive: Kafka partition skew, Flink keyed-state RocksDB hotspots, Spark salting / broadcast / AQE skew join, DB hot-row contention, DynamoDB / Cassandra hot partitions, BI dashboard query-time + visual lies. Plus the whale / dolphin / minnow handling playbook per layer.

▸ How Practice and Design fit together

Practice (Q&A) is for the rounds where you're given a schema and asked to write SQL or Python. Browse 1437 questions filtered by company, save your set, run answers in the embedded SQLite or Pyodide playground. Design is for the rounds where you're given an open-ended prompt — "design Uber's trip lifecycle data model" — and asked to architect a system end-to-end. Both pillars share the same workspace and use the same browser-only runtime; nothing is sent to a server.