Design — twelve industry data models, end to end.
A separate workspace from Q&A practice. Every scenario is a complete architecture deep-dive — eight sections of conceptual reasoning, a 3-column ER diagram (Dimensions / Fact Tables / Analytical Modules), stakeholder-grouped SQL with sample data and expected output, and a worked numerical example that ties the math to actual rows. Use it when the interview moves past "write this SQL" into "design this system."
Data Modeling — Twelve Industry Scenarios
Ride-share (Uber/Lyft with multi-driver convoys, surge, real Delhi NCR dataset), 3-sided marketplaces (DoorDash stacked dispatch), ads (Google auction chain, Meta cross-device attribution, Netflix CTV inventory + pacing), e-commerce (Amazon orders/returns/inventory with 1P/3P carve-out), social feed (Instagram with ranker A/B), streaming (Spotify pool-model royalties), payments (Stripe double-entry ledger), marketplaces (Airbnb calendar-as-fact), SaaS (subscription + hourly metering).
Trip lifecycle (with multi-driver convoys)
Journey → Trip → Event hierarchy + convoy bridge + 20K mass-event extension + real 148K dataset.
Surge pricing & supply/demand
State → Decision → Outcome causal chain. Frozen input_features_json + SCD2 model for replay.
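As a rough illustration of why the frozen features and the SCD2 model table matter for replay, the sketch below looks up the model version that was valid at decision time and recomputes the multiplier from the stored JSON. The column names other than `input_features_json`, the `demand_weight` parameter, and the pricing formula are all hypothetical; they are not taken from the scenario itself.

```python
import json
from datetime import datetime

# Hypothetical SCD2 model dimension: one row per version with a validity window.
MODEL_VERSIONS = [
    {"model_version": "v1", "valid_from": datetime(2024, 1, 1),
     "valid_to": datetime(2024, 3, 1), "demand_weight": 0.5},
    {"model_version": "v2", "valid_from": datetime(2024, 3, 1),
     "valid_to": datetime(9999, 12, 31), "demand_weight": 0.8},
]

def model_as_of(ts):
    """Return the model row whose [valid_from, valid_to) window contains ts."""
    for row in MODEL_VERSIONS:
        if row["valid_from"] <= ts < row["valid_to"]:
            return row
    raise LookupError(f"no model version valid at {ts}")

def surge_multiplier(features, model):
    """Toy pricing rule (an assumption): scale by demand/supply ratio above 1."""
    ratio = features["open_requests"] / max(features["idle_drivers"], 1)
    return round(1.0 + model["demand_weight"] * max(ratio - 1.0, 0.0), 2)

# A decision row as it might have been written at decision time.
decision = {
    "decision_ts": datetime(2024, 2, 10, 18, 30),
    "input_features_json": json.dumps({"open_requests": 30, "idle_drivers": 10}),
    "multiplier": 2.0,
}

# Replay: frozen inputs plus the as-of model version reproduce the decision.
features = json.loads(decision["input_features_json"])
replayed = surge_multiplier(features, model_as_of(decision["decision_ts"]))
```

Replaying against today's model version would give a different multiplier, which is the reason for pinning the version by timestamp instead of joining to the current row.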
Order & courier dispatch (stacked)
Customer + courier + restaurant grains, with brg_dispatch_orders bridge for batched deliveries.
Auction → Impression → Click → Conversion
4-fact chain joined by auction_decision_id. Re-runnable attribution via attribution_run_id.
Cross-device attribution & identity
Identity graph + SCD2 device bridge + match-at-read. SKAN aggregate carve-out for iOS.
Inventory, pacing & frequency capping
Unfilled opportunities as first-class rows. Pacing snapshot + make-good liability fact.
Orders, returns & multi-warehouse
Append-only returns + 1P/3P recognized_as carve-out + immutable inventory movements.
Engagement at scale (with ranker A/B)
ranker_model_id on every impression. Append-only engagements with is_undone for unlikes.
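One possible reading of the append-only pattern, sketched with SQLite: an unlike is written as a new row with `is_undone = 1` instead of updating or deleting the original like, and the net count is derived at read time. The table name `fact_engagement` and every column except `is_undone` are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE fact_engagement (
        engagement_id INTEGER PRIMARY KEY,
        user_id       INTEGER NOT NULL,
        post_id       INTEGER NOT NULL,
        action        TEXT    NOT NULL,
        is_undone     INTEGER NOT NULL DEFAULT 0,  -- 1 = this row reverses a prior action
        event_ts      TEXT    NOT NULL
    )
""")

# Rows are only ever inserted; user 1 likes, unlikes, then likes again.
rows = [
    (1, 100, "like", 0, "2024-05-01T10:00:00"),
    (1, 100, "like", 1, "2024-05-01T10:05:00"),
    (1, 100, "like", 0, "2024-05-01T10:09:00"),
    (2, 100, "like", 0, "2024-05-01T11:00:00"),
    (3, 100, "like", 0, "2024-05-01T12:00:00"),
    (3, 100, "like", 1, "2024-05-01T12:01:00"),
]
conn.executemany(
    "INSERT INTO fact_engagement (user_id, post_id, action, is_undone, event_ts) "
    "VALUES (?, ?, ?, ?, ?)",
    rows,
)

# Net likes: each like counts +1, each undo counts -1.
net_likes = conn.execute(
    """
    SELECT SUM(CASE WHEN is_undone = 1 THEN -1 ELSE 1 END)
    FROM fact_engagement
    WHERE post_id = ? AND action = 'like'
    """,
    (100,),
).fetchone()[0]
```

Because no row is ever rewritten, the full like/unlike history stays available for ranker analysis while the current count remains a simple aggregate.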
Listening history & pool-model royalties
Per-stream rate at period close. SCD2 bridge_track_rights for mid-quarter renegotiations.
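The pool-model arithmetic can be sketched in a few lines, assuming the per-stream rate is simply the period's royalty pool divided by total streams and each rights holder is paid streams times that rate. The pool size, stream counts, and holder names below are made up, and the SCD2 rights bridge is deliberately left out.

```python
from decimal import Decimal

# Illustrative period-close inputs (not real figures).
royalty_pool = Decimal("7000.00")
streams_by_holder = {
    "label_a": 1_200_000,
    "label_b": 600_000,
    "label_c": 200_000,
}

total_streams = sum(streams_by_holder.values())

# The rate is only known once the period closes and total streams are final.
per_stream_rate = royalty_pool / Decimal(total_streams)

# Each holder's payout, rounded to cents.
payouts = {
    holder: (per_stream_rate * streams).quantize(Decimal("0.01"))
    for holder, streams in streams_by_holder.items()
}
```

With these inputs the rate works out to 0.0035 per stream, and the three payouts add back up to the full pool. With less convenient numbers, rounding each payout to cents can leave a small remainder that a real design would need to allocate explicitly.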
Double-entry ledger
Append-only + SUM=0 per (txn × currency) invariant. UNIQUE(source_event_id) idempotency.
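A minimal sketch of the two constraints, using SQLite from Python. It assumes `UNIQUE(source_event_id)` sits on a transaction header table and that amounts are signed integers in minor units; the table names, account names, and the `post` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ledger_txn (
        txn_id          INTEGER PRIMARY KEY,
        source_event_id TEXT NOT NULL UNIQUE   -- idempotency key
    );
    CREATE TABLE ledger_entry (
        entry_id     INTEGER PRIMARY KEY,
        txn_id       INTEGER NOT NULL REFERENCES ledger_txn(txn_id),
        account      TEXT    NOT NULL,
        currency     TEXT    NOT NULL,
        amount_minor INTEGER NOT NULL          -- signed; debits and credits net to 0
    );
""")

def post(conn, source_event_id, legs):
    """Append one balanced transaction; return False if the event was already posted."""
    totals = {}
    for _account, currency, amount in legs:
        totals[currency] = totals.get(currency, 0) + amount
    if any(totals.values()):
        raise ValueError(f"unbalanced legs: {totals}")
    try:
        with conn:  # commits on success, rolls back on error
            cur = conn.execute(
                "INSERT INTO ledger_txn (source_event_id) VALUES (?)",
                (source_event_id,),
            )
            conn.executemany(
                "INSERT INTO ledger_entry (txn_id, account, currency, amount_minor) "
                "VALUES (?, ?, ?, ?)",
                [(cur.lastrowid, a, c, m) for a, c, m in legs],
            )
    except sqlite3.IntegrityError:
        return False  # duplicate source_event_id: treat the retry as a no-op
    return True

def unbalanced(conn):
    """Audit query: any (txn, currency) pair whose entries do not sum to zero."""
    return conn.execute(
        "SELECT txn_id, currency, SUM(amount_minor) FROM ledger_entry "
        "GROUP BY txn_id, currency HAVING SUM(amount_minor) <> 0"
    ).fetchall()

legs = [("customer_cash", "USD", -2500), ("merchant_payable", "USD", 2500)]
first = post(conn, "evt_001", legs)
second = post(conn, "evt_001", legs)  # redelivery of the same event
```

The first call writes two entries; the retried call is rejected by the unique key and writes nothing, so the audit query keeps returning no rows.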
Bookings, calendar & reviews
Calendar-as-fact with daily snapshot_date in PK. Refund locked at cancel time.
Subscription + hourly usage metering
Daily subscription snapshot + hourly meter + 2-event proration on plan changes.
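One common way to read "2-event proration" is a credit for the unused part of the old plan plus a charge for the same remaining days on the new plan. The sketch below follows that interpretation, which is an assumption and not something the scenario confirms; the function name, event shape, and prices are hypothetical.

```python
def prorate_plan_change(old_price_cents, new_price_cents, period_days, days_remaining):
    """Return two billing events for a mid-period plan change (day-based proration)."""
    if not 0 <= days_remaining <= period_days:
        raise ValueError("days_remaining must be within the billing period")

    def share(price_cents):
        # Round half up to the nearest cent using integer arithmetic only.
        return (price_cents * days_remaining + period_days // 2) // period_days

    return [
        {"event_type": "proration_credit", "amount_cents": -share(old_price_cents)},
        {"event_type": "proration_charge", "amount_cents": share(new_price_cents)},
    ]

# Upgrade from a 30.00 plan to a 90.00 plan with 10 of 30 days left.
events = prorate_plan_change(3000, 9000, period_days=30, days_remaining=10)
net_cents = sum(e["amount_cents"] for e in events)
```

Here the credit is 1000 cents, the charge is 3000 cents, and the customer owes a net 2000 cents. Keeping the two events separate, instead of storing only the net, lets each side be audited against the plan it came from.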
Senior DE Interview Prep — The Conversation
Eight deep sections worked through a Netflix case study but applicable to any senior loop: calibration (the four scoring axes), modeling (five-move sequence), SQL (seven patterns + dialect cheatsheet), Python (five reflexes + CodeSignal mechanics), streaming (Flink/Kafka vocabulary + 8-step play-event narrative), behavioral (STAR templates), system-design lite (4 architectures), day-of playbook + questions to ask back.
System Design — Data Platforms
Lakehouse vs warehouse, batch vs streaming, dbt + Airflow patterns, schema registry, the Kappa architecture. Whiteboard-ready answers with diagrams.
Streaming Architecture
Flink + Kafka deep-dive: event-time, watermarks, exactly-once, backpressure, checkpointing, state backends, schema evolution, the 8-step play-event walkthrough.
Data Quality & Reliability
Schema, volume, distribution and referential checks. Lineage, freshness SLAs, the on-call playbook. From "tests" to a real DQ framework.
ML Engineering Interview Prep
Feature stores, training/serving skew, model versioning, online vs offline metrics, A/B and incrementality. Where DE meets ML.
How Practice and Design fit together
Practice (Q&A) is for the rounds where you're given a schema and asked to write SQL or Python. Browse 888 questions filtered by company, save your set, and run answers in the embedded SQLite or Pyodide playground. Design is for the rounds where you're given an open-ended prompt, such as "design Uber's trip lifecycle data model", and asked to architect a system end to end. Both pillars share the same workspace and use the same browser-only runtime; nothing is sent to a server.