Twenty-three modeling scenarios, the way an interviewer probes them — filter to the company you're prepping, then open a scenario into its tabbed study view.
Every modeling round tests the same five-move sequence — the domain changes what you say. Filter, search, or open a card for its study view.
01Clarify use cases
02Pin the grain
03Conceptual → logical → physical
04SCDs & time
05Stress test
No scenarios match these filters.
Compare scenarios
Two models, side by side
Pick two scenarios to compare…
Trip lifecycle (with multi-driver convoys)
Uber / Lyft · ride-sharing
Design a complete data model for the Trip Lifecycle — including the complex convoy / multi-driver scenario where one group request is split across two or more vehicles. Must support Operations (live ETA), Finance (driver payouts), and Risk (fraud).
Why this is hard — the design tension
The problem isn't a single trip — it's the group intent. Five friends going to the airport expect to be charged a single price, but Uber dispatches two vehicles. Operations sees two trips; Finance must charge the group once but pay two drivers fairly; Risk must verify the convoy actually moved together. The robust solution is a hierarchical Star Schema that decouples the physical movement of the car from the financial obligation of the rider.
Section 1 — Dimensional Model Overview
This model uses a Journey → Trip → Event hierarchy to handle everything from a single ride to complex shared convoys.
Fact Tables (the hierarchy)
fct_journeys — The "Parent" grain. One row per group request. Captures group intent (split type, total group cost, status).
fct_trips — The "Vehicle" grain. One row per driver / vehicle per journey. Source of truth for Finance (driver payouts).
fct_trip_events — The "Telemetry" grain. One row per state transition (REQUEST, MATCH, PICKUP, etc.) per trip. Powers Ops (Live ETA) and Risk (GPS replay).
fct_trip_feedback — The "Sentiment" grain. One row per (trip × rider) feedback event. Captures rating, category, free-text comment, sentiment. Powers Product (CSAT, NPS).
brg_journey_riders — The "Financial" bridge. Links riders to journeys and to a specific assigned trip/vehicle — handles cost-splitting (journey-level) and capacity routing (trip-level).
Dimension Tables
dim_users — Riders and drivers, typed via user_role. SCD2 on plan tier and rating.
dim_vehicles — Vehicle specs (model, type), insurance status, owner. SCD2 on insurance status (Risk needs as-of-trip status).
dim_geography — split by volatility: the H3 grid itself (dim_h3_cell, level 8 ≈ 0.7 km²) is SCD1 because the geometry is static, but the operational overlays riding on it — tax zones, airport/surge rings, road status — are SCD2 (dim_surge_zone and road-state rows), because those boundaries get re-drawn. See §11.3 for SCD2 geography in action.
dim_payment_methods — Card / wallet on file. SCD2 — Finance audit needs the as-of-trip card.
dim_fx_rate — SCD2 daily rates; locked at request_timestamp.
Section 10 — Working with a real Uber dataset (148K bookings · Delhi NCR · 2024)
To make the practical SQL drillable on real data, the model maps cleanly to a public Kaggle-derived dataset: github.com/hannahsutton1/uber.data — 148,770 booking rows from Delhi NCR for calendar year 2024. Repo includes raw + cleaned CSVs plus the cleaning Jupyter notebooks.
Glossary:VTAT = vehicle-to-pickup minutes (driver arrival time). CTAT = customer-to-completion minutes (in-trip duration). Booking_Value = fare in INR. Ride_Distance in km. Ratings on a 1–5 scale.
Source → Target mapping (ETL plan)
Source column(s)
Target table.column
Notes
booking_ID
fct_trips.trip_id
Natural key; idempotency anchor.
Date + Time
fct_trips.request_timestamp
Concat to TIMESTAMP_TZ (Asia/Kolkata).
Customer_ID
fct_trips.rider_id + dim_users
Single-rider per booking.
Vehicle_Type
dim_vehicles.type
Enum {Auto, Go Mini, Go Sedan, Premier Sedan, UberXL, Bike, eBike}.
Enrichment: inferring the journey hierarchy from flat bookings
The dataset is one row per single-rider booking — there is no journey_id, so the convoy hierarchy can't be exercised until you derive one. A window function groups bookings that share an origin, destination, and request-minute into an inferred journey:
-- Inferred journey derivation (the enrichment layer)
WITH inferred AS (
SELECT
booking_ID AS trip_id,
Customer_ID AS rider_id,
DENSE_RANK() OVER (
ORDER BY Pickup_Location, Drop_Location,
DATE_TRUNC('minute', CAST(Date || ' ' || Time AS TIMESTAMP))
) AS inferred_journey_id
FROM raw_kaggle_uber_data
)
SELECT
inferred_journey_id AS journey_id,
COUNT(*) AS convoy_vehicle_count,
COUNT(DISTINCT rider_id) AS distinct_riders
FROM inferred
GROUP BY inferred_journey_id
HAVING COUNT(*) > 1; -- multi-vehicle groups = candidate convoys
Caveat — this is a heuristic, not ground truth. Grouping on (origin, destination, minute) fuses independent strangers sharing a popular corridor at rush hour into a phantom "convoy." A real enrichment needs a genuine correlation signal — a shared booking/party token, a tighter spatial radius (H3 L10) with a few-second window, or the same payment instrument — before two bookings can be claimed as one journey. Use it to teach the window-function pattern, and name its false-positive rate out loud in the interview: spotting the limit is itself the senior signal.
Real sample rows (5 representative bookings from the dataset)
Q17 — Vehicle type mix & cancellation rate (real-data analysis)
SELECT
vehicle_type,
COUNT(*) AS bookings,
COUNT(*) FILTER (WHERE terminal_state = 'completed') AS completed,
COUNT(*) FILTER (WHERE terminal_state LIKE 'cancelled%') AS cancelled,
COUNT(*) FILTER (WHERE terminal_state = 'no_driver_found') AS no_driver,
ROUND(100.0 * COUNT(*) FILTER (WHERE terminal_state = 'completed')
/ COUNT(*), 1) AS completion_pct,
ROUND(AVG(booking_value_inr) FILTER (WHERE terminal_state = 'completed'), 0) AS avg_fare_inr,
ROUND(AVG(ride_distance_km) FILTER (WHERE terminal_state = 'completed'), 1) AS avg_distance_km
FROM fct_trips
WHERE DATE_TRUNC('year', request_ts) = '2024-01-01'
GROUP BY vehicle_type
ORDER BY bookings DESC;
Typical result on the 148K dataset:
vehicle_type
bookings
completed
completion_pct
avg_fare_inr
avg_distance_km
Auto
~37,400
~23,200
62%
~410
~22.5
Go Sedan
~31,800
~19,800
62%
~520
~24.1
Premier Sedan
~21,500
~13,300
62%
~610
~24.0
Bike
~18,900
~11,700
62%
~390
~22.7
eBike
~14,600
~9,000
62%
~395
~22.8
Go Mini
~12,300
~7,600
62%
~440
~23.0
UberXL
~12,200
~7,500
62%
~720
~24.5
Insight: completion rate is ~62% across all vehicle types — the bottleneck isn't the vehicle, it's cancellation behavior (next query).
Q18 — Top 10 routes by booking volume (Pickup → Drop)
SELECT
pickup_loc,
drop_loc,
COUNT(*) AS trips,
ROUND(AVG(booking_value_inr) FILTER (WHERE terminal_state = 'completed'), 0) AS avg_fare,
ROUND(AVG(ride_distance_km) FILTER (WHERE terminal_state = 'completed'), 1) AS avg_dist,
ROUND(100.0 * COUNT(*) FILTER (WHERE terminal_state = 'completed')
/ COUNT(*), 1) AS completion_pct
FROM fct_trips
GROUP BY pickup_loc, drop_loc
ORDER BY trips DESC
LIMIT 10;
Q19 — Cancellation reason breakdown (driver vs customer)
WITH driver_cancels AS (
SELECT 'driver' AS cancelled_by, driver_cancellation_reason AS reason, COUNT(*) AS n
FROM fct_trips
WHERE terminal_state = 'cancelled_driver' AND driver_cancellation_reason IS NOT NULL
GROUP BY driver_cancellation_reason
),
customer_cancels AS (
SELECT 'customer' AS cancelled_by, reason_for_cancelling_by_customer AS reason, COUNT(*) AS n
FROM fct_trips
WHERE terminal_state = 'cancelled_rider' AND reason_for_cancelling_by_customer IS NOT NULL
GROUP BY reason_for_cancelling_by_customer
)
SELECT cancelled_by, reason, n,
ROUND(100.0 * n / SUM(n) OVER (PARTITION BY cancelled_by), 1) AS pct_within_actor
FROM (SELECT * FROM driver_cancels UNION ALL SELECT * FROM customer_cancels) u
ORDER BY cancelled_by, n DESC;
Use case: if "Vehicle Breakdown" tops the driver list, fleet maintenance owns the fix; if "Driver took too long" tops the customer list, dispatch tuning owns it.
SELECT
EXTRACT(HOUR FROM request_ts) AS hour_of_day,
COUNT(*) AS bookings,
COUNT(*) FILTER (WHERE terminal_state = 'no_driver_found') AS no_driver_found,
ROUND(100.0 * COUNT(*) FILTER (WHERE terminal_state = 'no_driver_found')
/ COUNT(*), 2) AS no_driver_pct,
ROUND(100.0 * COUNT(*) FILTER (WHERE terminal_state = 'completed')
/ COUNT(*), 1) AS completion_pct,
ROUND(AVG(vtat_min) FILTER (WHERE terminal_state = 'completed'), 1) AS avg_vtat_min
FROM fct_trips
GROUP BY EXTRACT(HOUR FROM request_ts)
ORDER BY hour_of_day;
-- The hours where no_driver_pct spikes are the operational gaps where supply incentives matter most.
Q21 — Payment method mix by vehicle tier (premium vs economy)
SELECT
CASE WHEN vehicle_type IN ('Premier Sedan','UberXL') THEN 'premium'
WHEN vehicle_type IN ('Go Sedan','Go Mini') THEN 'mid'
ELSE 'economy'
END AS tier,
payment_method,
COUNT(*) AS trips,
ROUND(100.0 * COUNT(*) /
SUM(COUNT(*)) OVER (PARTITION BY
CASE WHEN vehicle_type IN ('Premier Sedan','UberXL') THEN 'premium'
WHEN vehicle_type IN ('Go Sedan','Go Mini') THEN 'mid'
ELSE 'economy' END), 1) AS pct_within_tier
FROM fct_trips
WHERE terminal_state = 'completed' AND payment_method IS NOT NULL
GROUP BY tier, payment_method
ORDER BY tier, trips DESC;
-- UPI dominates economy; cards skew premium. Drives partnership and incentive design.
What this dataset does not support (caveats — for honest interviews)
No driver_id, no vehicle_id. Only Vehicle_Type as a categorical. Driver-side analytics (utilization, payouts) need a synthetic driver_id assignment or a different dataset.
No surge multiplier, no fare breakdown. Only the final Booking_Value. Surge analysis (Scenario 2) needs synthetic data.
No lat/lon, no H3 cells. Only Delhi NCR neighborhood names. Geocode at ETL using h3-py or Mapbox.
No journey or multi-leg concept. One rider per booking — convoy / Pool / shared rides are not in scope. Use synthetic data for those examples.
No GPS trace.fct_trip_events can be reverse-engineered from VTAT/CTAT into approximate REQUESTED, MATCHED, PICKUP, DROPOFF timestamps but the geographic events (location_h3 per state) are unavailable.
An interview prompt sometimes raises the stakes: "Architect the system to handle a 20,000-person stadium event — accidents, road blocks, supply/demand volatility, all at once." The existing Journey → Trip → Event hierarchy holds, but three additional concerns must be modeled explicitly: infrastructure as a fact (SCD2 geography), growth accounting on both sides of the marketplace, and point-in-time financial integrity under chaos.
Section 12 — Entity ownership & boundaries (who owns what, what crosses the line)
The model so far names tables. A senior answer names archetypes and owners — because the boundary between two tables is a contract, not a foreign key. Four archetypes, four write disciplines, four systems of record.
Entity
Archetype & write discipline
System of record
What crosses the boundary
fct_trips
Stateful aggregate. One row mutated through a known state machine in OLTP; snapshotted append-only to the warehouse.
Trip service (OLTP) → warehouse mirror
The terminal state + immutable keys (trip_id, surge_multiplier, fx_rate_to_usd). Never the in-flight state.
fct_trip_events
Append-only log. Insert-only, never updated, ordered by event_timestamp within trip_id.
Dispatch / telemetry service
Every state transition, immutably. This log is the audit trail.
total_fare_usd · driver_payout_usd
Financial truth. Append-only ledger; corrections are new rows, never an in-place UPDATE.
Payments / ledger service
A committed amount keyed to trip_id — written only after a terminal state.
A sampled H3 cell on each event — not the raw 1 Hz GPS firehose.
The ownership rule that signals seniority:
fct_trips is a projection, not a source. The trip's life is the fct_trip_events log; fct_trips is that log folded into its current state. If the two ever disagree, the log wins and fct_trips is rebuilt.
That is event-sourcing-with-a-snapshot, and it earns the level. The interviewer's follow-up — "so can you rebuild fct_trips from fct_trip_events?" — has one correct answer: yes, and that is the whole point (Section 15).
Boundary violations that disqualify. A column on fct_trips that a second service writes directly → two writers, no source of truth. Raw GPS inside fct_trip_events → a 1 Hz firehose welded to an append-only audit log, now unqueryable. A fare written before the terminal state → finance reads an estimate as if it were committed. Each one collapses a boundary the model depends on.
Section 13 — Consistency contracts: one model, four consistency regimes
A model that does not state its consistency contracts looks architecturally correct but operationally fuzzy. The Uber model spans four subsystems, and each runs a different consistency regime. Name them explicitly, per subsystem.
Subsystem
Regime
Operational contract
Mechanism
Matching / dispatch
Eventual — seconds
✓ two riders may briefly see the same driver as "available" · ✗ a driver permanently double-booked
Optimistic lock on driver_id + offer expiry; first MATCHED wins
Payments / payout
Strong — ACID
✓ payout lands minutes after dropoff · ✗ a charge with no matching ledger entry; SUM(payouts) ≠ SUM(fares − take)
Double-entry ledger, trip_id as idempotency key, single writer
Location / GPS
Lossy — latest-write-wins
✓ dropped samples, a stale fix · ✗ blocking a trip write because a GPS packet is late
Fire-and-forget into a TTL'd store; never on the trip's critical path
Analytics / warehouse
Batch — bounded staleness
✓ warehouse fct_trips is 5–60 min behind OLTP · ✗ finance closing the books inside an open late-data window
Watermark + 24–48 h backfill window; an explicit "as-of" reporting timestamp
The framing that lifts the answer a level: consistency is per-field, not per-system. On a single fct_trips row —
surge_multiplier, fx_rate_to_usd — strong; locked at request, immutable forever.
driver_id — strong at assignment (the optimistic lock), then immutable.
the live ETA the rider sees (derived, not stored on the fact) — eventual; a stale value is a harmless UX hint.
is_flagged_risk — eventual and slowly-changing; appended, never blocks anything.
The trap: treating the whole trip as one consistency class. The ETA and the fare share a row but nothing operationally — one can be 30 seconds stale at zero cost, the other must be exact or finance breaks. A model that cannot tell those two fields apart will either over-engineer the ETA or under-protect the fare.
Section 14 — State-machine rigor: legal transitions, idempotency, the races
Section 1 lists the states. Seniority is in the edges — which transitions are legal, what guards the illegal ones, and which concurrency races the model must already survive. Interviewers love breaking a state machine; bring the guards before they ask.
The legal transition table (event states; terminal states mirror fct_trips.terminal_state):
From state
Legal next states
Absorbing?
REQUESTED
MATCHED · EXPIRED · CANCELLED
no
MATCHED
ARRIVED · CANCELLED
no
ARRIVED
STARTED · NO_SHOW · CANCELLED
no
STARTED
DROPOFF
no
DROPOFF · CANCELLED · EXPIRED · NO_SHOW
(none)
yes — terminal
Anything outside this table is an invalid transition and must be rejected, not silently written: STARTED → MATCHED (time travel), DROPOFF → STARTED (a terminal state is absorbing), a second MATCHED on a trip already matched. The guard is a state-rank check plus event_timestamp ordering within trip_id — an event that would lower the rank is routed to a dead-letter queue, not appended.
Idempotency.fct_trip_events.event_id is the producer's idempotency key. At-least-once delivery (Kafka, retried RPCs) will redeliver; the append must be a no-op on a seen event_id.
-- Event-log append. At-least-once delivery WILL redeliver; event_id makes it safe.
INSERT INTO fct_trip_events (event_id, trip_id, state, event_timestamp, location_h3, actor)
SELECT :event_id, :trip_id, :state, :event_ts, :h3, :actor
WHERE NOT EXISTS (SELECT 1 FROM fct_trip_events WHERE event_id = :event_id);
-- Equivalent: UNIQUE(event_id) + INSERT ... ON CONFLICT DO NOTHING.
The races a senior volunteers — before the interviewer reaches for them:
Race
What goes wrong
Resolution
Driver accepts twice (double-tap / RPC retry)
Two MATCHED events, same trip_id, same driver_id
Idempotent on event_id; the assignment UPDATE below is a no-op the second time
Duplicate offer — two drivers both accept
Two MATCHED, same trip_id, differentdriver_id
Optimistic lock: UPDATE … WHERE driver_id IS NULL — exactly one wins, the loser re-queues
Double cancel — rider and driver cancel together
Two terminal writes on one trip
Compare-and-set: terminal_state is set once, under WHERE terminal_state IS NULL
Late event reorders the log
ARRIVED lands after STARTED on network delay
Sort by event_timestamp within trip_id at read; reject rank-violating writes
-- OLTP write path. Driver assignment: the optimistic lock that
-- resolves the duplicate-offer race. Exactly one racing driver wins.
UPDATE trips
SET driver_id = :candidate_driver_id,
state = 'MATCHED'
WHERE trip_id = :trip_id
AND state = 'REQUESTED' -- guard: trip still offerable
AND driver_id IS NULL; -- guard: only the first writer passes
-- rows_updated = 0 → another driver already won; release the offer, re-queue.
-- rows_updated = 1 → this driver is assigned; emit the MATCHED event.
The trap: drawing the happy-path state diagram and stopping. The junior answer is a picture of states; the senior answer is a guard on every edge — the state-rank check, the event_id dedup, and the compare-and-set that makes "exactly one driver wins" true under concurrency. States are a diagram; legal transitions plus the locks that enforce them are a system.
Section 15 — Failure scenarios & the reconciliation engine
The clean star schema is the steady-state view. Production runs a reconciliation engine underneath it — and a senior candidate volunteers the ugliness rather than waiting to be cornered by it.
Failure
Symptom
How the model contains it
Duplicate driver assignment
Two drivers en route to one rider
The optimistic lock (§14) blocks it at write. If it still leaks, (driver_id, journey_id) uniqueness flags a DQ violation; the second trip resolves to cancelled_system — no payout.
Late GPS → wrong ETA
Rider sees a stale 12-min ETA; the driver is 2 min away
The live ETA is eventual-consistency UX (§13) — a wrong ETA costs nothing financial. location_h3 is sampled, never authoritative, and never feeds the fare.
Payment succeeds, trip then fails
Rider charged; trip ends cancelled_system after STARTED
The fare is an append-only ledger — the charge is real, so the reversal is a new refund row, not a delete. terminal_state drives the refund rule; net = SUM.
Event replay → double charge
Kafka replays the DROPOFF; the fare is computed twice
event_id idempotency (§14) dedups the replayed event before the fare calc sees it. The fare is keyed to trip_id once; a re-compute is a MERGE no-op.
Driver app crashes mid-trip
No DROPOFF event ever arrives
Stuck-state sweeper: a trip in STARTED past p99(trip_duration) × 3 is force-resolved to cancelled_system and reconciled against its last-known H3.
The reconciliation engine is three standing jobs, not a feature bolted on later:
Invariant checks — the two SUM reconciliations from Section 4, run continuously. A break pages the producing team.
Stuck-state sweeper — trips that miss a terminal state inside their SLA window are force-resolved on a known rule.
Ledger reconciliation — SUM(fct_trips.total_fare_usd) vs the payments ledger, per day. A drift is a real bug, never "rounding."
Senior framing. "The clean star schema is the steady-state view; production runs a reconciliation engine beneath it. Idempotency keys absorb replays, optimistic locks absorb races, append-only ledgers absorb rollbacks, and a stuck-state sweeper absorbs the events that simply never arrive. The model isn't only correct — it's correct and self-healing."
Section 16 — Partitioning & shard-key strategy
The shard key is the one decision you cannot migrate cheaply later — and the OLTP write path and the warehouse read path want different keys. Conflating them is a classic miss.
Layer · table
Shard / partition key
Rationale
OLTP · live trips
HASH(trip_id)
Lifecycle reads/writes are point lookups by trip_id; hashing spreads load evenly. Not by city — NYE Manhattan would melt a single shard.
OLTP · fct_trip_events
HASH(trip_id)
Co-located with its trip, so the lifecycle write stays single-shard.
OLTP · location store
H3 cell
Reads are spatial — "who is near this hex" — not by trip. L8 normally; sub-divide to L10 in dense cells.
Analysts scan a date range, then filter by geography.
Warehouse · fct_trip_events
Partition DATE(event_timestamp)
Append-only, time-series scan pattern.
Hot-partition risk — where production actually fails:
Hot spot
Cause
Counter-move
Dense city / NYE
A city-keyed shard puts all of Manhattan on one node
Never shard the write path by city. HASH(trip_id) for OLTP; HASH(h3_key) MOD N sub-partition in the warehouse.
Stadium let-out
One H3 L8 cell takes 100× its normal request rate
Adaptive zoom: split that cell to L10 — the hot partition becomes per-gate, not per-stadium (§11.5).
Mega-event in analytics
One origin_h3_key dominates a daily partition
Salted two-stage aggregation: GROUP BY h3 || salt, then re-roll.
Super-driver skew
A few airport drivers have 100× the trips
Salt driver_id in the payout rollup; the dimension is fine, the fact aggregation salts.
Why trip_id and not city or driver_id: a shard key must be immutable and high-cardinality with uniform load. trip_id is both. City is low-cardinality and skewed (and a trip's city can even change cross-border); driver_id is skewed by super-drivers. Rebalancing: consistent hashing on trip_id moves only ~1/N of keys when a shard is added — and because trips are short-lived (hours, not years), a rebalance only drains in-flight trips, it never migrates history.
Senior framing. "Pick the shard key for the path that can't tolerate a hotspot — the live write path — and pick it for the property you can't change later: immutability and uniform cardinality. trip_id for OLTP, DATE + H3 for the warehouse. The mistake that pages you at 2 a.m. is sharding the live path by city, because the business's best night — New Year's Eve in Manhattan — becomes a single melted node."
Senior framing. "The Journey → Trip → Event hierarchy is the unlock — Operations runs latency metrics off fct_trip_events, Finance reconciles fares against two invariants (rider-side and driver-side), and Risk replays GPS sequences for phantom-convoy detection. Three stakeholders, three grains, one source of truth — and two invariants on the financial bridge keep producer bugs from ever reaching the audit."
Vehicle Fact: fct_trips (one row per driver/vehicle per journey)
Column
Type
Description
trip_key
PK BIGINT
Surrogate key for the trip.
trip_id
UUID
Natural ID from the source system; idempotency anchor.
journey_id
FK
Parent journey. Multiple trips can share a journey (convoy).
trip_fare_share_usd
DECIMAL(10,2)
This vehicle's share of the journey fare. Drives driver payout, NOT what riders pay.
driver_id
FK
The matched driver. (driver,journey) is unique — same driver can't take two trips for one journey.
vehicle_id
FK
The vehicle used.
origin_h3_key
FK
H3 cell at request time.
destination_h3_key
FK
H3 cell at dropoff (NULL on cancellation).
quoted_route_distance_km
DECIMAL(8,3)
Planned route distance estimated at accept time — frozen, like surge/FX. The basis of the rider's fare quote, and the label the ETA/distance model is scored against.
trip_distance_km
DECIMAL(8,3)
Actual distance physically driven, from telemetry/odometer (not the event log, which carries only state-change waypoints). Driver payout is computed on this; the delta vs quoted_route_distance_km is an auditable fact.
route_recalc_reason
ENUM
NULL on a clean trip; else why the route changed mid-ride — {rider_cancel_in_trip, added_stop, traffic_reroute, driver_deviation}. The one column that tells ML an actual≠quoted divergence was exogenous.
Locked at request_timestamp — guarantees revenue can never be restated by FX drift.
total_fare_usd
DECIMAL(10,2)
Amount charged to rider(s) — a denormalized roll-up of the rider-side postings in fct_trip_financial_entry (at quoted_fx_rate), kept on the row for fast reads. The ledger, not this column, is the system of record.
driver_payout_usd
DECIMAL(10,2)
Final amount owed to driver after take rate (Finance focus).
is_flagged_risk
BOOLEAN
Latest fraud verdict (Risk focus) — a read-only projection of the newest fct_trip_risk_decision row. Corrections append to that ledger; this column is never UPDATEd in place.
The specific car this rider was assigned to. Critical: in a convoy, riders R_1 + R_2 may ride in T-101 while R_3 rides in T-102.
individual_charge_usd
DECIMAL(10,2)
What this rider actually pays. Computed from fct_journeys.split_type.
pickup_sequence
SMALLINT
Order picked up within their assigned trip (1, 2, 3…).
leg_distance_km
DECIMAL(8,3)
This rider's actual portion of the route (ridden, from telemetry); basis for distance-prorata splits.
quoted_leg_km
DECIMAL(8,3)
This rider's promised leg at request. Paired with leg_distance_km and bounded by effective_from→exit event, it isolates who consumed what when a co-rider drops mid-trip.
assignment_version
SMALLINT
Append-only: a re-assignment (rider moves cars mid-journey) writes a new version, never an UPDATE. The newest version is the live seat.
effective_from
TIMESTAMP_TZ
When this rider→trip assignment took effect; lets Risk and Finance reconstruct who was in which car at any instant.
Append-only systems of record. Two things on fct_trips — money and the fraud verdict — keep changing after the trip reaches its terminal state. Mutating them in place would break the one-row-per-trip grain, so each gets its own append-only ledger; the columns on fct_trips are read-only projections of the latest state.
Ledger Fact: fct_trip_financial_entry (one row per double-entry posting — append-only)
Column
Type
Description
entry_id
PK BIGINT
Surrogate; one immutable posting.
trip_id
FK
Link to fct_trips — a trip accrues many postings.
journey_id
FK
Parent journey; lets Finance reconcile rider-side and driver-side to total_group_fare_usd.
Where value leaves: rider_wallet, platform_take, driver_escrow, promo_pool…
destination_account
STRING
Where it lands. Every posting moves value between two accounts — money is never created or destroyed.
amount_local
DECIMAL(18,4)
Magnitude in currency_code; a reversal is a new posting with the accounts swapped, never a delete.
currency_code
CHAR(3)
ISO currency the posting was made in.
quoted_fx_rate
DECIMAL(18,8)
Rate locked at request_timestamp — fixes the rider's USD charge so revenue can't be restated by drift.
settlement_fx_rate
DECIMAL(18,8)
Rate at posted_at — what driver payouts and late tips actually settle at for corporate reconciliation.
posted_at
TIMESTAMP_TZ
When the posting was written.
accounting_period_id
STRING
The locked financial snapshot this posting falls in.
source_event
STRING
What triggered it (dropoff, dispute, payout_run…) — the audit anchor.
Conservation invariant. Because every row moves value between two accounts, platform integrity reduces to one identity per account: SUM(amount_local) WHERE destination_account = 'X' must equal SUM(amount_local) WHERE source_account = 'X'. A pipeline bug that creates or loses money breaks the equality on the next run — the ledger audits itself, exactly like the two reconciliation invariants on brg_journey_riders.
Risk Fact: fct_trip_risk_decision (one row per evaluation version — append-only)
Column
Type
Description
decision_id
PK BIGINT
Surrogate; one immutable verdict.
trip_id
FK
Link to fct_trips — a trip can be re-scored many times.
decision_version
SMALLINT
Monotonic per trip_id; the newest is what fct_trips.is_flagged_risk projects.
verdict
ENUM
{clear, review, flagged, confirmed_fraud}.
score
DECIMAL(5,4)
Model probability at this version.
model_id
FK
Which model/ruleset produced it — lets you replay a payout against the verdict that existed at the time.
decided_by
STRING
{model, analyst}; an analyst override appends a new version, never edits the model's.
decided_at
TIMESTAMP_TZ
When this version was written.
Mid-journey mutation — the "convoy jettison." Vehicle B breaks down ten minutes in; its two riders pile into Vehicle A or fork off into a fresh standalone trip. Nothing is mutated in place: the bridge appends new assignment_version rows re-seating those riders (the old rows stay, so "who was in which car at 12:09" is still answerable), and the billing divergence from the original group intent posts as a promo_adjustment / refund pair in fct_trip_financial_entry. The grain holds, the audit holds, and the reconciliation invariants still sum.
Promised vs driven — the mid-ride route recalculation. Rider A and Rider B share a pooled journey; mid-trip A cancels from inside the car and the driver re-routes to drop B. Two distances now diverge and the model keeps both: quoted_route_distance_km (frozen at accept — what the fares were quoted on) and trip_distance_km (actually driven, from telemetry). A's in-car cancel is a CANCELLED event after STARTED in fct_trip_events — the recalculation pivot — which stamps route_recalc_reason = rider_cancel_in_trip on the trip.
Who pays what. The driver is paid on trip_distance_km (they drove the detour) — a driver_payout posting. Rider A is charged only to the cancel point, bounded by effective_from→their cancel event, with the difference from quoted_leg_km posting as a refund / cancellation_fee pair. Rider B's pool discount assumed A's shared miles; once A leaves, B effectively rode partway solo, so a promo_adjustment closes the gap per policy. Every line is an append-only row in fct_trip_financial_entry, and the conservation invariant guarantees the re-route nets to zero across the rider, driver, and platform accounts.
Why the split matters for ML. Distance and ETA models train on features-at-request → realized-outcome: quoted_route_distance_km is the immutable prediction, trip_distance_km is the label. But here the two diverged for an exogenous reason — A cancelled — not because the model was wrong. Feed this trip to the base model unmarked and you teach it that a good prediction was an error. route_recalc_reason IS NOT NULL is the one-column filter that pulls these out of the training set (or routes them to a separate disruption model). Naming that confounder is the senior signal — the schema makes it a WHERE clause, not a forensic reconstruction.
Section 3 — How the model serves each stakeholder
Operations · live ETA & monitoring
By using fct_trip_events, Operations calculates lead-time metrics directly:
Pickup ETA accuracy = timestamp(STARTED) − timestamp(MATCHED) compared to the predicted ETA
Spatial hot-spot detection via location_h3: aggregate REQUESTED events per cell per minute against MATCHED events to identify cells where demand exceeds supply in real time.
Finance · driver payouts & revenue
fct_trip_financial_entry is the general ledger — every charge, fee, payout, refund, and adjustment is an append-only posting keyed to trip_id. fct_trips carries only the rolled-up total_fare_usd / driver_payout_usd projections for fast reads. Driver payouts settle in fct_payouts (one row per driver per payout run) and must reconcile back to the ledger.
surge_multiplier is immutable once the request is made — ensures the rider's quoted price matches the driver's earning potential. Stored on the row, never derived.
terminal_state dictates whether a cancellation fee triggers a partial driver payout (e.g. cancelled_rider after MATCHED triggers a fee; cancelled_system does not).
FX is locked at request_timestamp — guarantees revenue restatement is impossible from rate drift.
Risk · fraud detection
Location anomaly: compare origin_h3 on fct_trips with the first STARTED event's H3 in fct_trip_events — large divergence = potential ride-tampering.
Account farming: join fct_trips to dim_devices (one row per physical device — hardware fingerprint, first-seen, OS) through fct_session (one row per app session — device_id × rider_id × login window). A single device cycling through multiple rider_ids in a short window flags suspicious activity.
Each re-scoring appends a new fct_trip_risk_decision version instead of overwriting the last, so Finance can replay any payout against the verdict that was current at the time — a later fraud re-evaluation can never retroactively rewrite a settled decision.
Phantom convoy (collusion): two drivers and a fraudulent rider book a split-fare convoy on a stolen card; one car drives the route while the second sits at home spoofing its GPS to collect a "moving together" payout. Proximity isn't the tell — real convoys stay close too — jitter is: genuine telemetry carries GPS noise, a scripted loop doesn't. A streaming job scores each asset's positional entropy; near-zero jitter over a multi-minute window appends a confirmed_fraud verdict to fct_trip_risk_decision and freezes the driver payout before it settles.
Section 4 — Why the hierarchy: bridges + parent fact
Without the journey/trip split + bridge, you face three bad choices:
Option A: One row per trip, rider IDs in an array column. Finance can't aggregate revenue per rider without exploding the array. Disqualifying.
Option B: One row per rider in fct_trips. Pool with 3 riders becomes 3 rows — fare is triple-counted on every aggregation. Driver and dispatch counts also triple-count. Disqualifying.
Option C: Single fact for everything (rider-grain + driver-grain merged). A 3-rider, 2-driver convoy = 6 rows. Now you can't tell drivers from riders without filtering by role, and convoy-cost reconciliation needs a self-join. Disqualifying.
The hierarchy with bridge is the only design that satisfies all three stakeholders:
One row per group in fct_journeys — group-level KPIs (avg group size, split-type mix) are correct without filters.
One row per driver/vehicle per journey in fct_trips — Operations and Finance per-driver counts are correct on simple aggregation. (driver_id, journey_id) uniqueness honored.
Multiple rows per journey in brg_journey_riders — Finance computes per-rider charge with the journey's split logic; Risk can route a rider to a specific trip via the FK.
Two enforceable invariants:
SUM(brg_journey_riders.individual_charge_usd) per journey_id = fct_journeys.total_group_fare_usd
SUM(fct_trips.trip_fare_share_usd) per journey_id = fct_journeys.total_group_fare_usd
Two continuous data-quality checks. Either failing flags a producer bug.
REQUESTER_PAYS: individual_charge = total_group_fare for requester, 0 for others
Section 5 — Conceptual data flow (the convoy in plain English)
For Journey J-500 with 5 riders heading to the airport, dispatched as a 2-car convoy:
fct_journeys — the group's perspective. One row, $60 total, EQUAL_SPLIT, 2 vehicles dispatched.
fct_trips — the drivers' perspective. Two rows: Driver Alpha ($30 share), Driver Bravo ($30 share).
brg_journey_riders — the riders' perspective. Three rows: each rider charged $20, two assigned to T-101, one to T-102.
Enhanced graphical data model — Trip Lifecycle & Analytical Metrics
Solid arrows = referential FK relationships. Dashed arrows = analytical lineage (which fact tables feed which metric module).
The driverless future — what changes when the human driver disappears?
As autonomous vehicles (AVs) become a real share of the fleet, the data model has to absorb three structural shifts:
The driver dimension shrinks.dim_users becomes mostly riders. Drivers don't go away entirely — they become safety operators / remote-fleet teleoperators — but most rows in fct_trips won't have a human driver_id.
The vehicle dimension explodes.dim_vehicles evolves into dim_av_fleet with software version, sensor calibration, ODD (Operational Design Domain) restrictions, last safety-test pass, etc. — all SCD2 because they drift.
A whole new fact table appears for safety telemetry. Every disengagement, every remote-intervention, every edge-case recognition gets logged. Regulators require it; insurance audits depend on it.
Q15 — AV safety metric: miles per disengagement (the core regulator KPI)
WITH av_miles AS (
SELECT
a.software_version,
SUM(t.trip_distance_km) * 0.621371 AS total_miles
FROM fct_trips t
JOIN dim_av_fleet a ON t.av_key = a.av_key
WHERE t.terminal_state = 'completed'
AND t.request_timestamp >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY a.software_version
),
disengagements AS (
SELECT
a.software_version,
COUNT(*) FILTER (WHERE s.event_type = 'DISENGAGEMENT') AS n_disengagements,
COUNT(*) FILTER (WHERE s.event_type = 'REMOTE_INTERVENTION') AS n_interventions
FROM fct_av_safety_event s
JOIN dim_av_fleet a ON s.av_key = a.av_key
WHERE s.event_timestamp >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY a.software_version
)
SELECT
m.software_version,
ROUND(m.total_miles, 0) AS total_miles,
d.n_disengagements,
d.n_interventions,
ROUND(m.total_miles / NULLIF(d.n_disengagements, 0)) AS miles_per_disengagement,
ROUND(m.total_miles / NULLIF(d.n_interventions, 0)) AS miles_per_intervention
FROM av_miles m JOIN disengagements d USING (software_version)
ORDER BY miles_per_disengagement DESC;
-- Higher MPD = safer software. Required for NHTSA 2027 quarterly safety filings.
Q16 — Software version A/B: did v3.2 reduce disengagement rate vs v3.1?
WITH per_version AS (
SELECT a.software_version,
SUM(t.trip_distance_km * 0.621371) AS miles,
COUNT(*) FILTER (WHERE s.event_type = 'DISENGAGEMENT') AS disengagements
FROM fct_trips t
JOIN dim_av_fleet a ON t.av_key = a.av_key
LEFT JOIN fct_av_safety_event s ON s.trip_id = t.trip_id
WHERE a.software_version IN ('v3.1', 'v3.2')
GROUP BY a.software_version
)
SELECT
software_version,
ROUND(disengagements::DECIMAL / miles * 1000000, 2) AS disengagements_per_mil_miles
FROM per_version;
-- Compare across versions; statistical-significance test runs in the model card.
Senior framing for the AV future. "The data model survives the driverless transition because the grains are right. fct_trips doesn't care if the driver is a human or a piece of software — it cares about request → completion and the fare share. What changes is the vehicle dimension (now SCD2 on software version) and the safety fact (regulator-mandated). The Journey → Trip → Event hierarchy is platform-shape-agnostic — it works for single-sided fleets, two-sided marketplaces, three-sided delivery, and driverless robotaxis without re-architecting."
11.1 — Re-stated grain hierarchy under stress
Table
Grain
Purpose
fct_journeys
One row per group request
Captures intent (e.g. a group of 6 leaving the stadium together).
fct_trips
One row per (driver/vehicle × journey)
Vehicle-grain truth for driver payouts & insurance.
fct_trip_events
One row per state transition / telemetry sample
Tracks Matchup & Pickup latencies + GPS replays.
brg_journey_riders
One row per (rider × journey)
Financial bridge — many-to-many cost splits.
dim_geography (SCD2)
One row per H3 cell × state-change
NEW emphasis: infrastructure as a versioned dim. Tracks "as-of-trip" road status (one-way, accident, blocked).
11.2 — Growth accounting on both sides of the marketplace
To know whether a 20K-person event is being fueled by new users or resurrected ones (and which drivers are churning), every user is segmented daily.
Segment
Definition
What it tells the business
New
First-time activity in this window.
Acquisition is working.
Retained
Active in previous period AND active now.
Core ecosystem is healthy.
Resurrected
Active now but inactive in the previous period.
Re-engagement campaign / event pulled them back.
Churned
Active in previous period, did not show up.
Supply gap risk — especially among drivers.
Q22 — L1 / L7 / L28 active status per user (daily refresh)
SELECT
user_id,
user_role, -- 'rider' | 'driver'
MAX(CASE WHEN activity_date >= CURRENT_DATE - 1 THEN 1 ELSE 0 END) AS is_L1,
MAX(CASE WHEN activity_date >= CURRENT_DATE - 7 THEN 1 ELSE 0 END) AS is_L7,
MAX(CASE WHEN activity_date >= CURRENT_DATE - 28 THEN 1 ELSE 0 END) AS is_L28
FROM dw_user_daily_activity
GROUP BY user_id, user_role;
WITH today AS (
SELECT DISTINCT user_id, user_role FROM dw_user_daily_activity
WHERE activity_date = CURRENT_DATE - 1 -- yesterday's activity
),
prior AS (
SELECT DISTINCT user_id, user_role FROM dw_user_daily_activity
WHERE activity_date BETWEEN CURRENT_DATE - 8 AND CURRENT_DATE - 2 -- prior 7-day window
),
ever AS (
SELECT DISTINCT user_id, user_role FROM dw_user_daily_activity
WHERE activity_date < CURRENT_DATE - 8 -- before prior window
)
SELECT user_role,
COUNT(*) FILTER (WHERE t.user_id IS NOT NULL AND p.user_id IS NULL AND e.user_id IS NULL) AS new_users,
COUNT(*) FILTER (WHERE t.user_id IS NOT NULL AND p.user_id IS NOT NULL) AS retained,
COUNT(*) FILTER (WHERE t.user_id IS NOT NULL AND p.user_id IS NULL AND e.user_id IS NOT NULL) AS resurrected,
COUNT(*) FILTER (WHERE t.user_id IS NULL AND p.user_id IS NOT NULL) AS churned
FROM today t
FULL OUTER JOIN prior p USING (user_id, user_role)
LEFT JOIN ever e USING (user_id, user_role)
GROUP BY user_role;
-- Driver "churned" spike before a stadium event = supply crisis incoming. Marketing fires re-engagement push.
11.3 — Mass departure with road blocks: SCD2 geography in action
When an accident or police closure occurs during the stadium emptying out, dim_geography records the change as a new SCD2 row. Old route remains queryable; new route status is "as-of-trip".
geo_key
h3_index
route_status
valid_from
valid_to
7001
Stadium_Hex
Normal
2026-05-07 08:00
2026-05-07 22:00
7002
Stadium_Hex
Accident / Blocked
2026-05-07 22:00
2026-05-08 02:00
7003
Stadium_Hex
Normal
2026-05-08 02:00
9999-12-31
Q24 — Correlate pickup latency to road status (was the spike supply or infrastructure?)
SELECT
g.h3_index,
g.route_status,
COUNT(t.trip_id) AS total_requests,
ROUND(AVG(EXTRACT(EPOCH FROM (e_pic.event_timestamp - e_req.event_timestamp))), 1) AS avg_pickup_latency_sec,
ROUND(PERCENTILE_CONT(0.95) WITHIN GROUP (
ORDER BY EXTRACT(EPOCH FROM (e_pic.event_timestamp - e_req.event_timestamp))), 1) AS p95_pickup_latency_sec
FROM fct_trips t
JOIN fct_trip_events e_req ON t.trip_id = e_req.trip_id AND e_req.state = 'REQUESTED'
JOIN fct_trip_events e_pic ON t.trip_id = e_pic.trip_id AND e_pic.state = 'PICKUP'
JOIN dim_geography g
ON t.pickup_h3_cell = g.h3_index
AND e_req.event_timestamp >= g.valid_from
AND e_req.event_timestamp < g.valid_to -- as-of-request status
GROUP BY g.h3_index, g.route_status
ORDER BY p95_pickup_latency_sec DESC;
-- "Accident / Blocked" rows with high p95 = blame infrastructure, not driver supply.
-- "Normal" rows with high p95 = blame supply, surge it up.
11.4 — Financial & risk integrity under chaos
Even when a 20K event triggers cascading surges + accidents + cancellations, three SCD2 rules guarantee restate-ability:
FX rate locked at request_timestamp.dim_fx_rate is SCD2; the rate that was current at the moment the journey was requested is the rate the rider is charged in USD, regardless of how the rate moves while the trip is in progress. Prevents arbitrage and revenue restatement.
Insurance status as-of accident timestamp.dim_vehicles SCD2 on insurance. If a crash happens at 22:14, Risk queries dim_vehicles with valid_from ≤ 22:14 < valid_to — exact policy in effect to the millisecond.
Surge multiplier locked in fct_trips at journey request. Even if surge spikes from 1.5× to 3.0× while the rider is being matched, the rider pays the multiplier that was active at request. Stored on the row, not derived.
Q25 — Insurance audit: was the vehicle insured at the moment of the accident?
SELECT
t.trip_id,
t.vehicle_id,
e.event_timestamp AS accident_ts,
v.insurance_policy_id,
v.insurance_status,
v.valid_from AS policy_active_from,
v.valid_to AS policy_active_to,
CASE WHEN v.insurance_status = 'active' THEN 'COVERED' ELSE 'GAP' END AS audit_verdict
FROM fct_trip_events e
JOIN fct_trips t ON e.trip_id = t.trip_id
JOIN dim_vehicles v ON t.vehicle_id = v.vehicle_id
AND e.event_timestamp >= v.valid_from
AND e.event_timestamp < v.valid_to
WHERE e.state = 'ACCIDENT_REPORTED';
-- Required for insurance claims + regulator audits. Single-row defensible answer.
11.5 — Scalability rules under load (the engineering details)
Rule
Why
Infinity date9999-12-31 instead of NULL for valid_to
Range joins like e_req.ts BETWEEN g.valid_from AND g.valid_to are far more performant than ... AND (g.valid_to IS NULL OR g.valid_to > e_req.ts). Optimizers handle BETWEEN cleanly; OR + IS NULL kills predicate pushdown.
H3 sharding by zoom level — L8 (~0.7 km²) for cities, L10 (~15K m²) for stadium event zones
Stadium hexes get 100× the requests/min of suburb hexes. Going to L10 inside venues distributes the load across many smaller hexes — the "hot partition" stops being the entire stadium and becomes per-gate.
Surge locking in fct_trips.surge_multiplier_at_request
Rider's quoted price = driver's earning potential = revenue booked. All three numbers must agree forever. Locked at journey-request, never recomputed.
Per-partition watermarks on fct_trip_events
One stadium-area producer with bad clock skew shouldn't stall the entire event-time pipeline. Per-partition watermark contains the damage to its own H3 cell.
Pre-event materialization of dim_geography for the venue area
Police-coordinated road closures are known hours in advance. Pre-write the SCD2 transitions so they're queryable from minute zero of the event.
Senior framing for mass-event scale. "The Journey → Trip → Event hierarchy doesn't change at 20K riders — what changes is the load on three specific SCD2 dimensions: dim_geography for road state, dim_vehicles for insurance, dim_fx_rate for currency. Get those three SCD2s right, use infinity-dates instead of NULLs, and shard H3 to L10 inside the venue, and the rest of the model behaves identically — surge locked, payouts reconciled, growth segmented. The chaos is contained at the dimension layer, not the fact layer."
From the map to the terrain. Sections 1–11 draw the model — clean grains, a reconciling star schema, the convoy hierarchy. Sections 12–16 add the layer a senior interviewer actually probes: what the model does when it meets production. A model that only describes the happy path is a well-drawn map of a city with nothing marked for where the roads flood or which bridges buckle under load. Entity ownership, consistency contracts, state-machine guards, failure containment, and the shard key are that missing layer — the line between a "system-design explanation" and a design that survives production.
Section 7 — SQL analysis for business units
Sample data (Journey J-500: airport convoy with 3 riders split across 2 vehicles)
Q1 — System lag: per-state transition latency using gaps-and-islands
SELECT
trip_id,
state,
event_timestamp,
LAG(event_timestamp) OVER (PARTITION BY trip_id ORDER BY event_timestamp) AS prev_event_time,
EXTRACT(EPOCH FROM (
event_timestamp -
LAG(event_timestamp) OVER (PARTITION BY trip_id ORDER BY event_timestamp)
)) AS transition_latency_sec
FROM fct_trip_events
WHERE trip_id = 'T-101'
ORDER BY event_timestamp;
Result for T-101:
state
event_timestamp
prev_event_time
transition_latency_sec
REQUESTED
08:30:00
NULL
NULL
MATCHED
08:31:00
08:30:00
60
PICKUP
08:35:00
08:31:00
240
DROPOFF
09:00:00
08:35:00
1500
Use case: high transition_latency_sec on REQUESTED→MATCHED indicates "hot partitions" or processing delays — Ops can alert when the cell-level p95 exceeds threshold.
Q2 — Dispatch latency (p50 / p95) per H3 cell per hour
WITH latency AS (
SELECT
e.trip_id,
DATE_TRUNC('hour',
MIN(CASE WHEN state='REQUESTED' THEN event_timestamp END)) AS hour_bucket,
MAX(CASE WHEN state='REQUESTED' THEN location_h3 END) AS h3_cell,
EXTRACT(EPOCH FROM (
MAX(CASE WHEN state='MATCHED' THEN event_timestamp END)
- MAX(CASE WHEN state='REQUESTED' THEN event_timestamp END)
)) AS latency_sec
FROM fct_trip_events e
GROUP BY trip_id
HAVING MAX(CASE WHEN state='MATCHED' THEN 1 END) = 1
)
SELECT hour_bucket, h3_cell, COUNT(*) AS matched_trips,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY latency_sec) AS p50_sec,
PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY latency_sec) AS p95_sec
FROM latency GROUP BY 1, 2 ORDER BY p95_sec DESC;
Q3 — Cancellation breakdown by actor
SELECT terminal_state, COUNT(*) AS cnt,
ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 1) AS pct_of_total
FROM fct_trips
GROUP BY terminal_state ORDER BY cnt DESC;
Q4 — Convoy efficiency (vehicles per journey, riders per vehicle)
SELECT
j.journey_id,
j.num_riders,
j.num_vehicles_dispatched,
ROUND(j.num_riders::DECIMAL / j.num_vehicles_dispatched, 2) AS riders_per_vehicle,
COUNT(DISTINCT t.trip_id) AS actual_trips,
COUNT(DISTINCT b.rider_id) AS riders_seen
FROM fct_journeys j
JOIN fct_trips t USING (journey_id)
JOIN brg_journey_riders b USING (journey_id)
GROUP BY j.journey_id, j.num_riders, j.num_vehicles_dispatched;
Result for J-500:
journey_id
num_riders
num_vehicles
riders_per_vehicle
actual_trips
riders_seen
J-500
3
2
1.50
2
3
B. Finance · revenue reconciliation & driver payout
Q5 — Driver payout reconciliation (per-driver share vs riders' charges)
SELECT
t.driver_id,
t.trip_id,
t.trip_fare_share_usd AS revenue_basis,
SUM(b.individual_charge_usd) AS total_collected_from_riders,
COUNT(b.rider_id) AS riders_in_this_car
FROM fct_trips t
JOIN brg_journey_riders b ON t.trip_id = b.trip_id
GROUP BY t.driver_id, t.trip_id, t.trip_fare_share_usd
ORDER BY t.trip_id;
Result for J-500:
driver_id
trip_id
revenue_basis
total_collected_from_riders
riders_in_car
D_Alpha (4)
T-101
$30.00
$40.00
2
D_Bravo (5)
T-102
$30.00
$20.00
1
Critical insight: Driver Alpha's car collected $40 from riders but Alpha is paid only on $30 revenue_basis (the journey was split equally between drivers). The $10 gap goes to the platform as the "convoy coordination" fee. Without this two-grain design, you couldn't represent it.
Q6 — Pool fare allocation per rider (single-vehicle Pool, distance-prorata)
-- Same query, useful when split_type = DISTANCE_PRORATA
SELECT
b.journey_id,
b.rider_id,
b.trip_id,
b.pickup_sequence,
b.leg_distance_km,
ROUND(100.0 * b.leg_distance_km
/ SUM(b.leg_distance_km) OVER (PARTITION BY b.journey_id), 1) AS distance_share_pct,
b.individual_charge_usd,
j.total_group_fare_usd
FROM brg_journey_riders b
JOIN fct_journeys j USING (journey_id)
WHERE b.journey_id = 'J-500'
ORDER BY b.trip_id, b.pickup_sequence;
-- Invariant 1: SUM(brg.individual_charge) per journey = fct_journeys.total_group_fare
-- Invariant 2: SUM(fct_trips.trip_fare_share) per journey = fct_journeys.total_group_fare
SELECT
j.journey_id,
j.total_group_fare_usd AS reported_total,
ROUND(SUM(DISTINCT t.trip_fare_share_usd), 2) AS sum_trip_shares,
(SELECT ROUND(SUM(b.individual_charge_usd), 2)
FROM brg_journey_riders b WHERE b.journey_id = j.journey_id)
AS sum_rider_charges,
CASE
WHEN j.total_group_fare_usd != SUM(DISTINCT t.trip_fare_share_usd) THEN 'FAIL_TRIP_INVARIANT'
WHEN j.total_group_fare_usd != (SELECT SUM(individual_charge_usd)
FROM brg_journey_riders WHERE journey_id = j.journey_id)
THEN 'FAIL_RIDER_INVARIANT'
ELSE 'PASS'
END AS dq_status
FROM fct_journeys j
JOIN fct_trips t USING (journey_id)
GROUP BY j.journey_id, j.total_group_fare_usd;
-- Any FAIL row = producer regression. Required for SOX / audit.
-- Risk pattern: two drivers booked into the same journey but their PICKUP H3 cells
-- are far apart at the same moment. Either a producer bug or a fraud ring.
SELECT
e1.trip_id AS car_1,
e2.trip_id AS car_2,
e1.location_h3 AS loc_1,
e2.location_h3 AS loc_2,
EXTRACT(EPOCH FROM ABS(e1.event_timestamp - e2.event_timestamp)) AS pickup_time_gap_sec
FROM fct_trip_events e1
JOIN fct_trips t1 ON e1.trip_id = t1.trip_id
JOIN fct_trips t2 ON t1.journey_id = t2.journey_id AND t1.trip_id < t2.trip_id
JOIN fct_trip_events e2 ON e2.trip_id = t2.trip_id
WHERE e1.state = 'PICKUP' AND e2.state = 'PICKUP'
AND e1.location_h3 != e2.location_h3 -- not in same H3 cell
AND ABS(EXTRACT(EPOCH FROM (e1.event_timestamp - e2.event_timestamp))) < 600; -- within 10 min
-- Any returned row = an investigation item.
For J-500 sample data, both cars pickup at h3_a within 0 seconds of each other → no rows returned → legit convoy.
SELECT
t.trip_id,
v.model,
v.capacity AS vehicle_capacity,
COUNT(b.rider_id) AS passenger_count
FROM fct_trips t
JOIN brg_journey_riders b ON t.trip_id = b.trip_id
JOIN dim_vehicles v ON t.vehicle_id = v.vehicle_key
GROUP BY t.trip_id, v.model, v.capacity
HAVING COUNT(b.rider_id) > v.capacity;
-- Off-app riders being charged = either producer bug or driver fraud.
Q10 — Location anomaly: request vs first-pickup divergence
WITH first_pickup AS (
SELECT trip_id, location_h3 AS pickup_h3
FROM fct_trip_events
WHERE state = 'STARTED'
)
SELECT
t.trip_key, t.trip_id, t.origin_h3_key, fp.pickup_h3,
CASE WHEN t.origin_h3_key <> fp.pickup_h3 THEN 'INVESTIGATE' ELSE 'OK' END AS risk_flag
FROM fct_trips t
LEFT JOIN first_pickup fp USING (trip_id)
WHERE t.terminal_state = 'completed';
D. Product · CSAT, sentiment & growth accounting
Q11 — Trip rating distribution (1–5 stars) per driver, last 28 days
SELECT
t.driver_id,
COUNT(f.feedback_key) AS total_ratings,
ROUND(AVG(f.rating_score), 2) AS avg_rating,
COUNT(*) FILTER (WHERE f.rating_score <= 2) AS low_ratings,
ROUND(100.0 * COUNT(*) FILTER (WHERE f.rating_score <= 2)
/ NULLIF(COUNT(*), 0), 1) AS pct_low_ratings
FROM fct_trips t
JOIN fct_trip_feedback f ON f.trip_key = t.trip_key
WHERE f.feedback_timestamp >= CURRENT_DATE - INTERVAL '28 days'
GROUP BY t.driver_id
ORDER BY avg_rating ASC;
-- Drivers with pct_low_ratings > 5% are flagged for retraining or offboarding.
Q12 — Rolling 7d & 28d CSAT (% of ratings ≥ 4)
WITH daily_ratings AS (
SELECT DATE(feedback_timestamp) AS d,
COUNT(*) AS n_ratings,
COUNT(*) FILTER (WHERE rating_score >= 4) AS n_satisfied
FROM fct_trip_feedback
GROUP BY 1
)
SELECT d,
ROUND(100.0 * SUM(n_satisfied) OVER (ORDER BY d ROWS 6 PRECEDING)
/ NULLIF(SUM(n_ratings) OVER (ORDER BY d ROWS 6 PRECEDING), 0), 1) AS csat_7d,
ROUND(100.0 * SUM(n_satisfied) OVER (ORDER BY d ROWS 27 PRECEDING)
/ NULLIF(SUM(n_ratings) OVER (ORDER BY d ROWS 27 PRECEDING), 0), 1) AS csat_28d
FROM daily_ratings
ORDER BY d;
Q13 — Sentiment by feedback category (NLP signal × free-text)
SELECT
feedback_category,
COUNT(*) AS n_comments,
ROUND(AVG(sentiment_score), 3) AS avg_sentiment,
ROUND(AVG(rating_score), 2) AS avg_stars,
COUNT(*) FILTER (WHERE sentiment_score < -0.3) AS n_strongly_negative
FROM fct_trip_feedback
WHERE feedback_timestamp >= CURRENT_DATE - INTERVAL '7 days'
GROUP BY feedback_category
ORDER BY avg_sentiment ASC;
-- Surfaces what's driving the dissatisfaction — vehicle, route, app, payment, driver.
WITH user_activity AS (
SELECT b.rider_id,
MAX(j.request_timestamp) AS last_active_ts
FROM brg_journey_riders b
JOIN fct_journeys j USING (journey_id)
GROUP BY b.rider_id
)
SELECT
COUNT(*) FILTER (WHERE last_active_ts >= CURRENT_TIMESTAMP - INTERVAL '1 day') AS l1_active,
COUNT(*) FILTER (WHERE last_active_ts >= CURRENT_TIMESTAMP - INTERVAL '7 days') AS l7_active,
COUNT(*) FILTER (WHERE last_active_ts >= CURRENT_TIMESTAMP - INTERVAL '28 days') AS l28_active,
COUNT(*) AS total_riders
FROM user_activity;
-- Standard L1/L7/L28 active KPIs from Reforge growth-accounting framework.
Worked numerical example — Convoy J-500 (EQUAL_SPLIT, $60 group fare)
5 friends call a group ride. 3 are billable riders (R_1, R_2, R_3); the others are children covered under R_1's account. Group fare quoted: $60.00.
System dispatches 2 vehicles because the group exceeds single-car capacity → fct_journeys.num_vehicles_dispatched = 2.
Driver Alpha's car (T-101) takes 2 riders (R_1, R_2). Driver Bravo's car (T-102) takes 1 rider (R_3).
Per-rider charge (EQUAL_SPLIT): $60 / 3 = $20.00 per rider. Stored in brg_journey_riders.individual_charge_usd.
Per-driver share: $60 / 2 = $30.00 per driver. Stored in fct_trips.trip_fare_share_usd.
Note the asymmetry: Driver Alpha's car collected $40 in charges from his 2 riders, but Alpha's payout basis is $30 — the platform retains $10 as a convoy-coordination fee. Driver Bravo's car collected $20 from his 1 rider but is paid $30 — the platform covers the $10 gap. This is fair: the journey was equally split between drivers regardless of which car each rider sat in. Without the two-grain design (trip share AND rider charge), you couldn't represent it.
Section 8 — Why this works for the convoy scenario
Unique driver accounts. By placing driver_id on fct_trips (not fct_journeys), the model honors the rule that a (driver, journey) combination is unique — same driver can't take two trips for one journey, but two drivers can share one journey.
Cost flexibility. The bridge supports multi-destination and DISTANCE_PRORATA logic. If Rider 1 exits at Mile 5 and Rider 2 at Mile 10, individual_charge_usd is computed pro-rata using leg_distance_km captured from fct_trip_events — same model, different split_type.
Auditability. Finance traces every cent from a rider's wallet through brg_journey_riders directly to the specific driver's fct_trips payout. The two continuous invariants (Q7) catch producer bugs before they hit a quarterly SOX report.
Scalability. "Pool for 4", "Bus", "Wedding party of 12" — no schema migration. Just more rows in brg_journey_riders; the journey grain is unchanged.
Eliminates granularity mismatch. The 3-tier hierarchy (journey, trip, event) lets each stakeholder query at their natural grain without filters or self-joins.
Section 9 — Business model evolution: single-sided → two-sided → three-sided → driverless
The Uber data model isn't static — it's shaped by the platform's market structure, and that structure is shifting. As Uber harnesses platform leverage, AI, and driverless technologies, the rider may eventually hail a ride that involves no human driver at all. A senior architect should be able to describe how the data model morphs at each platform shape, and what the driverless future demands.
Design a complete data model for the Trip Lifecycle — including the complex convoy / multi-driver scenario where one group request is split across two or more vehicles. Must support Operations (live ETA), Finance (driver payouts), and Risk (fraud).
Senior framing. "The clean star schema is the steady-state view; production runs a reconciliation engine beneath it. Idempotency keys absorb replays, optimistic locks absorb races, append-only ledgers absorb rollbacks, and a stuck-state sweeper absorbs the events that simply never arrive. The model isn't only correct — it's correct and self-healing."
Surge pricing & supply/demand state
Uber · supply/demand · dynamic pricing
Design a model for Uber's surge engine that captures the supply/demand state per geography per minute, the surge decisions made, and the actual outcome of those decisions — supporting the live engine, post-hoc effectiveness analysis, capacity planning, and pricing-team experimentation.
Why this is hard — the design tension
Surge isn't one fact — it's a causal chain: state observed → model invoked → multiplier set → riders react → drivers reposition → state changes. Each link needs its own grain or you lose the ability to evaluate the engine. The model also has to replay history with the surge model that was active then, not today's — pricing teams A/B test new models monthly. And the geographic skew is brutal: midtown Manhattan on New Year's Eve has 100× the rows of suburban Queens.
Section 1 — Dimensional Model Overview
Star Schema with a State → Decision → Outcome causal chain.
Fact Tables
fct_geo_state_minute — The "Observation" grain. One row per (H3 cell × minute). Captures open requests, available drivers, ratio, ETA, weather, event signals.
fct_surge_decision — The "Decision" grain. One row per surge-multiplier change per cell. Captures which model version made the call and on what input features.
fct_surge_outcome — The "Effectiveness" grain. One row per surge decision joined to its post-decision N-minute outcome (did supply rebalance? did the price stick?).
brg_zone_cells — Hierarchy bridge. Many H3 cells roll up to a surge zone (operational region used by the pricing team).
Dimension Tables
dim_h3_cell — H3 L8 hexagons. SCD1 (geometry is static).
dim_surge_zone — Operational regions (e.g. "SF Downtown", "JFK Airport"). SCD2 — boundaries get re-drawn.
dim_weather — Weather state per cell per hour (rain, snow, temperature). SCD1 snapshot.
dim_event — Concerts, sports games, conferences. SCD1 (each event is a row, not a state).
Section 10 — Geospatial indexing: GeoHash vs H3 vs Quadtree
The model leans on H3 everywhere — dim_h3_cell, fct_geo_state_minute per cell. A senior interviewer will ask why H3 and not the alternatives. Have the trade-off ready.
Index
Strength
Weakness
Best fit
GeoHash (rectangles, string-prefix)
A prefix means containment — trivially range-scannable in any KV store or B-tree index, zero special library
Non-uniform cell area (distorts toward the poles); 8 neighbors with diagonal ambiguity; adjacent cells can share no prefix (boundary jitter)
Cheap proximity inside a system you already run (Postgres, DynamoDB)
H3 (hexagons)
Six equidistant neighbors, no diagonal ambiguity; near-equal cell area; clean resolution hierarchy
Hexagons don't perfectly nest (parent/child is approximate); needs the H3 library
Ride-hailing — supply/demand per cell, surge, ETA. Uber's own choice.
Quadtree (adaptive rectangles)
Depth adapts to density — dense cities get deep nodes, empty ocean stays shallow
The tree mutates as density shifts; harder to shard and rebalance
Highly skewed, slowly-changing density (map tiles, static POI)
Why H3 wins here: surge is a supply/demand ratio per cell, and the math only behaves if cells are equal-area and neighbors are equidistant — you average a cell against its ring of neighbors. Hexagons give that; GeoHash rectangles do not (a diagonal neighbor sits 1.4× farther than an edge neighbor, biasing the ratio). The choice of dim_h3_cell is not cosmetic — it is what makes driver_to_request_ratio a trustworthy number.
Storage — Redis GEO vs a custom H3 grid:
Option
Mechanism
Trade-off
Redis GEO
Sorted set, GeoHash-encoded score; GEOSEARCH by radius
Zero infra to build, radius search built in — but it is GeoHash underneath (same non-uniformity) and holds no per-cell aggregate state
Custom H3 grid
h3_key → {open_requests, available_drivers, …} in Redis hashes / Pinot
Per-cell state is a first-class key; the surge engine reads one key per cell. More to build — you own the indexing
The surge engine reading "a Redis snapshot of the latest minute" keyed by cell is the custom H3 grid — chosen over raw Redis GEO because surge needs per-cell aggregate state, not point-radius search. And resolution is a dial: L10 localizes demand more sharply than L8 but multiplies the cell count (and the per-minute write cost) — so the model picks resolution per zone (L8 cities, L10 venues), never globally.
Section 11 — Update amplification & location staleness
Two production realities the clean per-minute grain hides — and both are favourite senior probes.
Update amplification. Each driver app emits a location ping roughly every 4 s. With D drivers online that is D/4 writes per second. A naïve design fans every single ping out to the matching index, the surge counter, the rider's live map, and the ETA service — one ping becomes four downstream writes, and matching re-evaluates candidate sets on every ping. The counter-moves:
Problem
Counter-move
One ping → four synchronous downstream writes
Write the ping once to the location stream; consumers pull or subscribe — they are never pushed to synchronously. The location service is the single fan-in point.
Matching re-runs on every ping
Matching reads the cell snapshot on a fixed cadence (per minute, or on a rider request) — decouple ping rate from match rate.
fct_geo_state_minute touched per ping
Pre-aggregate at the edge: count drivers per cell in a tumbling 1-minute window — one row per cell per minute, not one per ping. (This is what the grain already implies; make it explicit.)
Hot-cell ping storm (stadium)
Sub-divide that cell to L10 so the per-cell write rate stays bounded (§8 skew note).
Location staleness. A driver's last fix is 25 s old — include them in matching, or not? The rule is degrade, don't exclude — and degrade by confidence, not a hard cutoff:
Staleness
Treatment
< 5 s
Fresh — full weight in matching and ETA.
5–30 s
Stale-but-usable — keep in the candidate set, widen the ETA confidence interval, de-rank vs a fresh driver at equal distance.
30–120 s
Suspect — eligible only if the cell is supply-starved (a stale driver beats no driver); flag the ETA low-confidence.
> 120 s
Excluded — treat as offline; the driver has likely lost connectivity.
A hard 30-second cutoff is the amateur move: dropping every stale driver in a supply-starved cell shrinks the matchable pool, pushes the multiplier higher, and the surge model starts fighting itself. Staleness is a ranking signal weighted against supply pressure, not a binary filter — so carry location_age_sec alongside every location read and let the matcher weight it.
Senior framing. "A location is a value plus an age. The amateur design treats every GPS fix as equally true and fans every ping out synchronously; the senior design writes the ping once, lets consumers pull on their own cadence, and carries location_age_sec so matching can weight a stale driver instead of dropping one — because in a supply-starved cell a 25-second-old driver still beats no driver."
Senior framing. "The surge model isn't a column — it's a fact table joined to a versioned dimension. fct_surge_decision + frozen input_features_json + SCD2 dim_surge_model is the contract that lets the pricing team A/B-test new models confidently and lets regulators get a defensible answer to 'why was this rider charged 3.2× on this date?' — every time, forever."
Section 2 — The Logical Data Model
Observation Fact: fct_geo_state_minute
Column
Type
Description
h3_key
FK
H3 cell observed.
minute_bucket_ts
TIMESTAMP_TZ
Truncated to the minute.
open_requests
INT
Riders waiting in this cell.
available_drivers
INT
On-app, not currently on a trip.
driver_to_request_ratio
DECIMAL(8,3)
Computed for convenience.
predicted_eta_sec
INT
From the ETA service at this moment.
weather_key
FK
Weather conditions in this cell.
event_keys
ARRAY<FK>
Active events influencing this cell.
Decision Fact: fct_surge_decision
Column
Type
Description
decision_key
PK
Surrogate.
h3_key
FK
Cell the decision applies to.
decision_ts
TIMESTAMP_TZ
When the new multiplier became active.
previous_multiplier
DECIMAL(4,2)
Just before this decision.
new_multiplier
DECIMAL(4,2)
The applied surge factor (1.0 = no surge).
model_key
FK
Which dim_surge_model version made the call.
input_features_json
JSON
Frozen features the model saw — required for replay.
Reads fct_geo_state_minute for the latest minute, joins dim_surge_model to get the active model, computes a multiplier, writes fct_surge_decision.
Round-trip target: under 500 ms per cell. Reads come from a Redis snapshot of the latest minute; writes append to Kafka, then to the warehouse.
Pricing Team · effectiveness analysis
Did the surge work? A surge "worked" if (delta_drivers > 0 within 10 min) AND (abandoned_requests stayed flat). Both signals come from fct_surge_outcome.
Pricing elasticity: for each multiplier band, what % of riders abandoned? Plot abandonment vs multiplier — the inflection point is the pain threshold.
Model A/B: two model versions live in dim_surge_model; route a % of cells to each. Compare realized_revenue_usd + abandoned_requests per model.
Capacity Planning · forecasting
Train demand forecast on historical fct_geo_state_minute. MAPE measured against later observed minutes.
Identify chronic supply gaps — cells where driver_to_request_ratio is below 0.5 for >30% of peak hours. These get long-term incentive programs.
Section 4 — Why versioning the surge model matters (and why input_features_json is mandatory)
Without a versioned dim_surge_model + frozen input features, you cannot replay history. Three failure modes show up the moment the pricing team A/B tests:
Failure 1: model drift hides a regression. If today's model retroactively scores yesterday's input, a buggy current-model version "explains" yesterday's outcome cleanly — even when the actual yesterday's model produced a wildly different result.
Failure 2: feature drift. A weather data source upgrades from "rain_mm" to "precip_intensity" mid-week. Without frozen input_features_json, you can't tell whether yesterday's surge was driven by rain or by something else.
Failure 3: dispute resolution. A regulator asks "why was this rider charged 3.2× on this date?" — only frozen features + the SCD2 model row gives you a defensible answer.
The contract: every fct_surge_decision row carries a deterministic, replay-safe snapshot of the inputs. Same inputs + same model_key → identical multiplier, every time, forever.
Section 5 — Conceptual data flow (the State → Decision → Outcome chain)
Enhanced graphical data model — Surge Pricing & Effectiveness
Section 9 — The fare lifecycle: estimate, commit, and the audit replay
Surge sets a multiplier — but a senior answer models the whole fare, and a fare is not one number. It has a lifecycle with two hard time boundaries that an interviewer expects you to separate.
Stage (when)
Fare object · layer
Mutability
Authoritative for
At request
Upfront estimate · UX / quote service
Re-quoted every few seconds while the rider drags the pin
What the rider is shown
At acceptance
Quoted fare · pricing, frozen
Immutable — surge, base rates, FX all locked here
The price contract
At dropoff
Committed fare · ledger
Append-only; corrections are new rows, never an UPDATE
What the rider is charged
The two boundaries are time-of-decision (request/accept — surge multiplier, FX rate, pricing-model version all freeze here) and time-of-charge (dropoff — the committed fare may differ from the quote: route changed, wait time, tolls, a promo). The delta between quote and committed is itself an auditable fact. They must be separate fields because they live in different consistency regimes: the estimate is fast and eventually-consistent (Redis, re-computed every few seconds); the committed fare is strong-consistency and append-only (the ledger). Fuse them into one column and either the ledger inherits the jitter of the UX quote, or the quote inherits the latency of the ledger.
The audit replay — "recompute exactly what the rider was charged." Finance and regulators demand reproducibility. Add one FK to fct_trips — pricing_model_version — pointing at a new SCD2 dimension dim_pricing_model (the base-fare / per-km / per-min rate card), exactly parallel to the existing dim_surge_model. With surge_multiplier and fx_rate_to_usd already frozen on the row, the fare becomes a pure function of frozen inputs:
-- Reproduce the exact fare a rider was charged: join the fare to the pricing
-- model active AT the trip's request time — never today's rate card.
SELECT t.trip_id,
t.total_fare_usd AS charged,
(pm.base_fare + pm.per_km * t.distance_km
+ pm.per_min * t.duration_min) * t.surge_multiplier AS recomputed
FROM fct_trips t
JOIN dim_pricing_model pm
ON pm.pricing_model_version = t.pricing_model_version
AND t.request_timestamp BETWEEN pm.effective_from
AND COALESCE(pm.effective_to, '9999-12-31');
-- charged = recomputed → the trip is auditable. A mismatch is a real bug.
Versioned pricing logic follows the same discipline as the surge model (Section 4): the rate card is an SCD2 row, not code baked into the ETL. Historical recalculation never runs today's formula over yesterday's trips — it joins as-of-trip-time. A/B pricing experiments run two pricing_model_versions at once; fct_trips.pricing_model_version records which one priced each trip, so revenue and conversion compare cleanly per version.
The trap: a single fare column UPDATEd in place from estimate → final. Now you cannot tell what the rider was quoted from what they were charged, cannot explain a dispute, and cannot A/B a pricing change. Estimate and committed fare are different facts at different consistency levels — model them as such.
Section 7 — SQL analysis for business units
Sample data (a single H3 cell, 5 minutes of supply/demand state and 1 surge decision)
Q1 — Cells currently below ratio 0.5 (live surge candidates)
WITH latest_state AS (
SELECT h3_key, MAX(minute_bucket_ts) AS latest_ts
FROM fct_geo_state_minute
WHERE minute_bucket_ts >= NOW() - INTERVAL '2 minutes'
GROUP BY h3_key
)
SELECT s.h3_key,
s.driver_to_request_ratio,
s.open_requests, s.available_drivers,
s.predicted_eta_sec,
w.precip_intensity, e.event_name
FROM fct_geo_state_minute s
JOIN latest_state l USING (h3_key)
LEFT JOIN dim_weather w ON s.weather_key = w.weather_key
LEFT JOIN dim_event e ON e.event_key = ANY(s.event_keys)
WHERE s.minute_bucket_ts = l.latest_ts
AND s.driver_to_request_ratio < 0.5
ORDER BY s.driver_to_request_ratio ASC;
B. Pricing team · effectiveness analysis
Q2 — Surge effectiveness: % of decisions that pulled supply within 10 min
SELECT
d.model_key,
COUNT(*) AS n_decisions,
COUNT(*) FILTER (WHERE o.delta_drivers > 0) AS pulled_supply,
ROUND(100.0 * COUNT(*) FILTER (WHERE o.delta_drivers > 0)
/ COUNT(*), 1) AS effectiveness_pct,
ROUND(AVG(o.delta_drivers), 1) AS avg_delta_drivers,
ROUND(AVG(o.abandoned_requests), 1) AS avg_abandoned
FROM fct_surge_decision d
JOIN fct_surge_outcome o USING (decision_key)
WHERE o.n_minutes_after = 10
AND d.decision_ts >= CURRENT_DATE - INTERVAL '7 days'
GROUP BY d.model_key
ORDER BY effectiveness_pct DESC;
Q3 — Pricing elasticity: abandonment by multiplier band
SELECT
CASE
WHEN d.new_multiplier < 1.5 THEN '1.0-1.5x'
WHEN d.new_multiplier < 2.0 THEN '1.5-2.0x'
WHEN d.new_multiplier < 3.0 THEN '2.0-3.0x'
ELSE '3.0x+'
END AS multiplier_band,
COUNT(*) AS n_decisions,
ROUND(AVG(o.abandoned_requests), 1) AS avg_abandoned,
ROUND(AVG(o.completed_trips), 1) AS avg_completed,
ROUND(100.0 * AVG(o.abandoned_requests)
/ NULLIF(AVG(o.abandoned_requests + o.completed_trips), 0), 1) AS abandonment_pct,
ROUND(AVG(o.realized_revenue_usd), 2) AS avg_revenue_usd
FROM fct_surge_decision d
JOIN fct_surge_outcome o USING (decision_key)
WHERE o.n_minutes_after = 10
GROUP BY multiplier_band
ORDER BY multiplier_band;
-- The inflection point in abandonment_pct is the rider pain threshold.
Q4 — Model A/B comparison: revenue per cell-minute under v3.1 vs v3.2
SELECT
d.model_key,
COUNT(*) AS n_decisions,
ROUND(SUM(o.realized_revenue_usd), 2) AS total_revenue,
ROUND(SUM(o.realized_revenue_usd) / COUNT(*), 2) AS rev_per_decision,
ROUND(AVG(o.completed_trips), 2) AS avg_completed,
ROUND(AVG(o.abandoned_requests), 2) AS avg_abandoned
FROM fct_surge_decision d
JOIN fct_surge_outcome o USING (decision_key)
WHERE d.decision_ts >= CURRENT_DATE - INTERVAL '14 days'
AND d.model_key IN ('model_v3.1', 'model_v3.2')
AND o.n_minutes_after = 10
GROUP BY d.model_key;
-- Statistical significance test runs in the model card; this is the headline.
C. Capacity planning · forecast accuracy & chronic gaps
Q5 — Demand forecast MAPE per cell (predicted vs actual)
SELECT
h3_key,
COUNT(*) AS n_minutes,
ROUND(AVG(ABS(predicted_demand - open_requests)
/ NULLIF(open_requests, 0)) * 100, 2) AS mape_pct,
ROUND(AVG(predicted_demand - open_requests), 2) AS avg_bias,
COUNT(*) FILTER (WHERE predicted_demand < open_requests) AS underforecast_min
FROM fct_geo_state_minute
WHERE minute_bucket_ts >= CURRENT_DATE - INTERVAL '7 days'
AND open_requests > 0
GROUP BY h3_key
HAVING AVG(ABS(predicted_demand - open_requests) / NULLIF(open_requests, 0)) > 0.20
ORDER BY mape_pct DESC;
-- Cells with MAPE > 20% need feature engineering attention.
SELECT
h3_key,
COUNT(*) AS peak_minutes,
COUNT(*) FILTER (WHERE driver_to_request_ratio < 0.5) AS under_supplied_min,
ROUND(100.0 * COUNT(*) FILTER (WHERE driver_to_request_ratio < 0.5)
/ COUNT(*), 1) AS pct_under_supplied
FROM fct_geo_state_minute
WHERE minute_bucket_ts >= CURRENT_DATE - INTERVAL '30 days'
AND EXTRACT(HOUR FROM minute_bucket_ts) BETWEEN 17 AND 21
GROUP BY h3_key
HAVING COUNT(*) FILTER (WHERE driver_to_request_ratio < 0.5) * 1.0 / COUNT(*) > 0.30
ORDER BY pct_under_supplied DESC;
-- These cells get long-term driver incentive programs (per-trip bonus, guarantees).
Section 8 — Why this works
Causal chain preserved. State → Decision → Outcome are three separate facts. Each can be queried in isolation, but they join cleanly via decision_key. Effectiveness analysis is one JOIN, not a self-join puzzle.
Replay-safe by construction. Frozen input_features_json + SCD2 dim_surge_model means any historical surge can be recomputed exactly. Audits, disputes, and A/B baselines all work.
Skew managed. H3 L8 cells partition the geography evenly; downtown skew is contained per-cell. Sub-partition by HASH(h3_key) MOD 8 for the hottest few cells if needed.
Cold-start covered. When a new city launches, brg_zone_cells can map untrained cells to their parent zone's averages until enough data accrues.
Worked example — surge math at 19:02 in h3_sf_market
At 19:01, ratio drops from 1.50 → 0.50 (concert lets out). 7 drivers available, 14 open requests. ETA jumps from 240s → 360s.
At 19:02, ratio crashes to 0.23 (5 drivers, 22 requests). The model fires a surge decision: new_multiplier = 2.5.
Frozen input_features_json: {ratio:0.23, eta_sec:540, event:"concert_chase_center", prev_minute_ratio:0.50}. Same inputs to model_v3.2 will always yield 2.5 — replay is deterministic.
10 min later (19:12): drivers from neighboring cells reposition (delta_drivers = +9), some riders abandon (5 abandoned out of 19), but 14 trips complete. Realized revenue: $1,240.
Effectiveness verdict: the surge worked — supply pulled in (delta_drivers > 0), revenue captured, but at the cost of 5 abandoned trips. The pricing team weighs $1,240 captured vs ~$200 estimated lost-to-abandonment.
Drill · answer this in 90 seconds:
Design a model for Uber's surge engine that captures the supply/demand state per geography per minute, the surge decisions made, and the actual outcome of those decisions — supporting the live engine, post-hoc effectiveness analysis, capacity planning, and pricing-team experimentation.
Senior framing. "A location is a value plus an age. The amateur design treats every GPS fix as equally true and fans every ping out synchronously; the senior design writes the ping once, lets consumers pull on their own cadence, and carries location_age_sec so matching can weight a stale driver instead of dropping one — because in a supply-starved cell a 25-second-old driver still beats no driver."
Order & courier dispatch (with stacked deliveries)
DoorDash / Uber Eats · 3-sided marketplace
Model DoorDash's order lifecycle from customer placement → restaurant prep → courier pickup → delivery, supporting Customers (on-time SLA), Couriers (earnings), Restaurants (operations), with native handling of stacked orders where one courier delivers multiple orders on a single dispatch.
Why this is hard — the design tension
This is a 3-sided marketplace (customer + restaurant + courier) where one fact table can't satisfy all three. Stacked orders make it worse: one courier dispatch covers multiple orders, but each order has its own customer SLA and its own restaurant relationship. Mutable tips (editable up to 24h post-delivery) demand append-only economics. The model must read clean for each stakeholder without forcing the others to filter.
Section 1 — Dimensional Model Overview
Fact Tables (the order/dispatch split + bridge)
fct_orders — The "Customer" grain. One row per customer order. Drives customer SLA + restaurant operations.
fct_dispatches — The "Courier" grain. One row per courier dispatch. May cover 1 (single) or N (stacked) orders. Drives courier earnings.
brg_dispatch_orders — The "Stacking" bridge. Links a dispatch to its 1-N orders, with leg sequence and per-order distance.
fct_order_state_events — The "Telemetry" grain. One row per order state transition (PLACED, CONFIRMED, READY, PICKED_UP, DELIVERED, CANCELLED).
Dimension Tables
dim_customers · dim_addresses (SCD2 on address — for legal/refund history)
dim_restaurants (SCD2 — hours, menu version, status)
Q1 — Stage-by-stage latency for the order, with whose-fault attribution
SELECT
order_id,
EXTRACT(EPOCH FROM (confirmed_ts - placed_ts)) / 60 AS confirm_min,
EXTRACT(EPOCH FROM (ready_ts - confirmed_ts)) / 60 AS prep_min,
EXTRACT(EPOCH FROM (picked_up_ts - ready_ts)) / 60 AS pickup_wait_min,
EXTRACT(EPOCH FROM (delivered_ts - picked_up_ts)) / 60 AS drive_min,
EXTRACT(EPOCH FROM (delivered_ts - placed_ts)) / 60 AS total_min,
CASE
WHEN (ready_ts - confirmed_ts) > INTERVAL '20 min' THEN 'restaurant_slow'
WHEN (picked_up_ts - ready_ts) > INTERVAL '10 min' THEN 'courier_late_pickup'
WHEN (delivered_ts - picked_up_ts) > INTERVAL '20 min' THEN 'courier_slow_drive'
ELSE 'on_track' END AS bottleneck
FROM fct_orders WHERE DATE(placed_ts) = '2025-05-01';
B. Courier · earnings & stack uplift
Q2 — Earnings per hour by stacked vs single dispatch
SELECT
is_batched,
COUNT(*) AS dispatches,
ROUND(SUM(payout_usd) /
SUM(EXTRACT(EPOCH FROM (dispatch_end_ts - dispatch_start_ts))/3600), 2) AS earnings_per_hr,
ROUND(SUM(payout_usd) / SUM(total_orders), 2) AS payout_per_order,
ROUND(AVG(total_distance_km / total_orders), 2) AS km_per_order
FROM fct_dispatches
WHERE DATE(dispatch_start_ts) BETWEEN '2025-05-01' AND '2025-05-31'
GROUP BY is_batched;
-- Quantifies the courier value of stacking; if stacked is < 1.3× single payout/hr, batching isn't worth it.
C. Restaurant · prep time & cancellation
Q3 — Prep time p50/p95 per restaurant per day
SELECT
restaurant_id,
DATE(placed_ts) AS d,
COUNT(*) AS orders,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY EXTRACT(EPOCH FROM (ready_ts - confirmed_ts))/60) AS prep_p50_min,
PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY EXTRACT(EPOCH FROM (ready_ts - confirmed_ts))/60) AS prep_p95_min,
ROUND(100.0 * COUNT(*) FILTER (WHERE terminal_state = 'cancelled_restaurant') / COUNT(*), 1) AS cancel_pct
FROM fct_orders
WHERE confirmed_ts IS NOT NULL AND ready_ts IS NOT NULL
GROUP BY restaurant_id, DATE(placed_ts)
ORDER BY prep_p95_min DESC;
-- Restaurants with prep_p95 > 25min get warned; > 35min get throttled (lower placement priority).
Section 8 — Why this works
Three stakeholders, three grains, one model. Customer queries hit fct_orders; courier queries hit fct_dispatches; the bridge ties them only when stacking math matters.
Stack uplift is measurable. Compare is_batched=TRUE vs FALSE earnings/hr — if stacking doesn't beat single by ~30%, dispatch should stop offering it.
Mutable tips handled cleanly. Append correction rows; tip_locked_ts tracks the immutability boundary. Finance restates from tip_locked_ts.
Stage-attribution. The bottleneck CASE WHEN in Q1 turns "the order was 18 min late" into "the restaurant prep took 22 min" — actionable per stakeholder.
Senior framing. "The 3-sided design lets us answer three different stakeholders in one model: was the customer's order on time? (joins on fct_orders), did the courier earn enough per hour? (aggregates fct_dispatches), did the restaurant lose orders to long prep times? (joins on dim_restaurants). The bridge is what makes stacking a first-class concept rather than a hack."
Worked example — Stacked dispatch DSP-99
Courier X accepts dispatch at 18:18 with 2 stacked orders from RST_PIZZA. Pickup is shared (single restaurant); 2 different drop locations.
Order O-201 delivered first at 18:34 (16 min after pickup, 2.0 km).
Order O-202 delivered second at 18:42 (24 min after pickup, additional 2.2 km from drop1).
Courier earnings: base_pay $6.50 + tips $11.00 = $17.50 for 24 min of dispatch time → $43.75/hr. A single-order dispatch in the same area pays ~$28/hr.
Customer SLAs: O-201 on-time (16 min total); O-202 was 8 min late (waited for O-201 drop). Stacking trade-off: courier earnings up, second customer's SLA at risk.
Drill · answer this in 90 seconds:
Model DoorDash's order lifecycle from customer placement → restaurant prep → courier pickup → delivery, supporting Customers (on-time SLA), Couriers (earnings), Restaurants (operations), with native handling of stacked orders where one courier delivers multiple orders on a single dispatch.
Senior framing. "The 3-sided design lets us answer three different stakeholders in one model: was the customer's order on time? (joins on fct_orders), did the courier earn enough per hour? (aggregates fct_dispatches), did the restaurant lose orders to long prep times? (joins on dim_restaurants). The bridge is what makes stacking a first-class concept rather than a hack."
Design a model for Google search ads that captures the entire chain from auction clearing → impression → click → conversion, supporting advertiser billing, quality-score recomputation, ranker training, incrementality measurement, and re-runnable attribution windows.
Why this is hard — the design tension
This is a causal chain across 4 facts with extreme asymmetry: 2 billion ad requests/day → 200 billion targeting lookups → ~100 candidate ads per request → 1 winner. Loser logs are 20× winner volume. Auctions clear in <100ms but conversions land days later. The model must keep the join key (auction_decision_id) deterministic across all stages, support re-attribution when models change, and protect privacy under ATT/Sandbox.
Section 1 — Dimensional Model Overview
Fact Tables (the auction chain)
fct_auctions — The "Decision" grain. One row per auction clearing. Carries auction_decision_id.
fct_impressions — The "Served" grain. One row per impression (auction winner served on the page).
fct_clicks — The "Click" grain. One row per click on an impression. Mutable: invalid-click invalidation comes later.
fct_conversions — The "Conversion" grain. One row per advertiser-reported conversion event.
fct_attributions — The "Attribution" grain. One row per (conversion × eligible-touchpoint × model × run). Append-only per attribution_run_id.
fct_auction_losers — The "Counterfactual" grain. Sampled losing candidates (1%) for ranker training.
Re-attribution: when the data-driven model upgrades from v2→v3, run a new attribution_run_id over the same conversions and impressions. Old run is preserved — no data lost.
Incrementality: compare conversion-rate of exposed (impression served) vs counterfactual (won auction but didn't serve due to PSA / blank slot) — fct_auction_losers partly enables this.
One key, four facts.auction_decision_id is generated once at auction-time and travels with the impression, click, conversion. No fuzzy joins, no timestamp matching.
Re-runnable attribution. Attribution is computed, not collected. New models = new attribution_run_id; the latest_run view picks the most recent. Advertisers can compare last-click vs data-driven side by side.
Invalid clicks as a slowly-changing fact. Click validation is asynchronous (bot detection takes hours to days). Append is_invalid=TRUE with invalidation_ts rather than UPDATE — preserves historical billing audit.
Loser sampling at 1%. 20× winners by volume. Reservoir sample to 1% and re-weight in training; full retention is unaffordable.
Enhanced graphical data model — Search Ads Auction Chain
Section 7 — SQL analysis for business units
Sample data — Nike "running shoe" auction
-- Auction at 14:32:01 — 5 candidates compete; Nike wins on ad_rank 0.0751
INSERT INTO fct_auctions VALUES
('AUC_NIKE_001','q_runningshoe','kw_runningshoe','U_TOK_42','dev_iphone',
'2025-05-01 14:32:01.123','AD_NIKE_RUN','1.92','1.0','1.92','standard');
-- Impression served
INSERT INTO fct_impressions VALUES
('IMP_001','AUC_NIKE_001','AD_NIKE_RUN',0.034,0.92,'2025-05-01 14:32:01.180', 0.0751);
-- User clicks 4 seconds later
INSERT INTO fct_clicks VALUES ('CLK_001','IMP_001','2025-05-01 14:32:05',FALSE,NULL);
-- Conversion 17 hours later (next-day purchase)
INSERT INTO fct_conversions VALUES
('CONV_001','ADV_NIKE','2025-05-02 07:42:00','purchase',129.99);
-- Two attribution rows: last-click model + DDA model on the same conversion
INSERT INTO fct_attributions VALUES
('ATTR_001','CONV_001','IMP_001','CLK_001','MODEL_LAST_CLICK','RUN_2025_05_02', 1.000, 'click_24h', 129.99),
('ATTR_002','CONV_001','IMP_001','CLK_001','MODEL_DDA_v3', 'RUN_2025_05_02', 0.620, 'click_24h', 80.59);
-- Sampled losers (1%)
INSERT INTO fct_auction_losers VALUES
('AUC_NIKE_001','AD_ASICS_GEL', 0.0572, 100), -- weight = 1/sample_rate
('AUC_NIKE_001','AD_STRAVA', 0.0408, 100);
WITH latest_run AS (
SELECT MAX(attribution_run_id) AS run_id FROM fct_attributions
),
campaign_metrics AS (
SELECT
c.campaign_id,
COUNT(DISTINCT i.impression_id) AS impressions,
COUNT(DISTINCT cl.click_id) FILTER (WHERE cl.is_invalid = FALSE) AS clicks,
SUM(a.clearing_price_usd) AS spend_usd,
COUNT(DISTINCT atr.conversion_id) AS conversions,
SUM(atr.attributed_value_usd) AS attributed_revenue
FROM fct_auctions a
JOIN fct_impressions i USING (auction_decision_id)
JOIN dim_creatives cr ON i.creative_id = cr.creative_id
JOIN dim_campaigns c ON cr.campaign_id = c.campaign_id
LEFT JOIN fct_clicks cl USING (impression_id)
LEFT JOIN fct_attributions atr ON atr.click_id = cl.click_id
AND atr.attribution_run_id = (SELECT run_id FROM latest_run)
AND atr.model_id = 'MODEL_LAST_CLICK'
WHERE a.auction_ts >= CURRENT_DATE - INTERVAL '7 days'
GROUP BY c.campaign_id
)
SELECT campaign_id, impressions, clicks, spend_usd, conversions, attributed_revenue,
ROUND(100.0 * clicks / NULLIF(impressions, 0), 3) AS ctr_pct,
ROUND(spend_usd / NULLIF(clicks, 0), 2) AS cpc_usd,
ROUND(spend_usd / NULLIF(conversions, 0), 2) AS cpa_usd,
ROUND(attributed_revenue / NULLIF(spend_usd, 0), 2) AS roas
FROM campaign_metrics ORDER BY roas DESC;
B. Attribution & measurement
Q2 — Last-click vs DDA: which model gives higher attributed revenue?
SELECT
atr.model_id,
COUNT(DISTINCT atr.conversion_id) AS conversions,
ROUND(SUM(atr.attributed_value_usd), 2) AS total_attributed_value,
ROUND(AVG(atr.fractional_credit), 3) AS avg_credit
FROM fct_attributions atr
WHERE atr.attribution_run_id = (SELECT MAX(attribution_run_id) FROM fct_attributions)
GROUP BY atr.model_id;
-- Last-click gives full credit (credit=1.0). DDA distributes — useful for diagnosing over-attribution.
C. Ranker · quality score & pCTR calibration
Q3 — Predicted vs realized pCTR per creative (calibration check)
SELECT
i.creative_id,
COUNT(*) AS impressions,
ROUND(AVG(i.predicted_pCTR), 4) AS predicted_pCTR,
ROUND(COUNT(*) FILTER (WHERE c.click_id IS NOT NULL)::DECIMAL / COUNT(*), 4) AS realized_pCTR,
ROUND(100.0 * (COUNT(*) FILTER (WHERE c.click_id IS NOT NULL)::DECIMAL / COUNT(*) - AVG(i.predicted_pCTR))
/ AVG(i.predicted_pCTR), 1) AS calibration_error_pct
FROM fct_impressions i
LEFT JOIN fct_clicks c USING (impression_id)
WHERE i.served_ts >= CURRENT_DATE - INTERVAL '7 days'
GROUP BY i.creative_id
HAVING COUNT(*) > 1000
ORDER BY ABS(calibration_error_pct) DESC;
-- Creatives with calibration_error > 20% need ranker retraining attention.
Section 8 — Why this works
One join key, four facts.auction_decision_id threads through impression, click, conversion. ROAS query is a 4-way join on a single column — efficient and correct.
Re-attribution is cheap. New model = new attribution_run_id. Old runs preserved; advertisers compare side-by-side.
Invalid clicks handled losslessly. Append-only is_invalid + invalidation_ts preserves billing history while crediting advertisers.
Loser sampling tractable. 1% sampled with sample_weight column; ranker training works without 20× data volume.
Senior framing. "The auction_decision_id is the join key that turns four independent facts into a coherent advertiser narrative — and the attribution_run_id + dim_attribution_models SCD2 makes re-running attribution a query, not a re-ingest. Last-click vs DDA isn't an ETL decision — it's a column in the SELECT."
Worked example — Nike "running shoe" auction clearing & attribution
Second-price clearing: Nike pays just enough to beat Asics's ad_rank — at Nike's quality 0.92, that's $1.92, not $2.40. Savings of $0.48 = trust signal.
Impression served at 14:32:01. User clicks at 14:32:05.
Conversion 17 hours later: $129.99 purchase.
Last-click attribution (MODEL_LAST_CLICK): credit 1.0 → $129.99 attributed to this click.
Data-driven attribution (MODEL_DDA_v3): credit 0.62 → $80.59 attributed to this click. The remaining $49.40 attributed to upstream touchpoints.
Nike's ROAS under last-click = $129.99 / $1.92 = 67.7×. Under DDA = $80.59 / $1.92 = 42.0×. Both are stored side-by-side via attribution_run_id.
Drill · answer this in 90 seconds:
Design a model for Google search ads that captures the entire chain from auction clearing → impression → click → conversion, supporting advertiser billing, quality-score recomputation, ranker training, incrementality measurement, and re-runnable attribution windows.
Senior framing. "The auction_decision_id is the join key that turns four independent facts into a coherent advertiser narrative — and the attribution_run_id + dim_attribution_models SCD2 makes re-running attribution a query, not a re-ingest. Last-click vs DDA isn't an ETL decision — it's a column in the SELECT."
Cross-device attribution & identity graph
Meta / Facebook · cross-device attribution
Design a model for Facebook ads where a user sees an ad on phone, opens the advertiser's site on laptop later, and converts on tablet a week after. Support 1/7/28-day click + view-through windows, ATT/SKAdNetwork iOS constraints, and incrementality lift measurement.
Why this is hard — the design tension
One human, many devices. Apple's ATT cuts user-level signals on iOS. SKAdNetwork only returns aggregated postbacks. The model must support cross-device joining via a probabilistic+deterministic identity graph, multiple attribution windows in parallel (1d / 7d / 28d × click vs view), and two privacy paths (user-level for opted-in, aggregate-only for SKAN).
Section 1 — Dimensional Model Overview
Fact Tables
fct_ad_events — The "Touch" grain. One row per impression OR click (event-typed).
fct_conversions — The "Conversion" grain. One row per advertiser-reported conversion.
fct_skan_postbacks — The iOS-aggregate grain. One row per SKAdNetwork postback (no user-level data).
fct_attributions — The "Match" grain. One row per (conversion × eligible touchpoint × window × model × run). Append-only via attribution_run_id.
Dimension & Bridge Tables
dim_identities — One row per resolved person. Probabilistic + deterministic.
bridge_identity_devices — SCD2. Many devices per identity; link_confidence drives weighting.
Run multiple window_type values in parallel: click_1d, click_7d, view_28d. The advertiser picks; we report all.
Cross-device join: phone impression + laptop click + tablet conversion all join via identity_id from bridge_identity_devices.
Privacy / iOS · ATT & SKAN
privacy_path = 'att_optout' events have NULL identity_id; only aggregate roll-ups by campaign × placement × day are valid.
fct_skan_postbacks is a separate fact — never joins to user-level. Aggregated reporting only.
Measurement · incrementality lift
Hold out N% of users from receiving any ad (in dim_user_holdouts). Lift = (treatment conversion rate) − (holdout conversion rate).
The holdout group does not appear in fct_attributions at all — they're conversions without ad events.
Section 4 — Why "match at read, not at write"
The decisive design choice is to never compute attribution at ingest time. Every conversion is paired with every eligible touchpoint, and the attribution model is a function evaluated at read time. Three benefits:
Re-attribution is a query, not a re-ingest. Switch from last-click to data-driven without rebuilding the fact table.
Multiple windows in parallel. Same conversion appears in click_1d, click_7d, and view_28d rows simultaneously.
Audit-ready. When an advertiser disputes attribution, you produce the exact attribution_run_id that ran on date X with model Y. Defensible.
Enhanced graphical data model — Cross-device Attribution
Section 7 — SQL analysis for business units
Sample data — 1 user (3 devices) → 3 ad events → 1 conversion
INSERT INTO dim_identities VALUES ('IDT_001','PERSON_TOK_42','login_email');
INSERT INTO bridge_identity_devices VALUES
('IDT_001','DEV_PHONE', 1.000,'login', '2024-01-01',NULL),
('IDT_001','DEV_LAPTOP', 0.920,'household_ip', '2024-03-15',NULL),
('IDT_001','DEV_TABLET', 0.880,'behavioral', '2024-06-22',NULL);
-- 3 ad events across 3 devices, same identity:
INSERT INTO fct_ad_events VALUES
('AE_001','impression','IDT_001','DEV_PHONE','CMP_NIKE','feed','2025-04-25 09:15:00','opted_in'),
('AE_002','click', 'IDT_001','DEV_LAPTOP','CMP_NIKE','feed','2025-04-28 14:32:00','opted_in'),
('AE_003','impression','IDT_001','DEV_TABLET','CMP_NIKE','reel','2025-04-30 21:00:00','opted_in');
-- Conversion 1 week after first impression
INSERT INTO fct_conversions VALUES
('CONV_777','ADV_NIKE','2025-05-02 11:42:00','purchase',129.99,'IDT_001');
-- Attribution rows for window=7d_click + last_click vs MTA
INSERT INTO fct_attributions VALUES
('ATR_01','CONV_777','AE_002','MODEL_LAST_CLICK','RUN_2025_05_03','click_7d',TRUE,1.000,129.99),
('ATR_02','CONV_777','AE_001','MODEL_MTA', 'RUN_2025_05_03','view_28d', TRUE,0.300, 38.99),
('ATR_03','CONV_777','AE_002','MODEL_MTA', 'RUN_2025_05_03','click_7d', TRUE,0.500, 65.00),
('ATR_04','CONV_777','AE_003','MODEL_MTA', 'RUN_2025_05_03','view_28d', TRUE,0.200, 26.00);
A. Advertiser · cross-device ROAS by window
Q1 — Conversions attributed by window_type (1d, 7d, 28d) under last-click
SELECT
atr.window_type,
COUNT(DISTINCT atr.conversion_id) AS conversions_attributed,
ROUND(SUM(atr.attributed_value_usd), 2) AS attributed_value
FROM fct_attributions atr
WHERE atr.attribution_run_id = (SELECT MAX(attribution_run_id) FROM fct_attributions)
AND atr.model_id = 'MODEL_LAST_CLICK'
AND atr.is_attributed = TRUE
GROUP BY atr.window_type ORDER BY attributed_value DESC;
-- Lets advertiser see how many conversions fall inside each window. Picks the right window for their funnel.
Q2 — Cross-device match rate (% conversions linked to ≥1 ad event via identity)
WITH conv AS (
SELECT conversion_id, identity_id FROM fct_conversions
WHERE conversion_ts >= CURRENT_DATE - INTERVAL '30 days'
),
matched AS (
SELECT DISTINCT atr.conversion_id FROM fct_attributions atr
JOIN conv USING (conversion_id)
WHERE atr.is_attributed = TRUE
)
SELECT
COUNT(DISTINCT c.conversion_id) AS total_conversions,
COUNT(DISTINCT m.conversion_id) AS matched_conversions,
ROUND(100.0 * COUNT(DISTINCT m.conversion_id) / COUNT(DISTINCT c.conversion_id), 1) AS match_rate_pct
FROM conv c LEFT JOIN matched m USING (conversion_id);
B. Measurement · incrementality lift
Q3 — Conversion rate: treatment vs holdout
WITH treatment AS (
SELECT i.identity_id,
EXISTS (SELECT 1 FROM fct_conversions c WHERE c.identity_id = i.identity_id
AND c.advertiser_id = 'ADV_NIKE'
AND c.conversion_ts BETWEEN '2025-04-01' AND '2025-05-01') AS converted
FROM dim_identities i
WHERE i.identity_id NOT IN (SELECT identity_id FROM dim_user_holdouts
WHERE advertiser_id = 'ADV_NIKE'
AND period = '2025-04')
),
holdout AS (
SELECT i.identity_id,
EXISTS (SELECT 1 FROM fct_conversions c WHERE c.identity_id = i.identity_id
AND c.advertiser_id = 'ADV_NIKE'
AND c.conversion_ts BETWEEN '2025-04-01' AND '2025-05-01') AS converted
FROM dim_identities i
JOIN dim_user_holdouts h ON i.identity_id = h.identity_id
WHERE h.advertiser_id = 'ADV_NIKE' AND h.period = '2025-04'
)
SELECT 'treatment' AS arm, COUNT(*) AS n, ROUND(100.0 * AVG(converted::INT), 3) AS conv_rate_pct FROM treatment
UNION ALL
SELECT 'holdout', COUNT(*), ROUND(100.0 * AVG(converted::INT), 3) FROM holdout;
-- Lift = treatment_rate - holdout_rate. Statistical significance = Welch's t-test on the two proportions.
Section 8 — Why this works
Identity graph as a first-class fact.bridge_identity_devices SCD2 with confidence is how cross-device works without lying about certainty.
Match at read. Multiple windows + multiple models computed in parallel; advertiser picks at query time.
iOS / SKAN carve-out. Privacy path is explicit; aggregate-only queries fall back to fct_skan_postbacks.
Incrementality real. Holdout assignment is a separate dim, not derived — survives campaign re-orgs.
Senior framing. "The decisive design choice is to never compute attribution at write time — every conversion is paired with every eligible ad event, and the attribution model is a function evaluated at read time. That's how we re-attribute when an advertiser switches from last-click to data-driven without re-ingesting raw events. The identity graph as a versioned bridge is what makes cross-device honest about its uncertainty."
Worked example — User IDT_001 attribution math
User IDT_001 sees Nike impression on phone (Apr 25), clicks Nike ad on laptop (Apr 28), sees Nike Reels on tablet (Apr 30).
Buys Nike shoes for $129.99 on May 2.
Last-click (click_7d window): credit 1.0 → all $129.99 attributed to AE_002 (the laptop click).
Identity bridge enabled the cross-device join; without it the 3 events wouldn't connect. link_confidence on each device drives MTA weighting in advanced models.
Both attribution rows live in fct_attributions with attribution_run_id = RUN_2025_05_03. Switching models = SELECT, not re-ingest.
Drill · answer this in 90 seconds:
Design a model for Facebook ads where a user sees an ad on phone, opens the advertiser's site on laptop later, and converts on tablet a week after. Support 1/7/28-day click + view-through windows, ATT/SKAdNetwork iOS constraints, and incrementality lift measurement.
Senior framing. "The decisive design choice is to never compute attribution at write time — every conversion is paired with every eligible ad event, and the attribution model is a function evaluated at read time. That's how we re-attribute when an advertiser switches from last-click to data-driven without re-ingesting raw events. The identity graph as a versioned bridge is what makes cross-device honest about its uncertainty."
CTV ad inventory, pacing & frequency capping
Netflix · CTV ads · inventory & pacing
Design a model for Netflix's ad-supported tier — capturing every ad slot opportunity (filled or unfilled), every ad served, frequency caps per user, and contractual pacing guarantees per advertiser ("guaranteed 10M impressions by month-end with make-good if under-delivered").
Why this is hard — the design tension
CTV inventory has a contractual layer that programmatic doesn't: guaranteed buys with monetary penalties for under-delivery. Three pressures pull simultaneously: (1) maximize fill rate, (2) honor frequency caps so users don't see the same ad 8 times, (3) hit committed impressions per advertiser per month. Modeling unfilled opportunities (not just impressions) is the senior move — it's the only way to answer "why did we leave money on the table?"
Section 1 — Dimensional Model Overview
Fact Tables
fct_ad_opportunities — The "Slot" grain. One row per ad-slot opportunity in a viewing session. Filled or unfilled — both kept.
fct_ad_impressions — The "Served" grain. Subset of opportunities where an ad actually played.
fct_pacing_daily — The "Contract" grain. One row per (campaign × day) aggregating delivered vs committed.
fct_frequency_state — The "Cap" grain. One row per (profile × campaign × day) tracking impressions toward cap.
fct_makegood_obligations — The "Owed" grain. One row per (campaign × month) when committed not met. Drives next-month bonus inventory.
Ad server · live decisions — for each opportunity, query fct_frequency_state + fct_pacing_daily + active campaign list. Pick a creative or mark unfilled.
Account managers · advertiser pacing — track pacing_index per campaign daily. If <0.95 with <5 days left, increase delivery priority.
Finance · make-good bookkeeping — fct_makegood_obligations tracks dollars owed when committed impressions not met. Liability on the balance sheet until fulfilled.
Exec · "money on the table" analysis — count fct_ad_opportunities with served_creative_id IS NULL grouped by unfilled_reason. Tells you if the bottleneck is demand (sales team) or eligibility rules (ops team).
Section 4 — Why model unfilled opportunities
Without unfilled rows: can't compute fill rate. Can't tell if you under-delivered because of low demand vs frequency caps vs content unsuitability.
With unfilled rows: fill rate = COUNT(filled) / COUNT(total opportunities). Drill into unfilled_reason to find the right team to fix.
Cost: doubles the row count. Tradeoff is worth it; partition by date keeps queries fast.
Enhanced graphical data model — Netflix CTV Inventory
Section 7 — SQL analysis
Sample data — 1 campaign, 5 opportunities (3 filled / 2 unfilled), 1 day pacing
INSERT INTO dim_campaigns VALUES ('CMP_FORD','ADV_FORD','2025-05-01','2025-05-31',5000000,1.0);
INSERT INTO fct_ad_opportunities VALUES
('OPP_001','PROF_A','TITLE_X','SESS_1','pre_roll', 0, 15, 25.50, 'CRT_FORD_15S', 'guaranteed_buy', NULL, '2025-05-01 19:32:00'),
('OPP_002','PROF_A','TITLE_X','SESS_1','mid_roll',1200, 15, 25.50, NULL, NULL, 'frequency_capped', '2025-05-01 19:52:00'),
('OPP_003','PROF_B','TITLE_Y','SESS_2','pre_roll', 0, 15, 25.50, 'CRT_FORD_15S', 'guaranteed_buy', NULL, '2025-05-01 20:05:00'),
('OPP_004','PROF_C','TITLE_Z','SESS_3','pre_roll', 0, 30, 25.50, NULL, NULL, 'no_eligible_demand', '2025-05-01 20:22:00'),
('OPP_005','PROF_D','TITLE_X','SESS_4','mid_roll',1500, 15, 25.50, 'CRT_FORD_15S', 'guaranteed_buy', NULL, '2025-05-01 21:00:00');
-- Pacing for the day:
INSERT INTO fct_pacing_daily VALUES ('CMP_FORD','2025-05-01', 3, 161290, 3, 5000000, 0.000, 30);
-- Way under: should have delivered 161290 today, only got 3.
A. Operations · fill rate & unfilled diagnostics
Q1 — Fill rate & unfilled-reason breakdown (the "money on the table" report)
SELECT
slot_type,
COUNT(*) AS opportunities,
COUNT(*) FILTER (WHERE served_creative_id IS NOT NULL) AS filled,
ROUND(100.0 * COUNT(*) FILTER (WHERE served_creative_id IS NOT NULL)
/ COUNT(*), 1) AS fill_rate_pct,
COUNT(*) FILTER (WHERE unfilled_reason = 'frequency_capped') AS cap_blocked,
COUNT(*) FILTER (WHERE unfilled_reason = 'no_eligible_demand') AS no_demand,
COUNT(*) FILTER (WHERE unfilled_reason = 'content_unsuitable') AS content_block
FROM fct_ad_opportunities
WHERE DATE(opportunity_ts) = '2025-05-01'
GROUP BY slot_type ORDER BY opportunities DESC;
-- "no_eligible_demand" rows = sales team to fix. "frequency_capped" = ops/cap-policy team.
B. Account managers · pacing alerts
Q2 — Campaigns at risk of under-delivery (pacing < 0.95 with <5 days remaining)
SELECT p.campaign_id, c.advertiser_id,
p.cumulative_delivered, p.total_committed,
ROUND(100.0 * p.cumulative_delivered / NULLIF(p.total_committed, 0), 1) AS pct_delivered,
p.pacing_index, p.days_remaining,
ROUND((p.total_committed - p.cumulative_delivered) / NULLIF(p.days_remaining, 0)) AS daily_needed
FROM fct_pacing_daily p JOIN dim_campaigns c USING (campaign_id)
WHERE p.day = CURRENT_DATE
AND p.pacing_index < 0.95
AND p.days_remaining < 5
ORDER BY p.pacing_index ASC;
-- Triggers ad-server priority bump + account-manager Slack alert.
C. Finance · make-good liability
Q3 — Pending make-good liability ($) by advertiser
SELECT
c.advertiser_id,
COUNT(DISTINCT mg.campaign_id) AS campaigns_owed,
SUM(mg.owed_impressions) AS total_impressions_owed,
ROUND(SUM(mg.owed_value_usd), 2) AS total_dollars_owed,
COUNT(*) FILTER (WHERE mg.status = 'pending') AS pending_count
FROM fct_makegood_obligations mg
JOIN dim_campaigns c USING (campaign_id)
WHERE mg.status IN ('pending','scheduled')
GROUP BY c.advertiser_id
ORDER BY total_dollars_owed DESC;
-- Books a liability on Netflix's balance sheet until fulfilled.
Section 8 — Why this works
Unfilled rows are first-class. Fill rate becomes a simple COUNT FILTER, not a derived metric.
Pacing as a daily snapshot. Not derived on the fly — pre-aggregated for fast dashboard reads.
Make-goods as a liability fact. Finance owns it; it's a real balance-sheet item, not a metric.
Frequency caps as state. Per (profile × campaign × day) — exactly what the ad server queries pre-decision.
Senior framing. "Modeling unfilled opportunities (not just impressions) is the senior move — it's the only way to answer 'why did we leave money on the table this quarter?' which is the exec-level question this data product exists for. Pacing as snapshot + make-good as liability turns the contractual layer into queryable finance, not a sales spreadsheet."
Worked example — CMP_FORD on May 1
Total committed: 5,000,000 impressions over May. Daily target: 161,290.
Day 1 delivered: 3 impressions (out of 5 opportunities). 2 unfilled — 1 frequency_capped, 1 no_eligible_demand.
Design a model for Netflix's ad-supported tier — capturing every ad slot opportunity (filled or unfilled), every ad served, frequency caps per user, and contractual pacing guarantees per advertiser ("guaranteed 10M impressions by month-end with make-good if under-delivered").
Senior framing. "Modeling unfilled opportunities (not just impressions) is the senior move — it's the only way to answer 'why did we leave money on the table this quarter?' which is the exec-level question this data product exists for. Pacing as snapshot + make-good as liability turns the contractual layer into queryable finance, not a sales spreadsheet."
Orders, returns & multi-warehouse inventory
Amazon · e-commerce · returns & inventory
Design a model for Amazon orders that handles partial returns, replacements, marketplace seller orders (1P vs FBA vs 3P), and inventory drawn from multiple warehouses with FIFO/LIFO consumption rules.
Why this is hard — the design tension
Returns reshape revenue retroactively. Multi-warehouse fulfillment splits a single SKU across two warehouses. Seller-of-record (1P retail / 2P FBA / 3P marketplace) determines whether Amazon books GMV or commission. And inventory has to reconcile to ledger every night. The model must be append-only on the financial side and movement-based on the physical side — same product, two grains, one source of truth.
Section 1 — Dimensional Model Overview
Fact Tables
fct_order_lines — The "Sale" grain. One row per (order × product × source_warehouse). 3-item order = 3 rows.
fct_returns — The "Refund" grain. One row per return event, line-level. Append-only.
fct_inventory_movements — The "Stock-flow" grain. One row per stock change (in / out / transfer / damage / adjustment).
snap_inventory_daily — The "State" grain. Daily snapshot per (product × warehouse).
fct_replacements — The "Linked-order" grain. Maps replacement orders back to their original.
Customer experience — order delivery SLA from fct_order_lines (ordered → delivered). Return rate by category from fct_returns tracks product quality.
Finance — net revenue = SUM(order_lines.gross_revenue_usd) − SUM(returns.refund_amount_usd). For 3P sellers, dim_sellers.recognized_as = 'commission_only' means only the commission is booked.
Operations — multi-warehouse fill rate, transfer efficiency, days of inventory on hand. Reconciliation runs nightly: snap_inventory_daily ending_qty must match SUM(fct_inventory_movements) since opening.
Returns reshape historical revenue. Never UPDATE fct_order_lines. Append fct_returns. Finance can restate any historical period correctly.
Inventory snapshots derive from movements. Snapshots are pre-aggregated for fast read; movements are the immutable source. Lose a snapshot, rebuild from movements.
Replacements are linked, not collapsed. Original order stays + replacement is a new row + return matches both. Triple-counted at gross level, nets to zero correctly.
3P seller revenue carve-out.dim_sellers.recognized_as drives whether SUM is GMV or commission — single column makes the rule explicit and queryable.
Enhanced graphical data model — Amazon Orders + Returns + Inventory
Section 7 — SQL analysis
Sample data — 1 order with 2 line items (one split across 2 warehouses), 1 partial return
Q1 — Prime vs non-Prime delivery time (last 30 days)
SELECT
is_prime,
COUNT(*) AS lines,
ROUND(AVG(EXTRACT(EPOCH FROM (delivered_ts - ordered_ts))/3600), 1) AS avg_hours,
ROUND(PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY EXTRACT(EPOCH FROM (delivered_ts - ordered_ts))/3600), 1) AS p95_hours
FROM fct_order_lines
WHERE delivered_ts IS NOT NULL AND ordered_ts >= CURRENT_DATE - 30
GROUP BY is_prime;
B. Finance · net revenue with 1P/3P carve-out
Q2 — Net revenue with seller-type carve-out (3P books only commission)
WITH gross AS (
SELECT
s.seller_type,
s.recognized_as,
SUM(CASE WHEN s.recognized_as = 'gmv' THEN ol.gross_revenue_usd
ELSE ol.gross_revenue_usd * s.commission_pct END) AS gross_recognized_usd
FROM fct_order_lines ol JOIN dim_sellers s USING (seller_id)
WHERE DATE(ol.ordered_ts) = '2025-05-01'
GROUP BY s.seller_type, s.recognized_as
),
returns AS (
SELECT
s.seller_type,
SUM(CASE WHEN s.recognized_as = 'gmv' THEN r.refund_amount_usd
ELSE r.refund_amount_usd * s.commission_pct END) AS refund_recognized_usd
FROM fct_returns r
JOIN fct_order_lines ol ON r.order_line_id = ol.order_line_id
JOIN dim_sellers s USING (seller_id)
WHERE DATE(r.return_ts) = '2025-05-01'
GROUP BY s.seller_type
)
SELECT g.seller_type,
ROUND(g.gross_recognized_usd, 2) AS gross_revenue,
ROUND(COALESCE(r.refund_recognized_usd, 0), 2) AS refunds,
ROUND(g.gross_recognized_usd - COALESCE(r.refund_recognized_usd, 0), 2) AS net_revenue
FROM gross g LEFT JOIN returns r USING (seller_type);
SELECT
product_id, warehouse_id, snapshot_date,
opening_qty, receipts, returns_in, sales, damage, transfers_net, ending_qty,
(opening_qty + receipts + returns_in - sales - damage + transfers_net) AS computed_ending,
ending_qty - (opening_qty + receipts + returns_in - sales - damage + transfers_net) AS drift,
CASE WHEN ending_qty = opening_qty + receipts + returns_in - sales - damage + transfers_net
THEN 'PASS' ELSE 'FAIL' END AS reconciliation
FROM snap_inventory_daily
WHERE snapshot_date = CURRENT_DATE - 1
AND ending_qty != opening_qty + receipts + returns_in - sales - damage + transfers_net;
-- Empty result = clean. Non-empty = inventory adjustment needed; investigate.
Section 8 — Why this works
Append-only returns. Net revenue = SUM(orders) − SUM(returns). No UPDATE; full historical audit.
Movements are immutable. Snapshots derive from movements; rebuild at any time.
Seller-type carve-out is a column. 1P / 2P / 3P revenue rules become a CASE in the query, not a separate ETL path.
Multi-warehouse split is a row. Same order_id + line_number, different warehouse → fulfilment is queryable without losing the customer's "1 line item" mental model.
Senior framing. "Append-only on returns is the contract that lets finance restate any historical period without rebuilding orders. The 1P/3P carve-out as dim_sellers.recognized_as turns a multi-billion-dollar revenue policy into a queryable column, not a brittle ETL branch — that's the difference between a model that survives quarter-end and one that doesn't."
Worked example — Order ORD_99 with split fulfillment + partial return
Order: 1 book ($21.64) + 6 pens. Pens split across 2 warehouses: 4 from WH-A ($15.16) + 2 from WH-B ($7.58) = 6 pens for $22.74. Total order gross = $44.38.
Seller-type effects on revenue: Book is 1P retail → Amazon books $21.64 GMV. Pens are 3P seller (15% commission) → Amazon books $22.74 × 0.15 = $3.41. Total Amazon recognized = $21.64 + $3.41 = $25.05.
Partial return: 3 of 6 pens returned a week later. Refund = $11.37 (proportional). Amazon's recognized refund = $11.37 × 0.15 = $1.71.
Net Amazon recognized: $25.05 − $1.71 = $23.34.
Inventory: WH-A movements record sale_out (+4 pens out, then return_in +1 pen back). WH-B records sale_out (-2 pens). Daily snapshot reconciles.
Drill · answer this in 90 seconds:
Design a model for Amazon orders that handles partial returns, replacements, marketplace seller orders (1P vs FBA vs 3P), and inventory drawn from multiple warehouses with FIFO/LIFO consumption rules.
Senior framing. "Append-only on returns is the contract that lets finance restate any historical period without rebuilding orders. The 1P/3P carve-out as dim_sellers.recognized_as turns a multi-billion-dollar revenue policy into a queryable column, not a brittle ETL branch — that's the difference between a model that survives quarter-end and one that doesn't."
Feed engagement at scale (with ranker A/B)
Meta / Instagram · social feed · ranker
Model engagement on the Instagram feed — impressions, dwell time, likes, saves, comments, shares, plus negative signals (hide, report) — supporting feed ranker training/A-B, creator analytics, and brand safety. Volume: 100B+ impressions/day.
Why this is hard — the design tension
Volume alone forces design choices: 100B impressions/day means exact COUNT DISTINCT is unaffordable; HLL is mandatory. Mutable engagements (like → unlike) demand append-only with is_undone flags. Heartbeat-driven dwell can't live at raw grain — must be aggregated at ETL. And the ranker A/B requirement means every impression must carry the model_id that ranked it, or you can't compare baseline lift.
Section 1 — Dimensional Model Overview
Fact Tables
fct_impressions — The "Served" grain. One row per (viewer × post × impression). Carries ranker_model_id + predicted_score.
fct_engagements — The "Reaction" grain. One row per (viewer × post × event_type × event_ts). Likes, saves, comments, shares, hides, reports — all here.
fct_dwell_per_impression — The "Time-on-post" grain. One row per impression with total dwell_ms (aggregated from heartbeats at ETL).
fct_ranker_experiments — The "Treatment" grain. One row per (viewer × experiment × variant). Drives A/B comparison.
Ranker team · model lift — Compare engagement-rate per impression for ranker_model_id = v3.1 vs v3.2. Positive lift on like-rate but watch hide-rate; both should move favorably.
Creator analytics — Reach (impressions), engagement rate (engagements / impressions), dwell distribution per post.
Brand safety — Hide+Report rate per post per hour. Posts crossing thresholds get demoted; spike → human review queue.
Product growth — Negative-engagement rate is the canary for "this content drives session abandonment".
Section 4 — Why ranker_model_id on every impression + append-only engagements
Ranker A/B requires the model on the row. Without it, a 1% holdout for the new model can't be compared to the 99% baseline. ranker_model_id is a first-class dimension, not a footnote.
Likes are mutable, but the fact is immutable. A like at T1 + unlike at T2 = 2 rows: an insert and an "undone" insert with is_undone=TRUE, undone_ts=T2. Never DELETE; preserves time-series of toggle behavior.
Dwell aggregated at ETL. 250ms heartbeats × 100B impressions × 30 sec average = ~12 trillion raw rows/day. Aggregate before warehouse — only 100B per-impression rows survive.
Approximate aggregations.APPROX_COUNT_DISTINCT + HLL for everything user-level. Exact only when finance demands it (rare).
Enhanced graphical data model — Instagram Feed Engagement
Section 7 — SQL analysis
Sample data — 1 viewer sees 3 posts under v3.2 ranker; engages with two
Q1 — Like / save / share rate per post (with active engagements only)
WITH active_engagements AS (
SELECT post_id, event_type FROM fct_engagements WHERE is_undone = FALSE
)
SELECT
i.post_id,
COUNT(*) AS impressions,
APPROX_COUNT_DISTINCT(i.viewer_id) AS unique_viewers,
COUNT(*) FILTER (WHERE e.event_type = 'like') AS likes,
COUNT(*) FILTER (WHERE e.event_type = 'save') AS saves,
COUNT(*) FILTER (WHERE e.event_type = 'share') AS shares,
ROUND(100.0 * COUNT(*) FILTER (WHERE e.event_type='like') / COUNT(*), 2) AS like_rate_pct,
ROUND(AVG(d.dwell_ms), 0) AS avg_dwell_ms
FROM fct_impressions i
LEFT JOIN active_engagements e ON e.post_id = i.post_id
LEFT JOIN fct_dwell_per_impression d ON d.impression_id = i.impression_id
GROUP BY i.post_id ORDER BY like_rate_pct DESC;
B. Brand safety · negative-signal monitoring
Q2 — Posts with hide+report rate above brand-safety threshold
SELECT
i.post_id,
COUNT(*) AS impressions,
COUNT(*) FILTER (WHERE e.event_type IN ('hide','report')) AS negative_signals,
ROUND(100.0 * COUNT(*) FILTER (WHERE e.event_type IN ('hide','report'))
/ COUNT(*), 3) AS negative_rate_pct
FROM fct_impressions i
LEFT JOIN fct_engagements e ON e.impression_id = i.impression_id AND e.is_undone = FALSE
WHERE DATE(i.impression_ts) = CURRENT_DATE - 1
GROUP BY i.post_id
HAVING COUNT(*) > 1000
AND COUNT(*) FILTER (WHERE e.event_type IN ('hide','report')) * 1.0 / COUNT(*) > 0.005
ORDER BY negative_rate_pct DESC;
-- Posts above 0.5% hide+report rate go to human moderation queue.
C. Ranker · A/B model lift
Q3 — Engagement-rate lift v3.2 over v3.1
WITH ranker_metrics AS (
SELECT
i.ranker_model_id,
COUNT(*) AS impressions,
COUNT(*) FILTER (WHERE e.event_type='like' AND e.is_undone=FALSE) AS likes,
COUNT(*) FILTER (WHERE e.event_type='save' AND e.is_undone=FALSE) AS saves,
COUNT(*) FILTER (WHERE e.event_type IN ('hide','report')) AS negatives,
AVG(d.dwell_ms) AS avg_dwell_ms
FROM fct_impressions i
LEFT JOIN fct_engagements e ON e.impression_id = i.impression_id
LEFT JOIN fct_dwell_per_impression d ON d.impression_id = i.impression_id
WHERE i.impression_ts >= CURRENT_DATE - 7
AND i.ranker_model_id IN ('model_v3.1','model_v3.2')
GROUP BY i.ranker_model_id
)
SELECT ranker_model_id,
ROUND(100.0 * likes / NULLIF(impressions, 0), 3) AS like_rate_pct,
ROUND(100.0 * saves / NULLIF(impressions, 0), 3) AS save_rate_pct,
ROUND(100.0 * negatives / NULLIF(impressions, 0), 3) AS negative_rate_pct,
ROUND(avg_dwell_ms, 0) AS avg_dwell_ms
FROM ranker_metrics;
-- v3.2 wins if like_rate higher AND negative_rate not worse.
Section 8 — Why this works
Ranker A/B is queryable.ranker_model_id on every impression → comparing v3.1 vs v3.2 is a GROUP BY, not a separate ETL pipeline.
Mutable engagements without losing history. Append-only with is_undone preserves the full toggle behavior — useful for "regret rate" analysis.
Dwell aggregated at ETL. 100B per-impression rows >> 12T per-heartbeat rows. The aggregation is non-negotiable.
HLL by default. Distinct viewer counts at petabyte scale require APPROX_COUNT_DISTINCT — exact only when finance demands it.
Senior framing. "Storing ranker_model_id on the impression is what lets the feed team A/B test new models without losing the ability to compute baseline metrics — model-version is a first-class dimension, not a footnote. Append-only engagements with is_undone means we can answer 'what % of likes get unliked within 5 min?' which is the canary for low-quality ranker decisions."
Worked example — viewer U_42's feed session
3 posts impressed under model_v3.2. Predicted scores: 0.72, 0.55, 0.41. Ranker order = score-desc, correct.
POST_001 (score 0.72): 4.2s dwell, like at +4s, full view → strong positive.
POST_002 (score 0.55): 0.8s dwell, no engagement, partial view (40%) → user scrolled past. Ranker over-predicted.
Calibration error on this session: 1/3 (POST_002) too high, 1/3 (POST_003) too low. Logged in fct_impressions.predicted_score for ranker training.
Mutable like: User unlikes POST_001 5 minutes later. fct_engagements stores both events (ENG_1 like, ENG_4 the unlike with is_undone=TRUE). Final state: not liked. Active-only queries filter is_undone=FALSE.
Drill · answer this in 90 seconds:
Model engagement on the Instagram feed — impressions, dwell time, likes, saves, comments, shares, plus negative signals (hide, report) — supporting feed ranker training/A-B, creator analytics, and brand safety. Volume: 100B+ impressions/day.
Senior framing. "Storing ranker_model_id on the impression is what lets the feed team A/B test new models without losing the ability to compute baseline metrics — model-version is a first-class dimension, not a footnote. Append-only engagements with is_undone means we can answer 'what % of likes get unliked within 5 min?' which is the canary for low-quality ranker decisions."
Listening history & pool-model royalty payouts
Spotify · streaming media · per-stream royalties
Design a model for Spotify's listening history that supports recommendations, the "Year in Review" feature, and per-stream royalty payouts to multiple rights holders (artist + label + publisher + songwriter) — under the pool model where revenue is divided by total qualified streams in a country-period.
Why this is hard — the design tension
Two consumers want the same fact for opposite reasons. Product needs every play (skips included) for recommendations. Finance only pays on qualified streams (≥30s). Multi-rights splits mean one stream produces 4+ payout rows (artist + label + publisher + songwriter) and the splits change mid-quarter when deals renegotiate. Pool model math requires computing per-stream rate as revenue ÷ total streams in pool — a chicken-and-egg dependency that resolves only at period close.
Section 1 — Dimensional Model Overview
Fact Tables
fct_play_events — The "Listen" grain. One row per play event with qualified_play flag.
fct_royalty_periods — The "Pool" grain. One row per (country × period). Holds total pool revenue, total qualified streams, computed per-stream rate.
fct_royalty_attributions — The "Payout" grain. One row per (play × rights_holder × period). Derived nightly at period close.
Dimension & Bridge Tables
dim_users (SCD2 — plan tier drifts) · dim_subscriptions (Family/Duo group of users)
bridge_track_rights (SCD2) — track_id, rights_holder_id, rights_type ∈ {artist, label, publisher, songwriter}, share_pct, effective_from, effective_to. Sum of share_pct per (track, rights_type) per period = 100%.
Product · recommendations + Year in Review — every play counts (qualified or not). Skips are signal. context_type tells the recommender which surface drove the play.
Finance · royalty payouts — quarterly close: compute pool revenue per country, count qualified plays, derive rate, generate fct_royalty_attributions. Splits use as-of-play bridge rows, never current.
Rights holders · transparency — labels/publishers want their cut breakdown. Each fct_royalty_attributions row is a defensible per-play line item.
Section 4 — Why the bridge is SCD2 (not just a table)
Label deals renegotiate mid-quarter — a label might move 5% from publisher to artist effective Apr 15. Three failure modes if the bridge isn't SCD2:
Apr 1-14 plays paid under the wrong split. Restatement needed every quarter — auditor nightmare.
Lawsuit risk. Songwriters can demand "pay me as of the deal that was active when my song was streamed". SCD2 is the only defensible answer.
Forecast invalidation. Finance forecasts use the current bridge; without SCD2, mid-quarter changes silently invalidate projections.
The contract: every fct_royalty_attributions row uses bridge_track_rights WHERE play.played_ts BETWEEN bridge.effective_from AND bridge.effective_to.
Enhanced graphical data model — Spotify Listening + Royalty Pool
Section 7 — SQL analysis
Sample data — Country pool with 1M streams in Q2-2025
INSERT INTO fct_royalty_periods VALUES
('PER_2025Q2_US','US','2025-04-01','2025-06-30', 5000000.00, 1000000, 0.005000);
-- Per-stream rate = $5M / 1M plays = $0.005 / qualified play
INSERT INTO bridge_track_rights VALUES
('TRK_001','LBL_BIG', 'label', 0.50,'2024-01-01','2025-04-15'),
('TRK_001','ART_NOVA', 'artist', 0.30,'2024-01-01','2025-04-15'),
('TRK_001','PUB_HARMONY', 'publisher', 0.15,'2024-01-01','2025-04-15'),
('TRK_001','SW_LEE', 'songwriter',0.05,'2024-01-01','2025-04-15'),
-- Mid-quarter renegotiation: label gives up 5%, artist gains 5%
('TRK_001','LBL_BIG', 'label', 0.45,'2025-04-15','9999-12-31'),
('TRK_001','ART_NOVA', 'artist', 0.35,'2025-04-15','9999-12-31'),
('TRK_001','PUB_HARMONY', 'publisher', 0.15,'2025-04-15','9999-12-31'),
('TRK_001','SW_LEE', 'songwriter',0.05,'2025-04-15','9999-12-31');
-- 2 plays of TRK_001 in US: one before, one after the renegotiation
INSERT INTO fct_play_events VALUES
('PLAY_A','U_42','TRK_001','2025-04-10 14:00:00', 180, TRUE, 'algorithmic', NULL,'iphone','US'),
('PLAY_B','U_42','TRK_001','2025-05-20 09:30:00', 240, TRUE, 'playlist','PL_ROCK','iphone','US');
A. Finance · per-stream rate & pool reconciliation
Q1 — Per-stream rate per country per period
SELECT
rp.country_id, rp.period_start, rp.period_end,
rp.total_qualified_plays,
ROUND(rp.total_pool_revenue_usd, 2) AS pool_usd,
ROUND(rp.per_stream_rate_usd, 6) AS rate_per_stream
FROM fct_royalty_periods rp
WHERE rp.period_start = '2025-04-01'
ORDER BY rp.country_id;
Q2 — Royalty attribution per qualified play (using as-of-played bridge splits)
SELECT
p.play_id,
p.played_ts,
b.rights_type,
b.rights_holder_id,
b.share_pct,
rp.per_stream_rate_usd,
ROUND(rp.per_stream_rate_usd * b.share_pct, 6) AS attributed_payout_usd
FROM fct_play_events p
JOIN dim_countries c ON p.country_id = c.country_id
JOIN fct_royalty_periods rp
ON rp.country_id = p.country_id
AND p.played_ts BETWEEN rp.period_start AND rp.period_end
JOIN bridge_track_rights b
ON b.track_id = p.track_id
AND p.played_ts >= b.effective_from
AND p.played_ts < b.effective_to -- as-of-play splits
WHERE p.qualified_play = TRUE AND p.track_id = 'TRK_001'
ORDER BY p.played_ts, b.rights_type;
B. Product · listening behavior
Q3 — Skip rate by context_type (algorithmic vs playlist vs radio)
SELECT context_type,
COUNT(*) AS plays,
COUNT(*) FILTER (WHERE qualified_play = FALSE) AS skips,
ROUND(100.0 * COUNT(*) FILTER (WHERE qualified_play = FALSE)
/ COUNT(*), 2) AS skip_rate_pct
FROM fct_play_events
WHERE played_ts >= CURRENT_DATE - 30
GROUP BY context_type ORDER BY skip_rate_pct DESC;
-- High skip rate on algorithmic = recommender quality issue. On radio = expected (lean-back).
Section 8 — Why this works
One play, two consumers.qualified_play as a flag (not a separate table) keeps Product and Finance on the same source of truth.
Pool model derived at period close. Per-stream rate isn't real-time — it's only known when the period closes. Modeled as a fact table, not a daily rolling number.
Attribution table grows linearly. ~4 rows per qualified play. Quarterly bulk-write at period close, then read-only.
Senior framing. "The 30-second threshold is the integration point between product analytics (every play matters for recs) and finance (only qualified plays earn royalty) — modeling it as a flag on fct_play_events rather than two separate tables is what lets both teams stay on the same source of truth. SCD2 on bridge_track_rights turns deal renegotiations from an audit nightmare into a predicate."
Worked example — TRK_001 royalty math (before vs after renegotiation)
Pool: $5,000,000 in US for Q2-2025. 1,000,000 qualified streams. Per-stream rate = $0.005.
Audit query produces 8 rows total (2 plays × 4 rights holders). Each is defensible: shows the exact bridge row that was active when the play occurred.
Drill · answer this in 90 seconds:
Design a model for Spotify's listening history that supports recommendations, the "Year in Review" feature, and per-stream royalty payouts to multiple rights holders (artist + label + publisher + songwriter) — under the pool model where revenue is divided by total qualified streams in a country-period.
Senior framing. "The 30-second threshold is the integration point between product analytics (every play matters for recs) and finance (only qualified plays earn royalty) — modeling it as a flag on fct_play_events rather than two separate tables is what lets both teams stay on the same source of truth. SCD2 on bridge_track_rights turns deal renegotiations from an audit nightmare into a predicate."
Double-entry ledger for payments (charges, refunds, chargebacks, FX, reserves)
Stripe · payments · double-entry ledger
Design a model for Stripe's payment system supporting charges, refunds, chargebacks, payouts, multi-currency FX, reserves, and rigorous double-entry accounting where every transaction's entries sum to zero per currency. Append-only — never UPDATE — for regulatory audit compliance.
Why this is hard — the design tension
Payments is the domain with zero tolerance for retro edits. A regulator can ask "what was merchant X's balance on day Y as the books showed it then?" — the answer has to be defensible regardless of corrections made since. That forces append-only on every fact. Multi-currency adds an FX gain/loss leg. Chargebacks arrive 30-180 days after the charge. Reserves are an internal transfer, not external money. The model must enforce SUM=0 per balance transaction per currency as a continuous invariant.
Section 1 — Dimensional Model Overview
Fact Tables
fct_ledger_entries — The "Atom" grain. One row per debit OR credit on (account, currency). The source of truth. Append-only.
fct_balance_transactions — The "Business event" grain. One row per charge/refund/payout/chargeback. Groups its entries via balance_txn_id.
snap_account_balance_daily — The "State" grain. Derived — running SUM per (account × currency × day). Pre-aggregated for fast balance reads.
Merchant · balance & payouts — Available balance = SUM(fct_ledger_entries.amount) WHERE account = 'merchant_balance' AND merchant_id = X. From snap_account_balance_daily for speed.
Finance · reconciliation — Reconcile every balance_txn: SUM(amount) GROUP BY currency = 0. Any failure = producer bug, page on-call.
Risk · chargeback exposure — Aggregate entry_type = 'chargeback' per merchant per day. High rate = risk profile change → increase reserve_pct.
Regulators · audit trail — "Show merchant X balance as of date Y as the books showed it then" — query fct_ledger_entries WHERE posted_ts <= Y AND filter out reversals for entries posted after Y.
Append-only is regulatory. SOX / PCI / banking audits require an immutable history. Every reversal is a new pair of entries with opposite signs, never a retro edit.
SUM=0 invariant per (balance_txn, currency). Continuous data-quality check. If it fails, payments stop until fixed.
Idempotency at the row.UNIQUE(source_event_id) guarantees that producer retries don't create duplicate ledger entries — even under network partitions.
FX as separate currency entries. EUR charge → USD payout requires a 4-entry transaction: −EUR (merchant_balance debit) / +EUR (fx_holding credit) / −USD (fx_holding debit at locked rate) / +USD (merchant_balance credit). FX gain/loss recognized in reporting currency.
Enhanced graphical data model — Stripe Double-Entry Ledger
Section 7 — SQL analysis
Sample data — €100 charge converted to USD payout (multi-currency, 4 entries)
Q1 — Continuous SUM=0 invariant per (balance_txn × currency)
SELECT
balance_txn_id,
currency_code,
ROUND(SUM(amount), 4) AS sum_amount,
COUNT(*) AS entry_count
FROM fct_ledger_entries
WHERE effective_ts >= CURRENT_DATE - 1
GROUP BY balance_txn_id, currency_code
HAVING ABS(SUM(amount)) > 0.0001;
-- Empty = healthy. Any row = producer bug. Halts the payments pipeline.
B. Merchant · balance & payouts
Q2 — Merchant balance per currency (as-of-now)
SELECT
merchant_id, currency_code,
ROUND(SUM(amount), 2) AS balance
FROM fct_ledger_entries
WHERE account_id = 'merchant_balance' AND merchant_id = 'MRC_FR'
GROUP BY merchant_id, currency_code;
-- Or read from snap_account_balance_daily for sub-second response.
Q3 — Audit trail: balance as it appeared on a specific historical date
SELECT
merchant_id, currency_code,
ROUND(SUM(amount), 2) AS balance_as_of_2025_05_15
FROM fct_ledger_entries
WHERE account_id = 'merchant_balance'
AND merchant_id = 'MRC_FR'
AND posted_ts <= '2025-05-15 23:59:59'
GROUP BY merchant_id, currency_code;
-- Defensible regardless of corrections made AFTER 2025-05-15. The append-only contract.
C. Risk · chargeback exposure
Q4 — Chargeback rate per merchant + dispute lag
WITH charges AS (
SELECT merchant_id, balance_txn_id, effective_ts AS charge_ts, ABS(amount) AS gross_usd
FROM fct_ledger_entries
WHERE entry_type = 'charge' AND account_id = 'merchant_balance' AND amount > 0
),
chargebacks AS (
SELECT merchant_id, reversal_of_id, effective_ts AS chargeback_ts, ABS(amount) AS chargeback_usd
FROM fct_ledger_entries WHERE entry_type = 'chargeback'
)
SELECT
c.merchant_id,
COUNT(c.balance_txn_id) AS total_charges,
COUNT(cb.reversal_of_id) AS total_chargebacks,
ROUND(100.0 * COUNT(cb.reversal_of_id) / NULLIF(COUNT(c.balance_txn_id), 0), 2) AS chargeback_rate_pct,
ROUND(AVG(EXTRACT(EPOCH FROM (cb.chargeback_ts - c.charge_ts))/86400), 1) AS avg_dispute_lag_days
FROM charges c
LEFT JOIN chargebacks cb ON cb.reversal_of_id = c.balance_txn_id
WHERE c.charge_ts >= CURRENT_DATE - 90
GROUP BY c.merchant_id
HAVING COUNT(c.balance_txn_id) > 100
ORDER BY chargeback_rate_pct DESC;
-- Merchants >1% chargeback rate trigger reserve_pct increase.
Section 8 — Why this works
Append-only is the contract. Historical balance reads ("as of date Y") are defensible regardless of corrections made later. Required by SOX, PCI, banking regulators.
Invariant catches bugs at write time. Continuous DQ on SUM=0 — any producer regression halts the pipeline before bad data reaches finance.
Idempotency at the row.UNIQUE(source_event_id) handles producer retries without duplicating money.
FX as currency entries. No "convert at read time" — every conversion is a recorded ledger event with its locked rate.
Senior framing. "Append-only is not a style choice in payments — it's a regulatory requirement. The model has to support 'show me the full audit trail for merchant X on date Y as it appeared then' regardless of corrections made since. That's why every reversal is a new pair of entries, never a retro edit. SUM=0 per (balance_txn × currency) is the invariant that keeps the books honest at write time."
Worked example — €100 charge → USD payout
Charge BTX_001 (€100): 4 ledger entries in EUR.
+€100 to merchant_balance · −€2.90 fee from merchant_balance · net €97.10 to merchant.
−€100 from customer_balance · +€2.90 to fees_revenue.
EUR sum = 100 − 2.90 − 100 + 2.90 = 0 ✓
Payout BTX_002 (3 days later, FX rate locked at 1 EUR = 1.08 USD).
−€97.10 from merchant_balance · +€97.10 to fx_holding (EUR side closes)
−$104.87 from fx_holding (USD side, 97.10 × 1.08) · +$104.87 to merchant_bank
EUR sum = 0 ✓ · USD sum = 0 ✓
Two months later: chargeback on BTX_001. Append 4 new entries that reverse the charge (positive amounts where original was negative). reversal_of_id links each new entry to its original. Original rows untouched. Audit trail preserved. Net effect: merchant balance net change for the chargeback is recorded as a fresh balance_txn.
Drill · answer this in 90 seconds:
Design a model for Stripe's payment system supporting charges, refunds, chargebacks, payouts, multi-currency FX, reserves, and rigorous double-entry accounting where every transaction's entries sum to zero per currency. Append-only — never UPDATE — for regulatory audit compliance.
Senior framing. "Append-only is not a style choice in payments — it's a regulatory requirement. The model has to support 'show me the full audit trail for merchant X on date Y as it appeared then' regardless of corrections made since. That's why every reversal is a new pair of entries, never a retro edit. SUM=0 per (balance_txn × currency) is the invariant that keeps the books honest at write time."
Bookings, calendar, cancellations & reviews
Airbnb · 2-sided marketplace · calendar-as-fact
Design a model for Airbnb supporting search/availability lookup, booking lifecycle (request → accepted → stayed → reviewed), cancellation-policy refund logic, host payouts after stay completion, and dynamic pricing — with full point-in-time queryability ("what was the price of listing X on July 4 last year?").
Why this is hard — the design tension
The calendar is the hot read path: a search query for "Paris, July 4-7, 2 guests" hits potentially millions of (listing × night) rows in milliseconds. Calendar-as-fact with daily snapshots is the canonical answer. Booking modifications require append-only — same booking_id can't UPDATE in place. Cancellation refunds depend on (policy × days-from-check-in) and must be computed at cancel time, locked on the booking row. Multi-currency pays guest in EUR, host in USD; FX locked at booking.
Section 1 — Dimensional Model Overview
Fact Tables
fct_calendar_nights — The "Availability" grain. One row per (listing × night × snapshot_date). Daily snapshots enable point-in-time price lookup.
fct_bookings — The "Reservation" grain. One row per booking lifecycle. Append-only — modifications create new rows.
fct_reviews — The "Bilateral feedback" grain. One row per review (typed: guest_to_host OR host_to_guest).
fct_host_payouts — The "Payout" grain. One row per payout event after stay completion.
Guest · search & book — Search joins fct_calendar_nights WHERE is_available=TRUE AND night_date BETWEEN x AND y. Hot path; pushed to Elasticsearch + indexed in warehouse.
Host · earnings & calendar mgmt — Daily payouts from fct_host_payouts. Pricing team analyzes fct_calendar_nights over time to recommend smart-price strategy.
Pricing team · "what was the price on date X?" — Time-travel via snapshot_date in the PK. Critical for dynamic-pricing experiments and dispute resolution.
Trust & safety · review-rate analysis — Bilateral reviews; if a host has high "guest cancelled" rate, T&S investigates.
Section 4 — Why calendar-as-fact + daily snapshot
Without snapshots: can only answer "what's the price now?" Time-travel queries require reconstructing from dim_listings SCD2 + booking events — slow, error-prone, often impossible after data drift.
With daily snapshots: "what was the price of listing X on July 4 last year (as Airbnb showed it on July 1)?" is one query.
Cost: 7M listings × 365 future nights × daily snapshot = ~9B rows/day. Partition by night_date; aggressively expire old snapshots (keep 90-day raw, then weekly).
Append-only on bookings. Modifications generate a new booking row with parent_booking_id; original sets terminal_state = modified. Audit trail preserved.
Enhanced graphical data model — Airbnb Bookings + Calendar
Section 7 — SQL analysis
Sample data — 1 listing, 5 nights of calendar, 1 booking, 1 guest cancellation
Q1 — Available listings in Paris for July 4-7, 2 guests
SELECT l.listing_id, l.title, MIN(c.effective_price_local) AS price_low, MAX(c.effective_price_local) AS price_high
FROM dim_listings l
JOIN fct_calendar_nights c ON l.listing_id = c.listing_id
JOIN dim_geography g ON l.geography_id = g.geography_id
WHERE c.snapshot_date = CURRENT_DATE
AND c.night_date BETWEEN '2025-07-04' AND '2025-07-06'
AND c.is_available = TRUE
AND l.capacity >= 2
AND g.market_id = 'PARIS'
GROUP BY l.listing_id, l.title
HAVING COUNT(*) = 3; -- all 3 nights available
B. Pricing team · time-travel queries
Q2 — Price of LST_PARIS_42 on July 4, 2025 — as Airbnb showed it on June 1
SELECT listing_id, night_date, base_price_local, smart_price_local, effective_price_local
FROM fct_calendar_nights
WHERE listing_id = 'LST_PARIS_42'
AND night_date = '2025-07-04'
AND snapshot_date = '2025-06-01';
-- The data product the dynamic-pricing team can't ship without.
C. Trust & safety · cancellation analysis
Q3 — Host cancellation rate (a strong T&S signal)
SELECT
l.host_id,
COUNT(*) AS total_bookings,
COUNT(*) FILTER (WHERE b.cancellation_actor = 'host') AS host_cancels,
ROUND(100.0 * COUNT(*) FILTER (WHERE b.cancellation_actor = 'host')
/ COUNT(*), 2) AS host_cancel_rate_pct
FROM fct_bookings b
JOIN dim_listings l USING (listing_id)
WHERE b.request_ts >= CURRENT_DATE - 90
GROUP BY l.host_id
HAVING COUNT(*) > 10
ORDER BY host_cancel_rate_pct DESC;
-- Hosts >3% rate get warning; >5% lose Superhost status.
Section 8 — Why this works
Calendar-as-fact + snapshots = time-travel pricing queries are first-class.
Append-only bookings with parent_booking_id chain = audit trail for modifications.
Refund locked at cancel time = future policy changes don't restate historical refunds.
Bilateral reviews = both guest and host accountability surfaced.
Senior framing. "The calendar-as-fact is the unlock — pricing teams can answer 'what was the price of this listing on Jul 4 last year?' from fct_calendar_nights without reconstructing it from booking events. That's the data product the dynamic-pricing team can't ship without. Snapshot_date in the PK is the cheap engineering choice that pays back every quarter."
Worked example — BKG_777 cancellation under "moderate" policy
Booking total: $993.60 for 3 nights ($720 subtotal + $80 cleaning + $56 service + $64 tax, FX 1.08).
Cancelled 5 days before check-in.
Moderate policy refund_curve: ≥5 days before = 100% refund of nightly rate (subtotal only); cleaning fee always refunded; service fee non-refundable.
Refund = $720 (subtotal at 100%) × 0.50 (we're inside the partial-refund window for "moderate") + $80 cleaning + $0 service = $496.80.
fct_bookings.refund_amount_usd = $496.80, locked at cancellation_ts. Future re-querying gives same result regardless of policy changes after.
Calendar nights for Jul 4-6 flip back to is_available=TRUE in the next snapshot. Listing reopens for searches.
Drill · answer this in 90 seconds:
Design a model for Airbnb supporting search/availability lookup, booking lifecycle (request → accepted → stayed → reviewed), cancellation-policy refund logic, host payouts after stay completion, and dynamic pricing — with full point-in-time queryability ("what was the price of listing X on July 4 last year?").
Senior framing. "The calendar-as-fact is the unlock — pricing teams can answer 'what was the price of this listing on Jul 4 last year?' from fct_calendar_nights without reconstructing it from booking events. That's the data product the dynamic-pricing team can't ship without. Snapshot_date in the PK is the cheap engineering choice that pays back every quarter."
Design a model for a SaaS with mixed pricing — flat-rate seats + usage-based metering (per-query, per-GB, per-API-call) — handling mid-cycle plan changes, prorations, overages, free trials, and multi-currency invoicing.
Why this is hard — the design tension
Two revenue streams with opposite shapes. Subscription is predictable, monthly, modeled at low row volume. Usage is volatile, hourly, billions of rows/day. Mid-cycle plan changes require pro-rated math. Trials look like paid plans but bill $0. Free-tier allowances reset each period. Multi-currency demands FX locked at usage time. The model must keep usage and subscription separate at the fact layer but join cleanly at invoice time.
Section 1 — Dimensional Model Overview
Fact Tables
fct_subscription_state — The "Daily snapshot" grain. One row per (subscription × day). Captures plan, status, MRR.
fct_usage_meter — The "Hourly usage" grain. One row per (subscription × meter × hour). High volume.
fct_billing_events — The "Charge" grain. One row per subscription_charge / usage_charge / proration / refund / credit / dunning.
fct_invoices — The "Statement" grain. One row per invoice — derived rollup of billing events.
Hourly meter resolution is the integration point: fine enough for usage analytics, coarse enough that the table doesn't explode. Per-query/per-call rolls up to the hour at ingest.
Daily subscription state avoids modeling subscription as a slowly-changing dim — too much daily flux. Daily snapshot is bounded and queryable.
Append-only billing events with UNIQUE(source_event_id) = idempotent + audit-defensible. Mid-cycle plan change emits 2 events (proration credit on old, prorated charge on new); invoice is just SUM of events in the period.
FX locked per hour. Same precision as the meter; never restated.
Enhanced graphical data model — SaaS Subscription + Usage
Section 7 — SQL analysis
Sample data — Mid-cycle plan upgrade with 15 days remaining
-- Account on Pro plan ($100/mo), upgrades to Enterprise ($300/mo) on day 15 of a 30-day cycle
INSERT INTO fct_billing_events VALUES
('EVT_S1','ACCT_X','SUB_42','subscription_charge','PLAN_PRO', '2025-05-01 00:00','2025-05-01','2025-05-31',100.00,'USD',100.00,'paid','SRC_001'),
-- Day 15: upgrade. Proration credit on PRO (15 days unused = $50)
('EVT_P1','ACCT_X','SUB_42','proration', 'PLAN_PRO', '2025-05-15 14:00','2025-05-15','2025-05-31',-50.00,'USD',-50.00,'applied','SRC_002'),
-- Day 15: prorated charge for ENT (15 days at new rate = $150)
('EVT_P2','ACCT_X','SUB_42','proration', 'PLAN_ENTERPRISE','2025-05-15 14:00','2025-05-15','2025-05-31',150.00,'USD',150.00,'paid','SRC_003');
-- Hourly usage: account exceeds 100 free credits/hour 3 times during the day
INSERT INTO fct_usage_meter VALUES
('ACCT_X','SUB_42','METER_COMPUTE','2025-05-15 09:00:00', 120, 20, 0.10,'USD', 2.00,1.0, 2.00,'2025-05-15 09:05'),
('ACCT_X','SUB_42','METER_COMPUTE','2025-05-15 14:00:00', 180, 80, 0.10,'USD', 8.00,1.0, 8.00,'2025-05-15 14:05'),
('ACCT_X','SUB_42','METER_COMPUTE','2025-05-15 18:00:00', 250, 150, 0.10,'USD',15.00,1.0,15.00,'2025-05-15 18:05');
SELECT
COUNT(*) AS active_subs,
ROUND(SUM(mrr_usd), 2) AS mrr_usd,
ROUND(SUM(mrr_usd) * 12, 2) AS arr_usd
FROM fct_subscription_state
WHERE day = CURRENT_DATE
AND status = 'active';
Q2 — Invoice for SUB_42 in May 2025 (sums all event types)
SELECT
subscription_id,
SUM(amount_usd) FILTER (WHERE event_type = 'subscription_charge') AS subscription_charges,
SUM(amount_usd) FILTER (WHERE event_type = 'usage_charge') AS usage_charges,
SUM(amount_usd) FILTER (WHERE event_type = 'proration') AS proration_net,
SUM(amount_usd) FILTER (WHERE event_type = 'credit') AS credits,
SUM(amount_usd) AS total_due_usd
FROM fct_billing_events
WHERE subscription_id = 'SUB_42'
AND period_start = '2025-05-01'
GROUP BY subscription_id;
B. Customer Success · upsell signal
Q3 — Accounts approaching their plan's overage threshold (upsell candidates)
WITH usage_30d AS (
SELECT subscription_id, meter_id, SUM(billable_quantity) AS used
FROM fct_usage_meter
WHERE hour_bucket_ts >= CURRENT_DATE - 30
GROUP BY subscription_id, meter_id
),
plan_caps AS (
SELECT s.subscription_id, m.meter_id, p.included_quantity
FROM fct_subscription_state s
JOIN dim_plans p USING (plan_id)
CROSS JOIN dim_meters m
WHERE s.day = CURRENT_DATE AND s.status = 'active'
)
SELECT u.subscription_id, u.meter_id, u.used, c.included_quantity,
ROUND(100.0 * u.used / NULLIF(c.included_quantity, 0), 1) AS pct_of_cap
FROM usage_30d u
JOIN plan_caps c USING (subscription_id, meter_id)
WHERE u.used > c.included_quantity * 0.85
ORDER BY pct_of_cap DESC;
-- >85% of cap = upsell trigger; >100% = overage charges hit next invoice.
C. Sales · NRR by cohort
Q4 — Net Revenue Retention by signup cohort
WITH cohort_baseline AS (
SELECT subscription_id, DATE_TRUNC('quarter', MIN(day)) AS cohort_q,
FIRST_VALUE(mrr_usd) OVER (PARTITION BY subscription_id ORDER BY day) AS starting_mrr
FROM fct_subscription_state
),
current_state AS (
SELECT subscription_id, mrr_usd AS current_mrr
FROM fct_subscription_state WHERE day = CURRENT_DATE
)
SELECT cb.cohort_q,
COUNT(DISTINCT cb.subscription_id) AS cohort_size,
ROUND(SUM(cb.starting_mrr), 0) AS starting_mrr_total,
ROUND(SUM(COALESCE(cs.current_mrr, 0)), 0) AS current_mrr_total,
ROUND(100.0 * SUM(COALESCE(cs.current_mrr, 0)) / NULLIF(SUM(cb.starting_mrr), 0), 1) AS nrr_pct
FROM cohort_baseline cb
LEFT JOIN current_state cs USING (subscription_id)
GROUP BY cb.cohort_q
ORDER BY cb.cohort_q;
-- NRR > 100% = expansion outpaces churn (the SaaS holy grail).
Section 8 — Why this works
Hourly usage + daily subscription = right resolution at each grain. Usage is high-volume; subscription is bounded.
Proration is two events, not one. Both credit and charge land in fct_billing_events; invoice rollup picks them up cleanly.
Idempotent at source_event_id. Producer retries don't double-charge.
FX locked per hour. Same precision as the meter; no FX restatement.
Senior framing. "The hourly grain on fct_usage_meter is the integration point — finance rolls it up to invoices, product teams use it for usage analytics, and customer success uses it for adoption signals. Three consumers, one source of truth. Mid-cycle plan changes emit two billing events instead of one updated row — that's what makes the invoice a SUM, not a recompute."
Worked example — SUB_42 mid-cycle upgrade + usage overage
Day 1 (May 1): charged $100 for Pro plan (full month).
Day 15 (May 15): upgrade to Enterprise. Two proration events:
Credit −$50 on Pro (15 unused days × $100/30)
Charge +$150 for Enterprise (15 days × $300/30)
Net delta: +$100
Day 15 (later): usage spikes. Plan includes 100 compute_credits/hr; account uses 250 in one hour → 150 billable at $0.10 = $15 overage.
MRR effect on day 15 onward: jumps from $100 to $300/mo. Visible in fct_subscription_state.mrr_usd daily snapshot.
Drill · answer this in 90 seconds:
Design a model for a SaaS with mixed pricing — flat-rate seats + usage-based metering (per-query, per-GB, per-API-call) — handling mid-cycle plan changes, prorations, overages, free trials, and multi-currency invoicing.
Senior framing. "The hourly grain on fct_usage_meter is the integration point — finance rolls it up to invoices, product teams use it for usage analytics, and customer success uses it for adoption signals. Three consumers, one source of truth. Mid-cycle plan changes emit two billing events instead of one updated row — that's what makes the invoice a SUM, not a recompute."
Streaming royalty distribution (Spotify pool-based payout)
Spotify · streaming royalty pool
Why this is hard — the design tension
Spotify's royalty model is the cleanest example of a multi-party financial fact: each stream is a small event, but the dollars get split across artist + label + publisher + Spotify, with the rate calibrated per-country and the pool refreshed monthly. Mistakes are not just dollars — they're rights-holder lawsuits.
Two competing structural forces:
Pool-based pro-rata (Spotify's choice): every premium subscriber's $$ enters one country pool; each artist gets their streams ÷ total country streams × pool. Fast, simple, predictable.
User-centric (UCPS) (Apple/Deezer testing): each subscriber's $$ goes ONLY to the artists THEY listened to. Fairer to niche artists; computationally heavy at billions-of-users × millions-of-artists scale.
The query in co_sql_305-0375 implements pro-rata; the modeling section here covers the schema that makes both possible without a rewrite.
fct_subscription_revenue — monthly per (user, country, tier)
The denominator pool source
Fact (snapshot)
fct_country_pool — monthly per country, computed from subscription revenue × ~70%
Refreshed at month-close; never recomputed
Fact (output)
fct_payout — one row per (track, party_role, country, month)
Frozen at close; audit trail
Enhanced graphical data model — Spotify Royalty Pool · Pro-Rata Distribution
Section 2 — The crucial SCD2 — bridge_track_rightsholder
The same track can have its label, publisher, or featured-artist split CHANGE between months. A pre-Sept rights deal pays Label A; a Sept-onward deal pays Label B. Rolling up against the current rights table gives wrong attribution for August streams.
SELECT t.track_sk, b.party_sk, b.party_role, b.share_pct
FROM fct_stream s
JOIN bridge_track_rightsholder b
ON b.track_sk = s.track_sk
AND s.played_at BETWEEN b.effective_from AND COALESCE(b.effective_to, '9999-12-31')
Every query joins by event-time, never by current. The bridge supports BOTH pro-rata (sum streams × share) AND UCPS (sum user_share × stream_share) without schema rewrite.
Section 3 — Why the pool table is its own fact
The country pool isn't computed at query time — it's frozen at month-close. Storing it in fct_country_pool:
Auditors can replay any month's payout from the same inputs.
Late-arriving streams (offline mode upload Day 35) attribute to the month-of-event, not month-of-ingest.
If the calculation logic changes (e.g. UCPS rollout), past months' payouts are unaffected.
Section 4 — The 30-second royalty boundary
Industry standard: streams under 30 seconds DO NOT trigger royalty. Embed at the fact-table level:
CREATE TABLE fct_stream (
...
played_seconds INTEGER NOT NULL,
is_royalty_eligible INTEGER GENERATED ALWAYS AS (played_seconds >= 30) STORED
);
Every downstream query can filter WHERE is_royalty_eligible with no risk of forgetting. The flag is denormalized for performance; computed once at write.
Section 5 — Fraud-stream filtering
Bot-driven streams (5-second-loop attacks to inflate small artists) are real. Anti-fraud writes a side flag rather than deleting events:
fct_stream.is_fraudulent INTEGER (default 0)
-- Updated by a separate Spark job, not the player
Why update-not-delete: legal teams want the audit trail of "what was claimed vs paid". Deleted rows lose the "we caught X% of fraud" reporting capability.
Senior framing. "The bridge_track_rightsholder is SCD2 because rights deals change between accounting periods, and the country_pool is its own fact table because the month-close freezes the denominator. Together they let me support pro-rata today and UCPS later without a rewrite — the schema doesn't bake in the distribution algorithm."
TikTok For-You-Page recommendation feedback loop
TikTok · recommendation feedback loop
Why this is hard — the design tension
The For-You-Page is the most-watched ML system on the planet — billions of impressions per day, sub-200ms latency, and the recommendations train on themselves. Modeling this without producing biased feedback loops is the entire game.
The trap: echo-chamber bias. The ranker boosts videos that are already getting watched; users see them more; they get watched more; they're boosted more. New videos and new creators starve. The schema must support both online learning (continuous updates) and counterfactual evaluation (was the recommendation actually causal?).
fct_exploration_event — random or holdout videos (5% traffic)
Critical for causal estimation
Fact (model)
fct_ranker_score — predicted engagement at impression-time
Stored alongside impression for off-policy evaluation
Enhanced graphical data model — TikTok For-You-Page · Recommendation Feedback Loop
Section 2 — The exploration table is non-negotiable
fct_exploration_event stores the 5% of impressions that were RANDOM (or from a holdout pool). Without this, you can't measure causality:
"Watched videos liked by similar users" gets clicked 80% — is that because the algorithm is good, or because the videos were going to win anyway?
Random exploration data answers: "if we showed this video to 1000 random users, how many would have engaged?" — the counterfactual baseline.
Model lift = (algorithmic CTR − exploration CTR) ÷ exploration CTR. Without exploration, you can't compute lift, just attribution.
Every recommendation system at FAANG scale has this. Drop the term "off-policy evaluation" in interviews.
Section 3 — Storing the predicted score
The ranker's predicted engagement score (e.g., predicted_watch_seconds = 18.3) is stored AT IMPRESSION TIME, not query time. Why:
Models are versioned. Today's prediction was made by ranker v117; replaying with v118 changes everything.
"Was the model right?" requires comparing prediction-then to outcome-now. Storing the score freezes the test.
Production debugging: a video shown 10M times with high score but low actual engagement is the diagnostic signal of model drift.
Section 4 — The cold-start lever
New videos have no engagement history; the ranker can't predict. Two design moves:
Lever
Schema implication
Forced exploration in the first 1000 impressions
fct_video_lifecycle.bootstrap_pool_remaining — counted down per impression
Inherit creator's prior + sound's prior
Joining to dim_creator.recent_engagement_rate at impression-time as a fallback feature
Section 5 — The shadow-traffic pattern for model rollouts
New ranker versions get shadow traffic: same impressions served, but new model also produces predictions logged to fct_ranker_score_shadow. If shadow looks good, promote to a 1% A/B; if A/B wins, ramp to 100%.
Schema requirement: fct_ranker_score has a model_version column. Production runs 5-10 ranker versions concurrently — all writing to the same table, joined by version for evaluation.
Section 6 — Privacy + retention
Engagement data is high-PII. The schema separates:
fct_engagement — keyed by user_id_token (not user_id directly)
vault.user_id_map — encrypted token mapping, separate access pool
30-day retention on raw events; aggregated rollups (per-user-per-day-per-video-cluster) kept indefinitely
Senior framing. "The 5% exploration table is what makes the system improvable — without it, every metric is confounded by the ranker's own selection. Storing predicted_score at impression-time enables off-policy evaluation: I can compare ranker v117 vs v118 on the SAME impressions retroactively, which is how we ship safely without 100% A/B rollouts."
Stripe payment ledger (double-entry, replay-safe)
Stripe · double-entry ledger & idempotency
Why this is hard — the design tension
Stripe processes ~$1T/year. Every cent must reconcile across charges, refunds, fees, payouts, FX, taxes, chargebacks — and survive replay, retries, and partial failures. The whole system is an append-only double-entry ledger: nothing is updated, only debited or credited. This is the same model banks have used for centuries; the schema is just the SQL realization.
bridge_charge_journal — links a charge to its journal-entry rows
One charge → many entries (revenue + fee + tax)
Enhanced graphical data model — Stripe Double-Entry Ledger · Append-Only · SUM=0
Section 2 — The double-entry contract
Every transaction emits a set of journal entries. Their signed amounts MUST sum to zero per transaction_id. This is the core integrity constraint:
-- A $100 charge with a $3 Stripe fee:
INSERT INTO fct_journal_entry (txn_id, account, currency, amount) VALUES
('TXN_001', 'customer_xyz', 'USD', -100.00), -- customer debited $100
('TXN_001', 'merchant_abc', 'USD', 97.00), -- merchant credited $97
('TXN_001', 'stripe_revenue','USD', 3.00); -- Stripe credited $3
-- SUM = 0 ✓
Run a daily integrity check:
SELECT txn_id, SUM(amount) AS net
FROM fct_journal_entry
GROUP BY txn_id
HAVING ABS(SUM(amount)) > 0.001;
-- ANY row returned = data corruption, page on-call
Section 3 — Idempotency via natural key
Stripe's API is idempotent: same idempotency key + same body = same result. The schema enforces:
UNIQUE (idempotency_key, account_id, txn_type)
ON fct_journal_entry
A retry tries to insert a duplicate row → unique violation → application catches and returns the original txn. Replay-safe by construction.
Section 4 — Refunds are NEW entries, not deletes
Junior engineers mark a charge as "refunded" with an UPDATE. Wrong. The double-entry ledger appends:
Both transactions exist forever. fct_account_balance is just SUM(amount) GROUP BY account — always derivable, never wrong.
Section 5 — Multi-currency lock
Charge in EUR, Stripe books revenue in USD, merchant settles in GBP. Three currencies, one transaction. Three journal entries with different currency columns + an fx_rate_locked_at_event column on each. Querying "what did we earn in March in USD?" sums the USD-denominated entries — never re-does FX.
Section 6 — Chargebacks are 60-90 days late
Card networks send chargebacks 30-90 days after the charge. The original transaction has long since posted; the chargeback emits a NEW journal entry with references_txn = 'TXN_001'. The merchant balance silently goes down 60 days later. Schema accommodates: monthly close-of-books locks the past, but new chargeback rows can still arrive — they go to a "post-close adjustment" sub-period.
Senior framing. "The append-only double-entry ledger is the only model where SUM-equals-zero is an enforceable integrity constraint at the database level. Refunds, chargebacks, FX adjustments are all NEW journal entries, not updates — that's what makes the entire system replay-safe and auditable. The unique constraint on idempotency_key turns API retries into no-ops."
Salesforce account hierarchy + multi-tenant CRM
Salesforce · account hierarchy & multi-tenant
Why this is hard — the design tension
Enterprise CRM has a problem most B2C platforms never face: nested organizations. JP Morgan has 100 subsidiaries, each with their own opportunity flow, but the renewal happens at the parent level. A revenue rollup that doesn't aggregate up the tree is wrong; one that aggregates incorrectly double-counts subsidiaries.
Plus multi-tenancy: every Salesforce customer's data is in the same physical tables, partitioned by org_id. One bad query reads someone else's data. The schema design and the row-level security stack must enforce isolation absolutely.
Section 1 — Dimensional Model
Type
Tables
Notes
Dim
dim_account — recursive: parent_account_id FK to self
Salesforce uses a hybrid: parent_id for writes + materialized path for read perf, with a daily job to rebuild paths. Mention all three in the interview.
Section 3 — Rollup query — the recursive CTE
WITH RECURSIVE descendants AS (
SELECT account_id, account_id AS root, 0 AS depth FROM dim_account WHERE parent_account_id IS NULL
UNION ALL
SELECT a.account_id, d.root, d.depth + 1
FROM dim_account a JOIN descendants d ON a.parent_account_id = d.account_id
WHERE d.depth < 10 -- circuit breaker
)
SELECT d.root AS top_level_account,
SUM(s.arr) AS total_subtree_arr,
COUNT(DISTINCT a.account_id) AS subsidiaries
FROM descendants d
JOIN fct_account_arr_snapshot s ON s.account_id = d.account_id
GROUP BY d.root;
Section 4 — Multi-tenant isolation
Every fact and dim table has a leading org_id column. The platform enforces:
Partition key: org_id is part of the primary key
Index leading column: org_id-first composite indexes
Row-level security: CREATE POLICY filters every query — engineers can't access a tenant's data without going through the API layer
Per-tenant query caps: a runaway query in one org doesn't take down the cluster
Section 5 — Record sharing (the bridge)
Salesforce's "share with my manager" feature: opportunity Y is owned by rep A but visible to manager B + VP C. Modeled as bridge_account_user_access(account_sk, user_sk, permission_level, granted_via). Sharing rules cascade up the role hierarchy automatically — bridge rows generated by a daily job from the role tree.
Section 6 — Pipeline velocity from stage changes
Don't store "current_stage" as a column on the opportunity — that's lossy. Store every stage change as an event in fct_opportunity_stage_change(opp_id, stage, entered_at). Current stage is the latest event:
SELECT opp_id, stage AS current_stage
FROM (
SELECT opp_id, stage, ROW_NUMBER() OVER (PARTITION BY opp_id ORDER BY entered_at DESC) AS rn
FROM fct_opportunity_stage_change
) WHERE rn = 1;
Velocity, conversion-by-stage, time-in-stage all derivable from the same table. Single source of truth for "everything pipeline".
Senior framing. "I'd model account hierarchy with both parent_id (for clean writes) AND a materialized path column (for fast subtree reads), refreshed nightly. Multi-tenant isolation is enforced at THREE layers — partition key + composite-index leading column + row-level security policy — because losing tenant data isolation is an existential bug. Pipeline state lives in a fact table of stage-changes, never as a current-stage column on the opportunity, because that loses time-in-stage and velocity."
Netflix streaming, series + bingewatching, late-arriving dimensions
Netflix · streaming + series + LAD
Why this is hard — the design tension
Netflix has the data problem most consumer platforms can only dream of: 250M subscribers, 15B+ play events per day, content metadata that propagates regionally over minutes-to-hours. Three structural challenges intersect:
Late-arriving dimensions (LAD) — a Netflix Original drops at midnight Pacific; play events fire before the title metadata propagates to every regional shard. INNER JOIN drops the views silently and Content Strategy stares at "0 plays in the first 3 minutes" of the most-anticipated launch of the quarter.
Series-vs-episode-vs-season grain — "Stranger Things S5 E1" is one episode, of one season, of one series, of one franchise. Aggregations must roll up cleanly without double-counting (a viewer watched 8 episodes — that's 1 series-bingewatcher, not 8 series-views).
Bingewatching as a derived behavior — bingewatching isn't recorded; it's computed from the play stream. The schema must support both per-session binge (3+ episodes within 24h) and per-day binge (any 3+ episodes of same series in a calendar day) without a re-write.
For temporal-correct attribution of late-arriving plays
Enhanced graphical data model — Netflix · Streaming + Series + Bingewatching + LAD
Section 2 — The LAD-defensive query (the make-or-break pattern)
Every play-event-to-content-dim join must be defensive. The shape:
WITH current_episode AS (
SELECT episode_id, title, series_id, season_number, episode_number,
ROW_NUMBER() OVER (PARTITION BY episode_id ORDER BY effective_date DESC) AS rn
FROM dim_episode
)
SELECT p.play_id, p.account_id, p.profile_id,
COALESCE(e.title, 'Episode ID: ' || p.episode_id || ' (Pending Sync)') AS title,
COALESCE(e.series_id, -1) AS series_id_safe,
COALESCE(e.season_number, -1) AS season_number_safe,
p.watch_minutes
FROM fct_play_event p
LEFT JOIN current_episode e ON e.episode_id = p.episode_id AND e.rn = 1;
Three things working together: LEFT JOIN keeps every play; ROW_NUMBER + rn=1 picks the most-recent SCD2 dim version; COALESCE with a searchable placeholder ("Pending Sync") so the BI tool's filter dropdown surfaces orphans. The orphan-rate SLO is < 0.5% — above 1% pages the data-platform on-call.
Section 3 — Series + episode roll-up — the bridge approach
"Top binged series this week" requires going from episode → series. Naive approach: store series_id directly on every play. Problem: when a series is re-titled (Marvel acquires a show, renames it), all historical plays now point to the wrong series name.
The bridge approach: store only episode_id on the fact (the immutable identity); join through SCD2 dim hierarchy at query time, with temporal-correctness via the bridge:
SELECT s.series_title, COUNT(DISTINCT p.account_id || '|' || s.series_id) AS unique_viewers
FROM fct_play_event p
JOIN dim_episode e USING (episode_id)
JOIN dim_series s
ON s.series_id = e.series_id
AND p.played_at BETWEEN s.effective_from AND COALESCE(s.effective_to, '9999-12-31')
GROUP BY s.series_title
ORDER BY unique_viewers DESC;
Plays that happened during the OLD series_title attribute correctly to the OLD title; new plays attribute to the new. No backfill required when re-titling.
Section 4 — Bingewatching as a derived fact
Bingewatching is not an event — it's a pattern. Two co-existing definitions need separate derived facts:
Metric
Definition
Use case
Per-session binge
3+ episodes of same series within 24h, with < 30-min gap between episodes
Algorithm tuning — "auto-play next" decisioning
Per-day binge
Any 3+ episodes of same series in a calendar day
Marketing analytics — "% of subscribers who binged this month"
-- fct_binge_session derivation, refreshed nightly
WITH same_series_runs AS (
SELECT account_id, profile_id, series_id, episode_id, started_at,
LAG(started_at) OVER (PARTITION BY account_id, profile_id, series_id ORDER BY started_at) AS prev_start
FROM fct_play_event
),
sessioned AS (
SELECT *,
SUM(CASE WHEN prev_start IS NULL OR started_at - prev_start > INTERVAL '30 min' THEN 1 ELSE 0 END)
OVER (PARTITION BY account_id, profile_id, series_id ORDER BY started_at) AS session_id
FROM same_series_runs
)
SELECT account_id, profile_id, series_id, session_id,
MIN(started_at) AS session_start, COUNT(*) AS episodes_in_session
FROM sessioned
GROUP BY account_id, profile_id, series_id, session_id
HAVING COUNT(*) >= 3;
Section 5 — Series-streak detection — gaps-and-islands per (user, series)
"Watched at least one episode every day for 5 days" is the engagement loyalty signal. Different shape from binge — same user across CONSECUTIVE days, not multiple episodes in one day. The pattern is gaps-and-islands:
WITH active_days AS (
SELECT DISTINCT account_id, series_id, DATE(started_at) AS active_date
FROM fct_play_event
),
islands AS (
SELECT account_id, series_id, active_date,
julianday(active_date) - ROW_NUMBER() OVER (PARTITION BY account_id, series_id ORDER BY active_date) AS island_id
FROM active_days
)
SELECT account_id, series_id,
MIN(active_date) AS streak_start, MAX(active_date) AS streak_end, COUNT(*) AS streak_days
FROM islands
GROUP BY account_id, series_id, island_id
HAVING COUNT(*) >= 5;
Per-series streaks reveal which shows hook viewers — the "people kept coming back to Stranger Things every day for a week" insight. Used by content acquisition for renewal decisions.
Section 7 — The QoE side-table
Quality-of-experience (rebuffering, bitrate switches, startup latency) is high-volume — heartbeats every 60s. Storing on the play fact bloats it. Production splits:
fct_play_event — one row per session, low volume, hot
fct_play_heartbeat — many rows per session, high volume, downsampled to 5-min in warehouse
fct_qoe_summary — derived nightly, rebuf-time / total-time per session
Analysts query fct_qoe_summary 99% of the time; raw heartbeats are reserved for SRE incident drills.
Section 8 — Late-arriving FACTS — the offline-mobile problem
A user watches an episode offline on the subway. The mobile uploads the play event 4 hours later. By then the dim_episode might have moved on (re-genre'd). The defensive temporal join handles it:
JOIN dim_episode e
ON e.episode_id = p.episode_id
AND p.played_at BETWEEN e.effective_from AND COALESCE(e.effective_to, '9999-12-31')
The play attributes against the dim version that was current AT THE TIME OF THE EVENT, not now. This is the LAD-FACT cousin of the LAD-DIM problem — same SCD2 mechanism, opposite direction of lateness.
Senior framing. "The defensive LAD pattern (LEFT JOIN + ROW_NUMBER + COALESCE) is non-negotiable on every play-to-episode join because launch-day metadata propagation is measured in minutes and we can't lose the launch-day analytics. Bingewatching is two distinct derived facts — per-session (algorithm input) and per-day (marketing rollup) — both materialized nightly so analyst queries don't re-derive. The episode-only fact + temporal series_title bridge means re-titling a show doesn't require backfilling 5 years of plays. And profile_id alongside account_id on every event is what makes the password-sharing crackdown queries possible without join gymnastics."
Section 6 — Profile-vs-account distinction
Netflix's profile model is unusual: one billing account with up to 5 profiles. Mom watches drama; Kid watches cartoons. Modeling consequences:
Account-grain facts — billing, plan tier, payment events. These cross profile boundaries.
Profile-grain facts — every play, search, browse, rating. These are profile-keyed.
Both keys on every event — fct_play_event has BOTH account_id AND profile_id, never just one. Otherwise rolling up "minutes per account" requires joining back through dim.
The 2023 password-sharing crackdown depended on this distinction — accounts with profiles being used in geographically-incompatible patterns flagged for sharing.
M&A integration, golden records, schema strangler, multi-org metrics
M&A integration · golden records
Why this is hard — four data problems pretending to be one deal
Every acquisition press release says "stronger together." Every acquirer's data team knows that means two incompatible CRMs, two warehouses, two ID systems, two fiscal calendars, and an executive expecting unified KPIs on day 90. The interview question: "your company just bought a $20B SaaS — walk me through the data integration". The senior signal is naming all four problems first, then sequencing them:
Entity resolution — is their Acme Corp the same as our Acme Corp? Golden record + crosswalk + an ordered rule playbook (strict-then-fuzzy).
Schema unification — both apps keep shipping; can't freeze either DB. Strangler pattern with dual-write + CDC + reconciliation.
Warehouse consolidation — Snowflake vs Databricks vs BigQuery. Three strategies: big-bang, federated query, or share-and-bridge.
Multi-org metrics — their DAU isn't your DAU. Fiscal calendars, FX policy, consolidation rules, overlapping ARR. Versioned dim_metric_definition.
Full deep-dive: the M&A Integration page walks all four patterns end-to-end with Adobe-Figma, Microsoft-Activision, Cisco-Splunk, and Salesforce-Slack as live case studies.
Section 1 — Dimensional Model
Type
Tables
Notes
Master data — golden
golden_customer, golden_account, golden_user
Unified entity; SCD2 on canonical fields; lineage via sources[]
Crosswalk — xref
customer_xref, account_xref, user_xref
Source ID ↔ golden ID; append-only with superseded_by for un-merge
Source — per system
acquirer_crm.customers, target_crm.tenants
Kept native; both apps continue writing during strangler
You never decide "these two rows are the same customer" using one rule. You run an ordered list of rules from strict to permissive. The first rule that fires wins — same idea as airport security: passport check first (definite ID), then bag scan, then pat-down. Strict rules auto-merge; fuzzy rules send candidates to a human analyst. (Industry jargon: this is sometimes called the "match-key cascade" — same thing, fancier name.)
Rule #
Type
What it checks
Confidence
What happens
1
Strict
Same email domain AND same DUNS number
0.99+
Auto-merge into one golden record
2
Strict
Same domain AND normalized company name
0.95+
Auto-merge, sample for QA
3
Fuzzy
Similar name (typo-tolerant) + same country + industry
0.80+
Human reviews via match_candidates
4
Fuzzy
3+ employee emails shared (@acme.com on both)
0.70+
Human reviews; escalate
5
No match
Nothing fires
—
Distinct golden record
Section 3 — The un-merge — the part everyone forgets
Two months in, an MDM analyst realizes "Acme Corp Holding" and "Acme Corporation" were merged but are actually parent/subsidiary. You need to un-merge without losing referential integrity in the downstream pipelines that already consumed the wrong golden ID.
The pattern: superseded_by column on customer_xref + append-only crosswalk + SCD2 on golden_customer. Never DELETE; retire the golden record, point affected source IDs at a new golden_id, log who/when, emit CDC event so fact tables can re-attribute. Without this, the first analyst un-merge request is a 2-week incident.
Section 4 — The strangler dual-write pattern
Both apps keep shipping during a 6-18-month convergence. The pattern: stand up unified schema alongside source; CDC every source change to a translation layer; backfill historicals; cutover one entity at a time; reconcile daily.
-- Reconciliation query (runs hourly during cutover)
WITH source_counts AS (
SELECT 'acquirer_crm' AS src, COUNT(*) AS n, MAX(updated_at) AS last_change
FROM acquirer_crm.customers
UNION ALL
SELECT 'target_crm', COUNT(*), MAX(updated_at) FROM target_crm.tenants
),
unified_counts AS (
SELECT source_system AS src, COUNT(*) AS n, MAX(ingested_at) AS last_ingest,
COUNT(*) FILTER (WHERE ingest_status = 'rejected') AS rejected
FROM unified.customer_ingest_log GROUP BY source_system
)
SELECT s.src, s.n AS source_rows, u.n AS unified_rows,
100.0 * ABS(s.n - u.n) / NULLIF(s.n, 0) AS drift_pct,
u.rejected,
EXTRACT(EPOCH FROM (s.last_change - u.last_ingest)) AS lag_seconds
FROM source_counts s LEFT JOIN unified_counts u USING (src);
Drift > 0.1% pages on-call. Lag > 5 min pages on-call. Cutover gate: 72 consecutive hours green.
Section 5 — Warehouse consolidation — three strategies
Strategy
How
When
Big-bang migration
Copy data + rewrite dbt models from B → A
Strategic incompatibility; < 2-yr post-merger
Federated query
Snowflake External Tables / DBX Unity / Trino across both
Every real M&A uses share-and-bridge for the first 6 months, then commits. Tag every query/cluster with cost_owner = 'pre-merger-target' | 'pre-merger-acquirer' | 'post-merger-combined' so the CFO's "how much of last month's bill came from the acquired team" question has an answer.
Section 7 — Case studies (see the deep-dive)
Deal
Signature challenge
Adobe → Figma ($20B, blocked)
Creative Cloud ID ↔ Figma email: ~40% pro-user overlap; NDR math would shift on day 1
Microsoft → Activision ($69B)
Federated B2C identity across Xbox Live / Battle.net / PSN / Steam; 8 regulator-specific metric versions
Cisco → Splunk ($28B)
Two data-platform vendors; Iceberg as lingua franca; "ingested GB normalized for retention class" as unified billing-equivalent
Salesforce → Slack ($27.7B)
Account ≠ Workspace (one Account often spans many Workspaces); cross_product_active as cross-sell signal
Senior framing. "M&A integration is four orthogonal data problems — entity resolution, schema unification, warehouse consolidation, multi-org metrics — and the senior-IC mistake is shipping a single 'integration plan' that conflates them. Entity resolution gives you the golden record via an ordered rule playbook (strict-then-fuzzy) + append-only crosswalk with superseded_by for un-merge. The strangler pattern lets both apps keep shipping while CDC dual-writes through a translation layer; daily reconciliation gates the cutover. Warehouse consolidation can wait behind cross-account shares for 6 months. Metric definitions become versioned dim rows the board deck can audit. Adobe-Figma, Microsoft-Activision, Cisco-Splunk, and Salesforce-Slack all hit the same four; the only difference is which one was hardest in their context."
The board deck has "combined DAU" on slide 3. It's wrong, because acquirer counts "any session in 24h" and target counts "any session in 7d, deduped to one row per user-day". Bake definitions into a versioned dim instead of into SQL:
CREATE TABLE dim_metric_definition (
metric_id TEXT NOT NULL, -- 'dau'
definition_version INT NOT NULL,
org_scope TEXT NOT NULL, -- 'acquirer' | 'target' | 'combined'
window_unit TEXT NOT NULL, -- 'hour' | 'day' | 'week'
window_size INT NOT NULL,
dedup_grain TEXT NOT NULL, -- 'user' | 'user-day' | 'session'
fiscal_calendar TEXT NOT NULL, -- 'calendar' | '445'
currency_policy TEXT NOT NULL, -- 'lock-at-txn' | 'lock-at-month-end'
effective_from DATE NOT NULL,
effective_to DATE,
PRIMARY KEY (metric_id, definition_version)
);
Every dashboard tile cites (metric_id, definition_version). Combined-DAU has three competing versions; finance picks the one that matches the board narrative; the data layer auto-generates the SQL from the dim row.
Why this is hard — five problems pretending to be one model
Every interview asks the same question: "design the data model for a fraud system at a fintech / marketplace / streaming product." Junior answers stop at "a transaction table with a risk_score column." The senior signal is naming the five orthogonal sub-problems first:
Identity resolution — the SAME user appears across 3 emails, 2 phones, 4 devices, 5 IPs. Build a graph, not a row. Strong edges (same payment instrument) vs weak edges (same IP within 1h).
Impossible travel / multi-geo — login from Lagos, transaction from Berlin 18 min later. Velocity of physical position vs known device/SIM.
Account sharing detection — one user, many simultaneous-session devices/geos (Netflix family-plan abuse, Spotify subscription sharing). Inverse of multi-account fraud: one identity using many accounts looks like sharing; many identities sharing one account looks like ATO.
Decision replay — at audit time you must reproduce why you blocked the September 14 transaction. Means: freeze the model version, ruleset version, and feature snapshot at decision time. SCD2 everywhere.
Feedback loop — chargebacks come back 30-180 days later; analyst verdicts hours later. Both must flow back as labels to retrain the next model — without the loop, the model degrades silently.
Common interview pitfall: conflating multi-account fraud (one bad actor opens 50 accounts to abuse signup bonuses) with account sharing (one legit user shares Netflix with their parents) with account takeover (one bad actor uses one legit user's compromised account). All three look like "identity ↔ account doesn't match 1:1" in the data — the schema must distinguish them, or every detection rule produces the wrong false-positive class.
Append-only edges with edge_strength (1=strong: shared payment instrument; 0.3=weak: shared IP > 24h apart). identity_cluster.cluster_id rolls up edges into a connected component
Decision-context dimensions (frozen)
dim_ruleset, dim_risk_model, dim_feature_snapshot
SCD2 ESSENTIAL — the only way to replay a 6-month-old decision. ruleset_id + model_id live on every fact row. Feature snapshot is a frozen subset of computed values used by the model at decision time
One row per attempt. Includes risk_score_at_decision, decision (allow/step_up/block), ruleset_id, model_id, feature_snapshot_id
Signal facts (append-only)
fact_rule_firing, fact_model_score
One row per rule firing or model evaluation. Lets you trace which signals contributed to a decision. Volume is high — partition by day, keep 90-day hot tier
Review & labels
fact_case, fact_analyst_label, fact_chargeback
Cases opened on threshold cross. Analyst verdicts (fraud / clean / inconclusive) and chargebacks (30-180 day delayed) feed back as labels for the next training cycle
Outcome facts (delayed)
fact_outcome_label
Materialized union of analyst labels + chargebacks; the training-set source of truth. Reconciles disagreements (analyst said clean, chargeback came back 90d later → final label = fraud)
Section 2 — Enhanced graphical data model — Fraud · Identity Graph + Decision Replay + Feedback Loop
Section 3 — The Identity Graph (the part everyone gets wrong)
The graph turns "is this account fraudulent?" into "what's the connected component of strong edges this account participates in?" Edges have a strength:
Strong (1.0): same payment instrument used by two accounts; same KYC document hash; same SIM
Medium (0.5): same device fingerprint observed within 24h on both accounts
Weak (0.2): same IP observed within 1h; same residential ASN within a week
Compute the identity_cluster via connected components on edges with edge_strength >= 0.5 (run nightly; incremental for new edges). Strong-only clusters are the action graph; weak-edge graphs feed the analyst as "these may be related but require human judgement."
Section 5 — Worked numerical example · the Lagos-Berlin transaction
22:47 UTC. A user account in good standing for 18 months attempts a €4,200 transaction.
Step
Signal
Score delta
Running
1
Base score (model v7.2 prior probability)
+12
12
2
Login from Lagos 22:29 (last known geo: Berlin 18 min earlier)
+38 (impossible travel rule)
50
3
IP is residential proxy (asn_risk_class = 'high')
+15
65
4
Device fingerprint NEW for this user — first seen 4 min before login
+12
77
5
Payment instrument: NEW card, BIN issued in a different country than user KYC
+8
85
6
Cluster lookup: this device fingerprint linked to 3 prior chargeback users (strong edge)
+25
110
—
Threshold for block: 100
BLOCK
Decision is written to fact_transaction with decision='block', risk_score=110, plus the frozen context: ruleset_id=42, model_id='v7.2', feature_snapshot_id=$decision_id. The 6 contributing rules each get a row in fact_rule_firing.
A case opens, analyst reviews in 4 hours, confirms fraud → label written to fact_analyst_label. The graph-expansion query above runs on the cluster the device fingerprint belongs to; 17 additional accounts are surfaced, of which 12 have transactions in the last week. Fraud ops freezes them all pending review. None had crossed the per-account threshold individually; only the cluster lookup tied them together.
180 days later: a chargeback arrives on one of the 17 (txn from before the analyst freeze). fact_chargeback records it idempotently (network deduplicates retries). fact_outcome_label updates: where the analyst said 'fraud' AND the chargeback agreed → final_label='fraud', source='agreed'. The retrain pipeline picks this up Friday and the new model_id v7.3 ships Monday with this case in its training set.
Section 4 — Stakeholder SQL · grouped by who asks
Fraud analyst — case queue ordered by SLA risk
SELECT c.case_id, c.opened_ts, c.sla_hours,
EXTRACT(EPOCH FROM (NOW() - c.opened_ts))/3600 AS hours_open,
t.amount, t.merchant, u.email, u.risk_tier
FROM fact_case c
JOIN fact_transaction t ON c.txn_id = t.txn_id
JOIN dim_user u ON t.user_id = u.user_id AND u.is_current
WHERE c.closed_ts IS NULL
ORDER BY (c.sla_hours - EXTRACT(EPOCH FROM (NOW() - c.opened_ts))/3600);
Risk PM — false-positive rate by ruleset version (last 30d)
SELECT t.ruleset_id, r.rules_json->'name' AS ruleset_name,
COUNT(*) FILTER (WHERE t.decision = 'block') AS blocked,
COUNT(*) FILTER (WHERE t.decision = 'block' AND ol.final_label = 'clean') AS fp,
ROUND(100.0 * COUNT(*) FILTER (WHERE t.decision = 'block' AND ol.final_label = 'clean')
/ NULLIF(COUNT(*) FILTER (WHERE t.decision = 'block'), 0), 2) AS fp_pct
FROM fact_transaction t
JOIN dim_ruleset r ON t.ruleset_id = r.ruleset_id
LEFT JOIN fact_outcome_label ol ON t.txn_id = ol.txn_id
WHERE t.ts >= NOW() - INTERVAL '30 days'
GROUP BY t.ruleset_id, r.rules_json->'name'
ORDER BY fp_pct DESC;
Detection engineer — impossible-travel signal (same user, two geos < possible flight time apart)
WITH paired AS (
SELECT s.user_id, s.ts AS ts_a, s.geo AS geo_a,
LEAD(s.ts) OVER (PARTITION BY s.user_id ORDER BY s.ts) AS ts_b,
LEAD(s.geo) OVER (PARTITION BY s.user_id ORDER BY s.ts) AS geo_b
FROM fact_session_event s
)
SELECT user_id, ts_a, geo_a, ts_b, geo_b,
EXTRACT(EPOCH FROM (ts_b - ts_a))/60 AS gap_minutes,
great_circle_km(geo_a, geo_b) AS distance_km
FROM paired
WHERE ts_b IS NOT NULL
AND geo_a <> geo_b
AND great_circle_km(geo_a, geo_b) / NULLIF(EXTRACT(EPOCH FROM (ts_b - ts_a))/3600, 0) > 900 -- > 900 km/h ⇒ faster than any plane
ORDER BY gap_minutes;
SELECT user_id, COUNT(DISTINCT geo) AS distinct_geos_24h,
COUNT(DISTINCT device_id) AS distinct_devices_24h,
MAX(great_circle_km_between_pairs) AS max_pair_km
FROM (
SELECT s.user_id, s.geo, s.device_id, s.ts,
great_circle_km(s.geo, LAG(s.geo) OVER (PARTITION BY s.user_id ORDER BY s.ts)) AS great_circle_km_between_pairs
FROM fact_session_event s
WHERE s.ts >= NOW() - INTERVAL '24 hours'
) p
GROUP BY user_id
HAVING COUNT(DISTINCT geo) >= 3 AND COUNT(DISTINCT device_id) >= 3;
-- 3+ geos AND 3+ devices in 24h = candidate for shared-account review,
-- separate from impossible-travel (which is about velocity, not parallelism).
Data scientist — feature-audit for replay (what did model v7.2 see for txn X?)
SELECT t.txn_id, t.risk_score, t.decision,
t.model_id, m.sha, m.training_cutoff,
fs.feature_vector_json
FROM fact_transaction t
JOIN dim_risk_model m ON t.model_id = m.model_id
JOIN dim_feature_snapshot fs ON t.feature_snapshot_id = fs.feature_snapshot_id
WHERE t.txn_id = $1;
-- Reproduces the EXACT inputs the model used 6 months ago. Without this, every
-- "why did you block this customer?" audit becomes "we don't know."
Fraud ops — cluster expansion (the moment you find one mule, find the other 47)
WITH RECURSIVE walk AS (
SELECT user_id, 0 AS hop
FROM fact_analyst_label
WHERE case_id = $1 AND label = 'fraud'
UNION
SELECT CASE WHEN e.user_a = w.user_id THEN e.user_b ELSE e.user_a END,
w.hop + 1
FROM walk w
JOIN brg_user_device e ON w.user_id IN (e.user_a, e.user_b)
WHERE e.edge_strength >= 0.5 AND w.hop < 4
)
SELECT DISTINCT user_id, MIN(hop) AS hops_from_seed
FROM walk
GROUP BY user_id
ORDER BY hops_from_seed;
-- Expands outward 4 hops on strong edges. Surfaces accounts the analyst
-- didn't know were related — typical haul: 30-200x the seed account count
-- for mule rings, 2-5x for friendly fraud.
Section 6 — Senior framing
Senior framing. "Fraud is five orthogonal problems pretending to be a model. Identity resolution is a graph with typed edges of varying strength — strong edges form the action cluster, weak edges feed the analyst. Decision replay forces SCD2 on ruleset + model + feature snapshot, with all three IDs stamped on every fact row — otherwise you cannot defend a block to a regulator six months later. Account sharing and account takeover and multi-account fraud look identical in raw transactions (identity ↔ account not 1:1) and the schema must disambiguate them, because each has a different false-positive class. The feedback loop matters more than the model — analyst verdicts come back in hours, chargebacks in 30-180 days, and the fact_outcome_label union is the training set of record. The interview red flag is "we add a risk_score column to transactions" without graph, frozen decision context, or feedback fact. Real systems at Stripe, Wise, Coinbase, PayPal all land on this shape; the differences are which graph backend (Neo4j vs JanusGraph vs a Postgres recursive CTE), whether the cluster job is real-time or batch, and how aggressively the feature store separates online from offline."
Growth accounting · DAU/WAU/MAU, the new/retained/resurrected/churned state machine, cohort LTV
Growth accounting · DAU/WAU/MAU & LTV
Why this is hard — a single MAU number hides a leaky bucket
"How many monthly active users do we have?" is the question a junior answers with COUNT(DISTINCT user_id). The senior answer: that number is nearly useless on its own. A flat 10M MAU can mean a healthy stable product OR a bucket leaking 2M users a month and refilling with 2M new ones — a business about to fall off a cliff. The framework that fixes this is growth accounting (the Social Capital / Reforge model), and it forces four design problems:
Define the activity event. Netflix counts a viewing session ≥ 2 min; Uber counts a completed trip; Spotify counts a track play ≥ 30s. The grain of fact_activity IS the product's definition of "active" — get it wrong and every downstream metric is wrong.
The state machine. Every period, each user is exactly one of: new (first-ever active period), retained (active last period too), resurrected (was dormant, came back), or churned (active last period, gone this period). Plus, for revenue: expansion and contraction.
The frozen cohort. A user's cohort = the period of their first activity, and it never changes. Cohort retention and LTV curves are only meaningful if the cohort key is immutable — which means it's a derived dimension, computed once, SCD0.
The generalized value column. The whole framework runs on one column, inc_amt — incremental value created by a user in a period. Set inc_amt = 1 and you get pure engagement growth accounting; set inc_amt = daily_revenue and the identical query produces MRR growth accounting with expansion/contraction. One model, two metrics.
The two identities that make it work:MAU(t) = retained(t) + new(t) + resurrected(t) and MAU(t−1) = retained(t) + churned(t). Subtract them: ΔMAU = new + resurrected − churned. That decomposition is the entire point — it turns "MAU is flat" into "we added 2M new + 0.3M resurrected and lost 2.3M churned," which is an actionable, alarming, completely different statement.
Section 1 — Dimensional Model
Type
Tables
Notes
Atomic activity fact
fact_activity
The grain: one row per (user, day, activity). Holds inc_amt. Netflix=viewing-session, Uber=completed-trip, Spotify=track-play. Everything else derives from here
Period rollups
agg_dau, agg_wau, agg_mau
One row per (period, user) with summed inc_amt. date_trunc rollups — materialized so the growth-accounting joins don't re-scan raw activity
User dimension
dim_user
SCD2 on plan / country / acquisition_channel. The slowly-changing attributes you'll want to slice growth accounting by
Cohort dimension (frozen)
dim_user_cohort
SCD0 — first_dt, first_week, first_month, acquisition_channel_at_signup. Computed ONCE; never updated. The immutable cohort key
Growth-accounting facts
fact_dau_growth, fact_mau_growth, fact_mrr_growth
One row per period: active, new, retained, resurrected, churned (+ expansion, contraction for MRR). Produced by the FULL OUTER self-join offset by one period
Cohort-retention fact
fact_cohort_retention
(first_period, active_period, periods_since_first, active_users, inc_amt). The long-form retention triangle a BI tool pivots into a heatmap
Cohort-LTV fact
fact_cohort_ltv
Cumulative: (first_period, periods_since_first, retained_pctg, cum_amt, cum_amt_per_user). The LTV curve per cohort
Metric-definition dimension
dim_metric_definition
SCD2 — versioned definition of "active" (session length threshold, dedup grain, period boundary). When marketing changes the bar, OLD numbers must not silently move
Section 2 — Enhanced graphical data model — Growth Accounting · Activity → State Machine → Cohort Curves
Section 3 — One model, two metrics: the inc_amt generalization
The cleverness of growth accounting is that engagement growth and revenue growth run the identical query — only the value column changes:
inc_amt = 1 → user-count growth accounting. new/retained/resurrected/churned are headcounts. This is the DAU/MAU leaky-bucket view.
inc_amt = daily_revenue → MRR growth accounting. Now the same join also produces expansion (a retained user whose inc_amt went UP) and contraction (retained, but DOWN). Revenue identities: MRR(t) = retained + new + resurrected + expansion and MRR(t−1) = retained + churned + contraction.
Company lens.Netflix — activity = a viewing session; resurrection is the win-back of a lapsed subscriber; the metric that matters is monthly, because the subscription is monthly. Uber — activity = a completed trip; resurrection is enormous and seasonal (riders lapse for months, return for an airport run); weekly cohorts expose this. Spotify — activity = a 30-second play; DAU-dominant; the critical state transition isn't churn, it's free → premium, which growth accounting models as a contraction-to-expansion flip on inc_amt.
Section 5 — Worked numerical example · the flat-MAU trap
A streaming product reports MAU of 10.0M in March and 10.1M in April — leadership calls it "stable, slight growth." Growth accounting on the April row:
Bucket
April users
Reading
retained
7.6M
active in March AND April
new
1.9M
first-ever active in April
resurrected
0.6M
lapsed, came back
MAU(April)
10.1M
= 7.6 + 1.9 + 0.6 ✓ (identity 1)
churned
−2.4M
active in March, gone in April
MAU(March)
10.0M
= 7.6 retained + 2.4 churned ✓ (identity 2)
Quick ratio = (1.9 + 0.6) / 2.4 = 1.04. The product is on a treadmill: it spent an entire month of acquisition spend to net +0.1M. The "+100K MAU" headline hides a 24% monthly churn rate. The growth-accounting decomposition turns one comfortable number into one alarming one — and points the team at retention, not acquisition. A cohort-retention heatmap confirms it: if month-1 retention is 60% and falling cohort-over-cohort, the leak is widening. The fix lives in the product, not the ad budget.
Section 4 — Stakeholder SQL
Growth analyst — the quick ratio (is the bucket leaking?)
SELECT period,
active, new, resurrected, churned,
ROUND((new + resurrected) / NULLIF(ABS(churned), 0), 2) AS quick_ratio
FROM fact_mau_growth
ORDER BY period;
-- quick_ratio > 1 ⇒ growing; ≈ 1 ⇒ treadmill; < 1 ⇒ shrinking even if MAU looks flat.
Finance — net revenue retention by cohort
SELECT first_month,
MAX(CASE WHEN months_since_first = 0 THEN cum_amt_per_user END) AS m0,
MAX(CASE WHEN months_since_first = 12 THEN cum_amt_per_user END) AS m12,
ROUND(MAX(CASE WHEN months_since_first = 12 THEN cum_amt_per_user END)
/ NULLIF(MAX(CASE WHEN months_since_first = 0 THEN cum_amt_per_user END), 0), 2) AS ltv_multiple
FROM fact_cohort_ltv
GROUP BY first_month ORDER BY first_month;
Growth marketing — does retention floor out? (heatmap source)
SELECT first_month, months_since_first,
ROUND(100.0 * active_users
/ FIRST_VALUE(active_users) OVER (PARTITION BY first_month ORDER BY months_since_first), 1) AS retention_pct
FROM fact_cohort_retention
ORDER BY first_month, months_since_first;
-- A curve that flattens (e.g. settles at 41%) means you found your retention floor —
-- those users are the real product-market fit.
Section 6 — Senior framing
Senior framing. "Growth accounting exists because COUNT(DISTINCT user_id) is a vanity metric — a flat MAU can be a stable product or a bucket leaking and refilling, and the single number cannot tell you which. The framework decomposes every period's active base into new / retained / resurrected / churned via a FULL OUTER self-join offset by one period — FULL OUTER specifically, because churned users live only on the previous-period side and any inner/left join silently reports zero churn. Two identities hold by construction and give you ΔMAU = new + resurrected − churned. The whole thing generalizes through one column, inc_amt: set it to 1 for engagement, to revenue for MRR growth accounting with expansion and contraction — same query, two metrics. Cohorts must be frozen (SCD0 on first-activity period) or retention curves become meaningless. The interview red flag is answering 'how do we measure growth?' with a MAU number and no decomposition; the senior answer names the leaky bucket, computes the quick ratio, and reaches for a cohort-retention heatmap to find the retention floor. Netflix, Uber, and Spotify all run this exact framework — they differ only in what counts as an 'activity event' and whether daily, weekly, or monthly cohorts best expose their resurrection behaviour."
Spatial computing & XR — immersive sessions, the sensor firehose, and biometric consent
Spatial/XR · Meta Quest · Apple Vision Pro
Design the data platform for a spatial-computing platform (Meta Quest, Apple Vision Pro). A fleet of headsets streams eye-, hand-, and room-sensors during immersive sessions. The model must serve Product (engagement & comfort), Trust & Privacy (biometric + spatial-map consent), and Monetization (store + attention) — without drowning in raw telemetry or mishandling the most sensitive data a consumer device has ever collected.
Why this is hard — the device records your body and your home
A phone knows where you tapped. A headset knows where your eyes went, the geometry of your living room, and how your hands move — at 60–120 Hz. Two facts make this unlike every other scenario in this catalog:
The most sensitive data classes, at firehose rates. Eye-gaze is biometric; the room mesh is a 3-D map of a private space. Both stream continuously. You cannot weld a 90 Hz pose firehose to a queryable analytics fact — the same boundary lesson as raw GPS in the ride-share trip log (#1).
Capability is not uniform. Vision Pro has eye-tracking; a Quest controller session does not; phone-AR has neither. What data even exists is a function of the device — so device capability is a dimension that gates the schema, not a footnote.
The robust answer is a device-centric star: dim_device conformed, the immersive session as the analytic grain, raw sensor streams in an object store keyed back to the session, and a temporal dim_consent that gates what may ever be read.
Section 1 — A device-centric star
The grain all three stakeholders share is the immersive session (headset on → off, within one app). Finer signals (pose, gaze) aggregate up to it; coarser context (user, device, consent) hangs off it.
dim_device is a conformed dimension. The same dim_device defined here keys ads attribution (#5), fraud velocity (#19), the engagement surface (#8 / #14 / #17), and the autonomous fleet (#22). One definition — device_type {phone, web, CTV, console, headset, vehicle, POS}, make/model, OS + runtime, app version, the advertising identifier (IDFA / GAID) and its tracking-consent state, hardware fingerprint, first/last seen, SCD2 on consent + OS — conformed across the catalog. The headset is simply its most sensor-rich member.
Conformed dim: dim_device
Column
Type
Description
device_key
PK BIGINT
Surrogate. SCD2 on consent + OS/runtime.
device_type
ENUM
{phone, web, CTV, console, headset, vehicle, POS} — the conformed type used catalog-wide.
make_model
STRING
e.g. Quest 3, Vision Pro. Drives the capability join.
sensor_suite
ARRAY
{eye, hand, face, room_mesh, passthrough…} — what data can exist.
os_runtime / app_version
STRING
Runtime + app build; SCD2.
advertising_id / ad_consent
STRING / ENUM
IDFA / GAID + limit-tracking state — the conformed marketing key.
Privacy dim: dim_consent (SCD2 — one row per user × data-category × region × version)
Dwell time; gaze_consented resolved as-of the session against dim_consent.
anchor_id / room_hash
STRING
dim_spatial_anchor: persistent AR placement keyed to a room fingerprint — never the raw mesh.
Senior framing. "The headset is the most extreme member of dim_device — so the catalog's existing disciplines all apply, just sharper. Pose is a firehose, so it lives in object storage like raw GPS. Consent is SCD2, so eye-tracking is replayable only under the permission that existed at capture — the same as-of-time locking we use for FX and risk verdicts. And capability lives on the device, so the schema a Vision Pro populates is a superset of what a controller-only Quest session can. Get those three right and spatial data stops being scary and starts being a star schema."
Section 2 — presence, safety & ad attention
Three things the session grain alone can't express — physical safety, multi-user co-presence, and consented attention — each reusing a pattern from elsewhere in the catalog.
Safety: fct_guardian_event. The "guardian" play boundary is a safety system — when a hand or head crosses it (a wall, furniture, another person), the headset warns or fades to passthrough. One row per breach: session_key, event_ts, breach_type {hand_exit, head_exit, proximity}, severity. It's the XR analog of the AV's fact_disengagement_events — a safety signal that correlates with room size and content intensity. The boundary encodes room geometry, so store the breach event, never the mesh.
Co-presence: brg_session_anchor — the convoy bridge, in 3-D. A shared-space session has many users in one room sharing one spatial anchor: a classic many-to-many. Putting N users on one session row triple-counts engagement exactly like the pooled-ride trap in #1; the fix is the same bridge — one row per (shared_space_id, anchor_id, user_session_key). Per-user metrics stay correct, and "who was co-located when" is a join, not a self-join.
Attention: fct_ad_attention — gaze-confirmed, consent-gated. In XR an ad impression isn't "rendered on screen" — eye-tracking can prove the user actually looked. One row per ad exposure: session_key, ad_id, gaze_dwell_ms, gaze_confirmed (true only above a dwell threshold), consent_ok (resolved as-of the session against dim_consent). Gaze-confirmed attention is a far stronger currency than a served impression — but it's biometric, so it's billable only where eye-tracking consent was granted at capture, and it needs a spoof check (a perfectly centered, zero-jitter gaze is as fake as the phantom-convoy GPS in #22).
Attention, gated by consent that was in force at capture time
The one query that separates a senior answer from a lawsuit: aggregate gaze dwell per surface, but include only the gaze captured while the user's eye-tracking consent was granted in their region — resolved as-of the session, not as-of now.
SELECT a.surface_id,
COUNT(*) AS gaze_hits,
ROUND(AVG(a.dwell_ms)) AS avg_dwell_ms
FROM fct_attention a
JOIN fct_xr_session s ON s.session_key = a.session_key
JOIN dim_consent c ON c.user_key = s.user_key
AND c.data_category = 'eye_tracking'
AND c.region = s.region
AND s.start_ts >= c.valid_from
AND s.start_ts < c.valid_to -- consent AS-OF capture
WHERE c.state = 'granted'
GROUP BY a.surface_id
ORDER BY gaze_hits DESC;
A user who later withdraws consent closes the granted row's valid_to and opens a withdrawn row. This query automatically stops counting their future gaze — and, run for a deletion request, the same join identifies exactly which rows must be purged. No consent state is ever overwritten, so "was this gaze allowed?" is always answerable.
Gaze-confirmed ad viewability — the consented attention rate
Of ads served, what fraction were actually looked at — counting only users who consented to eye-tracking at the time:
SELECT a.ad_id,
COUNT(*) AS impressions,
COUNT(*) FILTER (WHERE a.gaze_confirmed) AS gaze_views,
ROUND(100.0 * COUNT(*) FILTER (WHERE a.gaze_confirmed)
/ NULLIF(COUNT(*), 0), 1) AS gaze_viewability_pct
FROM fct_ad_attention a
JOIN fct_xr_session s ON s.session_key = a.session_key
JOIN dim_consent c ON c.user_key = s.user_key
AND c.data_category = 'eye_tracking'
AND s.start_ts >= c.valid_from
AND s.start_ts < c.valid_to
WHERE c.state = 'granted'
GROUP BY a.ad_id
ORDER BY gaze_viewability_pct DESC;
Served-but-never-looked-at impressions drop out of the billable metric, and non-consented users never enter it — viewability that's honest and lawful at once.
The decisions that earn the level
Per-second aggregate vs raw frames. 90 Hz × 6-DoF × head + 2 hands + 2 eyes is millions of rows per session. The fact is the per-second rollup; the raw stream is a blob URI. Inlining frames makes the table unqueryable and the storage bill absurd — same call as pre-aggregating GPS to a cell-per-minute in #2.
Consent as SCD2, not a boolean. A users.eye_tracking_ok flag answers "can I use it now?" but destroys "could I use that capture?" Temporal consent is the only design that survives a withdrawal + an audit.
Capability on the device, not the fact. Don't null-pad every session with eye columns. Join dim_device.sensor_suite to know which facts a session can even produce; absence of eye rows for a Quest is correct, not missing data.
Room mesh as a fingerprint, not geometry. Persist a hash for relocalization; keep the actual 3-D map of someone's home in encrypted, residency-pinned, short-TTL storage — out of the warehouse entirely.
What disqualifies an answer
Raw pose welded to the event log. A 1–2 kHz multi-sensor firehose in an analytics fact — unqueryable, and you've co-mingled biometric raw data with everything else.
Eye-tracking read under "current" consent. Using today's permission to justify yesterday's capture is a retroactive consent rewrite — exactly the failure the SCD2 window prevents.
Room mesh treated as telemetry. A 3-D scan of a private home is among the most sensitive data possible; storing it like an event payload is a breach waiting to happen.
Cross-app identity leakage. Attention/gaze keyed to a global user without app-scoped boundaries lets one experience profile a user from another's data.
N users on one shared-session row. Co-located multiplayer is many-to-many; collapsing it onto one session row triple-counts engagement and breaks per-user attribution — the pooled-ride trap (#1) in 3-D. Use the bridge.
Billing gaze that wasn't consented — or wasn't real. Eye-tracking ad attention is biometric: billable only under as-of-capture consent, and only after a jitter/spoof check — a perfectly still gaze is fabricated, like the phantom-convoy GPS in #22.
Worked example — a grant, then a withdrawal
User U-42 (region EU) plays an app on a Vision Pro. Eye-tracking consent: granted 10:00, withdrawn 10:40. Two sessions:
session
start_ts
gaze rows
consent as-of start
counted?
S-1
10:05
1,820
granted (10:00→10:40)
✅ yes
S-2
11:00
1,540
withdrawn (10:40→∞)
⛔ no
The attention query returns S-1's gaze only. S-2's rows still exist (the device captured them before the runtime consent flag propagated) but the dim_consent as-of join excludes them from analytics, and a deletion job targets them via the same predicate. Had consent been a single mutable boolean, withdrawing it would either have falsified S-1's history or failed to exclude S-2 — both wrong.
Drill · answer this in 90 seconds:
Design the data model for a VR/AR platform that streams eye-, hand-, and room-sensors from a mixed fleet of headsets, and must serve product engagement, ad attention, and a regulator asking "prove you only used eye-tracking the user consented to."
Senior framing. "Device-centric star: dim_device (conformed, carries the sensor suite) → fct_xr_session as the grain → pose as a per-second rollup with raw frames in object storage. The regulator's question is answered by dim_consent as SCD2 — every analytic read joins consent as-of the session, so eye-tracking is usable only under the permission that existed at capture, withdrawals are non-destructive, and deletion is one predicate. The room mesh never enters the warehouse — only a relocalization hash does. It's the same firehose, as-of-time, and conformed-dimension discipline as the rest of the catalog, applied to the most sensitive sensors yet."
Autonomous fleet — disengagements, incident replay, and the versioned "driver"
Autonomous · Waymo · Tesla · Uber AV
Design the data platform for a driverless ride-hail fleet (Waymo, Tesla robotaxi, Uber AV). The autonomy stack replaces the human driver. The model must serve Operations (dispatch & operational domain), Safety & Regulatory (disengagements & incident replay), and Finance (a mixed human + AV marketplace) — at terabytes of sensor data per vehicle per day.
Why this is hard — there is no driver to blame
Every other ride-share model (#1, #2) attributes a trip to a driver. Here the driver is software. That single fact reshapes the schema:
The "driver" is a stack of versioned dimensions. Responsibility for any action traces to a software release × HD-map version × sensor-calibration × operational design domain — each of which changes over time and must be reconstructable to the millisecond.
The regulator owns a fact table. A disengagement (autonomy hands control to a safety driver or remote operator) is the safety currency — counted, reported (e.g. CA DMV), and litigated. It is append-only and must replay against the exact versions that were driving.
The sensor log is a firehose at extreme scale. Camera + lidar + radar is terabytes/vehicle/day. None of it belongs in the warehouse; the analytics layer holds events and pointers, the raw drive-logs sit in object storage.
This scenario is the capstone: it reuses the catalog's SCD2 versioning, decision-replay, temporal geofence, firehose boundary, and conformed-dimension patterns — at safety-critical grade — and folds back into the marketplace fct_trips from #1.
Section 1 — The "driver" becomes four versioned dimensions
Two asset layers: dim_vehicles (the car) and dim_devices (its sensors). "Device" is a conformed idea across the catalog — headsets in #21, phones in #5/#19, these vehicle sensors here — even though each scenario names its own grain. This extends the dim_av_fleet hint from #1's AV-future section: software version, sensor calibration, ODD restrictions — all SCD2 because they drift.
The versioned "driver" — four SCD2 dimensions
Dimension
Grain / key columns
Why SCD2
dim_vehicles
one row per vehicle × config; VIN, platform, sensor_suite, calibration_version
Sensors re-calibrate; the as-of-trip config must be recoverable.
This is "who was driving." Every incident attributes to a release.
dim_hd_map
map_version, tile set, survey date
The map the car believed in; a stale tile can cause a fault.
dim_odd
geofence (H3 set) × weather envelope × time envelope
Operational Design Domain — where/when the AV may drive. A temporal geofence, exactly like dim_surge_zone in #2.
Component inventory: dim_devices (one row per physical sensor on a vehicle) — dim_vehicles is the car master; this is its swappable hardware. A LiDAR or camera is replaced individually, so the device grain here is the component, not the car.
Column
Type
Description
sensor_id
PK
Serial of the specific unit; SCD2 (firmware + status drift).
vehicle_key
FK
The car it's installed on — a swap closes one row and opens another.
sensor_type / manufacturer
STRING
{LiDAR, long-range camera, radar, compute} × vendor — the key to failure-profile analysis.
firmware_version / status
STRING / ENUM
Low-level build; {active, degraded, maintenance}.
Trip fact: fact_av_trips — one row per autonomous trip; FKs to the four versioned dims captured at dispatch, plus marketplace_trip_key folding it into the fct_trips hierarchy from #1 (human and AV trips reconcile to one Finance grain). For exception handling it also carries trip_status (a state machine: en_route → in_transit → completed | aborted_failsafe | rescue_dispatched), a self-referencing parent_trip_id, and fare_adjustment — see the rescue chain below.
Safety fact: fact_disengagement_events (append-only — the regulatory currency)
Column
Type
Description
disengagement_id
PK
One immutable safety event.
trip_key / vehicle_key
FK
The trip and vehicle.
release_key / map_key / odd_key
FK
Snapshotted at event time — the exact versions that were driving. Never re-pointed.
The implicated dim_devices when a component triggered it — joins a disengagement to a specific unit + manufacturer for failure-profile analysis.
reg_reportable
BOOLEAN
Whether it counts toward regulatory disengagement reporting.
Terminal failure & the rescue chain. A sensor cracks mid-trip; the car can no longer drive autonomously, so it executes a fail-safe stop and writes a fact_disengagement_events row (stamped with failed_sensor_id). The passenger's trip flips to trip_status = aborted_failsafe and ends. Dispatch detects a stranded passenger and auto-assigns a rescue vehicle: a newfact_av_trips row carrying the same request_id (same rider intent) and a parent_trip_id pointing back at the broken trip. Billing walks that parent chain, sums both legs' distance, applies a fare_adjustment (often comping the ride), and posts one clean invoice. The self-referencing chain keeps the incident, the rescue, and the billing on a single auditable thread — no patchy side-tables.
Telemetry firehose: fact_vehicle_telemetry — per-second kinematics rollup with a raw_log_uri pointer to the full multi-sensor drive-log in object storage (keyed by drive_id); raw lidar/camera frames never enter the warehouse.
Ingestion — hot path & cold path. At ~10k vehicles × 100 ms that's ≈100k events/s, sent as Protobuf (not JSON) to survive the cellular uplink. A hot path (Pub/Sub → stream processor → Redis / Bigtable) serves <2 s fleet-ops alerts and the live map; a cold path lands the same events in the warehouse for overnight safety reporting and ML. The facts above are the cold path's grain; the hot path keeps only the latest state per vehicle.
Senior framing. "Replace the driver with a stack and the whole model becomes about versioned attribution. Each fact_disengagement_events row freezes FKs to the release, map, calibration, and ODD that were live at that instant — so when the stack ships v9 tomorrow, last week's incident still attributes to v8. That's the decision-replay pattern from fraud (#19), made safety-critical. ODD is the surge-zone temporal geofence (#2). The drive-log is the GPS firehose (#1). And the trip folds into the same marketplace fact (#1) so Finance never forks human and robot. Nothing here is new — it's the catalog, at the grade where a wrong join is a subpoena."
Section 2 — bridging the app layer (so "conversion dropped" has an answer)
Everything above is the physical-robotics half. The other half is the consumer app (iOS / Google Play) — and leaving it out makes a dropped booking-conversion number unanswerable: a buggy app release, or cars failing pickups? Two dims and a clickstream fact close the gap.
App-release dim: dim_mobile_app_versions — one row per store build: platform {iOS, Android}, store_origin {App Store, Google Play, TestFlight}, semantic_version, build_number, is_critical_patch. A bad drop becomes an isolable dimension.
Clickstream fact: fact_mobile_clickstream_events — the booking funnel, partitioned by date and clustered by app_version_key:
Consumer-device context (the rider's phone — the catalog's "device" idea on the app side) + 5G / Wi-Fi / LTE.
attribution_campaign
STRING
Marketing origin — ties to the ads attribution in #5.
The booking request (fact_ride_requests — the AV analog of #1's journey grain) carries session_id + app_version_key, so lineage runs fact_mobile_clickstream_events → fact_ride_requests → fact_av_trips → fact_disengagement_events in one chain. "Conversion fell on iOS v6.14.2" and "those sessions hit 3× the fail-safes" become the same query (SQL tab) — the app-vs-fleet blind spot is closed. Billing rolls up in a finance semantic view keyed on the request, where the fail-safe zero-charge and COPQ (cost of poor quality = comped fare + recovery credit) are computed.
Section 3 — fleet operations: assistance, energy, and rollout
Three operational realities round out the fleet model — each reuses a pattern already in the catalog.
Remote assistance ≠ disengagement. Most "human-in-the-loop" moments aren't autonomy dropping out — they're the car asking a remote operator to confirm an ambiguous scene (a construction cone, a hand-waved go-ahead) while staying in control. That's a distinct, far more frequent fact: fact_remote_assists — trip_key, vehicle_key, release_key (frozen), operator_id, request_reason, resolution {confirmed, rerouted, escalated_to_disengage}, latency_ms. Folding it into fact_disengagement_events inflates the disengagement rate and hides the real ops load; the ratio of assists to disengagements is itself a maturity signal.
Energy is a dispatch constraint, not a footnote. An EV fleet can't be dispatched like gas cars. fact_charging_sessions (vehicle_key, station_key, start/end, kwh, soc_before/soc_after) plus dim_charging_station make charging downtime a first-class utilization metric — and dispatch must gate on remaining range vs the trip distance and the ODD: a low-state-of-charge car is outside its operational envelope for a long trip exactly like a weather or geofence limit.
Multi-city rollout is just SCD2 on the ODD. When a new neighborhood opens, dim_odd gets a new versioned row — trips before the expansion evaluate against the old envelope, after against the new. No new machinery: the temporal-geofence pattern (#2) doubles as the rollout ledger, and "did the expansion change our disengagement rate?" is a before/after on one key.
The capstone move — counterfactual replay. Because every fact_disengagement_events froze its version FKs and a raw_log_uri to the drive-log, you can re-simulate the logged sensor stream through a new stack and ask "would v9 have disengaged here?" Logging a fact_sim_runs (disengagement_id, candidate_release, outcome) turns the safety case from "we shipped it and watched" into "we proved the regression was gone before we shipped" — the same frozen-state discipline, run forward instead of back.
Section 4 — liability: when the data is the legal record
In a human-driven world the driver is presumptively at fault, and insurance is priced on the driver. Remove the driver and fault shifts to product liability — the operator and its software stack. That promotes the data platform from "analytics" to the system of record for legal fault, which adds three requirements the rest of the model was already quietly satisfying.
1 · The evidence must be tamper-evident. An incident's frozen version FKs, the drive-log, and the sensor bundle are exhibits — and someone will challenge whether they were altered after the fact. Seal each incident's evidence as a content-addressed bundle and store its evidence_hash on a write-once row. Append-only stops being a tidiness preference and becomes admissibility.
2 · Fault attribution is incident-replay with money attached. You already froze the exact release × map × calibration × ODD and the failed_sensor_id on every disengagement — liability is that replay, now adjudicated:
fact_fault_findings (append-only — one row per adjudication version) — incident_id, finding_version, at_fault {av, third_party, shared, undetermined}, contributing_factor {sensor, perception, planning, third_party, road}, adjudicator {internal, insurer, court}, decided_at. An appeal appends a new version; the finding that was current when a payout posted is never overwritten — the risk-decision replay pattern (#19), in a courtroom.
3 · Underwriting is a feedback loop, not a side system. Premiums (self-insured reserves or a carrier's) are priced from the fleet's MMBD, disengagement, and incident history per release and ODD — telematics underwriting. The same safety facts that prove a release is safer also lower its insurance cost; and when a vendor's component is the contributing_factor, failed_sensor_id becomes a subrogation path to recover from the manufacturer. Payouts post to the append-only financial ledger from #1 — money moves by double entry, never an UPDATE.
Section 5 — the runnable schema (DDL)
The grids above as copy-pasteable DDL (BigQuery / Snowflake dialect) — dimensions first, then the high-volume facts with their physical PARTITION BY / CLUSTER BY layout. SCD2 dims carry valid_from/valid_to; "device" shows up three ways — the car (dim_vehicles), its sensors (dim_devices), and the rider's phone on fact_mobile_clickstream_events.
Dimensions
-- Vehicle master asset (the car); SCD2
CREATE TABLE dim_vehicles (
vehicle_key BIGINT NOT NULL,
vin VARCHAR(32) NOT NULL,
platform VARCHAR(40) NOT NULL, -- 'Waymo Driver Gen 5', 'Tesla Robotaxi'
deployment_city VARCHAR(40) NOT NULL,
calibration_version VARCHAR(40) NOT NULL,
valid_from TIMESTAMP NOT NULL,
valid_to TIMESTAMP NOT NULL DEFAULT TIMESTAMP '9999-12-31 00:00:00',
PRIMARY KEY (vehicle_key)
);
-- Per-sensor hardware inventory, snowflaked off the vehicle; SCD2 (your dim_devices)
CREATE TABLE dim_devices (
sensor_id VARCHAR(50) NOT NULL,
vehicle_key BIGINT NOT NULL,
sensor_type VARCHAR(30) NOT NULL, -- 'LiDAR','Long-Range Camera','Radar','Compute'
sensor_location VARCHAR(30), -- 'Roof_Front','Bumper_Left'
manufacturer VARCHAR(50) NOT NULL,
firmware_version VARCHAR(50) NOT NULL,
status VARCHAR(20) NOT NULL, -- 'active','degraded','failed'
valid_from TIMESTAMP NOT NULL,
valid_to TIMESTAMP NOT NULL DEFAULT TIMESTAMP '9999-12-31 00:00:00',
PRIMARY KEY (sensor_id, valid_from),
FOREIGN KEY (vehicle_key) REFERENCES dim_vehicles(vehicle_key)
);
-- The "driver": the autonomy stack; SCD2
CREATE TABLE dim_software_release (
release_key BIGINT NOT NULL,
stack_version VARCHAR(40) NOT NULL, -- 'v9.0.3'
perception_model_hash VARCHAR(64) NOT NULL,
planner_version VARCHAR(40) NOT NULL,
valid_from TIMESTAMP NOT NULL,
valid_to TIMESTAMP NOT NULL DEFAULT TIMESTAMP '9999-12-31 00:00:00',
PRIMARY KEY (release_key)
);
CREATE TABLE dim_hd_map (
map_key BIGINT NOT NULL,
map_version VARCHAR(40) NOT NULL,
tile_set VARCHAR(60) NOT NULL,
survey_date DATE NOT NULL,
valid_from TIMESTAMP NOT NULL,
valid_to TIMESTAMP NOT NULL DEFAULT TIMESTAMP '9999-12-31 00:00:00',
PRIMARY KEY (map_key)
);
-- Operational Design Domain: temporal geofence; SCD2
CREATE TABLE dim_odd (
odd_key BIGINT NOT NULL,
geofence_h3 ARRAY, -- set of allowed H3 cells
weather_env VARCHAR(60) NOT NULL, -- 'clear|light_rain'
time_env VARCHAR(60) NOT NULL, -- 'always' | 'day_only'
valid_from TIMESTAMP NOT NULL,
valid_to TIMESTAMP NOT NULL DEFAULT TIMESTAMP '9999-12-31 00:00:00',
PRIMARY KEY (odd_key)
);
-- Consumer mobile-app store builds (your dim_mobile_app_versions)
CREATE TABLE dim_mobile_app_versions (
app_version_key VARCHAR(50) NOT NULL,
platform VARCHAR(10) NOT NULL, -- 'iOS','Android'
store_origin VARCHAR(30) NOT NULL, -- 'App Store','Google Play','TestFlight'
semantic_version VARCHAR(20) NOT NULL, -- 'v6.14.2'
build_number INT NOT NULL,
is_critical_patch BOOLEAN NOT NULL,
PRIMARY KEY (app_version_key)
);
CREATE TABLE dim_users (
user_id VARCHAR(50) NOT NULL,
signup_ts TIMESTAMP NOT NULL,
payment_type VARCHAR(20) NOT NULL,
is_tester BOOLEAN NOT NULL,
customer_tier VARCHAR(20) NOT NULL DEFAULT 'standard',
PRIMARY KEY (user_id)
);
Facts (high-volume tables carry their physical layout)
-- Booking-funnel clickstream (your fact_mobile_clickstream_events)
CREATE TABLE fact_mobile_clickstream_events (
event_id VARCHAR(50) NOT NULL,
session_id VARCHAR(50) NOT NULL,
user_id VARCHAR(50) NOT NULL,
app_version_key VARCHAR(50) NOT NULL,
event_ts TIMESTAMP NOT NULL,
event_name VARCHAR(40) NOT NULL, -- app_launch|destination_searched|quote_generated|booking_confirmed|app_crash
device_model VARCHAR(50), -- 'iPhone 15 Pro' (the rider's phone)
os_version VARCHAR(30),
network_type VARCHAR(10), -- '5G','Wi-Fi','LTE'
attribution_campaign VARCHAR(100),
PRIMARY KEY (event_id),
FOREIGN KEY (user_id) REFERENCES dim_users(user_id),
FOREIGN KEY (app_version_key) REFERENCES dim_mobile_app_versions(app_version_key)
)
PARTITION BY DATE(event_ts)
CLUSTER BY app_version_key, event_name;
-- Booking request (AV analog of #1's journey grain) (your fact_ride_requests)
CREATE TABLE fact_ride_requests (
request_id VARCHAR(50) NOT NULL,
session_id VARCHAR(50) NOT NULL, -- lineage back to fact_mobile_clickstream_events
user_id VARCHAR(50) NOT NULL,
app_version_key VARCHAR(50) NOT NULL, -- isolates a buggy build
request_ts TIMESTAMP NOT NULL,
pickup_h3 VARCHAR(15) NOT NULL,
dropoff_h3 VARCHAR(15) NOT NULL,
estimated_fare DECIMAL(8,2) NOT NULL,
request_status VARCHAR(20) NOT NULL, -- fulfilled|no_car|user_cancelled
PRIMARY KEY (request_id),
FOREIGN KEY (user_id) REFERENCES dim_users(user_id),
FOREIGN KEY (app_version_key) REFERENCES dim_mobile_app_versions(app_version_key)
)
PARTITION BY DATE(request_ts)
CLUSTER BY app_version_key, request_status;
-- Autonomous trip; version FKs frozen at dispatch; rescue chain (your fact_trips)
CREATE TABLE fact_av_trips (
trip_key BIGINT NOT NULL,
request_id VARCHAR(50) NOT NULL, -- shared across rescue legs
marketplace_trip_key BIGINT, -- folds into fct_trips (#1)
vehicle_key BIGINT NOT NULL,
release_key BIGINT NOT NULL, -- frozen
map_key BIGINT NOT NULL, -- frozen
odd_key BIGINT NOT NULL, -- frozen
user_id VARCHAR(50) NOT NULL,
start_ts TIMESTAMP,
end_ts TIMESTAMP,
autonomous_miles DECIMAL(8,2) DEFAULT 0,
actual_fare DECIMAL(8,2) DEFAULT 0,
fare_adjustment DECIMAL(8,2) DEFAULT 0,
trip_status VARCHAR(30) NOT NULL, -- en_route|in_transit|completed|aborted_failsafe|rescue_dispatched
parent_trip_id BIGINT, -- self-FK: rescue -> broken trip
PRIMARY KEY (trip_key),
FOREIGN KEY (request_id) REFERENCES fact_ride_requests(request_id),
FOREIGN KEY (vehicle_key) REFERENCES dim_vehicles(vehicle_key),
FOREIGN KEY (release_key) REFERENCES dim_software_release(release_key),
FOREIGN KEY (parent_trip_id) REFERENCES fact_av_trips(trip_key)
)
PARTITION BY DATE(start_ts)
CLUSTER BY vehicle_key, trip_status;
-- Telemetry firehose: per-second rollup; raw frames in object store (your fact_vehicle_telemetry)
CREATE TABLE fact_vehicle_telemetry (
vehicle_key BIGINT NOT NULL,
event_ts TIMESTAMP NOT NULL, -- vehicle clock (event-time)
sec_bucket TIMESTAMP NOT NULL,
h3_index VARCHAR(15) NOT NULL,
speed_mph INT,
soc_pct INT, -- state of charge
autonomy_mode VARCHAR(20) NOT NULL, -- fully_autonomous|remote_guidance|manual
active_trip_key BIGINT, -- NULL when cruising empty
raw_log_uri VARCHAR(255), -- pointer to full multi-sensor drive-log
FOREIGN KEY (vehicle_key) REFERENCES dim_vehicles(vehicle_key)
)
PARTITION BY DATE(event_ts)
CLUSTER BY vehicle_key, autonomy_mode;
-- Disengagement: append-only, frozen version FKs (your fact_disengagement_events)
CREATE TABLE fact_disengagement_events (
disengagement_id VARCHAR(50) NOT NULL,
trip_key BIGINT, -- NULL if no passenger
vehicle_key BIGINT NOT NULL,
release_key BIGINT NOT NULL, -- frozen
map_key BIGINT NOT NULL, -- frozen
odd_key BIGINT NOT NULL, -- frozen
failed_sensor_id VARCHAR(50), -- -> dim_devices (root-cause / subrogation)
event_ts TIMESTAMP NOT NULL,
location_h3 VARCHAR(15) NOT NULL,
type VARCHAR(30) NOT NULL, -- safety_driver_takeover|remote_operator|system_fault|hard_brake
reg_reportable BOOLEAN NOT NULL,
raw_log_uri VARCHAR(255),
PRIMARY KEY (disengagement_id),
FOREIGN KEY (vehicle_key) REFERENCES dim_vehicles(vehicle_key),
FOREIGN KEY (release_key) REFERENCES dim_software_release(release_key),
FOREIGN KEY (failed_sensor_id) REFERENCES dim_devices(sensor_id)
)
PARTITION BY DATE(event_ts)
CLUSTER BY vehicle_key, type;
-- Liability: collision/claim + append-only fault adjudications
CREATE TABLE fact_incidents (
incident_id VARCHAR(50) NOT NULL,
disengagement_id VARCHAR(50),
trip_key BIGINT,
vehicle_key BIGINT NOT NULL,
failed_sensor_id VARCHAR(50),
counterparty_type VARCHAR(20), -- vehicle|pedestrian|cyclist|property
severity VARCHAR(20),
claim_amount_usd DECIMAL(12,2) DEFAULT 0,
evidence_hash VARCHAR(64) NOT NULL, -- content hash of sealed evidence (write-once)
current_finding_version INT NOT NULL DEFAULT 1,
event_ts TIMESTAMP NOT NULL,
PRIMARY KEY (incident_id),
FOREIGN KEY (disengagement_id) REFERENCES fact_disengagement_events(disengagement_id)
);
CREATE TABLE fact_fault_findings (
incident_id VARCHAR(50) NOT NULL,
finding_version INT NOT NULL, -- append-only; appeals add a version
at_fault VARCHAR(20) NOT NULL, -- av|third_party|shared|undetermined
contributing_factor VARCHAR(20), -- sensor|perception|planning|third_party|road
adjudicator VARCHAR(20) NOT NULL, -- internal|insurer|court
decided_at TIMESTAMP NOT NULL,
PRIMARY KEY (incident_id, finding_version),
FOREIGN KEY (incident_id) REFERENCES fact_incidents(incident_id)
);
Incident replay — attribute every disengagement to the exact state that was driving
The regulator's question is "what was in control when this happened?" Because each fact_disengagement_events row froze the version FKs at event time, the answer is a plain join — and it stays correct forever, even after the stack, map, and calibration all advance.
SELECT d.disengagement_id, d.ts, d.type, d.location_h3,
r.stack_version,
m.map_version,
v.calibration_version,
o.odd_id
FROM fact_disengagement_events d
JOIN dim_vehicles v ON v.vehicle_key = d.vehicle_key
JOIN dim_software_release r ON r.release_key = d.release_key
JOIN dim_hd_map m ON m.map_key = d.map_key
JOIN dim_odd o ON o.odd_key = d.odd_key
WHERE d.reg_reportable = TRUE
AND d.ts >= DATE '2026-01-01'
ORDER BY d.ts;
-- MMBD — Mean Miles Between Disengagements — by release (the safety regression check)
SELECT r.stack_version,
COUNT(*) AS disengagements,
SUM(t.autonomous_miles) AS miles,
ROUND(SUM(t.autonomous_miles) / NULLIF(COUNT(*),0), 1) AS mmbd
FROM fact_disengagement_events d
JOIN dim_software_release r ON r.release_key = d.release_key
JOIN fact_av_trips t ON t.trip_key = d.trip_key
WHERE d.reg_reportable = TRUE
GROUP BY r.stack_version
ORDER BY mmbd DESC;
Shipping a new release never rewrites the first query (FKs are frozen), and the second turns "is v9 safer than v8?" into a one-line regression on MMBD — Mean Miles Between Disengagements, the north-star metric every AV program (and the CA DMV report) lives or dies by.
Rescue chain — how long was a stranded rider waiting?
Because the rescue trip carries parent_trip_id, "who hit a vehicle failure today, and how long until rescue?" is a clean self-join rather than a forensic reconstruction:
WITH failed AS (
SELECT trip_key, user_id, vehicle_key AS broken_vehicle, end_ts AS failure_time
FROM fact_av_trips
WHERE trip_status = 'aborted_failsafe' AND DATE(start_ts) = CURRENT_DATE
),
rescue AS (
SELECT parent_trip_id, vehicle_key AS rescue_vehicle, start_ts AS rescue_pickup
FROM fact_av_trips
WHERE parent_trip_id IS NOT NULL AND DATE(start_ts) = CURRENT_DATE
)
SELECT f.user_id, f.broken_vehicle, r.rescue_vehicle,
TIMESTAMP_DIFF(r.rescue_pickup, f.failure_time, MINUTE) AS minutes_stranded
FROM failed f
JOIN rescue r ON r.parent_trip_id = f.trip_key
ORDER BY minutes_stranded DESC;
Cross-ecosystem root-cause — app bug or fleet failure?
The query the app-vs-fleet bridge exists for: booking-conversion by app release, cross-referenced with the physical fail-safes those same sessions witnessed.
SELECT v.platform, v.semantic_version,
COUNT(DISTINCT c.session_id) AS app_sessions,
COUNT(DISTINCT r.request_id) AS ride_requests,
SAFE_DIVIDE(COUNT(DISTINCT r.request_id),
COUNT(DISTINCT c.session_id)) * 100 AS booking_conversion_pct,
COUNT(DISTINCT CASE WHEN t.trip_status = 'aborted_failsafe'
THEN t.trip_key END) AS failsafes_witnessed
FROM fact_mobile_clickstream_events c
JOIN dim_mobile_app_versions v ON v.app_version_key = c.app_version_key
LEFT JOIN fact_ride_requests r ON r.session_id = c.session_id
LEFT JOIN fact_av_trips t ON t.request_id = r.request_id
WHERE c.event_ts >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
GROUP BY v.platform, v.semantic_version
ORDER BY booking_conversion_pct ASC;
Low conversion with near-zero failsafes_witnessed points at the app build; low conversion with elevated fail-safes points at the fleet. One query, two ecosystems, root cause isolated.
Assist-to-disengagement ratio — a fleet-maturity signal
SELECT r.stack_version,
COUNT(DISTINCT a.assist_id) AS remote_assists,
COUNT(DISTINCT d.disengagement_id) AS disengagements,
ROUND(COUNT(DISTINCT a.assist_id)
/ NULLIF(COUNT(DISTINCT d.disengagement_id), 0), 1) AS assists_per_disengage
FROM dim_software_release r
LEFT JOIN fact_remote_assists a ON a.release_key = r.release_key
LEFT JOIN fact_disengagement_events d ON d.release_key = r.release_key
GROUP BY r.stack_version
ORDER BY assists_per_disengage DESC;
A rising ratio means the car increasingly asks for help instead of ceding control — maturity. Conflate the two facts into one and this signal vanishes entirely.
Subrogation — which sensor vendor's failures are costing us?
When a component caused the incident, the operator recovers from the manufacturer. failed_sensor_id → dim_devices turns liability into a vendor-accountability ledger:
SELECT s.manufacturer, s.sensor_type,
COUNT(DISTINCT i.incident_id) AS incidents,
SUM(i.claim_amount_usd) AS recoverable_exposure_usd
FROM fact_incidents i
JOIN fact_fault_findings f ON f.incident_id = i.incident_id
AND f.finding_version = i.current_finding_version
JOIN dim_devices s ON s.sensor_id = i.failed_sensor_id
WHERE f.contributing_factor = 'sensor'
AND f.at_fault IN ('av', 'shared')
GROUP BY s.manufacturer, s.sensor_type
ORDER BY recoverable_exposure_usd DESC;
The same failed_sensor_id that explained a disengagement now sizes a vendor recovery — one column, three uses: root-cause, failure-profile, and subrogation.
The decisions that earn the level
Frozen version FKs on the event, not a live lookup. Snapshot release/map/calibration/ODD onto fact_disengagement_events at write time. Resolving "current" versions at query time would re-attribute old incidents to today's stack — destroying the regulatory record.
ODD as a temporal geofence. "Where may the AV drive?" changes with construction, weather, and policy. Model it as SCD2 (the dim_surge_zone pattern), so a trip is evaluated against the ODD that was in force, not today's.
Disengagement is append-only. Re-classifying a takeover after review appends a new adjudication row; the original verdict is never UPDATEd — same audit discipline as the risk ledger in #1/#19.
One marketplace fact for mixed fleets. Fold fact_av_trips into the fct_trips hierarchy so a city dispatching humans and robots reports revenue once. The grain was built platform-agnostic in #1 for exactly this.
What disqualifies an answer
A mutable "current stack version" column. If the vehicle row is overwritten on every OTA update, you can never answer "which release was driving last Tuesday?" — the entire safety case collapses.
Raw drive-logs in the warehouse. Terabytes/vehicle/day of lidar and video — the warehouse holds the per-second rollup and a pointer; the frames live in object storage keyed by drive_id.
ODD as a static table. A non-temporal geofence means you evaluate yesterday's trip against today's operational boundary — the same restate-ability bug as locking FX at the wrong time.
Forking human and AV revenue. Two parallel trip facts force Finance to reconcile by hand and double-count a mixed-fleet market. Fold both into one grain.
The tunnel problem (late, out-of-order data). A car buffers 15 minutes in a cellular dead zone, then dumps it on reconnect. Process on event-time, not processing-time, with a bounded allowed-lateness window — otherwise MMBD and real-time alerts are computed against arrival order and are simply wrong. Same late-arriving discipline as #5 / #15 / #17.
Schema evolution as hardware changes. A new thermal sensor or swapped camera alters the payload. Land experimental signals in a semi-structured / VARIANT column behind a schema registry, so a sensor swap never breaks the pipeline or forces a fact migration.
Geographic data skew. A busy SF intersection emits 10× the events and disengagements of a Phoenix freeway. Partition by event_date and cluster by geo / vehicle_id, or the hot cells throttle every query.
Worked example — an incident survives a stack upgrade
Vehicle AV-7 has a remote_operator disengagement at 2026-05-02 14:03 while running stack v8.2, map 2026.4, calib C-19. The next day the fleet upgrades to v9.0 (a new SCD2 release row).
query run on…
disengagement_id
stack_version returned
May 2 (v8.2 live)
D-551
v8.2
May 10 (v9.0 live)
D-551
v8.2 — unchanged
Because fact_disengagement_events.release_key was frozen at event time, the incident remains attributed to v8.2 forever. A live lookup of "AV-7's current release" would now wrongly blame v9.0 — falsifying the regulatory record and the v8-vs-v9 safety comparison. Frozen FKs make replay deterministic.
Drill · answer this in 90 seconds:
Design the data model for a driverless robotaxi fleet that must let a regulator replay any safety incident against the exact software, map, and sensor state that was driving — while ingesting terabytes of sensor data per vehicle per day and sharing a marketplace with human drivers.
Senior framing. "The driver becomes four SCD2 dimensions — dim_software_release (who was driving), dim_hd_map, dim_vehicles (sensor calibration), and dim_odd (a temporal geofence). fact_disengagement_events is append-only and freezes FKs to those four at event time, so incident replay is a deterministic join that survives every future upgrade — decision-replay (#19) at safety grade. Raw drive-logs stay in object storage with a per-second rollup and pointer in the warehouse (the GPS-firehose boundary from #1). And fact_av_trips folds into the marketplace fct_trips hierarchy so human and robot revenue reconcile on one grain. The car is dim_vehicles and its sensors are dim_devices. It's the whole catalog, at the grade where a wrong join is a subpoena."
Recurring events, time zones & free/busy
Google Calendar · productivity · recurrence-as-rule + occurrence-as-fact
Design the analytical model behind Google Calendar: recurring events with per-instance exceptions, IANA time zones with daylight-saving correctness, invitations with accept / decline / tentative, and "find a time" free/busy across many attendees — with point-in-time queryability ("what did my Tuesday look like as Calendar showed it last Monday?").
Why this is hard — the design tension
A recurring event is one logical row that denotes an unbounded set of occurrences — "every weekday, forever" is a single rule and an infinite series. Materialize occurrences as rows and you sign up for a horizon plus a forever-running backfill; expand the rule on every read and you recompute a deterministic answer, with a FREQ=MINUTELY rule waiting to blow up a query. The senior answer mirrors Airbnb's calendar-as-fact: keep the rule in an SCD2 series dimension, materialize an occurrence fact to a rolling horizon, and overlay exceptions as an append-only ledger keyed by recurrence_id. Time is the second landmine: a 09:00 stand-up must stay 09:00 across a DST change, so the rule lives in local wall-clock + IANA zone and the UTC instant is derived per occurrence — never frozen.
Section 1 — Dimensional Model Overview
Fact Tables
fct_event_occurrences — The "Occurrence" grain. One row per (event × occurrence_start), materialized from the recurrence rule out to a rolling horizon. The hot read path for every calendar view.
fct_event_exceptions — The "Override" grain. One append-only row per (series × recurrence_id) cancel / move / edit. Drives recurrence health.
fct_invitations — The "RSVP" grain. One row per (occurrence × attendee), with response state and response latency. Append-only response changes.
fct_freebusy_slots — The "Availability" grain. One row per (calendar × 15-min slot). The bitmap-as-fact behind "find a time".
Dimension Tables
dim_events (SCD2 — the recurrence master: title, RRULE, organizer drift)
fct_event_exceptions — append-only: series_sk, recurrence_id (original wall-clock start), exception_type ∈ {cancel, move, edit}, new_start_utc, overridden_cols, created_ts. A "move" carries its own start; a "cancel" tombstones one occurrence; an "edit" records which columns the user pinned.
Section 3 — How the model serves each stakeholder
User · calendar view — Reads fct_event_occurrences for the visible window WHERE NOT is_cancelled; the override row already carries the moved time, so no rule re-expansion at read time.
Organizer · response rollup — fct_invitations aggregated to (event, response) gives the accept / decline / tentative tally, with response latency for nudges.
Scheduler & rooms · find-a-time — fct_freebusy_slots intersected across attendees returns common open slots. Rooms get a synchronous double-book check against real occurrences at booking time.
Workspace admin · meeting load — Busy minutes per user per day from fct_freebusy_slots surface focus-time erosion and meeting-heavy orgs.
Section 4 — Why occurrence-as-fact + a local-time rule
Without materialization: every calendar open re-expands every RRULE — deterministic work, redone forever, and exposed to expansion bombs.
With a rolling horizon (12 months): a calendar view is a range scan on fct_event_occurrences; a change-stream job re-materializes only the touched series. Queries beyond the horizon fall back to on-demand expansion (rare).
DST correctness: store occurrence_start_local + tz_id; derive occurrence_start_utc via dim_timezone. 09:00 Pacific is 16:00Z in summer and 17:00Z in winter — automatically, because UTC is never frozen.
Exceptions append-only: a cancel or move never UPDATEs the series in place; it writes an fct_event_exceptions row and flips the occurrence's flags. Full audit trail; recurrence health is a GROUP BY series.
Enhanced graphical data model — Google Calendar
Section 7 — SQL analysis
Sample data — one weekly stand-up series expanded to 5 occurrences (wk3 cancelled, wk4 moved Tue→Wed), 3 attendees, one day of free/busy
-- the recurrence master (SCD2 series dimension)
INSERT INTO dim_events VALUES
(501,'EVT_STANDUP','Daily Stand-up','FREQ=WEEKLY;BYDAY=TU','America/Los_Angeles',9001,TRUE);
-- occurrences materialized from the rule; 09:00 PDT -> 16:00Z (summer)
INSERT INTO fct_event_occurrences VALUES
-- event,cal, occ_start_local, occ_start_utc, occ_end_utc, tz, recurrence_id, is_override,is_cancelled,transparency
(501,77,'2026-06-09 09:00','2026-06-09 16:00+00','2026-06-09 16:15+00','PT','2026-06-09 09:00',FALSE,FALSE,'opaque'),
(501,77,'2026-06-16 09:00','2026-06-16 16:00+00','2026-06-16 16:15+00','PT','2026-06-16 09:00',FALSE,FALSE,'opaque'),
(501,77,'2026-06-23 09:00', NULL, NULL, 'PT','2026-06-23 09:00',TRUE, TRUE, 'opaque'), -- cancelled
(501,77,'2026-07-01 14:00','2026-07-01 21:00+00','2026-07-01 21:15+00','PT','2026-06-30 09:00',TRUE, FALSE,'opaque'), -- moved from Tue 6/30
(501,77,'2026-07-07 09:00','2026-07-07 16:00+00','2026-07-07 16:15+00','PT','2026-07-07 09:00',FALSE,FALSE,'opaque');
-- the append-only exception ledger that produced the two odd rows above
INSERT INTO fct_event_exceptions VALUES
(501,'2026-06-23 09:00','cancel', NULL, NULL, '2026-06-20 11:00'),
(501,'2026-06-30 09:00','move', '2026-07-01 21:00+00','{start}', '2026-06-28 09:30');
-- per-occurrence RSVPs for the MOVED instance (someone declines just that one)
INSERT INTO fct_invitations VALUES
(504,9001,'accepted', '2026-06-28 09:31', 1, TRUE), -- organizer
(504,9002,'accepted', '2026-06-28 12:00', 149,FALSE),
(504,9003,'declined', '2026-06-29 08:00', NULL,FALSE); -- Chen can't do Wed
-- one UTC day of 15-min free/busy for 3 attendee calendars (08:00-14:00 PT shown)
INSERT INTO fct_freebusy_slots VALUES -- (calendar_sk, slot_start_utc, is_busy)
(77,'2026-07-01 16:00+00',1),(77,'2026-07-01 16:15+00',1), -- Alice 09:00-10:30
-- ... Bren cal 78, Chen cal 79 populated similarly ...
(78,'2026-07-01 15:30+00',1),(79,'2026-07-01 17:00+00',1);
A. User · calendar view (exceptions already applied)
Q1 — My stand-up occurrences over the next month, cancellations dropped, moves honored
SELECT o.recurrence_id AS original_slot,
o.occurrence_start_local AS shows_at_local,
o.occurrence_start_utc AS instant_utc,
CASE WHEN o.is_override THEN 'moved' ELSE 'series' END AS kind
FROM fct_event_occurrences o
WHERE o.event_sk = 501
AND o.is_cancelled = FALSE
AND o.occurrence_start_local >= '2026-06-09'
AND o.occurrence_start_local < '2026-07-09'
ORDER BY o.occurrence_start_local;
-- 4 rows: 6/9, 6/16, (6/23 dropped), 7/1 14:00 'moved', 7/7. No RRULE re-expansion.
B. Scheduler · free/busy find-a-time
Q2 — First 60-min slot on Jul 1 where Alice (77), Bren (78) and Chen (79) are all free
WITH combined AS (
SELECT slot_start_utc, SUM(is_busy) AS busy_heads
FROM fct_freebusy_slots
WHERE calendar_sk IN (77,78,79)
AND slot_start_utc >= '2026-07-01 15:00+00' -- 08:00 PT
AND slot_start_utc < '2026-07-01 21:00+00' -- 14:00 PT
GROUP BY slot_start_utc
),
runs AS ( -- a free slot, and how many free slots have run up to it
SELECT slot_start_utc,
COUNT(*) FILTER (WHERE busy_heads > 0)
OVER (ORDER BY slot_start_utc) AS busy_group
FROM combined WHERE busy_heads = 0
)
SELECT MIN(slot_start_utc) AS open_from,
MIN(slot_start_utc) + INTERVAL '60 min' AS open_to
FROM runs
GROUP BY busy_group
HAVING COUNT(*) >= 4 -- 4 x 15-min = 60 min, all heads free
ORDER BY open_from
LIMIT 1; -- -> 18:00Z = 11:00 PT
C. Organizer · RSVP funnel + recurrence health
Q3 — Response mix for the moved instance, and exception load per series
-- RSVP funnel for one occurrence
SELECT response, COUNT(*) AS heads,
ROUND(AVG(latency_min)) AS avg_latency_min
FROM fct_invitations
WHERE occurrence_sk = 504
GROUP BY response; -- accepted 2, declined 1
-- recurrence health: which series churn their occurrences?
SELECT series_sk,
COUNT(*) AS exceptions,
COUNT(*) FILTER (WHERE exception_type = 'cancel') AS cancels,
COUNT(*) FILTER (WHERE exception_type = 'move') AS moves
FROM fct_event_exceptions
GROUP BY series_sk
HAVING COUNT(*) >= 2
ORDER BY exceptions DESC; -- a series with many cancels is a 'zombie' to nudge
Section 8 — Why this works
Occurrence-as-fact + rolling horizon = calendar views are range scans, not RRULE re-expansions, and they survive open-ended series.
Local time + dim_timezone = DST is correct by construction; UTC is derived, never frozen, so a clock change can't silently move a million stand-ups.
Append-only exceptions keyed by recurrence_id = cancels and moves carry an audit trail, and per-instance RSVPs fall out for free.
Free/busy as a 15-min slot fact = "find a time" is a bitwise OR across attendees, advisory and microsecond-cheap; rooms get a synchronous check where double-booking is unacceptable.
Senior framing. "Recurrence is the unlock and the trap: one row means infinite occurrences, so I keep the RRULE in an SCD2 series dim and materialize occurrence-as-fact to a rolling horizon — the calendar analogue of Airbnb's calendar-as-fact. The non-obvious correctness move is storing local wall-clock plus the IANA zone and deriving UTC per occurrence; freeze UTC and the next DST change moves every recurring meeting an hour. Exceptions are an append-only ledger keyed by recurrence_id, and free/busy is a 15-minute slot fact you OR across attendees — that's what makes 'find a time' cheap enough for AI agents to poll."
Worked example — the stand-up that skipped a week, then moved
Series EVT_STANDUP = FREQ=WEEKLY;BYDAY=TU in America/Los_Angeles, 09:00–09:15. Expanded across June–July it yields Tuesdays 6/9, 6/16, 6/23, 6/30, 7/7.
Each occurrence's UTC is derived: it's summer (PDT, UTC−7), so 09:00 local = 16:00Z. In winter the same series would land at 17:00Z — no row rewrite needed.
Week 3 (6/23) is cancelled: one fct_event_exceptions row (type=cancel) and the occurrence flips is_cancelled=TRUE with NULL UTC. Q1 drops it.
Week 4 (6/30) is moved to Wed 7/1 14:00: an exception (type=move, recurrence_id=2026-06-30 09:00) and an override occurrence at 21:00Z (14:00 PDT). The override carries its own attendees, so Chen declines just that instance while staying on the series.
Rescheduling the moved instance, the planner asks free/busy for 3 attendees on 7/1: Alice busy 09:00–10:30 & 12:00–13:00, Bren 08:30–09:30 & 13:00–14:00, Chen 10:00–11:00. OR the slot masks → the first all-free 60-min run is 11:00–12:00 PT (18:00Z). Q2 returns exactly that.
Recurrence health (Q3) shows series 501 with 2 exceptions (1 cancel, 1 move) — healthy. A series with cancels every week is a "zombie" the workspace nudges to delete.
Drill · answer this in 90 seconds:
Design the model behind Google Calendar: recurring events with per-instance exceptions, IANA time zones with DST correctness, accept/decline/tentative invitations, and free/busy "find a time" across many attendees — with point-in-time queryability.
Senior framing. "One row means infinite occurrences, so I keep the RRULE in an SCD2 series dim and materialize occurrence-as-fact to a rolling horizon — Airbnb's calendar-as-fact, applied to scheduling. I store local wall-clock + IANA zone and derive UTC per occurrence so DST can't move a million stand-ups. Exceptions are an append-only ledger keyed by recurrence_id (per-instance RSVPs fall out for free), and free/busy is a 15-minute slot fact I OR across attendees so 'find a time' is microsecond-cheap."