Every ride-share schema attributes a trip to a driver. Here the driver is software, and that single fact rewrites the model into a question about versioned responsibility, terabytes of sensor data per vehicle per day, and a regulator who owns one of your fact tables. A complete walkthrough: the four versioned dimensions that replace the driver, the append-only disengagement with frozen version keys, the deterministic incident replay, the temporal geofence, the rescue chain, and the liability ledger where a wrong join is a subpoena.
This is the capstone of the marketplace family, and it is hard for one reason that reshapes everything: the entity every other ride-share model leans on — the driver — has been deleted and replaced by a release pipeline.
"Design the data platform for a driverless ride-hail fleet — Waymo, a Tesla robotaxi, Uber AV. The autonomy stack replaces the human driver. Serve Operations, Safety & Regulatory, and Finance — at terabytes of sensor data per vehicle per day. How would you scope this out?"
Every human ride-share model attributes a trip to a driver, prices insurance on that driver, and presumes them at fault when something goes wrong. Remove the driver and all three assumptions collapse at once. Responsibility for any action now traces to a software release crossed with an HD-map version, a sensor calibration, and an operational design domain — each of which changes over time and must be reconstructable to the millisecond, because a regulator will ask, a court will ask, and "the current version" is the wrong answer to both. The driver did not vanish; it became a stack of versioned dimensions, and the entire schema is organized around making that stack replayable.
A weak answer models the AV like a human driver with a robot in the seat. A strong answer notices the three structural forces first and names them before any boxes:
Scope first, because the safety boundary is the design. State what you are building: a fleet data platform whose spine is versioned attribution — every safety-relevant event freezes the versions that produced it — folding into a shared marketplace trip fact so Finance never forks human and robot. State what you are deliberately treating as a callable service: the perception and planning stacks themselves (the platform records which stack ran, not how it thinks), the maps survey pipeline, and real-time motion control. State what you punt: payments rails (treated as a downstream double-entry ledger), the consumer rendering of the app, and rider identity beyond the trip.
Then the envelope, volunteered. A fleet at city scale:
| Quantity | Estimate | Consequence |
|---|---|---|
| Vehicles in fleet | ~10,000 | Sets the telemetry fan-in |
| Telemetry cadence | 100 ms | The event-time tempo |
| Telemetry events | ≈ 100 K / s | Hot path vs cold path — the number that splits the pipeline |
| Raw sensor data | terabytes / vehicle / day | Never in the warehouse — drive-log + pointer only |
| Disengagements / 1k miles | ~0.1–1 | The denominator of MMBD (Mean Miles Between Disengagements); the regulatory north star |
| Version dims (the "driver") | 4 × SCD2 | Frozen onto every safety event at write time |
| Cellular dead-zone buffer | up to 15 min | Late, out-of-order data — event-time or be wrong |
Notice the asymmetry: a hundred thousand telemetry events a second is a modest stream, but the raw sensor data behind each vehicle is terabytes a day — three orders of magnitude apart. That gap is the whole architecture. The warehouse holds the small, structured stream and a pointer; the firehose stays in object storage. And the single most consequential design choice — frozen version keys on the safety event — is invisible in the numbers and decisive in court.
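The envelope arithmetic can be checked in a few lines. This is a rough sketch: the fleet size and cadence come from the table, while the 200-byte event size and the 1 TB/day stand-in for "terabytes" are assumptions made only to put a number on the gap.

```python
# Back-of-envelope check of the envelope table.
VEHICLES = 10_000      # from the table
CADENCE_MS = 100       # from the table

events_per_vehicle_per_s = 1000 // CADENCE_MS              # 10 events/s per vehicle
fleet_events_per_s = VEHICLES * events_per_vehicle_per_s   # fleet-wide fan-in

# ASSUMPTIONS (not in the source): bytes per structured event, and 1 TB/day
# as a stand-in for "terabytes" of raw sensor data per vehicle.
ASSUMED_EVENT_BYTES = 200
ASSUMED_RAW_BYTES_PER_DAY = 10**12

structured_bytes_per_vehicle_day = (
    events_per_vehicle_per_s * ASSUMED_EVENT_BYTES * 86_400
)
raw_to_structured_ratio = ASSUMED_RAW_BYTES_PER_DAY / structured_bytes_per_vehicle_day

print(f"fleet telemetry: {fleet_events_per_s:,} events/s")
print(f"structured per vehicle per day: {structured_bytes_per_vehicle_day / 1e6:.1f} MB")
print(f"raw / structured: about {raw_to_structured_ratio:,.0f}x")
```

Under these assumed sizes the structured stream is a few hundred megabytes per vehicle per day against a terabyte of raw data, which is the gap that keeps the firehose out of the warehouse.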
One architecture, two paths and one spine. A hot path serves sub-two-second fleet-ops alerts; a cold path lands the same events for overnight safety reporting; and the spine is the versioned-dimension join that every safety event freezes at write time, so replay is deterministic forever.
Three properties of this picture do most of the interview's work. First, the firehose never touches the warehouse — raw lidar and camera flow vehicle → object store, and only a per-second rollup plus a raw_log_uri pointer reaches the analytic layer. Second, the version dimensions are frozen onto the safety event at write time, not looked up at query time: the dashed arrow into the disengagement fact is a one-time snapshot, so when the stack ships v9 tomorrow, last week's incident still attributes to v8. Third, ingestion is event-time with bounded allowed lateness, because a vehicle buffers fifteen minutes in a cellular dead zone and dumps it on reconnect — process on arrival order and your safety metrics are computed against the network, not the world.
Freeze the versions on the event; never resolve "current" at query time. A safety event is a photograph of a moment — which release was driving, which map it believed in, which sensors it trusted, where it was allowed to be. Snapshot all four onto the event the instant it is written and the photograph is permanent: the regulatory record cannot be falsified by tomorrow's deployment, the v8-vs-v9 comparison stays honest, and incident replay is a plain join that survives every future upgrade. Resolve "current" instead and you re-attribute history to whatever is running now — which is how a safety case becomes a liability.
The schema falls out of the versioned-driver insight. Four SCD2 dimensions replace the human; an append-only disengagement fact freezes foreign keys to all four; a telemetry rollup keeps the firehose at arm's length; and the trip fact folds into the shared marketplace grain so Finance never forks human and robot.
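A minimal sketch of how one of those SCD2 dimensions might version itself, using the operational design domain as the example. The class name, field names, city names, and geofence identifiers are all hypothetical; the only idea taken from the text is that a change closes the open row and appends a new one, so any past instant still resolves to the row that was in force.

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import Optional

@dataclass(frozen=True)
class OddVersion:
    odd_key: int                  # surrogate key: the value a fact row freezes
    city: str
    geofence_id: str              # hypothetical polygon identifier
    valid_from: datetime
    valid_to: Optional[datetime]  # None means the row is still open

def publish_version(history, city, geofence_id, effective):
    """Close the city's open row at `effective`, then append a new open row."""
    closed = [
        replace(row, valid_to=effective)
        if row.city == city and row.valid_to is None else row
        for row in history
    ]
    next_key = max((row.odd_key for row in history), default=0) + 1
    return closed + [OddVersion(next_key, city, geofence_id, effective, None)]

def as_of(history, city, ts):
    """Return the row whose half-open interval [valid_from, valid_to) covers ts."""
    for row in history:
        if (row.city == city and row.valid_from <= ts
                and (row.valid_to is None or ts < row.valid_to)):
            return row
    return None

history = []
history = publish_version(history, "phoenix", "phx-core-v1", datetime(2026, 1, 1))
history = publish_version(history, "phoenix", "phx-core-v2", datetime(2026, 4, 1))
# A new city is just another versioned row.
history = publish_version(history, "austin", "aus-downtown-v1", datetime(2026, 5, 1))

print(as_of(history, "phoenix", datetime(2026, 3, 15)).geofence_id)
print(as_of(history, "austin", datetime(2026, 4, 1)))
```

The half-open interval is a deliberate choice in this sketch: an event stamped exactly at a changeover instant matches one row, never two.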
"Who was driving" is no longer a person — it is the join of four things that drift: the autonomy release, the HD-map it believed in, the vehicle's sensor calibration, and the operational design domain that says where and when it may drive. Each is SCD2 because each changes, and the as-of-trip configuration must be recoverable. The ODD is a temporal geofence — exactly the surge-zone pattern — so a multi-city rollout is just a new versioned row.
The trip fact freezes the four version keys at dispatch and folds into the marketplace grain through marketplace_trip_key, so a city dispatching humans and robots reports revenue once. It also carries the exception machinery — a status state machine, a self-referencing parent_trip_id, and a fare_adjustment — for the rescue chain. The telemetry fact is a per-second rollup with a pointer; the raw frames never arrive.
This is the table the regulator owns, and it has two non-negotiable properties: it is append-only, and it freezes the version keys at event time. Re-classifying a takeover after review appends a new adjudication row elsewhere; the original event is never updated. The failed_sensor_id column does triple duty — root cause, failure-profile analysis, and, in §08, subrogation against the manufacturer.
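One way to make "append-only" structural rather than a convention is to have the database reject updates and deletes. The sketch below uses SQLite triggers purely for illustration; the table name, column names other than failed_sensor_id and raw_log_uri, the key values, and the example URI are assumptions, and a production warehouse would likely rely on its own immutability controls instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_disengagement (
    disengagement_id INTEGER PRIMARY KEY,
    vehicle_id       TEXT NOT NULL,
    event_ts         TEXT NOT NULL,
    takeover_type    TEXT NOT NULL,
    release_key      INTEGER NOT NULL,   -- frozen at write time
    map_key          INTEGER NOT NULL,   -- frozen at write time
    calibration_key  INTEGER NOT NULL,   -- frozen at write time
    odd_key          INTEGER NOT NULL,   -- frozen at write time
    failed_sensor_id TEXT,
    raw_log_uri      TEXT NOT NULL
);
CREATE TRIGGER disengagement_no_update
BEFORE UPDATE ON fact_disengagement
BEGIN
    SELECT RAISE(ABORT, 'fact_disengagement is append-only');
END;
CREATE TRIGGER disengagement_no_delete
BEFORE DELETE ON fact_disengagement
BEGIN
    SELECT RAISE(ABORT, 'fact_disengagement is append-only');
END;
""")

# Hypothetical row; the URI scheme and key values are placeholders.
conn.execute(
    "INSERT INTO fact_disengagement VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    (1, "AV-7", "2026-05-02 10:00:00", "remote_operator",
     1, 10, 19, 3, None, "objstore://example/av-7/log-0001"),
)

def is_rejected(sql):
    """Return True when the statement is blocked by an append-only trigger."""
    try:
        conn.execute(sql)
    except sqlite3.DatabaseError:
        return True
    return False

update_rejected = is_rejected(
    "UPDATE fact_disengagement SET takeover_type = 'planned' WHERE disengagement_id = 1")
delete_rejected = is_rejected(
    "DELETE FROM fact_disengagement WHERE disengagement_id = 1")
row_count = conn.execute("SELECT COUNT(*) FROM fact_disengagement").fetchone()[0]
print(update_rejected, delete_rejected, row_count)
```

With no update path on the table, a re-classification has nowhere to go except a separate adjudication row, which is the behavior the text asks for.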
Two more facts complete the picture without inflating the headline. Remote assistance is not a disengagement: most human-in-the-loop moments are the car asking a remote operator to confirm an ambiguous scene while staying in control, so they get their own far-more-frequent fact_remote_assists — folding them into disengagements would inflate the safety rate and hide the real ops load. And the app layer gets a clickstream fact keyed on app version, so a dropped booking-conversion number has an answer: a buggy build, or cars failing pickups.
The correctness of this entire platform lives in one discipline: the safety event freezes its version foreign keys at write time, so incident replay is a deterministic join that gives the same answer forever — even after the stack, the map, and the calibration have all advanced.
A trip is a state machine, and the exception edges are where the schema earns its grade. A sensor cracks mid-trip; the car can no longer drive autonomously, executes a fail-safe stop, and writes a disengagement row stamped with the failed sensor. The passenger's trip flips to a terminal failure state; dispatch auto-assigns a rescue:
The rescue is a new trip row carrying the same request_id (same rider intent) and a parent_trip_id pointing back at the broken leg. Billing walks that parent chain, sums both legs' distance, applies a fare_adjustment — usually comping the ride — and posts one clean invoice. The self-referencing chain keeps the incident, the rescue, and the billing on a single auditable thread, with no patchy side-tables.
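The billing walk can be sketched as a short traversal of parent_trip_id. The Trip fields other than request_id, parent_trip_id, and fare_adjustment are hypothetical, as are the status values, the distance unit, and the rule that the invoice never goes below zero.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Trip:
    trip_id: str
    request_id: str
    parent_trip_id: Optional[str]
    status: str
    distance_km: float            # unit is an assumption
    fare: float
    fare_adjustment: float = 0.0  # negative values comp the rider

def chain_for(trips, final_trip_id):
    """Follow parent_trip_id from the final leg back to the first leg."""
    legs, seen = [], set()
    current = trips.get(final_trip_id)
    while current is not None:
        if current.trip_id in seen:
            raise ValueError("cycle in parent_trip_id chain")
        seen.add(current.trip_id)
        legs.append(current)
        current = trips.get(current.parent_trip_id) if current.parent_trip_id else None
    return list(reversed(legs))   # oldest leg first

def invoice(trips, final_trip_id):
    """Fold every leg of one rider request into a single invoice."""
    legs = chain_for(trips, final_trip_id)
    return {
        "request_id": legs[0].request_id,
        "legs": [leg.trip_id for leg in legs],
        "distance_km": sum(leg.distance_km for leg in legs),
        "amount_due": max(0.0, sum(leg.fare + leg.fare_adjustment for leg in legs)),
    }

trips = {
    "t1": Trip("t1", "req-1", None, "aborted_fail_safe", 3.0, 6.0, -6.0),
    "t2": Trip("t2", "req-1", "t1", "completed", 5.5, 11.0, -11.0),
}
bill = invoice(trips, "t2")
print(bill)
```

Both legs appear on one invoice under one request_id, and the comp is visible as an adjustment rather than as a missing charge.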
Here is the atomic heart of the platform: the regulator's question — "what was in control when this happened?" — answered as a plain join, correct forever because the keys were frozen. Beside it, the north-star regression: Mean Miles Between Disengagements by release.
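A sketch of the MMBD-by-release regression, run against SQLite so it is executable as written. Table and column names are assumptions, and the rows are toy aggregates rather than real fleet figures. Miles and disengagements are aggregated separately before joining, so a release with several disengagements does not multiply its own mileage.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_release (release_key INTEGER PRIMARY KEY, version TEXT);
CREATE TABLE fact_trip (trip_id TEXT, release_key INTEGER, autonomous_miles REAL);
CREATE TABLE fact_disengagement (disengagement_id INTEGER, release_key INTEGER);

INSERT INTO dim_release VALUES (1, 'v8.2'), (2, 'v9.0');
-- Toy rows: each stands in for many trips driven under that frozen release key.
INSERT INTO fact_trip VALUES
    ('a', 1, 6000), ('b', 1, 4000), ('c', 2, 9000), ('d', 2, 6000);
INSERT INTO fact_disengagement VALUES (1, 1), (2, 1), (3, 1), (4, 1), (5, 2), (6, 2);
""")

MMBD_BY_RELEASE = """
WITH miles AS (
    SELECT release_key, SUM(autonomous_miles) AS miles
    FROM fact_trip GROUP BY release_key
),
events AS (
    SELECT release_key, COUNT(*) AS disengagements
    FROM fact_disengagement GROUP BY release_key
)
SELECT r.version,
       m.miles,
       COALESCE(e.disengagements, 0)             AS disengagements,
       m.miles / NULLIF(e.disengagements, 0)     AS mmbd   -- NULL when none occurred
FROM miles AS m
JOIN dim_release AS r ON r.release_key = m.release_key
LEFT JOIN events AS e ON e.release_key = m.release_key
ORDER BY r.version
"""

rows = conn.execute(MMBD_BY_RELEASE).fetchall()
mmbd_by_release = {version: mmbd for version, _, _, mmbd in rows}
print(mmbd_by_release)
```

Both facts join on the release key they froze at write time, so the comparison between releases does not shift when a newer release ships.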
The worked case makes the invariant concrete. Vehicle AV-7 has a remote_operator disengagement on May 2 while running stack v8.2, map 2026.4, calibration C-19. The next day the fleet upgrades to v9.0 — a new SCD2 release row. Run the replay query on May 2 and it returns v8.2; run it on May 10, with v9.0 live, and it still returns v8.2, because the release key was frozen at event time. A live lookup of "AV-7's current release" would now wrongly blame v9.0 — falsifying both the regulatory record and the v8-vs-v9 comparison. Frozen keys make replay deterministic; live lookups make it a fiction.
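The AV-7 case can be reproduced in miniature. In this sketch the table names, the surrogate key values, and the year are assumptions, and the "live" lookup is simplified to a fleet-wide open release row rather than a per-vehicle one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_release (
    release_key INTEGER PRIMARY KEY, version TEXT, valid_from TEXT, valid_to TEXT);
CREATE TABLE fact_disengagement (
    disengagement_id INTEGER PRIMARY KEY, vehicle_id TEXT, event_ts TEXT,
    takeover_type TEXT, release_key INTEGER);

INSERT INTO dim_release VALUES (1, 'v8.2', '2026-01-01', NULL);
-- The event freezes release_key = 1 at write time.
INSERT INTO fact_disengagement VALUES
    (1, 'AV-7', '2026-05-02 10:00:00', 'remote_operator', 1);
""")

FROZEN = """
SELECT r.version
FROM fact_disengagement AS d
JOIN dim_release AS r ON r.release_key = d.release_key
WHERE d.disengagement_id = 1
"""
LIVE = "SELECT version FROM dim_release WHERE valid_to IS NULL"

def frozen_replay():
    return conn.execute(FROZEN).fetchone()[0]

def live_lookup():
    return conn.execute(LIVE).fetchone()[0]

before_upgrade = (frozen_replay(), live_lookup())

# May 3: the fleet upgrades. SCD2 closes the v8.2 row and opens a v9.0 row.
conn.executescript("""
UPDATE dim_release SET valid_to = '2026-05-03' WHERE release_key = 1;
INSERT INTO dim_release VALUES (2, 'v9.0', '2026-05-03', NULL);
""")

after_upgrade = (frozen_replay(), live_lookup())
print(before_upgrade, after_upgrade)
```

Before the upgrade both lookups agree, which is why the live version passes casual testing. After it, only the frozen join still returns v8.2.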
Three programs carry the platform: the telemetry consumer that handles late, out-of-order data on event-time, the disengagement writer that freezes the version keys, and the rescue dispatcher that threads a broken trip to its rescue. Each is small; the judgment is in what they refuse to do.
A vehicle buffers fifteen minutes in a cellular dead zone and dumps it all on reconnect. The consumer's defining refusal is that it never trusts arrival order: it windows on the vehicle's own clock with a bounded allowed lateness, so a fifteen-minute-old event lands in the bucket it belongs to. Window on processing time instead, and a single tunnel makes MMBD and the live map wrong: computed against the network, not the road.
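A toy version of that consumer, in plain Python rather than a real stream processor. The class name, the 60-second window, and the dead-letter list are assumptions; the fifteen-minute allowed lateness mirrors the dead-zone buffer in the envelope table. Timestamps are integer seconds on the vehicle's clock.

```python
from collections import defaultdict

WINDOW_S = 60
ALLOWED_LATENESS_S = 15 * 60   # matches the dead-zone buffer

class EventTimeCounter:
    """Counts events per tumbling event-time window, tolerating bounded lateness."""

    def __init__(self):
        self.open_windows = defaultdict(int)   # window_start -> event count
        self.finalized = {}                    # window_start -> final count
        self.too_late = []                     # goes to a correction path, never dropped silently
        self.watermark = float("-inf")

    def ingest(self, event_ts):
        window_start = event_ts - event_ts % WINDOW_S
        if window_start + WINDOW_S <= self.watermark:
            self.too_late.append(event_ts)     # its window has already closed
            return
        self.open_windows[window_start] += 1
        # The watermark trails the newest event time seen by the allowed lateness.
        self.watermark = max(self.watermark, event_ts - ALLOWED_LATENESS_S)
        for start in sorted(self.open_windows):
            if start + WINDOW_S <= self.watermark:
                self.finalized[start] = self.open_windows.pop(start)

counter = EventTimeCounter()
# Arrival order, not event order: 1010 was buffered in a dead zone and arrives
# after 1900; 1005 arrives after its window has been finalized.
for ts in (1000, 1900, 1010, 2000, 1005):
    counter.ingest(ts)

print(counter.finalized, counter.too_late)
```

The buffered event at 1010 still lands in the window starting at 960, because the bucket is chosen from the event's own timestamp and the window stays open until the watermark passes it.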
The writer is the guardian of the invariant. Its one job is to resolve the four version keys as-of the event instant and stamp them onto the row, after which the row is never touched again. The refusal is structural: there is no update path. A re-classification of the takeover type, after human review, appends a new adjudication row in the liability layer — it does not mutate the original safety event, because the original is evidence.
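A sketch of such a writer, assuming the four dimensions are available as in-memory SCD2 rows. The class names, the scoping of each dimension (release and calibration per vehicle, map and ODD per city), and the sample keys are hypothetical. The writer exposes a write method and a read-only view, and nothing else.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass(frozen=True)
class Version:
    key: int
    scope: str                      # vehicle id or city; scoping is an assumption
    valid_from: datetime
    valid_to: Optional[datetime] = None

def resolve_as_of(rows, scope, ts):
    """Return the single key whose [valid_from, valid_to) interval covers ts."""
    hits = [r for r in rows
            if r.scope == scope and r.valid_from <= ts
            and (r.valid_to is None or ts < r.valid_to)]
    if len(hits) != 1:
        raise LookupError(f"{len(hits)} versions cover {scope} at {ts}; expected 1")
    return hits[0].key

@dataclass(frozen=True)             # frozen: the row cannot be mutated after creation
class Disengagement:
    vehicle_id: str
    event_ts: datetime
    takeover_type: str
    release_key: int
    map_key: int
    calibration_key: int
    odd_key: int
    failed_sensor_id: Optional[str]
    raw_log_uri: str

class DisengagementWriter:
    def __init__(self, releases, maps, calibrations, odds):
        self._releases, self._maps = releases, maps
        self._calibrations, self._odds = calibrations, odds
        self._rows = []

    def write(self, vehicle_id, city, event_ts, takeover_type, raw_log_uri,
              failed_sensor_id=None):
        row = Disengagement(
            vehicle_id, event_ts, takeover_type,
            release_key=resolve_as_of(self._releases, vehicle_id, event_ts),
            map_key=resolve_as_of(self._maps, city, event_ts),
            calibration_key=resolve_as_of(self._calibrations, vehicle_id, event_ts),
            odd_key=resolve_as_of(self._odds, city, event_ts),
            failed_sensor_id=failed_sensor_id,
            raw_log_uri=raw_log_uri,
        )
        self._rows.append(row)      # append is the only write path
        return row

    @property
    def rows(self):
        return tuple(self._rows)    # read-only view

releases = [Version(1, "AV-7", datetime(2026, 1, 1), datetime(2026, 5, 3)),
            Version(2, "AV-7", datetime(2026, 5, 3))]
writer = DisengagementWriter(
    releases,
    maps=[Version(10, "phoenix", datetime(2026, 4, 1))],
    calibrations=[Version(19, "AV-7", datetime(2026, 3, 1))],
    odds=[Version(3, "phoenix", datetime(2026, 1, 1))],
)
row = writer.write("AV-7", "phoenix", datetime(2026, 5, 2, 10, 0),
                   "remote_operator", "objstore://example/av-7/log-0001")
print(row.release_key, row.map_key, row.calibration_key, row.odd_key)
```

Raising when zero or several versions match is a design choice in this sketch: a safety event with an ambiguous configuration is better rejected loudly than stamped with a guess.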
When a trip aborts to fail-safe, the dispatcher creates the rescue leg as a new trip that inherits the rider intent — same request_id — and points back at the broken trip via parent_trip_id. The refusal here is against side-tables: the rescue relationship lives on the trip fact itself, so billing and "how long was the rider stranded" are both plain joins rather than forensic reconstructions.
One carve-out, always stated: the rescue leg's version keys are the rescue vehicle's own current stack, not a copy of the broken trip's. The two legs were driven by two different cars under two different releases, and conflating them would corrupt the per-release safety statistics. The chain links the legs; it does not merge their attribution.
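The dispatcher and its carve-out fit in one small function. This is a sketch with hypothetical class names, status values, vehicle ids, and key values; the behavior it illustrates is that request_id and the parent link are inherited from the broken trip while the version keys come from the rescue vehicle.

```python
import itertools
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class VersionKeys:
    release_key: int
    map_key: int
    calibration_key: int
    odd_key: int

@dataclass(frozen=True)
class Trip:
    trip_id: str
    request_id: str
    vehicle_id: str
    status: str
    versions: VersionKeys
    parent_trip_id: Optional[str] = None

_rescue_ids = itertools.count(1)

def dispatch_rescue(broken, rescue_vehicle_id, rescue_versions):
    """Create the rescue leg as a new trip linked to the broken one."""
    if broken.status != "aborted_fail_safe":
        raise ValueError("only a fail-safe abort triggers a rescue")
    return Trip(
        trip_id=f"rescue-{next(_rescue_ids)}",
        request_id=broken.request_id,      # same rider intent
        vehicle_id=rescue_vehicle_id,
        status="dispatched",
        versions=rescue_versions,          # the rescue car's own keys, never copied
        parent_trip_id=broken.trip_id,     # the chain lives on the trip itself
    )

broken = Trip("t1", "req-1", "AV-7", "aborted_fail_safe", VersionKeys(1, 10, 19, 3))
rescue = dispatch_rescue(broken, "AV-12", VersionKeys(2, 10, 27, 3))
print(rescue)
```

Passing the rescue vehicle's keys in explicitly, instead of defaulting to the broken trip's, makes the attribution split hard to get wrong by accident.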
The derived layer is where the frozen-state discipline pays its largest dividend. Because every disengagement froze its version keys and a pointer to the drive-log, you can re-simulate the logged sensor stream through a new stack and ask "would v9 have disengaged here?" — turning the safety case from "we shipped it and watched" into "we proved the regression was gone before we shipped."
Counterfactual replay is the capstone move. The same content-addressed inputs that explain a past incident become the inputs to a forward simulation: take the logged sensor stream, run it through a candidate release, and log the outcome. A regression that no longer disengages is proof, not hope. The run is recorded as a fact so the safety case is itself auditable.
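A heavily simplified sketch of that loop. The two "stacks" here are toy functions standing in for real releases, the frame contents are placeholders, and the record layout is an assumption. What the sketch does show is the content-address check: the replay refuses to run unless the frames hash to the digest recorded with the original event.

```python
import hashlib
from dataclasses import dataclass

def content_address(frames):
    """SHA-256 over length-prefixed frames, so frame boundaries affect the digest."""
    digest = hashlib.sha256()
    for frame in frames:
        digest.update(len(frame).to_bytes(8, "big"))
        digest.update(frame)
    return digest.hexdigest()

@dataclass(frozen=True)
class ReplayRun:                     # recorded as a fact so the safety case is auditable
    disengagement_id: int
    candidate_release: str
    input_digest: str
    disengaged: bool

def counterfactual_replay(disengagement_id, logged_digest, frames,
                          candidate_release, stack):
    digest = content_address(frames)
    if digest != logged_digest:
        raise ValueError("drive-log does not match the digest recorded on the event")
    return ReplayRun(disengagement_id, candidate_release, digest, stack(frames))

# Toy stand-ins for releases: each returns True when it would have disengaged.
def toy_stack_v8(frames):
    return any(b"glare" in frame for frame in frames)

def toy_stack_v9(frames):
    glare = sum(b"glare" in frame for frame in frames)
    return glare > len(frames) / 2

frames = [b"clear", b"glare", b"clear"]
logged_digest = content_address(frames)   # would have been stored with the event

baseline = counterfactual_replay(1, logged_digest, frames, "v8.2", toy_stack_v8)
candidate = counterfactual_replay(1, logged_digest, frames, "v9.0", toy_stack_v9)
print(baseline.disengaged, candidate.disengaged)
```

Replaying the baseline release first is a useful sanity check in this sketch: if the old stack does not reproduce the original disengagement on the logged inputs, the harness itself is suspect.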
Liability is the same replay with money attached. Remove the driver and fault shifts from a person to product liability — the operator and its software stack — which promotes the data platform from analytics to the system of record for legal fault. The schema was already quietly satisfying the requirement: the frozen version keys and the failed sensor on every disengagement are the fault attribution, now adjudicated. An appeal appends a new finding version; the finding that was current when a payout posted is never overwritten — the risk-decision replay pattern, in a courtroom.
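The appeal behavior can be sketched as a ledger with no update path. The class names, fields, fault-party labels, and dates are hypothetical; the two reads correspond to "the current finding" and "the finding that was current when a payout posted".

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Finding:
    claim_id: str
    finding_version: int
    fault_party: str
    recorded_at: datetime

class AdjudicationLedger:
    def __init__(self):
        self._findings = []

    def append(self, claim_id, fault_party, recorded_at):
        """An appeal adds a new version; earlier versions are never rewritten."""
        version = 1 + sum(1 for f in self._findings if f.claim_id == claim_id)
        finding = Finding(claim_id, version, fault_party, recorded_at)
        self._findings.append(finding)
        return finding

    def current(self, claim_id):
        matches = [f for f in self._findings if f.claim_id == claim_id]
        return max(matches, key=lambda f: f.finding_version)

    def as_of(self, claim_id, ts):
        """The finding in force at ts, for example when a payout posted."""
        eligible = [f for f in self._findings
                    if f.claim_id == claim_id and f.recorded_at <= ts]
        return max(eligible, key=lambda f: f.finding_version) if eligible else None

ledger = AdjudicationLedger()
ledger.append("claim-1", "operator", datetime(2026, 5, 10))
payout_posted_at = datetime(2026, 5, 15)
ledger.append("claim-1", "sensor_vendor", datetime(2026, 6, 1))   # appeal outcome

at_payout = ledger.as_of("claim-1", payout_posted_at)
latest = ledger.current("claim-1")
print(at_payout.fault_party, latest.fault_party)
```

The payout can always be explained by the finding it was based on, even after the appeal changes the current answer.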
The facts are where the platform explains itself. Three queries an interviewer loves, because each carries a classic pattern on its back — the self-join rescue chain, the cross-ecosystem root-cause funnel, and the subrogation aggregation that turns a safety column into a recovery ledger.
Because the rescue trip carries parent_trip_id, "who hit a vehicle failure today, and how long until rescue?" is a clean self-join rather than a forensic reconstruction. The broken legs and the rescue legs are the same table, joined to itself on the parent link.
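A sketch of that self-join, run against SQLite so it is executable as written. The table name, the failed_ts and pickup_ts columns, the status values, and the sample rows are assumptions. A LEFT JOIN is used so a broken trip that has not yet been rescued still appears, with NULLs on the rescue side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_trip (
    trip_id        TEXT PRIMARY KEY,
    request_id     TEXT NOT NULL,
    parent_trip_id TEXT REFERENCES fact_trip(trip_id),
    status         TEXT NOT NULL,
    failed_ts      TEXT,
    pickup_ts      TEXT
);
INSERT INTO fact_trip VALUES
 ('t1', 'req-1', NULL, 'aborted_fail_safe', '2026-05-02 14:03:00', '2026-05-02 13:50:00'),
 ('t2', 'req-1', 't1', 'completed',         NULL,                  '2026-05-02 14:15:00'),
 ('t3', 'req-2', NULL, 'aborted_fail_safe', '2026-05-02 16:00:00', '2026-05-02 15:40:00'),
 ('t4', 'req-3', NULL, 'completed',         NULL,                  '2026-05-02 09:00:00');
""")

RESCUE_CHAIN = """
SELECT b.trip_id AS broken_trip,
       r.trip_id AS rescue_trip,
       (CAST(strftime('%s', r.pickup_ts) AS INTEGER)
        - CAST(strftime('%s', b.failed_ts) AS INTEGER)) / 60 AS minutes_to_rescue
FROM fact_trip AS b                      -- the broken legs
LEFT JOIN fact_trip AS r                 -- the rescue legs, same table
       ON r.parent_trip_id = b.trip_id
WHERE b.status = 'aborted_fail_safe'
  AND date(b.failed_ts) = ?
ORDER BY b.failed_ts
"""

rescues = conn.execute(RESCUE_CHAIN, ("2026-05-02",)).fetchall()
print(rescues)
```

The row with a NULL rescue is arguably the most important one on the ops screen: a rider whose trip failed and who has no rescue leg yet.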
The query the app-vs-fleet bridge exists for: booking-conversion by app release, cross-referenced with the physical fail-safes those same sessions witnessed. Low conversion with near-zero fail-safes points at the app build; low conversion with elevated fail-safes points at the fleet. One query, two ecosystems, root cause isolated.
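One possible shape for that funnel, run against SQLite. Everything about the schema here is assumed: the session table, the booked flag, the link from session to trip through request_id, and the sample data. The join is restricted to first legs (parent_trip_id IS NULL) so a rescued request does not count its session twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_app_session (
    session_id TEXT, app_version TEXT, request_id TEXT, booked INTEGER);
CREATE TABLE fact_trip (
    trip_id TEXT, request_id TEXT, parent_trip_id TEXT, status TEXT);

INSERT INTO fact_app_session VALUES
 ('s1', '5.0', 'req-1', 1), ('s2', '5.0', 'req-2', 1),
 ('s3', '5.0', 'req-3', 1), ('s4', '5.0', NULL,    0),
 ('s5', '5.1', 'req-4', 1), ('s6', '5.1', NULL,    0),
 ('s7', '5.1', NULL,    0), ('s8', '5.1', NULL,    0);
INSERT INTO fact_trip VALUES
 ('t1',  'req-1', NULL, 'completed'),
 ('t2',  'req-2', NULL, 'aborted_fail_safe'),
 ('t2r', 'req-2', 't2', 'completed'),
 ('t3',  'req-3', NULL, 'completed'),
 ('t4',  'req-4', NULL, 'completed');
""")

FUNNEL = """
SELECT s.app_version,
       COUNT(*)      AS sessions,
       AVG(s.booked) AS conversion,
       SUM(CASE WHEN t.status = 'aborted_fail_safe' THEN 1 ELSE 0 END) AS fail_safes
FROM fact_app_session AS s
LEFT JOIN fact_trip AS t
       ON t.request_id = s.request_id
      AND t.parent_trip_id IS NULL       -- first leg only, avoids fan-out
GROUP BY s.app_version
ORDER BY s.app_version
"""

funnel = conn.execute(FUNNEL).fetchall()
print(funnel)
```

In this toy data the 5.1 build converts poorly with no fail-safes behind it, which is the pattern the text attributes to the app rather than the fleet.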
When a component caused the incident, the operator recovers from the manufacturer. The same failed_sensor_id that explained a disengagement now sizes a vendor recovery — one column, three uses: root-cause, failure-profile, and subrogation. The adjudication join takes only the current finding version, so an appealed claim doesn't double-count.
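A sketch of the subrogation roll-up against SQLite. The table names, the vendor labels, the fault_party values, and the dollar amounts are made up for illustration. The part that matters is the CTE that keeps only the highest finding_version per claim before anything is summed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_sensor (sensor_id TEXT PRIMARY KEY, vendor TEXT);
CREATE TABLE fact_disengagement (disengagement_id INTEGER PRIMARY KEY, failed_sensor_id TEXT);
CREATE TABLE fact_adjudication (
    claim_id TEXT, disengagement_id INTEGER, finding_version INTEGER,
    fault_party TEXT, recoverable_usd INTEGER);

INSERT INTO dim_sensor VALUES ('S-1', 'vendor_a'), ('S-2', 'vendor_a'), ('S-3', 'vendor_b');
INSERT INTO fact_disengagement VALUES (1, 'S-1'), (2, 'S-2'), (3, 'S-3');
INSERT INTO fact_adjudication VALUES
 ('C1', 1, 1, 'component', 40000),
 ('C1', 1, 2, 'component', 25000),   -- appeal reduced the amount
 ('C2', 2, 1, 'component', 10000),
 ('C3', 3, 1, 'component',  5000),
 ('C3', 3, 2, 'operator',       0);  -- appeal moved fault to the operator
""")

SUBROGATION = """
WITH current_finding AS (
    SELECT a.*
    FROM fact_adjudication AS a
    WHERE a.finding_version = (
        SELECT MAX(x.finding_version)
        FROM fact_adjudication AS x
        WHERE x.claim_id = a.claim_id)
)
SELECT s.vendor,
       COUNT(*)               AS claims,
       SUM(c.recoverable_usd) AS total_recoverable_usd
FROM current_finding AS c
JOIN fact_disengagement AS d ON d.disengagement_id = c.disengagement_id
JOIN dim_sensor AS s         ON s.sensor_id = d.failed_sensor_id
WHERE c.fault_party = 'component'
GROUP BY s.vendor
ORDER BY total_recoverable_usd DESC
"""

recovery = conn.execute(SUBROGATION).fetchall()
naive_total = conn.execute(
    "SELECT SUM(recoverable_usd) FROM fact_adjudication WHERE fault_party = 'component'"
).fetchone()[0]
print(recovery, naive_total)
```

Summing every finding version would report 80,000 in this toy data, while the current-version join reports 35,000. The difference is exactly the appealed claims.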
A senior design ends with observability, because every safety discipline above is invisible without it. The board watches three different definitions of "healthy" — the safety case, the fleet operations, and the liability ledger — and treats a release-over-release MMBD regression as the page-worthy incident.
Read the green tiles together and the board argues that v9.0 is a genuine safety improvement, not a hopeful one: MMBD is up, the counterfactual harness proved the regression gone before the release shipped, and the rising assist ratio says the stack is learning to defer rather than fail. Read the amber and rust tiles and the board tells the operator exactly where to spend: fourteen cars drifting out of their ODD envelope on low charge, a downtown intersection generating most of the disengagements, and one vendor's sensors carrying nearly half a million dollars of recoverable exposure. Every number on the board traces back to a frozen key, which is what makes it admissible as well as actionable.
Strip the robots away and the question was testing five judgments, each of which generalizes far beyond autonomous vehicles: