PaddySpeaks · Systems at the Whiteboard · Nº 19

The Autonomy Problem

Every ride-share schema attributes a trip to a driver. Here the driver is software — and that single fact rewrites the model into a question about versioned responsibility, terabytes of sensor data per vehicle per day, and a regulator who owns one of your fact tables. A complete working through: the four versioned dimensions that replace the driver, the append-only disengagement with frozen version keys, the deterministic incident replay, the temporal geofence, the rescue chain, and the liability ledger where a wrong join is a subpoena.

§ 01 — THE QUESTIONThere is no driver to blame

This is the capstone of the marketplace family, and it is hard for one reason that reshapes everything: the entity every other ride-share model leans on — the driver — has been deleted and replaced by a release pipeline.

Interview Prompt

"Design the data platform for a driverless ride-hail fleet — Waymo, a Tesla robotaxi, Uber AV. The autonomy stack replaces the human driver. Serve Operations, Safety & Regulatory, and Finance — at terabytes of sensor data per vehicle per day. How would you scope this out?"

LEVEL · STAFF / PRINCIPALDURATION · 45 MINFORMAT · WHITEBOARD

Every human ride-share model attributes a trip to a driver, prices insurance on that driver, and presumes them at fault when something goes wrong. Remove the driver and all three assumptions collapse at once. Responsibility for any action now traces to a software release crossed with an HD-map version, a sensor calibration, and an operational design domain — each of which changes over time and must be reconstructable to the millisecond, because a regulator will ask, a court will ask, and "the current version" is the wrong answer to both. The driver did not vanish; it became a stack of versioned dimensions, and the entire schema is organized around making that stack replayable.

A weak answer models the AV like a human driver with a robot in the seat. A strong answer notices the three structural forces first and names them before any boxes:

THE VERSIONED DRIVER
Responsibility is a join of four SCD2 dimensions. Release × HD-map × calibration × ODD — each drifts, each must be recoverable as-of any instant. Ship v9 tomorrow and last week's incident must still attribute to v8.
THE REGULATOR'S FACT TABLE
A disengagement is the safety currency. When autonomy hands control to a safety driver or remote operator, that event is counted, reported, and litigated. It is append-only and must replay against the exact versions that were driving.
THE FIREHOSE AT EXTREME SCALE
The sensor log is terabytes per vehicle per day. Camera, lidar, radar — none of it belongs in the warehouse. The analytics layer holds events and pointers; the raw drive-logs sit in object storage.
A mutable "current stack version" column is the cardinal sin. Overwrite the vehicle row on every over-the-air update and you can never answer "which release was driving last Tuesday" — and the entire safety case collapses.

Scoping out loud

Scope first, because the safety boundary is the design. State what you are building: a fleet data platform whose spine is versioned attribution — every safety-relevant event freezes the versions that produced it — folding into a shared marketplace trip fact so Finance never forks human and robot. State what you are deliberately treating as a callable service: the perception and planning stacks themselves (the platform records which stack ran, not how it thinks), the maps survey pipeline, and real-time motion control. State what you punt: payments rails (treated as a downstream double-entry ledger), the consumer rendering of the app, and rider identity beyond the trip.

Then the envelope, volunteered. A fleet at city scale:

QuantityEstimateConsequence
Vehicles in fleet~10,000Sets the telemetry fan-in
Telemetry cadence100 msThe event-time tempo
Telemetry events≈ 100 K / sHot path vs cold path — the number that splits the pipeline
Raw sensor dataterabytes / vehicle / dayNever in the warehouse — drive-log + pointer only
Disengagements / 1k miles~0.1–1The MMBD denominator; the regulatory north star
Version dims (the "driver")4 × SCD2Frozen onto every safety event at write time
Cellular dead-zone bufferup to 15 minLate, out-of-order data — event-time or be wrong

Notice the asymmetry: a hundred thousand telemetry events a second is a modest stream, but the raw sensor data behind each vehicle is terabytes a day — three orders of magnitude apart. That gap is the whole architecture. The warehouse holds the small, structured stream and a pointer; the firehose stays in object storage. And the single most consequential design choice — frozen version keys on the safety event — is invisible in the numbers and decisive in court.


§ 02 — DATA FLOWFollowing a drive through the building

One architecture, two paths and one spine. A hot path serves sub-two-second fleet-ops alerts; a cold path lands the same events for overnight safety reporting; and the spine is the versioned-dimension join that every safety event freezes at write time, so replay is deterministic forever.

FIREHOSE · TB / VEHICLE / DAY WAREHOUSE · ~100K EVENTS/S + POINTERS VEHICLE lidar · camera · radar Protobuf · cellular INGEST · Pub/Sub event-time + watermark 15-min allowed lateness never processing-time OBJECT STORE · DRIVE-LOGS raw lidar/camera · keyed by drive_id never in warehouse · replay source raw_log_uri (pointer only) HOT PATH <2s alerts · live map fact_av_trips frozen version FKs → fct_trips (#1) rescue chain fact_disengagement append-only FROZEN: release·map·odd failed_sensor_id fact_telemetry per-second rollup + raw_log_uri + fact_incidents THE "DRIVER" · 4 × SCD2 dim_software_release dim_hd_map · dim_vehicles dim_odd (temporal geofence) frozen onto events at write time cold path · same events snapshot at event time · never re-pointed SOLID — sustained flow · DASHED — the freeze & the cold path · Geography proposes; the frozen FK adjudicates forever.
FIG. 1 — Hot path, cold path, frozen spine. The firehose stays in object storage; the warehouse holds structured events and pointers; every safety event freezes the four version keys at write time.

Three properties of this picture do most of the interview's work. First, the firehose never touches the warehouse — raw lidar and camera flow vehicle → object store, and only a per-second rollup plus a raw_log_uri pointer reaches the analytic layer. Second, the version dimensions are frozen onto the safety event at write time, not looked up at query time: the dashed arrow into the disengagement fact is a one-time snapshot, so when the stack ships v9 tomorrow, last week's incident still attributes to v8. Third, ingestion is event-time with bounded allowed lateness, because a vehicle buffers fifteen minutes in a cellular dead zone and dumps it on reconnect — process on arrival order and your safety metrics are computed against the network, not the world.

The Attribution Philosophy, In One Rule

Freeze the versions on the event; never resolve "current" at query time. A safety event is a photograph of a moment — which release was driving, which map it believed in, which sensors it trusted, where it was allowed to be. Snapshot all four onto the event the instant it is written and the photograph is permanent: the regulatory record cannot be falsified by tomorrow's deployment, the v8-vs-v9 comparison stays honest, and incident replay is a plain join that survives every future upgrade. Resolve "current" instead and you re-attribute history to whatever is running now — which is how a safety case becomes a liability.


§ 03 — DATA MODELThe driver becomes four versioned dimensions

The schema falls out of the versioned-driver insight. Four SCD2 dimensions replace the human; an append-only disengagement fact freezes foreign keys to all four; a telemetry rollup keeps the firehose at arm's length; and the trip fact folds into the shared marketplace grain so Finance never forks human and robot.

The four versioned dimensions

"Who was driving" is no longer a person — it is the join of four things that drift: the autonomy release, the HD-map it believed in, the vehicle's sensor calibration, and the operational design domain that says where and when it may drive. Each is SCD2 because each changes, and the as-of-trip configuration must be recoverable. The ODD is a temporal geofence — exactly the surge-zone pattern — so a multi-city rollout is just a new versioned row.

DDL · THE VERSIONED "DRIVER" (FOUR SCD2 DIMENSIONS)
-- Vehicle master asset (the car); SCD2 — sensors re-calibrate over its life. CREATE TABLE dim_vehicles ( vehicle_key BIGINT NOT NULL, vin VARCHAR(32) NOT NULL, platform VARCHAR(40) NOT NULL, -- 'Waymo Driver Gen 5' deployment_city VARCHAR(40) NOT NULL, calibration_version VARCHAR(40) NOT NULL, valid_from TIMESTAMP NOT NULL, valid_to TIMESTAMP NOT NULL DEFAULT '9999-12-31', PRIMARY KEY (vehicle_key) ); -- The "driver": the autonomy stack. This IS who was driving; SCD2. CREATE TABLE dim_software_release ( release_key BIGINT NOT NULL, stack_version VARCHAR(40) NOT NULL, -- 'v9.0.3' perception_model_hash VARCHAR(64) NOT NULL, planner_version VARCHAR(40) NOT NULL, valid_from TIMESTAMP NOT NULL, valid_to TIMESTAMP NOT NULL DEFAULT '9999-12-31', PRIMARY KEY (release_key) ); -- The map the car believed in; a stale tile can cause a fault. SCD2. CREATE TABLE dim_hd_map ( map_key BIGINT NOT NULL, map_version VARCHAR(40) NOT NULL, tile_set VARCHAR(60) NOT NULL, survey_date DATE NOT NULL, valid_from TIMESTAMP NOT NULL, valid_to TIMESTAMP NOT NULL DEFAULT '9999-12-31', PRIMARY KEY (map_key) ); -- Operational Design Domain: a TEMPORAL GEOFENCE — where/when the AV may -- drive. SCD2, so a city expansion is one new versioned row, no new machinery. CREATE TABLE dim_odd ( odd_key BIGINT NOT NULL, geofence_h3 VARCHAR[], -- set of allowed H3 cells weather_env VARCHAR(60) NOT NULL, -- 'clear|light_rain' time_env VARCHAR(60) NOT NULL, -- 'always' | 'day_only' valid_from TIMESTAMP NOT NULL, valid_to TIMESTAMP NOT NULL DEFAULT '9999-12-31', PRIMARY KEY (odd_key) ); -- Per-sensor hardware inventory, snowflaked off the vehicle; SCD2. A LiDAR -- is swapped individually, so the grain is the COMPONENT, not the car. CREATE TABLE dim_devices ( sensor_id VARCHAR(50) NOT NULL, vehicle_key BIGINT NOT NULL, sensor_type VARCHAR(30) NOT NULL, -- 'LiDAR','Camera','Radar','Compute' manufacturer VARCHAR(50) NOT NULL, -- the key to failure-profile + subrogation firmware_version VARCHAR(50) NOT NULL, status VARCHAR(20) NOT NULL, -- 'active','degraded','failed' valid_from TIMESTAMP NOT NULL, valid_to TIMESTAMP NOT NULL DEFAULT '9999-12-31', PRIMARY KEY (sensor_id, valid_from), FOREIGN KEY (vehicle_key) REFERENCES dim_vehicles(vehicle_key) );

The trip fact and the firehose rollup

The trip fact freezes the four version keys at dispatch and folds into the marketplace grain through marketplace_trip_key, so a city dispatching humans and robots reports revenue once. It also carries the exception machinery — a status state machine, a self-referencing parent_trip_id, and a fare_adjustment — for the rescue chain. The telemetry fact is a per-second rollup with a pointer; the raw frames never arrive.

DDL · TRIP FACT + TELEMETRY ROLLUP
-- Autonomous trip; version FKs frozen at dispatch; self-FK rescue chain. CREATE TABLE fact_av_trips ( trip_key BIGINT NOT NULL, request_id VARCHAR(50) NOT NULL, -- shared across rescue legs marketplace_trip_key BIGINT, -- folds into fct_trips (#1) vehicle_key BIGINT NOT NULL, release_key BIGINT NOT NULL, -- frozen at dispatch map_key BIGINT NOT NULL, -- frozen odd_key BIGINT NOT NULL, -- frozen user_id VARCHAR(50) NOT NULL, start_ts TIMESTAMP, end_ts TIMESTAMP, autonomous_miles DECIMAL(8,2) DEFAULT 0, actual_fare DECIMAL(8,2) DEFAULT 0, fare_adjustment DECIMAL(8,2) DEFAULT 0, -- the rescue comp trip_status VARCHAR(30) NOT NULL, -- state machine, below parent_trip_id BIGINT, -- self-FK: rescue -> broken trip PRIMARY KEY (trip_key), FOREIGN KEY (release_key) REFERENCES dim_software_release(release_key), FOREIGN KEY (parent_trip_id) REFERENCES fact_av_trips(trip_key) ); -- Telemetry firehose: per-second rollup; raw frames in object store. CREATE TABLE fact_vehicle_telemetry ( vehicle_key BIGINT NOT NULL, event_ts TIMESTAMP NOT NULL, -- vehicle clock (event-time) sec_bucket TIMESTAMP NOT NULL, h3_index VARCHAR(15) NOT NULL, speed_mph INT, soc_pct INT, -- state of charge (a dispatch gate) autonomy_mode VARCHAR(20) NOT NULL, -- fully_autonomous|remote_guidance|manual active_trip_key BIGINT, -- NULL when cruising empty raw_log_uri VARCHAR(255) -- pointer to the full drive-log );

The append-only disengagement — the regulator's table

This is the table the regulator owns, and it has two non-negotiable properties: it is append-only, and it freezes the version keys at event time. Re-classifying a takeover after review appends a new adjudication row elsewhere; the original event is never updated. The failed_sensor_id column does triple duty — root cause, failure-profile analysis, and, in §08, subrogation against the manufacturer.

DDL · DISENGAGEMENT (APPEND-ONLY, FROZEN FKs)
CREATE TABLE fact_disengagement_events ( disengagement_id VARCHAR(50) NOT NULL, trip_key BIGINT, -- NULL if no passenger vehicle_key BIGINT NOT NULL, release_key BIGINT NOT NULL, -- frozen — the release that was driving map_key BIGINT NOT NULL, -- frozen odd_key BIGINT NOT NULL, -- frozen failed_sensor_id VARCHAR(50), -- -> dim_devices (root-cause / subrogation) event_ts TIMESTAMP NOT NULL, location_h3 VARCHAR(15) NOT NULL, type VARCHAR(30) NOT NULL, -- safety_driver_takeover|remote_operator| -- system_fault|hard_brake reg_reportable BOOLEAN NOT NULL, -- counts toward the CA DMV report raw_log_uri VARCHAR(255), -- the replay source for §06 PRIMARY KEY (disengagement_id), FOREIGN KEY (release_key) REFERENCES dim_software_release(release_key), FOREIGN KEY (failed_sensor_id) REFERENCES dim_devices(sensor_id) );

Two more facts complete the picture without inflating the headline. Remote assistance is not a disengagement: most human-in-the-loop moments are the car asking a remote operator to confirm an ambiguous scene while staying in control, so they get their own far-more-frequent fact_remote_assists — folding them into disengagements would inflate the safety rate and hide the real ops load. And the app layer gets a clickstream fact keyed on app version, so a dropped booking-conversion number has an answer: a buggy build, or cars failing pickups.


§ 04 — THE CORE INVARIANTFrozen keys, and the replay that survives every upgrade

The correctness of this entire platform lives in one discipline: the safety event freezes its version foreign keys at write time, so incident replay is a deterministic join that gives the same answer forever — even after the stack, the map, and the calibration have all advanced.

A trip is a state machine, and the exception edges are where the schema earns its grade. A sensor cracks mid-trip; the car can no longer drive autonomously, executes a fail-safe stop, and writes a disengagement row stamped with the failed sensor. The passenger's trip flips to a terminal failure state; dispatch auto-assigns a rescue:

EN_ROUTE IN_TRANSIT COMPLETED ABORTED_FAILSAFE RESCUE_DISPATCHED

The rescue is a new trip row carrying the same request_id (same rider intent) and a parent_trip_id pointing back at the broken leg. Billing walks that parent chain, sums both legs' distance, applies a fare_adjustment — usually comping the ride — and posts one clean invoice. The self-referencing chain keeps the incident, the rescue, and the billing on a single auditable thread, with no patchy side-tables.

Snapshot the release, map, calibration, and ODD onto the event at write time. Resolve "current" at query time and you re-attribute last week's incident to today's stack — destroying the regulatory record.AUTONOMY RULE Nº 1

Here is the atomic heart of the platform: the regulator's question — "what was in control when this happened?" — answered as a plain join, correct forever because the keys were frozen. Beside it, the north-star regression: Mean Miles Between Disengagements by release.

SQL · INCIDENT REPLAY + MMBD REGRESSION (THE INVARIANT)
-- Attribute every disengagement to the EXACT state that was driving. Because -- the FKs were frozen at event time, this stays correct after any upgrade. SELECT d.disengagement_id, d.event_ts, d.type, d.location_h3, r.stack_version, m.map_version, v.calibration_version FROM fact_disengagement_events d JOIN dim_vehicles v ON v.vehicle_key = d.vehicle_key JOIN dim_software_release r ON r.release_key = d.release_key -- frozen key JOIN dim_hd_map m ON m.map_key = d.map_key -- frozen key WHERE d.reg_reportable = TRUE AND d.event_ts >= DATE '2026-01-01' ORDER BY d.event_ts; -- MMBD by release: "is v9 safer than v8?" in one line. The metric every AV -- program — and the CA DMV report — lives or dies by. SELECT r.stack_version, count(*) AS disengagements, sum(t.autonomous_miles) AS miles, round(sum(t.autonomous_miles) / nullif(count(*), 0), 1) AS mmbd FROM fact_disengagement_events d JOIN dim_software_release r ON r.release_key = d.release_key JOIN fact_av_trips t ON t.trip_key = d.trip_key WHERE d.reg_reportable = TRUE GROUP BY r.stack_version ORDER BY mmbd DESC;

The worked case makes the invariant concrete. Vehicle AV-7 has a remote_operator disengagement on May 2 while running stack v8.2, map 2026.4, calibration C-19. The next day the fleet upgrades to v9.0 — a new SCD2 release row. Run the replay query on May 2 and it returns v8.2; run it on May 10, with v9.0 live, and it still returns v8.2, because the release key was frozen at event time. A live lookup of "AV-7's current release" would now wrongly blame v9.0 — falsifying both the regulatory record and the v8-vs-v9 comparison. Frozen keys make replay deterministic; live lookups make it a fiction.


§ 05 — INGESTION & STREAMSPython on the firehose and the freeze

Three programs carry the platform: the telemetry consumer that handles late, out-of-order data on event-time, the disengagement writer that freezes the version keys, and the rescue dispatcher that threads a broken trip to its rescue. Each is small; the judgment is in what they refuse to do.

1 · The telemetry consumer — event-time, never arrival order

A vehicle buffers fifteen minutes in a cellular dead zone and dumps it all on reconnect. The consumer's defining refusal is that it never trusts arrival order: it windows on the vehicle's own clock with a bounded allowed-lateness, so a fifteen-minute-old event lands in the bucket it belongs to. Process on processing-time and a tunnel makes MMBD and the live map simply wrong — computed against the network, not the road.

PYTHON · TELEMETRY CONSUMER (EVENT-TIME WINDOWING)
ALLOWED_LATENESS_S = 900 # 15-min dead-zone buffer tolerance def consume_telemetry(events, watermark, sink): """Land per-second telemetry rollups keyed by the VEHICLE clock, not arrival time. A burst replayed after a tunnel is bucketed by event_ts, so a 15-minute-old reading updates the minute it actually belongs to. The refusal: arrival order is never the grain. Processing-time windows let a cellular dead zone masquerade as a fleet that went dark.""" for e in events: event_ts = e["vehicle_clock"] # the source of truth for time if watermark - event_ts > ALLOWED_LATENESS_S: sink.dead_letter(e) # beyond the window: quarantine, continue # never silently fold into "now" sink.upsert_rollup( vehicle_key = e["vehicle_key"], sec_bucket = floor_to_second(event_ts), # event-time bucket speed_mph = e["speed_mph"], soc_pct = e["soc_pct"], autonomy_mode = e["autonomy_mode"], raw_log_uri = e["raw_log_uri"], # pointer only — frames stay in object store ) # Idempotent upsert keyed by (vehicle_key, sec_bucket): a replayed burst # re-states a bucket, never double-counts it.

2 · The disengagement writer — freeze the versions, then it's immutable

The writer is the guardian of the invariant. Its one job is to resolve the four version keys as-of the event instant and stamp them onto the row, after which the row is never touched again. The refusal is structural: there is no update path. A re-classification of the takeover type, after human review, appends a new adjudication row in the liability layer — it does not mutate the original safety event, because the original is evidence.

PYTHON · DISENGAGEMENT WRITER (FREEZE-AT-WRITE)
def write_disengagement(conn, ev): """Resolve the four version keys AS-OF the event instant and freeze them onto an append-only row. There is deliberately no UPDATE path: a disengagement, once written, is immutable evidence. Re-classification appends elsewhere.""" ts = ev["event_ts"] # resolve each SCD2 dimension to the version valid AT THE EVENT INSTANT release = scd2_asof(conn, "dim_software_release", ev["vehicle_key"], ts) hd_map = scd2_asof(conn, "dim_hd_map", ev["vehicle_key"], ts) odd = scd2_asof(conn, "dim_odd", ev["vehicle_key"], ts) conn.execute(""" INSERT INTO fact_disengagement_events (disengagement_id, trip_key, vehicle_key, release_key, map_key, odd_key, failed_sensor_id, event_ts, location_h3, type, reg_reportable, raw_log_uri) VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s) -- no ON CONFLICT DO UPDATE: append-only by construction """, (ev["id"], ev.get("trip_key"), ev["vehicle_key"], release["release_key"], hd_map["map_key"], odd["odd_key"], ev.get("failed_sensor_id"), ts, ev["location_h3"], ev["type"], ev["reg_reportable"], ev["raw_log_uri"])) # From here the row is frozen. v9 shipping tomorrow cannot re-point it at v9.

3 · The rescue dispatcher — thread the broken trip to its rescue

When a trip aborts to fail-safe, the dispatcher creates the rescue leg as a new trip that inherits the rider intent — same request_id — and points back at the broken trip via parent_trip_id. The refusal here is against side-tables: the rescue relationship lives on the trip fact itself, so billing and "how long was the rider stranded" are both plain joins rather than forensic reconstructions.

PYTHON · RESCUE DISPATCHER (SELF-REFERENCING CHAIN)
def dispatch_rescue(conn, broken_trip, rescue_vehicle_key): """A trip aborted to fail-safe; strand a rider. Create the rescue as a NEW trip carrying the SAME request_id (same rider intent) and a parent_trip_id back to the broken leg. The chain lives on the fact — not a side-table — so billing walks one auditable thread.""" conn.execute(""" UPDATE fact_av_trips SET trip_status = 'rescue_dispatched' WHERE trip_key = %s """, (broken_trip["trip_key"],)) conn.execute(""" INSERT INTO fact_av_trips (trip_key, request_id, vehicle_key, release_key, map_key, odd_key, user_id, trip_status, parent_trip_id, fare_adjustment) VALUES (%s, %s, %s, %s, %s, %s, %s, 'en_route', %s, %s) """, (next_trip_key(), broken_trip["request_id"], # SAME intent, new leg rescue_vehicle_key, current_release(conn, rescue_vehicle_key), # the rescue car's own stack current_map(conn, rescue_vehicle_key), current_odd(conn, rescue_vehicle_key), broken_trip["user_id"], broken_trip["trip_key"], # parent_trip_id -> the broken trip comp_fare(broken_trip))) # usually comps the ride

One carve-out, always stated: the rescue leg's version keys are the rescue vehicle's own current stack, not a copy of the broken trip's. The two legs were driven by two different cars under two different releases, and conflating them would corrupt the per-release safety statistics. The chain links the legs; it does not merge their attribution.


§ 06 — AGGREGATIONCounterfactual replay, and liability with money attached

The derived layer is where the frozen-state discipline pays its largest dividend. Because every disengagement froze its version keys and a pointer to the drive-log, you can re-simulate the logged sensor stream through a new stack and ask "would v9 have disengaged here?" — turning the safety case from "we shipped it and watched" into "we proved the regression was gone before we shipped."

Counterfactual replay is the capstone move. The same content-addressed inputs that explain a past incident become the inputs to a forward simulation: take the logged sensor stream, run it through a candidate release, and log the outcome. A regression that no longer disengages is proof, not hope. The run is recorded as a fact so the safety case is itself auditable.

PYTHON · COUNTERFACTUAL REPLAY HARNESS
def replay_against_candidate(conn, disengagement_id, candidate_release, sim): """Re-simulate a past disengagement's logged sensor stream through a NEW stack. Frozen state + the drive-log pointer make this deterministic: the question 'would v9 have disengaged here?' becomes answerable BEFORE v9 ships.""" d = conn.fetch_one(""" SELECT raw_log_uri, release_key, location_h3, event_ts FROM fact_disengagement_events WHERE disengagement_id = %s """, (disengagement_id,)) # fetch the exact logged frames that the OLD stack disengaged on frames = sim.load_drive_log(d["raw_log_uri"]) outcome = sim.run(candidate_release, frames) # 'disengaged' | 'nominal' conn.execute(""" INSERT INTO fact_sim_runs (disengagement_id, candidate_release, outcome, ran_at) VALUES (%s, %s, %s, now()) """, (disengagement_id, candidate_release, outcome)) return outcome # A wall of 'nominal' outcomes across last quarter's disengagements is the # regression-gone proof — the frozen-state discipline, run forward.

Liability is the same replay with money attached. Remove the driver and fault shifts from a person to product liability — the operator and its software stack — which promotes the data platform from analytics to the system of record for legal fault. The schema was already quietly satisfying the requirement: the frozen version keys and the failed sensor on every disengagement are the fault attribution, now adjudicated. An appeal appends a new finding version; the finding that was current when a payout posted is never overwritten — the risk-decision replay pattern, in a courtroom.

DDL · LIABILITY — INCIDENT + APPEND-ONLY FAULT FINDINGS
-- One row per reportable collision/claim. The evidence is content-addressed -- and write-once: append-only stops being tidiness and becomes admissibility. CREATE TABLE fact_incidents ( incident_id VARCHAR(50) NOT NULL, disengagement_id VARCHAR(50), vehicle_key BIGINT NOT NULL, failed_sensor_id VARCHAR(50), -- the subrogation path counterparty_type VARCHAR(20), -- vehicle|pedestrian|cyclist|property severity VARCHAR(20), claim_amount_usd DECIMAL(12,2) DEFAULT 0, evidence_hash VARCHAR(64) NOT NULL, -- content hash of sealed evidence current_finding_version INT NOT NULL DEFAULT 1, event_ts TIMESTAMP NOT NULL, PRIMARY KEY (incident_id), FOREIGN KEY (disengagement_id) REFERENCES fact_disengagement_events(disengagement_id) ); -- One row per adjudication version. An appeal APPENDS; nothing is overwritten. CREATE TABLE fact_fault_findings ( incident_id VARCHAR(50) NOT NULL, finding_version INT NOT NULL, -- append-only; appeals add a version at_fault VARCHAR(20) NOT NULL, -- av|third_party|shared|undetermined contributing_factor VARCHAR(20), -- sensor|perception|planning|road adjudicator VARCHAR(20) NOT NULL, -- internal|insurer|court decided_at TIMESTAMP NOT NULL, PRIMARY KEY (incident_id, finding_version), FOREIGN KEY (incident_id) REFERENCES fact_incidents(incident_id) );
Append-only stops being a tidiness preference and becomes admissibility. The original verdict, the original versions, the sealed evidence — none of them may move, because someone will challenge whether they did.AGGREGATION RULE Nº 1

§ 07 — ANALYTICS SQLInterrogating the fleet

The facts are where the platform explains itself. Three queries an interviewer loves, because each carries a classic pattern on its back — the self-join rescue chain, the cross-ecosystem root-cause funnel, and the subrogation aggregation that turns a safety column into a recovery ledger.

How long was the rider stranded? — the self-join

Because the rescue trip carries parent_trip_id, "who hit a vehicle failure today, and how long until rescue?" is a clean self-join rather than a forensic reconstruction. The broken legs and the rescue legs are the same table, joined to itself on the parent link.

SQL · RESCUE LATENCY (SELF-JOIN ON PARENT_TRIP_ID)
WITH failed AS ( SELECT trip_key, user_id, vehicle_key AS broken_vehicle, end_ts AS failure_time FROM fact_av_trips WHERE trip_status = 'aborted_failsafe' AND date(start_ts) = CURRENT_DATE ), rescue AS ( SELECT parent_trip_id, vehicle_key AS rescue_vehicle, start_ts AS rescue_pickup FROM fact_av_trips WHERE parent_trip_id IS NOT NULL AND date(start_ts) = CURRENT_DATE ) SELECT f.user_id, f.broken_vehicle, r.rescue_vehicle, extract(epoch FROM r.rescue_pickup - f.failure_time) / 60 AS minutes_stranded FROM failed f JOIN rescue r ON r.parent_trip_id = f.trip_key ORDER BY minutes_stranded DESC;

App bug or fleet failure? — the cross-ecosystem funnel

The query the app-vs-fleet bridge exists for: booking-conversion by app release, cross-referenced with the physical fail-safes those same sessions witnessed. Low conversion with near-zero fail-safes points at the app build; low conversion with elevated fail-safes points at the fleet. One query, two ecosystems, root cause isolated.

SQL · CONVERSION × FAIL-SAFES BY APP RELEASE
SELECT v.platform, v.semantic_version, count(DISTINCT c.session_id) AS app_sessions, count(DISTINCT r.request_id) AS ride_requests, round(100.0 * count(DISTINCT r.request_id) / nullif(count(DISTINCT c.session_id), 0), 1) AS booking_conversion_pct, count(DISTINCT CASE WHEN t.trip_status = 'aborted_failsafe' THEN t.trip_key END) AS failsafes_witnessed FROM fact_mobile_clickstream_events c JOIN dim_mobile_app_versions v ON v.app_version_key = c.app_version_key LEFT JOIN fact_ride_requests r ON r.session_id = c.session_id LEFT JOIN fact_av_trips t ON t.request_id = r.request_id WHERE c.event_ts >= now() - INTERVAL '7 days' GROUP BY v.platform, v.semantic_version ORDER BY booking_conversion_pct ASC; -- Low conversion + ~0 fail-safes -> blame the build. Low conversion + -- elevated fail-safes -> blame the fleet. The blind spot is closed.

Which vendor's sensors are costing us? — subrogation

When a component caused the incident, the operator recovers from the manufacturer. The same failed_sensor_id that explained a disengagement now sizes a vendor recovery — one column, three uses: root-cause, failure-profile, and subrogation. The adjudication join takes only the current finding version, so an appealed claim doesn't double-count.

SQL · SUBROGATION EXPOSURE BY MANUFACTURER
SELECT s.manufacturer, s.sensor_type, count(DISTINCT i.incident_id) AS incidents, sum(i.claim_amount_usd) AS recoverable_exposure_usd FROM fact_incidents i JOIN fact_fault_findings f ON f.incident_id = i.incident_id AND f.finding_version = i.current_finding_version -- latest only JOIN dim_devices s ON s.sensor_id = i.failed_sensor_id WHERE f.contributing_factor = 'sensor' AND f.at_fault IN ('av', 'shared') GROUP BY s.manufacturer, s.sensor_type ORDER BY recoverable_exposure_usd DESC;

§ 08 — THE DASHBOARDProving the fleet is safe

A senior design ends with observability, because every safety discipline above is invisible without it. The board watches three different definitions of "healthy" — the safety case, the fleet operations, and the liability ledger — and treats a release-over-release MMBD regression as the page-worthy incident.

THE SAFETY CASE
MMBD by release (is v9 safer than v8?), reg-reportable count vs the filing deadline, assist-to-disengage ratio — a rising ratio means the car asks for help instead of ceding control, which is maturity.
FLEET OPS
vehicles in ODD vs out (a low-state-of-charge car is outside its envelope for a long trip), telemetry watermark lag, rescue latency p95, charging downtime.
THE LEDGER
open incidents, subrogation exposure by vendor, COPQ (comped fare + recovery credit) — the cost of poor quality, watched like a budget.
THE GATE
counterfactual pass rate — the share of last quarter's disengagements the candidate release no longer trips. Below the bar, the release does not ship.
Fleet Safety & Ops — Phoenix FRI 19:05 MST · STACK v9.0 ROLLING · 2s REFRESH
MMBD v9.0
17.4k mi
MMBD v8.2
11.9k mi
Reg-Reportable (mo)
7
Assists : Disengage
9.2×
Disengagements by H3 cell — downtown grid (darker = more)
HIGH LOW
Vehicles Out-of-ODD
14
Rescue Latency p95
8.1min
Counterfactual Pass
96%
Watermark Lag
6s
Subrogation Exp.
412k$
COPQ (today)
3.8k$
FIG. 2 — The story a maturing fleet tells: MMBD on v9.0 climbing well clear of v8.2 with a 96% counterfactual pass rate, the assist-to-disengage ratio rising to 9.2× (the car asking for help, not ceding control), disengagements clustering on the downtown hot cells, and a single sensor vendor's failures driving a $412k subrogation exposure the ledger is already chasing.

Read the green tiles together and the board argues that v9.0 is a genuine safety improvement, not a hopeful one: MMBD is up, the counterfactual harness proved the regression gone before the release shipped, and the rising assist ratio says the stack is learning to defer rather than fail. Read the amber and rust and it tells the operator exactly where to spend: fourteen cars drifting out of their ODD envelope on low charge, a downtown intersection generating most of the disengagements, and one vendor's sensors carrying nearly half a million dollars of recoverable exposure. Every number on the board traces back to a frozen key — which is what makes it admissible as well as actionable.


§ 09 — THE RUBRICWhat was actually being tested

Strip the robots away and the question was testing five judgments, each of which generalizes far beyond autonomous vehicles:

VERSIONED ATTRIBUTION
Replacing an entity with the stack of versioned dimensions that produced its behavior. When the actor is software, "who did this" is a join of releases, and each must be reconstructable as-of any instant.
FREEZE, DON'T LOOK UP
Snapshotting the version keys onto the event at write time, so the record survives every future deployment. Resolving "current" at query time re-attributes history — the single bug that collapses the safety case.
APPEND-ONLY AS ADMISSIBILITY
Treating the disengagement and the fault finding as immutable evidence, with appeals appending versions. When the data is the legal record, an UPDATE is not a convenience — it is spoliation.
THE FIREHOSE BOUNDARY, AT SCALE
Keeping terabytes of raw sensor data in object storage with a per-second rollup and a pointer — and processing on event-time, because a vehicle that buffers in a tunnel will lie to a processing-time clock.
ONE GRAIN FOR A MIXED FLEET
Folding robot trips into the same marketplace fact as human trips so Finance reconciles once. The grain was built platform-agnostic for exactly this; forking it double-counts the market.
Replace the driver with a stack and the whole model becomes versioned attribution. Nothing here is new — it is the catalogue's SCD2, decision-replay, temporal-geofence, and firehose patterns, at the grade where a wrong join is a subpoena.— CLOSING ARGUMENT