The Autonomy Problem — PaddySpeaks

§ 01 — THE QUESTIONThere is no driver to blame

This is the capstone of the marketplace family, and it is hard for one reason that reshapes everything: the entity every other ride-share model leans on — the driver — has been deleted and replaced by a release pipeline.

Interview Prompt

"Design the data platform for a driverless ride-hail fleet — Waymo, a Tesla robotaxi, Uber AV. The autonomy stack replaces the human driver. Serve Operations, Safety & Regulatory, and Finance — at terabytes of sensor data per vehicle per day. How would you scope this out?"

LEVEL · STAFF / PRINCIPALDURATION · 45 MINFORMAT · WHITEBOARD

Every human ride-share model attributes a trip to a driver, prices insurance on that driver, and presumes them at fault when something goes wrong. Remove the driver and all three assumptions collapse at once. Responsibility for any action now traces to a software release crossed with an HD-map version, a sensor calibration, and an operational design domain — each of which changes over time and must be reconstructable to the millisecond, because a regulator will ask, a court will ask, and "the current version" is the wrong answer to both. The driver did not vanish; it became a stack of versioned dimensions, and the entire schema is organized around making that stack replayable.

A weak answer models the AV like a human driver with a robot in the seat. A strong answer notices the three structural forces first and names them before any boxes:

THE VERSIONED DRIVER: Responsibility is a join of four SCD2 dimensions. Release × HD-map × calibration × ODD — each drifts, each must be recoverable as-of any instant. Ship v9 tomorrow and last week's incident must still attribute to v8.
THE REGULATOR'S FACT TABLE: A disengagement is the safety currency. When autonomy hands control to a safety driver or remote operator, that event is counted, reported, and litigated. It is append-only and must replay against the exact versions that were driving.
THE FIREHOSE AT EXTREME SCALE: The sensor log is terabytes per vehicle per day. Camera, lidar, radar — none of it belongs in the warehouse. The analytics layer holds events and pointers; the raw drive-logs sit in object storage.

A mutable "current stack version" column is the cardinal sin. Overwrite the vehicle row on every over-the-air update and you can never answer "which release was driving last Tuesday" — and the entire safety case collapses.

Scoping out loud

Scope first, because the safety boundary is the design. State what you are building: a fleet data platform whose spine is versioned attribution — every safety-relevant event freezes the versions that produced it — folding into a shared marketplace trip fact so Finance never forks human and robot. State what you are deliberately treating as a callable service: the perception and planning stacks themselves (the platform records which stack ran, not how it thinks), the maps survey pipeline, and real-time motion control. State what you punt: payments rails (treated as a downstream double-entry ledger), the consumer rendering of the app, and rider identity beyond the trip.

Then the envelope, volunteered. A fleet at city scale:

Quantity	Estimate	Consequence
Vehicles in fleet	~10,000	Sets the telemetry fan-in
Telemetry cadence	100 ms	The event-time tempo
Telemetry events	≈ 100 K / s	Hot path vs cold path — the number that splits the pipeline
Raw sensor data	terabytes / vehicle / day	Never in the warehouse — drive-log + pointer only
Disengagements / 1k miles	~0.1–1	The MMBD denominator; the regulatory north star
Version dims (the "driver")	4 × SCD2	Frozen onto every safety event at write time
Cellular dead-zone buffer	up to 15 min	Late, out-of-order data — event-time or be wrong

Notice the asymmetry: a hundred thousand telemetry events a second is a modest stream, but the raw sensor data behind each vehicle is terabytes a day — three orders of magnitude apart. That gap is the whole architecture. The warehouse holds the small, structured stream and a pointer; the firehose stays in object storage. And the single most consequential design choice — frozen version keys on the safety event — is invisible in the numbers and decisive in court.

§ 02 — DATA FLOWFollowing a drive through the building

One architecture, two paths and one spine. A hot path serves sub-two-second fleet-ops alerts; a cold path lands the same events for overnight safety reporting; and the spine is the versioned-dimension join that every safety event freezes at write time, so replay is deterministic forever.

FIG. 1 — Hot path, cold path, frozen spine. The firehose stays in object storage; the warehouse holds structured events and pointers; every safety event freezes the four version keys at write time.

Three properties of this picture do most of the interview's work. First, the firehose never touches the warehouse — raw lidar and camera flow vehicle → object store, and only a per-second rollup plus a raw_log_uri pointer reaches the analytic layer. Second, the version dimensions are frozen onto the safety event at write time, not looked up at query time: the dashed arrow into the disengagement fact is a one-time snapshot, so when the stack ships v9 tomorrow, last week's incident still attributes to v8. Third, ingestion is event-time with bounded allowed lateness, because a vehicle buffers fifteen minutes in a cellular dead zone and dumps it on reconnect — process on arrival order and your safety metrics are computed against the network, not the world.

The Attribution Philosophy, In One Rule

Freeze the versions on the event; never resolve "current" at query time. A safety event is a photograph of a moment — which release was driving, which map it believed in, which sensors it trusted, where it was allowed to be. Snapshot all four onto the event the instant it is written and the photograph is permanent: the regulatory record cannot be falsified by tomorrow's deployment, the v8-vs-v9 comparison stays honest, and incident replay is a plain join that survives every future upgrade. Resolve "current" instead and you re-attribute history to whatever is running now — which is how a safety case becomes a liability.

§ 03 — DATA MODELThe driver becomes four versioned dimensions

The schema falls out of the versioned-driver insight. Four SCD2 dimensions replace the human; an append-only disengagement fact freezes foreign keys to all four; a telemetry rollup keeps the firehose at arm's length; and the trip fact folds into the shared marketplace grain so Finance never forks human and robot.

The four versioned dimensions

"Who was driving" is no longer a person — it is the join of four things that drift: the autonomy release, the HD-map it believed in, the vehicle's sensor calibration, and the operational design domain that says where and when it may drive. Each is SCD2 because each changes, and the as-of-trip configuration must be recoverable. The ODD is a temporal geofence — exactly the surge-zone pattern — so a multi-city rollout is just a new versioned row.

DDL · THE VERSIONED "DRIVER" (FOUR SCD2 DIMENSIONS)

-- Vehicle master asset (the car); SCD2 — sensors re-calibrate over its life.
CREATE TABLE dim_vehicles (
    vehicle_key         BIGINT       NOT NULL,
    vin                 VARCHAR(32)  NOT NULL,
    platform            VARCHAR(40)  NOT NULL,   -- 'Waymo Driver Gen 5'
    deployment_city     VARCHAR(40)  NOT NULL,
    calibration_version VARCHAR(40)  NOT NULL,
    valid_from          TIMESTAMP    NOT NULL,
    valid_to            TIMESTAMP    NOT NULL DEFAULT '9999-12-31',
    PRIMARY KEY (vehicle_key)
);

-- The "driver": the autonomy stack. This IS who was driving; SCD2.
CREATE TABLE dim_software_release (
    release_key           BIGINT      NOT NULL,
    stack_version         VARCHAR(40) NOT NULL,   -- 'v9.0.3'
    perception_model_hash VARCHAR(64) NOT NULL,
    planner_version       VARCHAR(40) NOT NULL,
    valid_from            TIMESTAMP   NOT NULL,
    valid_to              TIMESTAMP   NOT NULL DEFAULT '9999-12-31',
    PRIMARY KEY (release_key)
);

-- The map the car believed in; a stale tile can cause a fault. SCD2.
CREATE TABLE dim_hd_map (
    map_key     BIGINT      NOT NULL,
    map_version VARCHAR(40) NOT NULL,
    tile_set    VARCHAR(60) NOT NULL,
    survey_date DATE        NOT NULL,
    valid_from  TIMESTAMP   NOT NULL,
    valid_to    TIMESTAMP   NOT NULL DEFAULT '9999-12-31',
    PRIMARY KEY (map_key)
);

-- Operational Design Domain: a TEMPORAL GEOFENCE — where/when the AV may
-- drive. SCD2, so a city expansion is one new versioned row, no new machinery.
CREATE TABLE dim_odd (
    odd_key     BIGINT      NOT NULL,
    geofence_h3 VARCHAR[],                  -- set of allowed H3 cells
    weather_env VARCHAR(60) NOT NULL,      -- 'clear|light_rain'
    time_env    VARCHAR(60) NOT NULL,      -- 'always' | 'day_only'
    valid_from  TIMESTAMP   NOT NULL,
    valid_to    TIMESTAMP   NOT NULL DEFAULT '9999-12-31',
    PRIMARY KEY (odd_key)
);

-- Per-sensor hardware inventory, snowflaked off the vehicle; SCD2. A LiDAR
-- is swapped individually, so the grain is the COMPONENT, not the car.
CREATE TABLE dim_devices (
    sensor_id        VARCHAR(50) NOT NULL,
    vehicle_key      BIGINT      NOT NULL,
    sensor_type      VARCHAR(30) NOT NULL,   -- 'LiDAR','Camera','Radar','Compute'
    manufacturer     VARCHAR(50) NOT NULL,   -- the key to failure-profile + subrogation
    firmware_version VARCHAR(50) NOT NULL,
    status           VARCHAR(20) NOT NULL,   -- 'active','degraded','failed'
    valid_from       TIMESTAMP   NOT NULL,
    valid_to         TIMESTAMP   NOT NULL DEFAULT '9999-12-31',
    PRIMARY KEY (sensor_id, valid_from),
    FOREIGN KEY (vehicle_key) REFERENCES dim_vehicles(vehicle_key)
);

The trip fact and the firehose rollup

The trip fact freezes the four version keys at dispatch and folds into the marketplace grain through marketplace_trip_key, so a city dispatching humans and robots reports revenue once. It also carries the exception machinery — a status state machine, a self-referencing parent_trip_id, and a fare_adjustment — for the rescue chain. The telemetry fact is a per-second rollup with a pointer; the raw frames never arrive.

DDL · TRIP FACT + TELEMETRY ROLLUP

-- Autonomous trip; version FKs frozen at dispatch; self-FK rescue chain.
CREATE TABLE fact_av_trips (
    trip_key             BIGINT       NOT NULL,
    request_id           VARCHAR(50)  NOT NULL,   -- shared across rescue legs
    marketplace_trip_key BIGINT,                       -- folds into fct_trips (#1)
    vehicle_key          BIGINT       NOT NULL,
    release_key          BIGINT       NOT NULL,   -- frozen at dispatch
    map_key              BIGINT       NOT NULL,   -- frozen
    odd_key              BIGINT       NOT NULL,   -- frozen
    user_id              VARCHAR(50)  NOT NULL,
    start_ts             TIMESTAMP,
    end_ts               TIMESTAMP,
    autonomous_miles     DECIMAL(8,2) DEFAULT 0,
    actual_fare          DECIMAL(8,2) DEFAULT 0,
    fare_adjustment      DECIMAL(8,2) DEFAULT 0,   -- the rescue comp
    trip_status          VARCHAR(30)  NOT NULL,   -- state machine, below
    parent_trip_id       BIGINT,                       -- self-FK: rescue -> broken trip
    PRIMARY KEY (trip_key),
    FOREIGN KEY (release_key)    REFERENCES dim_software_release(release_key),
    FOREIGN KEY (parent_trip_id) REFERENCES fact_av_trips(trip_key)
);

-- Telemetry firehose: per-second rollup; raw frames in object store.
CREATE TABLE fact_vehicle_telemetry (
    vehicle_key     BIGINT      NOT NULL,
    event_ts        TIMESTAMP   NOT NULL,   -- vehicle clock (event-time)
    sec_bucket      TIMESTAMP   NOT NULL,
    h3_index        VARCHAR(15) NOT NULL,
    speed_mph       INT,
    soc_pct         INT,                        -- state of charge (a dispatch gate)
    autonomy_mode   VARCHAR(20) NOT NULL,   -- fully_autonomous|remote_guidance|manual
    active_trip_key BIGINT,                     -- NULL when cruising empty
    raw_log_uri     VARCHAR(255)                -- pointer to the full drive-log
);

The append-only disengagement — the regulator's table

This is the table the regulator owns, and it has two non-negotiable properties: it is append-only, and it freezes the version keys at event time. Re-classifying a takeover after review appends a new adjudication row elsewhere; the original event is never updated. The failed_sensor_id column does triple duty — root cause, failure-profile analysis, and, in §08, subrogation against the manufacturer.

DDL · DISENGAGEMENT (APPEND-ONLY, FROZEN FKs)

CREATE TABLE fact_disengagement_events (
    disengagement_id VARCHAR(50) NOT NULL,
    trip_key         BIGINT,                -- NULL if no passenger
    vehicle_key      BIGINT      NOT NULL,
    release_key      BIGINT      NOT NULL,  -- frozen — the release that was driving
    map_key          BIGINT      NOT NULL,  -- frozen
    odd_key          BIGINT      NOT NULL,  -- frozen
    failed_sensor_id VARCHAR(50),           -- -> dim_devices (root-cause / subrogation)
    event_ts         TIMESTAMP   NOT NULL,
    location_h3      VARCHAR(15) NOT NULL,
    type             VARCHAR(30) NOT NULL,  -- safety_driver_takeover|remote_operator|
                                            --   system_fault|hard_brake
    reg_reportable   BOOLEAN     NOT NULL,  -- counts toward the CA DMV report
    raw_log_uri      VARCHAR(255),          -- the replay source for §06
    PRIMARY KEY (disengagement_id),
    FOREIGN KEY (release_key)      REFERENCES dim_software_release(release_key),
    FOREIGN KEY (failed_sensor_id) REFERENCES dim_devices(sensor_id)
);

Two more facts complete the picture without inflating the headline. Remote assistance is not a disengagement: most human-in-the-loop moments are the car asking a remote operator to confirm an ambiguous scene while staying in control, so they get their own far-more-frequent fact_remote_assists — folding them into disengagements would inflate the safety rate and hide the real ops load. And the app layer gets a clickstream fact keyed on app version, so a dropped booking-conversion number has an answer: a buggy build, or cars failing pickups.

§ 04 — THE CORE INVARIANTFrozen keys, and the replay that survives every upgrade

The correctness of this entire platform lives in one discipline: the safety event freezes its version foreign keys at write time, so incident replay is a deterministic join that gives the same answer forever — even after the stack, the map, and the calibration have all advanced.

A trip is a state machine, and the exception edges are where the schema earns its grade. A sensor cracks mid-trip; the car can no longer drive autonomously, executes a fail-safe stop, and writes a disengagement row stamped with the failed sensor. The passenger's trip flips to a terminal failure state; dispatch auto-assigns a rescue:

EN_ROUTE→ IN_TRANSIT→ COMPLETED │ ABORTED_FAILSAFE→ RESCUE_DISPATCHED

The rescue is a new trip row carrying the same request_id (same rider intent) and a parent_trip_id pointing back at the broken leg. Billing walks that parent chain, sums both legs' distance, applies a fare_adjustment — usually comping the ride — and posts one clean invoice. The self-referencing chain keeps the incident, the rescue, and the billing on a single auditable thread, with no patchy side-tables.

Snapshot the release, map, calibration, and ODD onto the event at write time. Resolve "current" at query time and you re-attribute last week's incident to today's stack — destroying the regulatory record.AUTONOMY RULE Nº 1

Here is the atomic heart of the platform: the regulator's question — "what was in control when this happened?" — answered as a plain join, correct forever because the keys were frozen. Beside it, the north-star regression: Mean Miles Between Disengagements by release.

SQL · INCIDENT REPLAY + MMBD REGRESSION (THE INVARIANT)

-- Attribute every disengagement to the EXACT state that was driving. Because
-- the FKs were frozen at event time, this stays correct after any upgrade.
SELECT d.disengagement_id, d.event_ts, d.type, d.location_h3,
       r.stack_version,
       m.map_version,
       v.calibration_version
FROM fact_disengagement_events d
JOIN dim_vehicles         v ON v.vehicle_key = d.vehicle_key
JOIN dim_software_release r ON r.release_key = d.release_key   -- frozen key
JOIN dim_hd_map           m ON m.map_key     = d.map_key       -- frozen key
WHERE d.reg_reportable = TRUE
  AND d.event_ts >= DATE '2026-01-01'
ORDER BY d.event_ts;

-- MMBD by release: "is v9 safer than v8?" in one line. The metric every AV
-- program — and the CA DMV report — lives or dies by.
SELECT r.stack_version,
       count(*)                AS disengagements,
       sum(t.autonomous_miles) AS miles,
       round(sum(t.autonomous_miles) / nullif(count(*), 0), 1) AS mmbd
FROM fact_disengagement_events d
JOIN dim_software_release r ON r.release_key = d.release_key
JOIN fact_av_trips        t ON t.trip_key    = d.trip_key
WHERE d.reg_reportable = TRUE
GROUP BY r.stack_version
ORDER BY mmbd DESC;

The worked case makes the invariant concrete. Vehicle AV-7 has a remote_operator disengagement on May 2 while running stack v8.2, map 2026.4, calibration C-19. The next day the fleet upgrades to v9.0 — a new SCD2 release row. Run the replay query on May 2 and it returns v8.2; run it on May 10, with v9.0 live, and it still returns v8.2, because the release key was frozen at event time. A live lookup of "AV-7's current release" would now wrongly blame v9.0 — falsifying both the regulatory record and the v8-vs-v9 comparison. Frozen keys make replay deterministic; live lookups make it a fiction.

§ 05 — INGESTION & STREAMSPython on the firehose and the freeze

Three programs carry the platform: the telemetry consumer that handles late, out-of-order data on event-time, the disengagement writer that freezes the version keys, and the rescue dispatcher that threads a broken trip to its rescue. Each is small; the judgment is in what they refuse to do.

1 · The telemetry consumer — event-time, never arrival order

A vehicle buffers fifteen minutes in a cellular dead zone and dumps it all on reconnect. The consumer's defining refusal is that it never trusts arrival order: it windows on the vehicle's own clock with a bounded allowed-lateness, so a fifteen-minute-old event lands in the bucket it belongs to. Process on processing-time and a tunnel makes MMBD and the live map simply wrong — computed against the network, not the road.

PYTHON · TELEMETRY CONSUMER (EVENT-TIME WINDOWING)

ALLOWED_LATENESS_S = 900        # 15-min dead-zone buffer tolerance

def consume_telemetry(events, watermark, sink):
    """Land per-second telemetry rollups keyed by the VEHICLE clock, not arrival
    time. A burst replayed after a tunnel is bucketed by event_ts, so a
    15-minute-old reading updates the minute it actually belongs to.

    The refusal: arrival order is never the grain. Processing-time windows let
    a cellular dead zone masquerade as a fleet that went dark."""
    for e in events:
        event_ts = e["vehicle_clock"]            # the source of truth for time
        if watermark - event_ts > ALLOWED_LATENESS_S:
            sink.dead_letter(e)                  # beyond the window: quarantine,
            continue                             # never silently fold into "now"
        sink.upsert_rollup(
            vehicle_key = e["vehicle_key"],
            sec_bucket  = floor_to_second(event_ts),   # event-time bucket
            speed_mph   = e["speed_mph"],
            soc_pct     = e["soc_pct"],
            autonomy_mode = e["autonomy_mode"],
            raw_log_uri = e["raw_log_uri"],     # pointer only — frames stay in object store
        )
    # Idempotent upsert keyed by (vehicle_key, sec_bucket): a replayed burst
    # re-states a bucket, never double-counts it.

2 · The disengagement writer — freeze the versions, then it's immutable

The writer is the guardian of the invariant. Its one job is to resolve the four version keys as-of the event instant and stamp them onto the row, after which the row is never touched again. The refusal is structural: there is no update path. A re-classification of the takeover type, after human review, appends a new adjudication row in the liability layer — it does not mutate the original safety event, because the original is evidence.

PYTHON · DISENGAGEMENT WRITER (FREEZE-AT-WRITE)

def write_disengagement(conn, ev):
    """Resolve the four version keys AS-OF the event instant and freeze them onto
    an append-only row. There is deliberately no UPDATE path: a disengagement,
    once written, is immutable evidence. Re-classification appends elsewhere."""
    ts = ev["event_ts"]

    # resolve each SCD2 dimension to the version valid AT THE EVENT INSTANT
    release = scd2_asof(conn, "dim_software_release", ev["vehicle_key"], ts)
    hd_map  = scd2_asof(conn, "dim_hd_map",           ev["vehicle_key"], ts)
    odd     = scd2_asof(conn, "dim_odd",              ev["vehicle_key"], ts)

    conn.execute("""
        INSERT INTO fact_disengagement_events
            (disengagement_id, trip_key, vehicle_key, release_key, map_key,
             odd_key, failed_sensor_id, event_ts, location_h3, type,
             reg_reportable, raw_log_uri)
        VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)
        -- no ON CONFLICT DO UPDATE: append-only by construction
    """, (ev["id"], ev.get("trip_key"), ev["vehicle_key"],
          release["release_key"], hd_map["map_key"], odd["odd_key"],
          ev.get("failed_sensor_id"), ts, ev["location_h3"],
          ev["type"], ev["reg_reportable"], ev["raw_log_uri"]))
    # From here the row is frozen. v9 shipping tomorrow cannot re-point it at v9.

3 · The rescue dispatcher — thread the broken trip to its rescue

When a trip aborts to fail-safe, the dispatcher creates the rescue leg as a new trip that inherits the rider intent — same request_id — and points back at the broken trip via parent_trip_id. The refusal here is against side-tables: the rescue relationship lives on the trip fact itself, so billing and "how long was the rider stranded" are both plain joins rather than forensic reconstructions.

PYTHON · RESCUE DISPATCHER (SELF-REFERENCING CHAIN)

def dispatch_rescue(conn, broken_trip, rescue_vehicle_key):
    """A trip aborted to fail-safe; strand a rider. Create the rescue as a NEW
    trip carrying the SAME request_id (same rider intent) and a parent_trip_id
    back to the broken leg. The chain lives on the fact — not a side-table — so
    billing walks one auditable thread."""
    conn.execute("""
        UPDATE fact_av_trips SET trip_status = 'rescue_dispatched'
        WHERE trip_key = %s
    """, (broken_trip["trip_key"],))

    conn.execute("""
        INSERT INTO fact_av_trips
            (trip_key, request_id, vehicle_key, release_key, map_key, odd_key,
             user_id, trip_status, parent_trip_id, fare_adjustment)
        VALUES (%s, %s, %s, %s, %s, %s, %s, 'en_route', %s, %s)
    """, (next_trip_key(),
          broken_trip["request_id"],        # SAME intent, new leg
          rescue_vehicle_key,
          current_release(conn, rescue_vehicle_key),  # the rescue car's own stack
          current_map(conn, rescue_vehicle_key),
          current_odd(conn, rescue_vehicle_key),
          broken_trip["user_id"],
          broken_trip["trip_key"],          # parent_trip_id -> the broken trip
          comp_fare(broken_trip)))           # usually comps the ride

One carve-out, always stated: the rescue leg's version keys are the rescue vehicle's own current stack, not a copy of the broken trip's. The two legs were driven by two different cars under two different releases, and conflating them would corrupt the per-release safety statistics. The chain links the legs; it does not merge their attribution.

§ 06 — AGGREGATIONCounterfactual replay, and liability with money attached

The derived layer is where the frozen-state discipline pays its largest dividend. Because every disengagement froze its version keys and a pointer to the drive-log, you can re-simulate the logged sensor stream through a new stack and ask "would v9 have disengaged here?" — turning the safety case from "we shipped it and watched" into "we proved the regression was gone before we shipped."

Counterfactual replay is the capstone move. The same content-addressed inputs that explain a past incident become the inputs to a forward simulation: take the logged sensor stream, run it through a candidate release, and log the outcome. A regression that no longer disengages is proof, not hope. The run is recorded as a fact so the safety case is itself auditable.

PYTHON · COUNTERFACTUAL REPLAY HARNESS

def replay_against_candidate(conn, disengagement_id, candidate_release, sim):
    """Re-simulate a past disengagement's logged sensor stream through a NEW stack.
    Frozen state + the drive-log pointer make this deterministic: the question
    'would v9 have disengaged here?' becomes answerable BEFORE v9 ships."""
    d = conn.fetch_one("""
        SELECT raw_log_uri, release_key, location_h3, event_ts
        FROM fact_disengagement_events WHERE disengagement_id = %s
    """, (disengagement_id,))

    # fetch the exact logged frames that the OLD stack disengaged on
    frames  = sim.load_drive_log(d["raw_log_uri"])
    outcome = sim.run(candidate_release, frames)   # 'disengaged' | 'nominal'

    conn.execute("""
        INSERT INTO fact_sim_runs (disengagement_id, candidate_release, outcome, ran_at)
        VALUES (%s, %s, %s, now())
    """, (disengagement_id, candidate_release, outcome))
    return outcome
    # A wall of 'nominal' outcomes across last quarter's disengagements is the
    # regression-gone proof — the frozen-state discipline, run forward.

Liability is the same replay with money attached. Remove the driver and fault shifts from a person to product liability — the operator and its software stack — which promotes the data platform from analytics to the system of record for legal fault. The schema was already quietly satisfying the requirement: the frozen version keys and the failed sensor on every disengagement are the fault attribution, now adjudicated. An appeal appends a new finding version; the finding that was current when a payout posted is never overwritten — the risk-decision replay pattern, in a courtroom.

DDL · LIABILITY — INCIDENT + APPEND-ONLY FAULT FINDINGS

-- One row per reportable collision/claim. The evidence is content-addressed
-- and write-once: append-only stops being tidiness and becomes admissibility.
CREATE TABLE fact_incidents (
    incident_id             VARCHAR(50)  NOT NULL,
    disengagement_id        VARCHAR(50),
    vehicle_key             BIGINT       NOT NULL,
    failed_sensor_id        VARCHAR(50),                 -- the subrogation path
    counterparty_type       VARCHAR(20),   -- vehicle|pedestrian|cyclist|property
    severity                VARCHAR(20),
    claim_amount_usd        DECIMAL(12,2) DEFAULT 0,
    evidence_hash           VARCHAR(64)  NOT NULL,  -- content hash of sealed evidence
    current_finding_version INT          NOT NULL DEFAULT 1,
    event_ts                TIMESTAMP    NOT NULL,
    PRIMARY KEY (incident_id),
    FOREIGN KEY (disengagement_id)
        REFERENCES fact_disengagement_events(disengagement_id)
);

-- One row per adjudication version. An appeal APPENDS; nothing is overwritten.
CREATE TABLE fact_fault_findings (
    incident_id         VARCHAR(50) NOT NULL,
    finding_version     INT         NOT NULL,   -- append-only; appeals add a version
    at_fault            VARCHAR(20) NOT NULL,   -- av|third_party|shared|undetermined
    contributing_factor VARCHAR(20),            -- sensor|perception|planning|road
    adjudicator         VARCHAR(20) NOT NULL,   -- internal|insurer|court
    decided_at          TIMESTAMP   NOT NULL,
    PRIMARY KEY (incident_id, finding_version),
    FOREIGN KEY (incident_id) REFERENCES fact_incidents(incident_id)
);

Append-only stops being a tidiness preference and becomes admissibility. The original verdict, the original versions, the sealed evidence — none of them may move, because someone will challenge whether they did.AGGREGATION RULE Nº 1

§ 07 — ANALYTICS SQLInterrogating the fleet

The facts are where the platform explains itself. Three queries an interviewer loves, because each carries a classic pattern on its back — the self-join rescue chain, the cross-ecosystem root-cause funnel, and the subrogation aggregation that turns a safety column into a recovery ledger.

How long was the rider stranded? — the self-join

Because the rescue trip carries parent_trip_id, "who hit a vehicle failure today, and how long until rescue?" is a clean self-join rather than a forensic reconstruction. The broken legs and the rescue legs are the same table, joined to itself on the parent link.

SQL · RESCUE LATENCY (SELF-JOIN ON PARENT_TRIP_ID)

WITH failed AS (
    SELECT trip_key, user_id, vehicle_key AS broken_vehicle,
           end_ts AS failure_time
    FROM   fact_av_trips
    WHERE  trip_status = 'aborted_failsafe'
      AND  date(start_ts) = CURRENT_DATE
),
rescue AS (
    SELECT parent_trip_id, vehicle_key AS rescue_vehicle,
           start_ts AS rescue_pickup
    FROM   fact_av_trips
    WHERE  parent_trip_id IS NOT NULL
      AND  date(start_ts) = CURRENT_DATE
)
SELECT f.user_id, f.broken_vehicle, r.rescue_vehicle,
       extract(epoch FROM r.rescue_pickup - f.failure_time) / 60 AS minutes_stranded
FROM failed f
JOIN rescue r ON r.parent_trip_id = f.trip_key
ORDER BY minutes_stranded DESC;

App bug or fleet failure? — the cross-ecosystem funnel

The query the app-vs-fleet bridge exists for: booking-conversion by app release, cross-referenced with the physical fail-safes those same sessions witnessed. Low conversion with near-zero fail-safes points at the app build; low conversion with elevated fail-safes points at the fleet. One query, two ecosystems, root cause isolated.

SQL · CONVERSION × FAIL-SAFES BY APP RELEASE

SELECT v.platform, v.semantic_version,
       count(DISTINCT c.session_id)                     AS app_sessions,
       count(DISTINCT r.request_id)                     AS ride_requests,
       round(100.0 * count(DISTINCT r.request_id)
             / nullif(count(DISTINCT c.session_id), 0), 1) AS booking_conversion_pct,
       count(DISTINCT CASE WHEN t.trip_status = 'aborted_failsafe'
                          THEN t.trip_key END)        AS failsafes_witnessed
FROM fact_mobile_clickstream_events c
JOIN dim_mobile_app_versions  v ON v.app_version_key = c.app_version_key
LEFT JOIN fact_ride_requests  r ON r.session_id      = c.session_id
LEFT JOIN fact_av_trips       t ON t.request_id      = r.request_id
WHERE c.event_ts >= now() - INTERVAL '7 days'
GROUP BY v.platform, v.semantic_version
ORDER BY booking_conversion_pct ASC;
-- Low conversion + ~0 fail-safes -> blame the build. Low conversion +
-- elevated fail-safes -> blame the fleet. The blind spot is closed.

Which vendor's sensors are costing us? — subrogation

When a component caused the incident, the operator recovers from the manufacturer. The same failed_sensor_id that explained a disengagement now sizes a vendor recovery — one column, three uses: root-cause, failure-profile, and subrogation. The adjudication join takes only the current finding version, so an appealed claim doesn't double-count.

SQL · SUBROGATION EXPOSURE BY MANUFACTURER

SELECT s.manufacturer, s.sensor_type,
       count(DISTINCT i.incident_id) AS incidents,
       sum(i.claim_amount_usd)       AS recoverable_exposure_usd
FROM fact_incidents      i
JOIN fact_fault_findings f ON f.incident_id    = i.incident_id
                       AND f.finding_version = i.current_finding_version  -- latest only
JOIN dim_devices        s ON s.sensor_id       = i.failed_sensor_id
WHERE f.contributing_factor = 'sensor'
  AND f.at_fault IN ('av', 'shared')
GROUP BY s.manufacturer, s.sensor_type
ORDER BY recoverable_exposure_usd DESC;

§ 08 — THE DASHBOARDProving the fleet is safe

A senior design ends with observability, because every safety discipline above is invisible without it. The board watches three different definitions of "healthy" — the safety case, the fleet operations, and the liability ledger — and treats a release-over-release MMBD regression as the page-worthy incident.

THE SAFETY CASE: MMBD by release (is v9 safer than v8?), reg-reportable count vs the filing deadline, assist-to-disengage ratio — a rising ratio means the car asks for help instead of ceding control, which is maturity.
FLEET OPS: vehicles in ODD vs out (a low-state-of-charge car is outside its envelope for a long trip), telemetry watermark lag, rescue latency p95, charging downtime.
THE LEDGER: open incidents, subrogation exposure by vendor, COPQ (comped fare + recovery credit) — the cost of poor quality, watched like a budget.
THE GATE: counterfactual pass rate — the share of last quarter's disengagements the candidate release no longer trips. Below the bar, the release does not ship.

Fleet Safety & Ops — Phoenix FRI 19:05 MST · STACK v9.0 ROLLING · 2s REFRESH

MMBD v9.0

17.4k mi

MMBD v8.2

11.9k mi

Reg-Reportable (mo)

Assists : Disengage

9.2×

Disengagements by H3 cell — downtown grid (darker = more)

Vehicles Out-of-ODD

Rescue Latency p95

8.1min

Counterfactual Pass

96%

Watermark Lag

Subrogation Exp.

412k$

COPQ (today)

3.8k$

FIG. 2 — The story a maturing fleet tells: MMBD on v9.0 climbing well clear of v8.2 with a 96% counterfactual pass rate, the assist-to-disengage ratio rising to 9.2× (the car asking for help, not ceding control), disengagements clustering on the downtown hot cells, and a single sensor vendor's failures driving a $412k subrogation exposure the ledger is already chasing.

Read the green tiles together and the board argues that v9.0 is a genuine safety improvement, not a hopeful one: MMBD is up, the counterfactual harness proved the regression gone before the release shipped, and the rising assist ratio says the stack is learning to defer rather than fail. Read the amber and rust and it tells the operator exactly where to spend: fourteen cars drifting out of their ODD envelope on low charge, a downtown intersection generating most of the disengagements, and one vendor's sensors carrying nearly half a million dollars of recoverable exposure. Every number on the board traces back to a frozen key — which is what makes it admissible as well as actionable.

§ 09 — THE RUBRICWhat was actually being tested

Strip the robots away and the question was testing five judgments, each of which generalizes far beyond autonomous vehicles:

VERSIONED ATTRIBUTION: Replacing an entity with the stack of versioned dimensions that produced its behavior. When the actor is software, "who did this" is a join of releases, and each must be reconstructable as-of any instant.
FREEZE, DON'T LOOK UP: Snapshotting the version keys onto the event at write time, so the record survives every future deployment. Resolving "current" at query time re-attributes history — the single bug that collapses the safety case.
APPEND-ONLY AS ADMISSIBILITY: Treating the disengagement and the fault finding as immutable evidence, with appeals appending versions. When the data is the legal record, an UPDATE is not a convenience — it is spoliation.
THE FIREHOSE BOUNDARY, AT SCALE: Keeping terabytes of raw sensor data in object storage with a per-second rollup and a pointer — and processing on event-time, because a vehicle that buffers in a tunnel will lie to a processing-time clock.
ONE GRAIN FOR A MIXED FLEET: Folding robot trips into the same marketplace fact as human trips so Finance reconciles once. The grain was built platform-agnostic for exactly this; forking it double-counts the market.

Replace the driver with a stack and the whole model becomes versioned attribution. Nothing here is new — it is the catalogue's SCD2, decision-replay, temporal-geofence, and firehose patterns, at the grade where a wrong join is a subpoena.— CLOSING ARGUMENT