Design the backend for a ride-sharing service. Forty-five minutes, one whiteboard, three subsystems — and a single decision that separates a senior answer from a memorized one. A complete working through: data flow, schema, SQL, streaming Python, the aggregation layer, and the dashboard that proves it works.
Every senior data and platform engineer eventually meets some version of this question. It arrives in one casual sentence and contains three entire distributed systems.
"Design the core backend for a ride-sharing service — focus on driver matching, real-time location tracking, and surge pricing. How would you scope this out?"
The question is popular with interviewers for a reason: it cannot be answered with one architecture. The three subsystems it names run at wildly different tempos and want contradictory things. Location tracking is a firehose of data that is worthless four seconds after it arrives. Matching is a low-volume flow where a single consistency mistake — dispatching one driver to two riders — is the cardinal sin of the entire product. Surge pricing is a slow aggregation that must nevertheless be deterministic at the moment of quoting, because the price a rider sees must be the price a rider pays.
A weak answer treats these as one system and reaches for one database. A strong answer notices the tempos first. So before any boxes and arrows, the working frame for the whole session:
Scope is the first scored dimension, and most candidates skip it. State what you are building, what you are deliberately ignoring, and the numbers that will shape every later decision. Out of scope here, said explicitly: payments processing, ratings and fraud, the routing/ETA engine itself (treated as a callable service), the maps stack, and pooled rides — with the caveat that the matching design should not preclude pooling.
Then the envelope math, volunteered rather than extracted. Uber-shaped numbers:
| Quantity | Estimate | Consequence |
|---|---|---|
| Peak concurrent drivers | 5,000,000 | Sets the size of the hot index |
| Ping interval | 4 s | The freshness budget everything is measured against |
| Location writes | ≈ 1.25 M / s | The number that shapes the whole architecture |
| Trips per day | 25,000,000 | ≈ 300 matches/s average, ~3 K at peak — tiny by comparison |
| Hot index footprint | ~100 B × 5 M ≈ 500 MB | Fits in one box of RAM; sharded for throughput, not size |
| Surge recompute | ~1 M cells / 30 s ≈ 35 K updates/s | A modest streaming job |
| Raw location history | tens of TB / day | Sample and compress to cold storage; keep only latest hot |
Notice the asymmetry: nothing else in the system comes within two orders of magnitude of location ingestion. That single row of the table dictates the partition strategy, the storage medium, and the failure philosophy. The rest of this article follows the data.
One architecture, three tempos. The spine of the design is a partitioned change stream doing triple duty: it feeds the geospatial index, the surge aggregator, and the analytics lake — from a single ordered log.
Three properties of this picture do most of the interview’s work. First, the firehose never touches a durable store on the hot path — pings flow connection → gateway → stream → RAM, and only a sampled tributary reaches disk. Second, authority is deliberately tiny: the driver-state store and the trip service are small, keyed, boring databases, while everything large is a rebuildable cache. Third, the candidate query is dashed for a reason — the geo index merely proposes drivers; an atomic compare-and-swap on the driver record decides. Double-dispatch is prevented by that one line, not by geography.
Data plane degrades by dropping; control plane degrades by delaying. A lost ping costs nothing — the next one heals it in four seconds, so under backpressure the gateway buffers briefly, coalesces by driver (last value wins), drops oldest, and never blocks. A lost state transition — an acceptance, a cancellation — is never dropped: separate channel, durable local queue, retry with backoff.
The schema falls out of the authority question. Durable, transactional truth: trips, driver state, quotes. Ephemeral, rebuildable speed: the H3 index. Append-only history: the event log and the sampled ping archive.
Trips are small records at trivial volume — 25 M rows a day is gigabytes — so a conventional relational store is the right home. The interesting choices are the version columns (optimistic concurrency for every state transition) and the append-only trip_events table, which gives the lifecycle an audit trail and feeds analytics without touching the hot row.
The geospatial index is two hash maps and a discipline. It is not a database and must not become one: at 1.25 M writes per second, B-trees, WALs and MVCC are taxes paid for durability the data doesn’t want. If the process dies, the index rebuilds itself from live pings in about thirty seconds — recovery is free because the world keeps broadcasting its state.
Sharding follows geography — city-level shards, hot cities dedicated, small cities packed — because riders in one city never match drivers in another. Boundary cells are replicated to both neighboring shards so a rider standing on a border still gets a single-query answer; results are unioned and de-duplicated by driver_id, latest seen_ms wins. And because the index owns nothing, brief double-presence during a crossing is harmless: geography proposes candidates; the CAS on driver_state decides truth.
Matching is a saga wrapped around one atomic instruction. Everything else — ranking, offers, timers, retries — exists to feed that instruction good candidates and clean up when it loses.
The flow: a rider request arrives carrying a quote_id. Dispatch covers the rider’s radius with H3 cells via k-ring expansion, unions the driver sets, filters (status=available, vehicle class, rating floor), and ranks the survivors. Ranking is ETA-to-pickup over the road network — never haversine; a driver across the river is 200 meters and 20 minutes away — blended with acceptance rate and a fairness term (time since last trip), so the marketplace doesn’t starve its patient drivers.
Then the offer saga: compare-and-swap the top candidate’s state from available → offered, push the offer, start a 12-second server-side expiry. Decline or timeout → CAS back, next candidate, cap at five attempts, then widen the radius or fail gracefully. Offers go out sequentially by default; when supply is tight, parallel offers with first-accept-wins — same CAS arbitrates — trading driver annoyance for match latency. Two riders’ queries may both see the same driver; only one CAS succeeds. The race exists with or without shard boundaries, and one mechanism handles both.
Cancel edges exist from every pre-trip state, and CANCELLED records who and when — fee logic hangs off that pair. Failure transitions are explicit: MATCHED with no driver movement for N minutes auto-cancels and rematches; a driver app going dark mid-trip leaves the trip IN_TRIP with stale location and a support flag — the system never auto-completes on signal loss. Every transition is a CAS on the trip’s version, so a rider-cancel racing a driver-arrive resolves deterministically with exactly one winner; and every transition emits a trip_events row, which keeps the trip service small while payments, notifications and analytics consume downstream.
Three programs carry the fast loop: the gateway that tames the connections, the consumer that maintains the index, and the dispatcher’s candidate query. Each is small; the judgment is in what they refuse to do.
The batch window is 100 ms with a dual trigger — flush on timer or size cap, whichever first. The reasoning is physical: end-to-end staleness is ping interval + batch + stream hop + apply. The ping interval is 4,000 ms, so 100 ms adds 2.5% — and a driver at 30 mph moves ~1.3 m in that window, below GPS’s own 5–10 m noise floor. The latency cost is literally beneath the sensor; the downstream win is ~100× fewer produce requests and 5–10× batch compression. When the trade is imperceptible latency against an order of magnitude of infrastructure, take it every time.
One carve-out, always stated: status transitions bypass this entirely. A driver going offline or accepting a trip is control plane — low volume, dispatch-correctness-critical — flushed immediately on a separate durable channel. The buffer above is for the data plane only.
Both work; the choice deserves reasons, not a default. S2’s exact hierarchy and Hilbert-curve locality are decisive in a disk-backed sorted store — but this index is hash maps in RAM, where locality buys nothing. The query is ring-shaped, where hexagons’ uniform neighbor distance wins; and surge runs on the same grid, where hexes give near-uniform cell area and smooth neighbor gradients with no corner-adjacency ambiguity. Two systems, one cell vocabulary. If the index moved to disk, the answer flips to S2.
Surge is a streaming aggregation with two jobs: price honestly, and move supply. The computation is a per-cell supply/demand ratio over a sliding window; the craft is in the smoothing — and in the promise.
Two signals per H3 cell. Demand is ride requests plus app-opens — the “eyeballs” signal sees intent before the request button is pressed, which is what makes surge leading rather than trailing. Supply is available drivers, read from the same stream that feeds the geo index. Both aggregate over a sliding window of two to five minutes, recomputed every thirty seconds, with event-time windows and watermarks — gateway batching and partition lag mean pings arrive slightly out of order, and processing-time windows would let infrastructure jitter masquerade as demand spikes.
A raw ratio oscillates, so the multiplier is smoothed twice. Spatially: each cell’s ratio blends with its neighbors — hexagons again, six equidistant neighbors make this a clean diffusion — which prevents a one-block price cliff that teaches riders to walk across the street. Temporally: hysteresis, fast up and slow down — surge responds to a demand spike in one window but decays over several, so the price doesn’t flicker and drivers chasing the heat map aren’t whipsawed.
The lock is the product-honesty boundary between the slow loop and the medium loop: when a rider opens the app, pricing reads the current multiplier, mints a quote_id with the fare frozen inside, valid for three minutes. The request that follows carries the quote_id; the trip is charged exactly that figure. Surge updates the next quote, never an issued one. Eventual consistency everywhere in the pipeline, determinism at the instant of promise.
The append-only tables — trip_events and the sampled ping archive — are where the system explains itself. Three queries an interviewer loves, because each one carries a classic SQL pattern on its back.
“How long do drivers actually stay online?” The ping archive has no session boundaries — only timestamps. Sessionization is the canonical gaps-and-islands move: flag every gap larger than the threshold, then a running sum of flags becomes the session ID.
A senior design ends with observability, because every clever degradation mode above is invisible without it. The dashboard watches the three loops separately — each has a different definition of “healthy.”
Read the amber tiles together and the dashboard narrates the concert letout from §02: demand spiked, surge painted the heat map, drivers are converging (TTL evictions normal, ingest steady), matching is briefly slower and more contended — and nothing was dropped that mattered. That is what a designed degradation looks like from the operator’s chair.
Strip the trip details away and the question was testing five judgments, each of which generalizes far beyond ride-sharing: