PaddySpeaks · Systems at the Whiteboard · Nº 10

The Booking Problem

Design the data model behind Airbnb — search and availability, the request→accepted→stayed→reviewed lifecycle, cancellation refunds, host payouts and bilateral reviews. It reads like a CRUD app and hides a time machine: a pricing team must be able to ask what a listing cost on July 4th last year, as the site showed it on June 1st. One decision separates the senior answer — the calendar is not a state you mutate, it is a fact you snapshot. A full working-through: data flow, the dimensional schema, the snapshot-key invariant, append-only ingestion, the refund close, three analytics queries, and the dashboard that proves it.

§ 01 — THE QUESTIONA booking app that is secretly a time machine

Most candidates hear "model Airbnb" and reach for three tables and a status column. The question is harder than it sounds, and the difficulty is hiding in a single innocent-looking sentence near the end of the prompt: what was the price of this listing, on that night, as we showed it back then?

Interview Prompt

"Design a data model for Airbnb supporting search/availability lookup, the booking lifecycle — request → accepted → stayed → reviewed — cancellation-policy refund logic, host payouts after stay completion, bilateral guest-and-host reviews, and dynamic pricing, with full point-in-time queryability: what was the price of listing X on July 4 last year? How would you scope it?"

LEVEL · SENIOR / STAFFDURATION · 45 MINFORMAT · WHITEBOARD

The prompt names five subsystems and tests whether you notice they pull in opposite directions. Search wants a fat, denormalized read path — "Paris, July 4–7, 2 guests" must scan millions of listing-night cells and return in milliseconds. The booking lifecycle wants a strict, transactional spine where a confirmed reservation is sacred and a refund is arithmetic, not opinion. And dynamic pricing wants history — not the current price, but the price as it stood on every past day, because that is the only data a pricing experiment can be measured against. A naïve schema can serve any one of these. A senior schema notices that the last requirement quietly rewrites the first two: the calendar cannot be a mutable grid of "is this night free and what does it cost." It has to be an accumulating record of what was true, and what was shown, on each day it was observed.

The weak answer is a calendar table keyed by (listing_id, night_date) that you UPDATE as prices change and nights get booked. It can answer "what is the price tonight" and nothing else. The instant a host nudges a price or an algorithm re-prices a weekend, yesterday's number is gone, and the pricing team's most important question becomes un-answerable. The strong answer treats the calendar the way a warehouse treats any fast-moving dimension that must be auditable: snapshot it. So before any boxes, the working frame for the session — three grains, three tempos:

THE CALENDAR
One row per listing × night × snapshot. The hot read path and the time machine in one table. A daily photograph of "is this night available, and what does it cost" for every future night, kept so the past is queryable, not reconstructed. The largest object in the system by three orders of magnitude.
THE BOOKING
One row per reservation lifecycle. Request → accepted → stayed → reviewed, plus cancellation. Append-only: a modification is a new row linked by parent_booking_id, never an edit in place. Low volume, high stakes — this is where money and trust live.
THE REVIEW & PAYOUT
The settlement layer. Bilateral reviews (guest→host and host→guest) and host payouts that fire only after a completed stay. Slow, derived, eventually consistent — but the source of the superhost signal and the trust-and-safety alarm.
The current price is the cheap question. The interview is the expensive one: what was the price, on that night, the day the guest looked? You cannot derive that from a row you overwrote.

Scoping out loud

Scope is the first scored dimension, so name it before any schema. In scope: the calendar-as-fact with daily snapshots, the booking lifecycle and its append-only modification chain, refund computation locked at cancel time, host payouts after stay, bilateral reviews, and multi-currency with the FX rate frozen at booking. Out of scope, stated aloud: the search ranking model itself (we feed it; we do not design it), the maps and geocoding stack, the messaging and payments rails beyond the entries that represent them, and fraud scoring (a downstream consumer of this model, not part of it). One discipline I will state up front: nothing in the booking spine is ever updated in place except the three timestamps that mark a state transition. Price changes, cancellations, modifications — all of them move the system forward by adding a row, never by erasing one.

Then the envelope math, volunteered rather than extracted. Airbnb-shaped numbers, and one row that reshapes the whole design:

QuantityEstimateConsequence
Active listings~7 MThe width of every calendar snapshot
Future nights tracked / listing~365How far the bookable horizon extends
Calendar rows / day7M × 365 ≈ 2.5 BOne daily snapshot — the number that shapes the whole storage strategy
Bookings / day~2 MTiny by comparison; a conventional row store
Search queries / s (peak)~10 KServed from Elasticsearch, not the warehouse
Snapshot retention90 d raw, then weeklyFull grain is unaffordable forever; downsample old snapshots
Reviews / booking0–2Bilateral; both sides may stay silent

Notice the asymmetry. Bookings are a rounding error — two million rows a day is gigabytes. The calendar is the whole problem: a daily snapshot is billions of rows, and naïvely keeping every snapshot forever is petabytes a year for a question most listings will never be asked. That single row of the table dictates the partitioning, the retention ladder, and the entire reason the calendar looks the way it does. The rest of this article follows the calendar.


§ 02 — DATA FLOWFrom a host's slider to a pricing team's question

One transactional spine, one snapshot tributary, two read shapes. Hosts and guests mutate a small operational store; a nightly job freezes the calendar into the warehouse; search reads a fast index, and the pricing team reads the frozen past.

OPERATIONAL · ONLINE TRANSACTIONS ANALYTICAL · CALENDAR-AS-FACT + READ SHAPES HOST APP price · block · accept GUEST APP search · request · cancel BOOKING SERVICE (OLTP) lifecycle txn · refund calc append-only · FX lock SMART-PRICING ENGINE writes nightly price guidance CHANGE LOG (CDC) bookings · listings · prices ordered per listing_id SEARCH INDEX · ELASTICSEARCH live availability · < 50 ms rebuildable cache, owns nothing LIVE CALENDAR (OLTP) (listing × night) · mutable NIGHTLY SNAPSHOT JOB freeze calendar @ 00:00 UTC stamp snapshot_date · immutable fct_calendar_nights listing × night × snapshot partitioned by night_date fct_bookings · dims (SCD2) payouts · reviews · policies PRICING / TIME-TRAVEL price on night N, as of date D experiments · dispute resolution RETENTION LADDER 90d raw → weekly → cold Parquet read live grid @ 00:00 stream bookings → fact SOLID — online writes & rebuilds · DASHED — the nightly freeze and CDC load · The live grid forgets; the snapshot remembers.
FIG. 1 — Two worlds joined by one job. The operational calendar is mutable and amnesiac; the nightly snapshot stamps it with a date and makes it immortal.

Three properties of this picture do most of the interview's work. First, search never touches the warehouse — it reads a rebuildable Elasticsearch index fed from the change log, because a guest typing "Paris, 2 guests" needs a 50 ms answer over live availability, not a scan of a 2.5-billion-row fact table. Second, the snapshot job is the only bridge between the two worlds, and it runs exactly once per night: it reads the entire live calendar, stamps every row with today's snapshot_date, and appends — it never updates a prior snapshot, because a prior snapshot is, by definition, what was true then. Third, the live calendar is allowed to forget. It holds only the current state; the moment a price changes, the old number is overwritten in OLTP and preserved only in the warehouse. The operational store optimizes for the booking flow; the fact table carries the memory.

The Failure Philosophy, In One Rule

The operational store may lie about the past; the warehouse may never lie about it. If the search index is stale by a minute, a guest occasionally sees a night that was just booked — annoying, self-healing, recoverable at request time by the booking service's transactional check. But if a snapshot is wrong, or a refund is restated because a policy changed, the system has corrupted evidence — and disputes, pricing experiments, and host trust all rest on that evidence. So freshness is negotiable on the read path and non-negotiable in the fact: snapshots are written once, refunds are frozen at cancel time, and FX is locked at booking. The past is a contract.


§ 03 — DATA MODELFour facts, four dimensions, one grain that matters

The schema falls out of the three tempos. A snapshotted calendar fact for the hot path and the time machine; an append-only booking fact for the lifecycle; reviews and payouts for settlement; and slowly-changing dimensions so a booking made last year still joins to the listing as it was, not as it is.

The calendar fact — the time machine

Everything turns on the grain of one table. fct_calendar_nights carries one row per (listing_id, night_date, snapshot_date) — and that triple is the primary key. The first two columns say which night of which listing; the third says which day we are looking at it from. Put snapshot_date in the key and time-travel becomes a WHERE clause; leave it out and the question "what did this cost on July 4th, as shown on June 1st" becomes an archaeology project against SCD2 history and booking events — slow, brittle, often impossible after data drift. The three price columns earn their place: base_price_local is what the host set, smart_price_local is what the algorithm suggested, and effective_price_local is what a guest would actually have been charged. Keeping all three lets the pricing team measure the algorithm against the host against reality.

DDL · FACT — CALENDAR-AS-FACT (THE HOT PATH)
-- One row per (listing × night × snapshot_date). The whole design is -- here: snapshot_date is IN the primary key, which is what turns -- "the price now" into "the price on any night, as seen on any day." -- Partitioned by night_date; ~2.5B rows land per daily snapshot. CREATE TABLE fct_calendar_nights ( listing_id BIGINT NOT NULL, night_date DATE NOT NULL, -- the night being stayed snapshot_date DATE NOT NULL, -- the day we observed it is_available BOOLEAN NOT NULL, is_blocked BOOLEAN NOT NULL, -- host-blocked, not for sale is_booked BOOLEAN NOT NULL, -- the three are mutually exclusive booking_id BIGINT, -- FK, set only when is_booked base_price_local NUMERIC(10,2) NOT NULL, -- host-set nightly rate smart_price_local NUMERIC(10,2), -- algorithm's suggestion effective_price_local NUMERIC(10,2) NOT NULL, -- what a guest would pay currency_code CHAR(3) NOT NULL, -- the listing's currency PRIMARY KEY (listing_id, night_date, snapshot_date), CHECK ( (is_available::int + is_blocked::int + is_booked::int) = 1 ) ) PARTITION BY RANGE (night_date); -- search reads only the freshest snapshot; pricing reads a chosen one. CREATE INDEX idx_cal_search ON fct_calendar_nights (night_date, snapshot_date, is_available) INCLUDE (listing_id, effective_price_local);

The booking fact — append-only lifecycle

A booking is a small record at trivial volume, so a conventional row store is the right home — but it is governed by one rule that separates seniors from juniors: append-only. A modification — a date change, an extra guest, a renegotiated price — does not UPDATE the original row; it inserts a new booking that points back via parent_booking_id, and the original's terminal_state becomes 'modified'. The chain is the audit trail. The only in-place writes permitted are the three lifecycle timestamps and the cancellation fields, because those record when a transition happened, which is itself a fact, not a rewrite of history.

DDL · FACT — BOOKING LIFECYCLE (APPEND-ONLY)
-- One row per booking lifecycle. Modifications append a new row linked -- by parent_booking_id; the original is retired to terminal_state. -- Money is captured in local currency AND in USD with the FX rate that -- was true at booking time — frozen, never re-converted. CREATE TABLE fct_bookings ( booking_id BIGINT PRIMARY KEY, parent_booking_id BIGINT REFERENCES fct_bookings(booking_id), listing_id BIGINT NOT NULL, guest_id BIGINT NOT NULL, host_id BIGINT NOT NULL, request_ts TIMESTAMPTZ NOT NULL, accepted_ts TIMESTAMPTZ, -- null until host accepts cancelled_ts TIMESTAMPTZ, -- null unless cancelled cancellation_actor TEXT CHECK (cancellation_actor IN ('guest','host','platform')), check_in_date DATE NOT NULL, check_out_date DATE NOT NULL, nights INT NOT NULL, guests INT NOT NULL, subtotal_local NUMERIC(10,2) NOT NULL, -- nightly × nights cleaning_local NUMERIC(10,2) NOT NULL DEFAULT 0, service_local NUMERIC(10,2) NOT NULL DEFAULT 0, taxes_local NUMERIC(10,2) NOT NULL DEFAULT 0, total_local NUMERIC(10,2) NOT NULL, currency_code CHAR(3) NOT NULL, fx_rate_to_usd NUMERIC(12,6) NOT NULL, -- LOCKED at booking time total_usd NUMERIC(12,2) NOT NULL, -- total_local × fx_rate_to_usd cancellation_policy TEXT NOT NULL CHECK (cancellation_policy IN ('flexible','moderate','strict','super_strict')), refund_amount_usd NUMERIC(12,2), -- locked at cancel time terminal_state TEXT NOT NULL DEFAULT 'requested' CHECK (terminal_state IN ('requested','confirmed', 'stayed','cancelled','modified','declined')) ); CREATE INDEX idx_bk_host ON fct_bookings (host_id, request_ts DESC); CREATE INDEX idx_bk_guest ON fct_bookings (guest_id, request_ts DESC);

Settlement & the policy dimension

Reviews are bilateral and typed, so one table with a review_type discriminator carries both directions without a self-join nightmare. Payouts fire after a completed stay and reference the booking they settle. And the cancellation policy is a real dimension, not an enum scattered through code: dim_cancellation_policies stores a refund_curve_json mapping days-to-check-in to the refundable fraction, so the refund math is data, versioned and auditable, not a buried CASE statement.

DDL · SETTLEMENT + THE REFUND-CURVE DIMENSION
-- Bilateral reviews: one row per direction, typed. Double-blind reveal -- handled in the app; here we only record what was written and when. CREATE TABLE fct_reviews ( review_id BIGINT PRIMARY KEY, booking_id BIGINT NOT NULL REFERENCES fct_bookings(booking_id), review_type TEXT NOT NULL CHECK (review_type IN ('guest_to_host','host_to_guest')), overall SMALLINT CHECK (overall BETWEEN 1 AND 5), cleanliness SMALLINT, accuracy SMALLINT, communication SMALLINT, submitted_ts TIMESTAMPTZ NOT NULL, UNIQUE (booking_id, review_type) -- at most one each way ); -- Host payouts: only ever created after terminal_state = 'stayed'. CREATE TABLE fct_host_payouts ( payout_id BIGINT PRIMARY KEY, host_id BIGINT NOT NULL, booking_id BIGINT REFERENCES fct_bookings(booking_id), -- NULL for batch settle payout_ts TIMESTAMPTZ NOT NULL, amount_local NUMERIC(12,2) NOT NULL, fx_rate_to_usd NUMERIC(12,6) NOT NULL, -- payout-time FX, distinct from booking FX amount_usd NUMERIC(12,2) NOT NULL, status TEXT NOT NULL CHECK (status IN ('pending','paid','failed','reversed')) ); -- The refund rule, as data. days_to_check_in → refundable fraction. CREATE TABLE dim_cancellation_policies ( policy TEXT PRIMARY KEY, refund_curve_json JSONB NOT NULL, -- e.g. {"30":1.0,"7":0.5,"0":0.0} cleaning_refunded BOOLEAN NOT NULL DEFAULT TRUE, service_refunded BOOLEAN NOT NULL DEFAULT FALSE );

The dimensions that change slowly

A booking made in July must, a year later, still join to the listing's price, capacity, and policy as they were in July — not as they are now. That is the textbook job of a Type-2 slowly-changing dimension. dim_listings and dim_hosts carry validity windows, so an as-of join recovers the historical row. The calendar snapshot already freezes the price; SCD2 freezes everything else about the listing and host.

DDL · SCD2 DIMENSIONS (AS-OF JOINABLE)
CREATE TABLE dim_listings ( listing_sk BIGINT PRIMARY KEY, -- surrogate, one per version listing_id BIGINT NOT NULL, -- natural key, stable title TEXT, capacity INT NOT NULL, room_type TEXT, cancellation_policy TEXT NOT NULL, geography_id BIGINT NOT NULL, valid_from TIMESTAMPTZ NOT NULL, valid_to TIMESTAMPTZ NOT NULL DEFAULT '9999-12-31', is_current BOOLEAN NOT NULL DEFAULT TRUE ); CREATE INDEX idx_listing_asof ON dim_listings (listing_id, valid_from, valid_to); CREATE TABLE dim_hosts ( host_sk BIGINT PRIMARY KEY, host_id BIGINT NOT NULL, is_superhost BOOLEAN NOT NULL, response_rate NUMERIC(4,3), valid_from TIMESTAMPTZ NOT NULL, valid_to TIMESTAMPTZ NOT NULL DEFAULT '9999-12-31', is_current BOOLEAN NOT NULL DEFAULT TRUE );

§ 04 — THE CORE INVARIANTThe snapshot is the grain, and the grain is the proof

Every system has one place its correctness lives. Here it is not a transaction or a lock — it is a column in a primary key. Put snapshot_date in the key and the past becomes addressable; the time machine is not a feature you build, it is a consequence of the grain you chose.

Consider the two designs side by side. In the mutable calendar, the price of listing 42 on July 4th is a single cell, repeatedly overwritten — by the host on June 1st, by the algorithm on June 3rd, by a last-minute discount on July 2nd. Ask "what did it cost on June 1st" and the only honest answer is a shrug. In the snapshotted calendar, that same night exists as many rows — one per day it was observed — and the question is a coordinate lookup: fix the listing, fix the night, fix the day you are asking from. The price is wherever those three lines cross. The cost of this is real and must be named: roughly 2.5 billion rows per nightly snapshot. The payback is that an entire class of question — pricing experiments, dispute resolution, "show me what the guest saw" — moves from impossible to a single indexed read.

The booking lifecycle, as a chain not a column

The same forward-only discipline governs the booking. A reservation walks a state machine, and a modification does not mutate it — it forks it. CANCELLED and MODIFIED are terminal; the chain of parent_booking_id links reconstructs the full history of a stay that changed three times.

REQUESTED CONFIRMED STAYED REVIEWED · CANCELLED · MODIFIED

The single most senior move in the whole problem is the refund. When a guest cancels, the refundable amount is computed once, from the policy's curve and the days remaining until check-in, and the result is written into refund_amount_usd and never recomputed. This is not laziness — it is correctness. Cancellation policies change; a host moves from 'moderate' to 'strict' next quarter. If refunds were computed on read, that change would silently restate every historical refund, and a guest's settled $496.80 would quietly become a different number. Locking the refund at cancel time means the past stays the past.

SQL · TIME-TRAVEL + THE REFUND LOCK
-- THE invariant query: the price of a listing on a night, as it was -- shown on a chosen day. snapshot_date in the PK makes this O(1)-ish. SELECT listing_id, night_date, base_price_local, smart_price_local, effective_price_local FROM fct_calendar_nights WHERE listing_id = 42 AND night_date = '2025-07-04' AND snapshot_date = '2025-06-01'; -- the day the guest looked -- The data product the dynamic-pricing team cannot ship without. -- The refund, locked once at cancel time. days_to_check_in indexes -- the policy curve; the answer is frozen onto the booking row. UPDATE fct_bookings b SET cancelled_ts = now(), cancellation_actor = 'guest', terminal_state = 'cancelled', refund_amount_usd = b.total_usd * ( SELECT refund_fraction(p.refund_curve_json, b.check_in_date - current_date) FROM dim_cancellation_policies p WHERE p.policy = b.cancellation_policy) WHERE b.booking_id = 777 AND b.terminal_state = 'confirmed'; -- guard: cannot cancel a cancelled booking
The price is locked in the snapshot. The refund is locked at cancellation. The FX is locked at booking. A senior model is a museum of frozen moments — and the freezing is the whole point.BOOKING RULE Nº 1 — FREEZE AT THE EVENT

§ 05 — INGESTION & STREAMSPython that writes the past down carefully

Three programs carry the load: the nightly snapshot that freezes the calendar, the booking writer that appends modifications instead of editing them, and the search-index updater that keeps live availability fresh. Each is small; the judgment is in what they refuse to do.

1 · The snapshot job — freeze, never overwrite

The snapshot runs once a night and has exactly one rule: it appends rows stamped with today's date and never touches a prior snapshot. It is idempotent on (listing_id, night_date, snapshot_date), so a retried run after a crash is a no-op rather than a duplicate. The reasoning is the whole design: a snapshot is a photograph, and you do not edit a photograph — you take a new one tomorrow.

PYTHON · NIGHTLY CALENDAR SNAPSHOT
import datetime as dt HORIZON_DAYS = 365 # how far the bookable future extends BATCH_ROWS = 50_000 # keep the append transactions bounded def snapshot_calendar(live, warehouse, snapshot_date=None) -> None: """Freeze the live calendar into fct_calendar_nights for one day. Append-only and idempotent: the PK (listing_id, night_date, snapshot_date) makes a re-run a no-op, never a duplicate. We do not UPDATE yesterday's snapshot, EVER — yesterday is what was true then.""" snapshot_date = snapshot_date or dt.date.today() horizon = snapshot_date + dt.timedelta(days=HORIZON_DAYS) for batch in live.stream_calendar(through=horizon, size=BATCH_ROWS): rows = [ (c.listing_id, c.night_date, snapshot_date, c.is_available, c.is_blocked, c.is_booked, c.booking_id, c.base_price, c.smart_price, c.effective_price, c.currency) for c in batch ] warehouse.execute(""" INSERT INTO fct_calendar_nights VALUES %s ON CONFLICT (listing_id, night_date, snapshot_date) DO NOTHING -- the snapshot is immutable once written """, rows) # the retention ladder runs AFTER the fresh snapshot is durable: # keep daily for 90 days, then thin to one snapshot per week. warehouse.execute(""" DELETE FROM fct_calendar_nights WHERE snapshot_date < %s - INTERVAL '90 days' AND EXTRACT(DOW FROM snapshot_date) <> 0 -- keep Sundays """, [snapshot_date])

One carve-out, always stated: the retention thinning runs only after the new snapshot is confirmed durable. Deleting old history before the new photograph is safely written would risk a window where the system has neither — the kind of ordering bug that looks fine in code review and loses a quarter of pricing data the one night the job is killed mid-run.

2 · The booking writer — modify by appending

The booking writer enforces the append-only law in code so no caller can forget it. A modification reads the parent, computes the new totals with FX frozen at this moment, inserts a fresh booking pointing back, and retires the parent — all in one transaction. The parent is never edited except to set its terminal state, which is itself an immutable fact about when it was superseded.

PYTHON · APPEND-ONLY BOOKING MODIFICATION
import uuid def modify_booking(db, parent_id: int, changes: dict, fx) -> int: """A modification is a NEW booking linked by parent_booking_id, never an UPDATE of the original. The chain is the audit trail. FX is locked again, at modification time, because the money is being re-agreed.""" with db.transaction(): parent = db.fetch_one( "SELECT * FROM fct_bookings WHERE booking_id = %s FOR UPDATE", [parent_id]) if parent["terminal_state"] not in ("confirmed", "stayed"): raise IllegalTransition("only a live booking can be modified") new_id = uuid.uuid4().int >> 64 rate = fx.rate(parent["currency_code"], "USD") # NOW, not parent's new_total_local = price_booking(parent, changes) db.execute(""" INSERT INTO fct_bookings (booking_id, parent_booking_id, listing_id, guest_id, host_id, request_ts, check_in_date, check_out_date, nights, guests, total_local, currency_code, fx_rate_to_usd, total_usd, cancellation_policy, terminal_state) VALUES (%s, %s, ..., %s, %s, %s, %s) """, [new_id, parent_id, ..., new_total_local, parent["currency_code"], rate, round(new_total_local * rate, 2), parent["cancellation_policy"], "confirmed"]) # retire the parent — the ONLY write to it, and it records a fact: # the instant this booking was superseded. db.execute("UPDATE fct_bookings SET terminal_state = 'modified' " "WHERE booking_id = %s", [parent_id]) return new_id

3 · The search-index updater — fresh, lossy, rebuildable

PYTHON · CDC → SEARCH INDEX
async def apply_changes(es, change_log) -> None: """The search index is a rebuildable cache that owns nothing. It may lag the live store by seconds and that is fine: the booking service re-checks availability transactionally at request time, so a stale 'available' costs a retry, never a double-booking.""" async for ev in change_log.consume("calendar", "bookings"): if ev.table == "bookings" and ev.new["terminal_state"] == "confirmed": # a confirmed booking blocks its nights immediately await es.update_nights(ev.new["listing_id"], ev.new["check_in_date"], ev.new["check_out_date"], available=False) elif ev.table == "calendar": await es.upsert(ev.new["listing_id"], ev.new["night_date"], price=ev.new["effective_price_local"], available=ev.new["is_available"]) # no fsync, no ledger: if the index dies, rebuild from the live # calendar in minutes. Truth lives in OLTP and the snapshots.

§ 06 — AGGREGATIONThe settlement close: pay the host, only once, only after the stay

The slow loop has one job and one promise: money moves to a host after a completed stay, exactly once, net of any refund, with the payout-time FX recorded distinctly from the booking-time FX. The craft is in the idempotency and in keeping two exchange rates honest.

A payout is not a read-time calculation; it is a daily close that materializes a row into fct_host_payouts. The trigger is a state, not a timer: a booking becomes eligible only when its terminal_state reaches 'stayed' and the guest's check-out has passed. The amount is the host's earnings — the subtotal less the platform's service fee — and, critically, it is converted at the FX rate true on the payout day, which is generally not the rate locked at booking. The system therefore carries two FX facts per stay: one frozen at booking (what the guest was charged) and one frozen at payout (what the host received). The gap between them is realized currency drift, and a senior model surfaces it rather than hiding it inside a single blended number.

PYTHON · DAILY PAYOUT CLOSE (IDEMPOTENT)
def run_payout_close(db, fx, close_date) -> int: """Settle every booking whose stay has completed and which has not already been paid. Idempotent by a unique (booking_id) in payouts: re-running the close pays no one twice. A refund reduces the payout; a fully refunded booking pays the host nothing.""" eligible = db.fetch_all(""" SELECT b.booking_id, b.host_id, b.subtotal_local, b.service_local, b.currency_code, b.refund_amount_usd, b.fx_rate_to_usd FROM fct_bookings b WHERE b.terminal_state = 'stayed' AND b.check_out_date <= %s AND NOT EXISTS (SELECT 1 FROM fct_host_payouts p WHERE p.booking_id = b.booking_id) """, [close_date]) paid = 0 for b in eligible: host_earn_local = b.subtotal_local - b.service_local rate = fx.rate(b.currency_code, "USD", on=close_date) amount_usd = round(host_earn_local * rate, 2) # a refund already paid back to the guest is netted out of the host: amount_usd = max(0, amount_usd - (b.refund_amount_usd or 0)) db.execute(""" INSERT INTO fct_host_payouts (host_id, booking_id, payout_ts, amount_local, fx_rate_to_usd, amount_usd, status) VALUES (%s, %s, now(), %s, %s, %s, 'pending') ON CONFLICT (booking_id) DO NOTHING -- pay once, never twice """, [b.host_id, b.booking_id, host_earn_local, rate, amount_usd]) paid += 1 return paid
The guest's price freezes at booking. The host's payment freezes at the stay's end. The two exchange rates almost never agree — and pretending they do is how a marketplace quietly loses money it cannot explain.SETTLEMENT RULE Nº 1 — TWO RATES, BOTH RECORDED

The same daily rhythm closes the reviews window. Airbnb's reviews are double-blind: neither side sees the other's until both have submitted or a fourteen-day window expires. The aggregation that drives the superhost signal therefore reads only revealed reviews, recomputes a host's trailing rating and response rate, and writes a new dim_hosts version when the status flips — SCD2 again, so "was this host a superhost when the booking was made" stays answerable forever.


§ 07 — ANALYTICS SQLInterrogating the marketplace

The frozen tables are where the system explains itself. Three queries an interviewer loves, because each one carries a classic SQL pattern on its back: the as-of join, the funnel by conditional aggregation, and the window-function gap detector.

Availability search — the as-of join over snapshots

Search is a join against one chosen snapshot. "Paris, July 4–7, 2 guests" filters the freshest snapshot_date to available nights, joins the listing dimension as-of the same instant, and demands the listing have all three nights free — the HAVING COUNT(*) = 3 that turns "some nights available" into "the whole stay is bookable."

SQL · AVAILABLE LISTINGS, AS-OF A SNAPSHOT
-- All Paris listings free for every night of Jul 4-6, for 2 guests, -- priced as the freshest snapshot shows them. The SCD2 as-of join -- pins dim_listings to the listing as it was on the snapshot date. SELECT l.listing_id, l.title, min(c.effective_price_local) AS price_low, max(c.effective_price_local) AS price_high FROM fct_calendar_nights c JOIN dim_listings l ON l.listing_id = c.listing_id AND c.snapshot_date::timestamptz BETWEEN l.valid_from AND l.valid_to JOIN dim_geography g ON g.geography_id = l.geography_id WHERE c.snapshot_date = (SELECT max(snapshot_date) FROM fct_calendar_nights) AND c.night_date BETWEEN '2025-07-04' AND '2025-07-06' AND c.is_available = TRUE AND l.capacity >= 2 AND g.market_id = 'PARIS' GROUP BY l.listing_id, l.title HAVING count(*) = 3; -- every one of the 3 nights is free

The booking funnel — conditional aggregation

"Where do bookings die?" The lifecycle timestamps make the funnel a single pass of COUNT(*) FILTER: requested, accepted, stayed, plus the host's median time-to-accept. The same conditional-aggregation pattern that powers every funnel in analytics, applied to one append-only fact.

SQL · REQUEST → ACCEPT → STAY FUNNEL
SELECT date_trunc('week', b.request_ts) AS wk, count(*) AS requested, count(*) FILTER (WHERE b.accepted_ts IS NOT NULL) AS accepted, count(*) FILTER (WHERE b.terminal_state = 'stayed') AS stayed, count(*) FILTER (WHERE b.cancellation_actor = 'host') AS host_cancels, round(100.0 * count(*) FILTER (WHERE b.terminal_state = 'stayed') / nullif(count(*), 0), 1) AS stay_rate_pct, percentile_cont(0.50) WITHIN GROUP ( ORDER BY extract(epoch FROM b.accepted_ts - b.request_ts)/3600.0) AS p50_accept_hrs FROM fct_bookings b WHERE b.parent_booking_id IS NULL -- count original requests, not mods AND b.request_ts >= now() - INTERVAL '90 days' GROUP BY 1 ORDER BY 1;

Price drift before booking — window LAG over snapshots

"How much did a listing's price move in the run-up to the night, and did the smart-pricing engine chase demand?" This is the snapshot table's signature query: for one night, walk its successive snapshots in date order and LAG to measure day-over-day change. It is the time machine doing what only it can — and it doubles as a price-dispute resolver.

SQL · DAY-OVER-DAY PRICE DRIFT FOR ONE NIGHT
-- Trace how the price of one (listing, night) evolved across the -- snapshots leading up to it. LAG over snapshot_date is the move. WITH trail AS ( SELECT snapshot_date, effective_price_local, is_booked, lag(effective_price_local) OVER w AS prev_price, lag(snapshot_date) OVER w AS prev_snap FROM fct_calendar_nights WHERE listing_id = 42 AND night_date = '2025-07-04' WINDOW w AS (ORDER BY snapshot_date) ) SELECT snapshot_date, effective_price_local, effective_price_local - prev_price AS day_move, round(100.0 * (effective_price_local - prev_price) / nullif(prev_price, 0), 1) AS pct_move, is_booked FROM trail WHERE prev_price IS NOT NULL AND effective_price_local <> prev_price -- only the days it moved ORDER BY snapshot_date;

§ 08 — THE DASHBOARDProving the marketplace is healthy

A senior design ends with observability, because every frozen moment above is invisible without it. The dashboard watches the three loops separately — supply and search, the booking funnel, and settlement & trust — each with its own definition of healthy.

SUPPLY & SEARCH
snapshot freshness (did last night's 2.5 B rows land on time), calendar utilization by market, search→view conversion, and effective vs base price spread — the smart-pricing engine's footprint on the marketplace.
BOOKING FUNNEL
request→accept→stay rates, host time-to-accept p50/p95, modification rate (how often a stay changes after confirmation), and instant-book share — the friction map of the lifecycle.
SETTLEMENT & TRUST
payout backlog and payouts settled / day, host-cancel rate (the trust-and-safety alarm), review submission rate, and realized FX drift between booking-time and payout-time rates.
Marketplace Ops — Global SAT 09:10 UTC · ALL SYSTEMS · SNAPSHOT 00:00 OK · 5m REFRESH
Snapshot Freshness
8.4h ago
Calendar Rows / Snap
2.49B
Calendar Utilization
63%
Search → View
11.2%
Effective vs Base Price — Paris market · snapshots, last 30 days
effective (smart) base (host-set)
Request → Accept
78%
Accept → Stay
91%
Host Time-to-Accept
5.8h
Host-Cancel Rate
3.4%
Payout Backlog
$2.1M
Realized FX Drift
−$54k
Review Submit Rate
68%
Modification Rate
4.7%
Instant-Book Share
59%
Payouts Settled / d
1.9M
FIG. 2 — The story a healthy weekend tells: the snapshot landed, the smart-price line is pulling away from host-set base as summer demand builds, the funnel is wide — and two amber-to-red signals worth watching: host time-to-accept creeping up and one market's host-cancel rate crossing the trust threshold.

Read the warning tiles together and the dashboard narrates a marketplace under demand pressure. The effective-vs-base spread widening is the smart-pricing engine doing its job — and the snapshot table is the only reason that line can be drawn at all, because it remembers every price the algorithm ever showed. Meanwhile the red host-cancel tile is the trust-and-safety alarm from §07 firing in real time: that is the query that costs a host their superhost status, and it is visible here the moment it crosses the line.


§ 09 — THE RUBRICWhat was actually being tested

Strip the listings and the reviews away and the question was testing five judgments, each of which generalizes far beyond travel:

THE GRAIN IS THE ARCHITECTURE
Choosing one row per listing × night × snapshot — putting time in the key — so the system's hardest question becomes a coordinate lookup. The grain you pick decides which questions are cheap and which are impossible.
SNAPSHOT OVER MUTATE
Treating a fast-moving, audit-critical state as a sequence of photographs rather than a single editable cell — and paying the storage to keep the past addressable instead of reconstructing it.
APPEND, NEVER EDIT
Modeling modifications as new, back-linked rows so the history of a changing booking stays simultaneously correct and complete. The chain is the audit trail; the edit is the bug.
FREEZE AT THE EVENT
Locking the refund at cancel time, the FX at booking, the payout rate at the stay's end — so a later policy or market change can never silently restate a settled fact.
RIGHT STORE, RIGHT TEMPO
Serving search from a rebuildable index, the lifecycle from a transactional spine, and history from a partitioned fact — and letting write volume, not the feature list, decide where each lives.
A booking app is easy until someone asks what was true last July. The senior answer is a schema that never forgets and never lies — because it was built to remember the past the day the past happened, not to reconstruct it under pressure.— CLOSING ARGUMENT