The Booking Problem — PaddySpeaks

§ 01 — THE QUESTIONA booking app that is secretly a time machine

Most candidates hear "model Airbnb" and reach for three tables and a status column. The question is harder than it sounds, and the difficulty is hiding in a single innocent-looking sentence near the end of the prompt: what was the price of this listing, on that night, as we showed it back then?

Interview Prompt

"Design a data model for Airbnb supporting search/availability lookup, the booking lifecycle — request → accepted → stayed → reviewed — cancellation-policy refund logic, host payouts after stay completion, bilateral guest-and-host reviews, and dynamic pricing, with full point-in-time queryability: what was the price of listing X on July 4 last year? How would you scope it?"

LEVEL · SENIOR / STAFFDURATION · 45 MINFORMAT · WHITEBOARD

The prompt names five subsystems and tests whether you notice they pull in opposite directions. Search wants a fat, denormalized read path — "Paris, July 4–7, 2 guests" must scan millions of listing-night cells and return in milliseconds. The booking lifecycle wants a strict, transactional spine where a confirmed reservation is sacred and a refund is arithmetic, not opinion. And dynamic pricing wants history — not the current price, but the price as it stood on every past day, because that is the only data a pricing experiment can be measured against. A naïve schema can serve any one of these. A senior schema notices that the last requirement quietly rewrites the first two: the calendar cannot be a mutable grid of "is this night free and what does it cost." It has to be an accumulating record of what was true, and what was shown, on each day it was observed.

The weak answer is a calendar table keyed by (listing_id, night_date) that you UPDATE as prices change and nights get booked. It can answer "what is the price tonight" and nothing else. The instant a host nudges a price or an algorithm re-prices a weekend, yesterday's number is gone, and the pricing team's most important question becomes un-answerable. The strong answer treats the calendar the way a warehouse treats any fast-moving dimension that must be auditable: snapshot it. So before any boxes, the working frame for the session — three grains, three tempos:

THE CALENDAR: One row per listing × night × snapshot. The hot read path and the time machine in one table. A daily photograph of "is this night available, and what does it cost" for every future night, kept so the past is queryable, not reconstructed. The largest object in the system by three orders of magnitude.
THE BOOKING: One row per reservation lifecycle. Request → accepted → stayed → reviewed, plus cancellation. Append-only: a modification is a new row linked by parent_booking_id, never an edit in place. Low volume, high stakes — this is where money and trust live.
THE REVIEW & PAYOUT: The settlement layer. Bilateral reviews (guest→host and host→guest) and host payouts that fire only after a completed stay. Slow, derived, eventually consistent — but the source of the superhost signal and the trust-and-safety alarm.

The current price is the cheap question. The interview is the expensive one: what was the price, on that night, the day the guest looked? You cannot derive that from a row you overwrote.

Scoping out loud

Scope is the first scored dimension, so name it before any schema. In scope: the calendar-as-fact with daily snapshots, the booking lifecycle and its append-only modification chain, refund computation locked at cancel time, host payouts after stay, bilateral reviews, and multi-currency with the FX rate frozen at booking. Out of scope, stated aloud: the search ranking model itself (we feed it; we do not design it), the maps and geocoding stack, the messaging and payments rails beyond the entries that represent them, and fraud scoring (a downstream consumer of this model, not part of it). One discipline I will state up front: nothing in the booking spine is ever updated in place except the three timestamps that mark a state transition. Price changes, cancellations, modifications — all of them move the system forward by adding a row, never by erasing one.

Then the envelope math, volunteered rather than extracted. Airbnb-shaped numbers, and one row that reshapes the whole design:

Quantity	Estimate	Consequence
Active listings	~7 M	The width of every calendar snapshot
Future nights tracked / listing	~365	How far the bookable horizon extends
Calendar rows / day	7M × 365 ≈ 2.5 B	One daily snapshot — the number that shapes the whole storage strategy
Bookings / day	~2 M	Tiny by comparison; a conventional row store
Search queries / s (peak)	~10 K	Served from Elasticsearch, not the warehouse
Snapshot retention	90 d raw, then weekly	Full grain is unaffordable forever; downsample old snapshots
Reviews / booking	0–2	Bilateral; both sides may stay silent

Notice the asymmetry. Bookings are a rounding error — two million rows a day is gigabytes. The calendar is the whole problem: a daily snapshot is billions of rows, and naïvely keeping every snapshot forever is petabytes a year for a question most listings will never be asked. That single row of the table dictates the partitioning, the retention ladder, and the entire reason the calendar looks the way it does. The rest of this article follows the calendar.

§ 02 — DATA FLOWFrom a host's slider to a pricing team's question

One transactional spine, one snapshot tributary, two read shapes. Hosts and guests mutate a small operational store; a nightly job freezes the calendar into the warehouse; search reads a fast index, and the pricing team reads the frozen past.

FIG. 1 — Two worlds joined by one job. The operational calendar is mutable and amnesiac; the nightly snapshot stamps it with a date and makes it immortal.

Three properties of this picture do most of the interview's work. First, search never touches the warehouse — it reads a rebuildable Elasticsearch index fed from the change log, because a guest typing "Paris, 2 guests" needs a 50 ms answer over live availability, not a scan of a 2.5-billion-row fact table. Second, the snapshot job is the only bridge between the two worlds, and it runs exactly once per night: it reads the entire live calendar, stamps every row with today's snapshot_date, and appends — it never updates a prior snapshot, because a prior snapshot is, by definition, what was true then. Third, the live calendar is allowed to forget. It holds only the current state; the moment a price changes, the old number is overwritten in OLTP and preserved only in the warehouse. The operational store optimizes for the booking flow; the fact table carries the memory.

The Failure Philosophy, In One Rule

The operational store may lie about the past; the warehouse may never lie about it. If the search index is stale by a minute, a guest occasionally sees a night that was just booked — annoying, self-healing, recoverable at request time by the booking service's transactional check. But if a snapshot is wrong, or a refund is restated because a policy changed, the system has corrupted evidence — and disputes, pricing experiments, and host trust all rest on that evidence. So freshness is negotiable on the read path and non-negotiable in the fact: snapshots are written once, refunds are frozen at cancel time, and FX is locked at booking. The past is a contract.

§ 03 — DATA MODELFour facts, four dimensions, one grain that matters

The schema falls out of the three tempos. A snapshotted calendar fact for the hot path and the time machine; an append-only booking fact for the lifecycle; reviews and payouts for settlement; and slowly-changing dimensions so a booking made last year still joins to the listing as it was, not as it is.

The calendar fact — the time machine

Everything turns on the grain of one table. fct_calendar_nights carries one row per (listing_id, night_date, snapshot_date) — and that triple is the primary key. The first two columns say which night of which listing; the third says which day we are looking at it from. Put snapshot_date in the key and time-travel becomes a WHERE clause; leave it out and the question "what did this cost on July 4th, as shown on June 1st" becomes an archaeology project against SCD2 history and booking events — slow, brittle, often impossible after data drift. The three price columns earn their place: base_price_local is what the host set, smart_price_local is what the algorithm suggested, and effective_price_local is what a guest would actually have been charged. Keeping all three lets the pricing team measure the algorithm against the host against reality.

DDL · FACT — CALENDAR-AS-FACT (THE HOT PATH)

-- One row per (listing × night × snapshot_date). The whole design is
-- here: snapshot_date is IN the primary key, which is what turns
-- "the price now" into "the price on any night, as seen on any day."
-- Partitioned by night_date; ~2.5B rows land per daily snapshot.
CREATE TABLE fct_calendar_nights (
    listing_id            BIGINT       NOT NULL,
    night_date            DATE         NOT NULL,   -- the night being stayed
    snapshot_date         DATE         NOT NULL,   -- the day we observed it
    is_available          BOOLEAN      NOT NULL,
    is_blocked            BOOLEAN      NOT NULL,   -- host-blocked, not for sale
    is_booked             BOOLEAN      NOT NULL,   -- the three are mutually exclusive
    booking_id            BIGINT,                    -- FK, set only when is_booked
    base_price_local      NUMERIC(10,2) NOT NULL,  -- host-set nightly rate
    smart_price_local     NUMERIC(10,2),             -- algorithm's suggestion
    effective_price_local NUMERIC(10,2) NOT NULL,  -- what a guest would pay
    currency_code         CHAR(3)      NOT NULL,   -- the listing's currency
    PRIMARY KEY (listing_id, night_date, snapshot_date),
    CHECK ( (is_available::int + is_blocked::int + is_booked::int) = 1 )
) PARTITION BY RANGE (night_date);
-- search reads only the freshest snapshot; pricing reads a chosen one.
CREATE INDEX idx_cal_search ON fct_calendar_nights
    (night_date, snapshot_date, is_available) INCLUDE (listing_id, effective_price_local);

The booking fact — append-only lifecycle

A booking is a small record at trivial volume, so a conventional row store is the right home — but it is governed by one rule that separates seniors from juniors: append-only. A modification — a date change, an extra guest, a renegotiated price — does not UPDATE the original row; it inserts a new booking that points back via parent_booking_id, and the original's terminal_state becomes 'modified'. The chain is the audit trail. The only in-place writes permitted are the three lifecycle timestamps and the cancellation fields, because those record when a transition happened, which is itself a fact, not a rewrite of history.

DDL · FACT — BOOKING LIFECYCLE (APPEND-ONLY)

-- One row per booking lifecycle. Modifications append a new row linked
-- by parent_booking_id; the original is retired to terminal_state.
-- Money is captured in local currency AND in USD with the FX rate that
-- was true at booking time — frozen, never re-converted.
CREATE TABLE fct_bookings (
    booking_id          BIGINT       PRIMARY KEY,
    parent_booking_id   BIGINT       REFERENCES fct_bookings(booking_id),
    listing_id          BIGINT       NOT NULL,
    guest_id            BIGINT       NOT NULL,
    host_id             BIGINT       NOT NULL,
    request_ts          TIMESTAMPTZ  NOT NULL,
    accepted_ts         TIMESTAMPTZ,                -- null until host accepts
    cancelled_ts        TIMESTAMPTZ,                -- null unless cancelled
    cancellation_actor  TEXT         CHECK (cancellation_actor IN
                        ('guest','host','platform')),
    check_in_date       DATE         NOT NULL,
    check_out_date      DATE         NOT NULL,
    nights              INT          NOT NULL,
    guests              INT          NOT NULL,
    subtotal_local      NUMERIC(10,2) NOT NULL,  -- nightly × nights
    cleaning_local      NUMERIC(10,2) NOT NULL DEFAULT 0,
    service_local       NUMERIC(10,2) NOT NULL DEFAULT 0,
    taxes_local         NUMERIC(10,2) NOT NULL DEFAULT 0,
    total_local         NUMERIC(10,2) NOT NULL,
    currency_code       CHAR(3)      NOT NULL,
    fx_rate_to_usd      NUMERIC(12,6) NOT NULL,  -- LOCKED at booking time
    total_usd           NUMERIC(12,2) NOT NULL,  -- total_local × fx_rate_to_usd
    cancellation_policy TEXT         NOT NULL CHECK (cancellation_policy IN
                        ('flexible','moderate','strict','super_strict')),
    refund_amount_usd   NUMERIC(12,2),              -- locked at cancel time
    terminal_state      TEXT         NOT NULL DEFAULT 'requested'
                        CHECK (terminal_state IN ('requested','confirmed',
                        'stayed','cancelled','modified','declined'))
);
CREATE INDEX idx_bk_host  ON fct_bookings (host_id, request_ts DESC);
CREATE INDEX idx_bk_guest ON fct_bookings (guest_id, request_ts DESC);

Settlement & the policy dimension

Reviews are bilateral and typed, so one table with a review_type discriminator carries both directions without a self-join nightmare. Payouts fire after a completed stay and reference the booking they settle. And the cancellation policy is a real dimension, not an enum scattered through code: dim_cancellation_policies stores a refund_curve_json mapping days-to-check-in to the refundable fraction, so the refund math is data, versioned and auditable, not a buried CASE statement.

DDL · SETTLEMENT + THE REFUND-CURVE DIMENSION

-- Bilateral reviews: one row per direction, typed. Double-blind reveal
-- handled in the app; here we only record what was written and when.
CREATE TABLE fct_reviews (
    review_id     BIGINT       PRIMARY KEY,
    booking_id    BIGINT       NOT NULL REFERENCES fct_bookings(booking_id),
    review_type   TEXT         NOT NULL CHECK (review_type IN
                  ('guest_to_host','host_to_guest')),
    overall       SMALLINT     CHECK (overall BETWEEN 1 AND 5),
    cleanliness   SMALLINT, accuracy SMALLINT, communication SMALLINT,
    submitted_ts  TIMESTAMPTZ  NOT NULL,
    UNIQUE (booking_id, review_type)        -- at most one each way
);

-- Host payouts: only ever created after terminal_state = 'stayed'.
CREATE TABLE fct_host_payouts (
    payout_id     BIGINT       PRIMARY KEY,
    host_id       BIGINT       NOT NULL,
    booking_id    BIGINT       REFERENCES fct_bookings(booking_id), -- NULL for batch settle
    payout_ts     TIMESTAMPTZ  NOT NULL,
    amount_local  NUMERIC(12,2) NOT NULL,
    fx_rate_to_usd NUMERIC(12,6) NOT NULL,   -- payout-time FX, distinct from booking FX
    amount_usd    NUMERIC(12,2) NOT NULL,
    status        TEXT         NOT NULL CHECK (status IN
                  ('pending','paid','failed','reversed'))
);

-- The refund rule, as data. days_to_check_in → refundable fraction.
CREATE TABLE dim_cancellation_policies (
    policy            TEXT      PRIMARY KEY,
    refund_curve_json JSONB     NOT NULL,   -- e.g. {"30":1.0,"7":0.5,"0":0.0}
    cleaning_refunded BOOLEAN   NOT NULL DEFAULT TRUE,
    service_refunded  BOOLEAN   NOT NULL DEFAULT FALSE
);

The dimensions that change slowly

A booking made in July must, a year later, still join to the listing's price, capacity, and policy as they were in July — not as they are now. That is the textbook job of a Type-2 slowly-changing dimension. dim_listings and dim_hosts carry validity windows, so an as-of join recovers the historical row. The calendar snapshot already freezes the price; SCD2 freezes everything else about the listing and host.

DDL · SCD2 DIMENSIONS (AS-OF JOINABLE)

CREATE TABLE dim_listings (
    listing_sk         BIGINT       PRIMARY KEY,   -- surrogate, one per version
    listing_id         BIGINT       NOT NULL,        -- natural key, stable
    title              TEXT,
    capacity           INT          NOT NULL,
    room_type          TEXT,
    cancellation_policy TEXT        NOT NULL,
    geography_id       BIGINT       NOT NULL,
    valid_from         TIMESTAMPTZ  NOT NULL,
    valid_to           TIMESTAMPTZ  NOT NULL DEFAULT '9999-12-31',
    is_current         BOOLEAN      NOT NULL DEFAULT TRUE
);
CREATE INDEX idx_listing_asof ON dim_listings (listing_id, valid_from, valid_to);

CREATE TABLE dim_hosts (
    host_sk        BIGINT      PRIMARY KEY,
    host_id        BIGINT      NOT NULL,
    is_superhost   BOOLEAN     NOT NULL,
    response_rate  NUMERIC(4,3),
    valid_from     TIMESTAMPTZ NOT NULL,
    valid_to       TIMESTAMPTZ NOT NULL DEFAULT '9999-12-31',
    is_current     BOOLEAN     NOT NULL DEFAULT TRUE
);

§ 04 — THE CORE INVARIANTThe snapshot is the grain, and the grain is the proof

Every system has one place its correctness lives. Here it is not a transaction or a lock — it is a column in a primary key. Put snapshot_date in the key and the past becomes addressable; the time machine is not a feature you build, it is a consequence of the grain you chose.

Consider the two designs side by side. In the mutable calendar, the price of listing 42 on July 4th is a single cell, repeatedly overwritten — by the host on June 1st, by the algorithm on June 3rd, by a last-minute discount on July 2nd. Ask "what did it cost on June 1st" and the only honest answer is a shrug. In the snapshotted calendar, that same night exists as many rows — one per day it was observed — and the question is a coordinate lookup: fix the listing, fix the night, fix the day you are asking from. The price is wherever those three lines cross. The cost of this is real and must be named: roughly 2.5 billion rows per nightly snapshot. The payback is that an entire class of question — pricing experiments, dispute resolution, "show me what the guest saw" — moves from impossible to a single indexed read.

The booking lifecycle, as a chain not a column

The same forward-only discipline governs the booking. A reservation walks a state machine, and a modification does not mutate it — it forks it. CANCELLED and MODIFIED are terminal; the chain of parent_booking_id links reconstructs the full history of a stay that changed three times.

REQUESTED→ CONFIRMED→ STAYED→ REVIEWED · CANCELLED · MODIFIED

The single most senior move in the whole problem is the refund. When a guest cancels, the refundable amount is computed once, from the policy's curve and the days remaining until check-in, and the result is written into refund_amount_usd and never recomputed. This is not laziness — it is correctness. Cancellation policies change; a host moves from 'moderate' to 'strict' next quarter. If refunds were computed on read, that change would silently restate every historical refund, and a guest's settled $496.80 would quietly become a different number. Locking the refund at cancel time means the past stays the past.

SQL · TIME-TRAVEL + THE REFUND LOCK

-- THE invariant query: the price of a listing on a night, as it was
-- shown on a chosen day. snapshot_date in the PK makes this O(1)-ish.
SELECT listing_id, night_date,
       base_price_local, smart_price_local, effective_price_local
FROM   fct_calendar_nights
WHERE  listing_id    = 42
  AND  night_date    = '2025-07-04'
  AND  snapshot_date = '2025-06-01';   -- the day the guest looked
-- The data product the dynamic-pricing team cannot ship without.

-- The refund, locked once at cancel time. days_to_check_in indexes
-- the policy curve; the answer is frozen onto the booking row.
UPDATE fct_bookings b
SET    cancelled_ts       = now(),
       cancellation_actor = 'guest',
       terminal_state     = 'cancelled',
       refund_amount_usd  = b.total_usd * (
           SELECT refund_fraction(p.refund_curve_json,
                                  b.check_in_date - current_date)
           FROM   dim_cancellation_policies p
           WHERE  p.policy = b.cancellation_policy)
WHERE  b.booking_id = 777
  AND  b.terminal_state = 'confirmed';   -- guard: cannot cancel a cancelled booking

The price is locked in the snapshot. The refund is locked at cancellation. The FX is locked at booking. A senior model is a museum of frozen moments — and the freezing is the whole point.BOOKING RULE Nº 1 — FREEZE AT THE EVENT

§ 05 — INGESTION & STREAMSPython that writes the past down carefully

Three programs carry the load: the nightly snapshot that freezes the calendar, the booking writer that appends modifications instead of editing them, and the search-index updater that keeps live availability fresh. Each is small; the judgment is in what they refuse to do.

1 · The snapshot job — freeze, never overwrite

The snapshot runs once a night and has exactly one rule: it appends rows stamped with today's date and never touches a prior snapshot. It is idempotent on (listing_id, night_date, snapshot_date), so a retried run after a crash is a no-op rather than a duplicate. The reasoning is the whole design: a snapshot is a photograph, and you do not edit a photograph — you take a new one tomorrow.

PYTHON · NIGHTLY CALENDAR SNAPSHOT

import datetime as dt

HORIZON_DAYS = 365        # how far the bookable future extends
BATCH_ROWS   = 50_000     # keep the append transactions bounded

def snapshot_calendar(live, warehouse, snapshot_date=None) -> None:
    """Freeze the live calendar into fct_calendar_nights for one day.
    Append-only and idempotent: the PK (listing_id, night_date,
    snapshot_date) makes a re-run a no-op, never a duplicate. We do not
    UPDATE yesterday's snapshot, EVER — yesterday is what was true then."""
    snapshot_date = snapshot_date or dt.date.today()
    horizon = snapshot_date + dt.timedelta(days=HORIZON_DAYS)

    for batch in live.stream_calendar(through=horizon, size=BATCH_ROWS):
        rows = [
            (c.listing_id, c.night_date, snapshot_date,
             c.is_available, c.is_blocked, c.is_booked, c.booking_id,
             c.base_price, c.smart_price, c.effective_price, c.currency)
            for c in batch
        ]
        warehouse.execute("""
            INSERT INTO fct_calendar_nights VALUES %s
            ON CONFLICT (listing_id, night_date, snapshot_date)
            DO NOTHING        -- the snapshot is immutable once written
        """, rows)

    # the retention ladder runs AFTER the fresh snapshot is durable:
    # keep daily for 90 days, then thin to one snapshot per week.
    warehouse.execute("""
        DELETE FROM fct_calendar_nights
        WHERE snapshot_date < %s - INTERVAL '90 days'
          AND EXTRACT(DOW FROM snapshot_date) <> 0   -- keep Sundays
    """, [snapshot_date])

One carve-out, always stated: the retention thinning runs only after the new snapshot is confirmed durable. Deleting old history before the new photograph is safely written would risk a window where the system has neither — the kind of ordering bug that looks fine in code review and loses a quarter of pricing data the one night the job is killed mid-run.

2 · The booking writer — modify by appending

The booking writer enforces the append-only law in code so no caller can forget it. A modification reads the parent, computes the new totals with FX frozen at this moment, inserts a fresh booking pointing back, and retires the parent — all in one transaction. The parent is never edited except to set its terminal state, which is itself an immutable fact about when it was superseded.

PYTHON · APPEND-ONLY BOOKING MODIFICATION

import uuid

def modify_booking(db, parent_id: int, changes: dict, fx) -> int:
    """A modification is a NEW booking linked by parent_booking_id, never
    an UPDATE of the original. The chain is the audit trail. FX is locked
    again, at modification time, because the money is being re-agreed."""
    with db.transaction():
        parent = db.fetch_one(
            "SELECT * FROM fct_bookings WHERE booking_id = %s FOR UPDATE",
            [parent_id])
        if parent["terminal_state"] not in ("confirmed", "stayed"):
            raise IllegalTransition("only a live booking can be modified")

        new_id   = uuid.uuid4().int >> 64
        rate     = fx.rate(parent["currency_code"], "USD")   # NOW, not parent's
        new_total_local = price_booking(parent, changes)
        db.execute("""
            INSERT INTO fct_bookings
              (booking_id, parent_booking_id, listing_id, guest_id, host_id,
               request_ts, check_in_date, check_out_date, nights, guests,
               total_local, currency_code, fx_rate_to_usd, total_usd,
               cancellation_policy, terminal_state)
            VALUES (%s, %s, ...,  %s, %s, %s,  %s)
        """, [new_id, parent_id, ...,
                  new_total_local, parent["currency_code"], rate,
                  round(new_total_local * rate, 2),
                  parent["cancellation_policy"], "confirmed"])

        # retire the parent — the ONLY write to it, and it records a fact:
        # the instant this booking was superseded.
        db.execute("UPDATE fct_bookings SET terminal_state = 'modified' "
                   "WHERE booking_id = %s", [parent_id])
        return new_id

3 · The search-index updater — fresh, lossy, rebuildable

PYTHON · CDC → SEARCH INDEX

async def apply_changes(es, change_log) -> None:
    """The search index is a rebuildable cache that owns nothing. It may
    lag the live store by seconds and that is fine: the booking service
    re-checks availability transactionally at request time, so a stale
    'available' costs a retry, never a double-booking."""
    async for ev in change_log.consume("calendar", "bookings"):
        if ev.table == "bookings" and ev.new["terminal_state"] == "confirmed":
            # a confirmed booking blocks its nights immediately
            await es.update_nights(ev.new["listing_id"],
                                  ev.new["check_in_date"],
                                  ev.new["check_out_date"], available=False)
        elif ev.table == "calendar":
            await es.upsert(ev.new["listing_id"], ev.new["night_date"],
                            price=ev.new["effective_price_local"],
                            available=ev.new["is_available"])
        # no fsync, no ledger: if the index dies, rebuild from the live
        # calendar in minutes. Truth lives in OLTP and the snapshots.

§ 06 — AGGREGATIONThe settlement close: pay the host, only once, only after the stay

The slow loop has one job and one promise: money moves to a host after a completed stay, exactly once, net of any refund, with the payout-time FX recorded distinctly from the booking-time FX. The craft is in the idempotency and in keeping two exchange rates honest.

A payout is not a read-time calculation; it is a daily close that materializes a row into fct_host_payouts. The trigger is a state, not a timer: a booking becomes eligible only when its terminal_state reaches 'stayed' and the guest's check-out has passed. The amount is the host's earnings — the subtotal less the platform's service fee — and, critically, it is converted at the FX rate true on the payout day, which is generally not the rate locked at booking. The system therefore carries two FX facts per stay: one frozen at booking (what the guest was charged) and one frozen at payout (what the host received). The gap between them is realized currency drift, and a senior model surfaces it rather than hiding it inside a single blended number.

PYTHON · DAILY PAYOUT CLOSE (IDEMPOTENT)

def run_payout_close(db, fx, close_date) -> int:
    """Settle every booking whose stay has completed and which has not
    already been paid. Idempotent by a unique (booking_id) in payouts:
    re-running the close pays no one twice. A refund reduces the payout;
    a fully refunded booking pays the host nothing."""
    eligible = db.fetch_all("""
        SELECT b.booking_id, b.host_id, b.subtotal_local, b.service_local,
               b.currency_code, b.refund_amount_usd, b.fx_rate_to_usd
        FROM   fct_bookings b
        WHERE  b.terminal_state = 'stayed'
          AND  b.check_out_date <= %s
          AND  NOT EXISTS (SELECT 1 FROM fct_host_payouts p
                           WHERE p.booking_id = b.booking_id)
    """, [close_date])

    paid = 0
    for b in eligible:
        host_earn_local = b.subtotal_local - b.service_local
        rate            = fx.rate(b.currency_code, "USD", on=close_date)
        amount_usd      = round(host_earn_local * rate, 2)
        # a refund already paid back to the guest is netted out of the host:
        amount_usd      = max(0, amount_usd - (b.refund_amount_usd or 0))

        db.execute("""
            INSERT INTO fct_host_payouts
              (host_id, booking_id, payout_ts, amount_local,
               fx_rate_to_usd, amount_usd, status)
            VALUES (%s, %s, now(), %s, %s, %s, 'pending')
            ON CONFLICT (booking_id) DO NOTHING   -- pay once, never twice
        """, [b.host_id, b.booking_id, host_earn_local, rate, amount_usd])
        paid += 1
    return paid

The guest's price freezes at booking. The host's payment freezes at the stay's end. The two exchange rates almost never agree — and pretending they do is how a marketplace quietly loses money it cannot explain.SETTLEMENT RULE Nº 1 — TWO RATES, BOTH RECORDED

The same daily rhythm closes the reviews window. Airbnb's reviews are double-blind: neither side sees the other's until both have submitted or a fourteen-day window expires. The aggregation that drives the superhost signal therefore reads only revealed reviews, recomputes a host's trailing rating and response rate, and writes a new dim_hosts version when the status flips — SCD2 again, so "was this host a superhost when the booking was made" stays answerable forever.

§ 07 — ANALYTICS SQLInterrogating the marketplace

The frozen tables are where the system explains itself. Three queries an interviewer loves, because each one carries a classic SQL pattern on its back: the as-of join, the funnel by conditional aggregation, and the window-function gap detector.

Availability search — the as-of join over snapshots

Search is a join against one chosen snapshot. "Paris, July 4–7, 2 guests" filters the freshest snapshot_date to available nights, joins the listing dimension as-of the same instant, and demands the listing have all three nights free — the HAVING COUNT(*) = 3 that turns "some nights available" into "the whole stay is bookable."

SQL · AVAILABLE LISTINGS, AS-OF A SNAPSHOT

-- All Paris listings free for every night of Jul 4-6, for 2 guests,
-- priced as the freshest snapshot shows them. The SCD2 as-of join
-- pins dim_listings to the listing as it was on the snapshot date.
SELECT l.listing_id, l.title,
       min(c.effective_price_local) AS price_low,
       max(c.effective_price_local) AS price_high
FROM   fct_calendar_nights c
JOIN   dim_listings l
       ON  l.listing_id = c.listing_id
       AND c.snapshot_date::timestamptz BETWEEN l.valid_from AND l.valid_to
JOIN   dim_geography g ON g.geography_id = l.geography_id
WHERE  c.snapshot_date = (SELECT max(snapshot_date) FROM fct_calendar_nights)
  AND  c.night_date BETWEEN '2025-07-04' AND '2025-07-06'
  AND  c.is_available = TRUE
  AND  l.capacity >= 2
  AND  g.market_id = 'PARIS'
GROUP BY l.listing_id, l.title
HAVING count(*) = 3;        -- every one of the 3 nights is free

The booking funnel — conditional aggregation

"Where do bookings die?" The lifecycle timestamps make the funnel a single pass of COUNT(*) FILTER: requested, accepted, stayed, plus the host's median time-to-accept. The same conditional-aggregation pattern that powers every funnel in analytics, applied to one append-only fact.

SQL · REQUEST → ACCEPT → STAY FUNNEL

SELECT
    date_trunc('week', b.request_ts)                          AS wk,
    count(*)                                                  AS requested,
    count(*) FILTER (WHERE b.accepted_ts IS NOT NULL)        AS accepted,
    count(*) FILTER (WHERE b.terminal_state = 'stayed')        AS stayed,
    count(*) FILTER (WHERE b.cancellation_actor = 'host')     AS host_cancels,
    round(100.0 * count(*) FILTER (WHERE b.terminal_state = 'stayed')
                 / nullif(count(*), 0), 1)                    AS stay_rate_pct,
    percentile_cont(0.50) WITHIN GROUP (
        ORDER BY extract(epoch FROM b.accepted_ts - b.request_ts)/3600.0)
                                                              AS p50_accept_hrs
FROM   fct_bookings b
WHERE  b.parent_booking_id IS NULL          -- count original requests, not mods
  AND  b.request_ts >= now() - INTERVAL '90 days'
GROUP BY 1 ORDER BY 1;

Price drift before booking — window LAG over snapshots

"How much did a listing's price move in the run-up to the night, and did the smart-pricing engine chase demand?" This is the snapshot table's signature query: for one night, walk its successive snapshots in date order and LAG to measure day-over-day change. It is the time machine doing what only it can — and it doubles as a price-dispute resolver.

SQL · DAY-OVER-DAY PRICE DRIFT FOR ONE NIGHT

-- Trace how the price of one (listing, night) evolved across the
-- snapshots leading up to it. LAG over snapshot_date is the move.
WITH trail AS (
    SELECT snapshot_date, effective_price_local, is_booked,
           lag(effective_price_local) OVER w AS prev_price,
           lag(snapshot_date)          OVER w AS prev_snap
    FROM   fct_calendar_nights
    WHERE  listing_id = 42
      AND  night_date = '2025-07-04'
    WINDOW w AS (ORDER BY snapshot_date)
)
SELECT snapshot_date,
       effective_price_local,
       effective_price_local - prev_price                AS day_move,
       round(100.0 * (effective_price_local - prev_price)
             / nullif(prev_price, 0), 1)                   AS pct_move,
       is_booked
FROM   trail
WHERE  prev_price IS NOT NULL
  AND  effective_price_local <> prev_price        -- only the days it moved
ORDER BY snapshot_date;

§ 08 — THE DASHBOARDProving the marketplace is healthy

A senior design ends with observability, because every frozen moment above is invisible without it. The dashboard watches the three loops separately — supply and search, the booking funnel, and settlement & trust — each with its own definition of healthy.

SUPPLY & SEARCH: snapshot freshness (did last night's 2.5 B rows land on time), calendar utilization by market, search→view conversion, and effective vs base price spread — the smart-pricing engine's footprint on the marketplace.
BOOKING FUNNEL: request→accept→stay rates, host time-to-accept p50/p95, modification rate (how often a stay changes after confirmation), and instant-book share — the friction map of the lifecycle.
SETTLEMENT & TRUST: payout backlog and payouts settled / day, host-cancel rate (the trust-and-safety alarm), review submission rate, and realized FX drift between booking-time and payout-time rates.

Marketplace Ops — Global SAT 09:10 UTC · ALL SYSTEMS · SNAPSHOT 00:00 OK · 5m REFRESH

Snapshot Freshness

8.4h ago

Calendar Rows / Snap

2.49B

Calendar Utilization

63%

Search → View

11.2%

Effective vs Base Price — Paris market · snapshots, last 30 days

Request → Accept

78%

Accept → Stay

91%

Host Time-to-Accept

5.8h

Host-Cancel Rate

3.4%

Payout Backlog

$2.1M

Realized FX Drift

−$54k

Review Submit Rate

68%

Modification Rate

4.7%

Instant-Book Share

59%

Payouts Settled / d

1.9M

FIG. 2 — The story a healthy weekend tells: the snapshot landed, the smart-price line is pulling away from host-set base as summer demand builds, the funnel is wide — and two amber-to-red signals worth watching: host time-to-accept creeping up and one market's host-cancel rate crossing the trust threshold.

Read the warning tiles together and the dashboard narrates a marketplace under demand pressure. The effective-vs-base spread widening is the smart-pricing engine doing its job — and the snapshot table is the only reason that line can be drawn at all, because it remembers every price the algorithm ever showed. Meanwhile the red host-cancel tile is the trust-and-safety alarm from §07 firing in real time: that is the query that costs a host their superhost status, and it is visible here the moment it crosses the line.