The Recurrence Problem

§ 01 — THE QUESTIONOne row, infinite meetings

A calendar looks like the easiest app you will ever build: a list of events with a start and an end, sorted by time. It is, in fact, one of the quietest correctness traps in consumer software — and the question reveals fast whether someone has shipped against time, or only read about it.

Interview Prompt

"Design the data model behind Google Calendar — recurring events with per-instance exceptions, IANA time zones with daylight-saving correctness, invitations that accept/decline/tentative, and 'find a time' free/busy across many attendees. And it has to answer point-in-time: what did my Tuesday look like as Calendar showed it last Monday? How would you scope it?"

LEVEL · SENIOR / STAFFDURATION · 45 MINFORMAT · WHITEBOARD

Two facts make this hard, and both are facts about data, not about the web tier. First, a recurring event is one row that denotes an unbounded number of occurrences — "every weekday, forever" is a single record and an infinite series. Second, the correct time of an event is a moving target: a 9:00 AM stand-up has to stay 9:00 AM after the clocks change, which means you cannot freeze it to a UTC instant and walk away. Layer invitations and "find a time across ten people" on top, and you have a read-heavy, latency-bound system where the bugs are measured in missed meetings. The weak answer draws a load balancer and three app servers. The strong answer names the forces that will deform the design first, because every later decision exists to survive one of them.

There are exactly four forces, and they are not the usual suspects. So before any boxes and arrows, the working frame for the whole session:

THE EXPLOSION: Recurrence-as-rule. "Daily stand-up, every weekday, no end date" is one row that denotes an infinite set. Materialize occurrences as rows and you sign up for a forever-running backfill and a thousand-row rewrite per series edit. Expand the rule on read and writes stay O(1) — but a malicious FREQ=MINUTELY rule tries to emit a million instances for a one-month view.
THE MOVING CLOCK: Time is not a number. Storing an event as a single UTC instant is correct for a one-off and wrong for anything that recurs. The rule has to live in local wall-clock time beside an IANA zone, and UTC is derived per occurrence — never frozen, or the day the clocks change the whole company's stand-up jumps an hour.
THE MANY CALENDARS: One event, ten partitions. A meeting I organize belongs to my calendar and to all nine attendees' at once. Read-time fan-in turns the hottest path into a scatter-gather; write-time fan-out is the answer — until a 50,000-person all-hands turns one edit into 50,000 writes.
THE SECOND QUERY SYSTEM: Free/busy is not a calendar read. "When is everyone free?" is a privacy-preserving projection — opaque busy intervals, never titles — computed across many people and hammered by schedulers, room booking, and now AI agents polling availability. Re-expanding everyone's recurrences on every call recomputes a deterministic answer forever.

Read those four again: every one is a data problem, not a web-tier problem. That is the tell of the question — the interesting engineering is in the model and the pipeline, and the request path is almost boring.

Scoping out loud

Scope is the first scored dimension, and most candidates skip it. In scope: event CRUD, recurrence with per-instance exceptions, invitations and RSVP, and free/busy. Out of scope, said explicitly: email-delivery internals, video conferencing, working-hours and appointment-slot features, and sharing ACLs — modeled-for, not designed in detail. One tension I state up front: this is read-heavy with a hard correctness requirement around time. A stale calendar view for a few seconds is fine, so availability beats strict consistency for reads — but RSVPs need read-your-writes, and free/busy must not double-book in obvious ways. Time gets stored three ways at once, which I will defend with the DDL.

Then the envelope math, volunteered rather than extracted. Google-class numbers:

Quantity	Estimate	Consequence
Monthly / daily active users	~500M / ~180M	Sets the read fan-out; everyone opens "this week" at once
Read : write ratio	~10 : 1	Make the common read nearly free; pay costs on the rare write
Occurrences per series row	1 row → ∞	The materialize-vs-expand call that shapes the whole architecture
Event writes / day	50–80M	~1k writes/s average, spiking to 5–10k/s, business-hours-spiky
Peak reads / s	100k+	Sync clients + calendar opens; a latency problem, not a volume one
Free/busy slot	15 min · 96 bits/day	12 bytes/calendar-day; ten people for two weeks < 2 KB
Storage over years	single-digit PB	Not storage-bound. Correctness and read latency are the game

Notice the row unlike the others: a single physical row denotes an unbounded number of occurrences. That is not a detail — it dictates the storage layout, the write cost, and the entire free/busy design. The rest of this piece is a consequence of resolving it deliberately: store the rule, expand for the visible window only, and derive UTC per occurrence. Everything else follows the data.

§ 02 — DATA FLOWFollowing an edit through the building

One source of truth, four projections off one ordered log. The event store owns truth; attendee copies, the free/busy bitmap, the rendered-window cache, and client sync are all derived consumers of a single per-calendar change stream — fed by a transactional outbox so the stream can never miss a committed change or replay one that rolled back.

FIG. 1 — One source of truth, four jobs off one ordered log. The event row and its outbox row commit together; the change stream fans out to attendee copies, the free/busy bitmap, cache versioning, and client sync — each replayable from a stored offset.

Three properties of this picture carry the interview. First, the write stores the rule, not the occurrences — editing a series is one row touched, not thousands rewritten, so writes stay O(1) no matter how long the series runs. Second, the event row and its changelog row commit in one transaction, so a CDC connector turns committed rows into exactly one ordered stream message — no dual-write window where the bitmaps and attendee copies silently disagree with truth. Third, everything below the dashed line is derived: copies, bitmaps, the rendered-window cache, the client sync feed. Drop the whole projection layer and you have lost only cache — the event store survives, and every projection rebuilds by replaying the log.

The Failure Philosophy, In One Rule

The store is truth; everything else degrades to stale, never to wrong. Because copies, bitmaps, and caches are versioned projections off one ordered log, every failure has the same shape: a consumer falls behind and a guest sees an invitation a few seconds late, or free/busy lags a write — both acceptable, because availability is advisory. A write degrades in exactly one direction: it either commits the rule plus its outbox row atomically, or it commits nothing. There is no half-fanned-out event and no orphaned bitmap, because the projections never hold authority — they hold a replayable copy of it.

§ 03 — DATA MODELOne overloaded table, and the columns that earn their keep

The entity skeleton is unremarkable: users, calendars, events, and an attendee join table. Every interesting decision is inside events — where singles, recurring masters, and per-instance overrides all live in one table, disambiguated by which columns are populated, and time is stored three ways on purpose.

The small tables — identity and the unit of sharing

A user owns several calendars; the calendar is the unit of ownership and of sharing. The kind column lets rooms and resources reuse the exact same machinery as people, and the version counter — bumped on any write — becomes the cache key that makes invalidation a non-event.

DDL · IDENTITY & CALENDARS (POSTGRESQL)

-- A login identity. home_tz is the user's default zone, used when an
-- event is created without an explicit one. Always an IANA id, never an offset.
CREATE TABLE users (
    user_id       BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    email         CITEXT      UNIQUE NOT NULL,
    display_name  TEXT        NOT NULL,
    home_tz       TEXT        NOT NULL DEFAULT 'UTC',   -- IANA id
    created_at    TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- A calendar is the unit of ownership AND of sharing/ACLs. 'kind' lets
-- rooms and resources reuse the exact same machinery as people's calendars.
CREATE TABLE calendars (
    calendar_id   BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    owner_id      BIGINT      NOT NULL REFERENCES users(user_id),
    summary       TEXT        NOT NULL,
    default_tz    TEXT        NOT NULL DEFAULT 'UTC',
    kind          TEXT        NOT NULL DEFAULT 'primary'
                  CHECK (kind IN ('primary','secondary','resource','room')),
    version       BIGINT      NOT NULL DEFAULT 1,   -- bumped on any write; used as a cache key
    created_at    TIMESTAMPTZ NOT NULL DEFAULT now()
);

The heart — single, master, and override in one table

The single most consequential choice: a row with an rrule and no parent is a master; a row with a parent_event_id and a recurrence_id is an override of one occurrence; a plain row is a one-off. One table keeps the read path to two queries and lets an override carry its own attendees for free — which is what lets someone decline a single moved instance while staying accepted on the series. And time is stored three ways: *_utc for range scans and one-off correctness, *_local as bare wall-clock, and tz_id as the IANA zone from which UTC for a recurrence is derived.

DDL · EVENTS — THE OVERLOADED CORE

CREATE TABLE events (
    event_id        BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    calendar_id     BIGINT NOT NULL REFERENCES calendars(calendar_id),
    organizer_id    BIGINT NOT NULL REFERENCES users(user_id),
    uid             TEXT   NOT NULL,        -- RFC 5545 UID, stable across edits + ingest

    -- content. On an OVERRIDE row, NULL means "inherit from the master"
    -- (sparse override — see §04). On a master/single it is the value.
    title           TEXT,
    description     TEXT,
    location        TEXT,

    -- time, stored THREE ways on purpose:
    --   *_utc   -> the instant, for range scans and one-off correctness
    --   *_local -> wall-clock with NO zone, e.g. 2026-06-09 09:00
    --   tz_id   -> IANA zone; UTC for a recurrence is DERIVED from these two
    start_utc       TIMESTAMPTZ,
    end_utc         TIMESTAMPTZ,
    start_local     TIMESTAMP,
    end_local       TIMESTAMP,
    tz_id           TEXT,
    all_day         BOOLEAN NOT NULL DEFAULT false,

    -- recurrence (MASTER rows only)
    rrule           TEXT,                -- 'FREQ=WEEKLY;BYDAY=TU'
    rdate           TIMESTAMP[],         -- explicit extra occurrences
    exdate          TIMESTAMP[],         -- cheap inline cancellations
    rrule_until_utc TIMESTAMPTZ,         -- DENORMALIZED series end; NULL = open-ended

    -- exception linkage (OVERRIDE rows only)
    parent_event_id BIGINT REFERENCES events(event_id),
    recurrence_id   TIMESTAMP,           -- original wall-clock start this row replaces
    overridden_cols TEXT[],              -- field mask: cols the user explicitly set

    status          TEXT NOT NULL DEFAULT 'confirmed'
                    CHECK (status IN ('confirmed','tentative','cancelled')),
    transparency    TEXT NOT NULL DEFAULT 'opaque'     -- opaque=busy, transparent=free
                    CHECK (transparency IN ('opaque','transparent')),

    version         BIGINT NOT NULL DEFAULT 1,   -- optimistic concurrency (compare-and-swap)
    sequence        INT    NOT NULL DEFAULT 0,   -- RFC 5545 SEQUENCE, bumped on material change
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at      TIMESTAMPTZ NOT NULL DEFAULT now(),

    -- an override must name the occurrence it replaces
    CONSTRAINT override_names_occurrence
        CHECK (parent_event_id IS NULL OR recurrence_id IS NOT NULL),
    -- a row is a recurring master XOR an override, never both
    CONSTRAINT master_xor_override
        CHECK (NOT (rrule IS NOT NULL AND parent_event_id IS NOT NULL))
);

-- Range scan: "singles + masters that could touch [lo, hi)" on one calendar.
-- Partial index keeps cancelled rows and overrides out of the hot path.
CREATE INDEX idx_events_cal_window ON events (calendar_id, start_utc)
    WHERE status <> 'cancelled' AND parent_event_id IS NULL;

-- Open-ended series stay filterable thanks to the denormalized horizon.
CREATE INDEX idx_events_cal_series ON events (calendar_id, rrule_until_utc)
    WHERE rrule IS NOT NULL;

-- Overlay path: fetch every override/cancel for a set of masters; and an
-- occurrence can only be overridden once per series.
CREATE UNIQUE INDEX uq_override ON events (parent_event_id, recurrence_id)
    WHERE parent_event_id IS NOT NULL;

Why three time columns, never a raw offset

An offset like -08:00 is a fact about an instant, not a place. Store it and you have thrown away the rule that produced it, so you cannot re-derive the right wall-clock after a DST transition or a government zone change. The IANA zone id (America/Los_Angeles) is a pointer into the tz database, which does know the rules. So start_local + tz_id is the source of truth for anything that recurs, start_utc is a derived convenience for range scans and one-offs, and the two are kept consistent by the app, not by the user.

Attendees get their own table because RSVP state is per-person and, crucially, because an override row can have its own attendee set. External guests live by email and have a NULL attendee_id; the inbox index answers "what am I invited to, and have I replied?" in one seek.

DDL · EVENT_ATTENDEES

CREATE TABLE event_attendees (
    event_id      BIGINT NOT NULL REFERENCES events(event_id) ON DELETE CASCADE,
    attendee_id   BIGINT  REFERENCES users(user_id),  -- NULL for external guests
    email         CITEXT  NOT NULL,                    -- external attendees live by email
    response      TEXT    NOT NULL DEFAULT 'needs_action'
                  CHECK (response IN ('needs_action','accepted','declined','tentative')),
    optional      BOOLEAN NOT NULL DEFAULT false,
    is_organizer  BOOLEAN NOT NULL DEFAULT false,
    PRIMARY KEY (event_id, email)
);

-- "what am I invited to, and have I replied?" — the invitation inbox.
CREATE INDEX idx_attendee_inbox ON event_attendees (attendee_id, response);

That is the whole model the system rides on: two small tables, one carefully overloaded big one, and a join table that quietly carries the per-instance RSVP magic. The analytics warehouse mirrors this as a star — dim_events (SCD2, the recurrence master whose title and organizer drift over time), dim_calendars, dim_users, a dim_timezone that is the DST source of truth, and the occurrence, exception, invitation, and free/busy facts we interrogate in §07. The operational row shape and the dimensional grain are the same idea told twice.

§ 04 — THE CORE INVARIANTThe rule lives in local time; UTC is derived

Every correctness guarantee in this system collapses to one ordering: a recurrence is iterated in local wall-clock time, and the UTC instant is computed per occurrence afterward. Get that ordering backward — freeze each occurrence to UTC up front — and the next daylight-saving transition silently slides the whole series by an hour.

Consider the canonical case. A 9:00 AM Pacific stand-up is 17:00 UTC in winter and 16:00 UTC in summer, because America/Los_Angeles is UTC−8 under standard time and UTC−7 under daylight time. If the system had stored each occurrence as a UTC instant, the spring-forward boundary would leave half the series at 17:00Z and half at 16:00Z — and on the morning the clocks changed, everyone would show up an hour wrong. Storing local time and deriving UTC makes the right answer automatic: the wall-clock never moves, and the offset is recomputed from the zone database, which is the only thing that knows the transition dates.

The three edits, one primitive

A recurring event is one logical record but unbounded physical occurrences, and the model has to survive three kinds of edit. All three reduce to the same primitive — an exception row, a child pointing at the master via parent_event_id, tagged with the recurrence_id of the occurrence it is about.

MASTER · RRULE→ EXPAND (local)→ DERIVE UTC→ OVERLAY EXCEPTIONS→ OCCURRENCE LIST

A cancel one instance is an override row with status = cancelled for that recurrence_id — iCalendar's EXDATE, but as a row so it can carry metadata. A move or edit one instance is an override row with its own start and, because it is a full row, its own attendees and RSVPs. This and all following is a series split: set the old master's UNTIL to just before the split and create a new master from the split point. That third one is what people get wrong — they try to version a rule in place; the right move is two writes, history preserved, old exceptions still attached to the old master.

The exception lives in an append-only ledger keyed by recurrence_id: a cancel tombstones one occurrence, a move carries its own new start, an edit records which columns the user pinned. Recurrence health then becomes a GROUP BY series — and nothing ever UPDATEs the series in place.

SQL · THE OPTIMISTIC WRITE (COMPARE-AND-SWAP ON VERSION)

-- The organizer fixing the title and an attendee moving the instance can
-- race. With sparse overrides they mostly write different rows, but both are
-- guarded by a CAS on version: 0 rows affected => someone beat us; 409 to the
-- client. Last-write-wins on the same row is never silently allowed.
UPDATE events
SET    title = :title, location = :location,
       version = version + 1, sequence = sequence + 1, updated_at = now()
WHERE  event_id = :id
  AND  version  = :expected_version     -- the guard IS the lock
RETURNING version;

Expand in local time, convert per occurrence. That single ordering is the entire daylight-saving fix — and the reason a clock change can't move a million stand-ups.RECURRENCE RULE Nº 1 — STORE THE RULE, DERIVE THE INSTANT

There is a second invariant hiding inside the first, and it is what makes the system bounded: the per-query window cap and the max_instances guardrail. A recurrence yields occurrences in order, so expansion stops the instant it crosses the window's far edge — and a crafted FREQ=SECONDLY rule that tries to emit millions of instances dies at the cap instead of melting a worker. Local time keeps the books honest; the cap keeps a single read from being weaponized. Together they are the entire correctness story of the read path.

§ 05 — INGESTION & STREAMSPython that refuses to freeze time

Three programs carry the read and write loops: the expansion that turns one master into concrete occurrences without ever freezing UTC, the two-query read path that overlays exceptions in memory, and the iCalendar ingestion that maps the outside world onto the same columns. Each is small; the judgment is in what it refuses to do.

1 · The expansion — DST-correct by construction

This is the procedure that makes or breaks the design. The rule is walked in local time; each occurrence is converted to UTC only after it is generated. The guardrail is not decoration — it is the line that stands between a worker and a FREQ=MINUTELY abuse rule. And the merge of an override is sparse: it supplies its own time plus only the columns it explicitly set, so a later typo-fix on the series still reaches the moved Wednesday instance.

PYTHON · EXPAND ONE MASTER INTO OCCURRENCES (DST-CORRECT)

from datetime import timedelta
from zoneinfo import ZoneInfo
from dateutil.rrule import rrulestr

UTC = ZoneInfo("UTC")
INHERITABLE = ("title", "description", "location", "transparency")


def expand_series(master, win_lo, win_hi, overrides, max_instances=730):
    """Expand a recurring MASTER into occurrences within [win_lo, win_hi) UTC,
    applying cancellations and per-instance overrides.

    The rule is walked in LOCAL wall-clock time and each occurrence is then
    converted to UTC. That order is the daylight-saving fix: 09:00 local maps
    to 17:00Z in winter and 16:00Z in summer, automatically."""
    tz       = ZoneInfo(master["tz_id"])
    dtstart  = master["start_local"].replace(tzinfo=tz)
    duration = master["end_local"] - master["start_local"]
    rule     = rrulestr(master["rrule"], dtstart=dtstart)

    # index overrides by the occurrence they replace (naive local start)
    ov_by_rid = {o["recurrence_id"]: o for o in overrides}
    exdates   = set(master.get("exdate") or ())

    out, count = [], 0
    for local_start in rule:
        if count >= max_instances:
            break                                 # guardrail vs FREQ=MINUTELY abuse
        start_utc = local_start.astimezone(UTC)
        if start_utc >= win_hi:
            break                                 # rrule yields in order; safe to stop
        end_utc = (local_start + duration).astimezone(UTC)
        if end_utc <= win_lo:
            continue                              # before the window; keep walking

        rid = local_start.replace(tzinfo=None)   # match key == recurrence_id
        if rid in exdates:
            continue                              # cheap inline cancel

        ov = ov_by_rid.get(rid)
        if ov is None:
            out.append(_plain(master, start_utc, end_utc, rid))
        elif ov["status"] != "cancelled":
            out.append(_merge_override(master, ov))   # moved / edited instance
        # else: cancelled override -> drop silently
        count += 1
    return out


def _merge_override(master, ov):
    """Sparse merge: the override supplies its own time plus only the columns it
    explicitly set (overridden_cols); everything else falls through to the
    master's CURRENT value, so a later typo-fix on the series still reaches it."""
    merged = {"event_id": ov["event_id"], "start_utc": ov["start_utc"],
              "end_utc": ov["end_utc"], "recurrence_id": ov["recurrence_id"]}
    overridden = set(ov.get("overridden_cols") or ())
    for col in INHERITABLE:
        merged[col] = ov[col] if col in overridden else master[col]
    return merged

Notice the sparse-merge discipline. The naive instinct is a full snapshot — the override copies every field — which is simple and wrong, because the typo-fix never reaches the moved instance. The opposite extreme is a field-mask fan-out, which writes to every override on every master edit and can race or partially fail at occurrence 4,000 of 9,000. The sparse override makes propagation correct by construction: NULL means inherit, the merge is in-memory because we already loaded the master, and a series rename reaches the override for free.

2 · The read path — two queries, not one

"Give me this calendar for June" is two SQL statements. First, pull the singles and masters that could touch the window — the denormalized rrule_until_utc is exactly what keeps open-ended series index-filterable. Then pull every override for those masters and overlay them in the application after expansion.

SQL · STEP 1 — SINGLES + MASTERS OVERLAPPING THE WINDOW

SELECT event_id, title, location, transparency, all_day, tz_id,
       start_utc, end_utc, start_local, end_local, rrule, rdate, exdate
FROM   events
WHERE  calendar_id = :cal
  AND  status <> 'cancelled'
  AND  parent_event_id IS NULL          -- masters + singles; overrides overlaid in step 2
  AND  (
         -- a one-off that overlaps [lo, hi)
         (rrule IS NULL  AND start_utc < :hi AND end_utc > :lo)
      OR
         -- a series whose lifespan overlaps the window; the denormalized
         -- rrule_until_utc is what keeps open-ended series index-filterable
         (rrule IS NOT NULL AND start_utc < :hi
                          AND (rrule_until_utc IS NULL OR rrule_until_utc >= :lo))
       );

-- step 2: overrides & cancellations for those masters (hits uq_override)
SELECT parent_event_id, recurrence_id, status, overridden_cols,
       title, location, description, transparency, start_utc, end_utc
FROM   events
WHERE  parent_event_id = ANY(:master_ids);
-- app then calls expand_series(master, lo, hi, overrides[master_id]); the two
-- SELECTs merge into one ordered, exception-corrected occurrence list.

3 · Ingesting the outside world

Half of "invitations" is interop — an Outlook export, a subscribed holiday feed, a room-booking system, all of which speak iCalendar. The payoff of modeling recurrence on RFC 5545 in the first place is that ingestion is almost a straight column mapping: RRULE, EXDATE, and RECURRENCE-ID land on the columns we already have, and the uid makes re-ingesting the same feed a no-op.

PYTHON · iCALENDAR (.ics) INGESTION — IDEMPOTENT UPSERT

from icalendar import Calendar


def ingest_ics(feed_bytes, calendar_id, db):
    """Ingest an external iCalendar feed into events. RRULE / EXDATE /
    RECURRENCE-ID map straight onto our columns; (calendar_id, uid,
    recurrence_id) is the idempotency key so re-pulling a feed is a no-op."""
    cal, rows = Calendar.from_ical(feed_bytes), []
    for c in cal.walk("VEVENT"):
        start = c.decoded("dtstart")
        end   = c.decoded("dtend", start)               # default to start if absent
        rid   = c.get("recurrence-id")                  # present => this VEVENT is an override
        rows.append({
            "calendar_id":   calendar_id,
            "uid":           str(c["uid"]),
            "title":         str(c.get("summary", "")),
            "location":      str(c.get("location", "")),
            "start_local":   to_wall(start),  "end_local": to_wall(end),
            "tz_id":         tzid_of(start),                # 'floating' -> calendar default
            "rrule":         rrule_text(c.get("rrule")),
            "exdate":        exdates(c.get("exdate")),
            "recurrence_id": to_wall(rid.dt) if rid else None,
            "status":        str(c.get("status", "CONFIRMED")).lower(),
        })
    # ON CONFLICT (calendar_id, uid, recurrence_id) DO UPDATE ...
    db.bulk_upsert_events(rows, conflict=("calendar_id", "uid", "recurrence_id"))
    return len(rows)

The carve-out an interviewer listens for: the heavyweight case. When the master's rule or time changes, the occurrence grid moves out from under every recurrence_id, and there is no clean automatic answer to "where does the cancelled-Tuesday exception go in a Thursday series." So that one edit is treated as a rebuild — warn the user that instance-level changes will be discarded, then split or recreate the series. Everything else is a sparse merge; only rule/time changes earn the heavy hammer.

§ 06 — AGGREGATIONFree/busy as a bitmap, and the change stream as the spine

Two derived layers sit atop the event store: the free/busy index, which materializes the very thing we refused to materialize for events, and the change-stream pipeline, which keeps every projection in lockstep. Both exist for speed; both can be thrown away and rebuilt from the log.

For ten people over two weeks, computing free/busy live is genuinely fine — fan out to the ten calendars, expand each, project to busy intervals, merge per attendee, and intersect across attendees with a sweep line. So why not stop there? Three reasons: tail latency (one slow shard out of ten makes an interactive query slow, and people drag the duration slider, re-firing it), volume (schedulers, room booking, and AI agents hammer this far beyond the UI), and waste (recurrence expansion is deterministic, so re-expanding everyone on every call recomputes the same answer). The fix is a materialized free/busy index, separate from the source-of-truth event store.

Per calendar, per UTC day, store a bitmap at 15-minute granularity — 96 bits, 12 bytes. Two weeks for one user is 168 bytes; ten users is under 2 KB. The query collapses to: fetch the tiny per-day bitmaps, OR them together across attendees, and scan for a run of zero bits at least the meeting length long. What just came back through the side door is materialization of recurrences — the very thing rejected for the event store — and here it is right, because free/busy needs only coverage, not event identity, and the bitmap is fixed-size no matter how many events overlap a slot.

PYTHON · FREE/BUSY MATERIALIZER (EVENT CHANGE → BUSY BITMAPS)

from datetime import datetime, timedelta, time
from collections import defaultdict
from math import ceil

SLOT_MIN, SLOTS = 15, 96                              # 24h / 15min


def stamp(mask: bytearray, start_utc, end_utc, day):
    """Set the 15-min bits [start, end) covers within one UTC day."""
    day0   = datetime.combine(day, time.min, tzinfo=UTC)
    lo, hi = max(start_utc, day0), min(end_utc, day0 + timedelta(days=1))
    if hi <= lo:
        return
    first = int((lo - day0).total_seconds()) // (SLOT_MIN * 60)
    last  = ceil(int((hi - day0).total_seconds()) / (SLOT_MIN * 60))
    for s in range(first, min(last, SLOTS)):
        mask[s >> 3] |= 1 << (s & 7)


def on_change_freebusy(change, db, horizon_days=365):
    # recompute ONLY the UTC days this change touched, out to a rolling horizon
    cal = change["calendar_id"]
    lo  = change["win_lo"]
    hi  = min(change["win_hi"], lo + timedelta(days=horizon_days))

    # authoritative busy intervals (singles + expanded series) for the span
    busy = load_busy_intervals(db, cal, lo, hi)       # calls expand_series

    by_day = defaultdict(lambda: bytearray(SLOTS // 8))
    for s, e in busy:
        d = s.date()
        while d <= e.date():
            stamp(by_day[d], s, e, d)
            d += timedelta(days=1)

    for day, mask in by_day.items():
        db.upsert_freebusy(cal, day, bytes(mask), version=change["new_version"])

Only one thing makes those four projections consistent: the spine. Three different systems must react to one event write — attendee copies must fan out, free/busy bitmaps must rebuild, cache keys must rotate. Doing them inline makes "create event" latency hostage to three downstream systems, and a partial failure leaves the world inconsistent. The fix is a single ordered log per calendar fed by a transactional outbox: every mutation writes the event row and a changelog row in the same database transaction, so the stream can never miss a committed change or replay one that rolled back.

SQL · THE TRANSACTIONAL OUTBOX — EVENT + CHANGELOG COMMIT ATOMICALLY

CREATE TABLE calendar_changelog (
    seq          BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,  -- IS the sync token
    calendar_id  BIGINT      NOT NULL,
    event_id     BIGINT      NOT NULL,
    op           TEXT        NOT NULL CHECK (op IN ('insert','update','delete')),
    new_version  BIGINT      NOT NULL,
    win_lo       TIMESTAMPTZ NOT NULL,   -- affected span, so the f/b consumer knows
    win_hi       TIMESTAMPTZ NOT NULL,   -- which days to recompute (whole horizon for a series)
    payload      JSONB       NOT NULL,   -- the changed row, for consumers that don't re-read
    created_at   TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- The write: the rule edit and its intent-to-publish are ONE commit.
BEGIN;
  UPDATE events
     SET title = :title, location = :location,
         version = version + 1, sequence = sequence + 1, updated_at = now()
   WHERE event_id = :id AND version = :expected      -- optimistic CAS from §04
  RETURNING version INTO :new_version;

  INSERT INTO calendar_changelog
        (calendar_id, event_id, op, new_version, win_lo, win_hi, payload)
  VALUES (:cal, :id, 'update', :new_version, :win_lo, :win_hi, :payload);
COMMIT;
-- a CDC connector (Debezium tailing the WAL) ships committed changelog rows
-- to a Kafka topic keyed by calendar_id, preserving per-calendar order.

Dual-writing is the difference between a pipeline that is eventually consistent and one that is eventually wrong. The outbox makes the change and its intent-to-publish one atomic commit; CDC turns committed rows into stream messages exactly once.RECURRENCE RULE Nº 2 — ONE TRANSACTION, OR NONE

From there the log forks into independent consumers that share nothing but the stream. Fan-out replicates an invitation into each attendee's calendar partition, carrying the version so a stale copy is detectable and self-repairing — and above a threshold (a 50,000-person all-hands) it flips to reference-only, one source row resolved at read time the way feeds handle celebrity accounts. Free/busy runs the materializer above. Cache-bump rotates the calendar's version so rendered-window keys roll over and stale entries simply age out — no explicit invalidation. And client sync is the cheapest leverage of all: the changelog seq is the sync token, so "what changed since X?" turns most phone wake-ups into a tiny, usually empty delta. One spine, four jobs — and the deep cut: when IANA ships a new tz release, you diff which zones changed over which future ranges and enqueue a synthetic "touch" through the very same stream, so the materializer and cache-bump do the rest with code that already exists.

§ 07 — ANALYTICS SQLInterrogating the schedule

The occurrence, free/busy, invitation, and exception facts are where the system explains itself. Three queries an interviewer loves, each carrying a classic pattern on its back: the as-of read against materialized occurrences, gaps-and-islands for "find a time," and conditional aggregation over the append-only exception ledger.

The calendar view — occurrence-as-fact, exceptions already applied

The user's question, and the hot path of the whole product: my stand-up over the next month, cancellations dropped and moves honored. Because occurrences are materialized to a rolling horizon, this is a range scan on fct_event_occurrences — no RRULE re-expansion at read time, because the override row already carries its moved time and the cancelled occurrence already carries its tombstone flag.

SQL · MY OCCURRENCES, MOVES HONORED, CANCELS DROPPED

SELECT o.recurrence_id           AS original_slot,
       o.occurrence_start_local  AS shows_at_local,
       o.occurrence_start_utc    AS instant_utc,
       CASE WHEN o.is_override THEN 'moved' ELSE 'series' END AS kind
FROM   fct_event_occurrences o
WHERE  o.event_sk = 501
  AND  o.is_cancelled = FALSE
  AND  o.occurrence_start_local >= '2026-06-09'
  AND  o.occurrence_start_local <  '2026-07-09'
ORDER BY o.occurrence_start_local;
-- 4 rows: 6/9, 6/16, (6/23 dropped), 7/1 14:00 'moved', 7/7. No re-expansion.
-- 09:00 PDT shows as 16:00Z all summer; the same series lands at 17:00Z in winter.

Find a time — gaps and islands over the slot bitmap

The scheduler's question: the first 60-minute slot where Alice, Bren, and Chen are all free. The pattern is the canonical gaps-and-islands move — OR the per-attendee masks (here a SUM across calendars), keep only the all-free slots, and let a running count of busy slots label each free run; the first island long enough wins. A 60-minute meeting needs four consecutive zero slots.

SQL · FIRST COMMON 60-MIN OPENING (GAPS-AND-ISLANDS)

WITH combined AS (
    SELECT slot_start_utc, sum(is_busy) AS busy_heads
    FROM   fct_freebusy_slots
    WHERE  calendar_sk IN (77,78,79)
      AND  slot_start_utc >= '2026-07-01 15:00+00'   -- 08:00 PT
      AND  slot_start_utc <  '2026-07-01 21:00+00'   -- 14:00 PT
    GROUP BY slot_start_utc
),
runs AS (   -- the islands trick: count busy slots seen so far to label free runs
    SELECT slot_start_utc,
           count(*) FILTER (WHERE busy_heads > 0)
             OVER (ORDER BY slot_start_utc) AS busy_group
    FROM   combined WHERE busy_heads = 0
)
SELECT min(slot_start_utc)                        AS open_from,
       min(slot_start_utc) + INTERVAL '60 min'    AS open_to
FROM   runs
GROUP BY busy_group
HAVING count(*) >= 4               -- 4 x 15-min = 60 min, all heads free
ORDER BY open_from
LIMIT  1;                         -- -> 18:00Z = 11:00 PT

RSVP funnel and recurrence health — conditional aggregation

The organizer's question and the workspace admin's, in one shape. Because invitations are per-occurrence and exceptions are an append-only ledger, the response mix is a GROUP BY response on one occurrence, and recurrence health is conditional aggregation over fct_event_exceptions — a series with cancels every week is a "zombie" the workspace nudges to delete.

SQL · RESPONSE MIX + WHICH SERIES CHURN THEIR OCCURRENCES

-- RSVP funnel for one occurrence (the moved Wednesday instance)
SELECT response, count(*) AS heads,
       round(avg(latency_min)) AS avg_latency_min
FROM   fct_invitations
WHERE  occurrence_sk = 504
GROUP BY response;          -- accepted 2, declined 1 (Chen can't do Wed)

-- recurrence health: which series churn their occurrences?
SELECT series_sk,
       count(*)                                          AS exceptions,
       count(*) FILTER (WHERE exception_type = 'cancel') AS cancels,
       count(*) FILTER (WHERE exception_type = 'move')   AS moves
FROM   fct_event_exceptions
GROUP BY series_sk
HAVING count(*) >= 2
ORDER BY exceptions DESC;   -- many cancels => a 'zombie' to nudge

§ 08 — THE DASHBOARDProving the schedule is alive

A senior design ends with observability, because every clever degradation above is invisible without it. The calendar dashboard watches three things that fail differently: the read path that must feel instant, the pipeline that keeps projections honest, and the time-correctness machinery that no one notices until it is wrong.

READ PATH: render p95 for a window, sync-delta hit rate (how often "what changed?" returns nothing — the cheapest query is the one that fetches nothing), window-cache hit %, and expansion guardrail trips — a spike means someone shipped a FREQ=MINUTELY rule into a hot calendar.
THE PIPELINE: fan-out consumer lag (seconds of staleness on attendee copies), free/busy lag, dead-letter / reconcile count, and stale-copy repairs — proof that versioned copies are detecting and healing themselves instead of silently disagreeing with truth.
TIME & RECURRENCE: DST-affected occurrences pending recompute after a tz-database bump, exceptions per series and the zombie tail, find-a-time p99, and RSVP response latency — the late-arriving truth the model is built to absorb.

Calendar Ops — Global MON 09:12 UTC · ALL ZONES · 30s REFRESH

Render p95

38ms

Sync-Delta Empty %

91%

Window-Cache Hit

87%

Fan-out Lag

4.1s

Find-a-Time — 3 attendees, 08:00–14:00 PT · OR the masks, first zero-run wins

Free/Busy Lag

2.3s

Guardrail Trips

3/h

DST Recompute Backlog

61k

Stale-Copy Repairs

820/s

Zombie Series

1.2k

RSVP Latency p50

2.4h

FIG. 2 — The story a healthy tz-bump tells: a DST recompute backlog draining through the same change stream, fan-out lag drifting amber under the burst, find-a-time still landing 11:00 PT in microseconds — and render p95 flat, because reads come off bitmaps and version-keyed caches, untouched by the churn behind them.

Read the amber tiles together and the dashboard narrates the deep cut from §06. IANA shipped a zone change overnight; the bump job diffed the affected zones and ranges and enqueued synthetic touches, so 61k future occurrences are being re-derived and their bitmaps restamped — and because every projection is versioned and replayable, fan-out merely lagged rather than corrupted, stale copies are repairing themselves at 820/s, and not one user's render got slower. That is what a designed degradation looks like from the operator's chair: the time machinery churns, and the front of the house stays boring.

§ 09 — THE RUBRICWhat was actually being tested

Strip the calendar details away and the question was probing five judgments, each of which generalizes far beyond scheduling:

RULE OVER ROWS: Seeing that one row can denote an unbounded set, and storing the generator rather than the output — so writes stay O(1) and materialization is an optimization for hot windows, never the model.
DERIVE THE INSTANT: Storing local wall-clock plus an IANA zone and computing UTC per occurrence, never a frozen offset — because the only thing that knows the transition dates is the zone database, and a clock change must not move the meeting.
CORRECTIONS AS A LEDGER: Modeling cancels and moves as sparse, append-only override rows keyed by the occurrence they replace — so a series edit propagates by inheritance, per-instance RSVPs fall out for free, and recurrence health is a GROUP BY.
MATERIALIZE WHERE IDENTITY DOESN'T MATTER: Refusing materialization for events but embracing it for free/busy, because availability needs only coverage — a fixed-size bitmap you OR across attendees turns "find a time" into a popcount cheap enough for agents to poll.
ONE LOG, MANY JOBS: Making the change stream the spine: a transactional outbox feeds fan-out, bitmaps, cache versioning, and client sync, so every projection is derived, versioned, and replayable — and every failure degrades to stale, never to wrong.

The calendar is a data-modeling problem wearing a UI. Get the row shape and the change stream right and the request path becomes almost boring — which, for something a billion people open every morning, is exactly what you want.— CLOSING ARGUMENT