Premise. A calendar looks like the easiest app you'll ever build — a list of events with a start and an end, sorted by time. It is, in fact, one of the most quietly brutal correctness problems in consumer software, and it is the system-design question I reach for when I want to find out whether someone has actually shipped against time, or only read about it. Two facts make it hard. First, a recurring event is one row that represents an unbounded number of occurrences — "every weekday, forever" is a single record and an infinite series. Second, the correct time of an event is a moving target: a 9:00 AM standup has to stay 9:00 AM after the clocks change, which means you cannot freeze it to a UTC instant and walk away. Layer invitations and "find a time across ten people" on top, and you have a read-heavy, latency-bound system where the bugs are measured in missed meetings. This piece takes it apart the way a real panel does — scope, model, recurrence, propagation, free/busy, pipeline, sharding — and shows the plumbing in code.
One row, infinite meetings —
and time itself refuses to hold still.
Recurring events with exceptions, IANA time zones and daylight saving, invitations that fan out to other people's calendars, and "find a time" across a room of attendees. Every one of these is a small data-engineering problem in disguise. We'll model all four — with DDL you could run, SQL the read path actually issues, and Python for the expansion, the free/busy index, and the ingestion pipeline.
What actually breaks before you write a line of code
Most candidates open this problem by drawing a load balancer and three app servers. That tells me nothing. The first thing a strong answer does is name the forces that will deform the design, because every later decision exists to survive one of them. A calendar has exactly four, and they are not the usual suspects.
The four things that break first
- The recurrence explosion. "Daily stand-up, every weekday, no end date" is one row that denotes an infinite set of occurrences. If you materialize occurrences as rows you sign up for an unbounded backfill job and a thousand-row rewrite on every series edit. If you expand the rule at read time you keep writes O(1) — but now every read does real computation, and a malicious
FREQ=MINUTELYrule can try to emit a million instances for a one-month view. - Time is not a number. Storing an event as a single UTC instant is correct for a one-off and wrong for anything that recurs. A 9:00 AM Pacific stand-up is 17:00 UTC in winter and 16:00 UTC in summer. Freeze it to UTC and the day the clocks change, the whole company's stand-up jumps an hour. The recurrence rule has to live in local wall-clock time, and UTC is derived per occurrence.
- One event, many calendars. A meeting I organize logically belongs to my calendar and to all nine attendees' calendars at once. Render-time fan-in (query every organizer's shard when you open your calendar) turns the hottest path in the product into a scatter-gather. Write-time fan-out (copy the event into each attendee's partition) is the answer — until a 50,000-person all-hands turns one edit into 50,000 writes.
- Free/busy is a second query system. "When is everyone free?" is not a calendar read. It is a privacy-preserving projection (opaque busy intervals, never titles) computed across many people, hammered by schedulers, room booking, and now AI agents polling availability. Recomputing it by re-expanding everyone's recurrences on every call is recomputing a deterministic answer over and over.
Read those again and notice they are all data problems, not web-tier problems. That is the tell of this question: the interesting engineering is in the model and the pipeline, not in the request path.
The three numbers that drive every choice
Reads dominate writes
Every phone wake-up, tab refocus, and sync poll is a read; users create only a couple of events a week. The architecture should make the common read nearly free and pay its costs on the rare write.
Two clocks, always
Store the instant in UTC and the wall-clock time plus an IANA zone id. Never a raw offset like -08:00 — offsets change twice a year; only the zone database knows the rules.
Free/busy may lag
A few seconds of staleness in availability is fine — calendars already permit double-booking. That single concession lets the whole read path be asynchronous and cached. Rooms are the one exception.
The trade-off to say out loud
Materialize everything, store UTC
- Expand every series into occurrence rows up front
- Store each occurrence as a UTC start/end and move on
- Compute free/busy on demand from the events table
Dies at: the first open-ended series (infinite rows), the first DST change (every recurring event shifts an hour), and the first 12-person scheduling poll under load.
Store rules, expand on read, derive UTC
- Keep one master row per series; expand the RRULE for the visible window only
- Recurrence lives in local time; UTC is computed per occurrence via the zone db
- Free/busy served from an async-maintained bitmap index, not live expansion
Costs: read-time CPU (bounded by a window cap) and a few seconds of free/busy staleness — both acceptable, both contained.
Everything below is a consequence of that right-hand column. Now let's scope it properly and put numbers on it.
Scope the problem, then size it
Before any schema, I pin down four functional buckets and the non-functional tensions, then anchor on rough Google-class numbers. The buckets are deliberately ordered by difficulty.
1 · Event CRUD
Create, edit, delete single and all-day events. The easy 20% — but it sets the row shape everything else reuses.
2 · Recurrence + exceptions
RRULE-style series with per-instance overrides: "every Tuesday, except next week, and move the one after to Wednesday." The hard core.
3 · Invitations
Invite attendees, collect accept / decline / tentative, organizer sees responses, and interop with people who aren't on our system over iCalendar.
4 · Free/busy
Given a set of people and a window, return when everyone is open. The engine behind "find a time."
I'd explicitly punt on email-delivery internals, video-conferencing, and working-hours / appointment-slot features, and treat sharing ACLs as a stretch — modeled-for, not designed in detail. On the non-functional side the defining tension is that this is read-heavy with a hard correctness requirement around time. A stale calendar view for a few seconds is fine, so availability beats strict consistency for reads; but writes — especially RSVPs — need read-your-own-writes, and free/busy must not double-book in obvious ways. Time gets stored three ways at once, which I'll defend with the DDL in the next section.
Users: ~500M monthly, ~180M daily active. Writes: at ~2–3 events/active-user/week, on the order of 50–80M event writes/day → ~1k writes/s average, spiking to 5–10k/s because calendar activity is business-hours-spiky and follows the sun. Reads: sync clients and calendar opens push 100k+ reads/s at peak. Storage: an event with attendees is a couple KB; even a few billion events/year is single-digit petabytes over time. Conclusion: this is not storage-bound. It is a read-latency and correctness problem, and the recurrence model is where correctness is won or lost.
One scale subtlety worth flagging before we model anything: because a single row can denote an unbounded number of occurrences, the materialize-vs-expand decision is not a detail — it dictates the storage layout, the write cost, and the entire free/busy design. We resolve it deliberately in §3.
Four tables, and the columns that earn their keep
The entity skeleton is unremarkable: users, calendars (a user owns several; the calendar is the unit of sharing), events, and an event_attendees join table holding per-attendee RSVP state. The interesting decisions are all inside events, so I'll lay the small tables down first and then spend real time on the big one.
-- A login identity. home_tz is the user's default zone, used when an
-- event is created without an explicit one.
CREATE TABLE users (
user_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
email CITEXT UNIQUE NOT NULL,
display_name TEXT NOT NULL,
home_tz TEXT NOT NULL DEFAULT 'UTC', -- IANA id
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- A calendar is the unit of ownership AND of sharing/ACLs. 'kind'
-- lets rooms and resources reuse the exact same machinery as people.
CREATE TABLE calendars (
calendar_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
owner_id BIGINT NOT NULL REFERENCES users(user_id),
summary TEXT NOT NULL,
default_tz TEXT NOT NULL DEFAULT 'UTC',
kind TEXT NOT NULL DEFAULT 'primary'
CHECK (kind IN ('primary','secondary','resource','room')),
version BIGINT NOT NULL DEFAULT 1, -- bumped on any write; used as a cache key
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
Now the heart. The single most consequential choice here is that singles, recurring masters, and exception overrides all live in one events table, disambiguated by which columns are populated. A row with an rrule and no parent is a master; a row with a parent_event_id and a recurrence_id is an override of one occurrence; a plain row is a one-off. One table keeps the read path to two queries and lets an override carry its own attendees for free.
CREATE TABLE events (
event_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
calendar_id BIGINT NOT NULL REFERENCES calendars(calendar_id),
organizer_id BIGINT NOT NULL REFERENCES users(user_id),
uid TEXT NOT NULL, -- RFC 5545 UID, stable across edits + ingest
-- content. On an OVERRIDE row, NULL means "inherit from the master"
-- (sparse override — see §4). On a master/single it is the value.
title TEXT,
description TEXT,
location TEXT,
-- time, stored THREE ways on purpose:
-- *_utc -> the instant, for range scans and one-off correctness
-- *_local -> wall-clock with NO zone, e.g. 2026-06-09 09:00
-- tz_id -> IANA zone; UTC for a recurrence is DERIVED from these two
start_utc TIMESTAMPTZ,
end_utc TIMESTAMPTZ,
start_local TIMESTAMP,
end_local TIMESTAMP,
tz_id TEXT,
all_day BOOLEAN NOT NULL DEFAULT false,
-- recurrence (MASTER rows only)
rrule TEXT, -- 'FREQ=WEEKLY;BYDAY=TU'
rdate TIMESTAMP[], -- explicit extra occurrences
exdate TIMESTAMP[], -- cheap inline cancellations
rrule_until_utc TIMESTAMPTZ, -- DENORMALIZED series end; NULL = open-ended
-- exception linkage (OVERRIDE rows only)
parent_event_id BIGINT REFERENCES events(event_id),
recurrence_id TIMESTAMP, -- original wall-clock start this row replaces
overridden_cols TEXT[], -- field mask: cols the user explicitly set
status TEXT NOT NULL DEFAULT 'confirmed'
CHECK (status IN ('confirmed','tentative','cancelled')),
transparency TEXT NOT NULL DEFAULT 'opaque' -- opaque=busy, transparent=free
CHECK (transparency IN ('opaque','transparent')),
version BIGINT NOT NULL DEFAULT 1, -- optimistic concurrency (compare-and-swap)
sequence INT NOT NULL DEFAULT 0, -- RFC 5545 SEQUENCE, bumped on material change
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
-- an override must name the occurrence it replaces
CONSTRAINT override_names_occurrence
CHECK (parent_event_id IS NULL OR recurrence_id IS NOT NULL),
-- a row is a recurring master XOR an override, never both
CONSTRAINT master_xor_override
CHECK (NOT (rrule IS NOT NULL AND parent_event_id IS NOT NULL))
);
-- Range scan: "singles + masters that could touch [lo, hi)" on one calendar.
-- Partial index keeps cancelled rows and overrides out of the hot path.
CREATE INDEX idx_events_cal_window ON events (calendar_id, start_utc)
WHERE status <> 'cancelled' AND parent_event_id IS NULL;
-- Open-ended series stay filterable thanks to the denormalized horizon.
CREATE INDEX idx_events_cal_series ON events (calendar_id, rrule_until_utc)
WHERE rrule IS NOT NULL;
-- Overlay path: fetch every override/cancel for a set of masters.
CREATE INDEX idx_events_parent ON events (parent_event_id, recurrence_id)
WHERE parent_event_id IS NOT NULL;
-- One occurrence can only be overridden once per series.
CREATE UNIQUE INDEX uq_override ON events (parent_event_id, recurrence_id)
WHERE parent_event_id IS NOT NULL;
An offset like -08:00 is a fact about an instant, not a place. Store it and you've thrown away the rule that produced it, so you can't re-derive the right wall-clock after a DST transition or a government zone change. The IANA zone id (America/Los_Angeles) is a pointer into the tz database, which does know the rules. So: start_local + tz_id is the source of truth for anything that recurs, start_utc is a derived convenience for range scans and one-offs, and the two are kept consistent by the app, not the user.
Attendees get their own table because RSVP state is per-person and, crucially, because an override row can have its own attendee set — that's what lets someone decline a single moved instance while staying accepted on the series.
CREATE TABLE event_attendees (
event_id BIGINT NOT NULL REFERENCES events(event_id) ON DELETE CASCADE,
attendee_id BIGINT REFERENCES users(user_id), -- NULL for external guests
email CITEXT NOT NULL, -- external attendees live by email
response TEXT NOT NULL DEFAULT 'needs_action'
CHECK (response IN ('needs_action','accepted','declined','tentative')),
optional BOOLEAN NOT NULL DEFAULT false,
is_organizer BOOLEAN NOT NULL DEFAULT false,
PRIMARY KEY (event_id, email)
);
-- "what am I invited to, and have I replied?" — the invitation inbox.
CREATE INDEX idx_attendee_inbox ON event_attendees (attendee_id, response);
That's the whole model the rest of the system rides on. Two small tables, one carefully overloaded big one, and a join table that quietly carries the per-instance RSVP magic. Now the part that actually earns the salary.
Recurring events, and the three edits that define them
Here is the central tension restated as a decision: a recurring event is one logical record but unbounded physical occurrences. Do you store the occurrences as rows, or store the rule and expand it on read?
One row per occurrence
- Reads become a trivial range scan
- But infinite series force a horizon + a forever-running backfill
- And editing the series rewrites thousands of rows in one transaction
One master row per series
- Writes stay O(1) — edit the rule, not the occurrences
- A month view touches only a few dozen occurrences per series
- Use a materialized cache later only for hot windows — an optimization, not the model
So the base design stores the rule and expands at query time, and I'll lean on RFC 5545 (iCalendar) RRULEs rather than a homegrown daily/weekly/monthly enum. RRULE is battle-tested, it's what interop with Outlook and Apple Calendar requires anyway, and it expresses the gnarly cases — "third Thursday of every month," "last weekday of the month" — that a hand-rolled enum will eventually fail on.
The model then has to survive three kinds of edit, and all three reduce to the same primitive: an exception row — a child event pointing at the master via parent_event_id, tagged with the recurrence_id of the occurrence it's about.
Cancel one instance
"Skip this Tuesday's stand-up." An override row with status = cancelled for that recurrence_id. (This is iCalendar's EXDATE, but as a row so it can carry metadata.)
Move / edit one instance
"Push this Tuesday to Wednesday 2pm." An override row with its own start/end — and, because it's a full row, its own attendees and RSVPs.
This & all following
A series split: set the old master's UNTIL to just before the split, create a new master from the split point with the new properties.
That third one is the move people get wrong. Don't try to version a rule in place; split the series. Two writes, history preserved, old exceptions stay attached to the old master, the new master starts clean.
FREQ=WEEKLY;BYDAY=TU) expanded across five weeks. Week 3 is cancelled; week 4 is moved to Wednesday. The master is untouched — both edits are separate override rows overlaid at read time.The expansion itself is the procedure that makes or breaks the design, and it's where the DST fix lives. The rule is iterated in local time; UTC is computed per occurrence. That single ordering is why a 9:00 AM stand-up stays 9:00 AM across the spring-forward boundary instead of silently sliding to 8:00.
from datetime import timedelta
from zoneinfo import ZoneInfo
from dateutil.rrule import rrulestr
UTC = ZoneInfo("UTC")
INHERITABLE = ("title", "description", "location", "transparency")
def expand_series(master, win_lo, win_hi, overrides, max_instances=730):
"""Expand a recurring MASTER into occurrences within [win_lo, win_hi) UTC,
applying cancellations and per-instance overrides.
The rule is walked in LOCAL wall-clock time and each occurrence is then
converted to UTC. That order is the daylight-saving fix: 09:00 local maps
to 17:00Z in winter and 16:00Z in summer, automatically.
"""
tz = ZoneInfo(master["tz_id"])
dtstart = master["start_local"].replace(tzinfo=tz)
duration = master["end_local"] - master["start_local"]
rule = rrulestr(master["rrule"], dtstart=dtstart)
# index overrides by the occurrence they replace (naive local start)
ov_by_rid = {o["recurrence_id"]: o for o in overrides}
exdates = set(master.get("exdate") or ())
out, count = [], 0
for local_start in rule:
if count >= max_instances:
break # guardrail vs FREQ=MINUTELY abuse
start_utc = local_start.astimezone(UTC)
if start_utc >= win_hi:
break # rrule yields in order; safe to stop
end_utc = (local_start + duration).astimezone(UTC)
if end_utc <= win_lo:
continue # before the window; keep walking
rid = local_start.replace(tzinfo=None) # match key == recurrence_id
if rid in exdates:
continue # cheap inline cancel
ov = ov_by_rid.get(rid)
if ov is None:
out.append(_plain(master, start_utc, end_utc, rid))
elif ov["status"] != "cancelled":
out.append(_merge_override(master, ov)) # moved / edited instance
# else: cancelled override -> drop silently
count += 1
return out
def _plain(master, start_utc, end_utc, rid):
return {"event_id": master["event_id"], "title": master["title"],
"location": master["location"], "transparency": master["transparency"],
"start_utc": start_utc, "end_utc": end_utc, "recurrence_id": rid}
def _merge_override(master, ov):
"""Sparse merge: the override supplies its own time plus only the columns it
explicitly set (overridden_cols); everything else falls through to the
master's CURRENT value, so a later typo-fix on the series still reaches it."""
merged = {"event_id": ov["event_id"], "start_utc": ov["start_utc"],
"end_utc": ov["end_utc"], "recurrence_id": ov["recurrence_id"]}
overridden = set(ov.get("overridden_cols") or ())
for col in INHERITABLE:
merged[col] = ov[col] if col in overridden else master[col]
return merged
On the data plane, "give me this calendar for June" is two SQL statements, not one. First pull the singles and masters that could touch the window; then pull every override for those masters and overlay them in the application after expansion.
SELECT event_id, title, location, transparency, all_day, tz_id,
start_utc, end_utc, start_local, end_local,
rrule, rdate, exdate
FROM events
WHERE calendar_id = :cal
AND status <> 'cancelled'
AND parent_event_id IS NULL -- masters + singles; overrides overlaid in step 2
AND (
-- a one-off that overlaps [lo, hi)
(rrule IS NULL AND start_utc < :hi AND end_utc > :lo)
OR
-- a series whose lifespan overlaps the window; the denormalized
-- rrule_until_utc is exactly what keeps open-ended series index-filterable
(rrule IS NOT NULL AND start_utc < :hi
AND (rrule_until_utc IS NULL OR rrule_until_utc >= :lo))
);
SELECT parent_event_id, recurrence_id, status, overridden_cols,
title, location, description, transparency, start_utc, end_utc
FROM events
WHERE parent_event_id = ANY(:master_ids); -- hits idx_events_parent
-- app then calls expand_series(master, lo, hi, overrides[master_id]) and
-- the two SELECTs merge into one ordered, exception-corrected occurrence list.
"The organizer edits the master — changes the title, or the location. What propagates to the instances people already moved or renamed?"
This is the question that separates a model that compiles from a model that's right, and it has a non-obvious answer: propagation must be field-level. A typo fix on the series should reach the moved Wednesday instance; a rename the user made on that instance must not be clobbered by it. The next section is entirely about how the schema makes that correct by construction instead of by a fan-out job that can fail halfway.
Edits to the series, edits to the instance, and who wins
Users have strong, specific intuitions here. Fix a typo in the stand-up title and they expect the moved Wednesday instance to get the fix. Rename that one instance to "Stand-up + design review" and they'll be furious if your typo-fix later wipes their rename. So the rule they actually want is: master edits propagate to overrides, except for the exact fields the override customized. Three ways to represent that, and the choice is load-bearing.
A · Full snapshot
Override copies every field; no propagation. Simple, and wrong — the typo fix never reaches the moved instance. Reject.
B · Sparse override
Override stores only the fields it changed; everything else is NULL = "inherit." Propagation isn't an operation — it's free, because reads merge with the master's latest value. Pick this.
C · Field mask
Full denormalized row plus an overridden_cols set; on a master edit, fan out writes to every non-masked column. Single-row reads, but O(overrides) writes and a fan-out that can race or partially fail.
I pick B, sparse overrides. The merge cost is trivial because I'm already fetching the master to expand the series, so the override merge is in-memory, not an extra query. And it makes propagation correct by construction rather than by a job that can fail at occurrence 4,000 of 9,000. Note the schema carries overridden_cols anyway — not to drive option C's fan-out, but as an explicit record of intent that disambiguates "inherit" from "the user deliberately cleared this field." On the read path the merge is a plain COALESCE.
-- content inherits from the master unless the override explicitly set it;
-- TIME always comes from the override (its identity depends on its own start).
SELECT o.event_id,
COALESCE(o.title, m.title) AS title,
COALESCE(o.location, m.location) AS location,
COALESCE(o.description, m.description) AS description,
COALESCE(o.transparency, m.transparency) AS transparency,
o.start_utc, o.end_utc, o.recurrence_id, o.status
FROM events o
JOIN events m ON m.event_id = o.parent_event_id
WHERE o.parent_event_id = ANY(:master_ids);
But not all fields are equal, and pretending they are is how you ship a subtly broken calendar. I partition them into three categories.
title · description · location
Pure inherit-unless-overridden. The COALESCE above handles them with zero ceremony.
start · end · rrule
Special, because an override's identity is its recurrence_id — keyed to the occurrence start under the original rule. Change the master from Tuesdays to Thursdays and every stored recurrence_id is now an orphan.
structural, not field-masked
Adding an attendee to the series adds them to overrides too — with a fresh needs_action RSVP each. Removing one removes them everywhere. Propagation here is by rule, not by NULL.
When the master's rule or time changes, the occurrence grid moves out from under every exception, and there is no clean automatic answer to "where does the cancelled-Tuesday exception go in a Thursday series." So I treat it as a heavyweight operation: warn the user that instance-level changes will be discarded, then rebuild the series — exactly what Google Calendar does in practice. A gentler variant remaps exceptions by ordinal position or nearest occurrence, but that's heuristic and produces surprises; ship discard-with-warning, consider remapping later.
Last, concurrency. The organizer fixing the title and an attendee moving the instance can race. With sparse overrides this mostly composes, because they're writing different rows — but I still guard both with optimistic concurrency: a compare-and-swap on version, surfacing a conflict to the client rather than silently letting last-write-win on the same row.
UPDATE events
SET title = :title, location = :location,
version = version + 1, sequence = sequence + 1, updated_at = now()
WHERE event_id = :id
AND version = :expected_version -- 0 rows affected => someone beat us; 409 to client
RETURNING version;
So the propagation story is: sparse override rows, NULL-means-inherit with an explicit field mask for deliberate clears, read-time COALESCE, heavyweight rebuild for rule/time changes, structural rules for attendees, and CAS on version for the races. None of that is a feature you bolt on later — it falls out of getting the row shape right in §2.
"Find a time" — the naive way, then the way that scales
First, semantics, because free/busy is not "all your events." It's a privacy-preserving projection: the person querying gets opaque busy intervals, never titles or details — essential when you're scheduling with people whose calendars you can't read. And not every event makes you busy. Include only events the attendee accepted or is tentative on (a declined invite shouldn't block scheduling) and respect the event's transparency flag, because "OOO reminder" might be marked free.
SELECT e.start_utc, e.end_utc
FROM events e
JOIN event_attendees a ON a.event_id = e.event_id
WHERE e.calendar_id = :cal
AND a.attendee_id = :user
AND a.response IN ('accepted','tentative') -- declined != busy
AND e.transparency = 'opaque' -- 'transparent' shows as free
AND e.status <> 'cancelled'
AND e.rrule IS NULL -- recurring busy comes from expansion / bitmap
AND e.start_utc < :hi AND e.end_utc > :lo;
For ten people over two weeks, the simple computation is genuinely fine, so let me do it honestly before optimizing. Fan out to the ten calendars in parallel; for each, run the read path from §3 (singles + expanded series, exceptions overlaid), filter to busy-qualifying events, and project to [start, end) intervals in UTC. Per attendee, merge overlaps into a clean busy timeline; then intersect across attendees with a sweep line.
def merge_intervals(intervals):
"""Collapse overlapping [start, end) intervals into a clean busy timeline."""
if not intervals:
return []
intervals.sort()
out = [list(intervals[0])]
for s, e in intervals[1:]:
if s <= out[-1][1]:
out[-1][1] = max(out[-1][1], e) # overlap -> extend
else:
out.append([s, e])
return out
def find_free_slots(busy_by_attendee, win_lo, win_hi, duration):
"""Sweep-line intersection: return [start, end) gaps where NOBODY is busy and
the gap is at least `duration` long. O(E log E) over total busy intervals."""
edges = [] # +1 entering busy, -1 leaving
for intervals in busy_by_attendee.values():
for s, e in merge_intervals(intervals):
edges.append((s, 1)); edges.append((e, -1))
edges.sort()
free, busy, cursor = [], 0, win_lo
for t, delta in edges:
if busy == 0 and t > cursor and t - cursor >= duration:
free.append((cursor, min(t, win_hi)))
busy += delta
cursor = max(cursor, t)
if busy == 0 and win_hi - cursor >= duration:
free.append((cursor, win_hi))
return free
So why not stop there? Three reasons: tail latency (one slow shard out of ten makes an interactive query slow, and people drag the duration slider, re-firing it); volume (schedulers, room booking, and AI agents hammer this far beyond the UI); and waste (recurrence expansion is deterministic, so re-expanding everyone on every call recomputes the same answer). The fix is a materialized free/busy index, separate from the source-of-truth event store.
Per calendar, per UTC day, store a bitmap at 15-minute granularity — 96 bits, 12 bytes. Two weeks for one user is 168 bytes; ten users is under 2 KB. The query collapses to: fetch the tiny per-day bitmaps, OR them together across attendees, and scan for a run of zero bits at least the meeting length long.
-- One row per (calendar, UTC day): a 96-slot busy bitmap. Shown in Postgres
-- for clarity; in production this is a KV store (Bigtable/Cassandra) keyed
-- by (calendar_id, day) so a 14-day fetch is one batched multi-get.
CREATE TABLE freebusy_slots (
calendar_id BIGINT NOT NULL,
day_utc DATE NOT NULL,
busy_mask BIT(96) NOT NULL DEFAULT B'0'::BIT(96), -- 96 x 15-min slots
version BIGINT NOT NULL, -- last change applied
updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
PRIMARY KEY (calendar_id, day_utc)
);
SELECT day_utc, bit_or(busy_mask) AS combined_busy
FROM freebusy_slots
WHERE calendar_id = ANY(:attendee_cals)
AND day_utc BETWEEN :lo_day AND :hi_day
GROUP BY day_utc
ORDER BY day_utc;
from math import ceil
SLOT_MIN, SLOTS = 15, 96 # 24h / 15min
def free_runs(combined: bytes, duration_min: int):
"""Yield (start_slot, end_slot) where the combined busy bitmap is 0 for at
least ceil(duration/15) consecutive slots. 'combined' is one UTC day."""
need = ceil(duration_min / SLOT_MIN)
run = 0
for s in range(SLOTS):
busy = (combined[s >> 3] >> (s & 7)) & 1 # bit s of the mask
if busy:
if run >= need:
yield (s - run, s)
run = 0
else:
run += 1
if run >= need:
yield (SLOTS - run, SLOTS)
Notice what just came back in through the side door: materialization of recurrences, the very thing I rejected for the event store. Here it's the right call, because free/busy doesn't need event identity — only coverage — and the bitmap is fixed-size no matter how many events overlap a slot. It also composes beautifully: "any 3 of these 5 optional attendees" becomes counting set bits instead of OR-ing; ranking slots by fewest conflicts is a popcount.
Live expansion per query
- Always correct, no staleness
- But fan-out reads + recurrence expansion on every call
- Tail-latency bound; re-does deterministic work
OR fixed-size masks
- One batched KV multi-get + bitwise OR + bit-scan
- Microseconds of CPU; composes (popcount, "k of n")
- Cost: async staleness + 15-min granularity (both acceptable)
The honest costs are worth naming out loud — it's where the senior signal is.
Async maintenance
The index lags a write by seconds. Fine — free/busy is advisory and the real event create goes against the source of truth. For rooms, do a synchronous conflict check on the room's actual events at booking time; that's the one place double-booking is unacceptable.
15-minute slots
Odd-duration precision is lost. Use the bitmap as a candidate filter, then refine the displayed slots' edges against real events.
23- and 25-hour days
A local day can have 92 or 100 slots. Define bitmap days in UTC and let the query layer window into local days, rather than baking local days into storage.
Unknown availability
Attendees off our system are shown unknown, or fetched via iCalendar free/busy federation if their provider supports it — never silently treated as free.
The change stream is the spine — and it does triple duty
Three different systems all need to react to one event write: attendee copies must fan out, free/busy bitmaps must be rebuilt, and cache keys must rotate. The mistake is to do those inline in the write request — now your "create event" latency is hostage to three downstream systems, and a partial failure leaves the world inconsistent. The fix is a single ordered log per calendar with several independent consumers, fed by a transactional outbox: every mutation writes the event and a changelog row in the same database transaction, so the stream can never miss a committed change or replay one that rolled back.
CREATE TABLE calendar_changelog (
seq BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, -- IS the sync token
calendar_id BIGINT NOT NULL,
event_id BIGINT NOT NULL,
op TEXT NOT NULL CHECK (op IN ('insert','update','delete')),
new_version BIGINT NOT NULL,
win_lo TIMESTAMPTZ NOT NULL, -- affected span, so the f/b consumer knows
win_hi TIMESTAMPTZ NOT NULL, -- which days to recompute (whole horizon for a series)
payload JSONB NOT NULL, -- the changed row, for consumers that don't re-read
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX idx_changelog_cal ON calendar_changelog (calendar_id, seq);
BEGIN;
UPDATE events
SET title = :title, location = :location,
version = version + 1, sequence = sequence + 1, updated_at = now()
WHERE event_id = :id AND version = :expected -- optimistic CAS from §4
RETURNING version INTO :new_version;
INSERT INTO calendar_changelog
(calendar_id, event_id, op, new_version, win_lo, win_hi, payload)
VALUES (:cal, :id, 'update', :new_version, :win_lo, :win_hi, :payload);
COMMIT;
-- a CDC connector (e.g. Debezium tailing the WAL) ships committed changelog
-- rows to a partitioned Kafka topic keyed by calendar_id, preserving order.
From there the log forks into independent consumers. They share nothing but the stream, they can be scaled and redeployed separately, and each is replayable from a stored offset.
Mutation
App writes event + outbox row in one DB transaction.
Postgres txChange stream
CDC tails the outbox; one ordered partition per calendar.
Debezium · KafkaFan-out
Copy the event into each attendee's partition.
consumerFree/busy
Rebuild the affected day bitmaps.
consumerCache-bump
Rotate the calendar version; wake sync clients.
consumerServing
Replicas + cache answer reads from the new state.
read pathConsumer 3a — fan-out. Replicate an invitation into each attendee's calendar partition, carrying the version so a stale copy is detectable and repairable. Above a threshold, don't fan out at all — switch to read-time reference resolution (the §7 "celebrity" pattern for a 50k-person all-hands).
FANOUT_THRESHOLD = 200 # above this -> reference-only, resolve on read
def on_change_fanout(change, db, bus):
if change["op"] == "delete":
db.delete_attendee_copies(change["event_id"])
return
event = db.load_event(change["event_id"])
attendees = db.load_attendees(event["event_id"])
if len(attendees) > FANOUT_THRESHOLD:
db.mark_reference_only(event["event_id"]) # one source row for all
return
for a in attendees:
if a["attendee_id"] is None: # external guest
bus.publish("email.invite", build_ics(event, a))
continue
db.upsert_attendee_copy(
calendar_id=db.primary_calendar(a["attendee_id"]),
source_event_id=event["event_id"],
version=event["version"], # detect/repair stale copies
response=a["response"],
)
Consumer 3b — free/busy materializer. The ingestion job that turns events into bitmaps. It recomputes only the UTC days the change touched, expanding a series out to a rolling horizon.
from datetime import datetime, timedelta, time
from collections import defaultdict
from math import ceil
SLOT_MIN, SLOTS = 15, 96
def stamp(mask: bytearray, start_utc, end_utc, day):
"""Set the 15-min bits [start, end) covers within one UTC day."""
day0 = datetime.combine(day, time.min, tzinfo=UTC)
lo, hi = max(start_utc, day0), min(end_utc, day0 + timedelta(days=1))
if hi <= lo:
return
first = int((lo - day0).total_seconds()) // (SLOT_MIN * 60)
last = ceil(int((hi - day0).total_seconds()) / (SLOT_MIN * 60))
for s in range(first, min(last, SLOTS)):
mask[s >> 3] |= 1 << (s & 7)
def on_change_freebusy(change, db, horizon_days=365):
cal = change["calendar_id"]
lo = change["win_lo"]
hi = min(change["win_hi"], lo + timedelta(days=horizon_days))
# authoritative busy intervals (singles + expanded series) for the span
busy = load_busy_intervals(db, cal, lo, hi) # calls expand_series
by_day = defaultdict(lambda: bytearray(SLOTS // 8))
for s, e in busy:
d = s.date()
while d <= e.date():
stamp(by_day[d], s, e, d)
d += timedelta(days=1)
for day, mask in by_day.items():
db.upsert_freebusy(cal, day, bytes(mask), version=change["new_version"])
Consumer 3c — cache-bump. The cheapest and most important of the three: rotate the calendar's version so rendered-window cache keys roll over, and publish the new sync token so connected clients wake up. No explicit invalidation — old keys simply stop being requested and age out (more in §7).
def on_change_cache(change, db, cache):
db.bump_calendar_version(change["calendar_id"], change["new_version"])
# the changelog seq IS the sync token; clients ask "what changed since X?"
cache.publish_sync_token(change["calendar_id"], change["seq"])
Ingesting from the outside world
Half of "invitations" is interop: importing an Outlook or Apple export, a subscribed holiday feed, or a room-booking system — all of which speak iCalendar. The nice payoff of having modeled recurrence on RFC 5545 in the first place is that ingestion is almost a straight column mapping: RRULE, EXDATE and RECURRENCE-ID land on the columns we already have. The uid makes re-ingesting the same feed idempotent.
from icalendar import Calendar
def ingest_ics(feed_bytes, calendar_id, db):
"""Ingest an external iCalendar feed into events. RRULE / EXDATE /
RECURRENCE-ID map straight onto our columns; (calendar_id, uid,
recurrence_id) is the idempotency key so re-pulling a feed is a no-op."""
cal, rows = Calendar.from_ical(feed_bytes), []
for c in cal.walk("VEVENT"):
start = c.decoded("dtstart")
end = c.decoded("dtend", start) # default to start if absent
rid = c.get("recurrence-id") # present => this VEVENT is an override
rows.append({
"calendar_id": calendar_id,
"uid": str(c["uid"]),
"title": str(c.get("summary", "")),
"location": str(c.get("location", "")),
"start_local": to_wall(start), "end_local": to_wall(end),
"tz_id": tzid_of(start), # 'floating' -> calendar default
"rrule": rrule_text(c.get("rrule")),
"exdate": exdates(c.get("exdate")),
"recurrence_id": to_wall(rid.dt) if rid else None,
"status": str(c.get("status", "CONFIRMED")).lower(),
})
# ON CONFLICT (calendar_id, uid, recurrence_id) DO UPDATE ...
db.bulk_upsert_events(rows, conflict=("calendar_id", "uid", "recurrence_id"))
return len(rows)
Dual-writing — commit to Postgres, then publish to Kafka — has a window where the DB commit succeeds and the publish fails (or vice versa), and now your bitmaps and attendee copies silently disagree with the source of truth. The outbox makes the event change and its intent-to-publish one atomic commit; the CDC connector turns committed rows into stream messages exactly once in order. It's the difference between a pipeline that's eventually consistent and one that's eventually wrong.
Making the read path nearly free
Shard by calendar_id, hashed. The dominant query is "render this calendar for this window," and I want that to be a single-shard operation; hashing spreads load evenly where range-sharding by id would create hot ranges. The one key I would never shard by is time — tempting for a calendar, fatal in practice, because everyone reads the current two weeks and a time-partitioned layout concentrates the entire planet's read traffic on whichever shard owns "now." Calendar-keyed sharding spreads that temporal hotness across millions of calendars.
The wrinkle is the one from §0: an event with attendees belongs to many calendars at once. Resolve it with write-time fan-out (from §6) rather than read-time fan-in, because reads dominate and median attendee count is tiny — a handful of asynchronous shard writes, with the organizer's copy as source of truth. The exception is the huge event, handled the way feeds handle celebrity accounts.
Fan-out on write
- Push a thin copy into each attendee's partition
- Reads are single-shard and dead simple
- Versioned copies are self-repairing
Reference + read-time resolve
- Fan-out would mean 50k writes per edit
- Instead: one source row, thin reference in each partition
- Resolve details from the organizer's shard, fronted by cache
On caching I think in three layers, ordered by how much traffic each deletes before it reaches the next. The biggest win isn't a server-side cache at all — it's recognizing that calendar clients are sync clients, not request/response clients.
Incremental sync (client)
The client holds a sync token and asks "what changed since X?" Most phone wake-ups and tab refocuses become a tiny — usually empty — delta. The cheapest query is the one that fetches nothing.
Rendered-window cache (server)
The fully expanded, exception-merged view for a range, keyed by calendar version. Expansion is deterministic, so it's perfectly cacheable; the version in the key makes invalidation a non-event.
Read replicas
Reads from replicas; read-your-writes via the client echoing the version it just wrote and the server waiting for the replica to catch up. Traffic follows the sun, so capacity planning is kind.
The sync token is just the changelog seq from §6 — the same log that drives fan-out and bitmaps also drives client sync. One spine, three jobs.
SELECT seq, event_id, op, new_version, payload
FROM calendar_changelog
WHERE calendar_id = :cal
AND seq > :since_token -- client's stored high-water mark
ORDER BY seq
LIMIT 500; -- page; client advances token to MAX(seq) returned
def render_window(cal, lo, hi, db, cache):
"""Serve a fully expanded, exception-merged window. The calendar's version is
part of the key, so ANY write rotates the key and stale entries simply age
out — we never hunt down and delete cache entries."""
ver = db.calendar_version(cal)
key = f"win:{cal}:{lo:%Y%m%d}:{hi:%Y%m%d}:v{ver}"
hit = cache.get(key)
if hit is not None:
return hit # also where all-hands references resolve
occ = render_occurrences(db, cal, lo, hi) # §3: two SELECTs + expand_series
cache.set(key, occ, ttl=3600)
return occ
For multi-region, home each calendar in the owner's region with async cross-region replication, accepting slightly stale reads for cross-region attendees — consistent with free/busy being advisory and with the versioned-copy repair path. The thread tying everything together is, again, the change stream: attendee fan-out, version bumps for cache keys, free/busy maintenance, and client sync are all consumers of one ordered log per calendar.
The same facts, told two ways — the difference is the senior signal.
What breaks in production, and the seam that saves you
An architecture is only as good as its behavior when a piece is on fire. The recurring theme below is the same seam: because the event store is the source of truth and everything else (copies, bitmaps, caches) is a derived, versioned projection off one ordered log, every failure degrades to "slightly stale" rather than "wrong," and every projection is rebuildable by replay.
The five that actually page you
- Fan-out consumer falls behind. A burst of edits and the attendee-copy consumer lags; guests see a stale invite for a few seconds. Because each copy carries the event
version, a read can detect that its copy is older than the source and fall back to the organizer's shard. Mitigation: a bounded-lag SLA, autoscaling on consumer lag, and the source of truth always being correct. - Poison message / partial fan-out. One attendee's shard is down mid-fan-out. The upsert is idempotent and keyed by
(source_event_id, version), so retry and replay are safe; a failed message goes to a dead-letter queue and a reconciliation job re-drives it. You never half-apply and corrupt. - Orphaned exceptions after a rule change. Covered in §4: changing the master's rule or time moves the occurrence grid out from under every
recurrence_id. Discard-with-warning and rebuild is the shipped behavior; the alternative is silent, surprising remaps. - The tz database changes under you. A government cancels DST or shifts a transition date. Because recurrence is stored as local time + IANA zone, live expansion picks up the new rules automatically — but any cached UTC values and free/busy bitmaps for future dates are now wrong and must be recomputed. This is a real, scheduled job, below.
- Expansion bomb. A crafted
FREQ=SECONDLYrule or a 50-year window tries to emit millions of occurrences. Themax_instancesguardrail and a two-year per-query window cap from §3 keep a single request from melting a worker.
When IANA ships a new tz release, you can't re-expand the whole planet. The move: diff the old and new zone rules to find which zones changed and over which future date ranges, then enqueue recompute only for calendars whose events fall in those zones and ranges — fanned through the very same change stream as a synthetic "touch" event, so the free/busy materializer and cache-bump consumers do the rest with code that already exists. Storing local + zone instead of frozen UTC is precisely what makes this a targeted job instead of a full rebuild.
If you remember five things
One row, expanded on read — never materialized
Store the RRULE, expand for the visible window, cap the expansion. Materialization is an optimization for hot windows, not the model.events.rrule + expand_series(window) — writes stay O(1)
Local time is the source of truth; UTC is derived
Wall-clock + IANA zone, never a raw offset. Expand in local time, convert per occurrence. That ordering is the entire DST fix.09:00 'America/Los_Angeles' → 17:00Z winter, 16:00Z summer
Exceptions are sparse override rows
One table for singles, masters, and overrides. NULL means inherit, so a series edit propagates by COALESCE — correct by construction, no fan-out job to fail.parent_event_id + recurrence_id + overridden_cols
Free/busy is a bitmap, not a re-expansion
96 bits per calendar-day, OR across attendees, scan for a zero-run. Async-maintained, advisory, microseconds to query.bit_or(busy_mask) → first run ≥ ⌈duration/15⌉
One change stream, four jobs
A transactional outbox feeds fan-out, free/busy, cache versioning, and client sync. Every projection is derived, versioned, and replayable, so failures degrade to stale, never wrong.outbox → CDC → {copies · bitmaps · cache · sync}
That's the whole system, and notice how little of it is web tier. The calendar is a data-modeling problem wearing a UI: get the row shape and the change stream right and the request path becomes almost boring — which, for something a billion people open every morning, is exactly what you want.
Glossary
- RRULE
- The RFC 5545 recurrence rule, e.g.
FREQ=WEEKLY;BYDAY=TU;UNTIL=.... One string that denotes a whole series. - Master
- The single row holding a series' rule and base properties. Has an
rrule, noparent_event_id. - Override / exception
- A child row that modifies or cancels one occurrence, linked by
parent_event_idand tagged withrecurrence_id. - recurrence_id
- The original wall-clock start of the occurrence an override replaces — its identity key into the expanded series.
- EXDATE / RDATE
- iCalendar lists of excluded / extra dates. We use
exdatefor cheap cancels and rows for richer overrides. - SEQUENCE
- RFC 5545 revision counter, bumped on a material change so attendees' clients know to re-prompt.
- IANA tz id
- A named zone like
America/Los_Angeles— a pointer into the tz database, which knows the DST rules. Not an offset. - Wall-clock
- A local date-time with no zone (
TIMESTAMP), e.g.2026-06-09 09:00. The thing a recurrence must preserve. - DST
- Daylight saving time. Why a local day can be 23 or 25 hours and why frozen-UTC recurrence is a bug.
- UTC instant
- A
TIMESTAMPTZ; correct for one-offs, derived per occurrence for series. Great for range scans, wrong as a sole store.
- Transactional outbox
- Writing the data change and a changelog row in one transaction, so a CDC stream reflects exactly the committed changes.
- CDC
- Change Data Capture — tailing the database log (e.g. Debezium on the WAL) to turn committed rows into stream events.
- Fan-out on write
- Copying an event into each attendee's partition at write time, so reads stay single-shard. Flipped to reference-only for huge events.
- Free/busy bitmap
- A fixed-size per-calendar-day mask (96 × 15-min slots) of busy/free, OR-ed across attendees to find common openings.
- Sync token
- The client's high-water mark into the per-calendar changelog; "what changed since X?" powers incremental sync.
- Read-your-writes
- The guarantee that you immediately see your own change, via version echo + replica catch-up, even on an eventually-consistent read path.
A note on scope
This is an educational system-design teardown written for interview preparation and for the fun of taking apart something we all use without thinking. It is not affiliated with Google, and it reproduces no proprietary code or schema. The data model, DDL, SQL, and Python here are illustrative composites drawn from the iCalendar standard (RFC 5545), public engineering talks, and the broad calendaring literature — a clean-room "how I'd build it," not a leak of how anyone did. Real production calendars are richer, messier, and more interesting than any single article can be. If you spot something you'd model differently, that's the point: bring it to your next whiteboard.