The Hierarchy Problem — PaddySpeaks

§ 01 — THE QUESTIONOne table, every customer, a tree inside

Every data engineer who builds B2B software eventually meets this question. It sounds like a schema for accounts and deals. It is two problems most consumer platforms never face — a recursive hierarchy and absolute tenant isolation — sharing one set of tables.

Interview Prompt

"Design the data model for a multi-tenant CRM with account hierarchies. JP Morgan has a hundred subsidiaries, each running its own pipeline, but the renewal happens at the parent. Revenue must roll up the tree correctly, and no tenant can ever see another's data. How do you model it?"

LEVEL · SENIOR / STAFFDURATION · 45 MINFORMAT · WHITEBOARD

The two halves of this question pull in opposite directions, and a senior answer names that tension first. The hierarchy wants flexibility: enterprises re-org constantly, a subsidiary gets sold, a division is folded in, and the tree must bend without a migration. A revenue rollup that does not aggregate up the tree is simply wrong — JP Morgan's number is the sum of its hundred children — but one that aggregates carelessly double-counts a subsidiary that appears under two parents. The multi-tenancy wants rigidity: every Salesforce customer's rows live in the same physical tables, separated only by an org_id, and one query that forgets the filter reads someone else's book of business. That is not a bug you log and fix next sprint — it is an existential, contract-ending breach. Flexibility and rigidity, in the same schema, is the whole assignment.

A weak answer stores current_stage as a column on the opportunity and trusts the application to add the tenant filter. A strong answer notices that current state is a lossy projection of an event stream, and that tenant isolation cannot live in application code where a single forgotten WHERE clause defeats it. So before any boxes and arrows, the working frame for the whole session:

THE TREE: Account hierarchy. A self-referencing parent_account_id for clean writes, plus a materialized path for fast subtree reads, reconciled nightly. Re-orgs are cheap on the write side; rollups are cheap on the read side; the daily job keeps them honest.
THE WALL: Tenant isolation. org_id enforced at three layers — partition key, composite-index leading column, and a row-level security policy — so isolation survives a careless query, a rogue engineer, and a runaway scan. Defense in depth, because one layer is one mistake from a breach.
THE LEDGER: Stage changes as fact. One row per opportunity stage transition, never a mutable current_stage column. Current state is the latest event; velocity, conversion, and time-in-stage all fall out of the same append-only table. One source of truth for everything pipeline.

Losing tenant isolation is the one bug you cannot apologize your way out of. So you do not trust a query to remember the wall — you make the database refuse to serve without it.

Scoping out loud

Scope is the first scored dimension, and most candidates skip it. State what you build, what you ignore, and the numbers that shape every later choice. Out of scope here, said explicitly: the CRM application layer and UI, the email/activity sync engine, territory assignment rules beyond the dimension, and the forecasting model itself (treated as a consumer of these facts). In scope: how the account tree is stored so re-orgs are cheap and rollups are fast, how tenant isolation is enforced so it cannot be bypassed, how pipeline state is modeled so velocity is derivable, and how record-sharing cascades up the role hierarchy.

Then the envelope math, volunteered rather than extracted. Salesforce-shaped numbers — a multi-tenant CRM serving every customer from shared infrastructure:

Quantity	Estimate	Consequence
Tenants (orgs)	150,000	Every table is partitioned by org_id, period
Accounts (all tenants)	~2 B	The tree dimension; recursive, SCD2 on re-org
Max hierarchy depth	~10–15	Recursive CTE needs a cycle circuit-breaker
Cross-tenant reads allowed	0	The invariant that shapes the whole architecture
Stage changes/day	~500 M	Append-only event fact; the pipeline spine
ARR snapshot rows/day	~2 B accounts × 1	Daily, partition-pruned by org; 7-yr retention
Largest single hierarchy	~100 subsidiaries	Rollup must be correct and not double-count

Notice the row that is a number and not a quantity: zero cross-tenant reads, ever. That single requirement is more architecturally decisive than any volume figure, because it dictates that org_id is not just a column — it is the partition key, the leading index column, and the subject of a security policy the database enforces on every statement. Everything else in this design is dimensional modeling; that row is the reason the modeling has a security stack bolted underneath it. The rest of this article follows the wall and the tree.

§ 02 — DATA FLOWFollowing a deal through the building

One platform, every tenant. The spine of the design is an append-only stage-change stream, partition-keyed by org_id at the source, feeding the pipeline analytics on one side and the hierarchical ARR rollup on the other — with a row-level security policy standing guard on every read.

FIG. 1 — End-to-end flow. The stage-change stream is the spine; the account tree is reconciled nightly; org_id rides every edge, and RLS guards every read.

Three properties of this picture do most of the interview's work. First, there is no current_stage box anywhere — current state is a view over the append-only stream, the latest event per opportunity, so the same table that powers "where is this deal" also powers "how long did it sit in negotiation." Second, the tree is split across two representations on purpose: writes hit a cheap parent_account_id on a single row, while reads hit a precomputed materialized_path, and a nightly job reconciles the second from the first — flexibility on write, speed on read. Third, every edge in the diagram is implicitly filtered by org_id, and the app tier binds the tenant to the session so the row-level security policy can enforce it; the wall is not drawn as a box because it is everywhere.

The Failure Philosophy, In One Rule

A stale rollup is recoverable; a leaked row is not. If the nightly path rebuild lags, a subtree query falls back to the recursive CTE over parent_account_id — slower, but correct, and self-healing at the next rebuild. But there is no fallback for a cross-tenant read: it is prevented, never detected-and-fixed, because by the time you detect it the data has already left the building. So the hierarchy degrades gracefully toward correctness, while isolation is made structurally impossible to violate — three independent layers, any one of which alone would stop the breach.

§ 03 — DATA MODELA recursive dimension, a wall, and an event log

The schema falls out of the two tensions. The account dimension is recursive and hybrid — parent_id plus path. Every table carries a leading org_id. And pipeline truth is an append-only fact of stage changes, never a mutable column.

The tree — three ways, and why the hybrid wins

There are three honest ways to model a parent-child tree, and a senior answer names all three before choosing. Pure parent_id with a recursive CTE: cheap writes, clean schema, but every read recurses and it crawls past depth ten. Materialized path ('/JPM/Chase/Investment/'): a LIKE 'path%' makes subtree queries instant, but a re-org rewrites every descendant's path. Closure table (a row per ancestor-descendant pair): O(1) lookup in either direction, but a re-org cascades a heavy rewrite. Salesforce takes the hybrid: parent_account_id for clean single-row writes plus a materialized_path for fast reads, with a nightly job rebuilding paths from the parent links.

DDL · THE RECURSIVE DIMENSION — HYBRID PARENT_ID + PATH

-- The account tree. parent_account_id is the source of truth for shape
-- (cheap writes, one row touched on re-org). materialized_path is the
-- read accelerator, rebuilt nightly. org_id LEADS the primary key —
-- it is the partition key AND the first column of every index.
CREATE TABLE dim_account (
    org_id            BIGINT      NOT NULL,        -- the wall, layer 1: partition key
    account_id        BIGINT      NOT NULL,
    parent_account_id BIGINT,                       -- self-FK; NULL = top of a tree
    materialized_path TEXT        NOT NULL,        -- '/JPM/Chase/Investment/' — read perf
    depth             SMALLINT    NOT NULL,
    name              TEXT        NOT NULL,
    tier              TEXT        NOT NULL,        -- SCD2 dimension attribute
    valid_from        TIMESTAMPTZ NOT NULL,
    valid_to          TIMESTAMPTZ,                   -- NULL = current row (SCD2)
    is_current        BOOLEAN     NOT NULL DEFAULT true,
    PRIMARY KEY (org_id, account_id, valid_from),
    FOREIGN KEY (org_id, parent_account_id)
        REFERENCES dim_account (org_id, account_id)  -- parent in same tenant only
);
-- Layer 2: org_id-LEADING composite index. A subtree read is one range scan.
CREATE INDEX idx_acct_path ON dim_account (org_id, materialized_path text_pattern_ops);
CREATE INDEX idx_acct_parent ON dim_account (org_id, parent_account_id);

The wall — org_id at three layers

Tenant isolation is not one mechanism; it is three independent ones, because any single layer is one mistake from a breach. Layer one is the partition key: org_id leads every primary key, so a tenant's rows are physically grouped and a per-tenant query cap can throttle a runaway scan without touching the cluster. Layer two is the index: every composite index begins with org_id, so the planner cannot accidentally choose a cross-tenant scan. Layer three is the row-level security policy — the database itself appends the tenant filter to every query, so an engineer with raw SQL access still cannot read another org without going through the API that binds the session variable.

DDL · THE WALL — LAYER 3, ROW-LEVEL SECURITY

-- The application binds the tenant per session: SET app.current_org = :org.
-- Then the database FORCES the filter onto every statement — SELECT,
-- UPDATE, DELETE — so a forgotten WHERE clause cannot leak a row.
ALTER TABLE dim_account ENABLE ROW LEVEL SECURITY;
ALTER TABLE dim_account FORCE ROW LEVEL SECURITY;   -- applies even to table owner

CREATE POLICY tenant_isolation ON dim_account
    USING      (org_id = current_setting('app.current_org')::bigint)
    WITH CHECK (org_id = current_setting('app.current_org')::bigint);
-- USING guards reads; WITH CHECK guards writes — you cannot INSERT a row
-- into another tenant any more than you can SELECT one. Same policy shape
-- is applied verbatim to every fact and dimension in the schema.

The pipeline — stage changes as an append-only fact

The cardinal modeling sin here is a current_stage column on the opportunity. It is lossy: the moment a deal moves from Negotiation to Closed-Won, the fact that it spent nineteen days in Negotiation is gone, and with it every velocity and conversion metric. Instead, every stage transition is an immutable row. Current stage is the latest event; time-in-stage is the gap to the next event; conversion is a count by stage. One append-only table is the single source of truth for everything pipeline.

DDL · THE PIPELINE SPINE — STAGE-CHANGE FACT

CREATE TABLE fct_opportunity_stage_change (
    org_id          BIGINT      NOT NULL,        -- the wall rides every fact too
    event_id        BIGINT      NOT NULL,
    opp_id          BIGINT      NOT NULL,
    account_id      BIGINT      NOT NULL,
    stage           TEXT        NOT NULL,        -- prospecting → … → closed_won/lost
    entered_at      TIMESTAMPTZ NOT NULL,
    owner_id        BIGINT      NOT NULL,
    forecast_amount NUMERIC(16,2) NOT NULL,
    PRIMARY KEY (org_id, opp_id, entered_at)
    -- ⚠ There is NO current_stage column anywhere. Latest event wins.
);
CREATE INDEX idx_stage_opp ON fct_opportunity_stage_change
    (org_id, opp_id, entered_at DESC);

-- The daily snapshot that survives renewals and downgrades — one row per
-- (account × day), the substrate the hierarchical rollup sums over.
CREATE TABLE fct_account_arr_snapshot (
    org_id            BIGINT     NOT NULL,
    account_id        BIGINT     NOT NULL,
    snapshot_date     DATE       NOT NULL,
    arr_usd           NUMERIC(16,2) NOT NULL,
    plan_tier         TEXT       NOT NULL,
    days_until_renewal SMALLINT  NOT NULL,
    PRIMARY KEY (org_id, account_id, snapshot_date)
);

And the bridge — record sharing, the many-to-many that powers "share this deal with my manager." Opportunity Y is owned by rep A but visible to manager B and VP C; that visibility is modeled as bridge_account_user_access(org_id, account_id, user_id, permission_level, granted_via), and the rows are generated by a daily job that cascades up the role hierarchy. The granted_via column records whether access came from ownership, a role rule, or a sharing rule — so a permission can be revoked precisely when its source disappears.

§ 04 — THE INVARIANTorg_id rides every query, or the query fails

The whole correctness of this system lives in one rule about access: no statement reaches a row without proving its tenant. Not by convention, not by a code-review checklist — by a database policy that rewrites every query to carry the filter, so the only way to read a row is to already be inside the right org.

Why this, and not the rollup, is the invariant. A wrong rollup is embarrassing and fixable; a cross-tenant read is the end of the contract and possibly the company. So isolation cannot live where mistakes live — in application code, where a single hand-written report query that forgets WHERE org_id = ? exposes every customer's pipeline. It must live in the engine. The application's only job is to bind the tenant to the session at the start of each request; from there the row-level security policy does the enforcing, on reads and writes alike, for every table in the schema.

REQUEST IN→ SET app.current_org→ QUERY (no org filter)→ RLS REWRITES IT→ ONLY TENANT ROWS

Read the chain left to right. The application never writes the tenant filter into its analytics queries — it sets a session variable once, and the policy silently appends AND org_id = current_setting('app.current_org') to every statement the session issues. A developer who writes SELECT sum(arr_usd) FROM fct_account_arr_snapshot with no filter gets only their own tenant's rows, because the database refused to run the query the way it was written. FORCE ROW LEVEL SECURITY means even the table owner is bound; there is no privileged path around the wall short of disabling the policy, which is an auditable, alarmed event.

SQL · THE ATOMIC HEART — ISOLATION IS NON-NEGOTIABLE

-- What the application runs at the start of every request. One line.
SET app.current_org = 'JPM_ORG_8842';

-- What a developer writes — note the ABSENCE of an org_id filter:
SELECT account_id, sum(arr_usd) AS arr
FROM   fct_account_arr_snapshot
WHERE  snapshot_date = CURRENT_DATE
GROUP BY account_id;

-- What the database ACTUALLY executes — RLS rewrote it:
SELECT account_id, sum(arr_usd) AS arr
FROM   fct_account_arr_snapshot
WHERE  snapshot_date = CURRENT_DATE
  AND  org_id = current_setting('app.current_org')::bigint   -- injected, always
GROUP BY account_id;
-- The forgotten filter is impossible to forget, because you never wrote it.

The defense-in-depth matters precisely because each layer fails differently. RLS could be misconfigured on a new table; the partition key still groups the data so a per-tenant cap contains a runaway scan; the leading index column still steers the planner away from a cross-tenant seq-scan. Three independent mechanisms, and a breach requires all three to fail at once — which is the only acceptable posture for a bug you cannot take back.

The wall is not a WHERE clause you remember to add. It is a policy the database adds for you, on every statement, whether you asked or not.HIERARCHY RULE Nº 1

§ 05 — INGESTION & STREAMSPython on the tree

Three programs carry the data. The session binder that makes isolation automatic, the nightly path rebuilder that keeps the hybrid tree honest, and the sharing-cascade job that turns the role hierarchy into bridge rows. Each is small; the judgment is in what they refuse to do.

1 · The session binder — isolation without ceremony

Every database connection a request touches must be bound to exactly one tenant before it issues a single query, and unbound before it returns to the pool. This is the one place tenant safety is the application's responsibility, so it is made impossible to forget: a context manager binds on entry and clears on exit, and any query that runs outside it hits an unbound session and fails closed rather than open.

PYTHON · TENANT SESSION BINDER — FAIL CLOSED

from contextlib import asynccontextmanager

class TenantContext:
    """Binds org_id to the DB session for the life of one request, then
    clears it. RLS does the enforcing; this just sets the variable the
    policy reads. The reset is in a finally — a pooled connection must
    NEVER carry one tenant's binding into another tenant's request."""

    def __init__(self, pool):
        self.pool = pool

    @asynccontextmanager
    async def for_org(self, org_id: int):
        if org_id is None:
            raise ValueError("no tenant — refusing to open an unbound session")
        conn = await self.pool.acquire()
        try:
            # set_config(..., is_local=true): scoped to this transaction only
            await conn.execute("SELECT set_config('app.current_org', $1, true)",
                               str(org_id))
            yield conn                      # every query here is RLS-filtered to org_id
        finally:
            await conn.execute("SELECT set_config('app.current_org', '', true)")
            await self.pool.release(conn)   # returns CLEAN to the pool

One carve-out, always stated: cross-tenant platform jobs run under a separate, audited service role, never by loosening the policy on the shared path. The nightly billing aggregator that must read all orgs uses a role whose policy is "all orgs" by design, with its access logged — so the privileged path exists, but it is a deliberate, observable door, not a hole anyone can stumble through.

2 · The path rebuilder — reconcile the hybrid tree

PYTHON · NIGHTLY MATERIALIZED-PATH REBUILD

def rebuild_paths(accounts) -> dict[int, str]:
    """parent_account_id is truth; materialized_path is a derived cache.
    A re-org touches one parent_id cheaply during the day, leaving paths
    stale; this job rebuilds every path from the parent links, per tenant.
    Stale paths never cause WRONG answers — the rollup falls back to the
    recursive CTE — they only cause slow ones until the rebuild lands."""
    by_id = {a.account_id: a for a in accounts}
    cache: dict[int, str] = {}

    def path_of(aid: int, seen: frozenset) -> str:
        if aid in cache:
            return cache[aid]
        if aid in seen or len(seen) > 100:        # cycle / runaway guard
            raise ValueError(f"cycle or depth>100 at account {aid}")
        node = by_id[aid]
        if node.parent_account_id is None:
            cache[aid] = f"/{aid}/"                    # root
        else:
            parent = path_of(node.parent_account_id, seen | {aid})
            cache[aid] = f"{parent}{aid}/"
        return cache[aid]

    return {a.account_id: path_of(a.account_id, frozenset()) for a in accounts}

3 · The sharing cascade — role tree to bridge rows

PYTHON · RECORD-SHARING CASCADE UP THE ROLE HIERARCHY

def cascade_sharing(owner_id, account_id, role_tree) -> list[dict]:
    """'Share with my manager' in data terms: an account owned by a rep is
    visible to everyone ABOVE them in the role tree. We walk up from the
    owner's role, emitting one bridge row per ancestor, tagging granted_via
    so the grant can be revoked precisely when its source disappears."""
    rows = [{"account_id": account_id, "user_id": owner_id,
             "permission_level": "owner", "granted_via": "own"}]
    role = role_tree.role_of(owner_id)
    depth = 0
    while role.parent_role_id is not None and depth < 100:   # circuit breaker
        role = role_tree.parent(role)
        for manager in role_tree.users_in(role):
            rows.append({"account_id": account_id, "user_id": manager,
                         "permission_level": "read", "granted_via": "role"})
        depth += 1
    return rows   # written under the tenant's own org_id — sharing never crosses the wall

§ 06 — THE ROLLUPSumming the tree without double-counting

The hierarchical rollup is where the tree pays off. Two ways to compute it — the recursive CTE for correctness, the materialized path for speed — and the discipline that keeps a subsidiary appearing under two parents from being counted twice.

The recursive CTE is the always-correct path, and the one to show first. Start from the roots, walk down the parent_account_id links accumulating each node's root, then join to the ARR snapshot and sum per root. The circuit-breaker on depth is not optional: a malformed tree with a cycle — a re-org that accidentally made a grandchild its own ancestor — would recurse forever without it, so the depth cap turns a data-quality bug into a bounded, loud failure instead of a hung query.

SQL · ROLLUP, THE CORRECT WAY — RECURSIVE CTE WITH A FUSE

WITH RECURSIVE descendants AS (
    -- anchor: every root account is its own subtree's top
    SELECT account_id, account_id AS root, 0 AS depth
    FROM   dim_account
    WHERE  parent_account_id IS NULL AND is_current
  UNION ALL
    -- recurse: attach each child to its ancestor's root
    SELECT a.account_id, d.root, d.depth + 1
    FROM   dim_account a
    JOIN   descendants d ON a.parent_account_id = d.account_id
    WHERE  a.is_current AND d.depth < 15            -- the circuit breaker
)
SELECT d.root AS top_level_account,
       sum(s.arr_usd)                AS total_subtree_arr,
       count(DISTINCT d.account_id) AS subsidiaries  -- DISTINCT: no double-count
FROM   descendants d
JOIN   fct_account_arr_snapshot s
       ON s.account_id = d.account_id
      AND s.snapshot_date = CURRENT_DATE
GROUP BY d.root;

The materialized-path version is the fast path, and the reason the hybrid model exists. Once the nightly job has stamped '/JPM/Chase/Investment/' onto every node, the entire JP Morgan subtree is a single range scan: WHERE materialized_path LIKE '/JPM/%', no recursion at all. This is what makes a dashboard that rolls up a hundred-subsidiary hierarchy feel instant instead of grinding through ten levels of CTE on every page load. The two paths agree by construction — the path is built from the same parent links the CTE walks — so the fast path is a cache, never a second truth.

SQL · ROLLUP, THE FAST WAY — MATERIALIZED PATH RANGE SCAN

-- The whole JPM subtree in one index range scan. No recursion.
-- org_id-leading index makes this a tight, tenant-local seek.
SELECT count(DISTINCT a.account_id) AS subsidiaries,
       sum(s.arr_usd)                AS total_subtree_arr
FROM   dim_account a
JOIN   fct_account_arr_snapshot s
       ON s.account_id = a.account_id
      AND s.snapshot_date = CURRENT_DATE
WHERE  a.is_current
  AND  a.materialized_path LIKE '/JPM/%';   -- the entire subtree, instantly

Write to the parent link; read from the path. The tree bends cheaply when the org re-orgs, and the rollup stays fast — because the slow truth and the fast cache are rebuilt from each other every night.HIERARCHY RULE Nº 2 — CHEAP WRITES, FAST READS

§ 07 — ANALYTICS SQLInterrogating the pipeline

The stage-change fact is where the business explains itself. Three queries an interviewer loves, because each one carries a classic pattern on its back — current-state via window, time-in-stage via LEAD, and forecast accuracy as a cohort comparison.

Current stage — latest event via ROW_NUMBER

The query that justifies refusing a current_stage column. Current state is simply the most recent stage-change row per opportunity — a ROW_NUMBER partitioned by opp, ordered by time descending, filtered to the first. The same table that stores history answers "where is every deal right now," so there is exactly one source of truth and no risk of a denormalized column drifting from the events.

SQL · CURRENT STAGE — LATEST-EVENT WINDOW

SELECT opp_id, stage AS current_stage, entered_at, forecast_amount
FROM (
    SELECT opp_id, stage, entered_at, forecast_amount,
           row_number() OVER (PARTITION BY opp_id
                              ORDER BY entered_at DESC) AS rn
    FROM   fct_opportunity_stage_change
    -- org_id filter is injected by RLS; we never write it here
) latest
WHERE rn = 1;   -- the newest transition IS the current stage

Pipeline velocity — time-in-stage via LEAD

The metric the whole event model was built to serve. The duration a deal spent in each stage is the gap between its entry and the next stage's entry — a textbook LEAD over the ordered transitions. From this one derivation come average time-in-stage, conversion rates between stages, and the stale-opportunity early warning that flags deals sitting too long.

SQL · TIME-IN-STAGE — LEAD OVER THE TRANSITION STREAM

WITH spans AS (
    SELECT opp_id, account_id, stage, entered_at,
           lead(entered_at) OVER (PARTITION BY opp_id
                                ORDER BY entered_at)        AS left_at
    FROM   fct_opportunity_stage_change
)
SELECT stage,
       count(*)                                                  AS deals,
       round(avg(extract(epoch FROM
           coalesce(left_at, now()) - entered_at) / 86400), 1)     AS avg_days_in_stage,
       count(*) FILTER (WHERE left_at IS NULL
                       AND now() - entered_at > INTERVAL '30 days') AS stale_now
FROM   spans
WHERE  stage NOT IN ('closed_won','closed_lost')
GROUP BY stage
ORDER BY avg_days_in_stage DESC;
-- stale_now = open deals stuck >30 days in a non-terminal stage: the warning list.

Forecast accuracy — predicted versus actual, per rep

The query that keeps the forecast honest. Join each closed opportunity's actual outcome to the forecast that was on it, per rep per quarter, and compute bias (predicted minus actual) and MAPE. A rep whose forecast consistently lands above actual is sandbagging; one consistently below is happy-ears. Both are visible the moment you compare the frozen forecast on the stage-change events to the realized close amount.

SQL · FORECAST BIAS & MAPE — COHORT BY REP & QUARTER

WITH closed AS (
    SELECT c.opp_id, c.owner_id,
           date_trunc('quarter', c.closed_at) AS close_q,
           c.amount                          AS actual_amount,
           f.forecast_amount                 AS predicted_amount
    FROM   fct_opportunity_close c
    JOIN LATERAL (                          -- the forecast at the moment it entered final stage
        SELECT forecast_amount
        FROM   fct_opportunity_stage_change s
        WHERE  s.opp_id = c.opp_id
        ORDER BY s.entered_at DESC LIMIT 1
    ) f ON TRUE
    WHERE  c.status = 'won'
)
SELECT owner_id, close_q,
       count(*)                                                       AS deals,
       round(sum(predicted_amount - actual_amount), 0)               AS bias_usd,
       round(100.0 * avg(abs(predicted_amount - actual_amount)
                       / nullif(actual_amount, 0)), 1)                  AS mape_pct
FROM   closed
GROUP BY owner_id, close_q
ORDER BY abs(round(sum(predicted_amount - actual_amount), 0)) DESC;
-- persistent positive bias = sandbagging; persistent negative = happy ears.

§ 08 — THE DASHBOARDProving the walls hold

A senior design ends with observability, because both halves of this problem fail silently. The CRM dashboard watches three things: revenue is rolling up correctly, the pipeline is moving, and — above all — the tenant walls are intact.

HIERARCHY HEALTH: top-level ARR by largest tree, path rebuild lag (CTE-fallback queries running), orphaned accounts (parent missing), cycle-guard trips — a non-zero cycle count is a re-org that broke the tree.
PIPELINE HEALTH: avg time-in-stage, stage conversion, stale-opp count, forecast bias — the velocity signals derived entirely from the stage-change fact.
ISOLATION HEALTH: cross-tenant query attempts blocked (RLS denials — should be the only place a leak is ever even attempted), unbound-session refusals, per-tenant query cap trips, privileged-role accesses (audited platform jobs) — the wall's own telemetry.

CRM Platform Ops — Multi-Tenant THU 09:15 PT · 150K ORGS · 60s REFRESH

Top Tree ARR (JPM)

418M$

Path Rebuild Lag

2.1h

Cycle-Guard Trips

RLS Denials

37/h

Stage funnel — current-quarter pipeline (deals by stage, org JPM_8842)

Avg Time-in-Stage

21d

Stale Opps

148

Unbound Refusals

Query-Cap Trips

3/h

Forecast Bias

+9%

Privileged Access

4/d

FIG. 2 — The story a healthy platform tells: top-tree ARR rolling up cleanly, cycle-guard and unbound-session refusals at exactly zero, RLS denials present but bounded — the wall is doing its job, catching the queries that would otherwise have leaked.

Read the tiles together and the dashboard narrates the two-front nature of the problem. On the pipeline side, amber is creeping — time-in-stage and stale opps rising, forecast running 9% hot — the ordinary drift of a sales org that needs a nudge. On the isolation side, the only numbers that matter are at zero or bounded: cycle trips zero, unbound refusals zero, RLS denials present but contained, each one a query the database refused to leak. That asymmetry is the whole design philosophy in one screen — the pipeline is allowed to wobble and self-correct; the wall is not allowed to fail at all, and the telemetry proves it did not.

§ 09 — THE RUBRICWhat was actually being tested

Strip the CRM details away and the question was testing five judgments, each of which generalizes far beyond Salesforce:

REPRESENTATION: Choosing how to store a tree by what you do with it — naming parent_id, materialized path, and closure table, then picking the hybrid because writes and reads want different shapes. The right model is two models reconciled.
DEFENSE IN DEPTH: Putting tenant isolation where mistakes do not reach it — partition key, leading index column, and a database-enforced policy — so a breach needs three independent failures, not one forgotten clause.
EVENTS OVER STATE: Refusing a mutable current-stage column. State is a projection of a stream; store the transitions and current state, velocity, and conversion all fall out of one append-only table.
CORRECTNESS UNDER RECURSION: Summing a tree without double-counting and with a cycle fuse — DISTINCT on the descendant set, a depth cap that turns a malformed tree into a loud bounded error instead of a hung query.
FAILURE ASYMMETRY: Knowing which failures self-heal and which end the company — letting the rollup degrade toward correctness while making isolation structurally impossible to violate. Not all bugs are equal; design accordingly.

The tree may be rebuilt every night; the wall may never be breached even once. The craft is letting the flexible thing bend and the rigid thing hold — in the same schema, on the same afternoon.— CLOSING ARGUMENT