The Integration Problem

§ 01 — THE QUESTIONOne deal, four problems

Every acquisition press release says "stronger together." Every acquirer's data team hears "reconcile two of everything before the next board meeting." The senior signal is refusing to ship a single integration plan, and naming the four orthogonal problems first.

Interview Prompt

"Your company just acquired a $20B SaaS business. Walk me through the data integration — how do you get to unified, trustworthy numbers without freezing either company's product?"

LEVEL · STAFF / PRINCIPALDURATION · 45 MINFORMAT · WHITEBOARD

The mistake that fails the question is treating this as one migration. It is four, and they have different owners, different timelines, and different failure modes. Entity resolution asks whether their Acme Corp is our Acme Corp — a graph problem dressed as a join. Schema unification has to happen while both applications keep shipping, because you cannot freeze a database that is still taking customer writes. Warehouse consolidation is a vendor decision — Snowflake versus Databricks versus BigQuery — that can be deferred behind a bridge for six months. And multi-org metrics is the quiet killer: their DAU is not your DAU, their fiscal calendar is not your fiscal calendar, and the board deck on day ninety will be subtly, defensibly wrong unless the definitions are versioned. A junior answer reaches for a big-bang migration and a single "combined" view. A senior answer sequences the four, makes every merge reversible, and gates the cutover on a number.

So before any boxes and arrows, the working frame for the session — the four problems, in the order they bite:

ENTITY RESOLUTION: Is their customer our customer? Golden record plus an ordered rule playbook (strict-then-fuzzy) plus an append-only crosswalk. The part everyone forgets is the un-merge — and you only learn you needed it two months in, mid-incident.
SCHEMA UNIFICATION: Both apps keep shipping. You cannot freeze either database. Strangler pattern: stand the unified schema up alongside the sources, CDC every change through a translation layer, reconcile daily, cut over one entity at a time.
WAREHOUSE CONSOLIDATION: Two vendors, one bill. Big-bang, federated query, or share-and-bridge. Every real deal share-and-bridges for the first six months, then commits — the warehouse decision is the one you are allowed to defer.
MULTI-ORG METRICS: Their DAU isn't your DAU. Fiscal calendars, FX policy, consolidation rules, overlapping ARR. Bake the definitions into a versioned dim_metric_definition the board deck can cite, not into SQL nobody can audit.

Never delete a merge. The first time an analyst says "those two companies are actually parent and subsidiary," an irreversible golden record becomes a two-week incident.

Scoping out loud

Scope is the first scored dimension, and most candidates skip it. State the sequencing and the deferrals explicitly. In scope: master-data entity resolution, the strangler dual-write and its reconciliation gate, and the versioned metric layer — the three things that block the day-ninety board deck. Deferred on purpose: the warehouse vendor decision (bridged, not migrated, for six months), org-chart and HR-system merges, and the inevitable rationalization of overlapping product SKUs. The thread that ties scope together is reversibility — every structural decision in the first ninety days has to be undoable, because the integration will be re-litigated by Finance, Legal, and the regulators before it is finished.

Then the envelope math, offered rather than extracted. Deal-shaped numbers:

Quantity	Estimate	Consequence
Deal size	$20 B	Board-level scrutiny on every reported KPI
Source systems to unify	2 CRMs + 2 warehouses	Two of everything: IDs, calendars, currencies
Customer-base overlap	~40% (pro tier)	1+1≠2 — ARR double-counts without dedup
Reconciliation drift gate	< 0.1%	The number that gates cutover
Cutover green window	72 consecutive hours	One amber hour resets the clock
Convergence horizon	6–18 months	Both apps ship the whole time
Day-90 reporting deadline	fixed	Share-and-bridge buys the warehouse decision time

Notice which number gates everything. The drift threshold — source row counts versus unified row counts, held under one-tenth of one percent for seventy-two straight hours — is the single quantity that decides whether an entity is allowed to cut over. It dictates the reconciliation cadence, the dual-write design, and the on-call posture for the whole convergence. The rest of this article follows the crosswalk.

§ 02 — DATA FLOWFollowing a record into the golden core

One core, two living sources. Both CRMs keep writing; CDC streams every change through a translation layer into a unified read model; a daily reconciliation gates the cutover. The crosswalk sits in the middle, mapping every source ID to a golden ID — reversibly.

FIG. 1 — End-to-end flow. Sources keep writing; CDC translates and streams; the append-only crosswalk maps source IDs to a mastered golden record; reconciliation gates every cutover on drift.

Three properties of this picture do most of the interview's work. First, the sources are never frozen — both CRMs take writes for the entire convergence, and the unified model is fed by CDC, not by a one-time dump that goes stale the moment it lands. Second, the crosswalk is the hinge of the whole design and it is append-only: a merge writes a row, an un-merge writes another row that supersedes it, and nothing is ever destroyed — which is the only way an analyst's "those are actually parent and subsidiary" becomes a correction instead of a catastrophe. Third, the dashed line from reconciliation is a gate, not a pipe: drift over one-tenth of a percent does not corrupt data, it withholds permission to cut over, and the clock resets.

The Failure Philosophy, In One Rule

Every merge is a hypothesis, and every hypothesis must be retractable. Strict rules auto-merge because they are nearly certain; fuzzy rules stage candidates for a human because they are guesses. But even a strict merge can be wrong, so no merge ever rewrites or deletes — it appends a crosswalk row pointing source IDs at a golden ID, and the un-merge appends a superseding row pointing them somewhere new, emits a CDC event, and lets the facts re-attribute. The failure mode is a reversible correction, never an irreversible loss.

§ 03 — DATA MODELA mastered core, a reversible crosswalk, a versioned metric

The schema falls out of reversibility and auditability. The golden record is SCD2 and lineage-bearing. The crosswalk is append-only with a supersession pointer. The metric layer is versioned so a number can be defended to the board.

The golden record — mastered, lineage-bearing, SCD2

The master entity is a UUID-keyed golden_customer that no source owns. Its canonical fields are SCD2 — a customer's industry classification or DUNS number can be corrected over time — and it carries a sources[] array, the lineage that makes overlap analysis a column lookup rather than a join. The presence of both source systems in that array is exactly the "customer in both products" signal Sales and Finance fight over on day ninety.

DDL · GOLDEN MASTER (SCD2 + LINEAGE)

-- The mastered entity. No source system owns it; it is synthesized. SCD2 on
-- canonical fields; sources[] is the lineage that makes overlap a lookup.
CREATE TABLE golden_customer (
    golden_customer_id  UUID         PRIMARY KEY,
    canonical_name      TEXT         NOT NULL,
    canonical_domain    TEXT,
    duns                TEXT,                       -- Dun & Bradstreet number, if known
    industry            TEXT,
    country             TEXT,
    sources             TEXT[]       NOT NULL DEFAULT '{}',  -- ['acquirer_crm','target_crm']
    first_seen_at       TIMESTAMPTZ  NOT NULL DEFAULT now(),
    effective_from      TIMESTAMPTZ  NOT NULL DEFAULT now(),
    effective_to        TIMESTAMPTZ,                -- NULL = current; set on retire/un-merge
    status              TEXT         NOT NULL DEFAULT 'active'
                        CHECK (status IN ('active','retired'))
);

The crosswalk — append-only, reversible, audit-trailed

This is the most important table in the system. It maps every (source_system, source_customer_id) to a golden_customer_id, records which rule fired and at what confidence, and — the part everyone forgets — carries a superseded_by pointer. The crosswalk is never updated in place and never deleted from; a re-match appends a new row and retires the old one by pointing at it. That single column is the difference between an un-merge being a SQL statement and being a two-week incident.

DDL · THE CROSSWALK (APPEND-ONLY + UN-MERGE)

CREATE TABLE customer_xref (
    xref_id             BIGINT       GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    source_system       TEXT         NOT NULL,    -- 'acquirer_crm' | 'target_crm'
    source_customer_id  TEXT         NOT NULL,
    golden_customer_id  UUID         NOT NULL REFERENCES golden_customer,
    match_rule          INT          NOT NULL,    -- 1..5, from the playbook
    match_confidence    NUMERIC(4,3) NOT NULL,    -- 0.000 .. 1.000
    matched_at          TIMESTAMPTZ  NOT NULL DEFAULT now(),
    matched_by          TEXT         NOT NULL,    -- 'rule_engine' | analyst id
    superseded_by       BIGINT       REFERENCES customer_xref,  -- un-merge pointer
    CHECK (superseded_by <> xref_id)
);
-- The CURRENT mapping for a source ID is the row with no successor:
CREATE INDEX idx_xref_current ON customer_xref
    (source_system, source_customer_id) WHERE superseded_by IS NULL;

The unified read model and the reconciliation fact

The strangler's destination is unified.customers, keyed by account_id ≡ golden_customer_id, carrying source lineage (source_system, source_row_id) so any unified row can be traced back to the record that produced it. Reads go through the crosswalk; writes arrive by dual-write from CDC. Watching over all of it is recon.daily_drift — the reconciliation fact that compares source row counts to unified row counts per system and per day, and pages the on-call the moment drift crosses one-tenth of a percent.

DDL · UNIFIED READ MODEL + RECONCILIATION FACT

-- Dual-write destination of the strangler. account_id IS the golden id.
CREATE TABLE unified_customers (
    account_id      UUID         NOT NULL,        -- = golden_customer_id
    source_system   TEXT         NOT NULL,        -- lineage: which app wrote this
    source_row_id   TEXT         NOT NULL,
    name            TEXT,
    domain          TEXT,
    industry        TEXT,
    region          TEXT,
    status          TEXT,
    ingested_at     TIMESTAMPTZ  NOT NULL DEFAULT now(),
    ingest_status   TEXT         NOT NULL DEFAULT 'ok'
                    CHECK (ingest_status IN ('ok','rejected')),
    PRIMARY KEY (source_system, source_row_id)   -- idempotent on re-delivery
);

-- The cutover gate, materialized. One row per (date, source). drift_pct is
-- the single number that decides whether an entity may flip.
CREATE TABLE recon_daily_drift (
    recon_date      DATE         NOT NULL,
    source_system   TEXT         NOT NULL,
    source_rows     BIGINT       NOT NULL,
    unified_rows    BIGINT       NOT NULL,
    rejected_rows   BIGINT       NOT NULL,
    drift_pct       NUMERIC(6,4) NOT NULL,        -- pages on-call if > 0.1
    lag_seconds     BIGINT       NOT NULL,        -- pages on-call if > 300
    PRIMARY KEY (recon_date, source_system)
);

One more table completes the governance layer, and it is covered in §06: dim_metric_definition, the versioned home for "what does combined DAU actually mean." It is the reason a board number can be audited rather than argued.

§ 04 — THE CORE INVARIANTNo merge is ever final

Everything in this design protects one sentence: any merge can be undone without losing referential integrity in the pipelines that already consumed it. The invariant lives in the append-only crosswalk and its supersession pointer.

The lifecycle of a golden record is not a straight line; it is a graph that can be walked backward. A source row is ingested, matched, and either auto-merged into an existing golden record or made into a distinct one. Months later an analyst discovers a mistake — two companies merged that should not have been — and the record must un-merge. The invariant says this is a forward operation: retire, re-point, log, emit. Nothing is destroyed; the history of who was merged into whom, by which rule, at what confidence, survives the correction intact.

INGESTED→ MATCHED→ MERGED (golden_id)→ RETIRED→ RE-POINTED (new golden_id)

The un-merge is the atomic heart of the system, and it is the move that separates a designer who has shipped one of these from one who has only read about it. It runs as a single transaction: open a new golden record, append crosswalk rows pointing the affected source IDs at it, retire the old golden record by closing its SCD2 window, and stamp the supersession pointer so the audit trail is intact. The CDC event it emits lets downstream fact tables re-attribute on their own schedule. Never a DELETE.

SQL · THE UN-MERGE (REVERSIBLE, AUDIT-TRAILED)

-- "Acme Corp Holding" and "Acme Corporation" were merged but are actually
-- parent/subsidiary. Split them WITHOUT deleting history. One transaction.
BEGIN;

-- 1. Mint the new golden record for the subsidiary.
INSERT INTO golden_customer (golden_customer_id, canonical_name, sources)
VALUES (:new_golden_id, 'Acme Corporation', ARRAY['target_crm']);

-- 2. Append the corrected crosswalk rows (NEW rows; the old ones stay).
INSERT INTO customer_xref
    (source_system, source_customer_id, golden_customer_id,
     match_rule, match_confidence, matched_by)
SELECT source_system, source_customer_id, :new_golden_id,
       5, 1.000, 'analyst:mdm-07'
FROM   customer_xref
WHERE  golden_customer_id = :old_golden_id
  AND  source_customer_id = ANY(:subsidiary_source_ids)
  AND  superseded_by IS NULL;

-- 3. Retire the old mappings by pointing them at their replacements.
UPDATE customer_xref old
SET    superseded_by = new.xref_id
FROM   customer_xref new
WHERE  old.golden_customer_id = :old_golden_id
  AND  new.golden_customer_id = :new_golden_id
  AND  old.source_customer_id = new.source_customer_id
  AND  old.superseded_by IS NULL;

-- 4. Close the old golden record's SCD2 window — retire, never delete.
UPDATE golden_customer
SET    status = 'retired', effective_to = now()
WHERE  golden_customer_id = :old_golden_id;

COMMIT;
-- A CDC event on customer_xref lets fact tables re-attribute downstream.

The crosswalk only ever grows. A merge appends a row; an un-merge appends another; the truth is the latest row with no successor.INTEGRATION RULE Nº 1 — APPEND, NEVER DESTROY

§ 05 — INGESTION & STREAMSThe strangler dual-write in Python

Three programs carry the convergence: the translation layer that remaps source schemas without freezing them, the matching playbook that runs strict-before-fuzzy, and the dual-writer that lands records idempotently. Each is small; the judgment is in what they refuse to do.

1 · The translation layer — remap, rewrite, dead-letter

CDC gives you the raw source change; the translation layer turns it into the unified contract. It renames fields, rewrites source IDs into the unified namespace, and stamps lineage. Its one discipline is that a record violating the contract — an unexpected field, a missing key — goes to a dead-letter queue, not into the unified model and not silently dropped. A contract violation is a signal that a source app shipped a schema change; swallowing it is how drift creeps in unseen.

PYTHON · TRANSLATION LAYER (CDC → UNIFIED CONTRACT)

from dataclasses import dataclass

@dataclass
class UnifiedCustomer:
    source_system: str
    source_row_id: str
    name:   str
    domain: str | None
    region: str

# Per-source field maps. Both apps keep their native schema; the translation
# layer is the ONLY place the two vocabularies meet. Add a source = add a map.
FIELD_MAP = {
    "acquirer_crm": {"customer_id": "source_row_id", "company": "name",
                     "domain": "domain",      "country": "region"},
    "target_crm":   {"tenant_id": "source_row_id",   "workspace_name": "name",
                     "email_domain": "domain", "region": "region"},
}

def translate(source_system: str, change: dict, dlq) -> UnifiedCustomer | None:
    """Map one CDC change to the unified contract. A contract violation is a
    SIGNAL (a source shipped a schema change), so it goes to the DLQ to be
    triaged — never into unified, never silently dropped."""
    fmap = FIELD_MAP[source_system]
    try:
        out = {dest: change[src] for src, dest in fmap.items()}
    except KeyError as missing:
        dlq.put({"source": source_system, "change": change,
                 "reason": f"missing field {missing}"})   # triage, don't guess
        return None
    return UnifiedCustomer(source_system=source_system, **out)

One distinction, always stated: the translation layer never matches. It only normalizes shape. Deciding whether this normalized record is the same customer as another is the playbook's job, kept separate so that a schema change and a matching change can be reasoned about — and rolled back — independently.

2 · The matching playbook — strict before fuzzy, first rule wins

You never decide two rows are the same customer with one rule. You run an ordered list from strict to permissive, and the first rule that fires wins — the same logic as airport security: passport first, then bag scan, then pat-down. Strict rules auto-merge into a golden record; fuzzy rules stage a candidate for a human via match_candidates. The ordering is the whole point: a confident identifier match must never be overruled by a fuzzy name similarity.

PYTHON · THE MATCH-KEY CASCADE (STRICT → FUZZY)

from rapidfuzz import fuzz

# Ordered strict→permissive. First rule to fire wins. Strict auto-merges;
# fuzzy stages a candidate for an MDM analyst. Confidence is recorded, never
# discarded — it is what lets you audit and reverse a bad merge later.
def match(rec, golden) -> tuple[int, float, str]:
    """Return (rule, confidence, disposition). disposition is 'auto' or 'review'."""
    if rec.domain == golden.canonical_domain and rec.duns == golden.duns:
        return 1, 0.99, "auto"          # strict: same domain AND same DUNS
    if rec.domain == golden.canonical_domain \
       and normalize(rec.name) == normalize(golden.canonical_name):
        return 2, 0.95, "auto"          # strict: domain AND normalized name
    if fuzz.token_sort_ratio(rec.name, golden.canonical_name) > 88 \
       and rec.country == golden.country and rec.industry == golden.industry:
        return 3, 0.80, "review"        # fuzzy: typo-tolerant name + geo + industry
    if shared_employee_emails(rec, golden) >= 3:
        return 4, 0.70, "review"        # fuzzy: 3+ @same-domain emails on both
    return 5, 0.0, "distinct"         # nothing fired → its own golden record

3 · The dual-writer — idempotent landing into the unified model

PYTHON · DUAL-WRITE LANDING (IDEMPOTENT)

async def dual_write(db, recs: list) -> None:
    """Land translated records into unified.customers. Idempotent on
    (source_system, source_row_id) so at-least-once CDC re-delivery is safe.
    The golden id is resolved through the CURRENT crosswalk row (no successor)."""
    for r in recs:
        await db.execute(
            """INSERT INTO unified_customers
                 (account_id, source_system, source_row_id,
                  name, domain, region, status)
               SELECT x.golden_customer_id, $1, $2, $3, $4, $5, 'active'
               FROM   customer_xref x
               WHERE  x.source_system = $1 AND x.source_customer_id = $2
                 AND  x.superseded_by IS NULL          -- the current mapping
               ON CONFLICT (source_system, source_row_id)
               DO UPDATE SET name = EXCLUDED.name, domain = EXCLUDED.domain,
                             region = EXCLUDED.region, ingested_at = now()""",
            r.source_system, r.source_row_id, r.name, r.domain, r.region,
        )

§ 06 — AGGREGATIONReconciliation gates, and metrics get a version

Two slow layers do the governance work. Reconciliation turns "are we drifting?" into a number that gates the cutover. The versioned metric definition turns "what is combined DAU?" from an argument into an auditable row. Both exist so a human can defend a decision.

Reconciliation runs hourly during cutover and compares source row counts to what the unified model actually ingested, per source system. The output is drift_pct and lag_seconds; both have hard pager thresholds. Drift over one-tenth of a percent or lag over five minutes pages the on-call, and the cutover gate is the simple, unforgiving rule on top: seventy-two consecutive green hours before an entity is allowed to flip, and one amber hour resets the clock to zero.

SQL · RECONCILIATION (RUNS HOURLY DURING CUTOVER)

WITH source_counts AS (
    SELECT 'acquirer_crm' AS src, count(*) AS n, max(updated_at) AS last_change
    FROM   acquirer_crm.customers
    UNION ALL
    SELECT 'target_crm', count(*), max(updated_at) FROM target_crm.tenants
),
unified_counts AS (
    SELECT source_system AS src, count(*) AS n, max(ingested_at) AS last_ingest,
           count(*) FILTER (WHERE ingest_status = 'rejected') AS rejected
    FROM   unified_customers GROUP BY source_system
)
SELECT s.src, s.n AS source_rows, u.n AS unified_rows,
       100.0 * abs(s.n - u.n) / nullif(s.n, 0) AS drift_pct,
       u.rejected,
       extract(epoch FROM (s.last_change - u.last_ingest)) AS lag_seconds
FROM   source_counts s
LEFT JOIN unified_counts u USING (src);
-- drift_pct > 0.1 → page. lag_seconds > 300 → page. Cutover: 72h all green.

The second governance layer is the versioned metric. The board deck has "combined DAU" on slide three, and it is wrong — because the acquirer counts "any session in 24h" while the target counts "any session in 7 days, deduped to one row per user-day." Rather than bury that disagreement in SQL, bake the definition into a versioned dimension. Every dashboard tile cites a (metric_id, definition_version); combined DAU has three competing versions; Finance picks the one that matches the board narrative; the data layer auto-generates the SQL from the row.

DDL · VERSIONED METRIC DEFINITION (THE BOARD CAN AUDIT)

CREATE TABLE dim_metric_definition (
    metric_id          TEXT NOT NULL,        -- 'dau'
    definition_version INT  NOT NULL,
    org_scope          TEXT NOT NULL,        -- 'acquirer' | 'target' | 'combined'
    window_unit        TEXT NOT NULL,        -- 'hour' | 'day' | 'week'
    window_size        INT  NOT NULL,
    dedup_grain        TEXT NOT NULL,        -- 'user' | 'user-day' | 'session'
    fiscal_calendar    TEXT NOT NULL,        -- 'calendar' | '445'
    currency_policy    TEXT NOT NULL,        -- 'lock-at-txn' | 'lock-at-month-end'
    effective_from     DATE NOT NULL,
    effective_to       DATE,
    PRIMARY KEY (metric_id, definition_version)
);

A number without a version is an opinion. The board deck does not cite a query; it cites a definition row — and three of them disagree, on purpose.INTEGRATION RULE Nº 2 — VERSION THE DEFINITION

§ 07 — ANALYTICS SQLThe day-90 questions

The golden core and its crosswalk are where the deal explains itself. Three queries an interviewer loves, because each carries a classic pattern: overlap as a set intersection, deduped roll-up through a crosswalk, and an as-of resolution over the append-only mappings.

Overlap analysis — the set intersection Sales fights over

"How many customers are in both products?" is the first question Sales asks (for the cross-sell list) and the first Finance asks (to deduct overlapping ARR before announcing combined revenue). Because lineage lives in golden_customer.sources[], it is a containment test, not a join: the golden records whose sources array holds both systems are precisely the overlap.

SQL · CUSTOMERS IN BOTH SYSTEMS (LINEAGE CONTAINMENT)

SELECT count(*) FILTER (WHERE sources @> ARRAY['acquirer_crm','target_crm'])
           AS in_both,
       count(*) FILTER (WHERE sources = ARRAY['acquirer_crm'])  AS acquirer_only,
       count(*) FILTER (WHERE sources = ARRAY['target_crm'])    AS target_only
FROM   golden_customer
WHERE  status = 'active';
-- @> is array-containment: "sources holds both". The overlap (~40% on the
-- pro tier in Adobe-Figma) is the cross-sell list AND the ARR-dedup base.

Combined ARR — deduped roll-up through the crosswalk

Summing both companies' revenue naively gives 1 + 1 = 2 for a customer that exists in both — overstating combined ARR by the overlap. The fix is to route every order through the current crosswalk row to its golden id, then sum by golden id, so a shared customer contributes once. The gap between the gross sum and this deduped figure is exactly the number Finance must disclose.

SQL · DEDUPED COMBINED ARR (NO 1+1=2)

WITH attributed AS (
    SELECT x.golden_customer_id, o.arr
    FROM   orders o
    JOIN   customer_xref x
           ON x.source_system      = o.source_system
          AND x.source_customer_id = o.source_customer_id
          AND x.superseded_by IS NULL          -- current mapping only
)
SELECT sum(arr)                              AS gross_arr,
       sum(arr_per_golden)                   AS deduped_arr,
       sum(arr) - sum(arr_per_golden)        AS overlap_deduction
FROM (
    SELECT golden_customer_id, sum(arr) AS arr_per_golden, sum(arr) AS arr
    FROM   attributed
    GROUP BY golden_customer_id
) g;

As-of mapping — which golden id did this order point at last quarter?

After an un-merge, "what was this source customer's golden id on a given date?" has two answers — before and after the correction. The append-only crosswalk makes the historical answer recoverable: walk the supersession chain and pick the row that was current as of the date in question. This is the audit query that defends a restated number to Finance.

SQL · AS-OF GOLDEN ID (TEMPORAL OVER THE CROSSWALK)

SELECT DISTINCT ON (source_system, source_customer_id)
       source_system, source_customer_id, golden_customer_id,
       matched_at, match_rule
FROM   customer_xref
WHERE  matched_at <= :as_of_date           -- mapping as it stood that day
ORDER BY source_system, source_customer_id, matched_at DESC;
-- Because the crosswalk is append-only, every historical mapping survives.
-- A restated ARR figure can always show which golden id it rolled up to.

Warehouse Consolidation — Three Strategies, One Default

Big-bang migration (copy data, rewrite dbt models) fits a strategic incompatibility caught early. Federated query (Snowflake External Tables, Unity Catalog, Trino) fits stacks that complement and a vendor decision you want to defer. Share-and-bridge (cross-account Snowflake Share, Delta Sharing, Authorized Views) is what every real deal uses for the first six months — lowest engineering cost, unblocks day-90 reporting. Tag every query with cost_owner ∈ {pre-merger-target, pre-merger-acquirer, post-merger-combined} so the CFO's "how much of last month's bill came from the acquired team" has an answer.

§ 08 — THE DASHBOARDProving the cutover is safe

A senior design ends with observability, because the whole convergence is a bet that you can watch. The dashboard answers one operational question per loop: is the drift under the gate, is the analyst queue draining, and does every board number cite a version?

RECONCILIATION: drift_pct per source against the 0.1% gate, lag_seconds against the 5-minute pager, reject rate on the unified ingest, and the green-hours streak — the count toward 72 that one amber hour resets.
ENTITY RESOLUTION: auto-merge rate (strict rules firing), candidate queue depth awaiting analyst review, un-merge count (a spike means a bad rule), and DLQ depth on the translation layer — a rising DLQ means a source shipped a schema change.
BOARD READINESS: overlap % feeding the ARR deduction, tiles citing a metric version (must be 100%), and cost by owner so the CFO's bill-attribution question is answered before it is asked.

Integration Ops — Customer Entity Cutover DAY 61 OF 90 · 14:20 UTC · 60s REFRESH

Drift — acquirer

0.04%

Drift — target

0.12%

CDC Lag p95

38s

Green-Hours Streak

19/72

Drift — target_crm vs gate · last 36 h (0.1% line)

Auto-Merge Rate

82%

Candidate Queue

1.4k

Un-Merges 7d

Translation DLQ

Overlap %

39%

Tiles Versioned

100%

FIG. 2 — The story a careful cutover tells: the acquirer source is green and steady, but the target breached the 0.1% gate at hour 28 — a backfill batch outran ingestion — and the green-hours clock reset to 19. The cutover does not flip until the streak reaches 72.

Read the amber tiles together and the dashboard narrates the discipline from §06: the target CRM's drift crept over the gate during a historical backfill, the on-call was paged, and the seventy-two-hour clock reset to nineteen — no data was lost, the cutover was simply not yet permitted. The DLQ at zero says both source schemas are still honoring the contract; the auto-merge rate at 82% says the strict rules are carrying the load while the analyst queue handles the fuzzy tail; every board tile cites a metric version. That is what a designed, reversible, gated integration looks like from the operator's chair on day sixty-one.

§ 09 — THE RUBRICWhat was actually being tested

Strip the deal away and the question was testing five judgments, each of which generalizes far beyond M&A:

DECOMPOSITION: Refusing to ship one "integration plan" — seeing four orthogonal problems (entity resolution, schema unification, warehouse consolidation, metrics) with different owners and timelines, and sequencing them.
REVERSIBILITY: Designing the un-merge before you need it: append-only crosswalk, superseded_by, retire-never-delete. The first irreversible merge is the incident that teaches this the hard way.
LIVE MIGRATION: Knowing you cannot freeze a database that is still taking writes — strangler dual-write through a translation layer, with the contract violation dead-lettered, not swallowed.
GATING: Letting a single number — drift under 0.1% for 72 hours — hold the authority to cut over, so the decision is mechanical and defensible rather than a judgment call under deadline pressure.
DEFENSIBILITY: Versioning the metric definition so a board number can be audited: combined DAU has three competing versions, Finance picks one on the record, and the SQL is generated from the row.

Adobe-Figma, Microsoft-Activision, Cisco-Splunk, Salesforce-Slack — all four hit the same four problems. The only difference is which one was hardest in their context, and whether the team built the un-merge before or after they needed it.— CLOSING ARGUMENT