PaddySpeaks · Systems at the Whiteboard · Nº 15

The Integration Problem

Your company just bought a twenty-billion-dollar SaaS. Two CRMs, two warehouses, two ID systems, two fiscal calendars — and an executive expecting unified KPIs on day ninety. The one decision that separates a senior answer: every merge must be reversible. A complete working-through — data flow, the golden-record schema, the matching playbook, the strangler dual-write, the reconciliation SQL, and the cutover dashboard.

§ 01 — THE QUESTIONOne deal, four problems

Every acquisition press release says "stronger together." Every acquirer's data team hears "reconcile two of everything before the next board meeting." The senior signal is refusing to ship a single integration plan, and naming the four orthogonal problems first.

Interview Prompt

"Your company just acquired a $20B SaaS business. Walk me through the data integration — how do you get to unified, trustworthy numbers without freezing either company's product?"

LEVEL · STAFF / PRINCIPALDURATION · 45 MINFORMAT · WHITEBOARD

The mistake that fails the question is treating this as one migration. It is four, and they have different owners, different timelines, and different failure modes. Entity resolution asks whether their Acme Corp is our Acme Corp — a graph problem dressed as a join. Schema unification has to happen while both applications keep shipping, because you cannot freeze a database that is still taking customer writes. Warehouse consolidation is a vendor decision — Snowflake versus Databricks versus BigQuery — that can be deferred behind a bridge for six months. And multi-org metrics is the quiet killer: their DAU is not your DAU, their fiscal calendar is not your fiscal calendar, and the board deck on day ninety will be subtly, defensibly wrong unless the definitions are versioned. A junior answer reaches for a big-bang migration and a single "combined" view. A senior answer sequences the four, makes every merge reversible, and gates the cutover on a number.

So before any boxes and arrows, the working frame for the session — the four problems, in the order they bite:

ENTITY RESOLUTION
Is their customer our customer? Golden record plus an ordered rule playbook (strict-then-fuzzy) plus an append-only crosswalk. The part everyone forgets is the un-merge — and you only learn you needed it two months in, mid-incident.
SCHEMA UNIFICATION
Both apps keep shipping. You cannot freeze either database. Strangler pattern: stand the unified schema up alongside the sources, CDC every change through a translation layer, reconcile daily, cut over one entity at a time.
WAREHOUSE CONSOLIDATION
Two vendors, one bill. Big-bang, federated query, or share-and-bridge. Every real deal share-and-bridges for the first six months, then commits — the warehouse decision is the one you are allowed to defer.
MULTI-ORG METRICS
Their DAU isn't your DAU. Fiscal calendars, FX policy, consolidation rules, overlapping ARR. Bake the definitions into a versioned dim_metric_definition the board deck can cite, not into SQL nobody can audit.
Never delete a merge. The first time an analyst says "those two companies are actually parent and subsidiary," an irreversible golden record becomes a two-week incident.

Scoping out loud

Scope is the first scored dimension, and most candidates skip it. State the sequencing and the deferrals explicitly. In scope: master-data entity resolution, the strangler dual-write and its reconciliation gate, and the versioned metric layer — the three things that block the day-ninety board deck. Deferred on purpose: the warehouse vendor decision (bridged, not migrated, for six months), org-chart and HR-system merges, and the inevitable rationalization of overlapping product SKUs. The thread that ties scope together is reversibility — every structural decision in the first ninety days has to be undoable, because the integration will be re-litigated by Finance, Legal, and the regulators before it is finished.

Then the envelope math, offered rather than extracted. Deal-shaped numbers:

QuantityEstimateConsequence
Deal size$20 BBoard-level scrutiny on every reported KPI
Source systems to unify2 CRMs + 2 warehousesTwo of everything: IDs, calendars, currencies
Customer-base overlap~40% (pro tier)1+1≠2 — ARR double-counts without dedup
Reconciliation drift gate< 0.1%The number that gates cutover
Cutover green window72 consecutive hoursOne amber hour resets the clock
Convergence horizon6–18 monthsBoth apps ship the whole time
Day-90 reporting deadlinefixedShare-and-bridge buys the warehouse decision time

Notice which number gates everything. The drift threshold — source row counts versus unified row counts, held under one-tenth of one percent for seventy-two straight hours — is the single quantity that decides whether an entity is allowed to cut over. It dictates the reconciliation cadence, the dual-write design, and the on-call posture for the whole convergence. The rest of this article follows the crosswalk.


§ 02 — DATA FLOWFollowing a record into the golden core

One core, two living sources. Both CRMs keep writing; CDC streams every change through a translation layer into a unified read model; a daily reconciliation gates the cutover. The crosswalk sits in the middle, mapping every source ID to a golden ID — reversibly.

SOURCE PLANE · BOTH APPS KEEP WRITING (6–18 MONTHS) GOVERNANCE PLANE · RECONCILE DAILY · GATE THE CUTOVER acquirer_crm Postgres · customers target_crm MySQL · tenants CDC + TRANSLATION Debezium → Kafka field remap · ID rewrite DLQ on contract violation MATCH PLAYBOOK strict auto-merge fuzzy → analyst queue customer_xref source_id ↔ golden_id append-only · superseded_by golden_customer SCD2 master · sources[] canonical, mastered unified.customers dual-write target reads via the xref recon.daily_drift source vs unified counts pages if drift > 0.1% match_candidates tier 3–4 fuzzy MDM analyst review CUTOVER GATE 72h green required flip one entity at a time dim_metric_definition versioned · board cites auto-generates the SQL drift gates the flip SOLID — sustained data flow · DASHED — control signals · The crosswalk is the hinge; nothing is ever deleted.
FIG. 1 — End-to-end flow. Sources keep writing; CDC translates and streams; the append-only crosswalk maps source IDs to a mastered golden record; reconciliation gates every cutover on drift.

Three properties of this picture do most of the interview's work. First, the sources are never frozen — both CRMs take writes for the entire convergence, and the unified model is fed by CDC, not by a one-time dump that goes stale the moment it lands. Second, the crosswalk is the hinge of the whole design and it is append-only: a merge writes a row, an un-merge writes another row that supersedes it, and nothing is ever destroyed — which is the only way an analyst's "those are actually parent and subsidiary" becomes a correction instead of a catastrophe. Third, the dashed line from reconciliation is a gate, not a pipe: drift over one-tenth of a percent does not corrupt data, it withholds permission to cut over, and the clock resets.

The Failure Philosophy, In One Rule

Every merge is a hypothesis, and every hypothesis must be retractable. Strict rules auto-merge because they are nearly certain; fuzzy rules stage candidates for a human because they are guesses. But even a strict merge can be wrong, so no merge ever rewrites or deletes — it appends a crosswalk row pointing source IDs at a golden ID, and the un-merge appends a superseding row pointing them somewhere new, emits a CDC event, and lets the facts re-attribute. The failure mode is a reversible correction, never an irreversible loss.


§ 03 — DATA MODELA mastered core, a reversible crosswalk, a versioned metric

The schema falls out of reversibility and auditability. The golden record is SCD2 and lineage-bearing. The crosswalk is append-only with a supersession pointer. The metric layer is versioned so a number can be defended to the board.

The golden record — mastered, lineage-bearing, SCD2

The master entity is a UUID-keyed golden_customer that no source owns. Its canonical fields are SCD2 — a customer's industry classification or DUNS number can be corrected over time — and it carries a sources[] array, the lineage that makes overlap analysis a column lookup rather than a join. The presence of both source systems in that array is exactly the "customer in both products" signal Sales and Finance fight over on day ninety.

DDL · GOLDEN MASTER (SCD2 + LINEAGE)
-- The mastered entity. No source system owns it; it is synthesized. SCD2 on -- canonical fields; sources[] is the lineage that makes overlap a lookup. CREATE TABLE golden_customer ( golden_customer_id UUID PRIMARY KEY, canonical_name TEXT NOT NULL, canonical_domain TEXT, duns TEXT, -- Dun & Bradstreet number, if known industry TEXT, country TEXT, sources TEXT[] NOT NULL DEFAULT '{}', -- ['acquirer_crm','target_crm'] first_seen_at TIMESTAMPTZ NOT NULL DEFAULT now(), effective_from TIMESTAMPTZ NOT NULL DEFAULT now(), effective_to TIMESTAMPTZ, -- NULL = current; set on retire/un-merge status TEXT NOT NULL DEFAULT 'active' CHECK (status IN ('active','retired')) );

The crosswalk — append-only, reversible, audit-trailed

This is the most important table in the system. It maps every (source_system, source_customer_id) to a golden_customer_id, records which rule fired and at what confidence, and — the part everyone forgets — carries a superseded_by pointer. The crosswalk is never updated in place and never deleted from; a re-match appends a new row and retires the old one by pointing at it. That single column is the difference between an un-merge being a SQL statement and being a two-week incident.

DDL · THE CROSSWALK (APPEND-ONLY + UN-MERGE)
CREATE TABLE customer_xref ( xref_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, source_system TEXT NOT NULL, -- 'acquirer_crm' | 'target_crm' source_customer_id TEXT NOT NULL, golden_customer_id UUID NOT NULL REFERENCES golden_customer, match_rule INT NOT NULL, -- 1..5, from the playbook match_confidence NUMERIC(4,3) NOT NULL, -- 0.000 .. 1.000 matched_at TIMESTAMPTZ NOT NULL DEFAULT now(), matched_by TEXT NOT NULL, -- 'rule_engine' | analyst id superseded_by BIGINT REFERENCES customer_xref, -- un-merge pointer CHECK (superseded_by <> xref_id) ); -- The CURRENT mapping for a source ID is the row with no successor: CREATE INDEX idx_xref_current ON customer_xref (source_system, source_customer_id) WHERE superseded_by IS NULL;

The unified read model and the reconciliation fact

The strangler's destination is unified.customers, keyed by account_idgolden_customer_id, carrying source lineage (source_system, source_row_id) so any unified row can be traced back to the record that produced it. Reads go through the crosswalk; writes arrive by dual-write from CDC. Watching over all of it is recon.daily_drift — the reconciliation fact that compares source row counts to unified row counts per system and per day, and pages the on-call the moment drift crosses one-tenth of a percent.

DDL · UNIFIED READ MODEL + RECONCILIATION FACT
-- Dual-write destination of the strangler. account_id IS the golden id. CREATE TABLE unified_customers ( account_id UUID NOT NULL, -- = golden_customer_id source_system TEXT NOT NULL, -- lineage: which app wrote this source_row_id TEXT NOT NULL, name TEXT, domain TEXT, industry TEXT, region TEXT, status TEXT, ingested_at TIMESTAMPTZ NOT NULL DEFAULT now(), ingest_status TEXT NOT NULL DEFAULT 'ok' CHECK (ingest_status IN ('ok','rejected')), PRIMARY KEY (source_system, source_row_id) -- idempotent on re-delivery ); -- The cutover gate, materialized. One row per (date, source). drift_pct is -- the single number that decides whether an entity may flip. CREATE TABLE recon_daily_drift ( recon_date DATE NOT NULL, source_system TEXT NOT NULL, source_rows BIGINT NOT NULL, unified_rows BIGINT NOT NULL, rejected_rows BIGINT NOT NULL, drift_pct NUMERIC(6,4) NOT NULL, -- pages on-call if > 0.1 lag_seconds BIGINT NOT NULL, -- pages on-call if > 300 PRIMARY KEY (recon_date, source_system) );

One more table completes the governance layer, and it is covered in §06: dim_metric_definition, the versioned home for "what does combined DAU actually mean." It is the reason a board number can be audited rather than argued.


§ 04 — THE CORE INVARIANTNo merge is ever final

Everything in this design protects one sentence: any merge can be undone without losing referential integrity in the pipelines that already consumed it. The invariant lives in the append-only crosswalk and its supersession pointer.

The lifecycle of a golden record is not a straight line; it is a graph that can be walked backward. A source row is ingested, matched, and either auto-merged into an existing golden record or made into a distinct one. Months later an analyst discovers a mistake — two companies merged that should not have been — and the record must un-merge. The invariant says this is a forward operation: retire, re-point, log, emit. Nothing is destroyed; the history of who was merged into whom, by which rule, at what confidence, survives the correction intact.

INGESTED MATCHED MERGED (golden_id) RETIRED RE-POINTED (new golden_id)

The un-merge is the atomic heart of the system, and it is the move that separates a designer who has shipped one of these from one who has only read about it. It runs as a single transaction: open a new golden record, append crosswalk rows pointing the affected source IDs at it, retire the old golden record by closing its SCD2 window, and stamp the supersession pointer so the audit trail is intact. The CDC event it emits lets downstream fact tables re-attribute on their own schedule. Never a DELETE.

SQL · THE UN-MERGE (REVERSIBLE, AUDIT-TRAILED)
-- "Acme Corp Holding" and "Acme Corporation" were merged but are actually -- parent/subsidiary. Split them WITHOUT deleting history. One transaction. BEGIN; -- 1. Mint the new golden record for the subsidiary. INSERT INTO golden_customer (golden_customer_id, canonical_name, sources) VALUES (:new_golden_id, 'Acme Corporation', ARRAY['target_crm']); -- 2. Append the corrected crosswalk rows (NEW rows; the old ones stay). INSERT INTO customer_xref (source_system, source_customer_id, golden_customer_id, match_rule, match_confidence, matched_by) SELECT source_system, source_customer_id, :new_golden_id, 5, 1.000, 'analyst:mdm-07' FROM customer_xref WHERE golden_customer_id = :old_golden_id AND source_customer_id = ANY(:subsidiary_source_ids) AND superseded_by IS NULL; -- 3. Retire the old mappings by pointing them at their replacements. UPDATE customer_xref old SET superseded_by = new.xref_id FROM customer_xref new WHERE old.golden_customer_id = :old_golden_id AND new.golden_customer_id = :new_golden_id AND old.source_customer_id = new.source_customer_id AND old.superseded_by IS NULL; -- 4. Close the old golden record's SCD2 window — retire, never delete. UPDATE golden_customer SET status = 'retired', effective_to = now() WHERE golden_customer_id = :old_golden_id; COMMIT; -- A CDC event on customer_xref lets fact tables re-attribute downstream.
The crosswalk only ever grows. A merge appends a row; an un-merge appends another; the truth is the latest row with no successor.INTEGRATION RULE Nº 1 — APPEND, NEVER DESTROY

§ 05 — INGESTION & STREAMSThe strangler dual-write in Python

Three programs carry the convergence: the translation layer that remaps source schemas without freezing them, the matching playbook that runs strict-before-fuzzy, and the dual-writer that lands records idempotently. Each is small; the judgment is in what they refuse to do.

1 · The translation layer — remap, rewrite, dead-letter

CDC gives you the raw source change; the translation layer turns it into the unified contract. It renames fields, rewrites source IDs into the unified namespace, and stamps lineage. Its one discipline is that a record violating the contract — an unexpected field, a missing key — goes to a dead-letter queue, not into the unified model and not silently dropped. A contract violation is a signal that a source app shipped a schema change; swallowing it is how drift creeps in unseen.

PYTHON · TRANSLATION LAYER (CDC → UNIFIED CONTRACT)
from dataclasses import dataclass @dataclass class UnifiedCustomer: source_system: str source_row_id: str name: str domain: str | None region: str # Per-source field maps. Both apps keep their native schema; the translation # layer is the ONLY place the two vocabularies meet. Add a source = add a map. FIELD_MAP = { "acquirer_crm": {"customer_id": "source_row_id", "company": "name", "domain": "domain", "country": "region"}, "target_crm": {"tenant_id": "source_row_id", "workspace_name": "name", "email_domain": "domain", "region": "region"}, } def translate(source_system: str, change: dict, dlq) -> UnifiedCustomer | None: """Map one CDC change to the unified contract. A contract violation is a SIGNAL (a source shipped a schema change), so it goes to the DLQ to be triaged — never into unified, never silently dropped.""" fmap = FIELD_MAP[source_system] try: out = {dest: change[src] for src, dest in fmap.items()} except KeyError as missing: dlq.put({"source": source_system, "change": change, "reason": f"missing field {missing}"}) # triage, don't guess return None return UnifiedCustomer(source_system=source_system, **out)

One distinction, always stated: the translation layer never matches. It only normalizes shape. Deciding whether this normalized record is the same customer as another is the playbook's job, kept separate so that a schema change and a matching change can be reasoned about — and rolled back — independently.

2 · The matching playbook — strict before fuzzy, first rule wins

You never decide two rows are the same customer with one rule. You run an ordered list from strict to permissive, and the first rule that fires wins — the same logic as airport security: passport first, then bag scan, then pat-down. Strict rules auto-merge into a golden record; fuzzy rules stage a candidate for a human via match_candidates. The ordering is the whole point: a confident identifier match must never be overruled by a fuzzy name similarity.

PYTHON · THE MATCH-KEY CASCADE (STRICT → FUZZY)
from rapidfuzz import fuzz # Ordered strict→permissive. First rule to fire wins. Strict auto-merges; # fuzzy stages a candidate for an MDM analyst. Confidence is recorded, never # discarded — it is what lets you audit and reverse a bad merge later. def match(rec, golden) -> tuple[int, float, str]: """Return (rule, confidence, disposition). disposition is 'auto' or 'review'.""" if rec.domain == golden.canonical_domain and rec.duns == golden.duns: return 1, 0.99, "auto" # strict: same domain AND same DUNS if rec.domain == golden.canonical_domain \ and normalize(rec.name) == normalize(golden.canonical_name): return 2, 0.95, "auto" # strict: domain AND normalized name if fuzz.token_sort_ratio(rec.name, golden.canonical_name) > 88 \ and rec.country == golden.country and rec.industry == golden.industry: return 3, 0.80, "review" # fuzzy: typo-tolerant name + geo + industry if shared_employee_emails(rec, golden) >= 3: return 4, 0.70, "review" # fuzzy: 3+ @same-domain emails on both return 5, 0.0, "distinct" # nothing fired → its own golden record

3 · The dual-writer — idempotent landing into the unified model

PYTHON · DUAL-WRITE LANDING (IDEMPOTENT)
async def dual_write(db, recs: list) -> None: """Land translated records into unified.customers. Idempotent on (source_system, source_row_id) so at-least-once CDC re-delivery is safe. The golden id is resolved through the CURRENT crosswalk row (no successor).""" for r in recs: await db.execute( """INSERT INTO unified_customers (account_id, source_system, source_row_id, name, domain, region, status) SELECT x.golden_customer_id, $1, $2, $3, $4, $5, 'active' FROM customer_xref x WHERE x.source_system = $1 AND x.source_customer_id = $2 AND x.superseded_by IS NULL -- the current mapping ON CONFLICT (source_system, source_row_id) DO UPDATE SET name = EXCLUDED.name, domain = EXCLUDED.domain, region = EXCLUDED.region, ingested_at = now()""", r.source_system, r.source_row_id, r.name, r.domain, r.region, )

§ 06 — AGGREGATIONReconciliation gates, and metrics get a version

Two slow layers do the governance work. Reconciliation turns "are we drifting?" into a number that gates the cutover. The versioned metric definition turns "what is combined DAU?" from an argument into an auditable row. Both exist so a human can defend a decision.

Reconciliation runs hourly during cutover and compares source row counts to what the unified model actually ingested, per source system. The output is drift_pct and lag_seconds; both have hard pager thresholds. Drift over one-tenth of a percent or lag over five minutes pages the on-call, and the cutover gate is the simple, unforgiving rule on top: seventy-two consecutive green hours before an entity is allowed to flip, and one amber hour resets the clock to zero.

SQL · RECONCILIATION (RUNS HOURLY DURING CUTOVER)
WITH source_counts AS ( SELECT 'acquirer_crm' AS src, count(*) AS n, max(updated_at) AS last_change FROM acquirer_crm.customers UNION ALL SELECT 'target_crm', count(*), max(updated_at) FROM target_crm.tenants ), unified_counts AS ( SELECT source_system AS src, count(*) AS n, max(ingested_at) AS last_ingest, count(*) FILTER (WHERE ingest_status = 'rejected') AS rejected FROM unified_customers GROUP BY source_system ) SELECT s.src, s.n AS source_rows, u.n AS unified_rows, 100.0 * abs(s.n - u.n) / nullif(s.n, 0) AS drift_pct, u.rejected, extract(epoch FROM (s.last_change - u.last_ingest)) AS lag_seconds FROM source_counts s LEFT JOIN unified_counts u USING (src); -- drift_pct > 0.1 → page. lag_seconds > 300 → page. Cutover: 72h all green.

The second governance layer is the versioned metric. The board deck has "combined DAU" on slide three, and it is wrong — because the acquirer counts "any session in 24h" while the target counts "any session in 7 days, deduped to one row per user-day." Rather than bury that disagreement in SQL, bake the definition into a versioned dimension. Every dashboard tile cites a (metric_id, definition_version); combined DAU has three competing versions; Finance picks the one that matches the board narrative; the data layer auto-generates the SQL from the row.

DDL · VERSIONED METRIC DEFINITION (THE BOARD CAN AUDIT)
CREATE TABLE dim_metric_definition ( metric_id TEXT NOT NULL, -- 'dau' definition_version INT NOT NULL, org_scope TEXT NOT NULL, -- 'acquirer' | 'target' | 'combined' window_unit TEXT NOT NULL, -- 'hour' | 'day' | 'week' window_size INT NOT NULL, dedup_grain TEXT NOT NULL, -- 'user' | 'user-day' | 'session' fiscal_calendar TEXT NOT NULL, -- 'calendar' | '445' currency_policy TEXT NOT NULL, -- 'lock-at-txn' | 'lock-at-month-end' effective_from DATE NOT NULL, effective_to DATE, PRIMARY KEY (metric_id, definition_version) );
A number without a version is an opinion. The board deck does not cite a query; it cites a definition row — and three of them disagree, on purpose.INTEGRATION RULE Nº 2 — VERSION THE DEFINITION

§ 07 — ANALYTICS SQLThe day-90 questions

The golden core and its crosswalk are where the deal explains itself. Three queries an interviewer loves, because each carries a classic pattern: overlap as a set intersection, deduped roll-up through a crosswalk, and an as-of resolution over the append-only mappings.

Overlap analysis — the set intersection Sales fights over

"How many customers are in both products?" is the first question Sales asks (for the cross-sell list) and the first Finance asks (to deduct overlapping ARR before announcing combined revenue). Because lineage lives in golden_customer.sources[], it is a containment test, not a join: the golden records whose sources array holds both systems are precisely the overlap.

SQL · CUSTOMERS IN BOTH SYSTEMS (LINEAGE CONTAINMENT)
SELECT count(*) FILTER (WHERE sources @> ARRAY['acquirer_crm','target_crm']) AS in_both, count(*) FILTER (WHERE sources = ARRAY['acquirer_crm']) AS acquirer_only, count(*) FILTER (WHERE sources = ARRAY['target_crm']) AS target_only FROM golden_customer WHERE status = 'active'; -- @> is array-containment: "sources holds both". The overlap (~40% on the -- pro tier in Adobe-Figma) is the cross-sell list AND the ARR-dedup base.

Combined ARR — deduped roll-up through the crosswalk

Summing both companies' revenue naively gives 1 + 1 = 2 for a customer that exists in both — overstating combined ARR by the overlap. The fix is to route every order through the current crosswalk row to its golden id, then sum by golden id, so a shared customer contributes once. The gap between the gross sum and this deduped figure is exactly the number Finance must disclose.

SQL · DEDUPED COMBINED ARR (NO 1+1=2)
WITH attributed AS ( SELECT x.golden_customer_id, o.arr FROM orders o JOIN customer_xref x ON x.source_system = o.source_system AND x.source_customer_id = o.source_customer_id AND x.superseded_by IS NULL -- current mapping only ) SELECT sum(arr) AS gross_arr, sum(arr_per_golden) AS deduped_arr, sum(arr) - sum(arr_per_golden) AS overlap_deduction FROM ( SELECT golden_customer_id, sum(arr) AS arr_per_golden, sum(arr) AS arr FROM attributed GROUP BY golden_customer_id ) g;

As-of mapping — which golden id did this order point at last quarter?

After an un-merge, "what was this source customer's golden id on a given date?" has two answers — before and after the correction. The append-only crosswalk makes the historical answer recoverable: walk the supersession chain and pick the row that was current as of the date in question. This is the audit query that defends a restated number to Finance.

SQL · AS-OF GOLDEN ID (TEMPORAL OVER THE CROSSWALK)
SELECT DISTINCT ON (source_system, source_customer_id) source_system, source_customer_id, golden_customer_id, matched_at, match_rule FROM customer_xref WHERE matched_at <= :as_of_date -- mapping as it stood that day ORDER BY source_system, source_customer_id, matched_at DESC; -- Because the crosswalk is append-only, every historical mapping survives. -- A restated ARR figure can always show which golden id it rolled up to.
Warehouse Consolidation — Three Strategies, One Default

Big-bang migration (copy data, rewrite dbt models) fits a strategic incompatibility caught early. Federated query (Snowflake External Tables, Unity Catalog, Trino) fits stacks that complement and a vendor decision you want to defer. Share-and-bridge (cross-account Snowflake Share, Delta Sharing, Authorized Views) is what every real deal uses for the first six months — lowest engineering cost, unblocks day-90 reporting. Tag every query with cost_owner ∈ {pre-merger-target, pre-merger-acquirer, post-merger-combined} so the CFO's "how much of last month's bill came from the acquired team" has an answer.


§ 08 — THE DASHBOARDProving the cutover is safe

A senior design ends with observability, because the whole convergence is a bet that you can watch. The dashboard answers one operational question per loop: is the drift under the gate, is the analyst queue draining, and does every board number cite a version?

RECONCILIATION
drift_pct per source against the 0.1% gate, lag_seconds against the 5-minute pager, reject rate on the unified ingest, and the green-hours streak — the count toward 72 that one amber hour resets.
ENTITY RESOLUTION
auto-merge rate (strict rules firing), candidate queue depth awaiting analyst review, un-merge count (a spike means a bad rule), and DLQ depth on the translation layer — a rising DLQ means a source shipped a schema change.
BOARD READINESS
overlap % feeding the ARR deduction, tiles citing a metric version (must be 100%), and cost by owner so the CFO's bill-attribution question is answered before it is asked.
Integration Ops — Customer Entity Cutover DAY 61 OF 90 · 14:20 UTC · 60s REFRESH
Drift — acquirer
0.04%
Drift — target
0.12%
CDC Lag p95
38s
Green-Hours Streak
19/72
Drift — target_crm vs gate · last 36 h (0.1% line)
0.1% gate a backfill batch breached at hour 28 — clock reset
Auto-Merge Rate
82%
Candidate Queue
1.4k
Un-Merges 7d
6
Translation DLQ
0
Overlap %
39%
Tiles Versioned
100%
FIG. 2 — The story a careful cutover tells: the acquirer source is green and steady, but the target breached the 0.1% gate at hour 28 — a backfill batch outran ingestion — and the green-hours clock reset to 19. The cutover does not flip until the streak reaches 72.

Read the amber tiles together and the dashboard narrates the discipline from §06: the target CRM's drift crept over the gate during a historical backfill, the on-call was paged, and the seventy-two-hour clock reset to nineteen — no data was lost, the cutover was simply not yet permitted. The DLQ at zero says both source schemas are still honoring the contract; the auto-merge rate at 82% says the strict rules are carrying the load while the analyst queue handles the fuzzy tail; every board tile cites a metric version. That is what a designed, reversible, gated integration looks like from the operator's chair on day sixty-one.


§ 09 — THE RUBRICWhat was actually being tested

Strip the deal away and the question was testing five judgments, each of which generalizes far beyond M&A:

DECOMPOSITION
Refusing to ship one "integration plan" — seeing four orthogonal problems (entity resolution, schema unification, warehouse consolidation, metrics) with different owners and timelines, and sequencing them.
REVERSIBILITY
Designing the un-merge before you need it: append-only crosswalk, superseded_by, retire-never-delete. The first irreversible merge is the incident that teaches this the hard way.
LIVE MIGRATION
Knowing you cannot freeze a database that is still taking writes — strangler dual-write through a translation layer, with the contract violation dead-lettered, not swallowed.
GATING
Letting a single number — drift under 0.1% for 72 hours — hold the authority to cut over, so the decision is mechanical and defensible rather than a judgment call under deadline pressure.
DEFENSIBILITY
Versioning the metric definition so a board number can be audited: combined DAU has three competing versions, Finance picks one on the record, and the SQL is generated from the row.
Adobe-Figma, Microsoft-Activision, Cisco-Splunk, Salesforce-Slack — all four hit the same four problems. The only difference is which one was hardest in their context, and whether the team built the un-merge before or after they needed it.— CLOSING ARGUMENT