PaddySpeaks · Systems at the Whiteboard · Nº 04

The Attribution Problem

One human sees an ad on a phone, opens the site on a laptop, and buys on a tablet a week later — and a platform must decide which touch earned the sale, honestly, across devices, while Apple has cut the signal that used to make it easy. A complete working through: the identity graph, the SCD2 device bridge, match-at-read attribution, parallel windows, the SKAN carve-out, holdout lift, and the dashboard.

§ 01 — THE QUESTIONOne human, many devices, a week of silence

This prompt looks like a join problem and is in fact an identity problem wearing a join's clothes — because the three events you must connect were never stamped with the same key, and the law now forbids you from minting one.

Interview Prompt

"Design a model for an ad platform where a user sees an ad on phone, opens the advertiser's site on laptop later, and converts on tablet a week after. Support 1/7/28-day click and view-through windows, ATT/SKAdNetwork iOS constraints, and incrementality lift. How do you scope it?"

LEVEL · SENIOR / STAFFDURATION · 45 MINFORMAT · WHITEBOARD

The trap is to assume the three events share a user id. They do not. A phone impression, a laptop visit, and a tablet purchase arrive as three rows with three different device tokens and — on iOS, after Apple's App Tracking Transparency — quite possibly no user identifier at all. The only thing that can connect them is an identity graph: a model that says, with some confidence, that these three devices belong to one person. And confidence is the word that makes this senior. A login is certainty; a shared household IP is a guess; behavioral overlap is a softer guess still. A system that pretends a 0.88 probabilistic link is the same as a 1.0 deterministic one is lying to the advertiser about why they should believe the number.

A weak answer hard-codes attribution into the pipeline — last-click, one window — and silently drops everything iOS won't let it see. A strong answer notices two things: that the identity graph must be versioned, because devices come and go, and that attribution must be computed at read time, because the advertiser will ask for last-click and data-driven and three windows over the same data. So before any tables, the frame for the session:

THE GRAPH
The identity bridge. Many devices per person, each link carrying a confidence and a method. Versioned (SCD2) because a device sold or wiped is a link that ends — and an attribution must read the graph as it stood when the events happened.
MATCH AT READ
Attribution as a function. Never computed at ingest. Every conversion is paired with every eligible touchpoint; the model and the window are arguments evaluated at query time. Re-attribution is a SELECT; multiple windows run in parallel.
THE CARVE-OUT
Two privacy paths. Opted-in users flow through the graph at user level. iOS opt-outs and SKAdNetwork postbacks are aggregate-only — a separate fact that never joins to identity. The wall is in the schema, not a WHERE clause.
Cross-device attribution is honest only when it is honest about its uncertainty. A probabilistic link is not a fact — it is a weight, and the schema must carry the weight, not hide it.

Scoping out loud

Scope is the first scored dimension, so name it. In scope: the touch/conversion facts, the identity graph and its versioned device bridge, match-at-read attribution with parallel click and view windows, the SKAN aggregate path, and incrementality via holdouts. Out of scope, said explicitly: the identity resolution algorithm itself (the probabilistic matching that decides which devices link and at what confidence — treated as an upstream producer of bridge_identity_devices), ad serving and the auction, creative storage, and billing rails. The caveat: the model must not preclude a future deterministic signal (a publisher login, a clean-room match) upgrading a link's confidence — so confidence is a column on the bridge, never a hard-coded constant.

Then the envelope math, volunteered. Cross-device-ads numbers at platform scale:

QuantityEstimateConsequence
Ad events / day~10,000,000,000Impressions + clicks across all surfaces — the touch firehose
Devices per identity2.6 avgThe bridge fans every conversion out to several touchpoints
iOS opt-out share~75% of iOSThe row that forces the aggregate carve-out, not a filter
Conversion-to-touch lagup to 28 daysWindows span 1/7/28 days × click vs view — read-time math
Attribution models livelast-click · MTA · DDAAll run in parallel; advertiser picks at query time
Link confidence range0.88 – 1.00Probabilistic to deterministic — weights, not booleans
Holdout share~1% of usersSized for statistical power on lift, not derived from campaigns

Read the table and the architecture is half-decided. The iOS opt-out row dictates that a privacy wall is a structural fact, not a query predicate — three-quarters of iOS will never have a user-level identity to join. The lag and the parallel-models rows dictate match-at-read: with windows up to 28 days and several models live at once, attribution cannot be frozen at ingest. And the confidence range dictates that the bridge carries a weight, because the difference between a login and a household IP is the difference between a fact and an inference. The rest of this article follows the identity.


§ 02 — DATA FLOWThree devices, one identity, two paths

One graph, two privacy paths. Opted-in touches and conversions fan into a single identity through the versioned bridge; iOS opt-outs and SKAN postbacks travel a walled aggregate lane that never touches a person.

USER-LEVEL PATH · OPTED-IN · CROSS-DEVICE AGGREGATE PATH · iOS OPT-OUT + SKAN · WALLED PHONE · impression Apr 25 · feed LAPTOP · click Apr 28 · feed TABLET · impression Apr 30 · reel fct_ad_events event_type · device_token identity_id (NULL on iOS) privacy_path IDENTITY BRIDGE (SCD2) device → identity · confidence effective_from / to dim_identities one row per person fct_conversions tablet purchase · May 2 MATCH-AT-READ pair conv × touchpoints window × model × run resolve identity as-of event_ts iOS DEVICE (opt-out) no identity · SKAN fct_skan_postbacks campaign · value_bucket aggregate only AGGREGATE REPORTING campaign × day rollups HOLDOUT / LIFT treatment − holdout THE WALL · aggregate never joins identity SOLID — immutable writes · DASHED — the as-of identity resolution · The graph connects people; the wall protects the ones who said no.
FIG. 1 — Two paths, one wall. Opted-in devices resolve to a single identity through the versioned bridge and fan into match-at-read; iOS opt-outs and SKAN live in a walled aggregate lane that never reaches a person.

Three properties of this picture do most of the interview's work. First, the identity is resolved as-of the event's timestamp through the SCD2 bridge — the phone impression links to whatever identity owned that device on April 25, not whatever owns it today — so a re-sold phone never mis-credits a stranger. Second, attribution is a plane unto itself: the conversion is paired with its eligible touchpoints and a run writes credit as rows, so windows and models multiply without ever rebuilding the facts. Third, the wall is a structural line, not a predicate — SKAN postbacks live in a fact that has no identity column to join on, so an aggregate-only measurement cannot accidentally be de-anonymized by a careless query.

The Uncertainty Philosophy, In One Rule

Carry the confidence; never collapse it. A deterministic link (a login) and a probabilistic one (a household IP) are different kinds of truth, and the bridge stores both as a number between 0 and 1 with the method that produced it. Downstream, last-click can ignore the weight, but multi-touch and data-driven models read it — a tablet linked at 0.88 contributes less certain credit than a phone linked at 1.0. The system never launders a guess into a fact; the moment a link cannot be made at all — an iOS opt-out with a null identity — the event is routed to the aggregate path, counted, and never guessed into a person.


§ 03 — DATA MODELA graph, a versioned bridge, and a walled aggregate

The schema falls out of the identity question. A graph of people; a SCD2 bridge that links devices to people with confidence; immutable touch and conversion facts; a derived, append-only attribution fact; and a SKAN fact behind a wall, with no key that could reach a person.

The identity graph and its bridge

The heart of the model is the bridge. One row per (identity, device) link, each carrying a link_confidence (1.0 deterministic, below that probabilistic), the link_method that produced it, and the SCD2 validity window — because devices come and go, and an attribution must read the graph as it was, not as it is. The identity itself is a thin row: a person token and the strongest resolution method that vouches for them.

DDL · IDENTITY GRAPH (SCD2 BRIDGE)
-- One resolved person. Thin by design -- the edges carry the weight. CREATE TABLE dim_identities ( identity_id TEXT PRIMARY KEY, person_token TEXT NOT NULL, -- privacy-tokenized, never raw PII resolution_method TEXT NOT NULL -- strongest method that vouches ); -- The heart of cross-device. Many devices per person, each link a -- WEIGHT, not a boolean. SCD2: a device sold or wiped ends a link, so -- attribution can read the graph AS IT STOOD at the event's timestamp. CREATE TABLE bridge_identity_devices ( identity_id TEXT NOT NULL REFERENCES dim_identities, device_id_token TEXT NOT NULL, link_confidence NUMERIC(4,3) NOT NULL -- 1.000 login; <1 inferred CHECK (link_confidence > 0 AND link_confidence <= 1), link_method TEXT NOT NULL CHECK (link_method IN ( 'login','email_match','household_ip', 'behavioral','deterministic_id')), effective_from TIMESTAMPTZ NOT NULL, effective_to TIMESTAMPTZ, -- NULL = currently linked PRIMARY KEY (identity_id, device_id_token, effective_from) ); CREATE INDEX idx_bid_device ON bridge_identity_devices (device_id_token, effective_from);

The touch and the conversion

A single touch fact carries both impressions and clicks, event-typed, because attribution treats them as the same kind of thing — eligible touchpoints differing only in whether they open a click window or a view window. The decisive column is identity_id, which is nullable: on iOS opt-out it is null by law, and that null is the schema's way of saying "this touch can only ever be counted in aggregate." The conversion is the advertiser's reported outcome, tied to an identity when one is known.

DDL · TOUCH (EVENT-TYPED) + CONVERSION
-- One row per impression OR click. identity_id is NULLABLE: NULL means -- iOS opt-out -- this touch is aggregate-only and cannot enter the graph. CREATE TABLE fct_ad_events ( event_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, event_type TEXT NOT NULL CHECK (event_type IN ('impression','click')), identity_id TEXT REFERENCES dim_identities, -- NULL on iOS opt-out device_id_token TEXT NOT NULL, campaign_id BIGINT NOT NULL REFERENCES dim_campaigns, creative_id BIGINT NOT NULL REFERENCES dim_creatives, placement TEXT NOT NULL, -- feed | reel | story ... event_ts TIMESTAMPTZ NOT NULL, privacy_path TEXT NOT NULL DEFAULT 'opted_in' CHECK (privacy_path IN ('opted_in','att_optout','limited')) ); CREATE INDEX idx_ae_identity ON fct_ad_events (identity_id, event_ts); CREATE INDEX idx_ae_device ON fct_ad_events (device_id_token, event_ts); -- The advertiser-reported outcome. Tied to an identity when known. CREATE TABLE fct_conversions ( conversion_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, advertiser_id BIGINT NOT NULL REFERENCES dim_advertisers, identity_id TEXT REFERENCES dim_identities, conversion_ts TIMESTAMPTZ NOT NULL, conversion_type TEXT NOT NULL, -- purchase | signup | install value_usd NUMERIC(10,2) NOT NULL DEFAULT 0 );

Attribution as a match, and SKAN behind the wall

Attribution is the match grain — one row per (conversion × eligible touchpoint × window × model × run), partitioned by attribution_run_id. The same conversion appears under click_7d / last-click with full credit and under view_28d / MTA with a fraction, all in one run. SKAN is its own fact with a deliberately impoverished schema: a campaign, a coarse conversion-value bucket, a postback count — and crucially no identity column and no event id, so the privacy wall is enforced by the absence of a join key, not by a query author's discipline.

DDL · ATTRIBUTION (MATCH GRAIN) + SKAN (WALLED)
-- The match grain. NOT collected -- computed at read time and persisted -- per run. Multiple windows AND models coexist for one conversion. CREATE TABLE fct_attributions ( attribution_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, conversion_id BIGINT NOT NULL REFERENCES fct_conversions, ad_event_id BIGINT NOT NULL REFERENCES fct_ad_events, window_type TEXT NOT NULL CHECK (window_type IN ( 'click_1d','click_7d','click_28d', 'view_1d','view_7d','view_28d')), model_id TEXT NOT NULL REFERENCES dim_attribution_models, attribution_run_id TEXT NOT NULL, -- partition; latest wins on read is_attributed BOOLEAN NOT NULL, fractional_credit NUMERIC(5,4) NOT NULL, -- weighted by link_confidence in MTA attributed_value_usd NUMERIC(10,2) NOT NULL ); CREATE INDEX idx_attr_run ON fct_attributions (attribution_run_id, model_id, window_type); -- The walled aggregate. SKAdNetwork returns no user-level data, so this -- fact has NO identity_id and NO ad_event_id -- there is literally no key -- with which to join it to a person. The wall is structural. CREATE TABLE fct_skan_postbacks ( postback_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, campaign_id BIGINT NOT NULL REFERENCES dim_campaigns, conversion_value_bucket SMALLINT NOT NULL, -- coarse 0-63, not a dollar value redownload BOOLEAN NOT NULL DEFAULT FALSE, fidelity_type SMALLINT NOT NULL, -- SKAN postback fidelity postback_ts TIMESTAMPTZ NOT NULL, postback_count INT NOT NULL DEFAULT 1 );

The remaining dimensions are conformed and SCD2 where history matters. dim_campaigns and dim_creatives version budget, objective, and placement. dim_attribution_models is the keystone of re-attribution — the model is a versioned row, so "last-click vs MTA vs DDA" is a join, never an ETL fork. And dim_user_holdouts is the one dimension that is deliberately not derived: holdout assignment lives as its own SCD2 fact per advertiser × period, so a campaign re-org cannot silently dissolve the control group an incrementality study depends on.


§ 04 — THE CORE INVARIANTWhere the credit — and the consent — are conserved

The correctness of the whole model lives in three rules. The conservation rule: per conversion, per window, per model, credit sums to one. The as-of rule: identity is resolved at the event's timestamp, through the versioned bridge. The wall rule: the aggregate path has no key to a person, by construction.

Cross-device attribution divides one sale's value among touches on different devices, so the same dangers as any ledger apply — double-credit, lost fractions — plus one unique to identity: crediting the wrong person because a device changed hands. The first guard is arithmetic and per-conversion: within one window under one model in one run, fractional_credit sums to exactly 1.0 (or 0.0 when no touch falls in window — unattributed, a valid recorded answer). This is RULE Nº 1, made executable, and in multi-touch models the credits are shaped by each link's confidence before they are normalized back to one.

The conversion's journey to credit

TOUCH IDENTITY RESOLVED CONVERSION ELIGIBLE SET ATTRIBUTED

Two branches leave this spine. A touch whose identity cannot be resolved skips straight to the aggregate path — SKAN / AGGREGATE, never reaching the eligible set. And from CONVERSION, a re-run re-enters ATTRIBUTED under a new attribution_run_id — the only state allowed to repeat, because re-attributing when a model upgrades is the entire reason for match-at-read. Every other transition is immutable.

SQL · MATCH-AT-READ — CONFIDENCE-WEIGHTED, SUMS TO ONE
-- Materialize one attribution run for one window+model. Pair each -- conversion with its in-window touchpoints (resolved AS-OF event_ts via -- the SCD2 bridge), weight by link_confidence, normalize to 1, persist. INSERT INTO fct_attributions (conversion_id, ad_event_id, window_type, model_id, attribution_run_id, is_attributed, fractional_credit, attributed_value_usd) WITH touch AS ( SELECT cv.conversion_id, cv.value_usd, e.event_id, b.link_confidence, model_weight(:model_id, e.event_type, e.event_ts, cv.conversion_ts) AS w FROM fct_conversions cv JOIN bridge_identity_devices b ON b.identity_id = cv.identity_id AND cv.conversion_ts >= b.effective_from -- as-of: graph as it stood AND cv.conversion_ts < coalesce(b.effective_to, 'infinity') JOIN fct_ad_events e ON e.device_id_token = b.device_id_token AND e.event_ts <= cv.conversion_ts AND e.event_ts >= cv.conversion_ts - window_span(:window_type) WHERE cv.identity_id IS NOT NULL -- opted-in only; iOS -> aggregate ) SELECT conversion_id, event_id, :window_type, :model_id, :run_id, TRUE, (w * link_confidence) / sum(w * link_confidence) OVER (PARTITION BY conversion_id), round(value_usd * (w * link_confidence) / sum(w * link_confidence) OVER (PARTITION BY conversion_id), 2) FROM touch; -- the PARTITION BY normalization guarantees § 04: credit sums to 1 per conversion.
A conversion's credit sums to one, weighted by how sure we are each device is the same person. The model proposes the split; the confidence shapes it; the normalization conserves it.ATTRIBUTION RULE Nº 1 — CREDIT, WEIGHTED BY CERTAINTY

The third rule is the wall, and it is the one most candidates state as a WHERE clause and most senior engineers state as a schema. Because fct_skan_postbacks has no identity and no event id, there is no query — careless, malicious, or accidental — that can join a SKAN postback back to a person. The aggregate path can only ever be grouped by campaign and day. This is the difference between a privacy policy and a privacy guarantee: a policy is a promise a query might break; a guarantee is a key that does not exist. The opted-in graph and the aggregate lane share a warehouse and share nothing else.


§ 05 — INGESTION & STREAMSPython on identity

Three programs carry the write path: the touch router that decides graph-or-aggregate at the door, the bridge maintainer that versions a link without ever losing history, and the holdout guard that keeps the control group sacred. Each is small; the judgment is in what they refuse to do.

1 · The touch router — graph or aggregate, never both

Every incoming touch must be sorted onto exactly one path. An opted-in event resolves its device to an identity through the bridge and lands in the user-level fact; an iOS opt-out lands with a null identity, destined only for aggregate rollups. The router refuses the one thing that would breach the wall: it never invents an identity for an opt-out, and it never lets an unresolved opted-in device silently borrow a stranger's identity.

PYTHON · TOUCH ROUTER (PATH AT THE DOOR)
async def route_touch(pool, ev: dict) -> None: """Sort each touch onto exactly one path. Opt-outs get NULL identity and are aggregate-only forever -- we NEVER fabricate an identity to 'rescue' a conversion, because that is exactly the consent the user declined. Opted-in devices resolve as-of now; an unresolved device is stored with NULL identity (countable) but never mis-linked.""" if ev["privacy_path"] == "att_optout": identity = None # by law, and by design else: async with pool.acquire() as con: identity = await con.fetchval(""" SELECT identity_id FROM bridge_identity_devices WHERE device_id_token = $1 AND $2 >= effective_from AND $2 < COALESCE(effective_to, 'infinity') ORDER BY link_confidence DESC LIMIT 1 """, ev["device_id_token"], ev["event_ts"]) async with pool.acquire() as con: await con.execute(""" INSERT INTO fct_ad_events (event_type, identity_id, device_id_token, campaign_id, creative_id, placement, event_ts, privacy_path) VALUES ($1,$2,$3,$4,$5,$6,$7,$8) """, ev["event_type"], identity, ev["device_id_token"], ev["campaign_id"], ev["creative_id"], ev["placement"], ev["event_ts"], ev["privacy_path"])

2 · The bridge maintainer — version, never overwrite

When the upstream resolver reports that a device's link has changed — a phone sold, a confidence upgraded by a fresh login — the maintainer expires the old row and opens a new one. It refuses to mutate history: the previous link is closed with an effective_to, not deleted, so an attribution run over last month still reads last month's graph. This is the SCD2 discipline that makes "as-of" possible.

PYTHON · BRIDGE MAINTAINER (SCD2, AS-OF SAFE)
async def relink_device(pool, link: dict) -> None: """A device's identity or confidence changed. Close the current row and open a new one -- NEVER UPDATE in place. History is the whole point: an attribution run over April must see the April graph, even after the phone was sold in May. Idempotent on (device, identity, confidence): a no-op change opens no new version.""" async with pool.acquire() as con, con.transaction(): cur = await con.fetchrow(""" SELECT identity_id, link_confidence FROM bridge_identity_devices WHERE device_id_token = $1 AND effective_to IS NULL """, link["device_id_token"]) if cur and cur["identity_id"] == link["identity_id"] \ and cur["link_confidence"] == link["link_confidence"]: return # nothing changed -- no new version if cur: await con.execute(""" UPDATE bridge_identity_devices SET effective_to = $2 WHERE device_id_token = $1 AND effective_to IS NULL """, link["device_id_token"], link["effective_from"]) await con.execute(""" INSERT INTO bridge_identity_devices (identity_id, device_id_token, link_confidence, link_method, effective_from, effective_to) VALUES ($1,$2,$3,$4,$5,NULL) """, link["identity_id"], link["device_id_token"], link["link_confidence"], link["link_method"], link["effective_from"])

3 · The holdout guard — the control group is sacred

Incrementality is only real if the holdout never sees an ad. The guard sits in the serving request path and refuses to let a held-out identity be targeted — and, just as importantly, it ensures held-out users who convert anyway are counted as conversions without ad events, which is exactly what makes the lift math valid. It refuses to derive holdout membership from campaign state; membership is read from the dedicated dimension, so a re-org cannot leak the control group into treatment.

PYTHON · HOLDOUT GUARD (CONTROL STAYS CLEAN)
async def may_serve(pool, identity_id: str, advertiser_id: int, period: str) -> bool: """Return False if this identity is in the advertiser's holdout for this period. Membership is READ from dim_user_holdouts, never inferred from campaign targeting -- a campaign re-org must not dissolve the control group. A held-out user who converts anyway becomes a conversion with NO ad events, which is precisely what lift measurement needs.""" async with pool.acquire() as con: held = await con.fetchval(""" SELECT 1 FROM dim_user_holdouts WHERE identity_id = $1 AND advertiser_id = $2 AND period = $3 """, identity_id, advertiser_id, period) return held is None # in holdout -> never serve; conversion still counts

One carve-out, always stated: none of these programs computes attribution. The router sorts, the maintainer versions, the guard protects the experiment — but credit is never assigned at ingest. That refusal is the architecture: because nothing here freezes a touch into an attributed outcome, the measurement layer is free to re-decide every conversion under any model, over any window, as many times as the advertiser asks.


§ 06 — AGGREGATIONParallel windows and the lift calculation

Two derived layers carry the slow loop. The attribution run materializes every window and model in parallel as an append-only batch. The lift calculation compares treatment against holdout — the one number that survives the death of cross-device signal, because it needs no identity match at all.

The attribution run's defining trait is that it produces many answers at once. A single run sweeps the live windows — click_1d, click_7d, view_28d — and the live models, writing a row for every combination that attributes. The same conversion legitimately appears a dozen times: once per window-model pair it qualifies for. This is not duplication; it is the product surface. The advertiser whose funnel closes in a day reads click_1d; the one selling a considered purchase reads view_28d; the platform reports all of them, and the latest run wins on read while every prior run remains for audit.

PYTHON · ATTRIBUTION RUN (ALL WINDOWS × MODELS)
# One run sweeps every live window AND model, emitting a row per # qualifying combination. The same conversion in click_1d and view_28d is # the PRODUCT, not a bug -- the advertiser picks the lens at query time. WINDOWS = ["click_1d", "click_7d", "click_28d", "view_1d", "view_28d"] MODELS = ["MODEL_LAST_CLICK", "MODEL_MTA", "MODEL_DDA"] def run_attribution(conversions, graph, touches): run_id = f"RUN_{date.today():%Y_%m_%d}" for window in WINDOWS: for model in MODELS: for cv in conversions: if cv.identity_id is None: continue # iOS opt-out -> aggregate path eligible = touches.in_window(cv, window, graph) # as-of resolve if not eligible: continue # no touch in window -> not attributed credits = model.distribute(eligible, graph) # confidence-weighted assert abs(sum(credits.values()) - 1.0) < 1e-3 # § 04, in code yield from emit(cv, window, model, run_id, credits)

The lift calculation is the slow loop's other half, and it is the strategic counterweight to the entire identity edifice. Attribution asks "which touch caused this sale?" — a question that gets harder every year as signal erodes. Incrementality asks a different, more robust question: "how many more sales happened because of the ads?" — answered by comparing the conversion rate of treated users against a holdout that saw nothing. It needs no cross-device match, no window, no model; it survives ATT untouched, because a holdout user who never saw an ad is a clean counterfactual regardless of how many devices they own.

Attribution divides the credit for sales that happened. Incrementality measures the sales that would not have. When the signal dies, the holdout is the truth that remains.MEASUREMENT RULE Nº 1 — LIFT NEEDS NO MATCH

§ 07 — ANALYTICS SQLInterrogating the graph

The facts are where the system explains itself — once identity is resolved. Three queries an interviewer loves, because each answers a different stakeholder and each carries a classic pattern on its back.

Conversions by window — the parallel-window comparison

The advertiser's first question: how many conversions fall inside each window, so they can pick the one that matches their funnel. Because every window's rows coexist in the attribution fact, this is a GROUP BY window_type — the same conversions, sliced by horizon. The pattern is dimensional comparison over a derived fact.

SQL · ATTRIBUTED CONVERSIONS BY WINDOW (LATEST RUN)
-- One scan, every window. Lets the advertiser see how their conversions -- spread across 1d/7d/28d and choose the horizon that fits their funnel. SELECT atr.window_type, count(DISTINCT atr.conversion_id) AS conversions_attributed, round(sum(atr.attributed_value_usd), 2) AS attributed_value FROM fct_attributions atr WHERE atr.attribution_run_id = (SELECT max(attribution_run_id) FROM fct_attributions) AND atr.model_id = 'MODEL_LAST_CLICK' AND atr.is_attributed = TRUE GROUP BY atr.window_type ORDER BY attributed_value DESC;

Cross-device match rate — coverage of the graph

The measurement team's health query, and the one that watches the graph erode. What share of conversions threaded back to at least one ad event via identity? The pattern is a left-anti-join expressed as a coverage ratio — total conversions over those the graph could connect — and its decline is the leading indicator of signal loss.

SQL · CROSS-DEVICE MATCH RATE
-- The coverage metric: % of conversions the identity graph could tie to -- >=1 ad event. Falling match rate = the graph is going dark (ATT, churn). WITH conv AS ( SELECT conversion_id, identity_id FROM fct_conversions WHERE conversion_ts >= CURRENT_DATE - INTERVAL '30 days' ), matched AS ( SELECT DISTINCT atr.conversion_id FROM fct_attributions atr JOIN conv USING (conversion_id) WHERE atr.is_attributed = TRUE ) SELECT count(DISTINCT c.conversion_id) AS total_conversions, count(DISTINCT m.conversion_id) AS matched_conversions, round(100.0 * count(DISTINCT m.conversion_id) / nullif(count(DISTINCT c.conversion_id), 0), 1) AS match_rate_pct FROM conv c LEFT JOIN matched m USING (conversion_id);

Treatment vs holdout — incrementality as two proportions

The growth team's defining query, and the one that needs no identity match to be valid. Compute the conversion rate of treated users beside the holdout's, and the difference is lift. The pattern is a partitioned-cohort comparison — two arms, one metric — whose significance is a Welch's t-test on the two proportions.

SQL · CONVERSION RATE — TREATMENT VS HOLDOUT
-- Incrementality. Lift = treatment_rate - holdout_rate. Survives ATT -- because a holdout user who saw nothing needs no cross-device match. WITH treatment AS ( SELECT i.identity_id, EXISTS (SELECT 1 FROM fct_conversions c WHERE c.identity_id = i.identity_id AND c.advertiser_id = 'ADV_NIKE' AND c.conversion_ts BETWEEN '2025-04-01' AND '2025-05-01') AS converted FROM dim_identities i WHERE i.identity_id NOT IN ( SELECT identity_id FROM dim_user_holdouts WHERE advertiser_id = 'ADV_NIKE' AND period = '2025-04') ), holdout AS ( SELECT i.identity_id, EXISTS (SELECT 1 FROM fct_conversions c WHERE c.identity_id = i.identity_id AND c.advertiser_id = 'ADV_NIKE' AND c.conversion_ts BETWEEN '2025-04-01' AND '2025-05-01') AS converted FROM dim_identities i JOIN dim_user_holdouts h ON i.identity_id = h.identity_id WHERE h.advertiser_id = 'ADV_NIKE' AND h.period = '2025-04' ) SELECT 'treatment' AS arm, count(*) AS n, round(100.0 * avg(converted::int), 3) AS conv_rate_pct FROM treatment UNION ALL SELECT 'holdout', count(*), round(100.0 * avg(converted::int), 3) FROM holdout;

§ 08 — THE DASHBOARDProving the graph is honest

A senior design ends with observability, because every confidence weight and privacy wall above is invisible without it. The dashboard watches three things — coverage of the graph, the shape of attribution, and lift — and one tile watches the wall.

COVERAGE
cross-device match rate, average link confidence, identity-resolved % over time — a falling match rate is the graph going dark, the leading indicator that more value is sliding to the aggregate path.
ATTRIBUTION SHAPE
click-to-conversion median lag, % attributed in 1d/7d/28d, view vs click split — a lengthening lag means conversions are landing later, so short windows are quietly under-counting.
LIFT
treatment − holdout conversion rate, cost per incremental conversion, significance — the number that does not depend on a single device match holding.
THE WALL
% iOS opt-out and SKAN postback coverage — the share of measurement that must stay aggregate. This is not a failure metric; it is the size of the world the wall protects.
Attribution Ops — Global SAT 03 MAY · LATEST RUN 2025-05-03 · 1h REFRESH
Cross-Device Match
63%
Avg Link Confidence
0.93
iOS Opt-Out
75%
Credit Sum Errors
0
Credit split — last-click vs MTA · conversion CONV_777 ($129.99)
LAST-CLICK (laptop click) $129.99 · 100% MTA (phone view / click / tablet view) $38.99 · 30% $65.00 · 50% $26 · 20%
% Attributed in 7d
68%
View vs Click Split
31%v
Conversion Lift
+4.1pp
Lift Significance
p<.01
Conv Lag p50
5.2d
SKAN Coverage
22%
FIG. 2 — The story a healthy signal-erosion tells: match rate sliding amber and conversion lag stretching past five days, so more measurement leans on SKAN and on lift — yet credit-sum errors stay flat zero, because every run still normalizes to one.

Read the amber tiles together and the dashboard narrates the post-ATT world from the operator's chair: the graph is going quieter (match rate falling, opt-out at 75%), conversions arrive later (lag past five days), so the platform increasingly reports through the walled aggregate and trusts incrementality, which holds at +4.1 points with real significance. The single most important tile is the boring one: credit-sum errors at zero. Coverage is a battle you fight every quarter; credit conservation is a law you never break — and the wall is a guarantee you do not have to defend, because it has no door.


§ 09 — THE RUBRICWhat was actually being tested

Strip the devices away and the question was testing five judgments, each of which generalizes far beyond ad attribution:

IDENTITY
Recognizing that the join key does not exist and must be inferred — modeling the identity graph as a first-class, versioned bridge that carries the confidence of every link instead of pretending a guess is a fact.
AS-OF
Resolving identity at the event's timestamp through SCD2, so a re-sold device never mis-credits a stranger and a re-run reads the graph as it was, not as it is.
MATCH AT READ
Refusing to compute attribution at ingest, so windows and models multiply at query time and re-attribution is a SELECT that preserves yesterday's answer.
THE STRUCTURAL WALL
Enforcing privacy with the absence of a key, not the presence of a filter — a SKAN fact with no identity column cannot be de-anonymized by any query, careless or hostile.
THE ROBUST METRIC
Knowing that when signal erodes, the answer is to change the question — incrementality via a sacred holdout needs no match, no window, no model, and survives the death of cross-device tracking.
The graph guesses who; the confidence says how sure; the normalization keeps the credit whole; the wall protects the ones who declined. Attribution is honest only when every one of those is a column, not a hope.— CLOSING ARGUMENT