PaddySpeaks · Systems at the Whiteboard · Nº 16

The Fraud Problem

Model the data behind a fraud system at a fintech or marketplace. One bad actor wears three emails, two phones, four devices, five IPs — and a chargeback you have to defend to a regulator arrives six months after you blocked it. The one decision that separates a senior answer: fraud is a graph, frozen in time, with a feedback loop. A complete working-through — data flow, schema, the cluster expansion, the replay SQL, and the dashboard.

§ 01 — THE QUESTIONOne model, five problems

It sounds like a column. It is really a graph, a time machine, and a feedback loop — three systems the prompt hides inside the word "score."

Interview Prompt

"Design the data model for a fraud system at a fintech / marketplace / streaming product. How do you detect it, and how do you defend a decision six months later?"

LEVEL · SENIOR / STAFFDURATION · 45 MINFORMAT · WHITEBOARD

The junior answer stops at "a transaction table with a risk_score column," and the interviewer knows it within thirty seconds, because that schema cannot answer any of the questions that actually matter. It cannot tell you that the account you just blocked shares a payment instrument with forty-seven others. It cannot reproduce why you blocked the September 14 transaction when the regulator asks in March. And it cannot learn, because the chargeback that proves you were right arrives ninety days after the decision and has nowhere to land. The senior signal is naming the five orthogonal sub-problems first, and noticing that three of them — the graph, the frozen decision context, and the feedback loop — are the parts the naive schema has no room for.

So before any boxes and arrows, the working frame for the session — five problems pretending to be one model:

IDENTITY GRAPH
The same user is many rows. Three emails, two phones, four devices, five IPs. Build a graph with typed-strength edges — strong (shared payment instrument), medium (shared device), weak (shared IP within an hour) — not a row, and resolve it into connected components.
IMPOSSIBLE TRAVEL
Velocity, not parallelism. Login from Lagos, transaction from Berlin eighteen minutes later. Great-circle distance over elapsed time exceeds any plane — the canonical LEAD over session events.
SHARING vs ATO
Three things look identical in raw transactions. Account sharing, account takeover, and multi-account fraud all read as "identity ↔ account isn't 1:1." The schema must disambiguate them, or every rule produces the wrong false-positive class.
DECISION REPLAY
Defend a block six months later. Freeze the ruleset, the model version, and the feature snapshot at decision time. SCD2 everywhere, all three IDs stamped on every fact row — or "why did you block this customer?" becomes "we don't know."
FEEDBACK LOOP
Labels come back late, from two sources. Analyst verdicts in hours, chargebacks in 30–180 days, and they disagree. A union fact reconciles them into the training set of record — without it, the model degrades silently.
The feedback loop matters more than the model. A model with no labels coming back is a model quietly rotting — and you will not see it in any dashboard until the fraud is already paid out.

Scoping out loud

Scope is the first scored dimension, and most candidates skip it. State the disambiguation up front, because conflating the three "identity ≠ account" patterns is the classic pitfall. In scope: the typed identity graph and its connected-component rollup, the frozen decision context for replay, the velocity and sharing detectors, and the label-reconciliation feedback fact. Out of scope, said explicitly: the model training code itself (a consumer of these tables), the real-time feature store internals, the case-management UI, and the choice of graph backend — Neo4j versus JanusGraph versus a Postgres recursive CTE — which is an implementation detail the schema is deliberately agnostic to. The thread tying scope together is defensibility: every decision the system makes must be reproducible and every outcome must flow back as a label.

Then the envelope math, offered rather than extracted. Fintech-shaped numbers:

QuantityEstimateConsequence
Decision latency budget< 100 msGraph lookup must be precomputed, not walked live
Strong-edge cluster jobnightly + incrementalConnected components batched; new edges merged online
Decision-replay window6 monthsThe horizon that forces SCD2 on everything
Chargeback delay30–180 daysLabels are late-arriving facts — idempotent merge
Signal-fact volume1 row / rule firingHigh — partition by day, 90-day hot tier
Cluster expansion haul30–200× the seedOne mule surfaces a whole ring
Impossible-travel threshold> 900 km/hFaster than any plane ⇒ two identities

Notice which number forces the architecture. The six-month replay window is the constraint that turns "store a score" into "freeze the entire decision context" — the ruleset, the model, and the exact feature vector the model saw, all SCD2, all stamped on the fact. That single requirement dictates the dimensional design, the partition strategy, and the audit posture. The rest of this article follows the decision.


§ 02 — DATA FLOWFollowing a transaction through the decision

One decision path, two slower loops feeding it. The graph is precomputed and read in milliseconds; the decision freezes its context as it writes; and labels return on their own delayed schedule to retrain the next model. The score is the cheap part.

DECISION PATH · <100 MS · FREEZES ITS OWN CONTEXT FEEDBACK LOOP · LABELS RETURN IN HOURS-TO-MONTHS LOGIN / TXN user · device · ip · amount FEATURE ASSEMBLY velocity · device trust cluster lookup (precomputed) RULES + MODEL ruleset_id · model_id allow / step_up / block fact_transaction score · decision · 3 frozen IDs feature_snapshot_id fact_rule_firing 1 row / rule · score_delta FROZEN CONTEXT (SCD2) ruleset · model · snapshot identity_cluster connected components strong edges · nightly brg_user_* edges typed strength append-only fact_case / labels analyst · hours fact_chargeback 30–180 days · idempotent fact_outcome_label union · source of truth RETRAIN (weekly) writes new model_id cluster read · precomputed, <100ms new model → SCD2 dim SOLID — sustained data flow · DASHED — precomputed reads & the retrain loop · The graph is read, never walked, at decision time.
FIG. 1 — End-to-end flow. The decision reads a precomputed cluster and freezes its own context as it writes; labels return from two sources on a delay; the union feeds the retrain that mints the next model version.

Three properties of this picture do most of the interview's work. First, the cluster lookup on the decision path is a read of a precomputed component, never a live graph walk — at a sub-100ms budget you cannot traverse edges synchronously, so the connected-component job runs nightly and incrementally and the decision path simply reads a cluster id. Second, the decision freezes its own context as it writes: ruleset_id, model_id, and feature_snapshot_id land on the fact row in the same transaction as the score, because a context reconstructed later is a context you cannot trust. Third, the feedback arrows are slow and they close the loop: analyst labels in hours, chargebacks in months, both flowing into a union fact that the retrain reads — the loop, not the model, is what keeps the system from rotting.

The Failure Philosophy, In One Rule

A decision you cannot reproduce is a decision you cannot defend. Every fact row carries the three frozen IDs so that six months later the exact ruleset, the exact model SHA, and the exact feature vector are recoverable — not approximated from "what the rules probably were." Strong-edge clusters drive automated action; weak-edge graphs are advisory and go to an analyst, never to an auto-block, because a weak edge is a coincidence until a human says otherwise. The failure mode is a defensible, replayable record — never a score with no provenance.


§ 03 — DATA MODELA typed graph, a frozen context, a union of labels

The schema falls out of three demands: resolve identity as a graph, freeze the decision so it can be replayed, and reconcile two late label sources. Subject dimensions are SCD2; edges are append-only and typed; the decision context is frozen on the fact.

Subjects and locations — SCD2 on the risk-bearing attributes

The subject dimensions — dim_user, dim_device, dim_payment_instrument — are SCD2, but only on the attributes that bear risk and must be frozen at decision time: kyc_status, account_state, device_trust_score, bin_country. Location dimensions carry the third-party signals — is_vpn, is_proxy, is_datacenter, asn_risk_class — also SCD2, because an ASN's risk classification changes and a replay must use the classification that was current then.

DDL · SUBJECT & LOCATION DIMENSIONS (SCD2 ON RISK ATTRS)
CREATE TABLE dim_user ( user_sk BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, user_id BIGINT NOT NULL, email TEXT, phone TEXT, kyc_status TEXT NOT NULL, -- frozen at decision time risk_tier TEXT NOT NULL, account_state TEXT NOT NULL, effective_from TIMESTAMPTZ NOT NULL, effective_to TIMESTAMPTZ, is_current BOOLEAN NOT NULL DEFAULT true, UNIQUE (user_id, effective_from) ); CREATE TABLE dim_ip ( ip_sk BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, ip INET NOT NULL, city TEXT, country TEXT, asn INT, is_vpn BOOLEAN NOT NULL DEFAULT false, is_proxy BOOLEAN NOT NULL DEFAULT false, is_datacenter BOOLEAN NOT NULL DEFAULT false, asn_risk_class TEXT NOT NULL DEFAULT 'unknown', -- SCD2 from feed effective_from TIMESTAMPTZ NOT NULL, effective_to TIMESTAMPTZ );

The identity graph — typed-strength edges, append-only

This is the part the naive schema has no room for, and the part the interviewer is really probing. Identity is not a row; it is a graph. Edges connect users to the things they share — devices, IPs, payment instruments — and each edge carries a edge_strength that encodes how much that sharing means. Strong (1.0): the same payment instrument, the same KYC document hash, the same SIM. Medium (0.5): the same device fingerprint within twenty-four hours. Weak (0.2): the same IP within an hour, the same residential ASN within a week. The edges are append-only — you never un-observe a coincidence — and the strengths are what let the connected-component job draw the line between an action cluster and an advisory one.

DDL · IDENTITY-GRAPH EDGES + CLUSTER ROLLUP
-- Append-only typed edges. edge_strength is the whole game: it decides which -- edges form the ACTION cluster vs which merely advise the analyst. CREATE TABLE brg_user_device ( edge_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, user_a BIGINT NOT NULL, user_b BIGINT NOT NULL, edge_strength NUMERIC(2,1) NOT NULL, -- 1.0 strong / 0.5 medium / 0.2 weak via TEXT NOT NULL, -- 'device_fp' | 'payment' | 'ip' | 'sim' first_seen TIMESTAMPTZ NOT NULL, last_seen TIMESTAMPTZ NOT NULL, CHECK (user_a < user_b) -- canonical undirected ordering ); -- (brg_user_ip and brg_user_payment mirror this shape, different `via`.) -- The connected-component rollup over strong+medium edges (>= 0.5). -- Recomputed nightly; new edges merged incrementally during the day. CREATE TABLE identity_cluster ( cluster_id BIGINT NOT NULL, user_id BIGINT NOT NULL, cluster_score NUMERIC(5,2) NOT NULL, -- aggregate risk of the component member_count INT NOT NULL, computed_at TIMESTAMPTZ NOT NULL DEFAULT now(), PRIMARY KEY (cluster_id, user_id) );

Decision facts and the frozen context

The decision facts — fact_transaction, fact_login_attempt, fact_session_event — are one row per attempt and each carries the score, the decision, and the three frozen IDs. The frozen context lives in three SCD2 dimensions: dim_ruleset (the rules JSON and its effective window), dim_risk_model (the model SHA, training cutoff, feature-set hash), and dim_feature_snapshot (the exact feature vector the model saw, computed at decision time). Stamping all three IDs on the fact is what makes a six-month-old block replayable to the byte.

DDL · DECISION FACT + FROZEN CONTEXT (SCD2)
CREATE TABLE fact_transaction ( txn_id BIGINT NOT NULL, user_id BIGINT NOT NULL, payment_id BIGINT NOT NULL, amount NUMERIC(14,2) NOT NULL, currency TEXT NOT NULL, merchant TEXT, ip_id BIGINT NOT NULL, device_id BIGINT NOT NULL, ts TIMESTAMPTZ NOT NULL, risk_score NUMERIC(6,2) NOT NULL, decision TEXT NOT NULL -- allow | step_up | block CHECK (decision IN ('allow','step_up','block')), ruleset_id BIGINT NOT NULL, -- frozen: which rules ran model_id TEXT NOT NULL, -- frozen: which model scored feature_snapshot_id BIGINT NOT NULL, -- frozen: what it saw txn_date DATE NOT NULL, PRIMARY KEY (txn_date, txn_id) ) PARTITION BY RANGE (txn_date); -- The three frozen dims. SCD2 is ESSENTIAL — the only way to replay. CREATE TABLE dim_risk_model ( model_id TEXT PRIMARY KEY, -- 'v7.2' sha TEXT NOT NULL, training_cutoff TIMESTAMPTZ NOT NULL, feature_set_hash TEXT NOT NULL, effective_from TIMESTAMPTZ NOT NULL, effective_to TIMESTAMPTZ ); CREATE TABLE dim_feature_snapshot ( feature_snapshot_id BIGINT PRIMARY KEY, feature_vector_json JSONB NOT NULL, -- exact inputs at decision time computed_at TIMESTAMPTZ NOT NULL );

Two append-only signal facts trace why: fact_rule_firing (one row per rule that contributed, with its score_delta) and fact_model_score (the raw score, plus shadow-model scores for the next candidate). They are high-volume — partitioned by day, ninety-day hot tier — and they are what turn a single number into an explainable chain of contributions.

The feedback facts — and the union that reconciles them

Labels arrive from two places on two timelines and they disagree. fact_analyst_label records a human verdict — fraud, clean, inconclusive — within hours. fact_chargeback arrives 30–180 days later, idempotent because card networks retry. The reconciliation is fact_outcome_label: a union that resolves disagreements (analyst said clean, a chargeback came back ninety days later → final label is fraud) into the training-set source of truth.

DDL · FEEDBACK FACTS + THE OUTCOME UNION
CREATE TABLE fact_chargeback ( txn_id BIGINT PRIMARY KEY, -- idempotent: network retries dedup here network TEXT NOT NULL, reason_code TEXT NOT NULL, disputed_amount NUMERIC(14,2) NOT NULL, arrived_ts TIMESTAMPTZ NOT NULL -- 30–180 days after the txn ); -- The training set of record. Reconciles analyst + chargeback disagreements. CREATE TABLE fact_outcome_label ( txn_id BIGINT PRIMARY KEY, final_label TEXT NOT NULL -- fraud | clean CHECK (final_label IN ('fraud','clean')), source TEXT NOT NULL, -- 'analyst' | 'chargeback' | 'agreed' resolved_at TIMESTAMPTZ NOT NULL DEFAULT now() );

§ 04 — THE CORE INVARIANTEvery decision is replayable to the byte

Everything in this design protects one sentence: the exact inputs to any decision are recoverable six months later. The invariant lives in three frozen IDs stamped on the fact in the same write as the score.

The lifecycle of a decision is short, but its defense is long. Features are assembled, the rules and model produce a score, a decision is rendered, and — atomically — the fact is written with the ruleset, model, and feature-snapshot IDs frozen alongside it. Then come the labels: a case opens, an analyst rules, and months later a chargeback may or may not agree. The invariant says the decision's provenance never decays: the rules that ran, the model that scored, and the precise vector it saw are pinned, not reconstructed.

FEATURES ASSEMBLED SCORED DECIDED + FROZEN CASE / LABEL OUTCOME RECONCILED

The replay query is where correctness lives. Join the transaction to its frozen model and feature snapshot by the IDs on the fact, and you reproduce exactly what the model saw — not an approximation assembled from "what the features probably were that day." Without this, every "why did you block this customer?" audit ends in "we don't know," which in a regulated context is not an inconvenience but a liability.

SQL · DECISION REPLAY (REPRODUCE WHAT v7.2 SAW)
SELECT t.txn_id, t.risk_score, t.decision, t.model_id, m.sha, m.training_cutoff, r.rules_json, fs.feature_vector_json FROM fact_transaction t JOIN dim_risk_model m ON t.model_id = m.model_id JOIN dim_ruleset r ON t.ruleset_id = r.ruleset_id JOIN dim_feature_snapshot fs ON t.feature_snapshot_id = fs.feature_snapshot_id WHERE t.txn_id = :txn_id; -- Reproduces the EXACT inputs the model used 6 months ago. The three frozen -- IDs on the fact are the entire defense; reconstructed context is no defense.

The graph supplies the other half of correctness: the connected-component lookup that turns a per-account question into a per-cluster one. Clusters are formed on edges of strength ≥ 0.5 so that strong and medium ties draw the action graph while weak ties stay advisory. The component itself is computed off the hot path; the decision merely reads which cluster a user belongs to.

SQL · CONNECTED-COMPONENT MEMBERSHIP (READ AT DECISION TIME)
SELECT cluster_id, cluster_score, member_count FROM identity_cluster WHERE user_id = :user_id; -- O(1) read; the walk happened nightly -- Strong+medium edges (>= 0.5) form this component. Weak edges (0.2) are NOT -- here — they feed the analyst as "may be related," never an auto-block.
A risk score is an opinion until you can show the rules, the model, and the vector behind it. Freeze all three on the fact, or do not write the fact at all.FRAUD RULE Nº 1 — FREEZE THE CONTEXT

§ 05 — INGESTION & STREAMSPython on the decision path

Three programs carry the system: the scorer that freezes its context as it writes, the edge ingester that types every observed coincidence, and the chargeback merger that lands late labels idempotently. Each is small; the judgment is in what they refuse to do.

1 · The scorer — decide, then freeze, in one transaction

The scorer's one discipline is atomicity of provenance: the score and the three frozen IDs are written together, never in two steps. It assembles the feature vector, persists that vector as a snapshot, runs the rules and the model against the currently effective ruleset and model, renders the decision, and writes the fact with all three IDs in a single transaction. The thing it refuses to do is reconstruct context after the fact — a feature vector re-derived at audit time is not the vector the model saw, and pretending otherwise is the bug that loses the regulator's trust.

PYTHON · SCORER: DECIDE AND FREEZE ATOMICALLY
async def decide(db, txn: dict) -> str: """Assemble features, score, decide — and freeze the EXACT context on the fact in one transaction. The feature vector is persisted as a snapshot BEFORE the decision row references it, so replay is always exact.""" rs = await current_ruleset(db) # effective-now ruleset model = await current_model(db) # effective-now model feats = await assemble_features(db, txn) # incl. cluster_id lookup (precomputed) async with db.transaction(): snap_id = await db.fetchval( """INSERT INTO dim_feature_snapshot (feature_vector_json, computed_at) VALUES ($1, now()) RETURNING feature_snapshot_id""", feats.as_json(), # the exact inputs, frozen ) score, deltas = score_txn(rs, model, feats) decision = ("block" if score >= rs.block_threshold else "step_up" if score >= rs.step_up_threshold else "allow") await db.execute( """INSERT INTO fact_transaction (txn_id, user_id, payment_id, amount, currency, ip_id, device_id, ts, risk_score, decision, ruleset_id, model_id, feature_snapshot_id, txn_date) VALUES ($1,$2,$3,$4,$5,$6,$7,now(),$8,$9,$10,$11,$12,current_date)""", txn["txn_id"], txn["user_id"], txn["payment_id"], txn["amount"], txn["currency"], txn["ip_id"], txn["device_id"], score, decision, rs.ruleset_id, model.model_id, snap_id, # the three frozen IDs ) await db.executemany( """INSERT INTO fact_rule_firing (txn_id, rule_id, score_delta) VALUES ($1, $2, $3)""", [(txn["txn_id"], r, d) for r, d in deltas], # the explainable chain ) return decision

One distinction, always stated: the scorer reads the cluster id from the precomputed identity_cluster table; it never traverses edges live. The sub-100ms budget forbids a graph walk on the decision path, and the freshness cost — a cluster up to one nightly cycle stale, patched by the incremental merge — is far cheaper than the latency a live traversal would add.

2 · The edge ingester — type every coincidence

PYTHON · EDGE INGESTER (TYPED STRENGTH, APPEND-ONLY)
# Strength is assigned by HOW the two users are linked. Shared payment is # near-certain; shared IP within an hour is a coincidence until corroborated. STRENGTH = {"payment": 1.0, "kyc_doc": 1.0, "sim": 1.0, "device_fp": 0.5, "ip": 0.2, "residential_asn": 0.2} async def observe_edge(db, ua: int, ub: int, via: str, seen) -> None: """Record a typed edge between two users. Append-only: a coincidence is never un-observed. Canonical ordering (ua < ub) keeps the graph undirected and the edge idempotent on (user_a, user_b, via).""" if ua == ub: return ua, ub = sorted((ua, ub)) # canonical undirected pair await db.execute( """INSERT INTO brg_user_device (user_a, user_b, edge_strength, via, first_seen, last_seen) VALUES ($1, $2, $3, $4, $5, $5) ON CONFLICT (user_a, user_b, via) DO UPDATE SET last_seen = GREATEST(brg_user_device.last_seen, EXCLUDED.last_seen)""", ua, ub, STRENGTH[via], via, seen, )

3 · The chargeback merger — idempotent late labels

Chargebacks arrive months late and the networks retry them, so the merge must be idempotent — the same dispute landing twice is one row, not two. And it must reconcile: a chargeback that contradicts an earlier "clean" analyst verdict overrides it in fact_outcome_label, because money changing hands is the ground truth that an opinion is not.

PYTHON · CHARGEBACK MERGE → OUTCOME RECONCILIATION
async def merge_chargeback(db, cb: dict) -> None: """Land a late chargeback idempotently and reconcile the outcome label. A chargeback is ground truth: where it disagrees with an earlier analyst 'clean', the final label flips to fraud.""" async with db.transaction(): await db.execute( """INSERT INTO fact_chargeback (txn_id, network, reason_code, disputed_amount, arrived_ts) VALUES ($1,$2,$3,$4,$5) ON CONFLICT (txn_id) DO NOTHING""", # network retries dedup cb["txn_id"], cb["network"], cb["reason_code"], cb["disputed_amount"], cb["arrived_ts"], ) await db.execute( """INSERT INTO fact_outcome_label (txn_id, final_label, source) VALUES ($1, 'fraud', 'chargeback') ON CONFLICT (txn_id) DO UPDATE SET final_label = 'fraud', source = CASE WHEN fact_outcome_label.final_label = 'fraud' THEN 'agreed' ELSE 'chargeback' END""", cb["txn_id"], )

§ 06 — AGGREGATIONThe cluster job, and the moment you find the ring

Two slow layers do the heavy thinking. The nightly connected-component job turns millions of typed edges into clusters the decision path can read in O(1). And cluster expansion is the payoff: confirm one mule, and the graph hands you the other forty-seven.

The connected-component job is the aggregation that makes the graph usable. Walking edges live at decision time is impossible under a 100ms budget, so the job runs nightly over all edges of strength ≥ 0.5, assigns each maximal component a cluster_id, scores the component by the risk of its members, and writes the flat identity_cluster rollup. New edges observed during the day are merged incrementally — a new strong edge between two existing clusters unions them — so the decision path always reads a near-current component without ever traversing the graph itself.

PYTHON · NIGHTLY CONNECTED-COMPONENT JOB (UNION-FIND)
def build_clusters(edges) -> dict[int, int]: """Connected components over strong+medium edges (>= 0.5) via union-find. Runs nightly over the full edge set; the result is the flat membership the decision path reads in O(1). Weak edges (0.2) are deliberately EXCLUDED — they would collapse half the user base into one giant useless component.""" parent: dict[int, int] = {} def find(x): parent.setdefault(x, x) while parent[x] != x: parent[x] = parent[parent[x]] # path compression x = parent[x] return x def union(a, b): parent[find(a)] = find(b) for ua, ub, strength in edges: if strength >= 0.5: # the action-graph threshold union(ua, ub) return {u: find(u) for u in parent} # user_id → cluster_id (= root)

The expansion query is the operational climax of the whole system, and it is a recursive walk on strong edges out from a confirmed-fraud seed. When an analyst confirms one account as fraud, the system does not stop there — it expands outward up to four hops on edges of strength ≥ 0.5 and surfaces every account in the same component, most of which the analyst never knew were related. The typical haul is thirty to two hundred times the seed for mule rings, and none of those accounts had individually crossed the per-account threshold; only the cluster tied them together.

SQL · CLUSTER EXPANSION (FIND ONE MULE, FIND THE RING)
WITH RECURSIVE walk AS ( SELECT user_id, 0 AS hop FROM fact_analyst_label al JOIN fact_case c USING (case_id) WHERE c.case_id = :seed_case AND al.label = 'fraud' UNION SELECT CASE WHEN e.user_a = w.user_id THEN e.user_b ELSE e.user_a END, w.hop + 1 FROM walk w JOIN brg_user_device e ON w.user_id IN (e.user_a, e.user_b) WHERE e.edge_strength >= 0.5 AND w.hop < 4 -- strong edges, 4 hops out ) SELECT DISTINCT user_id, min(hop) AS hops_from_seed FROM walk GROUP BY user_id ORDER BY hops_from_seed; -- Typical haul: 30–200x the seed for mule rings, 2–5x for friendly fraud.
The threshold catches one account at a time. The graph catches the whole ring — accounts no single rule would ever have flagged.FRAUD RULE Nº 2 — ACT ON THE COMPONENT

§ 07 — ANALYTICS SQLInterrogating the system, by who asks

The decision facts, the graph, and the outcome union are where the system explains itself. Three queries an interviewer loves, each grouped by its stakeholder and each carrying a classic pattern: window LEAD for velocity, conditional aggregation for parallelism, and an SCD2-joined false-positive rate.

Detection engineer — impossible travel (window LEAD)

Impossible travel is about velocity, not parallelism: the same user appears in two geographies faster than any aircraft could carry them. The canonical move is LEAD over each user's session events ordered by time — pair each event with the next, compute great-circle distance over elapsed time, and flag anything exceeding 900 km/h.

SQL · IMPOSSIBLE-TRAVEL SIGNAL (LEAD OVER SESSIONS)
WITH paired AS ( SELECT s.user_id, s.ts AS ts_a, s.geo AS geo_a, lead(s.ts) OVER (PARTITION BY s.user_id ORDER BY s.ts) AS ts_b, lead(s.geo) OVER (PARTITION BY s.user_id ORDER BY s.ts) AS geo_b FROM fact_session_event s ) SELECT user_id, ts_a, geo_a, ts_b, geo_b, extract(epoch FROM (ts_b - ts_a)) / 60 AS gap_minutes, great_circle_km(geo_a, geo_b) AS distance_km FROM paired WHERE ts_b IS NOT NULL AND geo_a <> geo_b AND great_circle_km(geo_a, geo_b) / nullif(extract(epoch FROM (ts_b - ts_a)) / 3600, 0) > 900 ORDER BY gap_minutes; -- > 900 km/h ⇒ faster than any plane ⇒ two identities, not one traveler.

Detection engineer — account sharing (conditional aggregation)

Account sharing is the inverse shape: parallelism, not velocity. One user, many simultaneous devices and geographies — the Netflix-family or Spotify-share pattern. The detector counts distinct geos and distinct devices per user in a window; three-plus of each is a candidate for shared-account review, explicitly separated from impossible travel because the two have different false-positive classes and different remediations.

SQL · ACCOUNT-SHARING DETECTOR (PARALLELISM, NOT VELOCITY)
SELECT user_id, count(DISTINCT geo) AS distinct_geos_24h, count(DISTINCT device_id) AS distinct_devices_24h FROM fact_session_event WHERE ts >= now() - INTERVAL '24 hours' GROUP BY user_id HAVING count(DISTINCT geo) >= 3 AND count(DISTINCT device_id) >= 3; -- 3+ geos AND 3+ devices in 24h = shared-account candidate. Distinct from -- impossible-travel (velocity) and from ATO (one new device, one new geo).

Risk PM — false-positive rate by ruleset version (SCD2 as-of)

"Which ruleset version is over-blocking?" is the question that decides whether a rule ships or rolls back. Because the ruleset id is frozen on every fact and the final verdict lives in the outcome union, the false-positive rate per ruleset version is a left join from decisions to outcomes — blocks that the ground truth later called clean, divided by all blocks, grouped by the version that made them.

SQL · FALSE-POSITIVE RATE BY RULESET VERSION (LAST 30 D)
SELECT t.ruleset_id, r.rules_json -> 'name' AS ruleset_name, count(*) FILTER (WHERE t.decision = 'block') AS blocked, count(*) FILTER (WHERE t.decision = 'block' AND ol.final_label = 'clean') AS false_pos, round(100.0 * count(*) FILTER (WHERE t.decision = 'block' AND ol.final_label = 'clean') / nullif(count(*) FILTER (WHERE t.decision = 'block'), 0), 2) AS fp_pct FROM fact_transaction t JOIN dim_ruleset r ON t.ruleset_id = r.ruleset_id LEFT JOIN fact_outcome_label ol ON t.txn_id = ol.txn_id WHERE t.ts >= now() - INTERVAL '30 days' GROUP BY t.ruleset_id, r.rules_json -> 'name' ORDER BY fp_pct DESC;

§ 08 — THE DASHBOARDProving the system is honest

A senior design ends with observability, because a fraud system that cannot see its own false-positive rate is a liability with a UI. The dashboard watches three things: are we blocking the right things, is the loop closing, and can we still replay?

DECISION QUALITY
block rate, step-up rate, false-positive rate by ruleset version against the outcome union, and approval friction — every false positive is a real customer turned away, so a rising FP rate is a revenue alarm, not just a model one.
FEEDBACK HEALTH
label latency (analyst hours, chargeback days), outcome agreement % between the two sources, unlabeled-decision backlog, and chargeback rate on allowed txns — the leakage the model missed, the truest measure of the loop.
GRAPH & REPLAY
cluster job freshness, largest component size (a sudden giant means a weak edge leaked into the ≥ 0.5 set), expansion haul per confirmed seed, and replay coverage — the fraction of decisions whose three frozen IDs all resolve, which must be 100%.
Trust & Safety Ops — Decision Engine TUE 03:10 UTC · MODEL v7.2 · 60s REFRESH
Block Rate
1.9%
False-Positive Rate
4.1%
Step-Up Rate
3.4%
Chargeback Leakage
0.18%
Cluster expansion · accounts surfaced per confirmed seed (last 12 cases)
tallest bars = mule rings (~180x) · short = friendly fraud (~3x)
Replay Coverage
100%
Outcome Agreement
91%
Cluster Freshness
7h
Largest Component
412
Label Backlog
2.1k
Impossible-Travel Hits
340/d
FIG. 2 — The story a working loop tells: replay coverage pinned at 100% so every decision is defensible, expansion hauls spiking on mule rings, outcome agreement healthy at 91% — but a false-positive rate drifting amber and a largest-component creeping up, two signals that want investigating before the next model ships.

Read the amber and red tiles together and the dashboard narrates the two failure modes from §06 and §04. The largest component climbing toward 412 is the tell that a weak edge has leaked into the ≥ 0.5 action set — a single mislabeled device_fp strength can chain hundreds of unrelated users into one false cluster, so it is investigated before the cluster job runs again. The chargeback-leakage tile glowing red is the truest alarm in the room: it is fraud the model allowed, surfacing 30–180 days later, and it is the number that justifies the whole feedback apparatus. Replay coverage at 100% is the quiet hero — it means every one of these decisions can be defended to a regulator. That is what a designed, auditable fraud system looks like from the operator's chair at three in the morning.


§ 09 — THE RUBRICWhat was actually being tested

Strip the transactions away and the question was testing five judgments, each of which generalizes far beyond fraud:

GRAPH
Modeling identity as a typed-strength graph resolved into connected components, not a row — and knowing that strong edges drive action while weak edges only advise. The schema, not a rule, is what disambiguates sharing from takeover from multi-account abuse.
PROVENANCE
Freezing the ruleset, the model, and the feature vector on every fact so a six-month-old decision is replayable to the byte. Reconstructed context is no defense; stamped context is the whole defense.
VELOCITY vs PARALLELISM
Distinguishing impossible travel (one identity moving too fast) from account sharing (one account in too many places at once) — different windows, different patterns, different false-positive classes.
FEEDBACK
Closing the loop with a union of two late, disagreeing label sources — analyst verdicts in hours, chargebacks in months — into a single training-set source of truth. The loop is what keeps the model from rotting unseen.
PRECOMPUTATION
Reading the graph, never walking it, at decision time — the connected-component job runs off the hot path so a 100ms budget is honored, with incremental merges keeping the read near-current.
Stripe, Wise, Coinbase, PayPal all land on this shape. The differences are which graph backend, whether the cluster job is real-time or batch, and how aggressively the feature store separates online from offline — never whether you need the graph, the frozen context, and the loop.— CLOSING ARGUMENT