The Fraud Problem — PaddySpeaks

§ 01 — THE QUESTIONOne model, five problems

It sounds like a column. It is really a graph, a time machine, and a feedback loop — three systems the prompt hides inside the word "score."

Interview Prompt

"Design the data model for a fraud system at a fintech / marketplace / streaming product. How do you detect it, and how do you defend a decision six months later?"

LEVEL · SENIOR / STAFFDURATION · 45 MINFORMAT · WHITEBOARD

The junior answer stops at "a transaction table with a risk_score column," and the interviewer knows it within thirty seconds, because that schema cannot answer any of the questions that actually matter. It cannot tell you that the account you just blocked shares a payment instrument with forty-seven others. It cannot reproduce why you blocked the September 14 transaction when the regulator asks in March. And it cannot learn, because the chargeback that proves you were right arrives ninety days after the decision and has nowhere to land. The senior signal is naming the five orthogonal sub-problems first, and noticing that three of them — the graph, the frozen decision context, and the feedback loop — are the parts the naive schema has no room for.

So before any boxes and arrows, the working frame for the session — five problems pretending to be one model:

IDENTITY GRAPH: The same user is many rows. Three emails, two phones, four devices, five IPs. Build a graph with typed-strength edges — strong (shared payment instrument), medium (shared device), weak (shared IP within an hour) — not a row, and resolve it into connected components.
IMPOSSIBLE TRAVEL: Velocity, not parallelism. Login from Lagos, transaction from Berlin eighteen minutes later. Great-circle distance over elapsed time exceeds any plane — the canonical LEAD over session events.
SHARING vs ATO: Three things look identical in raw transactions. Account sharing, account takeover, and multi-account fraud all read as "identity ↔ account isn't 1:1." The schema must disambiguate them, or every rule produces the wrong false-positive class.
DECISION REPLAY: Defend a block six months later. Freeze the ruleset, the model version, and the feature snapshot at decision time. SCD2 everywhere, all three IDs stamped on every fact row — or "why did you block this customer?" becomes "we don't know."
FEEDBACK LOOP: Labels come back late, from two sources. Analyst verdicts in hours, chargebacks in 30–180 days, and they disagree. A union fact reconciles them into the training set of record — without it, the model degrades silently.

The feedback loop matters more than the model. A model with no labels coming back is a model quietly rotting — and you will not see it in any dashboard until the fraud is already paid out.

Scoping out loud

Scope is the first scored dimension, and most candidates skip it. State the disambiguation up front, because conflating the three "identity ≠ account" patterns is the classic pitfall. In scope: the typed identity graph and its connected-component rollup, the frozen decision context for replay, the velocity and sharing detectors, and the label-reconciliation feedback fact. Out of scope, said explicitly: the model training code itself (a consumer of these tables), the real-time feature store internals, the case-management UI, and the choice of graph backend — Neo4j versus JanusGraph versus a Postgres recursive CTE — which is an implementation detail the schema is deliberately agnostic to. The thread tying scope together is defensibility: every decision the system makes must be reproducible and every outcome must flow back as a label.

Then the envelope math, offered rather than extracted. Fintech-shaped numbers:

Quantity	Estimate	Consequence
Decision latency budget	< 100 ms	Graph lookup must be precomputed, not walked live
Strong-edge cluster job	nightly + incremental	Connected components batched; new edges merged online
Decision-replay window	6 months	The horizon that forces SCD2 on everything
Chargeback delay	30–180 days	Labels are late-arriving facts — idempotent merge
Signal-fact volume	1 row / rule firing	High — partition by day, 90-day hot tier
Cluster expansion haul	30–200× the seed	One mule surfaces a whole ring
Impossible-travel threshold	> 900 km/h	Faster than any plane ⇒ two identities

Notice which number forces the architecture. The six-month replay window is the constraint that turns "store a score" into "freeze the entire decision context" — the ruleset, the model, and the exact feature vector the model saw, all SCD2, all stamped on the fact. That single requirement dictates the dimensional design, the partition strategy, and the audit posture. The rest of this article follows the decision.

§ 02 — DATA FLOWFollowing a transaction through the decision

One decision path, two slower loops feeding it. The graph is precomputed and read in milliseconds; the decision freezes its context as it writes; and labels return on their own delayed schedule to retrain the next model. The score is the cheap part.

FIG. 1 — End-to-end flow. The decision reads a precomputed cluster and freezes its own context as it writes; labels return from two sources on a delay; the union feeds the retrain that mints the next model version.

Three properties of this picture do most of the interview's work. First, the cluster lookup on the decision path is a read of a precomputed component, never a live graph walk — at a sub-100ms budget you cannot traverse edges synchronously, so the connected-component job runs nightly and incrementally and the decision path simply reads a cluster id. Second, the decision freezes its own context as it writes: ruleset_id, model_id, and feature_snapshot_id land on the fact row in the same transaction as the score, because a context reconstructed later is a context you cannot trust. Third, the feedback arrows are slow and they close the loop: analyst labels in hours, chargebacks in months, both flowing into a union fact that the retrain reads — the loop, not the model, is what keeps the system from rotting.

The Failure Philosophy, In One Rule

A decision you cannot reproduce is a decision you cannot defend. Every fact row carries the three frozen IDs so that six months later the exact ruleset, the exact model SHA, and the exact feature vector are recoverable — not approximated from "what the rules probably were." Strong-edge clusters drive automated action; weak-edge graphs are advisory and go to an analyst, never to an auto-block, because a weak edge is a coincidence until a human says otherwise. The failure mode is a defensible, replayable record — never a score with no provenance.

§ 03 — DATA MODELA typed graph, a frozen context, a union of labels

The schema falls out of three demands: resolve identity as a graph, freeze the decision so it can be replayed, and reconcile two late label sources. Subject dimensions are SCD2; edges are append-only and typed; the decision context is frozen on the fact.

Subjects and locations — SCD2 on the risk-bearing attributes

The subject dimensions — dim_user, dim_device, dim_payment_instrument — are SCD2, but only on the attributes that bear risk and must be frozen at decision time: kyc_status, account_state, device_trust_score, bin_country. Location dimensions carry the third-party signals — is_vpn, is_proxy, is_datacenter, asn_risk_class — also SCD2, because an ASN's risk classification changes and a replay must use the classification that was current then.

DDL · SUBJECT & LOCATION DIMENSIONS (SCD2 ON RISK ATTRS)

CREATE TABLE dim_user (
    user_sk        BIGINT       GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    user_id        BIGINT       NOT NULL,
    email          TEXT,
    phone          TEXT,
    kyc_status     TEXT         NOT NULL,   -- frozen at decision time
    risk_tier      TEXT         NOT NULL,
    account_state  TEXT         NOT NULL,
    effective_from TIMESTAMPTZ  NOT NULL,
    effective_to   TIMESTAMPTZ,
    is_current     BOOLEAN      NOT NULL DEFAULT true,
    UNIQUE (user_id, effective_from)
);

CREATE TABLE dim_ip (
    ip_sk          BIGINT       GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    ip             INET         NOT NULL,
    city           TEXT, country TEXT, asn INT,
    is_vpn         BOOLEAN      NOT NULL DEFAULT false,
    is_proxy       BOOLEAN      NOT NULL DEFAULT false,
    is_datacenter  BOOLEAN      NOT NULL DEFAULT false,
    asn_risk_class TEXT         NOT NULL DEFAULT 'unknown',  -- SCD2 from feed
    effective_from TIMESTAMPTZ  NOT NULL,
    effective_to   TIMESTAMPTZ
);

The identity graph — typed-strength edges, append-only

This is the part the naive schema has no room for, and the part the interviewer is really probing. Identity is not a row; it is a graph. Edges connect users to the things they share — devices, IPs, payment instruments — and each edge carries a edge_strength that encodes how much that sharing means. Strong (1.0): the same payment instrument, the same KYC document hash, the same SIM. Medium (0.5): the same device fingerprint within twenty-four hours. Weak (0.2): the same IP within an hour, the same residential ASN within a week. The edges are append-only — you never un-observe a coincidence — and the strengths are what let the connected-component job draw the line between an action cluster and an advisory one.

DDL · IDENTITY-GRAPH EDGES + CLUSTER ROLLUP

-- Append-only typed edges. edge_strength is the whole game: it decides which
-- edges form the ACTION cluster vs which merely advise the analyst.
CREATE TABLE brg_user_device (
    edge_id        BIGINT       GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    user_a         BIGINT       NOT NULL,
    user_b         BIGINT       NOT NULL,
    edge_strength  NUMERIC(2,1) NOT NULL,   -- 1.0 strong / 0.5 medium / 0.2 weak
    via            TEXT         NOT NULL,   -- 'device_fp' | 'payment' | 'ip' | 'sim'
    first_seen     TIMESTAMPTZ  NOT NULL,
    last_seen      TIMESTAMPTZ  NOT NULL,
    CHECK (user_a < user_b)                 -- canonical undirected ordering
);
-- (brg_user_ip and brg_user_payment mirror this shape, different `via`.)

-- The connected-component rollup over strong+medium edges (>= 0.5).
-- Recomputed nightly; new edges merged incrementally during the day.
CREATE TABLE identity_cluster (
    cluster_id     BIGINT       NOT NULL,
    user_id        BIGINT       NOT NULL,
    cluster_score  NUMERIC(5,2) NOT NULL,   -- aggregate risk of the component
    member_count   INT          NOT NULL,
    computed_at    TIMESTAMPTZ  NOT NULL DEFAULT now(),
    PRIMARY KEY (cluster_id, user_id)
);

Decision facts and the frozen context

The decision facts — fact_transaction, fact_login_attempt, fact_session_event — are one row per attempt and each carries the score, the decision, and the three frozen IDs. The frozen context lives in three SCD2 dimensions: dim_ruleset (the rules JSON and its effective window), dim_risk_model (the model SHA, training cutoff, feature-set hash), and dim_feature_snapshot (the exact feature vector the model saw, computed at decision time). Stamping all three IDs on the fact is what makes a six-month-old block replayable to the byte.

DDL · DECISION FACT + FROZEN CONTEXT (SCD2)

CREATE TABLE fact_transaction (
    txn_id              BIGINT       NOT NULL,
    user_id             BIGINT       NOT NULL,
    payment_id          BIGINT       NOT NULL,
    amount              NUMERIC(14,2) NOT NULL,
    currency            TEXT         NOT NULL,
    merchant            TEXT,
    ip_id               BIGINT       NOT NULL,
    device_id           BIGINT       NOT NULL,
    ts                  TIMESTAMPTZ  NOT NULL,
    risk_score          NUMERIC(6,2) NOT NULL,
    decision            TEXT         NOT NULL      -- allow | step_up | block
                        CHECK (decision IN ('allow','step_up','block')),
    ruleset_id          BIGINT       NOT NULL,      -- frozen: which rules ran
    model_id            TEXT         NOT NULL,      -- frozen: which model scored
    feature_snapshot_id BIGINT       NOT NULL,      -- frozen: what it saw
    txn_date            DATE         NOT NULL,
    PRIMARY KEY (txn_date, txn_id)
) PARTITION BY RANGE (txn_date);

-- The three frozen dims. SCD2 is ESSENTIAL — the only way to replay.
CREATE TABLE dim_risk_model (
    model_id        TEXT         PRIMARY KEY,   -- 'v7.2'
    sha             TEXT         NOT NULL,
    training_cutoff TIMESTAMPTZ  NOT NULL,
    feature_set_hash TEXT        NOT NULL,
    effective_from  TIMESTAMPTZ  NOT NULL,
    effective_to    TIMESTAMPTZ
);

CREATE TABLE dim_feature_snapshot (
    feature_snapshot_id BIGINT      PRIMARY KEY,
    feature_vector_json JSONB       NOT NULL,    -- exact inputs at decision time
    computed_at         TIMESTAMPTZ NOT NULL
);

Two append-only signal facts trace why: fact_rule_firing (one row per rule that contributed, with its score_delta) and fact_model_score (the raw score, plus shadow-model scores for the next candidate). They are high-volume — partitioned by day, ninety-day hot tier — and they are what turn a single number into an explainable chain of contributions.

The feedback facts — and the union that reconciles them

Labels arrive from two places on two timelines and they disagree. fact_analyst_label records a human verdict — fraud, clean, inconclusive — within hours. fact_chargeback arrives 30–180 days later, idempotent because card networks retry. The reconciliation is fact_outcome_label: a union that resolves disagreements (analyst said clean, a chargeback came back ninety days later → final label is fraud) into the training-set source of truth.

DDL · FEEDBACK FACTS + THE OUTCOME UNION

CREATE TABLE fact_chargeback (
    txn_id          BIGINT       PRIMARY KEY,   -- idempotent: network retries dedup here
    network         TEXT         NOT NULL,
    reason_code     TEXT         NOT NULL,
    disputed_amount NUMERIC(14,2) NOT NULL,
    arrived_ts      TIMESTAMPTZ  NOT NULL       -- 30–180 days after the txn
);

-- The training set of record. Reconciles analyst + chargeback disagreements.
CREATE TABLE fact_outcome_label (
    txn_id          BIGINT       PRIMARY KEY,
    final_label     TEXT         NOT NULL      -- fraud | clean
                    CHECK (final_label IN ('fraud','clean')),
    source          TEXT         NOT NULL,     -- 'analyst' | 'chargeback' | 'agreed'
    resolved_at     TIMESTAMPTZ  NOT NULL DEFAULT now()
);

§ 04 — THE CORE INVARIANTEvery decision is replayable to the byte

Everything in this design protects one sentence: the exact inputs to any decision are recoverable six months later. The invariant lives in three frozen IDs stamped on the fact in the same write as the score.

The lifecycle of a decision is short, but its defense is long. Features are assembled, the rules and model produce a score, a decision is rendered, and — atomically — the fact is written with the ruleset, model, and feature-snapshot IDs frozen alongside it. Then come the labels: a case opens, an analyst rules, and months later a chargeback may or may not agree. The invariant says the decision's provenance never decays: the rules that ran, the model that scored, and the precise vector it saw are pinned, not reconstructed.

FEATURES ASSEMBLED→ SCORED→ DECIDED + FROZEN→ CASE / LABEL→ OUTCOME RECONCILED

The replay query is where correctness lives. Join the transaction to its frozen model and feature snapshot by the IDs on the fact, and you reproduce exactly what the model saw — not an approximation assembled from "what the features probably were that day." Without this, every "why did you block this customer?" audit ends in "we don't know," which in a regulated context is not an inconvenience but a liability.

SQL · DECISION REPLAY (REPRODUCE WHAT v7.2 SAW)

SELECT t.txn_id, t.risk_score, t.decision,
       t.model_id, m.sha, m.training_cutoff,
       r.rules_json,
       fs.feature_vector_json
FROM   fact_transaction t
JOIN   dim_risk_model       m  ON t.model_id = m.model_id
JOIN   dim_ruleset          r  ON t.ruleset_id = r.ruleset_id
JOIN   dim_feature_snapshot fs ON t.feature_snapshot_id = fs.feature_snapshot_id
WHERE  t.txn_id = :txn_id;
-- Reproduces the EXACT inputs the model used 6 months ago. The three frozen
-- IDs on the fact are the entire defense; reconstructed context is no defense.

The graph supplies the other half of correctness: the connected-component lookup that turns a per-account question into a per-cluster one. Clusters are formed on edges of strength ≥ 0.5 so that strong and medium ties draw the action graph while weak ties stay advisory. The component itself is computed off the hot path; the decision merely reads which cluster a user belongs to.

SQL · CONNECTED-COMPONENT MEMBERSHIP (READ AT DECISION TIME)

SELECT cluster_id, cluster_score, member_count
FROM   identity_cluster
WHERE  user_id = :user_id;            -- O(1) read; the walk happened nightly
-- Strong+medium edges (>= 0.5) form this component. Weak edges (0.2) are NOT
-- here — they feed the analyst as "may be related," never an auto-block.

A risk score is an opinion until you can show the rules, the model, and the vector behind it. Freeze all three on the fact, or do not write the fact at all.FRAUD RULE Nº 1 — FREEZE THE CONTEXT

§ 05 — INGESTION & STREAMSPython on the decision path

Three programs carry the system: the scorer that freezes its context as it writes, the edge ingester that types every observed coincidence, and the chargeback merger that lands late labels idempotently. Each is small; the judgment is in what they refuse to do.

1 · The scorer — decide, then freeze, in one transaction

The scorer's one discipline is atomicity of provenance: the score and the three frozen IDs are written together, never in two steps. It assembles the feature vector, persists that vector as a snapshot, runs the rules and the model against the currently effective ruleset and model, renders the decision, and writes the fact with all three IDs in a single transaction. The thing it refuses to do is reconstruct context after the fact — a feature vector re-derived at audit time is not the vector the model saw, and pretending otherwise is the bug that loses the regulator's trust.

PYTHON · SCORER: DECIDE AND FREEZE ATOMICALLY

async def decide(db, txn: dict) -> str:
    """Assemble features, score, decide — and freeze the EXACT context on the
    fact in one transaction. The feature vector is persisted as a snapshot
    BEFORE the decision row references it, so replay is always exact."""
    rs    = await current_ruleset(db)            # effective-now ruleset
    model = await current_model(db)               # effective-now model
    feats = await assemble_features(db, txn)      # incl. cluster_id lookup (precomputed)

    async with db.transaction():
        snap_id = await db.fetchval(
            """INSERT INTO dim_feature_snapshot (feature_vector_json, computed_at)
               VALUES ($1, now()) RETURNING feature_snapshot_id""",
            feats.as_json(),                       # the exact inputs, frozen
        )
        score, deltas = score_txn(rs, model, feats)
        decision = ("block"    if score >= rs.block_threshold
                    else "step_up" if score >= rs.step_up_threshold
                    else "allow")
        await db.execute(
            """INSERT INTO fact_transaction
                 (txn_id, user_id, payment_id, amount, currency, ip_id,
                  device_id, ts, risk_score, decision,
                  ruleset_id, model_id, feature_snapshot_id, txn_date)
               VALUES ($1,$2,$3,$4,$5,$6,$7,now(),$8,$9,$10,$11,$12,current_date)""",
            txn["txn_id"], txn["user_id"], txn["payment_id"], txn["amount"],
            txn["currency"], txn["ip_id"], txn["device_id"], score, decision,
            rs.ruleset_id, model.model_id, snap_id,            # the three frozen IDs
        )
        await db.executemany(
            """INSERT INTO fact_rule_firing (txn_id, rule_id, score_delta)
               VALUES ($1, $2, $3)""",
            [(txn["txn_id"], r, d) for r, d in deltas],   # the explainable chain
        )
    return decision

One distinction, always stated: the scorer reads the cluster id from the precomputed identity_cluster table; it never traverses edges live. The sub-100ms budget forbids a graph walk on the decision path, and the freshness cost — a cluster up to one nightly cycle stale, patched by the incremental merge — is far cheaper than the latency a live traversal would add.

2 · The edge ingester — type every coincidence

PYTHON · EDGE INGESTER (TYPED STRENGTH, APPEND-ONLY)

# Strength is assigned by HOW the two users are linked. Shared payment is
# near-certain; shared IP within an hour is a coincidence until corroborated.
STRENGTH = {"payment": 1.0, "kyc_doc": 1.0, "sim": 1.0,
            "device_fp": 0.5, "ip": 0.2, "residential_asn": 0.2}

async def observe_edge(db, ua: int, ub: int, via: str, seen) -> None:
    """Record a typed edge between two users. Append-only: a coincidence is
    never un-observed. Canonical ordering (ua < ub) keeps the graph undirected
    and the edge idempotent on (user_a, user_b, via)."""
    if ua == ub:
        return
    ua, ub = sorted((ua, ub))                       # canonical undirected pair
    await db.execute(
        """INSERT INTO brg_user_device (user_a, user_b, edge_strength, via,
                                       first_seen, last_seen)
           VALUES ($1, $2, $3, $4, $5, $5)
           ON CONFLICT (user_a, user_b, via)
           DO UPDATE SET last_seen = GREATEST(brg_user_device.last_seen,
                                              EXCLUDED.last_seen)""",
        ua, ub, STRENGTH[via], via, seen,
    )

3 · The chargeback merger — idempotent late labels

Chargebacks arrive months late and the networks retry them, so the merge must be idempotent — the same dispute landing twice is one row, not two. And it must reconcile: a chargeback that contradicts an earlier "clean" analyst verdict overrides it in fact_outcome_label, because money changing hands is the ground truth that an opinion is not.

PYTHON · CHARGEBACK MERGE → OUTCOME RECONCILIATION

async def merge_chargeback(db, cb: dict) -> None:
    """Land a late chargeback idempotently and reconcile the outcome label.
    A chargeback is ground truth: where it disagrees with an earlier analyst
    'clean', the final label flips to fraud."""
    async with db.transaction():
        await db.execute(
            """INSERT INTO fact_chargeback
                 (txn_id, network, reason_code, disputed_amount, arrived_ts)
               VALUES ($1,$2,$3,$4,$5)
               ON CONFLICT (txn_id) DO NOTHING""",             # network retries dedup
            cb["txn_id"], cb["network"], cb["reason_code"],
            cb["disputed_amount"], cb["arrived_ts"],
        )
        await db.execute(
            """INSERT INTO fact_outcome_label (txn_id, final_label, source)
               VALUES ($1, 'fraud', 'chargeback')
               ON CONFLICT (txn_id) DO UPDATE
                 SET final_label = 'fraud',
                     source = CASE WHEN fact_outcome_label.final_label = 'fraud'
                                   THEN 'agreed' ELSE 'chargeback' END""",
            cb["txn_id"],
        )

§ 06 — AGGREGATIONThe cluster job, and the moment you find the ring

Two slow layers do the heavy thinking. The nightly connected-component job turns millions of typed edges into clusters the decision path can read in O(1). And cluster expansion is the payoff: confirm one mule, and the graph hands you the other forty-seven.

The connected-component job is the aggregation that makes the graph usable. Walking edges live at decision time is impossible under a 100ms budget, so the job runs nightly over all edges of strength ≥ 0.5, assigns each maximal component a cluster_id, scores the component by the risk of its members, and writes the flat identity_cluster rollup. New edges observed during the day are merged incrementally — a new strong edge between two existing clusters unions them — so the decision path always reads a near-current component without ever traversing the graph itself.

PYTHON · NIGHTLY CONNECTED-COMPONENT JOB (UNION-FIND)

def build_clusters(edges) -> dict[int, int]:
    """Connected components over strong+medium edges (>= 0.5) via union-find.
    Runs nightly over the full edge set; the result is the flat membership the
    decision path reads in O(1). Weak edges (0.2) are deliberately EXCLUDED —
    they would collapse half the user base into one giant useless component."""
    parent: dict[int, int] = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]          # path compression
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for ua, ub, strength in edges:
        if strength >= 0.5:                          # the action-graph threshold
            union(ua, ub)
    return {u: find(u) for u in parent}        # user_id → cluster_id (= root)

The expansion query is the operational climax of the whole system, and it is a recursive walk on strong edges out from a confirmed-fraud seed. When an analyst confirms one account as fraud, the system does not stop there — it expands outward up to four hops on edges of strength ≥ 0.5 and surfaces every account in the same component, most of which the analyst never knew were related. The typical haul is thirty to two hundred times the seed for mule rings, and none of those accounts had individually crossed the per-account threshold; only the cluster tied them together.

SQL · CLUSTER EXPANSION (FIND ONE MULE, FIND THE RING)

WITH RECURSIVE walk AS (
    SELECT user_id, 0 AS hop
    FROM   fact_analyst_label al
    JOIN   fact_case c USING (case_id)
    WHERE  c.case_id = :seed_case AND al.label = 'fraud'
    UNION
    SELECT CASE WHEN e.user_a = w.user_id THEN e.user_b ELSE e.user_a END,
           w.hop + 1
    FROM   walk w
    JOIN   brg_user_device e ON w.user_id IN (e.user_a, e.user_b)
    WHERE  e.edge_strength >= 0.5 AND w.hop < 4    -- strong edges, 4 hops out
)
SELECT DISTINCT user_id, min(hop) AS hops_from_seed
FROM   walk
GROUP BY user_id
ORDER BY hops_from_seed;
-- Typical haul: 30–200x the seed for mule rings, 2–5x for friendly fraud.

The threshold catches one account at a time. The graph catches the whole ring — accounts no single rule would ever have flagged.FRAUD RULE Nº 2 — ACT ON THE COMPONENT

§ 07 — ANALYTICS SQLInterrogating the system, by who asks

The decision facts, the graph, and the outcome union are where the system explains itself. Three queries an interviewer loves, each grouped by its stakeholder and each carrying a classic pattern: window LEAD for velocity, conditional aggregation for parallelism, and an SCD2-joined false-positive rate.

Detection engineer — impossible travel (window LEAD)

Impossible travel is about velocity, not parallelism: the same user appears in two geographies faster than any aircraft could carry them. The canonical move is LEAD over each user's session events ordered by time — pair each event with the next, compute great-circle distance over elapsed time, and flag anything exceeding 900 km/h.

SQL · IMPOSSIBLE-TRAVEL SIGNAL (LEAD OVER SESSIONS)

WITH paired AS (
    SELECT s.user_id, s.ts AS ts_a, s.geo AS geo_a,
           lead(s.ts)  OVER (PARTITION BY s.user_id ORDER BY s.ts) AS ts_b,
           lead(s.geo) OVER (PARTITION BY s.user_id ORDER BY s.ts) AS geo_b
    FROM   fact_session_event s
)
SELECT user_id, ts_a, geo_a, ts_b, geo_b,
       extract(epoch FROM (ts_b - ts_a)) / 60        AS gap_minutes,
       great_circle_km(geo_a, geo_b)                  AS distance_km
FROM   paired
WHERE  ts_b IS NOT NULL AND geo_a <> geo_b
  AND  great_circle_km(geo_a, geo_b)
       / nullif(extract(epoch FROM (ts_b - ts_a)) / 3600, 0) > 900
ORDER BY gap_minutes;
-- > 900 km/h ⇒ faster than any plane ⇒ two identities, not one traveler.

Detection engineer — account sharing (conditional aggregation)

Account sharing is the inverse shape: parallelism, not velocity. One user, many simultaneous devices and geographies — the Netflix-family or Spotify-share pattern. The detector counts distinct geos and distinct devices per user in a window; three-plus of each is a candidate for shared-account review, explicitly separated from impossible travel because the two have different false-positive classes and different remediations.

SQL · ACCOUNT-SHARING DETECTOR (PARALLELISM, NOT VELOCITY)

SELECT user_id,
       count(DISTINCT geo)       AS distinct_geos_24h,
       count(DISTINCT device_id) AS distinct_devices_24h
FROM   fact_session_event
WHERE  ts >= now() - INTERVAL '24 hours'
GROUP BY user_id
HAVING count(DISTINCT geo) >= 3
   AND count(DISTINCT device_id) >= 3;
-- 3+ geos AND 3+ devices in 24h = shared-account candidate. Distinct from
-- impossible-travel (velocity) and from ATO (one new device, one new geo).

Risk PM — false-positive rate by ruleset version (SCD2 as-of)

"Which ruleset version is over-blocking?" is the question that decides whether a rule ships or rolls back. Because the ruleset id is frozen on every fact and the final verdict lives in the outcome union, the false-positive rate per ruleset version is a left join from decisions to outcomes — blocks that the ground truth later called clean, divided by all blocks, grouped by the version that made them.

SQL · FALSE-POSITIVE RATE BY RULESET VERSION (LAST 30 D)

SELECT t.ruleset_id, r.rules_json -> 'name' AS ruleset_name,
       count(*) FILTER (WHERE t.decision = 'block') AS blocked,
       count(*) FILTER (WHERE t.decision = 'block'
                          AND ol.final_label = 'clean')   AS false_pos,
       round(100.0 * count(*) FILTER (WHERE t.decision = 'block'
                                       AND ol.final_label = 'clean')
             / nullif(count(*) FILTER (WHERE t.decision = 'block'), 0), 2)
                                                       AS fp_pct
FROM   fact_transaction t
JOIN   dim_ruleset r ON t.ruleset_id = r.ruleset_id
LEFT JOIN fact_outcome_label ol ON t.txn_id = ol.txn_id
WHERE  t.ts >= now() - INTERVAL '30 days'
GROUP BY t.ruleset_id, r.rules_json -> 'name'
ORDER BY fp_pct DESC;

§ 08 — THE DASHBOARDProving the system is honest

A senior design ends with observability, because a fraud system that cannot see its own false-positive rate is a liability with a UI. The dashboard watches three things: are we blocking the right things, is the loop closing, and can we still replay?

DECISION QUALITY: block rate, step-up rate, false-positive rate by ruleset version against the outcome union, and approval friction — every false positive is a real customer turned away, so a rising FP rate is a revenue alarm, not just a model one.
FEEDBACK HEALTH: label latency (analyst hours, chargeback days), outcome agreement % between the two sources, unlabeled-decision backlog, and chargeback rate on allowed txns — the leakage the model missed, the truest measure of the loop.
GRAPH & REPLAY: cluster job freshness, largest component size (a sudden giant means a weak edge leaked into the ≥ 0.5 set), expansion haul per confirmed seed, and replay coverage — the fraction of decisions whose three frozen IDs all resolve, which must be 100%.

Trust & Safety Ops — Decision Engine TUE 03:10 UTC · MODEL v7.2 · 60s REFRESH

Block Rate

1.9%

False-Positive Rate

4.1%

Step-Up Rate

3.4%

Chargeback Leakage

0.18%

Cluster expansion · accounts surfaced per confirmed seed (last 12 cases)

Replay Coverage

100%

Outcome Agreement

91%

Cluster Freshness

Largest Component

412

Label Backlog

2.1k

Impossible-Travel Hits

340/d

FIG. 2 — The story a working loop tells: replay coverage pinned at 100% so every decision is defensible, expansion hauls spiking on mule rings, outcome agreement healthy at 91% — but a false-positive rate drifting amber and a largest-component creeping up, two signals that want investigating before the next model ships.

Read the amber and red tiles together and the dashboard narrates the two failure modes from §06 and §04. The largest component climbing toward 412 is the tell that a weak edge has leaked into the ≥ 0.5 action set — a single mislabeled device_fp strength can chain hundreds of unrelated users into one false cluster, so it is investigated before the cluster job runs again. The chargeback-leakage tile glowing red is the truest alarm in the room: it is fraud the model allowed, surfacing 30–180 days later, and it is the number that justifies the whole feedback apparatus. Replay coverage at 100% is the quiet hero — it means every one of these decisions can be defended to a regulator. That is what a designed, auditable fraud system looks like from the operator's chair at three in the morning.

§ 09 — THE RUBRICWhat was actually being tested

Strip the transactions away and the question was testing five judgments, each of which generalizes far beyond fraud:

GRAPH: Modeling identity as a typed-strength graph resolved into connected components, not a row — and knowing that strong edges drive action while weak edges only advise. The schema, not a rule, is what disambiguates sharing from takeover from multi-account abuse.
PROVENANCE: Freezing the ruleset, the model, and the feature vector on every fact so a six-month-old decision is replayable to the byte. Reconstructed context is no defense; stamped context is the whole defense.
VELOCITY vs PARALLELISM: Distinguishing impossible travel (one identity moving too fast) from account sharing (one account in too many places at once) — different windows, different patterns, different false-positive classes.
FEEDBACK: Closing the loop with a union of two late, disagreeing label sources — analyst verdicts in hours, chargebacks in months — into a single training-set source of truth. The loop is what keeps the model from rotting unseen.
PRECOMPUTATION: Reading the graph, never walking it, at decision time — the connected-component job runs off the hot path so a 100ms budget is honored, with incremental merges keeping the read near-current.

Stripe, Wise, Coinbase, PayPal all land on this shape. The differences are which graph backend, whether the cluster job is real-time or batch, and how aggressively the feature store separates online from offline — never whether you need the graph, the frozen context, and the loop.— CLOSING ARGUMENT