Model the data behind a fraud system at a fintech or marketplace. One bad actor wears three emails, two phones, four devices, five IPs — and a chargeback you have to defend to a regulator arrives six months after you blocked it. The one decision that separates a senior answer: fraud is a graph, frozen in time, with a feedback loop. A complete working-through — data flow, schema, the cluster expansion, the replay SQL, and the dashboard.
It sounds like a column. It is really a graph, a time machine, and a feedback loop — three systems the prompt hides inside the word "score."
"Design the data model for a fraud system at a fintech / marketplace / streaming product. How do you detect it, and how do you defend a decision six months later?"
The junior answer stops at "a transaction table with a risk_score column," and the interviewer knows it within thirty seconds, because that schema cannot answer any of the questions that actually matter. It cannot tell you that the account you just blocked shares a payment instrument with forty-seven others. It cannot reproduce why you blocked the September 14 transaction when the regulator asks in March. And it cannot learn, because the chargeback that proves you were right arrives ninety days after the decision and has nowhere to land. The senior signal is naming the five orthogonal sub-problems first, and noticing that three of them — the graph, the frozen decision context, and the feedback loop — are the parts the naive schema has no room for.
So before any boxes and arrows, the working frame for the session — five problems pretending to be one model:
Scope is the first scored dimension, and most candidates skip it. State the disambiguation up front, because conflating the three "identity ≠ account" patterns is the classic pitfall. In scope: the typed identity graph and its connected-component rollup, the frozen decision context for replay, the velocity and sharing detectors, and the label-reconciliation feedback fact. Out of scope, said explicitly: the model training code itself (a consumer of these tables), the real-time feature store internals, the case-management UI, and the choice of graph backend — Neo4j versus JanusGraph versus a Postgres recursive CTE — which is an implementation detail the schema is deliberately agnostic to. The thread tying scope together is defensibility: every decision the system makes must be reproducible and every outcome must flow back as a label.
Then the envelope math, offered rather than extracted. Fintech-shaped numbers:
| Quantity | Estimate | Consequence |
|---|---|---|
| Decision latency budget | < 100 ms | Graph lookup must be precomputed, not walked live |
| Strong-edge cluster job | nightly + incremental | Connected components batched; new edges merged online |
| Decision-replay window | 6 months | The horizon that forces SCD2 on everything |
| Chargeback delay | 30–180 days | Labels are late-arriving facts — idempotent merge |
| Signal-fact volume | 1 row / rule firing | High — partition by day, 90-day hot tier |
| Cluster expansion haul | 30–200× the seed | One mule surfaces a whole ring |
| Impossible-travel threshold | > 900 km/h | Faster than any plane ⇒ two identities |
Notice which number forces the architecture. The six-month replay window is the constraint that turns "store a score" into "freeze the entire decision context" — the ruleset, the model, and the exact feature vector the model saw, all SCD2, all stamped on the fact. That single requirement dictates the dimensional design, the partition strategy, and the audit posture. The rest of this article follows the decision.
One decision path, two slower loops feeding it. The graph is precomputed and read in milliseconds; the decision freezes its context as it writes; and labels return on their own delayed schedule to retrain the next model. The score is the cheap part.
Three properties of this picture do most of the interview's work. First, the cluster lookup on the decision path is a read of a precomputed component, never a live graph walk — at a sub-100ms budget you cannot traverse edges synchronously, so the connected-component job runs nightly and incrementally and the decision path simply reads a cluster id. Second, the decision freezes its own context as it writes: ruleset_id, model_id, and feature_snapshot_id land on the fact row in the same transaction as the score, because a context reconstructed later is a context you cannot trust. Third, the feedback arrows are slow and they close the loop: analyst labels in hours, chargebacks in months, both flowing into a union fact that the retrain reads — the loop, not the model, is what keeps the system from rotting.
A decision you cannot reproduce is a decision you cannot defend. Every fact row carries the three frozen IDs so that six months later the exact ruleset, the exact model SHA, and the exact feature vector are recoverable — not approximated from "what the rules probably were." Strong-edge clusters drive automated action; weak-edge graphs are advisory and go to an analyst, never to an auto-block, because a weak edge is a coincidence until a human says otherwise. The failure mode is a defensible, replayable record — never a score with no provenance.
The schema falls out of three demands: resolve identity as a graph, freeze the decision so it can be replayed, and reconcile two late label sources. Subject dimensions are SCD2; edges are append-only and typed; the decision context is frozen on the fact.
The subject dimensions — dim_user, dim_device, dim_payment_instrument — are SCD2, but only on the attributes that bear risk and must be frozen at decision time: kyc_status, account_state, device_trust_score, bin_country. Location dimensions carry the third-party signals — is_vpn, is_proxy, is_datacenter, asn_risk_class — also SCD2, because an ASN's risk classification changes and a replay must use the classification that was current then.
This is the part the naive schema has no room for, and the part the interviewer is really probing. Identity is not a row; it is a graph. Edges connect users to the things they share — devices, IPs, payment instruments — and each edge carries a edge_strength that encodes how much that sharing means. Strong (1.0): the same payment instrument, the same KYC document hash, the same SIM. Medium (0.5): the same device fingerprint within twenty-four hours. Weak (0.2): the same IP within an hour, the same residential ASN within a week. The edges are append-only — you never un-observe a coincidence — and the strengths are what let the connected-component job draw the line between an action cluster and an advisory one.
The decision facts — fact_transaction, fact_login_attempt, fact_session_event — are one row per attempt and each carries the score, the decision, and the three frozen IDs. The frozen context lives in three SCD2 dimensions: dim_ruleset (the rules JSON and its effective window), dim_risk_model (the model SHA, training cutoff, feature-set hash), and dim_feature_snapshot (the exact feature vector the model saw, computed at decision time). Stamping all three IDs on the fact is what makes a six-month-old block replayable to the byte.
Two append-only signal facts trace why: fact_rule_firing (one row per rule that contributed, with its score_delta) and fact_model_score (the raw score, plus shadow-model scores for the next candidate). They are high-volume — partitioned by day, ninety-day hot tier — and they are what turn a single number into an explainable chain of contributions.
Labels arrive from two places on two timelines and they disagree. fact_analyst_label records a human verdict — fraud, clean, inconclusive — within hours. fact_chargeback arrives 30–180 days later, idempotent because card networks retry. The reconciliation is fact_outcome_label: a union that resolves disagreements (analyst said clean, a chargeback came back ninety days later → final label is fraud) into the training-set source of truth.
Everything in this design protects one sentence: the exact inputs to any decision are recoverable six months later. The invariant lives in three frozen IDs stamped on the fact in the same write as the score.
The lifecycle of a decision is short, but its defense is long. Features are assembled, the rules and model produce a score, a decision is rendered, and — atomically — the fact is written with the ruleset, model, and feature-snapshot IDs frozen alongside it. Then come the labels: a case opens, an analyst rules, and months later a chargeback may or may not agree. The invariant says the decision's provenance never decays: the rules that ran, the model that scored, and the precise vector it saw are pinned, not reconstructed.
The replay query is where correctness lives. Join the transaction to its frozen model and feature snapshot by the IDs on the fact, and you reproduce exactly what the model saw — not an approximation assembled from "what the features probably were that day." Without this, every "why did you block this customer?" audit ends in "we don't know," which in a regulated context is not an inconvenience but a liability.
The graph supplies the other half of correctness: the connected-component lookup that turns a per-account question into a per-cluster one. Clusters are formed on edges of strength ≥ 0.5 so that strong and medium ties draw the action graph while weak ties stay advisory. The component itself is computed off the hot path; the decision merely reads which cluster a user belongs to.
Three programs carry the system: the scorer that freezes its context as it writes, the edge ingester that types every observed coincidence, and the chargeback merger that lands late labels idempotently. Each is small; the judgment is in what they refuse to do.
The scorer's one discipline is atomicity of provenance: the score and the three frozen IDs are written together, never in two steps. It assembles the feature vector, persists that vector as a snapshot, runs the rules and the model against the currently effective ruleset and model, renders the decision, and writes the fact with all three IDs in a single transaction. The thing it refuses to do is reconstruct context after the fact — a feature vector re-derived at audit time is not the vector the model saw, and pretending otherwise is the bug that loses the regulator's trust.
One distinction, always stated: the scorer reads the cluster id from the precomputed identity_cluster table; it never traverses edges live. The sub-100ms budget forbids a graph walk on the decision path, and the freshness cost — a cluster up to one nightly cycle stale, patched by the incremental merge — is far cheaper than the latency a live traversal would add.
Chargebacks arrive months late and the networks retry them, so the merge must be idempotent — the same dispute landing twice is one row, not two. And it must reconcile: a chargeback that contradicts an earlier "clean" analyst verdict overrides it in fact_outcome_label, because money changing hands is the ground truth that an opinion is not.
Two slow layers do the heavy thinking. The nightly connected-component job turns millions of typed edges into clusters the decision path can read in O(1). And cluster expansion is the payoff: confirm one mule, and the graph hands you the other forty-seven.
The connected-component job is the aggregation that makes the graph usable. Walking edges live at decision time is impossible under a 100ms budget, so the job runs nightly over all edges of strength ≥ 0.5, assigns each maximal component a cluster_id, scores the component by the risk of its members, and writes the flat identity_cluster rollup. New edges observed during the day are merged incrementally — a new strong edge between two existing clusters unions them — so the decision path always reads a near-current component without ever traversing the graph itself.
The expansion query is the operational climax of the whole system, and it is a recursive walk on strong edges out from a confirmed-fraud seed. When an analyst confirms one account as fraud, the system does not stop there — it expands outward up to four hops on edges of strength ≥ 0.5 and surfaces every account in the same component, most of which the analyst never knew were related. The typical haul is thirty to two hundred times the seed for mule rings, and none of those accounts had individually crossed the per-account threshold; only the cluster tied them together.
The decision facts, the graph, and the outcome union are where the system explains itself. Three queries an interviewer loves, each grouped by its stakeholder and each carrying a classic pattern: window LEAD for velocity, conditional aggregation for parallelism, and an SCD2-joined false-positive rate.
Impossible travel is about velocity, not parallelism: the same user appears in two geographies faster than any aircraft could carry them. The canonical move is LEAD over each user's session events ordered by time — pair each event with the next, compute great-circle distance over elapsed time, and flag anything exceeding 900 km/h.
Account sharing is the inverse shape: parallelism, not velocity. One user, many simultaneous devices and geographies — the Netflix-family or Spotify-share pattern. The detector counts distinct geos and distinct devices per user in a window; three-plus of each is a candidate for shared-account review, explicitly separated from impossible travel because the two have different false-positive classes and different remediations.
"Which ruleset version is over-blocking?" is the question that decides whether a rule ships or rolls back. Because the ruleset id is frozen on every fact and the final verdict lives in the outcome union, the false-positive rate per ruleset version is a left join from decisions to outcomes — blocks that the ground truth later called clean, divided by all blocks, grouped by the version that made them.
A senior design ends with observability, because a fraud system that cannot see its own false-positive rate is a liability with a UI. The dashboard watches three things: are we blocking the right things, is the loop closing, and can we still replay?
Read the amber and red tiles together and the dashboard narrates the two failure modes from §06 and §04. The largest component climbing toward 412 is the tell that a weak edge has leaked into the ≥ 0.5 action set — a single mislabeled device_fp strength can chain hundreds of unrelated users into one false cluster, so it is investigated before the cluster job runs again. The chargeback-leakage tile glowing red is the truest alarm in the room: it is fraud the model allowed, surfacing 30–180 days later, and it is the number that justifies the whole feedback apparatus. Replay coverage at 100% is the quiet hero — it means every one of these decisions can be defended to a regulator. That is what a designed, auditable fraud system looks like from the operator's chair at three in the morning.
Strip the transactions away and the question was testing five judgments, each of which generalizes far beyond fraud: