PaddySpeaks · Systems at the Whiteboard · Nº 06

The Fulfillment Problem

Model Amazon orders end to end — partial returns, replacements, marketplace sellers (1P vs FBA vs 3P), and a single SKU pulled from two warehouses. The decision that separates a senior answer: never update history. Append the refund, log the movement, recognize revenue through one column. A full working through — data flow, schema, the ledger invariant, streaming Python, nightly reconciliation, analytics SQL, and the dashboard.

§ 01 — THE QUESTIONOne product, two grains, one truth

A sale is the easy part. The hard part is everything that reshapes it afterward — a partial return a week later, a replacement that triple-counts at the gross level, a 3P seller where Amazon books commission instead of the whole price, and stock that has to reconcile to the ledger every single night.

Interview Prompt

"Design a model for Amazon orders that handles partial returns, replacements, marketplace seller orders — 1P retail, FBA, and 3P marketplace — and inventory drawn from multiple warehouses with FIFO/LIFO consumption rules. How would you scope this out?"

LEVEL · SENIOR / STAFFDURATION · 45 MINFORMAT · WHITEBOARD

The instinct that fails this interview is the instinct to keep an order's row up to date. A customer returns three of six pens, so you decrement the quantity; a seller is marketplace, so you overwrite the revenue; stock ships, so you set the on-hand number. Every one of those is an UPDATE, and every UPDATE destroys the one thing finance and operations cannot live without — the ability to ask what was true last Tuesday and get an honest answer. Returns reshape revenue retroactively. The seller-of-record decides whether a dollar is GMV or commission. And the same physical pen lives at two grains at once: a financial event that must never be mutated, and a stock movement that must reconcile to a count. The whole question is how to hold both without lying about either.

So before any boxes and arrows, the working frame for the session — two ledgers and a snapshot, each immutable in its own way:

THE FINANCIAL LEDGER
Order lines and returns, append-only. A sale lands as one row per (order × product × source warehouse); a return lands as a new row that references it. Net revenue is a subtraction across two tables — never an edit to one. Finance can restate any historical period because history was never touched.
THE PHYSICAL LEDGER
Inventory movements, immutable. Every stock change — receipt, sale-out, return-in, transfer, damage, adjustment — is a signed row that happened and stays. The on-hand count is not stored as truth; it is derived by summing movements. Lose the snapshot, rebuild from the log.
THE RECOGNITION RULE
Seller-of-record as a column. 1P retail books full GMV; 3P marketplace books only commission. This multi-billion-dollar policy is one SCD2 column, recognized_as, that turns into a CASE in every revenue query — not a forked ETL path that rots.
You cannot edit the past and audit it too. The model that survives quarter-end is the one that only ever appends — and computes the present by reading the whole history forward.

Scoping out loud

Scope is the first scored dimension, and most candidates skip it. Said out loud: in scope is the order-to-cash and inventory model — order lines, returns, replacements, movements, the daily snapshot, and the seller-recognition rule that ties them to finance. Out of scope, explicitly: the checkout and payments-capture flow (we receive an order, we don't authorize the card), the routing/carrier and last-mile logistics engine, pricing and promotions optimization (we record the discount, we don't compute it), and the warehouse-management WMS internals (we consume its movement events). The caveat: FIFO/LIFO consumption is in scope as a costing rule on movements, because it changes the dollar value of cost-of-goods, even though the physical pick path is the WMS's job.

Then the envelope math, volunteered rather than extracted. Amazon-shaped numbers:

QuantityEstimateConsequence
Orders / day (peak)~70–80 MSets the write rate on the financial ledger
Avg lines per order~2.5Plus warehouse splits → more rows than orders
Inventory movements / day≈ 1–2 BThe number that shapes the whole architecture
Return rate~5–15 % (category-dep.)Reshapes revenue weeks after the sale
3P share of units~60 %Most rows hit the commission carve-out
Active SKUs × warehouses~10⁹ snapshot rowsThe nightly reconciliation surface
Reconciliation SLAnightly · zero driftSnapshot must equal SUM(movements)

Notice where the weight sits. The financial ledger is large but tame — tens of millions of orders a day is gigabytes, and finance reads it in batch. The firehose is the physical ledger: a billion-plus signed movements a day, every one immutable, all of which must sum back to the snapshot to the unit. That single row of the table — movements as the source of truth, snapshot as a derived cache — dictates the storage layout, the reconciliation job, and the failure philosophy. The rest of the design follows the stock.


§ 02 — DATA FLOWFollowing a pen through the building

A sale forks into two streams the moment it is placed: a financial event onto the order ledger, and a physical event onto the movement log. Returns and replacements append to both. The daily snapshot is a fold of movements, and reconciliation is the proof the two ledgers still agree.

FINANCIAL LEDGER · APPEND-ONLY · RESTATABLE PHYSICAL LEDGER · IMMUTABLE MOVEMENTS · ~1–2B/DAY ORDER INTAKE placed · split by WH ORDER LINE LEDGER 1 row / order×prod×WH gross_revenue · never UPDATE seller_id → recognized_as RETURNS LOG (APPEND) refund references order_line partial qty · condition REPLACEMENTS links original ↔ new line NET-REVENUE MART SUM(orders) − SUM(returns) GMV vs commission split WMS EVENTS pick · pack · receive MOVEMENT LOG signed qty · type · cost immutable source of truth NIGHTLY SNAPSHOT fold movements → on-hand RECONCILIATION snap == Σ(movements)? OPS DASHBOARDS DOH · fill rate · drift transfer efficiency a return is BOTH: refund row + return_in movement a sale is BOTH: order line + sale_out movement SOLID — data lands here · DASHED — same event, recorded at both grains · Append the truth twice, never edit it once.
FIG. 1 — Every business event lands at two grains: a financial row that finance restates from, and a physical movement that ops reconciles from. The snapshot is derived; the truth is the logs.

Three properties of this picture do most of the interview's work. First, nothing on the financial side is ever mutated — a return does not edit the order line it refunds, it appends a row that points back at it, so net revenue is a subtraction across two append-only tables and any past period can be recomputed exactly. Second, the snapshot is a cache, not a source: on-hand stock is a fold of the immutable movement log, so if the snapshot is lost or corrupted it rebuilds from movements rather than from a backup of a number nobody can trust. Third, one real-world event lands at two grains on purpose — a sale is both an order line and a sale-out movement; a return is both a refund row and a return-in movement — and keeping them as separate facts that reference a shared key is what lets finance and operations each have a complete, non-lossy view of the same pen.

The Failure Philosophy, In One Rule

The snapshot may lag; the log may not drop. If the nightly fold fails, yesterday's on-hand is stale but the movement log is intact, so the snapshot simply rebuilds on the next run — staleness self-heals. But a lost movement is unforgivable: it breaks the reconciliation invariant permanently and corrupts cost-of-goods, so movement ingestion is exactly-once-effective via idempotency keys, durable, and back-pressured rather than dropped. A snapshot you can recompute beats a movement you can never recover.


§ 03 — DATA MODELTwo append-only logs, one derived snapshot

The schema falls out of the immutability question. A financial fact at the sale grain and a returns fact that only ever appends; a physical movement fact that is the system's source of truth; a daily snapshot derived from it; and a replacement bridge that links orders without collapsing them.

The sale grain — wide, and never edited

An order line is the financial atom: one row per (order × product × source_warehouse), so a three-item order whose middle line splits across two warehouses is three rows that share an order_id and a line_number. The customer still sees one line item; the warehouse split is queryable because it is a distinct row, not a buried attribute. Crucially there is no returned_qty column here — returns live in their own table, because writing them back would destroy the historical truth.

DDL · THE SALE GRAIN (APPEND-ONLY)
-- One row per (order × product × source warehouse). A multi-warehouse -- line shares order_id + line_number but differs by source_warehouse_id. -- NEVER updated. Returns append elsewhere; this row is history. CREATE TABLE fct_order_lines ( order_line_id BIGINT PRIMARY KEY, order_id BIGINT NOT NULL, line_number SMALLINT NOT NULL, -- customer's mental "line" customer_id BIGINT NOT NULL, product_id BIGINT NOT NULL, seller_id BIGINT NOT NULL, -- → recognized_as carve-out source_warehouse_id BIGINT NOT NULL, -- the split dimension ordered_ts TIMESTAMPTZ NOT NULL, shipped_ts TIMESTAMPTZ, delivered_ts TIMESTAMPTZ, quantity INT NOT NULL, unit_price_usd NUMERIC(10,2) NOT NULL, promo_discount NUMERIC(10,2) NOT NULL DEFAULT 0, tax NUMERIC(10,2) NOT NULL DEFAULT 0, shipping NUMERIC(10,2) NOT NULL DEFAULT 0, gross_revenue_usd NUMERIC(12,2) NOT NULL, -- frozen at order time is_prime BOOLEAN NOT NULL DEFAULT FALSE, is_subscribe_save BOOLEAN NOT NULL DEFAULT FALSE ) PARTITION BY RANGE (ordered_ts);

The refund grain — a return is a new row, never a deletion

A return references the order line it reverses and carries its own partial quantity, refund amount, and the condition the goods came back in. Append-only is the contract: an unreturn (a customer who changes their mind, a reversed RMA) is another row, not an edit. This is what lets finance restate March's net revenue in June and get a number that matches what March actually closed at, because nothing about March was ever overwritten.

DDL · THE REFUND GRAIN (APPEND-ONLY)
CREATE TABLE fct_returns ( return_id BIGINT PRIMARY KEY, order_line_id BIGINT NOT NULL -- FK → fct_order_lines REFERENCES fct_order_lines (order_line_id), return_ts TIMESTAMPTZ NOT NULL, refund_ts TIMESTAMPTZ, reason_code TEXT NOT NULL, -- defective / wrong_item / no_longer_needed … quantity INT NOT NULL, -- partial OK: 3 of 6 refund_amount_usd NUMERIC(12,2) NOT NULL, restocking_fee_usd NUMERIC(10,2) NOT NULL DEFAULT 0, condition_received TEXT NOT NULL CHECK (condition_received IN ('new','used','damaged')), is_replacement BOOLEAN NOT NULL DEFAULT FALSE ); -- Replacements are LINKED, not collapsed. The original order stays, -- the replacement is a new order line, the return ties them together. -- Triple-counted at gross level on purpose; it nets to zero correctly. CREATE TABLE fct_replacements ( replacement_id BIGINT PRIMARY KEY, original_order_line_id BIGINT NOT NULL REFERENCES fct_order_lines (order_line_id), replacement_order_line_id BIGINT NOT NULL REFERENCES fct_order_lines (order_line_id), replaced_ts TIMESTAMPTZ NOT NULL, reason_code TEXT NOT NULL );

The stock-flow grain — the immutable physical truth

Every change to stock is a signed movement: a receipt is positive, a sale-out negative, a return-in positive again, transfers net to zero across two warehouses, damage and adjustments are explicit. The unit_cost_usd on each movement is what makes FIFO/LIFO costing possible — consuming oldest-cost or newest-cost layers is a function of how you read these rows, not a number you store. This table is never updated; an error is corrected by an offsetting adjustment movement, so the audit trail stays complete.

DDL · THE STOCK-FLOW GRAIN, AND THE DERIVED SNAPSHOT
CREATE TABLE fct_inventory_movements ( movement_id BIGINT PRIMARY KEY, product_id BIGINT NOT NULL, warehouse_id BIGINT NOT NULL, movement_ts TIMESTAMPTZ NOT NULL, movement_type TEXT NOT NULL CHECK (movement_type IN ('receipt','sale_out', 'return_in','transfer_out','transfer_in', 'damage','adjustment')), quantity INT NOT NULL, -- SIGNED: out is negative unit_cost_usd NUMERIC(10,4) NOT NULL, -- the cost layer, for FIFO/LIFO related_order_line_id BIGINT, -- non-null for sale_out / return_in idempotency_key TEXT NOT NULL UNIQUE -- exactly-once from the WMS stream ) PARTITION BY RANGE (movement_ts); -- Pre-aggregated for fast reads; DERIVED from movements every night. -- The reconciliation invariant lives in the column list itself. CREATE TABLE snap_inventory_daily ( product_id BIGINT NOT NULL, warehouse_id BIGINT NOT NULL, snapshot_date DATE NOT NULL, opening_qty BIGINT NOT NULL, receipts BIGINT NOT NULL, returns_in BIGINT NOT NULL, sales BIGINT NOT NULL, damage BIGINT NOT NULL, transfers_net BIGINT NOT NULL, ending_qty BIGINT NOT NULL, weighted_avg_cost NUMERIC(10,4) NOT NULL, PRIMARY KEY (product_id, warehouse_id, snapshot_date) );

The dimensions are conventional and mostly SCD2 — dim_products (titles and dimensions drift), dim_warehouses, dim_customers with SCD2 dim_addresses, dim_promotions — but one carries the whole revenue policy on its back. dim_sellers holds seller_type (retail_1p, fba_2p, marketplace_3p), a commission_pct, and the column that does the work: recognized_as, either gmv or commission_only. That one SCD2 field is the difference between booking a $22 sale and booking $3.30 of commission on it — and because it is a dimension column, the rule is a join-and-CASE, not a separate pipeline.


§ 04 — THE CORE INVARIANTThe reconciliation identity

This system's correctness lives in one equation that must hold every night, per product, per warehouse: opening plus receipts plus returns-in, minus sales, minus damage, plus net transfers, equals ending. If it ever fails to balance, the physical ledger and the snapshot have disagreed — and you trust the ledger.

The invariant is a conservation law for atoms. The snapshot is not allowed to hold a number that the movement log cannot reproduce, so the nightly job does not set the ending quantity — it computes it from the day's signed movements added to yesterday's close, and stores both the computed value and its inputs. Reconciliation then re-derives the same arithmetic and compares: a clean run returns nothing, a non-empty result is a row where stock has gone unaccounted, which is an operational incident, never a number to quietly overwrite.

SQL · THE RECONCILIATION INVARIANT — MUST RETURN ZERO ROWS
-- ending == opening + receipts + returns_in - sales - damage + transfers -- Any row returned here is unaccounted stock: investigate, don't patch. SELECT product_id, warehouse_id, snapshot_date, opening_qty, receipts, returns_in, sales, damage, transfers_net, ending_qty, (opening_qty + receipts + returns_in - sales - damage + transfers_net) AS computed_ending, ending_qty - (opening_qty + receipts + returns_in - sales - damage + transfers_net) AS drift FROM snap_inventory_daily WHERE snapshot_date = CURRENT_DATE - 1 AND ending_qty != opening_qty + receipts + returns_in - sales - damage + transfers_net;

There is a second invariant on the financial side, quieter but just as load-bearing: net revenue is always a subtraction, never an edit. A replacement deliberately triple-counts at the gross level — the original sale, the return that reverses it, and the new replacement line all post — and the only reason that is safe is that the three rows net to zero by construction. The original gross is cancelled by the refund; the replacement stands as the real revenue. Collapse those into one mutated row and you lose both the audit trail and the ability to explain a customer's history; keep them as three append-only facts and the arithmetic takes care of itself.

The snapshot is a convenience. The movement log is the truth. When they disagree, the log wins and the snapshot is rebuilt — never the other way around.FULFILLMENT RULE Nº 1 — THE LEDGER IS THE SOURCE

The order lifecycle, and where returns re-enter

PLACED SHIPPED DELIVERED SETTLED RETURN OPENED REFUNDED

The forward path — placed, shipped, delivered, settled — is a sequence of timestamp columns on the order line, set as the events arrive; it is the only place the order row "changes," and even then it only fills nulls, never rewrites a value. The return loop is entirely separate: it does not reopen the order line, it appends to fct_returns and emits a return_in movement, so a refund three weeks after delivery reshapes revenue and restores stock without ever touching the historical sale. That separation — forward state as fill-only timestamps, reversals as new facts — is what keeps the two ledgers honest.


§ 05 — INGESTION & STREAMSPython on the two ledgers

Three small programs carry the write path: the order splitter that fans one cart into warehouse-grained lines plus their sale-out movements, the movement consumer that keeps the physical log exactly-once, and the return handler that appends a refund and its return-in. Each is small; the judgment is in what they refuse to do.

1 · The order splitter — one cart, many grains

The splitter's most important behavior is that a single customer line can become several rows when stock is sourced from more than one warehouse — and it emits the physical twin of each at the same time. One placed order yields N order lines and N matching sale-out movements, and they share the order_line_id so finance and operations can always rejoin them.

PYTHON · ORDER SPLITTER → LINES + SALE-OUT MOVEMENTS
import uuid def split_order(order, allocation) -> tuple[list, list]: """Fan one cart into (order_lines, movements). A line whose demand is met from two warehouses becomes two rows sharing order_id + line_number. Each fulfilling row emits its sale_out twin so the physical ledger never lags the financial one.""" lines, movements = [], [] for item in order.items: # allocation[item] = [(warehouse, qty, unit_cost), ...] from the WMS for wh, qty, unit_cost in allocation[item.line_number]: ol_id = uuid.uuid4().int >> 64 lines.append({ "order_line_id": ol_id, "order_id": order.id, "line_number": item.line_number, "product_id": item.product_id, "seller_id": item.seller_id, "source_warehouse_id": wh, "quantity": qty, "unit_price_usd": item.unit_price, # gross is FROZEN here, at order time, forever "gross_revenue_usd": round(qty * item.unit_price - item.promo_share + item.tax_share, 2), "ordered_ts": order.placed_ts, }) movements.append({ "movement_id": uuid.uuid4().int >> 64, "product_id": item.product_id, "warehouse_id": wh, "movement_type": "sale_out", "quantity": -qty, # SIGNED: leaving stock "unit_cost_usd": unit_cost, "related_order_line_id": ol_id, # idempotency: same line never moves stock twice on retry "idempotency_key": f"sale_out:{ol_id}", }) return lines, movements

One carve-out, always stated: the splitter does not decide costing. It records the unit_cost_usd of the layer it consumed, but whether the business reports FIFO or LIFO cost-of-goods is a read-time decision over the movement history — the writer's job is only to never lose which cost layer left the shelf.

2 · The movement consumer — exactly-once, or it doesn't count

PYTHON · WMS STREAM → MOVEMENT LOG (IDEMPOTENT)
async def ingest_movements(stream, db): """The physical ledger's correctness rides on never double-counting and never dropping. The UNIQUE idempotency_key makes re-delivery a no-op: a WMS that replays a pick event on reconnect cannot move the same stock twice. We dedupe in the DB, not in memory — restarts are free.""" async for batch in stream.batches(max_size=5_000): rows = [_to_movement(ev) for ev in batch] # ON CONFLICT DO NOTHING: at-least-once transport becomes # effectively-once storage. The unique key is the whole defense. await db.executemany(""" INSERT INTO fct_inventory_movements (movement_id, product_id, warehouse_id, movement_ts, movement_type, quantity, unit_cost_usd, related_order_line_id, idempotency_key) VALUES (%(movement_id)s, %(product_id)s, %(warehouse_id)s, %(movement_ts)s, %(movement_type)s, %(quantity)s, %(unit_cost_usd)s, %(related_order_line_id)s, %(idempotency_key)s) ON CONFLICT (idempotency_key) DO NOTHING """, rows) await stream.commit(batch) # ack only after durable write

3 · The return handler — append a refund, restore the stock

A return is two appends that must both land: a refund row that references the original order line, and a return-in movement that puts the goods back (if they came back saleable). The handler refuses to UPDATE the order line, and it prices the refund proportionally to the partial quantity — three of six pens returns half the line's revenue, not the whole thing.

PYTHON · RETURN HANDLER → REFUND + RETURN-IN
async def handle_return(db, rma): """A return NEVER edits the sale. It appends a refund that points at the order line, and — if the item is resalable — a return_in movement that restores stock. Two facts, one event, zero mutation of history.""" line = await db.fetchrow( "SELECT * FROM fct_order_lines WHERE order_line_id = %s", rma.order_line_id) # Proportional refund: 3 of 6 returned → half the line's gross. refund = round(line["gross_revenue_usd"] * rma.quantity / line["quantity"], 2) await db.execute(""" INSERT INTO fct_returns (return_id, order_line_id, return_ts, reason_code, quantity, refund_amount_usd, condition_received) VALUES (%s,%s, now(), %s,%s,%s,%s) """, rma.id, rma.order_line_id, rma.reason, rma.quantity, refund, rma.condition) # Only saleable conditions re-enter inventory. Damaged goods are a # return that books a refund but emits a 'damage' write-off, not a # return_in — the refund stands, the stock does not come back. move_type = "return_in" if rma.condition in ("new", "used") else "damage" sign = +rma.quantity if move_type == "return_in" else -rma.quantity await db.execute(""" INSERT INTO fct_inventory_movements (movement_id, product_id, warehouse_id, movement_ts, movement_type, quantity, unit_cost_usd, related_order_line_id, idempotency_key) VALUES (%s,%s,%s, now(), %s,%s,%s,%s,%s) ON CONFLICT (idempotency_key) DO NOTHING """, uuid.uuid4().int >> 64, line["product_id"], line["source_warehouse_id"], move_type, sign, line["unit_price_usd"], rma.order_line_id, f"{move_type}:{rma.id}")

§ 06 — AGGREGATIONThe nightly fold and the recognition rule

Two derived layers turn logs into answers. A nightly job folds the movement log into the on-hand snapshot — computing, never setting, the ending quantity. And the net-revenue mart applies the seller carve-out, so 1P books GMV and 3P books commission, from one query instead of two pipelines.

The snapshot fold is where the reconciliation invariant is born. Yesterday's ending_qty becomes today's opening_qty; the day's signed movements are bucketed by type and summed; the new ending is the arithmetic sum, stored alongside the inputs that produced it. Because every input is kept, the reconciliation check in §04 can re-derive the same number and catch any disagreement. The weighted-average cost is folded the same way — a value-weighted blend of opening stock and the day's receipts — which is the default costing layer the FIFO/LIFO read-time logic can override per report.

PYTHON · NIGHTLY SNAPSHOT FOLD (COMPUTE, DON'T SET)
def build_inventory_snapshot(spark, run_date): """Fold the immutable movement log into on-hand stock. The ending qty is COMPUTED from signed movements + yesterday's close, never set by a side channel — that's what makes reconciliation a real check and not a tautology.""" moves = (spark.table("fct_inventory_movements") .where(F.to_date("movement_ts") == run_date) .groupBy("product_id", "warehouse_id") .agg( F.sum(F.when(F.col("movement_type") == "receipt", F.col("quantity")).otherwise(0)).alias("receipts"), F.sum(F.when(F.col("movement_type") == "return_in", F.col("quantity")).otherwise(0)).alias("returns_in"), # sale_out is negative; report magnitude as 'sales' F.sum(F.when(F.col("movement_type") == "sale_out", -F.col("quantity")).otherwise(0)).alias("sales"), F.sum(F.when(F.col("movement_type") == "damage", -F.col("quantity")).otherwise(0)).alias("damage"), F.sum(F.when(F.col("movement_type").isin("transfer_in", "transfer_out"), F.col("quantity")).otherwise(0)).alias("transfers_net"))) prev = (spark.table("snap_inventory_daily") .where(F.col("snapshot_date") == F.date_sub(F.lit(run_date), 1)) .select("product_id", "warehouse_id", F.col("ending_qty").alias("opening_qty"))) return (moves.join(prev, ["product_id", "warehouse_id"], "full_outer") .fillna(0) .withColumn("ending_qty", # THE invariant, made flesh F.col("opening_qty") + F.col("receipts") + F.col("returns_in") - F.col("sales") - F.col("damage") + F.col("transfers_net")) .withColumn("snapshot_date", F.lit(run_date)))

The net-revenue mart is where the recognition column earns its keep. The carve-out is a single CASE: when recognized_as = 'gmv', Amazon books the whole gross; when 'commission_only', it books only gross × commission_pct. Returns are recognized through the same lens — a refund on a 3P order reverses only the commission, not the whole refund amount — so net revenue stays internally consistent across the 1P/3P boundary without a second code path.

SQL · NET REVENUE WITH THE 1P / 3P CARVE-OUT
WITH gross AS ( SELECT s.seller_type, sum(CASE WHEN s.recognized_as = 'gmv' THEN ol.gross_revenue_usd ELSE ol.gross_revenue_usd * s.commission_pct END) AS recognized_usd FROM fct_order_lines ol JOIN dim_sellers s USING (seller_id) WHERE ol.ordered_ts >= date_trunc('month', CURRENT_DATE) GROUP BY s.seller_type ), refunds AS ( SELECT s.seller_type, sum(CASE WHEN s.recognized_as = 'gmv' THEN r.refund_amount_usd ELSE r.refund_amount_usd * s.commission_pct END) AS reversed_usd FROM fct_returns r JOIN fct_order_lines ol ON r.order_line_id = ol.order_line_id JOIN dim_sellers s USING (seller_id) WHERE r.return_ts >= date_trunc('month', CURRENT_DATE) GROUP BY s.seller_type ) SELECT g.seller_type, round(g.recognized_usd, 2) AS gross_recognized, round(coalesce(rf.reversed_usd, 0), 2) AS refunds_recognized, round(g.recognized_usd - coalesce(rf.reversed_usd, 0), 2) AS net_recognized FROM gross g LEFT JOIN refunds rf USING (seller_type);
A revenue policy worth billions should be a column you can join to, not a branch in an ETL job that one engineer understands. The carve-out is a CASE — and that is why it survives the next reorg.FULFILLMENT RULE Nº 2 — RECOGNITION IS A COLUMN

§ 07 — ANALYTICS SQLThree questions the model was built to answer

The two append-only ledgers and the derived snapshot are where the system explains itself. Three queries an interviewer loves, because each carries a classic pattern on its back — a percentile funnel, a cohort net-revenue join across the carve-out, and a FIFO cost-layer walk.

Are we keeping the delivery promise? — funnel with percentiles

The customer-facing question. The pattern is a duration funnel over the fill-only timestamps on the order line, split by the membership flag, reporting both the average and the tail — because Prime's promise is about the 95th percentile, not the mean.

SQL · PRIME VS NON-PRIME DELIVERY TIME
SELECT is_prime, count(*) AS lines, round(avg(extract(epoch FROM delivered_ts - ordered_ts) / 3600), 1) AS avg_hours, round(percentile_cont(0.95) WITHIN GROUP (ORDER BY extract(epoch FROM delivered_ts - ordered_ts) / 3600), 1) AS p95_hours FROM fct_order_lines WHERE delivered_ts IS NOT NULL AND ordered_ts >= CURRENT_DATE - 30 GROUP BY is_prime;

What is a cohort really worth? — net revenue across 1P/3P

The finance question, and the one that proves append-only pays off. Net lifetime value of a customer cohort is gross recognized minus refunds recognized, both passed through the seller carve-out — a join that would be impossible if returns had been written back into the order rows, because the historical gross would already be corrupted.

SQL · COHORT NET REVENUE, GMV-VS-COMMISSION AWARE
WITH cohort AS ( SELECT customer_id, date_trunc('month', min(ordered_ts)) AS cohort_month FROM fct_order_lines GROUP BY customer_id ), recognized AS ( SELECT ol.customer_id, CASE WHEN s.recognized_as = 'gmv' THEN ol.gross_revenue_usd ELSE ol.gross_revenue_usd * s.commission_pct END AS gross_rec, coalesce( CASE WHEN s.recognized_as = 'gmv' THEN r.refund_amount_usd ELSE r.refund_amount_usd * s.commission_pct END, 0) AS refund_rec FROM fct_order_lines ol JOIN dim_sellers s USING (seller_id) LEFT JOIN fct_returns r ON r.order_line_id = ol.order_line_id ) SELECT c.cohort_month, count(DISTINCT c.customer_id) AS customers, round(sum(x.gross_rec - x.refund_rec), 2) AS net_revenue, round(sum(x.gross_rec - x.refund_rec) / count(DISTINCT c.customer_id), 2) AS net_per_customer FROM cohort c JOIN recognized x USING (customer_id) GROUP BY c.cohort_month ORDER BY c.cohort_month;

What did the stock that left actually cost? — FIFO over the movement log

The operations-meets-finance question, and the payoff for keeping unit_cost_usd on every movement. FIFO cost-of-goods is a window walk: order the receipt layers oldest-first, take a running sum of their quantities, and consume against the day's sales — the classic running-balance allocation that values each unit sold at the cost of the oldest layer still on the shelf.

SQL · FIFO COST-OF-GOODS FROM THE COST LAYERS
-- Value today's sale_out against oldest-cost receipt layers (FIFO). -- Running sum over receipts builds the cost "stack"; sales draw it down. WITH layers AS ( -- receipt cost layers, oldest first SELECT product_id, warehouse_id, movement_ts, quantity, unit_cost_usd, sum(quantity) OVER (PARTITION BY product_id, warehouse_id ORDER BY movement_ts) AS cum_received FROM fct_inventory_movements WHERE movement_type = 'receipt' ), sold AS ( -- units that left today, per product/warehouse SELECT product_id, warehouse_id, sum(-quantity) AS qty_sold FROM fct_inventory_movements WHERE movement_type = 'sale_out' AND movement_ts::date = CURRENT_DATE - 1 GROUP BY product_id, warehouse_id ) SELECT l.product_id, l.warehouse_id, -- each layer contributes whatever of qty_sold falls inside it sum(l.unit_cost_usd * LEAST(l.quantity, GREATEST(s.qty_sold - (l.cum_received - l.quantity), 0))) AS fifo_cogs_usd FROM layers l JOIN sold s USING (product_id, warehouse_id) WHERE l.cum_received - l.quantity < s.qty_sold -- layer is (partly) consumed GROUP BY l.product_id, l.warehouse_id;

§ 08 — THE DASHBOARDProving the two ledgers still agree

A senior design ends with observability, because the immutability discipline above is invisible without it. The fulfillment dashboard watches three concerns — the customer promise, the financial restatement, and the physical reconciliation — and the single most important tile is the one that should always read zero.

CUSTOMER · PROMISE
delivery time p50/p95 Prime vs non-Prime, return rate by category (the product-quality canary), and replacement rate — a category whose returns spike days after a shipment surge is a quality regression, not noise.
FINANCE · RECOGNITION
net recognized revenue split GMV vs commission, refund reversal $, and 1P/3P mix — watching the commission share move tells you whether marketplace is eating retail, a strategic signal, not just an accounting one.
OPS · RECONCILIATION
reconciliation drift (snapshot minus movements — the tile that must be zero), days of inventory on hand, multi-warehouse fill rate, and transfer efficiency — any non-zero drift is an incident that pages, because the two ledgers have disagreed.
Fulfillment Ops — All Regions SAT 03:20 PT · NIGHTLY RECONCILE COMPLETE · 1.4B MOVEMENTS
Reconciliation Drift
0units
Delivery p95 (Prime)
38h
Return Rate
9.1%
Multi-WH Fill Rate
96.7%
Net recognized revenue · 1P GMV vs 3P commission · trailing 12 days
— 1P GMV — 3P commission
Days Inv. on Hand
31d
3P Unit Share
61%
Refund Reversal $
$74M/d
Transfer Efficiency
93%
Replacement Rate
1.3%
FIFO COGS
$612M/d
FIG. 2 — The story a clean close tells: reconciliation drift is exactly zero, the two revenue lines diverge as 3P keeps eating share, and the return rate has drifted amber — a category-quality signal worth a drill-down, not a panic.

Read the tiles together and the dashboard narrates the discipline. The drift tile reads zero, which is the whole point: a billion-plus movements folded into the snapshot and the arithmetic still balanced to the unit, so finance and operations are looking at the same stock. The two revenue lines tell a strategy story — 3P commission climbing relative to 1P GMV — that is only legible because the carve-out is a column. And the amber return-rate tile is the kind of signal append-only was built to surface: because returns are their own facts, the spike can be sliced by category, reason, and cohort without ever having corrupted the original sales it reverses.


§ 09 — THE RUBRICWhat was actually being tested

Strip the order details away and the question was testing five judgments, each of which generalizes far beyond e-commerce:

IMMUTABILITY
Never editing history. Append the return, log the movement, fill timestamps but rewrite nothing — so any past period can be restated exactly. The instinct to append rather than update is the senior move in any financial or audited system.
DERIVED STATE
Treating the snapshot as a cache of the log, not a source. On-hand stock is computed from immutable movements, so a lost snapshot rebuilds instead of corrupting — the truth is the event stream, the aggregate is a convenience.
THE INVARIANT
One equation that must balance every night. The reconciliation identity makes disagreement between the physical ledger and the snapshot impossible to hide, because correctness is checked, not assumed.
POLICY AS DATA
Turning a billion-dollar revenue rule into a column. The 1P/3P carve-out as recognized_as is a join-and-CASE, not a forked pipeline — which is why it survives the next reorg and the next quarter-end.
GRAIN DISCIPLINE
Letting one event live at two grains without collapsing either. A sale is a financial row and a physical movement; a replacement triple-counts and nets to zero — keeping them separate is what gives finance and operations each a complete, non-lossy view.
Anyone can model a sale. The interview is won by the candidate who refuses to edit it — because the return, the reconciliation, and the restatement all depend on a history that was never overwritten.— CLOSING ARGUMENT