"How many monthly active users do we have?" is a one-line question with a one-line answer that is almost always a lie. A flat MAU can be a stable product or a bucket leaking and refilling — and the single number cannot tell you which. A complete working through: the activity grain, the four-state machine, the FULL OUTER self-join that decomposes it, the frozen cohort, the retention triangle, and the dashboard that exposes the leak.
Every growth-stage data team eventually meets this question, and it is a trap disguised as a warm-up. It sounds like a SELECT COUNT and it is actually an accounting system.
"How would you measure whether this product is actually growing? Leadership reports MAU every month and it's been flat at ten million. Are we healthy?"
The junior answer is COUNT(DISTINCT user_id) over a trailing thirty-day window, and it is worse than useless because it radiates false confidence. Ten million active users this month and ten million last month is reported as "stable." But the same flat line describes a product that retained nine million and added one, and a product that retained three million, churned seven, and refilled the hole with seven million freshly-acquired strangers who will themselves be gone by summer. Those are not the same company. One has product-market fit; the other is renting its user base from the ad network and the lease is about to expire. A single scalar cannot distinguish a reservoir from a sieve.
The framework that fixes this is growth accounting — the Social Capital / Reforge decomposition — and it forces four design problems before a single query is written:
Scope is the first scored dimension, and the candidates who skip it are the ones who never name the leaky bucket. State what you are building: a growth-accounting warehouse that decomposes every period's active base into the four states, computes a quick-ratio, and renders a cohort-retention triangle. State what you are deliberately ignoring: attribution and channel ROI (treated as a slice, not the spine), the event-collection SDK itself, real-time activation, and anti-fraud (active here means a real human session — bot-filtering is upstream). And state the one definitional landmine you will defend with a versioned dimension: when marketing later moves the bar for "active," last quarter's published numbers must not silently move.
Then the envelope math, volunteered. A consumer product at ten-million-MAU scale:
| Quantity | Estimate | Consequence |
|---|---|---|
| Monthly active users | 10,000,000 | The headline number — and the one that lies |
| Activity events / day | ~80–120 M | The atomic fact; rolls up to DAU/WAU/MAU |
| Monthly churned | ≈ 2.4 M | The hidden row that shapes the whole architecture |
| New + resurrected / month | ≈ 2.5 M | Just barely covers the leak — quick-ratio ≈ 1.04 |
| Distinct cohorts tracked | ~60 months × channel | The frozen SCD0 dimension; never rewritten |
| Retention-triangle cells | ~60² / 2 ≈ 1,800 | Long-form fact a BI tool pivots into a heatmap |
| Growth-fact rows / period | 1 per period | The decomposition is tiny; the join that builds it is not |
Notice the asymmetry the envelope exposes: the churned figure is invisible in the MAU headline yet nearly equal to everything the product acquired. That single row — the one the naive query cannot see — dictates the entire model. Everything below exists to make churn a first-class number instead of an inference.
One pipeline, three tempos. A raw activity firehose narrows to a per-period rollup, the rollup self-joins against its own previous period to produce the state machine, and the state machine fans out into cohort curves. The whole design is a funnel that turns volume into meaning.
Three properties of this picture do most of the interview's work. First, the activity fact is the only place the product's definition of "active" lives — change the session threshold and everything downstream re-derives, which is exactly why that threshold belongs in a versioned dimension and not in a hard-coded WHERE clause. Second, the growth fact is produced by a self-join of the rollup against itself, offset by one period — the present joined to its own immediate past. Third, the cohort key descends from a frozen dimension that touches the curves but is never recomputed; the moment a user's cohort can move, every retention number becomes a fiction.
Never report a level without its flows. A stock — MAU, MRR, headcount — is the running total of four flows: arrivals, survivors, returns, and departures. Report the stock alone and you have hidden three of the four numbers that explain it. The entire architecture exists to make the flows as cheap to query as the stock, so that "MAU is flat" can never again be said without "...because we added 2.5M and lost 2.4M" in the same breath.
The schema falls out of the four design problems. One atomic activity fact carries the value column. One frozen cohort dimension is computed once. One SCD2 metric-definition dimension keeps history stable when the bar moves. The growth facts are derived from these and stay tiny.
Everything derives from fact_activity, whose grain is one row per (user, day, activity_type). The single most consequential column is inc_amt — incremental value created by a user in a period. It is the generalization that lets one model answer two questions: leave it 1 and the decomposition counts heads; set it to that day's revenue and the same decomposition counts dollars, with expansion and contraction appearing for free.
A cohort is the period of a user's first activity, and it must never change — which makes it a slowly-changing dimension of the simplest possible kind: type 0, write-once. The whole point of a retention curve is to compare January's signups at month three against February's signups at month three; if a backfill or a re-login could quietly re-stamp a user's cohort, the curves would shift under you and the comparison would be meaningless. So the cohort is computed exactly once, on first sighting, and is read-only thereafter.
The growth facts are small — one row per period — but their columns are load-bearing. The engagement and revenue variants share a shape; the revenue variant adds expansion and contraction, the two states that only exist when the value column is money. These tables are produced by the self-join in §04, and they exist precisely so that the dashboard and the analysts never re-run that join.
The correctness of the entire system lives in two algebraic identities and the one join clause that makes them hold. Get the join wrong — use an inner join, or a left join — and churn quietly reports as zero, which is the single most dangerous bug in growth analytics because it looks like good news.
Each user, each period, is in exactly one of four states, and the states are defined by a self-join of the rollup against its own previous period:
Read the chain as a life: a user is dormant until their first active period (new); they persist (retained); they lapse (churned); they return (resurrected); they persist again. The distinction between new and resurrected is the cohort: a user present this period but not last is new if their first-ever activity equals this period, and resurrected otherwise. That is why the frozen cohort dimension is not optional — it is the only thing that tells arrivals from returns.
The two identities below hold by construction once the join is right. Subtracting one from the other yields the decomposition that turns a flat headline into an alarm:
MAU(t) = retained(t) + new(t) + resurrected(t) — this period's active base is its survivors plus its arrivals plus its returns.
MAU(t−1) = retained(t) + churned(t) — last period's base is the same survivors plus whoever left.
Subtract: ΔMAU = new + resurrected − churned. The level is a consequence of the flows; the flows are the truth.
Here is the atomic heart of the framework: a single FULL OUTER self-join of the rollup against itself, offset by one period. The COALESCE on the join keys, and the CASE over which side is NULL, is the entire state machine.
The same query, with inc_amt as money instead of a head count and the delta_amt sign inspected on retained rows, splits retention into expansion (delta > 0) and contraction (delta < 0). That is the whole trick of the generalized value column: one join, two metrics, and the revenue identities — MRR(t) = retained + new + resurrected + expansion and MRR(t−1) = retained + churned + contraction — fall out of the same machinery.
Three programs carry the derived plane: the activity reducer that turns raw events into the dedup'd grain, the cohort stamper that writes once and never again, and the period roller that materializes the rollups. Each is small; the judgment is in what they refuse to do.
Raw events arrive at-least-once and at every imaginable granularity — a user opens the app forty times a day. The reducer collapses them to the qualifying grain: one row per user per day per activity, but only for events that clear the active threshold from the metric definition in force. The refusal here is the important part: a sub-threshold session is not "a small amount of active" — it is not active, and silently keeping it would inflate every number above.
The cohort stamper is the guardian of SCD0. Its entire job is to compute a user's first-activity period exactly once and then to refuse every subsequent write. The discipline lives in the SQL: an INSERT ... ON CONFLICT DO NOTHING, so a re-login, a late event, or a replayed batch can never re-stamp a cohort. A user whose first event was in January is in the January cohort forever, even if the warehouse first sees them in March.
The roller materializes agg_mau from fact_activity. The judgment it encodes is event-time, not processing-time: an event with an activity_dt of last month must land in last month's rollup even though it arrived today, and the roller therefore re-materializes a trailing window of recent periods on every run rather than only the current one. Without that, a late batch would understate a closed period and the self-join would manufacture phantom churn.
One carve-out, always stated: the rollups are recomputed, never incremented. A growth metric that adds deltas to a running total will, the first time a batch replays, drift — and the drift is invisible because the number still looks plausible. Recompute-from-source is slower and correct; increment-in-place is faster and a time bomb.
The slow, derived layer is where cohorts become curves. The retention triangle is a self-join of cohort membership against later activity; the LTV curve is its cumulative cousin. Both are deterministic, so they are materialized once per period and read forever.
The retention triangle is the framework's most legible artifact: a matrix whose rows are cohorts (by first month) and whose columns are months_since_first, each cell holding the fraction of that cohort still active. Read down a column and you see whether successive cohorts retain better or worse; read across a row and you watch a single cohort's curve flatten — and the level at which it flattens is the retention floor, the population that genuinely found the product. A curve that decays to zero is a product nobody keeps; a curve that settles at forty per cent has forty per cent product-market fit.
The LTV curve is the triangle's cumulative cousin. Where retention asks "what fraction remain at month N," LTV asks "how much value, per original cohort member, has accumulated by month N." It is a running sum of per-period value divided by the cohort's original size — a semi-cartesian cumulative join — and its asymptote, compared against the cost of acquiring that cohort, is the single number that decides whether growth spend is rational.
The growth facts are where the product explains itself. Three queries an interviewer loves, because each carries a named pattern on its back — the quick-ratio gauge, the as-of cohort slice, and the window-function retention curve.
The quick-ratio is the single number that exposes a flat-MAU treadmill: gains over losses, (new + resurrected) / |churned|. Above one, the product grows; at one, it runs to stand still; below one, it shrinks even while the headline MAU looks flat. This is conditional aggregation reduced to one expressive ratio.
"Is churn worse on free or premium, and in which country?" requires slicing the decomposition by attributes as they were when the user was active, not as they are now — a premium user who downgraded last week should not retroactively recolor their history. That is an SCD2 as-of join: bound each activity period to the dimension version valid at that time.
The heatmap source: each cohort's active count normalized to its own month-zero size with FIRST_VALUE over a cohort-partitioned window. A curve that flattens has found its floor; a curve still falling cohort-over-cohort is a widening leak the dashboard will paint red.
A senior design ends with the dashboard, because the entire point of growth accounting is to make a comfortable number alarming. The board shows the level and the flows side by side, so that "MAU flat" can never be read without "and here is why."
Read the board top to bottom and it argues the §01 thesis without a word of prose: the level is stable, the flows are violent, the gauge says treadmill, and the only genuinely good news is the LTV:CAC of 3.1× — which means the acquired users who do stay are worth more than they cost, so the fix is to plug the leak, not to stop spending. That is the difference between a dashboard that reports a number and one that prescribes an action.
Strip the users and sessions away and the question was testing five judgments, each of which generalizes far beyond growth analytics: