▸ DESIGN · Snowflake · Architecture, Cost & CDC

Snowflake — architecture, cost, and the patterns interviews test.

The round where they ask "how does Snowflake actually work" — or hand you a prompt that opens with "design an analytics platform on Snowflake." Seven sections: the three-layer architecture, micro-partitions and pruning, the virtual-warehouse credit model, the data-lifecycle trio, native change-data-capture, an end-to-end project brief you can actually build, and a rapid-fire question bank.

§ 01 — The architecture

The three-layer architecture — why storage and compute are separate.

Most candidates describe Snowflake as "a cloud data warehouse" and stop. The senior answer names the three decoupled layers and — more importantly — explains what each decoupling buys you. The whole pricing model, the concurrency story, and half the interview traps fall out of this one diagram.

┌──────────────────────────────────────────────────────────────────────────┐
│ CLOUD SERVICES — the "brain"                                             │
│ authentication · metadata · query optimizer · transaction mgr · RBAC     │
│ result cache lives here · mostly free (billed only above 10% of compute) │
└──────────────────────────────────────────────────────────────────────────┘
         │                       │                       │
   ┌─────▼─────┐           ┌─────▼─────┐           ┌─────▼─────┐
   │ WAREHOUSE │           │ WAREHOUSE │           │ WAREHOUSE │    QUERY PROCESSING
   │   (ETL)   │           │   (BI)    │           │  (DATA    │    — independent MPP
   │  size L   │           │  size XS  │           │  SCIENCE) │      compute clusters
   └─────┬─────┘           └─────┬─────┘           └─────┬─────┘      (pay per second)
         │                       │                       │
┌────────▼───────────────────────▼───────────────────────▼─────────────────┐
│ DATABASE STORAGE — one shared copy of the data                           │
│ columnar · compressed · immutable micro-partitions on object storage     │
│ (S3 / GCS / Azure Blob) · billed flat per TB-month                       │
└──────────────────────────────────────────────────────────────────────────┘

Layer            │ What it does                                                                                                                    │ How it is billed
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Database storage │ Holds every table as compressed, columnar, immutable micro-partitions on cloud object storage. One copy — every warehouse reads it. │ Flat rate per TB-month of compressed data
Query processing │ Virtual warehouses — independent MPP compute clusters. Each runs queries with no effect on the others.                         │ Credits per second a warehouse runs (60-second minimum)
Cloud services   │ The coordinator: authentication, metadata, the query optimizer, transaction management, access control, the result cache.      │ Free until daily cloud-services usage exceeds 10% of daily compute credits

What the decoupling actually buys you

  • Storage and compute scale independently. Petabytes of cold data cost only storage; you spin compute up and down on top of it. A legacy warehouse couples the two — you over-provision compute just to hold data.
  • Workload isolation without copying data. The ETL warehouse, the BI warehouse, and the data-science warehouse all read the same tables, yet a heavy backfill on one cannot slow a dashboard on another. No replicas, no data movement.
  • Elastic concurrency. A single warehouse can scale out into multiple clusters when users pile up (more on this in §03).
  • Compute is disposable. Suspend a warehouse and you stop paying for it instantly; the data is untouched in the storage layer.
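
To make the isolation concrete, a minimal sketch — one right-sized warehouse per workload, both reading the same shared tables. Names, sizes and suspend windows are illustrative, not prescriptions:

CREATE WAREHOUSE IF NOT EXISTS etl_wh
  WAREHOUSE_SIZE = 'LARGE'     -- heavy batch transforms
  AUTO_SUSPEND   = 60          -- stop billing after 60 idle seconds
  AUTO_RESUME    = TRUE;

CREATE WAREHOUSE IF NOT EXISTS bi_wh
  WAREHOUSE_SIZE = 'XSMALL'    -- small, frequent dashboard queries
  AUTO_SUSPEND   = 60
  AUTO_RESUME    = TRUE;

-- both read the same tables in the storage layer; neither can slow the other down
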
Senior signal. "Snowflake's defining choice is the separation of storage from compute, mediated by a stateless cloud-services layer. That's not a feature — it's the reason you can give every team its own right-sized warehouse against one shared copy of the data, and the reason the bill is mostly 'how many warehouse-seconds did we burn,' not 'how big is the cluster we rent year-round.'"
Trap. "It auto-scales" is too vague. Be precise about which layer scales and how: storage grows transparently; a warehouse scales up (bigger size = faster single query) or out (more clusters = more concurrency). Conflating up and out is the single most common Snowflake interview mistake.
§ 02 — Storage internals

Micro-partitions & pruning — how Snowflake reads less data.

You cannot tune what you cannot picture. Snowflake's storage unit is the micro-partition: an immutable, columnar file holding roughly 50–500 MB of uncompressed data, created automatically as data lands. You never size them, never name them, never write a PARTITION BY clause. The performance question is never "how do I partition" — it is "how well can the optimizer skip partitions."

Pruning — the only performance idea that matters

For every micro-partition, Snowflake stores per-column metadata: the min and max value, distinct counts, null counts. When a query filters on a column, the optimizer compares the predicate to that metadata and reads only the micro-partitions that could contain a match. Everything else is skipped before a single byte is scanned. That skip is called pruning.

Query: WHERE order_date = '2026-03-14'

micro-partition   min order_date   max order_date   read it?
───────────────   ──────────────   ──────────────   ────────
mp_001            2026-01-01       2026-01-09       ✗ pruned
mp_002            2026-02-28       2026-03-05       ✗ pruned
mp_003            2026-03-11       2026-03-18       ✓ SCAN (range overlaps)
mp_004            2026-03-25       2026-04-02       ✗ pruned

1 of 4 micro-partitions scanned — the other 3 never touched.

This is why load order is itself a performance feature. If rows arrive roughly time-ordered, each micro-partition holds a tight date range and date filters prune beautifully. If you shuffle the data, every micro-partition spans the whole date range, min/max becomes useless, and the optimizer must scan everything.
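
If a table has already been loaded in a shuffled order, one blunt fix is a one-off rebuild sorted by the filter column — a sketch, with an illustrative table; a clustering key (below) is the managed alternative:

CREATE OR REPLACE TABLE orders AS
SELECT * FROM orders ORDER BY order_date;   -- rows land in date order, so each micro-partition
                                            -- gets a tight min/max range; note this is a full
                                            -- rewrite and resets the table's Time Travel history
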

Clustering keys — when natural order is not enough

Suppose the table loads time-ordered, but the heaviest queries filter on region. Date pruning works; region pruning does not, because every micro-partition contains every region. A clustering key tells Snowflake to co-locate rows with similar region values into the same micro-partitions:

ALTER TABLE orders CLUSTER BY (region, order_date);

-- inspect how well-clustered the table currently is
SELECT SYSTEM$CLUSTERING_INFORMATION('orders', '(region, order_date)');

Automatic Clustering then maintains that organization as a background service — and that service consumes credits. So clustering is a deliberate trade: you pay ongoing reorganization cost to buy faster pruning.


Add a clustering key when…                                            │ Do NOT bother when…
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
The table is large (multi-TB) — small tables prune fine with no help  │ The table is small, or fully rebuilt every run
Queries filter/join on a column that is not the natural load order    │ Queries already filter on the load-order column
The column has moderate cardinality (region, customer_id buckets)     │ The column is unique per row (clustering buys nothing) or near-constant
The table is read far more than it is written                         │ The table churns constantly — reclustering cost will dominate

Search Optimization — the other access path

Clustering accelerates range and moderately selective scans. It does little for a needle-in-a-haystack point lookup — WHERE event_id = '...' returning three rows out of two billion. For that, enable the Search Optimization Service, which builds a separate per-table search-access path for highly selective equality predicates. The two are complementary and can coexist on one table.
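
Enabling it is one statement — a sketch with an illustrative table and column; scoping with ON EQUALITY limits the optimization (and its serverless cost) to the lookup column:

ALTER TABLE events ADD SEARCH OPTIMIZATION ON EQUALITY(event_id);

-- inspect which columns and methods the service currently covers
DESCRIBE SEARCH OPTIMIZATION ON events;
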

Senior signal. "I'd start by reading the query profile — the headline number is 'partitions scanned vs partitions total.' If a date-filtered query scans 90% of partitions, the data isn't ordered by date. I'd cluster on the real filter column, but only if the table is large and read-heavy enough to earn back the reclustering credits. For selective id lookups I'd reach for Search Optimization instead — clustering is the wrong tool for a point lookup."
Trap. "Add an index" is a non-answer — Snowflake has no user-managed B-tree indexes. The vocabulary is micro-partitions, pruning, clustering keys, Search Optimization. Reach for an index in a Snowflake interview and you have signalled you are mapping it onto Postgres.
§ 03 — Compute & cost

Virtual warehouses & the credit model — the cost levers.

A virtual warehouse is a cluster of compute you rent by the second. The interview question is rarely "what is a warehouse" — it is "this Snowflake bill tripled last quarter; where do you look?" Answering that means knowing every lever.

Scale UP vs scale OUT — the distinction that separates levels

                    │ Scale UP (resize)                                                   │ Scale OUT (multi-cluster)
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
What changes        │ Warehouse size: XS → S → M → L → XL …                               │ Number of same-size clusters: 1 → 2 → 3 …
Solves              │ One slow, heavy query — more CPU/RAM per query                      │ Many concurrent queries — queries are queuing
Cost shape          │ Each size step roughly doubles credits/hour (XS=1, S=2, M=4, L=8…)  │ Linear — each extra running cluster adds its size's credit rate
Symptom to look for │ Query spills to remote disk; long single-query runtime              │ Queries spend time queued before they even start

A bigger warehouse finishes a heavy query faster but, because it also costs proportionally more per second, a 2× warehouse that runs a query 2× faster is roughly cost-neutral — you are buying latency, not savings. Multi-cluster, by contrast, only spins up extra clusters while queries are actually queuing and retires them when load falls; under SCALING_POLICY = STANDARD it favours starting clusters early to kill queuing, under ECONOMY it favours keeping clusters fully loaded to save credits.
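
The two levers as SQL — a sketch with illustrative warehouse names; multi-cluster warehouses need Enterprise edition or above:

-- scale UP: one heavy query needs more CPU/RAM per query
ALTER WAREHOUSE etl_wh SET WAREHOUSE_SIZE = 'XLARGE';

-- scale OUT: users are queuing — add clusters only while the queue exists
ALTER WAREHOUSE bi_wh SET
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 4
  SCALING_POLICY    = 'ECONOMY';   -- or 'STANDARD' to kill queuing faster
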

The cost levers — a checklist for "why is the bill high"

Lever               │ What to check
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Auto-suspend        │ Is it set low (e.g. 60s)? An idle warehouse left running bills for nothing. There is a 60-second minimum each resume, so do not set it absurdly low on a bursty workload.
Auto-resume         │ On, so a suspended warehouse wakes on the next query — suspend aggressively without blocking users.
Right-sizing        │ Is a warehouse oversized for its workload? Check the query profile for spilling (too small) vs near-instant tiny queries on a huge warehouse (too big).
Warehouse sprawl    │ Too many warehouses, each idling above its minimum. Consolidate workloads with similar size needs.
Result cache        │ Identical queries within 24h (and unchanged data) return instantly from cache with no warehouse — dashboards that exploit this are nearly free.
Serverless features │ Snowpipe, Automatic Clustering, Search Optimization, Dynamic Tables and serverless Tasks all bill separately from warehouses — a real line item to monitor.
Storage             │ Long Time Travel retention plus Fail-safe on huge churning tables multiplies storage. Use transient tables for disposable data (§04).

Three caching layers — free speed if you know they exist

  1. Result cache — cloud-services layer. A syntactically identical query, unchanged data, < 24h old, returns the stored result with no warehouse and no compute cost. Account-wide: another user gets the hit too.
  2. Warehouse (local SSD) cache — micro-partition data cached on a running warehouse's nodes. Warmed by use; lost on suspend, so a resumed warehouse starts cold.
  3. Metadata cache — COUNT(*), MIN, MAX and similar are answered from micro-partition statistics with no data scan at all.
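
A quick way to see the first layer in action — run the same query twice and compare the profiles; USE_CACHED_RESULT is the session parameter to switch the result cache off when you are benchmarking pruning or warehouse size (the query itself is illustrative):

SELECT region, SUM(amount) FROM orders GROUP BY region;  -- first run: warehouse scans data
SELECT region, SUM(amount) FROM orders GROUP BY region;  -- rerun within 24h: served from the result cache

ALTER SESSION SET USE_CACHED_RESULT = FALSE;             -- disable while benchmarking
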
Senior signal. "For a bill that tripled I'd query SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY to find which warehouse grew, then QUERY_HISTORY to see whether it's queuing — fix with multi-cluster — or running slow heavy queries — fix with a resize. Then I'd audit auto-suspend on every warehouse and check the serverless line items. The bill is almost always idle time, oversizing, or a runaway backfill — rarely the thing people first blame."
§ 04 — Data lifecycle

Time Travel, cloning & Fail-safe — the data-lifecycle trio.

Three features fall straight out of one fact: micro-partitions are immutable. An "update" never edits a file — it writes new micro-partitions and leaves the old ones in place for a while. Once you internalise that, Time Travel, zero-copy cloning and Fail-safe stop being magic.

Time Travel — query the past

Because the previous micro-partitions still exist, you can query a table as of an earlier moment, within its retention window (default 1 day; up to 90 days on Enterprise edition and above):

-- the three ways to address the past
SELECT * FROM orders AT(OFFSET => -3600);                    -- 1 hour ago
SELECT * FROM orders AT(TIMESTAMP => '2026-03-14 09:00'::timestamp);
SELECT * FROM orders BEFORE(STATEMENT => '01b2...-query-id'); -- just before a bad query

-- recover from an accidental unqualified DELETE
INSERT INTO orders
SELECT * FROM orders BEFORE(STATEMENT => '<the_delete_query_id>');

-- recover a dropped object
UNDROP TABLE orders;

Zero-copy cloning — branch the data, not the bytes

CREATE TABLE orders_dev CLONE orders; on a 5 TB table returns in seconds. The clone is a new, fully writable object that initially just references the source's existing micro-partitions — metadata only, zero added storage. Because those micro-partitions are immutable, the moment either copy is modified it writes new micro-partitions, and you pay storage only for that divergence. You can clone a table, a schema, or an entire database.

Pattern. Clone the production database into a CI database at the start of an integration-test run: a full-scale, isolated, real-data environment created in seconds for near-zero storage, then dropped when the run ends. This is the single most-loved Snowflake trick — name it.
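
In SQL the pattern is two statements around the test run — database names here are illustrative:

CREATE DATABASE ci_run_2026_03_14 CLONE analytics_prod;  -- seconds; metadata only, no data copied
-- ...run the integration suite against ci_run_2026_03_14...
DROP DATABASE ci_run_2026_03_14;                          -- storage paid only for rows the tests changed
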

Fail-safe — the last resort, not a feature you use

After Time Travel retention expires, permanent tables enter a non-configurable 7-day Fail-safe period. You cannot query it; only Snowflake support can recover from it, on a best-effort basis. It is disaster insurance — and it is billed storage.

Table type          │ Time Travel │ Fail-safe │ Use for
────────────────────────────────────────────────────────────────────────────────────────────────────────────
Permanent (default) │ 0–90 days   │ 7 days    │ Source-of-truth tables you must be able to recover
Transient           │ 0–1 day     │ None      │ Staging / intermediate tables rebuilt every run — no recovery storage wasted
Temporary           │ 0–1 day     │ None      │ Session-scoped scratch — vanishes when the session ends
Senior signal. "All three features are consequences of immutable micro-partitions — old versions linger, so I can read them (Time Travel), share them by reference (cloning), or have Snowflake hold them as insurance (Fail-safe). The cost lever is that this lingering data is billed: I'd set short Time Travel and use transient tables for every disposable intermediate, and keep long retention only on genuine sources of truth."
§ 05 — Change-data-capture

Streams, Tasks & Dynamic Tables — native change-data-capture.

The defining pipeline question: "new data keeps arriving — how do you process only what changed, without re-scanning the world?" Snowflake answers it natively, and there are now two idioms — the imperative one (Streams + Tasks) and the declarative one (Dynamic Tables).

Streams — a bookmark over a table's changes

A stream is a change-tracking object. It does not store a copy of the data — it stores an offset, and exposes the rows that changed between that offset and the table's current version, with three metadata columns:

  • METADATA$ACTION — INSERT or DELETE
  • METADATA$ISUPDATE — TRUE when the row is part of an update (represented as a paired DELETE + INSERT)
  • METADATA$ROW_ID — a stable row identifier

Reading a stream is repeatable — it returns the same change set every time until a DML statement consumes it inside a transaction, which advances the offset on commit. A stream goes stale if its offset falls outside the source table's data-retention window, because the change history is no longer available — a real production failure mode to design around.
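
A minimal sketch of that lifecycle — the source table and the audit table it feeds are illustrative; the STALE column in SHOW STREAMS is the early warning for the retention failure mode above:

CREATE STREAM customer_stream ON TABLE raw_customer;

SELECT * FROM customer_stream;                 -- repeatable: same change set on every read

BEGIN;
INSERT INTO customer_changes_audit             -- any DML reading the stream inside a transaction...
SELECT customer_id, metadata$action, metadata$isupdate
FROM   customer_stream;
COMMIT;                                        -- ...advances the offset when it commits

SHOW STREAMS LIKE 'customer_stream';           -- check the STALE / STALE_AFTER columns
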

Tasks — scheduled or triggered SQL

A task runs a SQL statement (or a call to a stored procedure) on a schedule or in a dependency graph. The idiom that makes the loop cheap is the stream guard:

CREATE TASK refresh_dim_customer
  WAREHOUSE = etl_wh
  SCHEDULE  = '2 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('customer_stream')   -- skip empty cycles
AS
  CALL apply_scd2_customer();   -- the two-statement SCD2 logic lives in a procedure

When the stream is empty the task body does not run; a serverless task bills nothing for that skipped cycle. Tasks can be wired into a task graph (a DAG) so a root task fans out to dependents.
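
A sketch of a two-node graph — task and procedure names are illustrative; only the root carries a schedule, and tasks are created suspended, so each needs an explicit RESUME (children before the root):

CREATE TASK load_fact_orders
  WAREHOUSE = etl_wh
  SCHEDULE  = '5 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('orders_stream')
AS
  CALL apply_fact_orders();

CREATE TASK refresh_revenue_mart
  WAREHOUSE = etl_wh
  AFTER load_fact_orders                 -- dependent: runs when the parent completes
AS
  CALL rebuild_revenue_mart();

ALTER TASK refresh_revenue_mart RESUME;  -- enable the child before the root
ALTER TASK load_fact_orders RESUME;
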

Worked example — an SCD2 dimension

A Type-2 slowly-changing dimension keeps history: when a customer's tier changes, you close the current row and open a new one. A single MERGE cannot both update the old version and insert its replacement on the same key, so the procedure runs two statements in one transaction:

-- 1. expire the active version where a tracked attribute changed
UPDATE dim_customer d
SET    valid_to = CURRENT_TIMESTAMP(), is_current = FALSE
FROM   customer_stream s
WHERE  d.customer_id = s.customer_id
  AND  d.is_current = TRUE
  AND  s.metadata$action = 'INSERT'
  AND  (d.name <> s.name OR d.tier <> s.tier);

-- 2. open a new current version for changed + brand-new customers
INSERT INTO dim_customer (customer_id, name, tier, valid_from, valid_to, is_current)
SELECT s.customer_id, s.name, s.tier, CURRENT_TIMESTAMP(), NULL, TRUE
FROM   customer_stream s
WHERE  s.metadata$action = 'INSERT'
  AND  NOT EXISTS (SELECT 1 FROM dim_customer d
                   WHERE d.customer_id = s.customer_id AND d.is_current = TRUE
                     AND d.name = s.name AND d.tier = s.tier);

Both statements read customer_stream inside one transaction, so the change set is consistent across them and the offset advances exactly once.

Dynamic Tables — the declarative alternative

Streams + Tasks is powerful but imperative — you hand-wire offsets, guards and schedules. A Dynamic Table flips it around: you declare the query and a target freshness, and Snowflake works out the incremental refresh itself.

CREATE DYNAMIC TABLE customer_orders_daily
  TARGET_LAG = '5 minutes'
  WAREHOUSE  = etl_wh
AS
  SELECT c.customer_id, c.tier, COUNT(*) AS orders, SUM(o.amount) AS revenue
  FROM   orders o JOIN dim_customer c ON o.customer_id = c.customer_id
  WHERE  c.is_current = TRUE
  GROUP BY c.customer_id, c.tier;
                │ Streams + Tasks                                                                  │ Dynamic Tables
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Style           │ Imperative — you orchestrate                                                     │ Declarative — you state the goal
You control     │ Offsets, MERGE logic, schedule, DAG wiring                                       │ The query and a TARGET_LAG
Best for        │ Custom logic — SCD2, side effects, calling procedures, non-deterministic steps   │ Straightforward transformation DAGs that just need to stay fresh
Refresh chains  │ Hand-built task graph                                                            │ Automatic — a Dynamic Table reading another forms the DAG implicitly
Senior signal. "Default to Dynamic Tables for transformation DAGs — declaring a TARGET_LAG beats hand-wiring streams, tasks and offsets, and it can't go stale on me. I drop to Streams + Tasks only where I need imperative control: SCD2 history, calling a procedure, or a step with side effects a pure query can't express."
§ 06 — Project brief

Project brief — a near-real-time analytics pipeline.

Reading about Snowflake is not the same as building on it. This is a self-contained project — buildable on a Snowflake trial account — that exercises every concept in §§01–05. Treat it as a take-home: build it, then be ready to defend each decision.

The brief

A retailer drops order events as JSON files into cloud storage every few minutes, all day. Customer records (name, loyalty tier) drift over time. Build a Snowflake pipeline that ingests the events continuously, models them into a star schema with a history-preserving customer dimension, and serves an always-fresh revenue mart — incrementally, with no full reloads.

Requirement    │ Target
─────────────────────────────────────────────────────────────────────────────────
Freshness      │ Reporting mart no more than 5 minutes behind source
Idempotency    │ Any step can be re-run without duplicating or corrupting data
History        │ A customer's tier on the date of each order is recoverable (SCD2)
Cost           │ Compute scales to zero when no data is arriving
Recoverability │ An accidental bad write is recoverable without a support ticket

The architecture

cloud storage                       RAW (VARIANT)          stream
(JSON order files) ─▶ Snowpipe ─▶  raw_orders     ─▶  orders_stream   ─┐
customer feed      ─▶ COPY     ─▶  raw_customer   ─▶  customer_stream ─┤
                                                                        ▼
                                                 Task graph / Dynamic Tables
                                                   │
                                   ┌───────────────┴───────────────┐
                                   ▼                               ▼
                          dim_customer (SCD2)                 fact_orders
                                   └───────────────┬───────────────┘
                                                    ▼
                                        mart_revenue_daily ─▶ BI dashboard

Build steps

  1. Stage & file format. Create an external stage over the cloud-storage bucket and a JSON FILE FORMAT. Land raw events in a table with a single VARIANT column — schema-on-read keeps ingestion resilient to upstream JSON changes.
  2. Continuous ingest. Create a Snowpipe with auto-ingest so new files load within a minute or two on serverless compute — no warehouse schedule to manage.
  3. Streams. Put a stream on raw_orders and one on raw_customer so every downstream step processes only new rows.
  4. SCD2 customer dimension. Build dim_customer with valid_from / valid_to / is_current. Drive it with a stored procedure (the two-statement apply logic from §05) called by a Task guarded by SYSTEM$STREAM_HAS_DATA.
  5. Fact table. Flatten the order JSON with LATERAL FLATTEN, casting VARIANT paths to typed columns. Join each order to the customer version that was current at order time — that is the whole point of keeping SCD2 history.
  6. Reporting mart. Define mart_revenue_daily as a Dynamic Table with TARGET_LAG = '5 minutes' — it satisfies the freshness requirement declaratively, no extra task wiring.
  7. Recoverability. Confirm Time Travel retention covers your worst-case "notice the bug" window; rehearse a BEFORE(STATEMENT => ...) recovery so it is muscle memory.

Why each requirement is met

  • Freshness — Snowpipe ingests within ~1–2 min; the Dynamic Table holds the mart within its 5-minute TARGET_LAG.
  • Idempotency — streams advance their offset only on commit, so a failed task simply re-sees the same unconsumed changes next run; the SCD2 insert checks for an existing identical current row before writing a new version.
  • Cost-to-zero — Snowpipe and serverless tasks bill per work done; the ETL warehouse auto-suspends; empty cycles are skipped by the stream guard.
  • Recoverability — Time Travel on the permanent fact and dimension tables; transient tables for any throwaway intermediate so you do not pay Fail-safe on scratch data.

Stretch goals — turn it into a portfolio piece

  • Data quality. Add a task that rejects malformed events into a quarantine table and alerts when the reject rate crosses a threshold.
  • Governance. Apply a masking policy on customer PII and a row access policy so each region's analysts see only their rows (a masking-policy sketch follows this list).
  • A CI environment. Zero-copy CLONE the whole database into a CI database, run the pipeline end-to-end against it, assert row counts, then drop it.
  • Observability. Build a small monitoring view over SNOWFLAKE.ACCOUNT_USAGE — credits per warehouse per day, Snowpipe load latency, task failure history.
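
For the governance stretch goal, a minimal masking-policy sketch — the role name and column are illustrative:

CREATE MASKING POLICY mask_pii AS (val STRING) RETURNS STRING ->
  CASE WHEN CURRENT_ROLE() IN ('PII_ANALYST') THEN val ELSE '*** masked ***' END;

ALTER TABLE dim_customer MODIFY COLUMN name SET MASKING POLICY mask_pii;
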
How to present it. Do not narrate the code. Lead with the diagram, name the trade-off at each box — "Snowpipe over a scheduled COPY because the files are small and frequent; Dynamic Table over a task because I want to declare freshness, not orchestrate it; SCD2 over an overwrite because the brief needs the tier as-of order date." The decisions are the deliverable.
§ 07 — Rapid-fire

Rapid-fire — the tricky questions, with crisp answers.

The follow-ups that separate "used Snowflake" from "understands Snowflake." One or two sentences each — the way you would answer them in the room.

Q. A query is slow. The interviewer asks for your first move — what is it?

A. Open the query profile and read partitions scanned vs partitions total. If it is scanning most of the table on a filtered query, the problem is poor pruning, not warehouse size. Diagnose before you resize — throwing compute at a pruning problem just costs more for the same scan.

Q. Would a bigger warehouse make a single query cheaper?

A. Usually no. A 2× size roughly doubles the per-second rate; if it also halves runtime, the cost is about the same — you bought latency, not savings. Resizing pays off mainly when a query is spilling to remote disk for lack of memory.

Q. Two teams contend for compute. Resize, or multi-cluster?

A. Neither blindly — first ask whether queries are queued or just slow. Queuing is a concurrency problem: scale out (multi-cluster). A slow individual query is a per-query resource problem: scale up. The cleaner fix is often a separate warehouse per team — same shared data, isolated compute.

Q. Why did a stream suddenly return nothing — or error as stale?

A. A stream is an offset, not stored data. If it is not consumed before its offset ages past the source table's data-retention window, the change history is gone and the stream goes stale. Fix: consume streams promptly, or extend retention on the source table.

Q. You cloned a 10 TB table. Did storage just double?

A. No. The clone references the source's existing micro-partitions — zero added storage at creation. You are billed only for micro-partitions that change afterwards, on either copy, because micro-partitions are immutable and edits write new ones.

Q. A report shows "pro" with quotes instead of pro. Why?

A. The value is being read straight from a VARIANT path without casting. Path access yields a VARIANT; a VARIANT string carries its JSON double-quotes. Cast it — payload:user.plan::STRING — to get a clean typed value.

Q. Should every big table get a clustering key?

A. No. Clustering only helps when queries filter on a column that is not the natural load order, and Automatic Clustering consumes credits to maintain it. On a table already ordered by its filter column, or one rebuilt every run, a clustering key just burns credits for nothing.

Q. Cut this Snowflake bill — where do you look first?

A. Idle time and oversizing, almost always. Audit auto-suspend on every warehouse, look for warehouses idling above their minimum cluster count, and check the serverless line items (Snowpipe, Automatic Clustering, Search Optimization). Then resize down anything whose queries finish near-instantly. ACCOUNT_USAGE has the receipts.

Q. How do you give analysts a safe, full-scale environment to test against?

A. Zero-copy clone the production database. They get real data at full scale, instantly, for near-zero storage, fully isolated from prod — modify it freely, then drop it. It is the textbook use of cloning.

· · ·
▸ Seven sections · the architecture explains the cost · go well.