A phone knows where you tapped. A headset knows where your eyes went, the geometry of your living room, and how your hands move — ninety times a second. Design the data platform for spatial computing, and the question is no longer how to store a firehose, but how to store the most sensitive data a consumer device has ever collected without drowning in it or mishandling it. A complete working through: the device-centric star, the sensor firehose boundary, consent as a temporal dimension, and the gaze-confirmed attention that is honest and lawful at once.
This question arrives sounding like a logging exercise and turns, within two sentences, into the hardest privacy problem in the catalog. The data is a firehose, and the firehose is biometric.
"Design the data platform for a spatial-computing platform — Meta Quest, Apple Vision Pro. Headsets stream eye, hand, and room sensors during immersive sessions. Serve Product, Trust & Privacy, and Monetization. How would you scope this out?"
Two facts make this unlike every other data platform. First, the most sensitive data classes arrive at firehose rates. Eye-gaze is biometric; the room mesh is a three-dimensional map of a private space; both stream continuously at sixty to a hundred and twenty hertz. You cannot weld a 90 Hz pose firehose to a queryable analytics fact any more than you could weld raw GPS to a ride-share trip log — the volume alone makes the table unusable, and here the volume is also the most private data a person owns. Second, capability is not uniform. A Vision Pro has eye-tracking; a Quest controller session does not; phone-AR has neither. What data even exists is a function of the device — so device capability is a dimension that gates the schema, not a footnote to it.
A weak answer reaches for one giant events table and a boolean consent flag. A strong answer notices the three structural forces first, and names them before any boxes:
Scope first, because the privacy boundary is the design. State what you are building: a device-centric star on the immersive session, with the sensor firehose in object storage, consent as a temporal dimension that gates every analytic read, and a gaze-confirmed attention fact for monetization. State what you are deliberately keeping out of the warehouse entirely: the raw room mesh — a 3-D scan of someone's home lives in encrypted, residency-pinned, short-TTL storage and only a relocalization hash ever reaches the analytics layer. State what you punt: the rendering pipeline, the content store, real-time comfort intervention (treated as an on-device loop), and cross-device identity resolution.
Then the envelope, volunteered. A single immersive session on a fully-sensored headset:
| Quantity | Estimate | Consequence |
|---|---|---|
| Sample rate | 90 Hz | The firehose tempo everything is measured against |
| Signals per frame | 6-DoF × (head + 2 hands + 2 eyes) | ~30 floats per frame — multiplies the rate |
| Raw samples / 30-min session | ≈ 162 M | The number that forces the object-store boundary |
| Per-second rollup rows / session | ~1,800 | The queryable fact — five orders of magnitude smaller |
| Consent categories × regions | 6 × ~40 | The SCD2 grain; append-only versions per user |
| Room-mesh footprint | tens of MB / scan | Never in the warehouse — hash only |
| Gaze dwell events / ad surface | 1 per exposure | Billable only as-of consent + spoof check |
Notice the asymmetry: the per-second rollup is roughly ninety-thousand times smaller than the raw stream and carries everything analytics actually asks. That single ratio — 162 million samples reduced to eighteen hundred rows — dictates the storage tier, the privacy boundary, and the shape of every fact below. The rest of this article follows the sensor.
One architecture, two planes. The raw plane carries biometric frames to encrypted object storage and never lets them touch the warehouse; the analytic plane carries a per-second rollup and pointers, gated at read time by consent that was in force at capture.
Three properties of this picture do most of the interview's work. First, the firehose never touches the warehouse — frames flow headset → edge rollup → encrypted object store, and only a per-second tributary plus a raw_blob_uri pointer reaches the analytic plane. Second, consent is a temporal dimension read at query time, not a column on the fact: the dashed arrow into fct_attention is an as-of join that asks "was this gaze allowed when it was captured," and a later withdrawal cannot rewrite a past answer. Third, the device dimension gates the schema — a session's facts are a function of its sensor suite, so the absence of eye rows for a controller session is correct, not a gap.
The raw plane is write-only to object storage; the warehouse is read-only against permission. Biometric frames are captured, encrypted, pinned to a region, and pointed at — never queried directly, never joined, never inlined. Every analytic question is answered against the per-second rollup, and every question that touches biometrics is answered against the consent that existed at capture time. The two rules together are why a withdrawal plus an audit is survivable: the data that was lawful stays countable, the data that was not drops out, and nothing is ever overwritten.
The grain all three stakeholders share is the immersive session — headset on to headset off, within one app. Finer signals (pose, gaze) aggregate up to it; coarser context (user, device, consent) hangs off it. The conformed device dimension carries the sensor suite that decides which facts a session can even produce.
The same dim_device defined here is conformed across the wider platform — it keys ads attribution, fraud velocity, the engagement surface, and the autonomous fleet. The headset is simply its most sensor-rich member. The load-bearing column is sensor_suite: it is the catalogue of what data can exist, and it is SCD2 because consent state and OS/runtime drift over the device's life.
The session is the shared grain: one row per immersive session, anchoring every consent as-of join through its start_ts. The firehose is aggregated to one row per session-second; the raw 90 Hz stream is a blob URI, never inlined. The comfort score and guardian-breach count make physical safety a first-class column rather than an afterthought.
Two many-to-many cases hide in the corners and both have the same trap. A shared-space session has many users in one room sharing one spatial anchor; putting N users on one session row triple-counts engagement exactly the way a pooled ride does. The fix is a bridge — one row per (shared_space_id, anchor_id, user_session_key) — so per-user metrics stay correct and "who was co-located when" is a join, not a self-join.
The correctness of this entire platform lives in one join predicate: every read of biometric data binds to the consent that was valid at the session's capture time. Get it wrong — read under "current" consent — and you have committed a retroactive consent rewrite, which is the failure that turns a data platform into a lawsuit.
Consent is a lifecycle, and the lifecycle is non-destructive. A grant opens a window; a withdrawal closes that window and opens a new one; no state is ever overwritten:
Each transition appends a row and closes the previous one's valid_to. Because the history is preserved, the question "was this gaze allowed when it was captured?" is always answerable — and so is its grim cousin, "which exact rows must a deletion request purge?" Both are the same as-of predicate, run for analytics in one direction and for erasure in the other.
Here is the atomic heart of the platform: aggregate gaze dwell per surface, but include only gaze captured while the user's eye-tracking consent was granted in their region — resolved as-of the session start, not as-of now. The two timestamp comparisons against the consent window are the entire invariant.
The worked case makes the invariant concrete. A user grants eye-tracking at 10:00 and withdraws it at 10:40. Session S-1 starts at 10:05 (consent granted as-of start), session S-2 at 11:00 (consent withdrawn as-of start). The query counts S-1's gaze and silently excludes S-2's — even though S-2's rows physically exist, because the device captured them before the runtime flag propagated. Had consent been a single mutable boolean, withdrawing it would either have falsified S-1's history or failed to exclude S-2. Both are wrong; the window is right.
Three programs carry the raw plane: the edge aggregator that turns 90 Hz into one row per second, the room-mesh handler that refuses to let geometry into the warehouse, and the gaze-spoof check that decides whether attention is even real. Each is small; the judgment is in what they refuse to keep.
The aggregator runs at the edge and is the boundary itself. It reduces a second's worth of frames — ninety samples across head, hands, and eyes — to a handful of summary statistics, writes the raw window to encrypted object storage, and emits one rollup row carrying the blob pointer. The refusal is the design: the raw frames leave the aggregator only as an opaque, encrypted blob, and the warehouse never sees a single frame.
A 3-D scan of a private home is among the most sensitive data possible, and the handler's only job is to make sure it is the one thing the warehouse never receives. It computes a stable relocalization fingerprint from the mesh, persists that hash for AR placement, and routes the actual geometry to encrypted, residency-pinned, short-TTL storage outside the analytics estate entirely. The refusal is absolute: the mesh is not down-sampled into the warehouse, it is excluded from it.
Gaze-confirmed attention is a far stronger currency than a served impression — eye-tracking can prove a user actually looked. But that makes it forgeable and biometric at once, so before a dwell becomes billable it passes two gates: it must clear a dwell threshold (a glance is not attention), and it must survive a jitter check (a perfectly still, zero-jitter gaze is as fabricated as a phantom-convoy GPS track). The refusal here protects the advertiser and the user simultaneously.
One carve-out, always stated: the spoof check decides whether attention is real, but it does not decide whether it is lawful. A gaze can be perfectly genuine and still un-billable because consent was withdrawn at capture. Reality is a property of the signal; lawfulness is a property of the consent window. The §06 close enforces both, and never lets one stand in for the other.
The derived layer turns gated attention into a billable currency and turns guardian breaches into a comfort-and-safety signal. Both are deterministic rollups over the facts — and both refuse to count anything they cannot defend.
Gaze-confirmed viewability is the monetization payoff: of the ads served, what fraction were actually looked at, counting only users who consented to eye-tracking at the time and only gaze that survived the spoof check. Served-but-never-looked-at impressions drop out of the billable metric; non-consented users never enter it. The result is a viewability number that is honest and lawful at once — a stronger currency than a served impression and a defensible one.
The safety signal is the other half of the aggregation. A guardian breach — a hand or head crossing the play boundary — is the XR analog of a vehicle disengagement: a safety event that correlates with room size and content intensity. Rolled up per session and joined to comfort score, it answers the product question that the session grain alone cannot — which experiences are physically uncomfortable, and which rooms are too small for them — and it does so without ever touching the room mesh, because the breach event already encodes the only geometry that matters.
The facts are where the platform explains itself. Three queries an interviewer loves, because each carries a classic pattern on its back — the capability-gated join, the SCD2 as-of deletion sweep, and the sessionization of a continuous firehose.
"Average eye-engagement per app" is a trap unless you gate by capability. A controller-only Quest produces no eye rows, and naively dividing by all sessions would understate every Vision Pro app. Join through sensor_suite so the denominator counts only sessions that could have produced gaze — absence is correct, not missing.
A withdrawal or an erasure request runs the consent invariant backward: find every gaze row captured while consent was not granted, so it can be purged. This is the same as-of predicate as the attention query, inverted — the dimension that proves you may use a row also proves which rows you must delete.
"Does comfort degrade the longer a session runs?" The per-second pose rollup has no fatigue boundaries — only timestamps and motion summaries. Flagging sustained high-velocity stretches and running a window sum over the flags is the canonical gaps-and-islands move, turning a continuous firehose into discrete fatigue episodes.
A senior design ends with observability, because every privacy boundary above is invisible without it. The dashboard watches three different definitions of "healthy" — the firehose, the consent estate, and the monetization currency — and treats a consent leak as the page-worthy incident.
Read the green tiles first, because they are the ones that keep the company out of court: ungated reads at zero means every biometric query went through the as-of join, and residency violations at zero means no frame escaped its region. Then read the amber: comfort p10 is sliding and spoof rejection is climbing — a content problem and an adversary, both real, both contained. That is what a designed privacy posture looks like from the operator's chair: the dangerous failures are alarmed to zero, and the merely-bad ones are visible before they page.
Strip the headset away and the question was testing five judgments, each of which generalizes far beyond spatial computing: