Hot Topics 2026 — what the senior DE loop is actually testing.
The format war is over (Iceberg won), so the questions moved up the stack — to the catalog, to cost at petabyte scale, to quality at the source, and to the data plumbing that production GenAI lives or dies on. Twenty-one topics, each with an architecture diagram and the one-line interview signal that separates an L4 answer from an L5/L6 one.
The four macro-shifts driving the 2026 question set
Iceberg won → the catalog war
With the table format settled, the differentiation — and the hard problems — moved to the REST catalog and metadata layer.
FinOps at petabyte scale
Storage and compute bills are now a first-class engineering KPI. "Make it cheaper" is a design constraint, not an afterthought.
Quality shifts left
Contracts and assertions move enforcement to the producer and the ingestion edge — catch bad data before it poisons the lake.
GenAI needs plumbing
Feature stores, vector stores and retrieval pipelines — with freshness and lineage — are the new must-build, plus copilots in ops.
The twenty-one topics
- S3 Storage Lens & Cost Optimization — the FinOps entry point
- Data Lake Query Engines & Catalogs — Trino + Iceberg REST catalogs
- Storage Layout & Partitioning Strategy — the highest-leverage L5 topic
- Compute-to-Storage Skew Mitigation — killing the straggler
- Real-time Stream Partitioning & Ingestion — the partition-key decision
- Schema Evolution & Format Governance — registries & columnar IDs
- Data Quality, Anomaly Detection & Data Contracts — circuit breakers
- Data Lineage, Dependencies & Observability — OpenLineage / OTel
- Data Tiering & TTL Management — the compute-side of cost
- AI-Ready Infrastructure & Pipeline Copilots — the signature 2026 theme
- Multi-Region Data Architecture & Disaster Recovery — surviving a region loss
- Table Maintenance Automation — metadata failures, not compute
- Data Platform Reliability Engineering — SLOs, error budgets, SRE-for-data
- Lakehouse Security & Governance — RLS, masking, GDPR-as-code
- Change Data Capture (CDC) at Scale — OLTP → Kafka → Iceberg
- Cost-Aware Compute Optimization — cut 40% without breaking SLA
- Metadata Engineering — discovery & trust
- Event-Driven Architectures — sourcing, CQRS, outbox, sagas
- Vector Data Infrastructure — embeddings, ANN, RAG freshness
- Data Mesh & Domain Ownership — organizational scaling
- CI/CD & DataOps Optimization with AI — the intelligent delivery loop
S3 Storage Lens & Cost
Org-wide growth, waste surfacing, and lifecycle to Glacier / Deep Archive.
Query Engines & Catalogs
Trino + Iceberg REST catalogs — Polaris, Glue, Nessie; manifest pruning.
Partitioning Strategy
Hidden partitioning, bucketing, sorting, Z-order; the small-file killer.
Skew Mitigation
Salting, AQE, skew joins, isolating heavy hitters in Spark/Flink.
Stream Partitioning
Kafka/Flink partition keys, backpressure, rebalancing at device scale.
Schema Evolution
Avro/Protobuf registries, compatibility modes, Iceberg ID-based evolution.
Quality & Contracts
Assertions as circuit breakers; producer/consumer data contracts.
Lineage & OTel
Auto-mapped dependencies, impact analysis, OpenLineage/OpenTelemetry.
Tiering & TTL
TTL frameworks, rollup cascades, retiring raw to cheaper tiers.
AI Infra & Copilots
Feature/vector stores, retrieval freshness, and copilots in operations.
Multi-Region & DR
Catalog recovery, RPO/RTO, active-active vs passive, cross-region replication.
Table Maintenance
Compaction, snapshot expiry, manifest rewrites, orphan removal, branching.
Reliability Engineering
Pipeline SLOs, error budgets, automated rollback, self-healing.
Security & Governance
Row/column security, dynamic masking, PII, GDPR-as-code, zero-trust.
CDC at Scale
Debezium, ordering, idempotency, exactly-once upserts into Iceberg.
Compute Cost
AQE, autoscaling, spot, right-sizing, shuffle tuning, resource queues.
Metadata Engineering
Discovery, glossary, semantic layer, ownership, catalog performance.
Event-Driven Arch
Event sourcing, CQRS, the outbox pattern, sagas, event versioning.
Vector Infrastructure
Embedding pipelines, ANN, hybrid search, RAG freshness, re-embedding.
Data Mesh
Domain ownership, data products, federated governance, platform teams.
CI/CD with AI
Selective testing, change-risk scoring, anomaly-gated deploys, self-healing pipelines.
★ Where to start — the 2026 priority order
Triaging limited prep time? This is the ranking I'd use — the topics most likely to decide a senior/staff loop, each linked to its deep-dive above.
- Iceberg / Lakehouse internals
- Data lake query engines & catalogs
- Streaming systems (Kafka/Flink)
- CDC & incremental processing
- Storage layout & small-file optimization
- Compute optimization & cost engineering
- Data reliability engineering (SLOs, budgets)
- Data contracts & quality
- Lineage & observability
- Security & governance
- Multi-region architecture & DR
- Metadata engineering & discovery
- AI data infrastructure (RAG, vectors)
- Event-driven architectures
- Data mesh & organizational scaling
S3 Storage Lens & Cost Optimization
Why it's hot in 2026: at petabyte scale the storage bill is an engineering KPI, and Storage Lens is the org-wide lens that turns "where did 8 PB come from?" into a dashboard — the natural entry point to a FinOps practice.
Storage Lens aggregates usage and activity metrics across every account and bucket in the org (the advanced tier adds prefix-level detail and recommendations), so growth and waste become visible in one place. The four waste categories it surfaces map directly to one-line lifecycle fixes.
The techniques
- Surface the four wastes: orphaned objects (no owning table/job), uncompressed data, incomplete multipart uploads silently accruing, and non-current object versions in versioned buckets.
- Automate lifecycle rules: transition cold prefixes Standard → IA → Glacier Flexible → Deep Archive; AbortIncompleteMultipartUpload; NoncurrentVersionExpiration; expire true temp data.
- Intelligent-Tiering for unpredictable access patterns (auto-moves between tiers, no retrieval fee on frequent/infrequent), vs explicit lifecycle when you know the access curve.
- FinOps wiring: cost-allocation tags + showback/chargeback so each team sees its own bill; treat $/TB-scanned and $/TB-stored as tracked KPIs.
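As a rough sketch, the lifecycle rules above can be written in the dictionary shape that boto3's put_bucket_lifecycle_configuration accepts. The prefixes, rule IDs and day counts are illustrative assumptions, not recommendations, and the API call is left commented out so the block runs without AWS credentials.

```python
import json

# Hypothetical prefixes and day thresholds; tune them to your own access curve.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "tier-cold-events",
            "Filter": {"Prefix": "events/raw/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},        # Glacier Flexible Retrieval
                {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
            ],
            "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
        },
        {
            "ID": "abort-stale-multipart",
            "Filter": {"Prefix": ""},                            # whole bucket
            "Status": "Enabled",
            "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
        },
        {
            "ID": "expire-temp",
            "Filter": {"Prefix": "tmp/"},
            "Status": "Enabled",
            "Expiration": {"Days": 3},
        },
    ]
}

print(json.dumps(lifecycle_configuration, indent=2))

# With credentials configured, the same dict would be applied roughly like this:
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-bucket",  # hypothetical bucket name
#     LifecycleConfiguration=lifecycle_configuration,
# )
```

Keeping the rules as data in version control is what makes them reviewable, which fits the FinOps wiring point: the lifecycle policy becomes a diff, not a console click.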
Data Lake Query Engines & Catalogs
Why it's hot in 2026: Iceberg won the format war, so the catalog layer is the new battleground. The REST catalog and the metadata tree — not the data files — now decide how fast a petabyte query plans.
Once the table format is a commodity, the catalog is where vendors compete (Apache Polaris, open-sourced by Snowflake; Databricks Unity Catalog; AWS Glue; the git-like Nessie) and where query planning lives or dies. A query never scans data first: it walks the metadata tree, pruning at each level.
The techniques
- Minimize metadata overhead: expire old snapshots, rewrite/compact manifests, keep metadata.json from ballooning; small, current metadata = fast planning.
- Prune manifests efficiently: partition summaries in the manifest list let the planner skip whole manifests, and per-file column lower/upper bounds let it skip data files before reading any data (Puffin sketches add column statistics such as distinct counts for the optimizer).
- REST catalog choice: Polaris/Unity/Glue/Nessie — Nessie adds git-style branches & tags for isolated writes and rollbacks; REST decouples engine from catalog implementation.
- Engine tuning: Trino split sizing, dynamic filtering, table/column stats (ANALYZE), metadata caching; cost-based join ordering.
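The pruning idea can be sketched in a few lines. ManifestSummary below is a simplified stand-in for a manifest-list entry, not Iceberg's real schema, and the file names and date ranges are made up; the point is only that range overlap is decided from metadata, before any data file is opened.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ManifestSummary:
    """Simplified stand-in for one manifest-list entry (not Iceberg's real schema)."""
    path: str
    lower: str  # lowest partition value covered by this manifest
    upper: str  # highest partition value covered by this manifest


def prune_manifests(manifests, query_lo, query_hi):
    """Keep only manifests whose [lower, upper] range overlaps the query range."""
    return [m for m in manifests if m.upper >= query_lo and m.lower <= query_hi]


manifests = [
    ManifestSummary("m1.avro", "2026-01-01", "2026-01-31"),
    ManifestSummary("m2.avro", "2026-02-01", "2026-02-28"),
    ManifestSummary("m3.avro", "2026-03-01", "2026-03-31"),
]

# ISO dates compare correctly as strings, so no parsing is needed here.
survivors = prune_manifests(manifests, "2026-02-10", "2026-02-12")
print([m.path for m in survivors])  # ['m2.avro']
```

The same overlap test repeats one level down against per-file column bounds, which is why bloated or poorly clustered metadata shows up as slow planning rather than slow scanning.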
Storage Layout & Partitioning Strategy
Why it's hot in 2026: layout ties physical decisions directly to query cost, which makes it one of the highest-leverage topics in an L5 loop — get it wrong and you're paying for it on every query, forever.
Every layout knob trades write-time effort for read-time savings, paid back on every query. The senior move is to derive the layout from the dominant query shapes, not the other way round.
The techniques
- Partition on a low-cardinality, frequently-filtered dimension (usually date) — over-partitioning is how you create the small-file problem.
- Bucket on the join/group key to co-locate matches and remove shuffles; sort/cluster within files for min/max data skipping.
- Iceberg hidden partitioning: transform-based (day(ts), bucket(16, id)) so queries don't reference partition columns, and partition evolution changes the scheme with no table rewrite.
- Z-order / space-filling curves for multi-column skipping; compaction to ~128 MB–1 GB targets to kill small files.
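To make the Z-order bullet concrete, here is a minimal sketch of bit interleaving (a Morton code) over two integer columns. The 4x4 grid and the four-rows-per-file split are toy assumptions; real engines handle typing, more columns and file sizing.

```python
def z_value(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Morton (Z-order) code."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)       # x supplies the even bit positions
        z |= ((y >> i) & 1) << (2 * i + 1)   # y supplies the odd bit positions
    return z


# Sort a toy 4x4 grid by Z-value, then cut it into "files" of four rows each.
rows = [(x, y) for x in range(4) for y in range(4)]
rows.sort(key=lambda r: z_value(*r))
files = [rows[i:i + 4] for i in range(0, len(rows), 4)]

for f in files:
    xs = [r[0] for r in f]
    ys = [r[1] for r in f]
    print(f"x in [{min(xs)}, {max(xs)}], y in [{min(ys)}, {max(ys)}]")
```

Each file ends up covering a tight 2x2 square, so min/max statistics on both columns can skip files. A plain sort on x alone would leave every file spanning the full range of y.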
day for the time filter, bucket by user_id for the join, and Z-order on country for the dashboard filter." Mention that Iceberg's hidden partitioning + evolution is what lets you fix a bad partition choice without rewriting petabytes. Deep mechanics in Performance Families 1 & 7.
Compute-to-Storage Skew Mitigation
Why it's hot in 2026: classic distributed-systems interview territory that never goes away — a few hot keys bottleneck the whole cluster while every other task sits idle. The total work is fine; the distribution is the problem.
Skew is the optimization juniors miss because the plan "looks fine." Detection is half the skill — read task-duration and partition-size distributions, not just the average.
The techniques
- Salting: append a random salt to the hot key to split it across N partitions; replicate the small side N ways to keep the join correct.
- Adaptive Query Execution (Spark 3+): skew-join handling (skewJoin) splits oversized partitions at runtime; AQE also coalesces tiny partitions and can flip sort-merge → broadcast once it sees real sizes.
- Isolate heavy hitters: a known hot set goes down a separate two-phase path; the long tail aggregates normally (no broad replication cost).
- Flink: keyBy skew → local/pre-aggregation (two-phase), rebalance/rescale, key-group tuning.
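A minimal, engine-free sketch of salting only the hot keys: the large side gets a random salt on hot keys, and the small side is replicated once per salt for those keys alone. SALT_BUCKETS, HOT_KEYS and the sample rows are assumptions for illustration; in Spark the same idea is expressed with extra join columns.

```python
import random
from collections import defaultdict

SALT_BUCKETS = 4          # hypothetical fan-out for hot keys
HOT_KEYS = {"user_1"}     # assumed to be known from a prior skew analysis


def salt_large_side(rows, rng):
    """Append a random salt to hot keys only; cold keys always get salt 0."""
    for key, value in rows:
        salt = rng.randrange(SALT_BUCKETS) if key in HOT_KEYS else 0
        yield (key, salt), value


def replicate_small_side(rows):
    """Replicate hot-key rows once per salt so every salted row finds its match."""
    for key, value in rows:
        salts = range(SALT_BUCKETS) if key in HOT_KEYS else (0,)
        for salt in salts:
            yield (key, salt), value


def hash_join(left, right):
    """Plain in-memory hash join on the first tuple element."""
    index = defaultdict(list)
    for k, v in right:
        index[k].append(v)
    return [(k, lv, rv) for k, lv in left for rv in index.get(k, [])]


clicks = [("user_1", i) for i in range(8)] + [("user_2", 100)]
users = [("user_1", "alice"), ("user_2", "bob")]

rng = random.Random(0)
salted = hash_join(salt_large_side(clicks, rng), replicate_small_side(users))
joined = sorted((key, lv, rv) for (key, _salt), lv, rv in salted)
print(joined == sorted(hash_join(clicks, users)))  # True
```

The salted join returns exactly the rows of the unsalted one, but the hot key's work is now spread over up to four join keys, while the cold key costs no extra replication.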
skewedPartitionFactor and why you salt only the hot keys (salting everything just inflates the small side). Full operational treatment on the Hot Shards & Data Skew page and Performance Family 3.
Real-time Stream Partitioning & Ingestion
Why it's hot in 2026: millions of concurrent devices, one wrong partition key, and you get a hot partition that lags the whole consumer group. The partition-key choice is the single most consequential decision in a streaming design.
Throughput and balance are won or lost at the partition key. The trade-off to articulate: per-key ordering requires same-key→same-partition, which is exactly what creates hot partitions when a key is popular.
The techniques
- Key design: high-cardinality, evenly-hashed keys (e.g., hash(device_id)); composite keys to break up hot tenants; never a low-cardinality key like country/region if it's skewed.
- Throughput: right-size partition count, producer batching/linger.ms, idempotent/transactional producers for exactly-once.
- Backpressure: Flink credit-based flow control + buffer debloating; watch Kafka consumer lag as the health SLO.
- Rebalancing: cooperative/incremental sticky assignor to avoid stop-the-world rebalances when consumers join/leave.
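The key-design trade-off can be seen with a toy partitioner. This sketch uses md5 as a stand-in for the murmur2 hash in Kafka's default partitioner, and the tenant and device names and the partition count are invented for the example.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 8  # hypothetical topic size


def partition_for(key: str) -> int:
    """Stable hash partitioner (md5 here is a stand-in for Kafka's murmur2)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS


# One hot tenant with 1,000 devices plus 50 small tenants with one device each.
events = [("tenant_big", f"dev_{i}") for i in range(1000)]
events += [(f"tenant_{t}", "dev_0") for t in range(50)]

by_tenant = Counter(partition_for(tenant) for tenant, _ in events)
by_composite = Counter(partition_for(f"{tenant}:{device}") for tenant, device in events)

print("tenant key, largest partition:", max(by_tenant.values()))
print("tenant:device key, largest partition:", max(by_composite.values()))
```

Keying by tenant alone sends every event of the hot tenant to one partition. The composite key spreads that load, at the price of ordering being guaranteed per tenant:device pair, no longer per tenant.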
Schema Evolution & Format Governance
Why it's hot in 2026: decoupled microservices deploy independently, so a producer's schema change must not break downstream consumers. The schema registry is the contract that makes that safe.
Format governance is what lets independent teams move fast without breaking each other. The registry encodes which changes are safe in which direction, and columnar formats add their own evolution rules.
The techniques
- Schema registry (Avro/Protobuf) enforcing BACKWARD/FORWARD/FULL compatibility as a CI gate: breaking changes fail the build, not production.
- Safe vs breaking: adding an optional/defaulted field is compatible; renaming, removing, or retyping a required field is not.
- Iceberg/Parquet evolution by field-ID: add/drop/rename/reorder columns with no data rewrite — vs Hive/positional schemas that break on reorder.
- Direction matters: backward (new consumer reads old data) vs forward (old consumer reads new data) decides who can upgrade first.
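A simplified sketch of the compatibility check a registry performs, using Avro-style reader/writer reasoning. The schema representation is a made-up dict, type promotion is ignored, and real registries apply more rules, so treat this as an illustration of direction rather than a replacement for the registry.

```python
# Schemas are simplified to {field_name: {"type": str, "default": ...}}.
# The "default" key is present only when the field declares a default.

def can_read(reader: dict, writer: dict) -> bool:
    """True if data written with `writer` can be decoded using `reader`."""
    for name, spec in reader.items():
        if name not in writer:
            if "default" not in spec:       # nothing to fill the missing field with
                return False
        elif writer[name]["type"] != spec["type"]:  # type promotion ignored here
            return False
    return True  # fields only the writer knows are skipped by the reader


def check(old: dict, new: dict, mode: str) -> bool:
    backward = can_read(reader=new, writer=old)   # new consumer reads old data
    forward = can_read(reader=old, writer=new)    # old consumer reads new data
    return {"BACKWARD": backward, "FORWARD": forward, "FULL": backward and forward}[mode]


v1 = {"id": {"type": "long"}, "email": {"type": "string"}}
v2_add_optional = {**v1, "plan": {"type": "string", "default": "free"}}
v2_drop_required = {"id": {"type": "long"}}

print(check(v1, v2_add_optional, "FULL"))       # True
print(check(v1, v2_drop_required, "BACKWARD"))  # True
print(check(v1, v2_drop_required, "FORWARD"))   # False
```

Dropping the required email field passes BACKWARD but fails FORWARD: old consumers still expect it. That is the "who can upgrade first" question in code form.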
Data Quality, Anomaly Detection & Data Contracts
Why it's hot in 2026: quality is shifting left. Instead of dashboards catching bad data days later, contracts and in-pipeline assertions act as circuit breakers that stop a malformed batch at the door.
The shift is from detecting bad data downstream to preventing it at the source. Two complementary mechanisms: contracts (an agreement) and assertions (the enforcement).
The techniques
- Data contracts: a versioned producer/consumer agreement — schema + semantics + SLAs — enforced in the producer's CI so a breaking change is caught in the PR.
- In-pipeline assertions as circuit breakers: Great Expectations / Soda / dbt tests that fail the run and quarantine the batch (dead-letter) rather than poison downstream.
- Anomaly detection on volume, freshness, null-rate and distribution drift — statistical or ML, alerting on deviation from the learned baseline.
- Policy-as-code: quality rules live in version control, reviewed like any other code.
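A minimal sketch of an assertion acting as a circuit breaker. The rule names, thresholds and the in-memory Sink are hypothetical; tools such as Great Expectations, Soda or dbt tests provide the production equivalent.

```python
from dataclasses import dataclass, field


@dataclass
class Sink:
    published: list = field(default_factory=list)
    quarantine: list = field(default_factory=list)  # dead-letter area


def null_rate(rows, column):
    return sum(r.get(column) is None for r in rows) / max(len(rows), 1)


# Hypothetical rules; real ones would live in version control next to the pipeline.
ASSERTIONS = {
    "non_empty": lambda rows: len(rows) > 0,
    "user_id_null_rate_below_1pct": lambda rows: null_rate(rows, "user_id") < 0.01,
    "amount_non_negative": lambda rows: all((r.get("amount") or 0) >= 0 for r in rows),
}


def load_batch(rows, sink):
    """Publish the batch only if every assertion passes; otherwise quarantine it."""
    failed = [name for name, rule in ASSERTIONS.items() if not rule(rows)]
    if failed:
        sink.quarantine.append({"rows": rows, "failed": failed})
    else:
        sink.published.extend(rows)
    return failed


sink = Sink()
print(load_batch([{"user_id": 1, "amount": 10.0}], sink))
# []
print(load_batch([{"user_id": None, "amount": -5.0}], sink))
# ['user_id_null_rate_below_1pct', 'amount_non_negative']
```

The bad batch never reaches the published table; it sits in quarantine with the names of the rules it broke, which is what makes the failure debuggable instead of silent.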
Data Lineage, Dependencies & Observability
Why it's hot in 2026: lineage is the backbone of both incident response and change management — and it's increasingly built on vendor-neutral OpenLineage/OpenTelemetry standards rather than tool-specific silos.
Auto-mapped lineage turns a pile of jobs into a navigable dependency graph, which powers the two questions every on-call and every migration needs answered: impact (downstream) and root-cause (upstream).
The techniques
- Auto lineage: parse SQL or emit OpenLineage events from Airflow/Spark/dbt — table-level and, ideally, column-level.
- Impact analysis & cascading-failure handling: when a node fails or changes, highlight everything downstream before you ship/pause.
- Scheduling optimization: dependency-aware orchestration runs jobs in true topological order, not on guessed timers.
- OpenTelemetry standardization: vendor-neutral traces/metrics/logs, paired with OpenLineage facets for lineage metadata, covering the five data-observability pillars (freshness, volume, schema, distribution, lineage).
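Impact analysis and dependency-aware scheduling are both graph walks. The sketch below assumes a tiny hand-written lineage map with invented dataset names; in practice the graph would be assembled from OpenLineage events or parsed SQL.

```python
from collections import deque
from graphlib import TopologicalSorter  # standard library since Python 3.9

# Hypothetical lineage: each dataset maps to the upstream datasets it reads.
UPSTREAMS = {
    "raw_orders": set(),
    "raw_users": set(),
    "stg_orders": {"raw_orders"},
    "fct_revenue": {"stg_orders", "raw_users"},
    "dash_finance": {"fct_revenue"},
}


def downstream_of(node):
    """Impact analysis: everything that transitively reads from `node`."""
    children = {n: set() for n in UPSTREAMS}
    for child, parents in UPSTREAMS.items():
        for parent in parents:
            children[parent].add(child)
    seen, queue = set(), deque([node])
    while queue:
        for child in children[queue.popleft()]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Dependency-aware scheduling: upstreams always come before their consumers.
run_order = list(TopologicalSorter(UPSTREAMS).static_order())

print(sorted(downstream_of("raw_orders")))
# ['dash_finance', 'fct_revenue', 'stg_orders']
print(run_order)
```

Reversing the edge direction in downstream_of gives the root-cause walk (upstream) from the same graph.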
Data Tiering & TTL Management
Why it's hot in 2026: the compute-side complement to Storage-Lens cost work (№01) — strict TTLs and rollup cascades retire raw events to cheaper tiers, so you stop paying to keep (and scan) granular history forever.
Tiering is the recognition that not all data deserves hot, granular, expensive storage. Access patterns decay with age, so storage and granularity should too.
The techniques
- TTL frameworks: per-dataset retention enforced automatically — raw expires after N days once it's rolled up (also a GDPR/retention-compliance lever).
- Rollup cascade: raw → hourly → daily → monthly aggregates; retire or delete the granular tier behind each rollup.
- Operational vs analytical layers: a lean operational layer serves recent + aggregated data; deep history lives cold.
- Additivity discipline: store sums/counts and HLL sketches (not pre-computed distinct counts or ratios) so rollups stay correct at every grain.
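The additivity point is easiest to see with numbers. This sketch uses two made-up hourly rows and rolls them up to a day; only the sum and the count are stored, and the average is derived at read time.

```python
from collections import defaultdict

# Hourly rows: (day, hour, sum_latency_ms, request_count). Values are made up.
hourly = [
    ("2026-01-01", 0, 1000.0, 10),
    ("2026-01-01", 1, 9000.0, 30),
]


def rollup_daily(rows):
    """Roll hourly up to daily by adding the additive parts only."""
    daily = defaultdict(lambda: [0.0, 0])
    for day, _hour, total, count in rows:
        daily[day][0] += total
        daily[day][1] += count
    return {day: (total, count) for day, (total, count) in daily.items()}


total, count = rollup_daily(hourly)["2026-01-01"]
correct_avg = total / count                                       # derived at read time
avg_of_avgs = sum(t / c for _, _, t, c in hourly) / len(hourly)   # the tempting shortcut
print(correct_avg, avg_of_avgs)  # 250.0 200.0
```

Had the hourly tier stored only the averages (100 and 300), the daily figure would come out as 200 instead of 250 and the raw data needed to correct it would already be expired. Distinct counts have the same problem, which is why sketches such as HLL are stored instead.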
AI-Ready Infrastructure & Pipeline Copilots
Why it's hot in 2026: the defining theme. Two halves — (a) the data foundations production GenAI depends on, and (b) AI copilots embedded in the platform itself, pushing toward "autonomous" data operations.
This is the topic that signals you're building for where the field is going. Both halves are squarely a data-engineering responsibility — the model is somebody else's; the plumbing is yours.
The techniques
- Feature stores: online/offline parity and point-in-time correctness to kill training/serving skew, a common reason a model that tested well fails in production.
- Vector stores & retrieval: embeddings + ANN index; chunking and embedding-refresh pipelines with freshness and lineage (doc → chunk → answer) so RAG can be trusted and audited.
- Data quality as the RAG lever: retrieval is only as good as the freshness and quality of what's indexed — ties directly to contracts (№07) and lineage (№08).
- Copilots in ops: text-to-SQL with guardrails, schema-change risk analysis, partition/clustering recommendations, anomaly triage — augmenting the engineer, trending toward autonomous platforms.
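Point-in-time correctness, the core of the feature-store bullet, reduces to an as-of lookup. The sketch below assumes a small in-memory feature history with invented timestamps and values; a real feature store does the same join at scale.

```python
from bisect import bisect_right

# Hypothetical feature history per entity: (effective_ts, value), sorted by time.
FEATURE_HISTORY = {
    "user_1": [(100, 0.2), (200, 0.5), (300, 0.9)],
}


def feature_as_of(entity: str, ts: int):
    """Return the latest feature value known at `ts`, never a later one."""
    history = FEATURE_HISTORY.get(entity, [])
    idx = bisect_right([t for t, _ in history], ts)
    return history[idx - 1][1] if idx else None


# Label events for training: each joins to the feature as of its own timestamp.
labels = [("user_1", 150), ("user_1", 300), ("user_1", 50)]
print([feature_as_of(e, ts) for e, ts in labels])  # [0.2, 0.9, None]
```

Joining every label to the latest value (0.9) would leak future information into training, which is exactly the training/serving skew the bullet warns about.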
Multi-Region Data Architecture & Disaster Recovery
Why it's hot in 2026: anyone can build a pipeline; far fewer can recover an Iceberg catalog after a region outage with a defined RPO/RTO. "Can you survive losing a region?" is becoming a bigger senior question than "can you write Spark?"
DR for a lakehouse is mostly a metadata problem: the data files replicate cheaply via cross-region replication, but a query can't read them until the catalog that points at them is recovered and consistent.
The techniques
- RPO/RTO first: agree the acceptable data-loss window (RPO) and recovery time (RTO) before choosing a topology — they drive everything else.
- Active-passive vs active-active: warm standby (cheaper, simpler, slower RTO) vs both-regions-serving (fast, expensive, needs conflict resolution / single-writer per key).
- Cross-region replication: S3 CRR for data, Kafka MirrorMaker 2 for the log, and catalog metadata replication — recovering Iceberg/Nessie/Glue is the part teams forget.
- Prove it: game-day failover drills and chaos engineering — an untested DR plan is a hypothesis, not a capability.
Table Maintenance Automation
Why it's hot in 2026: Iceberg adoption is exploding, but many engineers stop at CREATE TABLE. A surprising share of production failures in 2026 are metadata failures, not compute failures — unmaintained tables that slowly strangle their own planning.
An Iceberg table is a living thing: every write adds snapshots, manifests and files. Without scheduled maintenance, planning time and storage creep up until the table becomes the incident.
The techniques
- Compaction (rewrite_data_files) to kill small files; manifest rewrites to keep the manifest tree shallow.
- Snapshot expiration + orphan-file removal to reclaim storage and stop metadata bloat (the №02 planning killer).
- Branching/tagging for isolated writes, audits and instant rollback; incremental processing off Iceberg snapshots/changelog to read only what changed.
- Automate it as policy-as-code on a schedule — not a heroic manual cleanup after the table is already slow.
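As a sketch of maintenance as policy, the function below models a snapshot-retention rule. The parameter names older_than and retain_last mirror those of Iceberg's Spark expire_snapshots procedure, but this is only a model of the policy over a made-up snapshot history, not the procedure itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    committed_at: int  # epoch seconds


def snapshots_to_expire(snapshots, older_than: int, retain_last: int):
    """Expire snapshots committed before `older_than`, but always keep the newest `retain_last`."""
    ordered = sorted(snapshots, key=lambda s: s.committed_at, reverse=True)
    protected = {s.snapshot_id for s in ordered[:retain_last]}
    return [
        s.snapshot_id
        for s in ordered
        if s.committed_at < older_than and s.snapshot_id not in protected
    ]


# Six hourly commits; expire anything older than hour 5 while keeping the last 3.
history = [Snapshot(i, committed_at=i * 3600) for i in range(1, 7)]
print(snapshots_to_expire(history, older_than=5 * 3600, retain_last=3))  # [3, 2, 1]
```

Snapshot 4 is older than the cutoff but survives because it is among the three newest. That floor protects rollback and time-travel on tables that are written rarely.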
Data Platform Reliability Engineering
Why it's hot in 2026: companies increasingly expect data engineers to think like SREs — SLOs, error budgets, automated rollback and self-healing. "What happens if Kafka is down for 4 hours?" is now a standard senior question.
DPRE imports the SRE discipline into data: pipelines get reliability targets, an error budget that gates change, and automated responses so humans aren't the first line of defence.
The techniques
- SLIs/SLOs for pipelines: freshness, end-to-end latency, completeness and error rate — with explicit targets, not vibes.
- Error budgets that gate releases: burn the budget and you freeze risky changes until reliability recovers.
- Automated rollback & self-healing: bad deploy or data → revert to the last good snapshot (Iceberg time-travel helps), retry/quarantine, runbook automation.
- Degradation design: answer "Kafka down 4 hours?" with buffering, replay from offsets, and an explicit acceptable-data-loss stance.
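The error-budget gate is simple arithmetic. This sketch assumes a run-based SLO (share of pipeline runs that met their freshness target) and invented monthly numbers; time-based budgets work the same way with minutes in place of runs.

```python
def error_budget(slo_target: float, total_runs: int, failed_runs: int) -> dict:
    """Compare observed failures against the failures the SLO allows."""
    # Rounded to avoid float noise such as 20.000000000000018.
    allowed = round((1.0 - slo_target) * total_runs, 6)
    remaining = allowed - failed_runs
    return {
        "allowed_failures": allowed,
        "remaining": remaining,
        "freeze_risky_changes": remaining <= 0,
    }


# Hypothetical month: 99% freshness SLO over 2,000 pipeline runs.
healthy = error_budget(0.99, total_runs=2000, failed_runs=12)
burned = error_budget(0.99, total_runs=2000, failed_runs=25)
print(healthy["freeze_risky_changes"], burned["freeze_risky_changes"])  # False True
```

A 99% target over 2,000 runs allows 20 misses. At 12 the team still has budget to ship risky changes; at 25 the budget is spent and the release gate closes until reliability recovers.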
Lakehouse Security & Governance
Why it's hot in 2026: most DE topic lists stop at schema and lineage, but interviews — especially in healthcare, finance and AI — increasingly probe row/column-level security, dynamic masking, PII handling and GDPR/CCPA deletion as policy-as-code.
Governance is shifting from documentation to enforcement: access rules live as code in the query path, and compliance (deletion, residency) is an engineered workflow, not a manual scramble.
The techniques
- Row-level security (per-role row filters) and column-level / dynamic masking (mask SSNs, emails unless authorized) enforced centrally.
- PII detection & tagging to drive masking automatically; policy-as-code so access rules are versioned and reviewed.
- GDPR/CCPA deletion workflows: right-to-be-forgotten at scale on immutable files — Iceberg row-level deletes + compaction to physically purge.
- Zero-trust data access: short-lived, attribute-based credentials; no standing broad grants.
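A toy version of centrally enforced row filters and tag-driven masking. The roles, tags and masking rule are all hypothetical; in a real lakehouse the catalog or query engine applies the equivalent policy in the query path.

```python
# Hypothetical policy: column tags drive masking; role attributes drive row filters.
COLUMN_TAGS = {"email": "pii", "ssn": "pii", "region": None, "amount": None}
ROLES = {
    "analyst_eu": {"can_see_pii": False, "regions": {"EU"}},
    "compliance": {"can_see_pii": True, "regions": {"EU", "US"}},
}


def mask(value: str) -> str:
    """Keep the last 4 characters, star out the rest."""
    return "*" * max(len(value) - 4, 0) + value[-4:]


def apply_policy(rows, role_name):
    role = ROLES[role_name]
    for row in rows:
        if row["region"] not in role["regions"]:   # row-level security
            continue
        yield {
            col: (mask(str(val))
                  if COLUMN_TAGS.get(col) == "pii" and not role["can_see_pii"]
                  else val)
            for col, val in row.items()
        }


rows = [
    {"email": "a@example.com", "ssn": "123-45-6789", "region": "EU", "amount": 10},
    {"email": "b@example.com", "ssn": "987-65-4321", "region": "US", "amount": 20},
]
print(list(apply_policy(rows, "analyst_eu")))
```

The analyst sees one EU row with the email and SSN masked, while the compliance role sees both rows in clear. Tagging a new column as pii changes what every role sees without touching any query.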
WHERE clauses. Connects to lineage (№08) for "where did this PII flow?"
Change Data Capture (CDC) at Scale
Why it's hot in 2026: a surprising number of modern architectures are OLTP → CDC → Kafka → Iceberg → Trino rather than traditional batch ETL. Getting ordering, idempotency and exactly-once right is one of the hottest operational topics.
CDC replaces nightly batch with a continuous, low-latency mirror of the source. The hard parts aren't the capture — they're the correctness guarantees on the way into the lake.
The techniques
- Log-based capture (Debezium off the WAL/binlog) — low-impact and complete, vs query-based polling that misses deletes.
- Ordering & idempotency: Kafka guarantees order within a key/partition; make the apply idempotent (upsert on PK, dedup on an op_seq/LSN) so replays are safe.
- Exactly-once into Iceberg: MERGE upserts/deletes; handle late-arriving and out-of-order updates with a monotonic version tiebreaker.
- Snapshots + deletes: tombstone handling so source deletes propagate, not just inserts/updates.
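The apply-side guarantees can be sketched without any engine. The event shape below is an assumption (op codes c/u/d follow Debezium's convention, and lsn stands in for any monotonic version); a dict plays the role of the target table that a MERGE would update.

```python
def apply_change(table: dict, event: dict) -> None:
    """Idempotent upsert/delete keyed on primary key, ordered by LSN."""
    pk, lsn = event["pk"], event["lsn"]
    current = table.get(pk)
    if current is not None and current["lsn"] >= lsn:
        return  # replay or out-of-order older change: ignore
    if event["op"] == "d":
        # Keep a tombstone so a late, older update cannot resurrect the row.
        table[pk] = {"lsn": lsn, "deleted": True, "row": None}
    else:
        table[pk] = {"lsn": lsn, "deleted": False, "row": event["row"]}


def live_rows(table: dict) -> dict:
    return {pk: rec["row"] for pk, rec in table.items() if not rec["deleted"]}


events = [
    {"op": "c", "pk": 1, "lsn": 10, "row": {"name": "a"}},
    {"op": "u", "pk": 1, "lsn": 30, "row": {"name": "c"}},
    {"op": "u", "pk": 1, "lsn": 20, "row": {"name": "b"}},  # arrives late
    {"op": "c", "pk": 2, "lsn": 40, "row": {"name": "x"}},
    {"op": "d", "pk": 2, "lsn": 50, "row": None},
    {"op": "u", "pk": 2, "lsn": 45, "row": {"name": "y"}},  # older than the delete
]

table = {}
for e in events + events:  # the second pass simulates a full replay
    apply_change(table, e)
print(live_rows(table))  # {1: {'name': 'c'}}
```

The late update with lsn 20 is ignored, the delete of pk 2 sticks despite the stale update behind it, and replaying the whole stream changes nothing.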
Cost-Aware Compute Optimization
Why it's hot in 2026: storage cost (№01) has a compute twin. "Cut this job's cost 40% without hurting the SLA" is a more realistic senior question than "write a Spark transformation" — it tests judgment, not syntax.
Compute cost is a design constraint, not an afterthought. The senior skill is finding the slack — idle capacity, over-provisioning, needless shuffle — without putting the SLA at risk.
The techniques
- Spark AQE (coalesce partitions, skew join, dynamic switching) and shuffle optimization to cut the most expensive stages.
- Autoscaling + spot/preemptible with an on-demand base for the driver/critical path — the biggest lever, if your job tolerates interruption.
- Right-size executors (cores/memory to the actual workload) and use resource queues so cheap batch doesn't starve interactive.
- Serverless trade-off: pay-per-query convenience and zero idle vs cost at steady high utilization — choose per workload.
Metadata Engineering
Why it's hot in 2026: it's becoming its own specialty. The industry is moving from "store data" to "make data discoverable and trustworthy" — discovery, a business glossary, ownership and a semantic layer over a fast metadata catalog.
When there are thousands of tables, the bottleneck isn't storage or compute — it's finding the right, trustworthy dataset. Metadata engineering makes the catalog a product in its own right.
The techniques
- Data discovery: a searchable catalog ("Google for data") with ranking by popularity, freshness and quality signals.
- Business glossary + semantic layer: shared definitions and governed metrics so "revenue" means one thing everywhere (ties to Analytics №04).
- Ownership & data products: every dataset has an owner, an SLA and a contract — domain-driven, not orphaned.
- Catalog performance: metadata indexing so discovery and column-level lineage queries are fast at scale.
Event-Driven Architectures
Why it's hot in 2026: one layer above Kafka partitioning (№05) sit the patterns that appear constantly in staff-level system-design rounds — event sourcing, CQRS, the outbox pattern and sagas.
Event-driven design treats the stream of events as the system of record, with everything else a derived view. It's how you decouple services and rebuild state on demand.
The techniques
- Event sourcing: persist state changes as an append-only log; current state is a fold over events, and you can replay/rebuild any projection.
- CQRS: separate the write model from many purpose-built read models (search, analytics, cache) — each optimized for its query.
- Outbox pattern / transactional messaging: write state and the event in one DB transaction, relay the outbox to the log — no dual-write inconsistency.
- Sagas for multi-service transactions (with compensating actions), and event versioning for schema evolution of the log.
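"Current state is a fold over events" can be taken literally. This sketch uses an invented account ledger with two event types; the log contents and event names are assumptions for illustration.

```python
from functools import reduce


def apply(balance_by_account: dict, event: dict) -> dict:
    """Pure transition function: previous state + one event -> next state."""
    state = dict(balance_by_account)
    account, amount = event["account"], event["amount"]
    if event["type"] == "Deposited":
        state[account] = state.get(account, 0) + amount
    elif event["type"] == "Withdrawn":
        state[account] = state.get(account, 0) - amount
    return state


# Hypothetical append-only log; in an outbox setup these rows would be written
# in the same database transaction as the state change, then relayed to the log.
log = [
    {"type": "Deposited", "account": "acc_1", "amount": 100},
    {"type": "Withdrawn", "account": "acc_1", "amount": 30},
    {"type": "Deposited", "account": "acc_2", "amount": 5},
]

current = reduce(apply, log, {})
as_of_first_two = reduce(apply, log[:2], {})
print(current, as_of_first_two)
# {'acc_1': 70, 'acc_2': 5} {'acc_1': 70}
```

Because apply is pure, any projection (a CQRS read model, a historical view, a rebuilt cache) is the same fold over a different slice of the log or with a different transition function.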
Vector Data Infrastructure
Why it's hot in 2026: the AI pillar's vector stores deserve their own deep-dive. Companies now expect data engineers to own the embedding pipelines, indexing and freshness that AI workloads consume.
A vector store is only as good as the pipeline that fills and refreshes it. This is squarely data-engineering work — ingestion, indexing, freshness and lineage — not model training.
The techniques
- Embedding pipelines: chunk → embed → upsert into the index, with the same model on both ingest and query side.
- Vector indexing & ANN search: HNSW/IVF trade-offs (recall vs latency vs memory); filtering by metadata alongside the vector.
- Hybrid search (vector + keyword/BM25) and re-ranking for relevance the pure-vector recall misses.
- Freshness & re-embedding: re-embed on content or model change; track lineage (doc→chunk→vector) so RAG answers are auditable. Multi-modal retrieval where needed.
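The freshness bullet implies a staleness check before every re-embedding run. A minimal sketch, assuming each indexed chunk stores a content hash and the model identifier it was embedded with (the chunk ids, texts and model names are hypothetical):

```python
import hashlib

EMBEDDING_MODEL = "embed-v2"  # hypothetical current model identifier


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def chunks_to_reembed(chunks: dict, index_meta: dict) -> list:
    """Return chunk ids whose text or embedding model differs from what is indexed."""
    stale = []
    for chunk_id, text in chunks.items():
        meta = index_meta.get(chunk_id)
        if (meta is None
                or meta["content_hash"] != content_hash(text)
                or meta["model"] != EMBEDDING_MODEL):
            stale.append(chunk_id)
    return stale


chunks = {
    "doc1#0": "refund policy v2",   # text changed since indexing
    "doc1#1": "shipping times",     # embedded with an older model
    "doc2#0": "new page",           # never indexed
    "doc3#0": "faq",                # fresh
}
index_meta = {
    "doc1#0": {"content_hash": content_hash("refund policy v1"), "model": "embed-v2"},
    "doc1#1": {"content_hash": content_hash("shipping times"), "model": "embed-v1"},
    "doc3#0": {"content_hash": content_hash("faq"), "model": "embed-v2"},
}
print(chunks_to_reembed(chunks, index_meta))  # ['doc1#0', 'doc1#1', 'doc2#0']
```

Storing the model identifier next to each vector is what enforces "same model on ingest and query side": a model upgrade marks every old chunk as stale instead of silently mixing embedding spaces.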
Data Mesh & Domain Ownership
Why it's hot in 2026: the hype cooled but the principles stuck. Staff-level interviews increasingly test organizational architecture — domain ownership, data products and federated governance — not just technical architecture.
Data mesh is an answer to an organizational scaling problem: a central team becomes the bottleneck as data needs outgrow it. The fix is ownership at the domain, on a shared platform.
The techniques
- Domain ownership: the team that knows the data owns it end-to-end — no central team bottleneck for every change.
- Data as a product: each domain ships discoverable, documented, SLA-backed data products with contracts (№07) and ownership in the catalog (№17).
- Self-serve platform: a platform team provides the paved road (storage, catalog, CI/CD, observability) so domains move fast without reinventing infra.
- Federated computational governance: global, automated standards (interop, security, quality) with local autonomy — not a central committee.
CI/CD & DataOps Optimization with AI
Why it's hot in 2026: data-platform code and pipelines ship through CI/CD too — and AI is now optimizing the delivery loop itself. Selective testing, change-risk scoring and anomaly-gated deploys are turning CI/CD from a slow gate into an intelligent one.
CI/CD for a data platform gates on more than unit tests — it gates on data quality, contract compatibility and blast radius. AI both speeds the loop up and makes the deploy decision smarter.
The techniques
- Test impact analysis / selective testing: run only the tests a diff actually affects (and pipeline tests for the touched models) — minutes instead of a full suite.
- Flaky-test detection & quarantine: classify and auto-quarantine flaky tests so they stop blocking merges, and fix them on a separate track.
- AI code review & change-risk scoring: auto-review PRs, score blast radius (from lineage, №08), and flag risky schema changes (№06) for human eyes; predict build failures and optimize caching/DAG parallelism.
- Anomaly-gated progressive delivery: canary/blue-green where an AI watches data-quality, freshness and error metrics to auto-promote or auto-rollback — self-healing pipelines (auto-retry, revert, even auto-fix PRs), the DPRE loop (№13) closed automatically.
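Selective testing and blast-radius scoring both fall out of the lineage graph. The sketch below uses hand-written, hypothetical model and test names, and a deliberately crude score (count of downstream models); a learned change-risk model would replace the last function.

```python
# Hypothetical mappings; a real setup would derive them from lineage metadata.
DOWNSTREAM = {
    "stg_orders": {"fct_revenue"},
    "fct_revenue": {"dash_finance"},
    "stg_users": {"dim_users"},
}
TESTS = {
    "stg_orders": {"test_orders_not_null"},
    "fct_revenue": {"test_revenue_reconciles"},
    "dash_finance": {"test_dashboard_freshness"},
    "dim_users": {"test_users_unique"},
}


def affected_models(changed):
    """Changed models plus everything downstream of them."""
    seen, stack = set(changed), list(changed)
    while stack:
        for child in DOWNSTREAM.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def select_tests(changed):
    """Only the tests attached to models the diff can actually affect."""
    return sorted(t for m in affected_models(changed) for t in TESTS.get(m, ()))


def risk_score(changed):
    """Crude blast-radius score: number of downstream models touched."""
    return len(affected_models(changed) - set(changed))


print(select_tests({"stg_orders"}), risk_score({"stg_orders"}))
# ['test_dashboard_freshness', 'test_orders_not_null', 'test_revenue_reconciles'] 2
```

A change to stg_orders skips the users tests entirely, and the same traversal yields the number a reviewer or a deploy gate can threshold on.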
One thread runs through all twenty-one.
Read together, the 2026 set is a single argument: now that the format is settled, value moved up the stack — to the catalog, metadata and table internals (№02, №12, №17), to layout and cost on both storage and compute (№03, №01, №09, №16), to reliability, quality, security and governance enforced as code (№13, №07, №06, №14, №08), to the streaming, CDC and skew fundamentals that never stopped mattering (№04, №05, №15), to resilience across regions (№11), to the staff-level concerns of event-driven and organizational architecture (№18, №20), and to the AI-ready foundations — feature, vector and retrieval infra — that are the new table stakes (№10, №19), with AI now optimizing the delivery loop itself (№21). The senior signal in every one is the same: you tie the technical choice to cost, blast radius, or trust — not just correctness.