▸ INTERVIEW SKILLS · before you walk in
Research the company before your interview.
Most candidates read the "About" page and call it done. That's not research — that's the minimum. Interviewers can tell. Here's what to actually look for, where to find it, and worked examples for top-tier, mid-size, and stealth companies.
What actually matters — and what doesn't
THE SIGNAL, NOT THE NOISE
Interviewers do not expect you to know everything. They expect you to know something specific — something that shows you're serious enough about this role to spend an hour understanding what the team is actually working on.
The goal of research is not trivia. It's three things:
- Sharpen your "why this company" answer so it sounds like conviction, not flattery.
- Generate smart questions that signal technical curiosity and domain knowledge.
- Anticipate the problems they care about so you can connect your experience to their actual work.
Don't research these: Stock price and quarterly earnings (unless you're in finance) · Mission statement language · How many employees they have · Award lists · Press coverage about funding rounds (unless the round changed the product direction). These are ambient facts, not signal.
The 1-hour research checklist
DO THIS THE NIGHT BEFORE
Engineering research (40 min)
- Read the last 3–5 posts on the company engineering blog. Look for architectural decisions, not product announcements. (15 min)
- Search "{company} data engineering" or "{company} data infrastructure" on Google. Read anything from the last 18 months. (10 min)
- Check the company's GitHub org. What OSS do they maintain? What are the recent commits about? (5 min)
- Search YouTube for recent talks by company engineers, including recordings from Data + AI Summit (the Databricks conference), dbt Coalesce, and Kafka Summit. (10 min)
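The GitHub step in this list can be partly scripted. Here is a minimal sketch, assuming GitHub's public REST endpoint `GET /orgs/{org}/repos` with `sort=pushed`. The helper names (`repos_url`, `summarize`, `fetch_repos`) and the org name `example-org` are illustrative, not part of any library, and unauthenticated requests are rate-limited.

```python
import json
import urllib.request
from urllib.parse import urlencode

API = "https://api.github.com"


def repos_url(org: str, per_page: int = 10) -> str:
    """Build the URL that lists an org's public repos, most recently pushed first."""
    query = urlencode(
        {"type": "public", "sort": "pushed", "direction": "desc", "per_page": per_page}
    )
    return f"{API}/orgs/{org}/repos?{query}"


def summarize(repos: list[dict]) -> list[str]:
    """Turn the API's JSON repo objects into one readable line per repo."""
    lines = []
    for repo in repos:
        pushed = (repo.get("pushed_at") or "")[:10]  # keep only YYYY-MM-DD
        language = repo.get("language") or "n/a"
        description = repo.get("description") or ""
        lines.append(f"{pushed}  {repo['name']}  [{language}]  {description}".rstrip())
    return lines


def fetch_repos(org: str, per_page: int = 10) -> list[dict]:
    """Network call (not run on import); returns the decoded JSON list."""
    request = urllib.request.Request(
        repos_url(org, per_page),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


# Example (needs network access; "example-org" is a placeholder):
# print("\n".join(summarize(fetch_repos("example-org"))))
```

Sorting by last push surfaces what the team is working on now rather than its most-starred projects, which is closer to the "recent commits" question in the checklist.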
Product and business context (10 min)
- Read the product changelog or release notes for the last 6 months. What direction are they moving? (5 min)
- If B2B: look at their pricing page. Understand the product tiers. This tells you how they think about their customers. (3 min)
- If B2C: spend 5 minutes using the product. Know what it does from the user's perspective. (5 min)
People (10 min)
- LinkedIn: find your interviewers. Read their backgrounds. Don't mention it; just know what they've built. (5 min)
- Look for any conference talks, blog posts, or papers by the interviewers. (5 min)
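To check that the checklist really fits in an hour, the time boxes can be added up. The sketch below encodes the tasks and minutes from the checklist above; the `Task` structure, the shortened task descriptions, and the `"b2b"` / `"b2c"` labels are illustrative choices, not something the checklist prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    section: str
    description: str
    minutes: int
    applies_to: str = "all"  # "all", "b2b", or "b2c"


CHECKLIST = [
    Task("Engineering", "Read the last 3-5 engineering blog posts", 15),
    Task("Engineering", "Search '{company} data engineering / data infrastructure'", 10),
    Task("Engineering", "Check the GitHub org", 5),
    Task("Engineering", "Find recent conference talks", 10),
    Task("Product", "Read the changelog or release notes", 5),
    Task("Product", "Study the pricing page", 3, "b2b"),
    Task("Product", "Use the product", 5, "b2c"),
    Task("People", "Read the interviewers' backgrounds", 5),
    Task("People", "Find the interviewers' talks, posts, papers", 5),
]


def plan(company_type: str) -> list[Task]:
    """Keep the tasks that apply to this kind of company ('b2b' or 'b2c')."""
    return [t for t in CHECKLIST if t.applies_to in ("all", company_type)]


def minutes_by_section(tasks: list[Task]) -> dict[str, int]:
    """Sum the time boxes per checklist section."""
    totals: dict[str, int] = {}
    for task in tasks:
        totals[task.section] = totals.get(task.section, 0) + task.minutes
    return totals


for kind in ("b2b", "b2c"):
    totals = minutes_by_section(plan(kind))
    print(kind, totals, "total:", sum(totals.values()))
```

With these numbers the B2B plan comes to 58 minutes and the B2C plan to 60, so either variant fits the one-hour budget.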
Where to find technical signal
SOURCES THAT MATTER
| Source | What you'll find | Signal strength |
| Engineering blog (tech.company.com) | Real architectural decisions, incident postmortems, design choices | ⬛⬛⬛⬛⬛ Very high |
| GitHub org | What OSS they maintain, recent work direction, code quality | ⬛⬛⬛⬛⬜ High |
| Conference talks (YouTube, Data + AI Summit (formerly Spark Summit), dbt Coalesce) | Deep technical problems they're solving, how engineers think | ⬛⬛⬛⬛⬜ High |
| Research papers (arxiv, ACM) | Core technical bets the company has made at scale | ⬛⬛⬛⬛⬛ Very high (rare) |
| Hacker News / Reddit | Community reactions to launches, real user pain points | ⬛⬛⬛⬜⬜ Medium |
| Job descriptions (current JDs) | What stack they're using now, what problems are acute | ⬛⬛⬛⬜⬜ Medium |
| LinkedIn (company page) | Recent hires, team growth pattern | ⬛⬛⬜⬜⬜ Low-medium |
| Crunchbase / press releases | Funding stage, growth trajectory | ⬛⬜⬜⬜⬜ Low (for tech) |
The engineering blog is the highest-value source almost every time. Companies publish blog posts about problems that are real, hard, and current. If you can read a blog post, form a technical opinion about it, and bring that opinion into the interview, you're ahead of the many candidates who stop at the "About" page.
· · ·
Top-tier examples
GOOGLE · META · AMAZON
Top-tier companies publish extensively. The challenge is volume — there's too much. Focus on the last 18 months and filter to your domain (data infrastructure, ML platform, etc.).
Top-tier · Google
Research brief for Google (Data / AI Engineering)
Key technical systems to know
Monarch (time-series monitoring at scale), Dremel / BigQuery (columnar query engine), Flume / Apache Beam (data pipeline model), Spanner (globally consistent SQL), Borg / Kubernetes (workload scheduling). On the AI side: Pathways (multi-task model training), TPU infrastructure, Vertex AI platform.
Where to research
Google Research blog (research.google), Google Cloud blog (cloud.google.com/blog), Google AI blog. Search "Google data infrastructure" on Hacker News. Read the Dremel paper and Monarch paper if applying to the data infra side. Check Google's GitHub (github.com/google) for OSS projects relevant to your role.
What makes Google different technically
Google builds internal infrastructure at a scale that external tools can't handle, then often publishes the design or open-sources a descendant (Kubernetes grew out of Borg; Apache Beam grew out of Flume and Dataflow). The culture values technical depth and "big picture" thinking simultaneously. They run large-scale distributed systems as a matter of course. The interview will probe whether you can reason about scale you haven't directly operated at.
Smart questions to ask at Google
"How does the team decide when to build internal infrastructure vs. adopt an external tool?" · "What does data reliability SLO enforcement look like inside Google — how do you handle internal customers who miss SLAs?" · "How is the Monarch/Monarch2 migration affecting the way teams think about observability pipelines?"
How to use this in your pitch / answers
"I've read the Monarch paper and the design choices around in-memory zone leaves — the tradeoff between query freshness and availability is directly relevant to the SLO work I've been doing. I want to work on that problem at a scale where those tradeoffs actually hurt."
Top-tier · Meta
Research brief for Meta (Data / AI Engineering)
Key technical systems to know
Scuba (real-time log analytics), Hive at Meta scale, Presto / Trino (Presto originated at Meta; Trino is a later fork led by its original creators), TAO (social graph store), Tupperware (container orchestration, since renamed Twine), Manifold (blob storage), FBLearner (ML training platform). Data catalog: Nemo. Data quality: Mariana.
Where to research
Meta Engineering blog (engineering.fb.com) — extremely high-quality posts. The Data Engineering section specifically. Search "Meta data infrastructure" on Hacker News. Several Meta engineers publish on Medium and Substack — worth following. Presto's GitHub is public and shows the company's OSS priorities.
What makes Meta different technically
Meta moves extremely fast and values impact at scale over perfection. Data decisions are cross-functional — the data engineering teams support both ads (revenue-critical) and social products (engagement-critical) simultaneously. The interview will test whether you can navigate ambiguity and prioritize ruthlessly.
Smart questions to ask at Meta
"How does the team handle data consistency across products — ads data vs. social graph data?" · "Presto originated here — how does the team decide what to contribute upstream vs. keep as internal forks?" · "What's the current thinking on the data contract enforcement problem — I saw the Mariana work, curious how it's evolved."
How to use this in your pitch / answers
"The cross-functional data dependency problem at Meta is what I've been thinking about — I read the engineering post about how you handle schema evolution across teams and it's a harder version of the contract enforcement work I built at Lyft."
Top-tier · Amazon
Research brief for Amazon / AWS (Data Engineering)
Key technical systems to know
Redshift (columnar DW), Kinesis (streaming), EMR (managed Hadoop/Spark), Glue (ETL service), Lake Formation (data lake governance), Aurora (relational), DynamoDB. Internally: Coral (cross-service SQL translation), Maestro (workflow orchestration). Amazon retail uses a massive internal data mesh.
Where to research
AWS Blog (aws.amazon.com/blogs/big-data/) — very detailed, technical. The AWS re:Invent talks on YouTube are excellent for deep dives. Search "Amazon data infrastructure" and "AWS data engineering" on Hacker News. For the retail side, search "Amazon data mesh" and "Amazon data engineering blog". The Amazon Science blog covers the ML and AI infrastructure work.
What makes Amazon different technically
Amazon is two companies in one from a data perspective: AWS (selling services to the world) and Amazon retail/logistics (the world's most complex fulfillment operation). Leadership Principles are deeply embedded in interviews — every behavioral question is mapped to an LP. Research the 16 LPs and prepare a story for each. Technical decisions are written up as 6-pagers — the culture values clarity in writing.
Smart questions to ask at Amazon
"How does the team handle the tension between Amazon being a customer of AWS services and being the team that defines those services?" · "What does the data governance model look like for the retail data mesh — how do you enforce access at the scale of hundreds of teams?" · "How does the Redshift Streaming roadmap interact with Kinesis — are those teams converging?"
How to use this in your pitch / answers
"I've been a Kinesis and Redshift customer for four years and I have opinions. The Redshift Streaming ingestion work in the last two re:Invent cycles is solving a problem I've worked around — I want to be on the side that's making the decision, not adapting to it."
· · ·
Mid-size company examples
DATABRICKS · STRIPE · AIRBNB
Mid-size high-growth companies often have more focused technical blogs and a smaller, more searchable corpus. It's easier to read everything relevant. Do it.
Mid-size · Databricks
Research brief for Databricks
Key technical systems / products to know
Delta Lake (open table format), Unity Catalog (data governance), Photon (native vectorized query engine in C++), MLflow (OSS ML lifecycle), Delta Sharing (open data sharing protocol), Liquid Clustering (new layout algorithm replacing ZORDER), Lakeflow (pipeline orchestration). The DBRX foundation model. Streaming: Structured Streaming.
Where to research
Databricks Engineering Blog (databricks.com/blog/engineering): read everything from the last year. Data + AI Summit talks on YouTube (this is Databricks' annual conference, formerly Spark + AI Summit). The Delta Lake GitHub is public and extremely active. MLflow GitHub. The Unity Catalog paper (available as a technical report). Search "Databricks" on Hacker News for community reactions. Follow the founders and principal engineers on LinkedIn.
What makes Databricks different
Databricks is betting on the open Lakehouse architecture as the future of enterprise data. They're simultaneously a tool company (selling Spark/Delta) and a platform company (Unity Catalog, Lakeflow). The culture is academic-heavy — several key engineers have PhDs and publish research. Expect deep technical questions. OSS contribution is valued.
How to use this
"The Liquid Clustering announcement was interesting to me because I've been using ZORDER as a workaround for partition skew — I have questions about how the adaptive nature of Liquid Clustering handles write-heavy workloads. That's actually why I'm applying."
Mid-size · Stripe
Research brief for Stripe
Key technical context
Stripe processes hundreds of billions of dollars annually. The data infrastructure challenge is exactly-once semantics, ledger-accurate reconciliation, and cross-currency consistency. Key products: Payments API, Billing, Tax, Radar (fraud ML), Sigma (in-dashboard SQL for merchants), Link (one-click checkout). Stripe's data warehouse is built on Spark and their own internal tooling. Strong writing culture — expect written exercises in some loops.
Where to research
Stripe Engineering Blog (stripe.com/blog/engineering) — very high quality. Stripe's sessions at Kafka Summit and dbt Coalesce. The Stripe API documentation itself — reading it tells you how they think about product. Stripe Press publishes books that reveal the company's intellectual priorities. Hacker News threads when Stripe launches new products.
What makes Stripe different
Stripe values clear writing above almost everything. They're hiring people who can articulate complex technical ideas in prose — every important decision starts as a written document. The technical bar is extremely high. They care about financial data accuracy at a level most engineers haven't worked at. Radar (fraud detection) is a world-class ML system.
How to use this
"I read the Stripe engineering post on handling multi-currency ledger consistency. The approach you described — treating currency conversion as an immutable event rather than an in-place mutation — is the same bet I made on a smaller reconciliation system. I want to see how that holds at your scale."
Mid-size · Airbnb
Research brief for Airbnb
Key technical systems to know
Minerva (metric consistency platform — read the blog post), Midas (data certification framework), Superset (originated at Airbnb, now Apache), Airflow (originated at Airbnb, now Apache), Flink for streaming, Hive + Spark on AWS. Their approach to data contracts and SLOs is documented and influential in the data community. Two-sided marketplace: supply (hosts) and demand (guests) data are deeply intertwined.
Where to research
Airbnb Engineering & Data Science blog (medium.com/airbnb-engineering) — one of the best-written engineering blogs in the industry. Read the Minerva post, the Midas post, and anything on their ML platform. The Airflow and Superset GitHub repos. dbt Coalesce talks by Airbnb engineers. LinkedIn posts from their data leadership team.
What makes Airbnb different
Airbnb has a deeply data-driven culture: A/B testing and metric rigor are central to product decisions. The Minerva platform is their answer to metric inconsistency across teams, and it's a genuine innovation. The marketplace model means understanding supply-demand dynamics is required context. They've been through major swings in size (the 2020 COVID layoffs, then the IPO and re-growth), and the team now does more with less.
How to use this
"The Minerva architecture — specifically the decision to build consistency at the semantic layer rather than at the pipeline layer — is the right answer to a problem I've been dealing with. I've been implementing something similar at a smaller scale. I want to contribute to that platform and understand how you handle the hardest cases."
· · ·
Stealth company approach
WHEN INFORMATION IS LIMITED
Stealth companies by definition reveal little. But you can almost always find: (1) the founding team's backgrounds, (2) what problem space they're likely in based on those backgrounds, (3) what they've published or presented previously, and (4) who else has joined (LinkedIn search by company).
The stealth research heuristic: Research the founders' previous companies and papers instead of the stealth company itself. If the CEO built fraud infrastructure at Stripe, the stealth company is probably in financial risk or identity. If the CTO published on vector databases at Meta, you're probably in AI infrastructure. Research those domains instead.
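The heuristic amounts to a lookup from founder background to likely problem space. As a rough sketch, the two examples above can be written as a keyword table. The keywords, the `DOMAIN_HINTS` name, and the `likely_domains` helper are illustrative assumptions, and a real list would be longer and built by hand for each search.

```python
# Keyword -> likely domain, taken from the two examples in the heuristic above.
# Extend this table by hand; it is a thinking aid, not a classifier.
DOMAIN_HINTS = {
    "fraud": "financial risk or identity",
    "vector database": "AI infrastructure",
}


def likely_domains(backgrounds: list[str]) -> list[str]:
    """Return the domains suggested by founder background notes, in first-seen order."""
    found: list[str] = []
    for background in backgrounds:
        text = background.lower()
        for keyword, domain in DOMAIN_HINTS.items():
            if keyword in text and domain not in found:
                found.append(domain)
    return found


notes = [
    "CEO: built fraud infrastructure at Stripe",
    "CTO: published on vector databases at Meta",
]
print(likely_domains(notes))
```

When the founders' backgrounds point to different domains, as in this example, research both and let the first interview tell you which one the company is actually in.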
Stealth · AI Infrastructure
Research approach for Stealth AI Infrastructure Startup
What to research instead
Research the domain they're likely in: AI training infrastructure, inference serving, vector databases, model evaluation, or AI observability depending on founder backgrounds. Read the foundational papers in that space (e.g., if inference serving: read the Orca paper, Triton inference server docs, vLLM architecture). Understand the current unsolved problems in the domain — those are likely what the company is building.
What to do with founder research
Look up every public paper, talk, or blog post the founders have published in the last 5 years. These reveal the technical bets they believe in. In the interview, you can say: "I read your 2023 talk on [topic] at [conference] — the approach you described to [specific thing] informed how I thought about [your own work]." This is extremely high signal.
Questions to ask stealth companies
"Can you tell me more about the technical problem you're focused on?" (They expect this — you're allowed to ask.) · "What made you decide to start this company now — what gap did you see that wasn't being addressed?" · "How are you thinking about the data infrastructure layer — is that something you're building or something you're buying?" · "What does the first 90 days look like for someone joining at this stage?"
How to frame your pitch
"Based on what I know about the founding team's background, I'm guessing you're in the [domain] space — that's where I've spent the last three years, and it's why I was excited to come in. Tell me if I'm wrong — I'm curious what you're actually building."
Stealth · Enterprise Fintech
Research approach for Stealth Enterprise Fintech
What to research instead
Research the B2B fintech infrastructure landscape: payment rails (ACH, RTP, FedNow), bank-grade data reconciliation, ledger systems (Stripe Treasury, Bond, Modern Treasury), financial data compliance (SOX, PCI-DSS, Basel III). Read the Modern Treasury and Moov engineering blogs — they write publicly about the hard problems in this space. Understand what enterprise banks are bad at and where startups win.
Domain problems worth understanding
Exactly-once semantics in financial transactions · Multi-currency reconciliation at scale · Compliance-grade audit trails (immutable, tamper-evident) · Real-time fraud detection with sub-100ms latency constraints · Regulatory reporting (SAR filings, Basel reporting) — the data pipelines behind these are complex and often poorly built.
How to frame your pitch
"I've spent my career in financial data infrastructure — reconciliation, compliance pipelines, fraud data. Whatever you're building in this space, the data problems are the ones I've been working on. I'm at a stage where I want to build the infrastructure from scratch rather than inherit someone else's decisions."
· · ·
How to use what you find
CONVERTING RESEARCH INTO SIGNAL
Research is useless if it stays in your notes. The goal is to weave it into the conversation naturally — not to recite facts, but to use what you've learned as a lens for your own experience.
| When | How to use it |
| In your elevator pitch | Use one specific company technical detail in your "why this company" beat. One specific thing. Not a list. |
| In technical answers | Connect their system to your work: "That's similar to what [their system name] solves; in my case…" |
| In behavioral answers | Frame impact in terms they understand: "I was solving a freshness problem, similar to what I understand Scuba handles at Meta but at a smaller scale." |
| At the end (your questions) | Ask about something specific from the blog posts. "I read your post on [X]. How has that evolved since publication?" |
The most powerful move: Form a technical opinion about something you've read and share it. "I read the Minerva post and I agreed with the decision to enforce consistency at the semantic layer — but I wondered about the performance cost of on-the-fly metric computation vs. pre-aggregation. How do you handle that at your query volume?" An informed question is better than a flattering statement.