PaddySpeaks · Systems at the Whiteboard · Nº 29

The Azure Problem

Design a hybrid enterprise platform on Microsoft Azure. Active Directory / Entra ID, VNet hub-and-spoke, AKS, Cosmos DB multi-region, Managed Identity, ExpressRoute, and the identity-first architecture that makes Azure the enterprise cloud of record.


§ 00 — BEFORE AZURE: How Microsoft got here

Azure didn't emerge from a cloud-native company. It was a survival pivot by a software giant facing existential irrelevance as AWS reshaped enterprise computing from the ground up.

[Figure: timeline. Pre-2007: Windows Server era (AD, SQL, IIS, enterprise contracts). 2007–2010: existential threat as AWS grows fast and enterprises ask for cloud. 2010: Azure GA as "Windows Azure", .NET / Windows focus. 2014: pivot under CEO Nadella (mobile-first, cloud-first, Linux supported). 2019: Azure Arc (hybrid cloud; any infrastructure, Azure control). Today: #2 cloud, M365 moat, Entra ID, AI / OpenAI. Legend: pre-cloud era, Azure milestone, strategic pivot.]
Era | Microsoft position | Azure role
pre-2010 | Dominant in on-prem software (Windows Server, SQL Server, AD) | No cloud; customers managed their own datacenters
2010–2014 | Windows Azure launched: .NET/Windows only, catching up to AWS | Niche: mostly Windows workloads, early adopters
2014–2019 | Nadella pivot: Linux first-class, open-source embrace, Office 365 | Rapid enterprise adoption; Entra ID becomes cloud identity backbone
2019–now | #2 cloud globally; $48B+ cloud revenue; Azure OpenAI partnership | Moat: Entra ID integration, M365 bundling, Arc hybrid cloud, Azure OpenAI

§ 01 — THE QUESTION: Design a hybrid enterprise platform on Azure

Interview Prompt

"Design a hybrid enterprise platform on Azure. Walk me through your choices for compute, storage, identity, and networking — and how Azure's unique strengths (Active Directory, hybrid connectivity, enterprise compliance) differentiate it from other clouds."

LEVEL · SENIOR / STAFF  |  DURATION · 45–60 MIN  |  FORMAT · WHITEBOARD
Dimension | Weak answer | Strong answer
Identity | Use IAM roles | Entra ID with Conditional Access, PIM, Managed Identity; no credentials in code
Networking | Put everything in a VNet | Hub-and-spoke with NSGs at subnet + NIC level, ExpressRoute for on-prem, Private Endpoints
Database | Use SQL Server | Azure SQL Database with geo-replication (failover groups); Cosmos DB for multi-region with tunable consistency
Hybrid | VPN to on-prem | ExpressRoute with VPN as failover; Azure Arc for on-prem governance; hybrid DNS
Differentiation | "It's like AWS" | Microsoft 365 integration, Windows AD federation, enterprise compliance (FedRAMP, ISO 27001), Azure Arc

§ 02 — GLOBAL INFRASTRUCTURE: Geography → Region → Zone → Datacenter

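Azure groups datacenters into a four-level hierarchy: geography, region, availability zone, datacenter. Region pairs are the Azure-native DR strategy: the two regions in a pair almost always sit in the same geography (which preserves data residency) but are typically 300+ miles apart, platform updates are rolled out to one region of a pair at a time, and GRS storage replicates to the paired region automatically.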

[Figure: hierarchy levels. Geography (e.g. United States) → region pair (East US ↔ West US, paired, 300+ mi apart) → primary region (East US, active workloads; Zone 1, Zone 2, Zone 3, each with one or more datacenters) and paired region (West US, DR replica and GRS auto-replication target). Azure operates 60+ regions across 140+ countries.]
Concept | Azure | AWS equiv. | Key difference
Region pair | Built-in, same geography | Manual cross-region | Azure pairs regions at the infrastructure level; GRS replicates to the pair automatically
Availability Zones | 3 in zone-enabled regions | 3 or more per region | Conceptually identical; Azure calls them Zone 1/2/3, and not every Azure region offers zones
Edge | Azure Front Door / CDN | CloudFront | Front Door integrates WAF + global LB in one service
Gov cloud | Azure Government | AWS GovCloud | Both FedRAMP High; Azure has deeper DoD IL5 coverage

§ 03 — COMPUTE: VMs · AKS · Functions · Arc

Azure's compute decision tree starts with workload type. Arc extends Azure governance to any Kubernetes cluster anywhere — on-prem, multi-cloud, or edge.

[Figure: compute decision tree. What's your workload?]
  • VMs / IaaS (lift-and-shift): Azure VMs, B / D / E / F / N / H series
  • Containers (orchestrated): AKS (managed Kubernetes), Container Instances (ACI)
  • Serverless (event-driven): Azure Functions, Logic Apps, Durable Functions
  • Batch / HPC (parallelizable): Azure Batch, CycleCloud (HPC)
  • Hybrid (on-prem): Azure Arc, for any Kubernetes cluster on any cloud or on-prem

VM series: B = burstable (dev/test) · D = general purpose (balanced) · E = memory-optimized (SAP, in-memory DB) · F = compute-optimized (CPU-intensive) · N = GPU (ML, rendering) · H = HPC (RDMA/InfiniBand)

VMSS (Virtual Machine Scale Sets): a CPU % or custom-metric threshold triggers scale-out, and new VM instances are provisioned from a shared image gallery. Rolling upgrade mode avoids downtime.

§ 04 — STORAGE: Blob tiers · Disk · Files · Queue · Table

Azure Blob Storage has four access tiers. Tiering is a cost optimization: move cold data down automatically with lifecycle policies. Archive tier has hours of rehydration latency.

[Figure: Blob access tiers, ordered left to right from low access cost / high storage cost to high access cost / low storage cost.]
  • Hot: frequent access; immediate
  • Cool: minimum 30 days stored; millisecond latency
  • Cold: minimum 90 days stored; millisecond latency
  • Archive: minimum 180 days stored; rehydration takes 1–15 hours

Lifecycle management policy: Hot → Cool → Cold → Archive, automated by last-modified date.
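The tiering rule above can be sketched as a small function. This is a minimal illustration, not the Azure lifecycle engine: the thresholds in `TIER_RULES` and the function name `target_tier` are hypothetical, chosen to mirror the minimum-retention figures so that a blob is never moved again before its early-deletion window has passed.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical policy thresholds, in days since last modification.
# They mirror the minimum-retention figures above; they are not Azure defaults.
TIER_RULES = [(180, "Archive"), (90, "Cold"), (30, "Cool")]


def target_tier(last_modified: datetime, now: datetime) -> str:
    """Return the tier a blob should sit in under this sketch policy."""
    age_days = (now - last_modified).days
    # Rules are ordered from coldest to warmest, so the first match wins.
    for min_age, tier in TIER_RULES:
        if age_days >= min_age:
            return tier
    return "Hot"


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
for days in (3, 45, 120, 400):
    print(days, target_tier(now - timedelta(days=days), now))
```

In a real deployment the same intent would be expressed declaratively in a lifecycle management policy on the storage account rather than in application code.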
Service | Type | Access pattern | Use case
Blob Storage | Object | REST API | Images, video, backups, data lake
Azure Disk | Block | VM-attached (SSD/HDD) | OS disk, database volumes
Azure Files | File share (SMB/NFS) | Mount on Windows/Linux | Lift-and-shift shared drives, app config
Queue Storage | Message queue | Best-effort ordering, 64 KB max message | Decoupling; simple task queues
Table Storage | NoSQL key-value | PartitionKey + RowKey | Low-cost structured data; logs
Data Lake Gen2 | Blob with hierarchical namespace | ABFS driver, Spark | Analytics, Synapse, Databricks

§ 05 — NETWORKING: VNet · NSG · Hub-and-spoke · ExpressRoute

Azure's key differentiator: NSGs operate at both subnet level AND NIC level — double-perimeter defense within a VNet. Hub-and-spoke is the reference architecture for enterprise multi-workload isolation.

[Figure: hub-and-spoke topology.]
  • On-prem: corporate network and Active Directory, connected to the hub via ExpressRoute or VPN Gateway
  • Hub VNet: VPN / ER Gateway, Azure Firewall, Azure Bastion, Private DNS Zone; NSG on every subnet; VNet peering to each spoke
  • Spoke · App VNet: App subnet, AKS subnet; NSG at subnet + NIC level
  • Spoke · Data VNet: SQL Private Endpoint, Cosmos Private Endpoint; no public IP on any data resource
  • Spoke · DMZ VNet: Application Gateway with WAF (L7); internet-facing, public IPs here only

Load balancing: Azure Load Balancer (L4, internal/external) · Application Gateway (L7, WAF, SSL offload) · Azure Front Door (global, anycast)

NSG differentiator: stateful rules applied at BOTH the subnet boundary AND the NIC, giving two-layer enforcement inside the VNet.
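The two-layer NSG behaviour can be modelled in a few lines. This is a simplified sketch under stated assumptions: rules match on destination port only, the `Rule` type and function names are hypothetical, and "no matching rule means deny" stands in for the default deny-all-inbound rule. For inbound traffic the subnet NSG is evaluated first and the NIC NSG second, and both must allow the packet.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    priority: int          # lower number is evaluated first
    port: Optional[int]    # None matches any destination port
    allow: bool


def nsg_allows(rules: List[Rule], port: int) -> bool:
    """First matching rule by priority wins; no match is treated as deny."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.port is None or rule.port == port:
            return rule.allow
    return False


def inbound_allowed(subnet_nsg: List[Rule], nic_nsg: List[Rule], port: int) -> bool:
    """Inbound traffic must pass the subnet NSG and then the NIC NSG."""
    return nsg_allows(subnet_nsg, port) and nsg_allows(nic_nsg, port)


# Hypothetical rule sets: the subnet admits HTTPS and SSH, the NIC only HTTPS.
subnet_rules = [Rule(100, 443, True), Rule(110, 22, True)]
nic_rules = [Rule(100, 443, True)]

print(inbound_allowed(subnet_rules, nic_rules, 443))
print(inbound_allowed(subnet_rules, nic_rules, 22))
```

The second call shows the point of the double perimeter: a port opened at the subnet is still blocked if the NIC-level NSG does not also allow it.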

§ 06 — DATABASE: Azure SQL · Cosmos DB · Synapse · Redis

Cosmos DB is Azure's flagship service for globally distributed data, and it exposes five well-defined, tunable consistency levels instead of a binary strong/eventual choice. Choose a level based on how much latency you are willing to trade for correctness.

[Figure: database decision tree. What's your data type? Choose your engine.]
  • Relational (ACID): Azure SQL; Flexible Server for PostgreSQL / MySQL
  • Document (multi-model): Cosmos DB with SQL / Mongo / Cassandra / Gremlin / Table APIs
  • Cache (sub-ms reads): Azure Cache for Redis; Enterprise tier adds geo-replication
  • Analytics (OLAP / DWH): Synapse Analytics + AI Search
  • Search (full-text): Azure AI Search (formerly Cognitive Search), vector + semantic

Cosmos DB: 5 consistency levels, from stronger (lower availability, higher latency) to weaker (higher availability, lower latency):
  • Strong: linearizable; reads always return the latest commit
  • Bounded Staleness: reads lag by at most K versions or T seconds
  • Session (default): consistent within a session token
  • Consistent Prefix: never out of order, may lag
  • Eventual: lowest latency, highest availability; reads may be stale

Rule of thumb: Strong → finance / inventory · Session (default) → user profiles · Eventual → social counters, carts

Cosmos DB: 99.999% availability SLA for multi-region accounts, with <10 ms reads and <15 ms writes at p99 for small items served from the local region. Strong consistency across regions adds cross-region round trips to every write.

Use Cosmos DB when: global distribution, multi-model access, variable schema, or guaranteed low latency is the requirement.
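One way to reason about the five levels is to ask how stale a read is allowed to be. The sketch below is a toy model, not the Cosmos DB SDK: versions are plain integers on a single item, and `Level` and `oldest_allowed_version` are hypothetical names introduced for illustration. On a single item, Consistent Prefix and Eventual look the same; they differ only in whether multiple writes can be observed out of order.

```python
from enum import Enum


class Level(Enum):
    STRONG = "Strong"
    BOUNDED_STALENESS = "Bounded Staleness"
    SESSION = "Session"
    CONSISTENT_PREFIX = "Consistent Prefix"
    EVENTUAL = "Eventual"


def oldest_allowed_version(level: Level, latest: int,
                           session_token: int = 0, k: int = 0) -> int:
    """Lowest item version a read may legally return in this toy model.

    latest        -- most recently committed version
    session_token -- last version this client itself wrote or read
    k             -- staleness bound in versions (Bounded Staleness only)
    """
    if level is Level.STRONG:
        return latest                      # always the latest commit
    if level is Level.BOUNDED_STALENESS:
        return max(latest - k, 0)          # at most k versions behind
    if level is Level.SESSION:
        return session_token               # read-your-own-writes
    return 0                               # any committed version may appear


for level in Level:
    print(level.value, oldest_allowed_version(level, latest=100, session_token=97, k=5))
```

The lower the returned bound, the more freedom a replica has to answer locally without coordination, which is where the latency and availability gains of the weaker levels come from.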

§ 07 — IDENTITY: Microsoft Entra ID, Azure's crown jewel

Entra ID (formerly Azure AD) is a major reason enterprises choose Azure. Every resource in Azure is governed by a single identity plane: the same directory that manages Windows desktops, Microsoft 365, and third-party SaaS.

[Figure: governance scope; RBAC applies at each level.]
  • Tenant: Entra ID directory (contoso.com)
  • Management groups: Corp MG → Prod MG / Dev MG / Sandbox MG
  • Subscriptions: Production Sub · Dev Sub · Shared Services Sub
  • Resource groups: rg-network · rg-compute · rg-data · rg-security
  • Resources: VMs · Storage · AKS · SQL · Key Vault · …

RBAC role assignments are inherited from any parent scope; deny assignments block access even where an inherited role assignment would allow it.
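Scope inheritance can be sketched as a walk up the hierarchy. This is a minimal model under stated assumptions: the scope names, principals, and the `PARENT` and `ASSIGNMENTS` structures are hypothetical, and deny assignments are left out to keep the example short.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

# Hypothetical scope tree: child -> parent (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "mg-corp": None,
    "mg-prod": "mg-corp",
    "sub-production": "mg-prod",
    "rg-data": "sub-production",
    "cosmos-orders": "rg-data",
}

# Hypothetical role assignments: (principal, role, scope).
ASSIGNMENTS: List[Tuple[str, str, str]] = [
    ("alice", "Reader", "mg-corp"),
    ("bob", "Contributor", "rg-data"),
]


def ancestors(scope: str) -> Iterator[str]:
    """Yield the scope itself, then each parent up to the root."""
    current: Optional[str] = scope
    while current is not None:
        yield current
        current = PARENT[current]


def effective_roles(principal: str, scope: str) -> Set[str]:
    """Roles held at a scope, including those inherited from parent scopes."""
    chain = set(ancestors(scope))
    return {role for who, role, at in ASSIGNMENTS if who == principal and at in chain}


print(effective_roles("alice", "cosmos-orders"))
print(effective_roles("bob", "cosmos-orders"))
print(effective_roles("bob", "sub-production"))
```

The last call returns an empty set: an assignment made on a resource group flows down to its resources but never up to the subscription, which is why assigning at the minimum required scope limits blast radius.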
Feature | What it does | AWS equiv.
SSO | One login for Azure, M365, 3000+ SaaS apps via SAML/OIDC | IAM Identity Center (partial)
MFA | Per-user or Conditional Access-driven; TOTP / FIDO2 / Windows Hello | IAM MFA
Conditional Access | If-then policies: device compliance + location + risk score → allow/block/MFA | No direct equiv.
PIM | Privileged Identity Management: just-in-time role elevation, approval flows, audit | IAM (limited JIT)
B2B | Guest access for external partners: their identity, your resources | IAM cross-account + Cognito
B2C | Consumer-facing identity for apps: social login, custom UX, millions of users | Cognito User Pools
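The if-then shape of a Conditional Access policy from the table above can be sketched as a plain function. This is an illustrative model only: the `SignIn` fields, the ordering of checks, and the outcomes are a hypothetical policy, not how Entra ID evaluates its own policy set (where every applicable policy is evaluated and a block always wins).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignIn:
    device_compliant: bool
    trusted_location: bool
    risk: str  # "low", "medium", or "high"


def evaluate(sign_in: SignIn) -> str:
    """Return "allow", "require_mfa", or "block" for one hypothetical policy."""
    # Block outright on high sign-in risk or a non-compliant device.
    if sign_in.risk == "high" or not sign_in.device_compliant:
        return "block"
    # Step up to MFA on medium risk or an untrusted location.
    if sign_in.risk == "medium" or not sign_in.trusted_location:
        return "require_mfa"
    return "allow"


print(evaluate(SignIn(device_compliant=True, trusted_location=True, risk="low")))
print(evaluate(SignIn(device_compliant=True, trusted_location=False, risk="low")))
print(evaluate(SignIn(device_compliant=False, trusted_location=True, risk="low")))
```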
[Figure: Managed Identity, no credentials in code.]
  1. The app or AKS pod (system-assigned or user-assigned MI, no password stored) requests a token from the IMDS endpoint.
  2. Entra ID validates the identity and issues a short-lived OAuth2 token (JWT).
  3. The app calls the Azure resource (Key Vault, Blob, SQL, Service Bus) with the Bearer token; RBAC is checked against the MI.
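Step 1 of the flow above is an ordinary HTTP call to the instance metadata service. The sketch below uses only the Python standard library and assumes a VM with a managed identity assigned; the helper names `build_token_request` and `fetch_token` are hypothetical. Only the request is built at import time, because the endpoint is reachable solely from inside an Azure VM.

```python
import json
import urllib.parse
import urllib.request

# Link-local IMDS token endpoint, reachable only from inside an Azure VM.
IMDS_TOKEN_URL = "http://169.254.169.254/metadata/identity/oauth2/token"


def build_token_request(resource: str) -> urllib.request.Request:
    """Build the IMDS request for a token scoped to the given resource URI."""
    query = urllib.parse.urlencode({"api-version": "2018-02-01", "resource": resource})
    # The "Metadata: true" header is required; requests without it are rejected.
    return urllib.request.Request(f"{IMDS_TOKEN_URL}?{query}", headers={"Metadata": "true"})


def fetch_token(resource: str, timeout: float = 2.0) -> str:
    """Return a short-lived access token. Works only on a VM with a managed identity."""
    with urllib.request.urlopen(build_token_request(resource), timeout=timeout) as resp:
        return json.load(resp)["access_token"]


request = build_token_request("https://vault.azure.net")
print(request.full_url)
```

No secret appears anywhere in this code, which is the whole point. In application code you would normally let the `azure-identity` package's `ManagedIdentityCredential` or `DefaultAzureCredential` handle this call, along with token caching, instead of hand-rolling it.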

§ 08 — SERVERLESS & INTEGRATION: Functions · Service Bus · Event Grid · Logic Apps

Azure Functions shares the Lambda cold-start problem. The integration layer — Service Bus vs Event Grid vs Event Hubs — is a common interview stumper because each solves a different problem.

[Figure: Azure Functions cold-start anatomy.]
  • Warm path: trigger → warm instance → execute (~5–20 ms total)
  • Cold path: trigger → provision host → load runtime → load dependencies → execute (800 ms–3 s cold-start penalty)

Mitigations: Premium plan (pre-warmed instances) · App Service plan (always-on). Durable Functions adds persistent orchestration state but does not by itself avoid cold starts. On the Consumption plan, Python and Java typically have the worst cold starts; C# and JavaScript the best.
Service | Model | Ordering | At-least-once | Max message | Use case
Service Bus | Queue / Topic-sub | FIFO (sessions) | Yes | 256 KB (Standard), 100 MB (Premium) | Enterprise messaging, commands, sagas
Event Grid | Push (reactive) | No | Yes (retry) | 1 MB | Reacting to Azure resource events; fan-out
Event Hubs | Streaming (pull) | Per-partition | Yes | 1 MB (Standard) | Telemetry, logs, Kafka-compatible ingest
Storage Queue | Queue | Best-effort | Yes | 64 KB | Simple decoupling, cost-sensitive
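The table above reduces to a short decision rule. The function below is a rough sketch of that rule, not official guidance: `choose_messaging` and its parameters are hypothetical, and real designs often combine several of these services.

```python
def choose_messaging(*, needs_ordering: bool = False,
                     needs_transactions: bool = False,
                     azure_resource_events: bool = False,
                     high_throughput_stream: bool = False,
                     max_message_kb: int = 1) -> str:
    """Pick a messaging service from the requirements listed in the table."""
    if high_throughput_stream:
        return "Event Hubs"        # telemetry, logs, Kafka-compatible ingest
    if azure_resource_events:
        return "Event Grid"        # push-based fan-out, no ordering guarantee
    if needs_ordering or needs_transactions or max_message_kb > 64:
        return "Service Bus"       # sessions, dead-lettering, larger messages
    return "Storage Queue"         # simple, cost-sensitive decoupling


print(choose_messaging(needs_ordering=True))
print(choose_messaging(high_throughput_stream=True))
print(choose_messaging())
```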
[Figure: Logic Apps, visual workflow orchestration.]
  • Trigger: HTTP request, Service Bus message, or schedule / event
  • Condition: if status == error
  • True → Action A: send Teams alert + create Jira ticket
  • False → Action B: write to Blob + update SQL record

Logic Apps: 400+ built-in connectors · no-code / low-code · Consumption or Standard hosting

§ 09 — Q&A: Twelve questions, max two sentences

Q 01 — When would you choose Azure over AWS or GCP?
Choose Azure when the org is already on Microsoft 365, Windows Server, SQL Server, or on-prem Active Directory — Entra ID federation, hybrid connectivity, and enterprise compliance (FedRAMP, ISO 27001, HIPAA BAA) make the integration story dramatically simpler. GCP wins on data/ML tooling; AWS wins on breadth of services.
Q 02 — What makes Azure's identity story unique?
Microsoft Entra ID federates with on-prem Active Directory, Microsoft 365, and 3,000+ SaaS apps via SAML/OIDC out of the box, which no other cloud directory matches for Microsoft-centric estates. Conditional Access and PIM (just-in-time privilege elevation) have no direct AWS equivalent; Managed Identity (no secrets in code) is closest to IAM roles attached to compute.
Q 03 — What is Cosmos DB and when do you use it?
Cosmos DB is a globally distributed, multi-model database with five tunable consistency levels and an SLA of <10 ms reads / <15 ms writes at p99 for small items in the local region. Use it when you need multi-region active-active writes, a variable schema, or a specific API surface (SQL, MongoDB, Cassandra, Gremlin, Table), which is chosen per account.
Q 04 — Azure VNet vs AWS VPC — key differences?
Both are isolated L3 networks, but in Azure the same stateful NSG construct can be attached at both the subnet boundary AND the NIC, giving you two enforcement points inside a VNet (AWS splits this between stateless network ACLs on subnets and stateful security groups on interfaces). Azure also has no Internet Gateway construct: a default system route to the internet already exists, and inbound reachability comes from attaching a public IP.
Q 05 — What is Azure Arc?
Azure Arc extends Azure's control plane to any infrastructure — on-prem servers, Kubernetes clusters on other clouds, even edge devices — so you can apply Azure Policy, Defender for Cloud, and RBAC uniformly. It's the answer when a customer asks "how do I govern hybrid infrastructure as if it were all in Azure."
Q 06 — Service Bus vs Event Grid vs Event Hubs — how to choose?
Service Bus = reliable messaging with ordering (sessions), dead-letter, and transactions — use for commands between microservices. Event Grid = reactive fan-out to subscribers when Azure resources change (blob created, VM deleted) — push model, no ordering guarantee. Event Hubs = high-throughput time-series ingest with pull semantics — use for telemetry, logs, and Kafka-compatible streams.
Q 07 — What is Managed Identity and why is it better than service principals with passwords?
Managed Identity gives an Azure resource (VM, AKS pod, Function) an Azure-managed identity in Entra ID whose credentials are rotated automatically, so no password or secret is stored anywhere in code or config. The application requests a short-lived JWT from a local token endpoint (IMDS on a VM), then presents it to Key Vault or any Azure resource that enforces RBAC; the underlying credential is never handled by the application or a human.
Q 08 — How does Azure handle multi-region active-active databases?
Cosmos DB supports multi-region writes natively — each region accepts writes and conflicts are resolved by last-write-wins (timestamp) or a custom merge procedure. Azure SQL uses active geo-replication (async, readable secondaries) or failover groups for automatic promotion; active-active writes are not native to SQL and require application-level sharding.
Q 09 — What is ExpressRoute and when do you need it?
ExpressRoute is a dedicated private circuit from your on-prem network to Azure, not over the public internet, provisioned through a connectivity provider (Equinix, AT&T, etc.). Use it when you need consistently low latency to Azure (<10ms is realistic only when the peering location is nearby), sustained bandwidth above 1 Gbps, or regulatory requirements that prohibit data traversing the public internet (financial, healthcare, government).
Q 10 — What is Azure Policy vs RBAC?
RBAC controls who can do what (create, read, delete resources); Azure Policy controls what configuration is allowed — e.g., "all storage accounts must enforce HTTPS" or "VMs must be in approved regions." They complement each other: RBAC gives permission, Policy enforces guardrails regardless of who created the resource.
Q 11 — How does AKS differ from GKE?
GKE has the deepest Kubernetes integration (Google originated the project) and charges a per-cluster management fee beyond its free tier; AKS offers a free control-plane tier but historically carried more operational burden for upgrades. The practical difference for enterprises is ecosystem: AKS integrates with Entra ID for RBAC, Azure Monitor, and Azure Policy out of the box, while GKE integrates with Google Workspace and IAM.
Q 12 — What is Conditional Access in Entra ID?
Conditional Access is an if-then policy engine that evaluates signals — user identity, device compliance state, location, sign-in risk score — and enforces controls before granting access: allow, block, require MFA, or require managed device. It's the mechanism that implements zero-trust network access in Azure without a VPN.
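The last-write-wins rule mentioned in Q 08 can be sketched as follows. This is a toy model, not Cosmos DB internals: `Version` and `last_write_wins` are hypothetical names, `ts` stands in for the item's timestamp property, and breaking ties by region name is an assumption made here purely to keep the function deterministic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    region: str   # region that accepted the write
    ts: int       # stand-in for the item's timestamp property
    value: str


def last_write_wins(a: Version, b: Version) -> Version:
    """Keep the version with the higher timestamp.

    Ties are broken by region name, which is an assumption of this sketch.
    """
    return max(a, b, key=lambda v: (v.ts, v.region))


east = Version(region="eastus", ts=100, value="qty=3")
west = Version(region="westus", ts=105, value="qty=5")
print(last_write_wins(east, west).value)
```

The sketch also shows the weakness of the approach: the losing write is silently discarded, which is why workloads that cannot afford lost updates register a custom merge procedure instead.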

§ 10 — COMMON MISTAKES: What interviewers catch immediately

Three misconceptions appear in almost every Azure interview. Naming them — and correcting them — signals genuine hands-on experience.

❌ "Azure Active Directory = on-prem Active Directory" Azure AD (now Entra ID) is a cloud identity provider, not a domain controller in the sky. It uses OAuth2 / OIDC / SAML, not Kerberos / LDAP. Hybrid environments often run both, kept in sync by Microsoft Entra Connect (formerly Azure AD Connect), which brings its own sync complexity. The two shared a name but are architecturally distinct.
❌ "Managed Identity and Service Principal are the same" A Service Principal is an application identity that requires a client secret or certificate you must rotate. A Managed Identity is a special kind of service principal that Azure provisions for a compute resource and whose credentials Azure rotates for you: no credentials to manage, no rotation schedule. Always prefer Managed Identity when your workload runs inside Azure.
❌ "Cosmos DB is just a NoSQL database" Cosmos DB is a multi-model, multi-API globally distributed database with 5 consistency levels. The consistency level choice (Strong → Bounded Staleness → Session → Consistent Prefix → Eventual) directly impacts latency, throughput, and cost. Most teams default to Session consistency — but choosing without understanding the trade-offs is a red flag.

§ 11 — WHY NOT?: Azure vs the alternatives

Every cloud has a home. Azure wins specific battles decisively — and loses others. Know when to recommend an alternative.

✓ Choose Azure When

  • ✓ Heavy Microsoft stack (SQL Server, .NET, Active Directory)
  • ✓ Microsoft 365 / Teams integration required
  • ✓ Hybrid cloud is a mandate (Azure Arc spans on-prem + cloud)
  • ✓ Enterprise compliance breadth (FedRAMP, HIPAA, ISO 27001, DoD)

✗ Consider Alternatives When

  • ✗ AWS — broadest service ecosystem and developer community
  • ✗ GCP — data/ML workloads (BigQuery, Vertex AI, TPUs)
  • ✗ Azure pricing can be complex (multiple meters per service)
  • ✗ UI/UX historically more complex than AWS console

§ 12 — ONE-MINUTE ANSWER: The question every interviewer asks

Interview Question
"Why do enterprises choose Azure over AWS?"
Model Answer
Azure wins enterprises primarily through integration with existing Microsoft investments. If a company runs Active Directory, SQL Server, Exchange, and Office 365 — which describes most Fortune 500 companies — Azure provides seamless identity federation via Entra ID, native SQL Server migration paths, and unified billing. Satya Nadella's 2014 pivot to "mobile-first, cloud-first" reoriented Azure from a .NET platform to a genuine multi-language cloud. The differentiator today is hybrid cloud: Azure Arc lets organizations manage on-premises infrastructure with the same Azure control plane, which matters enormously for regulated industries that can't move everything to public cloud.

§ 13 — INTERVIEWER'S MIND: What they're really testing

Four signal areas separate candidates who read documentation from those who have operated Azure in production.

SIGNAL 01
Identity-first architecture
Can you explain the difference between Entra ID, Managed Identity, and Service Principal? When does each apply? Strong answer: MI for compute resources, SP for external/CI pipelines, Entra ID as the identity plane for everything.
SIGNAL 02
Cosmos DB trade-offs
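Do you know the 5 consistency levels? When would you choose Strong vs Eventual? What's the RU (Request Unit) model? Strong answer: every operation is charged in RUs and throughput is provisioned (or autoscaled) in RU/s; consistency is set at the account level and can be relaxed per request, with direct cost impact (Strong and Bounded Staleness reads cost roughly twice the RUs of weaker levels).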
SIGNAL 03
Hybrid cloud
What is Azure Arc? Why do enterprises care about hybrid cloud more than pure cloud-native companies? Strong answer: regulated industries can't always migrate everything; Arc brings Azure Policy and Defender governance to on-prem Kubernetes clusters.
SIGNAL 04
Networking
Can you design a hub-and-spoke VNet topology? When does ExpressRoute beat VPN Gateway? Strong answer: ExpressRoute for <10ms latency, >1 Gbps, or regulatory requirements prohibiting public internet traversal; VPN as failover.

§ 14 — THE EVOLUTION: 50 years to cloud dominance

Azure's enterprise moat was built over decades of enterprise software incumbency. Each milestone compounded the next.

[Figure: timeline.]
  • 1975: Microsoft founded (Gates + Allen); BASIC interpreter for the Altair 8800
  • 1993: Windows NT, the enterprise OS; Active Directory followed with Windows 2000
  • 2001: SQL Server established as a major enterprise database; .NET launched (Framework 1.0 shipped in early 2002)
  • 2008: Azure announced at PDC 2008, initially as "Windows Azure"
  • 2010: Azure GA (Windows Azure); primarily .NET / Windows workloads
  • 2014: Nadella pivot and Linux support; "mobile-first, cloud-first", open source
  • 2015: Azure Stack (hybrid); Azure control plane for on-prem hardware
  • 2017: AI / ML services; Cognitive Services, Azure ML, Bot Framework
  • 2019: Azure Arc; extends Azure governance to any cloud or on-prem Kubernetes
  • 2020: $48B cloud revenue; solidly #2 cloud, enterprise dominance
  • 2023: Azure OpenAI Service; GPT-4, Copilot, M365 AI integration

§ 15 — WHAT'S NEXT?: The problems Azure hasn't solved yet

Each Azure inflection point solved one hard problem and revealed the next. The trajectory points toward AI governance at enterprise scale.


§ 16 — SUMMARY: Weak vs strong answer

Topic | Weak answer | Strong answer
Why Azure | It's like AWS but from Microsoft | Enterprise AD integration, hybrid connectivity (ExpressRoute + Arc), M365 co-licensing, and one of the deepest compliance portfolios (HIPAA, FedRAMP High, ISO 27001) in the market
Identity | Create a service principal | Use Managed Identity (no credentials), assign RBAC at the minimum required scope, layer Conditional Access for human sign-ins
Networking | One big VNet for everything | Hub-and-spoke: shared services hub (firewall, bastion, ER gateway), isolated spoke per workload, NSG at subnet AND NIC, Private Endpoints for all data services
Cosmos DB | It's a NoSQL database | Multi-model (SQL/Mongo/Cassandra/Gremlin), 5 consistency levels, multi-region write with conflict resolution, <10ms p99 reads from the nearest region; use when distribution or schema flexibility is the requirement
Hybrid | Set up a VPN | ExpressRoute (private circuit, <10ms, >1Gbps) as primary + VPN as failover; Azure Arc for extending governance to on-prem Kubernetes; hybrid DNS with Private DNS Zones
Messaging | Use a queue | Service Bus for ordered commands, Event Grid for reactive Azure resource events, Event Hubs for high-throughput telemetry streams; each has a different delivery model
Serverless cold start | That's just how it works | Premium plan pre-warms instances; Durable Functions for stateful orchestration; Consumption plan only for bursty, latency-tolerant workloads