
PaddySpeaks

Where ancient wisdom meets the architecture of tomorrow


The Information System Collapse - Part 3: The Architecture That Survives


From Collapse to Construction: Building Event-Driven Systems That Actually Work


Where We've Been

Part 1 exposed the delay problem: Your traditional ERP→ETL→Warehouse→BI stack takes 3 days to answer simple questions. By the time you get the answer, the opportunity is gone.

Part 2 revealed the source: ERP's two ancestral curses (Customization Chaos + Thousand-Table Labyrinth) force data teams to excavate instead of analyze.

Now comes the hard question: What do we build instead?


The Honest Starting Point

Let's be clear about what we're NOT doing:

  • ❌ We're not "eliminating data warehouses" (too evangelical)

  • ❌ We're not claiming "AI solves everything" (too magical)

  • ❌ We're not proposing a rip-and-replace (too risky)

What we ARE doing:

  • ✅ Building a parallel system that handles 80% of business questions faster

  • ✅ Using proven technologies (Kafka, PostgreSQL, LLMs) in a new way

  • ✅ Starting with one use case, proving value, then expanding

  • ✅ Keeping your existing infrastructure running while we transition

This is engineering, not revolution.


The Core Insight: Events, Not States

The fundamental problem with ERP→Warehouse architecture: It stores STATES (final values) instead of EVENTS (what actually happened).

When your ERP records a sale, it stores:

  • Order ID: 12345

  • Customer ID: CUST_789

  • Amount: $500

  • Date: 2025-01-15

What it DOESN'T store:

  • Customer browsed 7 products before buying

  • Added item to cart, removed it, added it back

  • Hesitated at checkout for 3 minutes

  • Applied a discount code

  • Checkout page loaded slowly (4 seconds)

All that context is LOST.

And that context is exactly what you need to answer "why did this happen?"
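To make the state-versus-event distinction concrete, here is a minimal sketch. The event type names and the `to_state` helper are hypothetical, chosen only for illustration; the values mirror the order example above. It shows that the ERP row can always be derived from the event stream, while the reverse is not possible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_type: str  # hypothetical type names, for illustration only
    detail: dict


# Illustrative event stream for order 12345 (values mirror the example above).
events = [
    Event("product_viewed", {"count": 7}),
    Event("cart_item_added", {"sku": "SKU_1"}),
    Event("cart_item_removed", {"sku": "SKU_1"}),
    Event("cart_item_added", {"sku": "SKU_1"}),
    Event("checkout_hesitated", {"seconds": 180}),
    Event("discount_applied", {"code": "EXAMPLE"}),
    Event("checkout_page_loaded", {"load_seconds": 4}),
    Event("order_completed", {"order_id": 12345, "customer_id": "CUST_789",
                              "amount": 500, "date": "2025-01-15"}),
]


def to_state(stream):
    """Reduce the stream to the single row a state-based system would keep."""
    for ev in stream:
        if ev.event_type == "order_completed":
            return dict(ev.detail)
    return None


state = to_state(events)
# Everything that is not the final order row is context a state store drops.
context_lost = [e.event_type for e in events if e.event_type != "order_completed"]
```

Going from `events` to `state` is a simple reduction. Going from `state` back to `events` cannot be done, which is the whole argument for capturing events first.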


The New Architecture: Event-Driven Intelligence

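Here's what we're building: four layers, each feeding the next.

  1. Event Capture: records what happened, with full context

  2. Semantic Event Fabric: stores and indexes events so they can be queried by meaning

  3. Domain Reasoning Nodes: answer questions within one business domain

  4. Conversational Query Interface: takes natural language questions and returns answers

Let's break down each layer with actual technical details, not buzzwords.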


Layer 1: Event Capture (The Foundation)

What Is An Event?

An event is an immutable record of something that happened, with full context.

Example Event Schema:
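One possible shape for such an event, sketched as a frozen Python dataclass. The class name `BusinessEvent` and every field name here are assumptions made for illustration, not a fixed schema. The vector embedding is left out on purpose, since it would normally be computed and stored after capture.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)  # frozen: attributes cannot be reassigned after creation
class BusinessEvent:
    event_type: str                  # e.g. "order_completed"
    actor_id: str                    # who or what triggered the event
    payload: dict                    # context carried with the event itself
    caused_by: Optional[str] = None  # event_id of the triggering event, if any
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    occurred_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        """Serialize the event for the bus or the event store."""
        return json.dumps(asdict(self), sort_keys=True)


added = BusinessEvent("cart_item_added", "CUST_789",
                      {"sku": "SKU_1", "device": "mobile"})
order = BusinessEvent("order_completed", "CUST_789",
                      {"order_id": 12345, "amount": 500},
                      caused_by=added.event_id)
```

The `caused_by` field is what lets later layers walk from an outcome back to its triggers.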

Key Properties of Events

  1. Immutable: Once written, never changed. To correct an error, you write a new event (like accounting).

  2. Self-Contained: Every event has all the context it needs. No need to join 12 tables to understand it.

  3. Causally Linked: Every event knows what caused it and what it might trigger.

  4. Semantically Indexed: Each event has a vector embedding that captures its meaning, enabling semantic search.


Layer 2: Semantic Event Fabric (The Smart Storage)

This is where we store and index events in a way that makes them queryable by meaning, not just by field names.

Technical Components

Component 1: Event Store (PostgreSQL with TimescaleDB)

Why PostgreSQL?

  • Battle-tested reliability

  • Native JSON support for flexible event payloads

  • TimescaleDB extension for time-series optimization

  • pgvector extension for semantic search

  • ACID guarantees (unlike some NoSQL solutions)

Schema:
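A minimal sketch of what the event table could look like, kept as a Python constant so it can be applied with any PostgreSQL driver. Table, column, and index names are assumptions, and the 1536-dimension embedding is a placeholder that depends on the embedding model you pick. It assumes the TimescaleDB and pgvector extensions are installed on the server.

```python
# Hypothetical DDL for the event store; adjust names and dimensions to taste.
EVENT_STORE_DDL = """
CREATE EXTENSION IF NOT EXISTS timescaledb;
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS events (
    event_id     UUID        NOT NULL,
    event_type   TEXT        NOT NULL,
    occurred_at  TIMESTAMPTZ NOT NULL,
    actor_id     TEXT,
    caused_by    UUID,
    payload      JSONB       NOT NULL,
    embedding    vector(1536),
    PRIMARY KEY (event_id, occurred_at)
);

SELECT create_hypertable('events', 'occurred_at', if_not_exists => TRUE);

CREATE INDEX IF NOT EXISTS events_type_time_idx
    ON events (event_type, occurred_at DESC);
CREATE INDEX IF NOT EXISTS events_payload_idx
    ON events USING GIN (payload);
CREATE INDEX IF NOT EXISTS events_embedding_idx
    ON events USING hnsw (embedding vector_cosine_ops)
"""


def statements(ddl: str) -> list[str]:
    """Split the DDL into individual statements for one-by-one execution."""
    return [s.strip() for s in ddl.split(";") if s.strip()]
```

The primary key includes `occurred_at` because TimescaleDB requires the partitioning column to be part of every unique constraint on a hypertable.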

Component 2: Event Bus (Apache Kafka or Redpanda)

Why Kafka/Redpanda?

  • Handles millions of events per second

  • Guaranteed ordering within partitions

  • Replay capability (reprocess events if needed)

  • Multiple consumers can read same stream

Topic Structure:
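One way to lay out topics is a topic per domain, with the customer as the partition key. The topic names and the prefix-based routing rule below are hypothetical. Because ordering is only guaranteed within a partition, keying by customer keeps each customer's events in sequence.

```python
# Hypothetical topic names: one topic per business domain, versioned.
TOPICS = {
    "product": "events.product.v1",
    "cart": "events.cart.v1",
    "checkout": "events.checkout.v1",
    "order": "events.order.v1",
}
FALLBACK_TOPIC = "events.unrouted.v1"


def topic_for(event_type: str) -> str:
    """Route by the prefix of the event type, e.g. order_completed -> order."""
    domain = event_type.split("_", 1)[0]
    return TOPICS.get(domain, FALLBACK_TOPIC)


def partition_key(event: dict) -> bytes:
    """Key by customer so one customer's events land in one partition."""
    return str(event["actor_id"]).encode("utf-8")
```

A producer would pass `topic_for(...)` as the topic and `partition_key(...)` as the message key. Unknown event types go to a fallback topic instead of being dropped.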

Component 3: Stream Processors (Kafka Streams or Flink)

Real-time enrichment and aggregation:
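The sketch below is plain Python standing in for a Kafka Streams or Flink job, so it shows the logic rather than either framework's API. It enriches each event with a customer segment, then sums revenue per 60-second tumbling window. The field names (`ts`, `amount`, `segment`) and the lookup table are assumptions.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # tumbling window size, an arbitrary choice for the sketch


def enrich(event: dict, customers: dict) -> dict:
    """Attach the customer's segment so downstream queries need no join."""
    profile = customers.get(event["actor_id"], {})
    return {**event, "segment": profile.get("segment", "unknown")}


def revenue_per_window(events, customers):
    """Sum order revenue per (window start, segment)."""
    totals = defaultdict(float)
    for ev in events:
        ev = enrich(ev, customers)
        if ev["event_type"] != "order_completed":
            continue
        window_start = ev["ts"] - ev["ts"] % WINDOW_SECONDS
        totals[(window_start, ev["segment"])] += ev["amount"]
    return dict(totals)


events = [
    {"event_type": "order_completed", "actor_id": "CUST_789", "ts": 5, "amount": 500.0},
    {"event_type": "order_completed", "actor_id": "CUST_001", "ts": 59, "amount": 20.0},
    {"event_type": "product_viewed", "actor_id": "CUST_789", "ts": 61},
    {"event_type": "order_completed", "actor_id": "CUST_789", "ts": 65, "amount": 80.0},
]
customers = {"CUST_789": {"segment": "returning"}}
windows = revenue_per_window(events, customers)
```

A real stream processor adds what this sketch leaves out: state that survives restarts, handling of late events, and parallelism across partitions.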


Layer 3: Domain Reasoning Nodes (The Intelligence)

These are specialized query engines for specific business domains. They are not generic AI, but focused, deterministic processors with LLM-enhanced reasoning.

Anatomy of a Domain Reasoning Node

Example: Customer Behavior Node
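A minimal sketch of such a node, using an in-memory SQLite database as a stand-in for the event store. The class, its method names, the `session_started` event type, and the table layout are all hypothetical. The point it illustrates is the split: numbers come from SQL, and the optional `llm` callable only turns computed figures into prose.

```python
import sqlite3
from typing import Callable, Optional


class CustomerBehaviorNode:
    def __init__(self, conn: sqlite3.Connection,
                 llm: Optional[Callable[[str], str]] = None):
        self.conn = conn
        self.llm = llm  # injected, so any provider (or none) can be used

    def orders_on(self, day: str) -> int:
        """Deterministic: count completed orders for one day."""
        sql = ("SELECT COUNT(*) FROM events "
               "WHERE event_type = 'order_completed' AND day = ?")
        return self.conn.execute(sql, (day,)).fetchone()[0]

    def conversion_by_device(self, day: str) -> dict:
        """Deterministic: orders divided by sessions, per device."""
        sql = ("SELECT device, "
               "SUM(event_type = 'order_completed') * 1.0 "
               "/ SUM(event_type = 'session_started') "
               "FROM events WHERE day = ? GROUP BY device ORDER BY device")
        return dict(self.conn.execute(sql, (day,)).fetchall())

    def explain_conversion(self, day: str) -> str:
        """Hybrid: SQL computes the facts, the LLM only narrates them."""
        facts = self.conversion_by_device(day)
        if self.llm is None:
            return f"Conversion by device on {day}: {facts}"
        return self.llm(f"Explain these figures without changing them: {facts}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_type TEXT, device TEXT, day TEXT)")
rows = ([("session_started", "mobile", "2025-01-15")] * 4
        + [("order_completed", "mobile", "2025-01-15")]
        + [("session_started", "desktop", "2025-01-15")] * 2
        + [("order_completed", "desktop", "2025-01-15")])
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
node = CustomerBehaviorNode(conn)
```

Because the LLM is passed in rather than hard-coded, the deterministic paths can be tested without any API key.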

Why This Architecture Works

90% of queries are deterministic (fast SQL):

  • "Show me conversion funnel for last week"

  • "What's the cart abandonment rate by device?"

  • "How many orders today?"

10% of queries need synthesis (SQL + LLM):

  • "Why did mobile conversion drop yesterday?"

  • "Which customer segments are at risk?"

  • "What happened before high-value users churned?"

The LLM is only used for:

  1. Understanding intent (parsing natural language)

  2. Finding semantic patterns (when causality needed)

  3. Synthesizing explanations (turning data into narrative)

The LLM never does math. All calculations are SQL.


Layer 4: Conversational Query Interface

Users don't write SQL. They ask questions.

How It Works

Input: Natural language query

Output: Answer with evidence and confidence

Example Interaction:

Technical Implementation
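A rough sketch of the request path, under several assumptions: the keyword classifier is a crude stand-in for LLM-based intent parsing, `run_sql` and `synthesize` are hypothetical callables supplied by a domain node, and the confidence values are arbitrary placeholders rather than calibrated scores.

```python
from dataclasses import dataclass, field

# Phrases that suggest a diagnostic question; a placeholder for real intent parsing.
DIAGNOSTIC_PHRASES = ("why", "what happened", "at risk")


@dataclass
class Answer:
    text: str
    evidence: list = field(default_factory=list)
    confidence: float = 0.0


def classify(question: str) -> str:
    """Return 'hybrid' for diagnostic questions, else 'deterministic'."""
    q = question.lower()
    return "hybrid" if any(p in q for p in DIAGNOSTIC_PHRASES) else "deterministic"


def ask(question: str, run_sql, synthesize) -> Answer:
    """Always fetch facts with SQL; call the synthesizer only when needed."""
    rows = run_sql(question)
    if classify(question) == "deterministic":
        return Answer(text=str(rows), evidence=rows, confidence=1.0)
    return Answer(text=synthesize(question, rows), evidence=rows, confidence=0.7)
```

Whichever path is taken, the SQL rows travel with the answer as evidence, so a user can check the narrative against the numbers.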


Real-World Example: E-commerce Flow

Let's trace a complete customer journey through the system.

Events Generated
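As an illustration only, a single journey might produce a chain like the one below. The event ids, type names, and fields are invented for this sketch, with the order values borrowed from the earlier example. The helper walks `caused_by` links backwards to recover the path that led to an order.

```python
# Illustrative journey; every id and field here is made up for the sketch.
journey = [
    {"id": "e1", "type": "product_viewed", "caused_by": None},
    {"id": "e2", "type": "cart_item_added", "caused_by": "e1"},
    {"id": "e3", "type": "discount_applied", "caused_by": "e2"},
    {"id": "e4", "type": "checkout_page_loaded", "caused_by": "e3", "load_seconds": 4},
    {"id": "e5", "type": "order_completed", "caused_by": "e4",
     "order_id": 12345, "amount": 500},
]


def causal_chain(events, event_id):
    """Follow caused_by links from an event back to its root cause."""
    by_id = {e["id"]: e for e in events}
    chain = []
    current = by_id.get(event_id)
    while current is not None:
        chain.append(current["type"])
        current = by_id.get(current["caused_by"])
    return list(reversed(chain))


path_to_order = causal_chain(journey, "e5")
```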

Query Examples

Query 1: Simple Aggregation (Deterministic)
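For a question like "How many orders today, and for how much?", a single aggregate over the event table is enough. The sketch uses in-memory SQLite as a stand-in for PostgreSQL, with a hypothetical three-column table and made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_type TEXT, day TEXT, amount REAL)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", [
    ("order_completed", "2025-01-15", 500.0),
    ("order_completed", "2025-01-15", 120.0),
    ("order_completed", "2025-01-16", 80.0),
    ("product_viewed", "2025-01-15", None),
])


def orders_and_revenue(conn, day):
    """Return (order count, total revenue) for one day."""
    sql = ("SELECT COUNT(*), COALESCE(SUM(amount), 0) FROM events "
           "WHERE event_type = 'order_completed' AND day = ?")
    return conn.execute(sql, (day,)).fetchone()


today = orders_and_revenue(conn, "2025-01-15")
```

No LLM is involved at any point: the question maps to one parameterized query.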

Query 2: Funnel Analysis (Deterministic)
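A funnel can be sketched as a count of distinct sessions reaching each step. The step names and table layout are assumptions, SQLite again stands in for the event store, and this simple version does not check that steps happened in order within a session.

```python
import sqlite3

# Hypothetical funnel steps, from first touch to purchase.
FUNNEL = ["product_viewed", "cart_item_added", "checkout_started", "order_completed"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (session_id TEXT, event_type TEXT)")
sessions = {
    "s1": FUNNEL,      # completes the purchase
    "s2": FUNNEL[:2],  # abandons the cart
    "s3": FUNNEL[:1],  # only browses
    "s4": FUNNEL[:3],  # abandons at checkout
}
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(sid, step) for sid, steps in sessions.items() for step in steps],
)


def funnel(conn):
    """Count distinct sessions that reached each funnel step."""
    sql = "SELECT COUNT(DISTINCT session_id) FROM events WHERE event_type = ?"
    return [(step, conn.execute(sql, (step,)).fetchone()[0]) for step in FUNNEL]


counts = funnel(conn)
# Step-to-step conversion: share of sessions that continue to the next step.
step_rates = [later[1] / earlier[1] for earlier, later in zip(counts, counts[1:])]
```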

Query 3: Diagnostic (Hybrid - SQL + Vector Search + LLM)
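A diagnostic question combines the three pieces. In this sketch the metrics are assumed to come from a prior SQL query, the similarity search is done in plain Python over toy three-dimensional vectors (with pgvector it would be an `ORDER BY embedding <=> query` in the database), and `llm` is any callable that takes a prompt. All names and numbers are illustrative.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, events, k=2):
    """Return the k events whose embeddings are closest to the query."""
    ranked = sorted(events, key=lambda e: cosine(query_vec, e["embedding"]),
                    reverse=True)
    return ranked[:k]


def diagnose(question, metrics, query_vec, events, llm):
    """SQL supplies the numbers, vector search the evidence, the LLM the prose."""
    change = metrics["yesterday"] - metrics["baseline"]  # absolute change in rate
    evidence = top_k(query_vec, events)
    prompt = (f"{question}\n"
              f"Measured change: {change:+.2%}\n"
              f"Related events: {', '.join(e['type'] for e in evidence)}")
    return llm(prompt), evidence


events = [
    {"type": "checkout_page_loaded_slow", "embedding": [0.9, 0.1, 0.0]},
    {"type": "payment_failed", "embedding": [0.8, 0.3, 0.1]},
    {"type": "product_viewed", "embedding": [0.0, 0.2, 0.9]},
]
metrics = {"baseline": 0.032, "yesterday": 0.021}  # made-up conversion rates
narrative, evidence = diagnose("Why did mobile conversion drop yesterday?",
                               metrics, [1.0, 0.2, 0.0], events,
                               llm=lambda prompt: prompt)  # echo stub, no API call
```

The subtraction happens in code, not in the model: the LLM receives the computed change and is asked only to explain it.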


Infrastructure Requirements (Real Numbers)

For a mid-sized e-commerce company (1M visitors/month, 50K orders/month):

Event Volume

  • ~50M events/month

  • ~20 events/second average (50M events spread over a 30-day month)

  • ~300 events/second peak

Infrastructure

Event Store (PostgreSQL + TimescaleDB):

  • 3-node cluster (primary + 2 replicas)

  • 16 vCPU, 64GB RAM per node

  • 2TB SSD storage (with compression)

  • Cost: ~$2,000/month (AWS RDS equivalent)

Event Bus (Kafka or Redpanda):

  • 3-node cluster

  • 8 vCPU, 32GB RAM per node

  • 1TB SSD per node

  • Cost: ~$1,500/month

Domain Reasoning Nodes:

  • 4 nodes (customer, revenue, inventory, product)

  • 8 vCPU, 32GB RAM per node

  • Cost: ~$1,200/month

LLM API (Claude Sonnet):

  • ~10K queries/month

  • ~80% deterministic (no LLM needed; a more conservative budget than the 90/10 split described earlier)

  • ~2K queries need LLM synthesis

  • Cost: ~$500/month

Vector Embeddings (OpenAI):

  • Batch process 50M events/month

  • Cost: ~$400/month

Total Infrastructure: ~$5,600/month

Compare to Traditional Stack

Traditional (Snowflake + dbt + Fivetran + Looker):

  • Snowflake: $3,000/month

  • Fivetran: $1,500/month

  • dbt Cloud: $500/month

  • Looker: $1,000/month

  • Data team time: 60% on maintenance

  • Total: $6,000/month, plus the maintenance time above

Event-driven stack: $5,600/month, with an estimated 10% of data team time on maintenance
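The totals can be checked with a few lines of arithmetic. The figures are the monthly estimates listed above; the dictionary keys are just labels for this sketch.

```python
# Monthly cost estimates in USD, copied from the lists above.
event_driven = {
    "event_store": 2000,
    "event_bus": 1500,
    "reasoning_nodes": 1200,
    "llm_api": 500,
    "embeddings": 400,
}
traditional = {
    "snowflake": 3000,
    "fivetran": 1500,
    "dbt_cloud": 500,
    "looker": 1000,
}


def monthly_total(costs: dict) -> int:
    """Sum the line items of one stack."""
    return sum(costs.values())


difference = monthly_total(traditional) - monthly_total(event_driven)
```

On tooling alone the gap is small, $400 a month. The larger claimed saving is in data team time, which these totals do not capture.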


Migration Strategy: 90-Day Proof of Concept

Week 1-2: Event Capture POC

Goal: Prove we can capture events from existing systems

Tasks:

  1. Set up Kafka cluster

  2. Write event producers for 1 source system (e.g., e-commerce platform)

  3. Capture 1 event type (e.g., order_completed)

  4. Store in PostgreSQL

Success metric: 100K events captured and stored

Week 3-4: Build First Domain Node

Goal: Answer 1 business question faster than current system

Tasks:

  1. Build Revenue Node

  2. Implement deterministic queries (simple aggregations)

  3. Compare results with existing warehouse

Success metric: "Daily revenue by product" query: 0.5s (vs 3 minutes in warehouse)

Week 5-6: Add Vector Search + Semantic Layer

Goal: Enable semantic queries

Tasks:

  1. Generate embeddings for all events

  2. Implement vector similarity search

  3. Test semantic queries

Success metric: "Find orders similar to this one" works accurately

Week 7-8: Conversational Interface

Goal: Natural language queries work

Tasks:

  1. Integrate Claude API

  2. Build intent parser

  3. Implement answer synthesis

  4. Test with 20 real business questions

Success metric: 90%+ accuracy on test questions
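The accuracy metric could be measured with a small harness like the one below. It is a sketch under assumptions: answers are numeric, `answer_fn` is whatever callable wraps the conversational interface, and a 1% relative tolerance counts as correct. The sample questions and values are made up.

```python
def accuracy(cases, answer_fn, tolerance=0.01):
    """Share of questions whose numeric answer is within tolerance of expected."""
    if not cases:
        raise ValueError("need at least one test case")
    correct = 0
    for question, expected in cases:
        got = answer_fn(question)
        if abs(got - expected) <= tolerance * max(1.0, abs(expected)):
            correct += 1
    return correct / len(cases)


# Expected values would come from the existing warehouse in practice.
cases = [("orders today", 2), ("revenue today", 620.0), ("refunds today", 3)]
# Stub system under test: one answer is slightly off, one is clearly wrong.
system_answers = {"orders today": 2, "revenue today": 619.0, "refunds today": 5}
score = accuracy(cases, system_answers.get)
```

Narrative answers to "why" questions need human review instead. A numeric check like this only covers the deterministic share.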

Week 9-10: Hybrid Query Execution

Goal: Handle complex "why" questions

Tasks:

  1. Implement causal chain analysis

  2. Build cross-domain query capability

  3. Test diagnostic queries

Success metric: "Why did conversion drop?" answered in <10 seconds with causality

Week 11-12: Pilot with Real Users

Goal: Prove business value

Tasks:

  1. Give access to 5 business users

  2. Track: queries asked, time saved, satisfaction

  3. Compare accuracy vs traditional dashboards

Success metrics:

  • 50+ queries asked

  • 80%+ user satisfaction

  • 100x faster than traditional method

  • Demonstrate ROI


What Makes This Different

Not Data Warehouse 2.0

This isn't "a faster warehouse." It's a different paradigm:

  • Old: Store aggregated states, visualize, human interprets

  • New: Store events, synthesize answers, explain causality

Not "AI Does Everything"

  • 90% is deterministic SQL (fast, accurate, cheap)

  • 10% uses LLM (for understanding intent and synthesis)

The intelligence is in the architecture, not just the AI.

Not Rip-and-Replace

Your existing warehouse keeps running. The new system handles new use cases. You migrate gradually as you prove value.

Not Vendor Lock-In

Built on open technologies:

  • PostgreSQL (open source)

  • Kafka (open source)

  • DuckDB (open source)

  • Claude API (proprietary, from Anthropic, but swappable for another LLM provider)

You own the infrastructure.


The Path Forward

  • Months 1-3: POC (prove it works)

  • Months 4-6: Pilot (10-20 users, 20% of queries)

  • Months 7-12: Scale (50% of queries)

  • Year 2: Primary system (80% of queries)

  • Year 3: Warehouse becomes archive only

You don't rip out the old system. You build the new one alongside it. You prove value before betting the company.
