Database Design & Messaging — System Design Handbook

TL;DR

SQL (PostgreSQL, MySQL): ACID, relationships, complex queries. Default choice.
NoSQL (MongoDB, DynamoDB): Flexible schema, horizontal scale, specific access patterns.
Redis: In-memory cache, pub/sub, rate limiting, session store.
Kafka: Distributed event streaming, high throughput, event sourcing.

Step 1: SQL vs NoSQL Decision

This is the most frequently asked database question in system design interviews, and the answer is never "one is better." SQL databases (PostgreSQL, MySQL) were built for data integrity — ACID transactions, foreign keys, and the ability to query data in ways you didn't anticipate at design time. NoSQL (MongoDB, DynamoDB) was invented when companies like Google and Facebook hit scale limits that SQL couldn't handle with single-node architectures, trading query flexibility for horizontal scalability. The right answer depends on your access patterns, consistency requirements, and scale.

Factor	SQL (PostgreSQL)	NoSQL (MongoDB/DynamoDB)
Schema	Fixed, migrations needed	Flexible, schema-less
Relationships	JOINs, foreign keys	Embedded/denormalized
Scaling	Vertical + read replicas	Horizontal (sharding)
Transactions	Multi-table ACID	Limited (single document)
Query flexibility	Any query on any column	Optimized for specific patterns
Best for	Complex queries, transactions	High write throughput, flexible data

Decision Tree

Need complex JOINs and transactions? → SQL (PostgreSQL)
Need flexible schema that changes often? → MongoDB
Need key-value with sub-millisecond latency? → Redis/DynamoDB
Need time-series data? → TimescaleDB / InfluxDB
Need full-text search? → Elasticsearch
Need graph relationships? → Neo4j

Step 2: Database Design Principles

Normalization was invented in the 1970s by Edgar Codd to eliminate data redundancy and update anomalies — if a user's email exists in one place, you can't have conflicting values. But normalization optimizes for writes at the cost of read performance (JOINs across tables). Denormalization reverses this tradeoff for read-heavy systems: duplicate data so reads hit one table instead of joining five. Most production systems use a hybrid: normalized for writes, denormalized views/caches for reads. Understanding when to normalize vs denormalize is a core system design skill.

Normalization (SQL)

-- Normalized (3NF) — no data duplication
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    email VARCHAR(255) UNIQUE
);

CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    user_id INT REFERENCES users(id),
    total DECIMAL(10,2),
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE order_items (
    id SERIAL PRIMARY KEY,
    order_id INT REFERENCES orders(id),
    product_id INT REFERENCES products(id),
    quantity INT,
    price DECIMAL(10,2)
);

Denormalization (for read performance)

-- Denormalized — faster reads, redundant data
CREATE TABLE order_summary (
    id SERIAL PRIMARY KEY,
    user_id INT,
    user_name VARCHAR(100),      -- Duplicated from users table
    user_email VARCHAR(255),     -- Duplicated from users table
    total DECIMAL(10,2),
    item_count INT,              -- Pre-computed
    created_at TIMESTAMP
);
-- One query instead of JOINing 3 tables
-- Trade-off: updates must change multiple places

Indexing Strategy

-- Index columns you WHERE, JOIN, or ORDER BY
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_orders_created_at ON orders(created_at DESC);

-- Composite index (order matters!)
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at DESC);
-- Supports: WHERE user_id = ? AND created_at > ?
-- Also supports: WHERE user_id = ? (leftmost prefix)
-- Does NOT support: WHERE created_at > ? (skips first column)

-- Partial index (PostgreSQL)
CREATE INDEX idx_active_users ON users(email) WHERE active = true;

Step 3: Redis — In-Memory Data Store

Redis was created in 2009 because Salvatore Sanfilippo needed a faster way to track real-time analytics than hitting a database for every page view. It stores everything in RAM, giving sub-millisecond latency for reads and writes. It's now the world's most popular cache and in-memory data store, used by virtually every production system at scale. Beyond simple caching, its data structures (sorted sets for leaderboards, lists for queues, pub/sub for real-time) make it a Swiss Army knife for performance-critical features.

Common Use Cases

1. CACHING       — Cache DB queries, API responses
2. SESSIONS      — Store user sessions (fast lookup)
3. RATE LIMITING — Token bucket per user/IP
4. QUEUES        — Simple job queues (lists)
5. PUB/SUB       — Real-time messaging
6. LEADERBOARDS  — Sorted sets for rankings
7. COUNTERS      — Atomic increment/decrement

Data Structures

# Strings — simple key-value
SET user:1:name "Alice"
GET user:1:name
SETEX session:abc 3600 '{"userId": 1}'  # Expires in 1 hour

# Hashes — object fields
HSET user:1 name "Alice" email "alice@test.com" age 30
HGET user:1 name
HGETALL user:1

# Lists — queues/stacks
LPUSH queue:jobs '{"type": "email", "to": "bob@test.com"}'
RPOP queue:jobs  # Process oldest first (FIFO)

# Sets — unique collections
SADD online:users "user:1" "user:2" "user:3"
SISMEMBER online:users "user:1"  # true
SCARD online:users  # count: 3

# Sorted Sets — leaderboards
ZADD leaderboard 1500 "player:alice"
ZADD leaderboard 2100 "player:bob"
ZREVRANGE leaderboard 0 9  # Top 10 players

# Pub/Sub — real-time
PUBLISH chat:room1 '{"user": "Alice", "msg": "Hello!"}'
SUBSCRIBE chat:room1

Step 4: Message Queues — Kafka & RabbitMQ

Message queues were invented to solve the fundamental problem of distributed systems: what happens when the service you're calling is slow, down, or overwhelmed? Without a queue, requests are lost. With a queue, messages persist until the consumer is ready — providing durability, load leveling, and decoupling. Kafka was built at LinkedIn to handle trillions of events/day when traditional message brokers couldn't keep up. RabbitMQ excels at task distribution and RPC patterns. Choosing between them is a common interview question.

When to Use a Queue

Without queue:
  Service A → Service B (direct call)
  If B is down → A fails ❌
  If B is slow → A waits ❌
  If spike → B overwhelmed ❌

With queue:
  Service A → [Queue] → Service B
  If B is down → messages wait in queue ✅
  If B is slow → processes at own pace ✅
  If spike → queue absorbs burst ✅

Kafka vs RabbitMQ

	Kafka	RabbitMQ
Model	Distributed log (append-only)	Message broker (queue)
Throughput	Millions/sec	Thousands/sec
Retention	Configurable (days/forever)	Until consumed
Consumer model	Pull (consumers read at own pace)	Push (broker sends to consumers)
Ordering	Per-partition guaranteed	Per-queue
Use case	Event streaming, logs, ETL	Task queues, RPC, notifications
Replay	Yes (re-read old messages)	No (consumed = gone)

Kafka Architecture

Producer → Topic → Partition 0 → Consumer Group A (Consumer 1)
                   Partition 1 → Consumer Group A (Consumer 2)
                   Partition 2 → Consumer Group A (Consumer 3)
                            └──→ Consumer Group B (all partitions)

Key concepts:
- Topics: Named streams of events
- Partitions: Parallelism unit (ordering within partition)
- Consumer Groups: Each group gets each message once
- Offsets: Position in the log (consumers track where they are)

Step 5: Distributed Systems Patterns

These patterns (Saga, idempotency, service discovery) exist because distributed transactions are impossible to do correctly with traditional ACID across services. The Saga pattern was borrowed from database theory (1987) to coordinate multi-service operations through a sequence of local transactions with compensating actions for rollback. Idempotency keys prevent duplicate processing from network retries. Service discovery enables dynamic routing as services scale up/down. These are the building blocks that make microservices actually work in production.

Saga Pattern (Distributed Transactions)

Order Service:
1. Create order (PENDING)
2. → Payment Service: charge card
   ✅ Success → 3. Inventory: reserve items
      ✅ Success → 4. Confirm order (COMPLETE)
      ❌ Failure → Compensation: refund payment
   ❌ Failure → Compensation: cancel order

Each step has a compensating action for rollback

Idempotency

// Problem: Network retry sends same request twice
// Solution: Idempotency key

async function processPayment(request: PaymentRequest) {
  const idempotencyKey = request.headers['Idempotency-Key'];

  // Check if already processed
  const existing = await redis.get(`payment:${idempotencyKey}`);
  if (existing) return JSON.parse(existing); // Return cached result

  // Process payment
  const result = await stripe.charge(request.body);

  // Cache result with TTL
  await redis.setex(`payment:${idempotencyKey}`, 86400, JSON.stringify(result));

  return result;
}

Service Discovery

┌─────────────────────────────────────────┐
│         Service Registry (Consul/etcd)   │
│  ┌─────────────────────────────────────┐ │
│  │ user-service: [10.0.1.1, 10.0.1.2] │ │
│  │ order-service: [10.0.2.1]          │ │
│  │ payment-service: [10.0.3.1, .3.2]  │ │
│  └─────────────────────────────────────┘ │
└─────────────────────────────────────────┘

1. Service starts → registers itself
2. Service needs another → queries registry
3. Service stops → deregisters (or health check removes it)

Step 6: Real-World Architecture Examples

E-Commerce (10M Users)

CDN → Load Balancer → API Gateway
                          │
         ┌────────────────┼────────────────┐
         ↓                ↓                ↓
   Product Service   Order Service   User Service
   (read-heavy)      (write-heavy)   (auth)
         │                │                │
    Redis Cache      Kafka Queue      PostgreSQL
         │                │
    PostgreSQL       Payment Svc
    (read replicas)  Inventory Svc
                     Notification Svc

Chat Application

Client → WebSocket Server (sticky sessions)
              │
         Redis Pub/Sub (fan-out messages to all WS servers)
              │
         Kafka (persistent message storage)
              │
         Cassandra (message history, partition by channel_id)

Interview Questions

SQL or NoSQL for an e-commerce platform?
- SQL (PostgreSQL) for orders, payments, inventory (ACID transactions critical). Redis for sessions, caching. Elasticsearch for product search. Mix databases for different needs.
When would you use Kafka over RabbitMQ?
- Kafka: event streaming, need message replay, very high throughput, multiple consumer groups reading same data. RabbitMQ: task queues, request-reply patterns, simpler setup, messages consumed once.
How does database indexing work?
- B-tree structure that allows O(log n) lookups instead of O(n) full table scan. Trade-off: faster reads, slower writes (index must be updated). Only index columns you frequently query.
Explain the Saga pattern.
- Distributed transaction as a sequence of local transactions. Each service does its work and publishes an event. If any step fails, compensating transactions undo previous steps. Eventual consistency instead of ACID across services.