TL;DR
- SQL (PostgreSQL, MySQL): ACID, relationships, complex queries. Default choice.
- NoSQL (MongoDB, DynamoDB): Flexible schema, horizontal scale, specific access patterns.
- Redis: In-memory cache, pub/sub, rate limiting, session store.
- Kafka: Distributed event streaming, high throughput, event sourcing.
Step 1: SQL vs NoSQL Decision
This is the most frequently asked database question in system design interviews, and the answer is never "one is better." SQL databases (PostgreSQL, MySQL) were built for data integrity — ACID transactions, foreign keys, and the ability to query data in ways you didn't anticipate at design time. NoSQL (MongoDB, DynamoDB) was invented when companies like Google and Facebook hit scale limits that SQL couldn't handle with single-node architectures, trading query flexibility for horizontal scalability. The right answer depends on your access patterns, consistency requirements, and scale.
| Factor | SQL (PostgreSQL) | NoSQL (MongoDB/DynamoDB) |
|---|---|---|
| Schema | Fixed, migrations needed | Flexible, schema-less |
| Relationships | JOINs, foreign keys | Embedded/denormalized |
| Scaling | Vertical + read replicas | Horizontal (sharding) |
| Transactions | Multi-table ACID | Limited (single document) |
| Query flexibility | Any query on any column | Optimized for specific patterns |
| Best for | Complex queries, transactions | High write throughput, flexible data |
Decision Tree
Need complex JOINs and transactions? → SQL (PostgreSQL)
Need flexible schema that changes often? → MongoDB
Need key-value with sub-millisecond latency? → Redis/DynamoDB
Need time-series data? → TimescaleDB / InfluxDB
Need full-text search? → Elasticsearch
Need graph relationships? → Neo4j
Step 2: Database Design Principles
Normalization was invented in the 1970s by Edgar Codd to eliminate data redundancy and update anomalies — if a user's email exists in one place, you can't have conflicting values. But normalization optimizes for writes at the cost of read performance (JOINs across tables). Denormalization reverses this tradeoff for read-heavy systems: duplicate data so reads hit one table instead of joining five. Most production systems use a hybrid: normalized for writes, denormalized views/caches for reads. Understanding when to normalize vs denormalize is a core system design skill.
Normalization (SQL)
-- Normalized (3NF) — no data duplication
CREATE TABLE users (
id SERIAL PRIMARY KEY,
name VARCHAR(100),
email VARCHAR(255) UNIQUE
);
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
user_id INT REFERENCES users(id),
total DECIMAL(10,2),
created_at TIMESTAMP DEFAULT NOW()
);
CREATE TABLE order_items (
id SERIAL PRIMARY KEY,
order_id INT REFERENCES orders(id),
product_id INT REFERENCES products(id),
quantity INT,
price DECIMAL(10,2)
);
Denormalization (for read performance)
-- Denormalized — faster reads, redundant data
CREATE TABLE order_summary (
id SERIAL PRIMARY KEY,
user_id INT,
user_name VARCHAR(100), -- Duplicated from users table
user_email VARCHAR(255), -- Duplicated from users table
total DECIMAL(10,2),
item_count INT, -- Pre-computed
created_at TIMESTAMP
);
-- One query instead of JOINing 3 tables
-- Trade-off: updates must change multiple places
Indexing Strategy
-- Index columns you WHERE, JOIN, or ORDER BY
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_orders_created_at ON orders(created_at DESC);
-- Composite index (order matters!)
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at DESC);
-- Supports: WHERE user_id = ? AND created_at > ?
-- Also supports: WHERE user_id = ? (leftmost prefix)
-- Does NOT support: WHERE created_at > ? (skips first column)
-- Partial index (PostgreSQL)
CREATE INDEX idx_active_users ON users(email) WHERE active = true;
Step 3: Redis — In-Memory Data Store
Redis was created in 2009 because Salvatore Sanfilippo needed a faster way to track real-time analytics than hitting a database for every page view. It stores everything in RAM, giving sub-millisecond latency for reads and writes. It's now the world's most popular cache and in-memory data store, used by virtually every production system at scale. Beyond simple caching, its data structures (sorted sets for leaderboards, lists for queues, pub/sub for real-time) make it a Swiss Army knife for performance-critical features.
Common Use Cases
1. CACHING — Cache DB queries, API responses
2. SESSIONS — Store user sessions (fast lookup)
3. RATE LIMITING — Token bucket per user/IP
4. QUEUES — Simple job queues (lists)
5. PUB/SUB — Real-time messaging
6. LEADERBOARDS — Sorted sets for rankings
7. COUNTERS — Atomic increment/decrement
Data Structures
# Strings — simple key-value
SET user:1:name "Alice"
GET user:1:name
SETEX session:abc 3600 '{"userId": 1}' # Expires in 1 hour
# Hashes — object fields
HSET user:1 name "Alice" email "alice@test.com" age 30
HGET user:1 name
HGETALL user:1
# Lists — queues/stacks
LPUSH queue:jobs '{"type": "email", "to": "bob@test.com"}'
RPOP queue:jobs # Process oldest first (FIFO)
# Sets — unique collections
SADD online:users "user:1" "user:2" "user:3"
SISMEMBER online:users "user:1" # true
SCARD online:users # count: 3
# Sorted Sets — leaderboards
ZADD leaderboard 1500 "player:alice"
ZADD leaderboard 2100 "player:bob"
ZREVRANGE leaderboard 0 9 # Top 10 players
# Pub/Sub — real-time
PUBLISH chat:room1 '{"user": "Alice", "msg": "Hello!"}'
SUBSCRIBE chat:room1
Step 4: Message Queues — Kafka & RabbitMQ
Message queues were invented to solve the fundamental problem of distributed systems: what happens when the service you're calling is slow, down, or overwhelmed? Without a queue, requests are lost. With a queue, messages persist until the consumer is ready — providing durability, load leveling, and decoupling. Kafka was built at LinkedIn to handle trillions of events/day when traditional message brokers couldn't keep up. RabbitMQ excels at task distribution and RPC patterns. Choosing between them is a common interview question.
When to Use a Queue
Without queue:
Service A → Service B (direct call)
If B is down → A fails ❌
If B is slow → A waits ❌
If spike → B overwhelmed ❌
With queue:
Service A → [Queue] → Service B
If B is down → messages wait in queue ✅
If B is slow → processes at own pace ✅
If spike → queue absorbs burst ✅
Kafka vs RabbitMQ
| Kafka | RabbitMQ | |
|---|---|---|
| Model | Distributed log (append-only) | Message broker (queue) |
| Throughput | Millions/sec | Thousands/sec |
| Retention | Configurable (days/forever) | Until consumed |
| Consumer model | Pull (consumers read at own pace) | Push (broker sends to consumers) |
| Ordering | Per-partition guaranteed | Per-queue |
| Use case | Event streaming, logs, ETL | Task queues, RPC, notifications |
| Replay | Yes (re-read old messages) | No (consumed = gone) |
Kafka Architecture
Producer → Topic → Partition 0 → Consumer Group A (Consumer 1)
Partition 1 → Consumer Group A (Consumer 2)
Partition 2 → Consumer Group A (Consumer 3)
└──→ Consumer Group B (all partitions)
Key concepts:
- Topics: Named streams of events
- Partitions: Parallelism unit (ordering within partition)
- Consumer Groups: Each group gets each message once
- Offsets: Position in the log (consumers track where they are)
Step 5: Distributed Systems Patterns
These patterns (Saga, idempotency, service discovery) exist because distributed transactions are impossible to do correctly with traditional ACID across services. The Saga pattern was borrowed from database theory (1987) to coordinate multi-service operations through a sequence of local transactions with compensating actions for rollback. Idempotency keys prevent duplicate processing from network retries. Service discovery enables dynamic routing as services scale up/down. These are the building blocks that make microservices actually work in production.
Saga Pattern (Distributed Transactions)
Order Service:
1. Create order (PENDING)
2. → Payment Service: charge card
✅ Success → 3. Inventory: reserve items
✅ Success → 4. Confirm order (COMPLETE)
❌ Failure → Compensation: refund payment
❌ Failure → Compensation: cancel order
Each step has a compensating action for rollback
Idempotency
// Problem: Network retry sends same request twice
// Solution: Idempotency key
async function processPayment(request: PaymentRequest) {
const idempotencyKey = request.headers['Idempotency-Key'];
// Check if already processed
const existing = await redis.get(`payment:${idempotencyKey}`);
if (existing) return JSON.parse(existing); // Return cached result
// Process payment
const result = await stripe.charge(request.body);
// Cache result with TTL
await redis.setex(`payment:${idempotencyKey}`, 86400, JSON.stringify(result));
return result;
}
Service Discovery
┌─────────────────────────────────────────┐
│ Service Registry (Consul/etcd) │
│ ┌─────────────────────────────────────┐ │
│ │ user-service: [10.0.1.1, 10.0.1.2] │ │
│ │ order-service: [10.0.2.1] │ │
│ │ payment-service: [10.0.3.1, .3.2] │ │
│ └─────────────────────────────────────┘ │
└─────────────────────────────────────────┘
1. Service starts → registers itself
2. Service needs another → queries registry
3. Service stops → deregisters (or health check removes it)
Step 6: Real-World Architecture Examples
E-Commerce (10M Users)
CDN → Load Balancer → API Gateway
│
┌────────────────┼────────────────┐
↓ ↓ ↓
Product Service Order Service User Service
(read-heavy) (write-heavy) (auth)
│ │ │
Redis Cache Kafka Queue PostgreSQL
│ │
PostgreSQL Payment Svc
(read replicas) Inventory Svc
Notification Svc
Chat Application
Client → WebSocket Server (sticky sessions)
│
Redis Pub/Sub (fan-out messages to all WS servers)
│
Kafka (persistent message storage)
│
Cassandra (message history, partition by channel_id)
Interview Questions
-
SQL or NoSQL for an e-commerce platform?
- SQL (PostgreSQL) for orders, payments, inventory (ACID transactions critical). Redis for sessions, caching. Elasticsearch for product search. Mix databases for different needs.
-
When would you use Kafka over RabbitMQ?
- Kafka: event streaming, need message replay, very high throughput, multiple consumer groups reading same data. RabbitMQ: task queues, request-reply patterns, simpler setup, messages consumed once.
-
How does database indexing work?
- B-tree structure that allows O(log n) lookups instead of O(n) full table scan. Trade-off: faster reads, slower writes (index must be updated). Only index columns you frequently query.
-
Explain the Saga pattern.
- Distributed transaction as a sequence of local transactions. Each service does its work and publishes an event. If any step fails, compensating transactions undo previous steps. Eventual consistency instead of ACID across services.