TL;DR
- Microservices: Independent services, each owning its data. Communication via REST/gRPC/events.
- Event-driven: Async communication via queues (Kafka, RabbitMQ). Decouples producers from consumers.
- API Gateway: Single entry point, handles auth, rate limiting, routing.
- Circuit Breaker: Prevent cascading failures in distributed systems.
Step 1: Monolith vs Microservices
Choosing between a monolith and microservices is one of the earliest and most consequential architecture decisions. Companies like Netflix, Amazon, and Spotify popularized microservices because their monoliths stopped scaling, not in traffic (horizontal scaling handles that) but in developer productivity. When 500 engineers deploy from one codebase, merge conflicts and deployment coordination become the bottleneck. Microservices let teams own, deploy, and scale their services independently. For small teams (<10 devs), a monolith is almost always the right starting point.
Monolith
┌─────────────────────────────────────┐
│ Single Application │
│ ┌─────┐ ┌──────┐ ┌───────┐ │
│ │Users│ │Orders│ │Payment│ ... │
│ └──┬──┘ └──┬───┘ └───┬───┘ │
│ └────────┴─────────┘ │
│ Shared Database │
└─────────────────────────────────────┘
Microservices
┌──────┐ ┌───────┐ ┌────────┐ ┌──────────┐
│ User │ │ Order │ │Payment │ │Inventory │
│ Svc │ │ Svc │ │ Svc │ │ Svc │
└──┬───┘ └──┬────┘ └──┬─────┘ └────┬─────┘
│ │ │ │
┌──┴──┐ ┌───┴──┐ ┌────┴───┐ ┌─────┴────┐
│DB-1 │ │DB-2 │ │ DB-3 │ │ DB-4 │
└─────┘ └──────┘ └────────┘ └──────────┘
Each service owns its own data ✅
Decision Framework
| Start with Monolith when | Use Microservices when |
|---|---|
| Small team (<10 devs) | Large org, multiple teams |
| New product, unclear boundaries | Clear domain boundaries |
| Simple deployment needs | Independent scaling needed |
| Speed of development matters | Different tech per service OK |
| <100K users | >1M users with varied workloads |
Step 2: Communication Patterns
Once you split into services, they need to talk to each other — and how they communicate determines your system's reliability, latency, and complexity. Synchronous (REST/gRPC) is simple but creates coupling: if Service B is slow, Service A is slow. Asynchronous messaging (Kafka, RabbitMQ) decouples services but introduces eventual consistency and debugging complexity. In system design interviews, choosing the right communication pattern for each interaction and justifying why is what separates senior from junior answers.
Synchronous (REST/gRPC)
Order Service → [HTTP/gRPC] → Payment Service
↓
Process payment
↓
← [Response] ← Return result
Problem: If Payment is down, Order is blocked
Asynchronous (Event-Driven)
Order Service → [Event: OrderCreated] → Message Queue
↓
Payment Service picks up
Inventory Service picks up
Notification Service picks up
(All independently, at their own pace)
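The flow above can be sketched with a tiny in-memory bus. This is only an illustration under stated assumptions: `EventBus`, `drain`, and the `OrderCreated` fields are hypothetical names, and the array-backed queue stands in for a real broker such as Kafka or RabbitMQ.

```typescript
// Hypothetical event shape; the field names are illustrative only.
type OrderCreated = { type: "OrderCreated"; orderId: string; amount: number };

type Handler<E> = (event: E) => void;

// Minimal in-memory stand-in for a message broker (assumption, not a real client).
class EventBus<E extends { type: string }> {
  private handlers = new Map<string, Handler<E>[]>();
  private queue: E[] = [];

  subscribe(type: E["type"], handler: Handler<E>): void {
    const list = this.handlers.get(type) ?? [];
    list.push(handler);
    this.handlers.set(type, list);
  }

  // The producer only enqueues; it never calls consumers directly.
  publish(event: E): void {
    this.queue.push(event);
  }

  // Delivery happens later, decoupled from publish().
  drain(): void {
    while (this.queue.length > 0) {
      const next = this.queue.shift()!;
      for (const handler of this.handlers.get(next.type) ?? []) {
        handler(next);
      }
    }
  }
}

const bus = new EventBus<OrderCreated>();
const deliveries: string[] = [];
bus.subscribe("OrderCreated", (e) => { deliveries.push(`payment:${e.orderId}`); });
bus.subscribe("OrderCreated", (e) => { deliveries.push(`inventory:${e.orderId}`); });
bus.subscribe("OrderCreated", (e) => { deliveries.push(`notification:${e.orderId}`); });

bus.publish({ type: "OrderCreated", orderId: "o1", amount: 42 });
// Publishing returns before any consumer has run.
const deliveredBeforeDrain = deliveries.length;
bus.drain();
```

The Order Service only knows about `publish`. Adding a fourth consumer is one more `subscribe` call and needs no change to the producer.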
When to Use Which
| Pattern | Use When | Trade-off |
|---|---|---|
| REST | Simple CRUD, need immediate response | Tight coupling, cascading failures |
| gRPC | High-performance, service-to-service | More complex setup, binary protocol |
| Events (async) | Don't need immediate response | Eventually consistent, harder to debug |
| Saga | Multi-service transactions | Complex compensation logic |
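The Saga row deserves a concrete picture, since "compensation logic" is easy to say and harder to visualize. Below is a minimal orchestration-style sketch, assuming synchronous steps for brevity (real steps would be asynchronous service calls). `SagaStep`, `runSaga`, and the step labels are hypothetical names.

```typescript
// One saga step: a forward action plus the compensation that undoes it.
interface SagaStep {
  label: string;
  action: () => void;     // throws on failure
  compensate: () => void; // undoes a previously successful action
}

// Runs steps in order; on failure, compensates completed steps in reverse order.
function runSaga(steps: SagaStep[]): { ok: boolean; failedAt?: string } {
  const completed: SagaStep[] = [];
  for (const step of steps) {
    try {
      step.action();
      completed.push(step);
    } catch {
      for (const done of completed.reverse()) {
        done.compensate();
      }
      return { ok: false, failedAt: step.label };
    }
  }
  return { ok: true };
}

const sagaTrace: string[] = [];
const sagaOutcome = runSaga([
  {
    label: "reserveInventory",
    action: () => { sagaTrace.push("reserve"); },
    compensate: () => { sagaTrace.push("release"); },
  },
  {
    label: "chargePayment",
    action: () => { sagaTrace.push("charge"); },
    compensate: () => { sagaTrace.push("refund"); },
  },
  {
    label: "createShipment",
    action: () => { throw new Error("carrier unavailable"); },
    compensate: () => { sagaTrace.push("cancelShipment"); },
  },
]);
// sagaTrace is now: reserve, charge, refund, release
```

The failed step itself is not compensated because it never completed. Only the steps before it are undone, newest first.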
Step 3: API Gateway Pattern
The API Gateway solves a practical problem: clients shouldn't need to know your internal service topology. Without a gateway, a mobile app would have to call 8 different services directly, authenticate with each, and know their addresses. The gateway provides a single entry point that handles cross-cutting concerns (auth, rate limiting, logging, SSL termination) and can aggregate responses from multiple services into one client-friendly payload. Netflix Zuul, Kong, and AWS API Gateway are widely used implementations of this pattern.
┌──────────────────────────┐
Mobile App ──────→ │ │ ──→ User Service
Web App ─────────→ │ API Gateway │ ──→ Order Service
Third Party ─────→ │ │ ──→ Product Service
└──────────────────────────┘
Responsibilities:
• Authentication/Authorization
• Rate limiting
• Request routing
• Response aggregation
• SSL termination
• Logging/monitoring
• Request/response transformation
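Two of these responsibilities, authentication and request routing, can be sketched in a few lines. This is a simplified in-process model rather than how any particular gateway product works. `ApiGateway`, `GatewayRequest`, and the `demo-token` check are hypothetical, and the backends are plain functions standing in for network calls.

```typescript
// Hypothetical request/response shapes for illustration.
interface GatewayRequest { path: string; token?: string }
interface GatewayResponse { status: number; body: string }
type Backend = (req: GatewayRequest) => GatewayResponse;

class ApiGateway {
  private routes: { prefix: string; backend: Backend }[] = [];

  constructor(private isValidToken: (token: string) => boolean) {}

  register(prefix: string, backend: Backend): void {
    this.routes.push({ prefix, backend });
  }

  handle(req: GatewayRequest): GatewayResponse {
    // Cross-cutting concern: authenticate once, before any routing.
    if (!req.token || !this.isValidToken(req.token)) {
      return { status: 401, body: "unauthorized" };
    }
    // Request routing: the first registered prefix that matches wins.
    const route = this.routes.find((r) => req.path.startsWith(r.prefix));
    if (!route) {
      return { status: 404, body: "no route" };
    }
    return route.backend(req);
  }
}

const gateway = new ApiGateway((token) => token === "demo-token");
gateway.register("/users", () => ({ status: 200, body: "user service" }));
gateway.register("/orders", () => ({ status: 200, body: "order service" }));

const routedResponse = gateway.handle({ path: "/orders/42", token: "demo-token" });
const rejectedResponse = gateway.handle({ path: "/orders/42" });
const unknownRouteResponse = gateway.handle({ path: "/billing", token: "demo-token" });
```

Because the token check lives in `handle`, the backends never see unauthenticated traffic and need no auth code of their own.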
Backend for Frontend (BFF)
Mobile App ──→ Mobile BFF ──→ Services
Web App ──→ Web BFF ──→ Services
Admin Panel ──→ Admin BFF ──→ Services
Each BFF tailors responses for its client type
(mobile gets less data, web gets richer payloads)
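As a rough illustration of tailoring, the sketch below has two BFF functions shaping the same upstream record differently. The `Product` fields, `fetchProduct`, and the sample values are invented for the example. A real BFF would call the product service over the network.

```typescript
// Hypothetical full record, as an internal service might return it.
interface Product {
  id: string;
  title: string;
  description: string;
  priceCents: number;
  imageUrls: string[];
}

// Stand-in for a call to the product service (assumption).
function fetchProduct(id: string): Product {
  return {
    id,
    title: "Desk lamp",
    description: "Adjustable LED desk lamp with three brightness levels.",
    priceCents: 2999,
    imageUrls: ["lamp-front.jpg", "lamp-side.jpg", "lamp-box.jpg"],
  };
}

// Mobile BFF: trims the payload to what a small screen needs.
function mobileProductView(id: string) {
  const p = fetchProduct(id);
  return { id: p.id, title: p.title, priceCents: p.priceCents, thumbnail: p.imageUrls[0] };
}

// Web BFF: returns the full record plus a display-ready price string.
function webProductView(id: string) {
  const p = fetchProduct(id);
  return { ...p, displayPrice: `$${(p.priceCents / 100).toFixed(2)}` };
}

const mobilePayload = mobileProductView("p1");
const webPayload = webProductView("p1");
```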
Step 4: Circuit Breaker Pattern
The circuit breaker pattern was adopted from electrical engineering: just as a circuit breaker trips to prevent a house fire from a short circuit, this pattern "trips" to prevent cascading failures across services. Without it, when Service B goes down, Service A keeps sending requests, exhausting its thread pool waiting for timeouts — then A goes down, then everything that depends on A goes down. The circuit breaker fails fast after detecting B is unhealthy, giving B time to recover and keeping A responsive with graceful degradation.
Prevents cascading failures when a service is down:
States:
┌────────┐  failures >= threshold  ┌──────┐  timeout elapsed  ┌───────────┐
│ CLOSED │ ──────────────────────→ │ OPEN │ ────────────────→ │ HALF-OPEN │
│(normal)│                         │(fail)│ ←──────────────── │ (testing) │
└────────┘                         └──────┘    test fails     └───────────┘
    ↑                                                               │
    └──────────────────────── test succeeds ────────────────────────┘
In CLOSED, each success resets the failure count.
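class CircuitBreaker {
  private failures = 0;
  private state: 'CLOSED' | 'OPEN' | 'HALF_OPEN' = 'CLOSED';
  private lastFailureTime = 0;

  constructor(
    private threshold: number = 5,   // consecutive failures before opening
    private timeout: number = 30000  // ms to stay OPEN before allowing a test call
  ) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === 'OPEN') {
      if (Date.now() - this.lastFailureTime > this.timeout) {
        // Cool-down elapsed: let a test call through.
        // Note: concurrent callers all pass in HALF_OPEN here; a production
        // breaker would usually admit only one.
        this.state = 'HALF_OPEN';
      } else {
        // Fail fast without touching the unhealthy service.
        throw new Error('Circuit is OPEN: service unavailable');
      }
    }
    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  // Any success closes the circuit and resets the failure count.
  private onSuccess() {
    this.failures = 0;
    this.state = 'CLOSED';
  }

  // A failed test call in HALF_OPEN reopens the circuit immediately,
  // because the count is still at or above the threshold.
  private onFailure() {
    this.failures++;
    this.lastFailureTime = Date.now();
    if (this.failures >= this.threshold) {
      this.state = 'OPEN';
    }
  }
}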
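The "graceful degradation" part usually lives in the caller: when the breaker rejects a call, return something cheaper instead of an error. A minimal sketch follows, assuming only the `call` signature of the class above. `BreakerLike`, `callWithFallback`, and the two stand-in breakers are hypothetical and exist to make the example self-contained.

```typescript
// Structural type matching the call() method of the CircuitBreaker class above.
interface BreakerLike {
  call<T>(fn: () => Promise<T>): Promise<T>;
}

// Tries the protected call; on any failure (including an open circuit),
// returns a fallback value instead of propagating the error.
async function callWithFallback<T>(
  breaker: BreakerLike,
  fn: () => Promise<T>,
  fallback: () => T,
): Promise<T> {
  try {
    return await breaker.call(fn);
  } catch {
    return fallback();
  }
}

// Demo stand-ins: one breaker that always rejects, one that passes calls through.
const alwaysOpenBreaker: BreakerLike = {
  call<T>(_fn: () => Promise<T>): Promise<T> {
    return Promise.reject(new Error("Circuit is OPEN"));
  },
};
const passThroughBreaker: BreakerLike = {
  call<T>(fn: () => Promise<T>): Promise<T> {
    return fn();
  },
};

async function fallbackDemo(): Promise<string[]> {
  const degraded = await callWithFallback(alwaysOpenBreaker, async () => "live price", () => "cached price");
  const healthy = await callWithFallback(passThroughBreaker, async () => "live price", () => "cached price");
  return [degraded, healthy];
}
```

Swallowing every error is a simplification. In practice you would likely log the failure and fall back only for calls where stale or default data is acceptable.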
Step 5: Event-Driven Architecture
Event-driven architecture emerged because traditional request-response couples services tightly: the producer must know every consumer. With events, a service announces that something happened (OrderCreated, UserRegistered) and doesn't care who listens. This pairs naturally with CQRS (separate read/write models) and event sourcing (a full audit trail by replaying events), and it lets you add features without modifying existing services. Typical uses include real-time ride tracking, bank transaction histories, and analytics pipelines that process billions of events daily.
Event Sourcing
Instead of storing current state, store all events:
Events:
1. UserRegistered { id: "u1", name: "Alice", email: "alice@..." }
2. UserUpdatedEmail { id: "u1", email: "newalice@..." }
3. UserDeactivated { id: "u1" }
Current state = replay all events
Benefits: Full audit trail, can rebuild state at any point in time
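"Replay all events" is a fold over the event log. The sketch below mirrors the three events listed above. `UserState`, `applyUserEvent`, and `replay` are hypothetical names, and the email addresses are placeholders because the source elides the real ones.

```typescript
// Event types mirror the three events in the list above.
type UserEvent =
  | { type: "UserRegistered"; id: string; name: string; email: string }
  | { type: "UserUpdatedEmail"; id: string; email: string }
  | { type: "UserDeactivated"; id: string };

interface UserState { id: string; name: string; email: string; active: boolean }

// Pure function: applies one event to the previous state.
function applyUserEvent(state: UserState | null, event: UserEvent): UserState | null {
  switch (event.type) {
    case "UserRegistered":
      return { id: event.id, name: event.name, email: event.email, active: true };
    case "UserUpdatedEmail":
      return state ? { ...state, email: event.email } : state;
    case "UserDeactivated":
      return state ? { ...state, active: false } : state;
  }
}

// Current state is a left fold over the event log.
function replay(log: UserEvent[]): UserState | null {
  return log.reduce<UserState | null>((state, event) => applyUserEvent(state, event), null);
}

// Placeholder addresses (assumption): the source elides the real values.
const userEvents: UserEvent[] = [
  { type: "UserRegistered", id: "u1", name: "Alice", email: "a@example.com" },
  { type: "UserUpdatedEmail", id: "u1", email: "b@example.com" },
  { type: "UserDeactivated", id: "u1" },
];

const finalState = replay(userEvents);
// Point-in-time rebuild: replay only the first two events.
const stateAfterTwoEvents = replay(userEvents.slice(0, 2));
```

Replaying a prefix of the log is what "rebuild state at any point in time" means in practice. Long logs are commonly paired with periodic snapshots so a replay does not have to start from the first event.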
CQRS (Command Query Responsibility Segregation)
┌────────────────┐ ┌────────────────┐
│ Write Side │ │ Read Side │
│ (Commands) │ │ (Queries) │
├────────────────┤ ├────────────────┤
│ Validate │ Events │ Denormalized │
│ Business logic │ ──────→ │ Read models │
│ Write to DB │ │ Optimized for │
│ │ │ specific views │
└───────┬────────┘ └────────┬───────┘
│ │
Write DB Read DB
(normalized) (denormalized, fast)
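The split in the diagram can be modeled in-process to make the data flow concrete. Everything here is an assumption for illustration: the order domain, `OrderCommandHandler`, and `CustomerSpendView` are hypothetical, and the event is delivered by a direct function call where a real system would use a broker, so the read side would lag the write side.

```typescript
// Hypothetical order domain for illustration.
type OrderEvent = { type: "OrderPlaced"; orderId: string; customerId: string; totalCents: number };

// Write side: validates commands, stores normalized records, emits events.
class OrderCommandHandler {
  private orders = new Map<string, { customerId: string; totalCents: number }>();

  constructor(private emit: (event: OrderEvent) => void) {}

  placeOrder(orderId: string, customerId: string, totalCents: number): void {
    if (totalCents <= 0) throw new Error("total must be positive");
    if (this.orders.has(orderId)) throw new Error("duplicate order");
    this.orders.set(orderId, { customerId, totalCents });
    this.emit({ type: "OrderPlaced", orderId, customerId, totalCents });
  }
}

// Read side: a denormalized view shaped for one query (spend per customer).
class CustomerSpendView {
  private totals = new Map<string, { orderCount: number; totalCents: number }>();

  project(event: OrderEvent): void {
    const current = this.totals.get(event.customerId) ?? { orderCount: 0, totalCents: 0 };
    this.totals.set(event.customerId, {
      orderCount: current.orderCount + 1,
      totalCents: current.totalCents + event.totalCents,
    });
  }

  get(customerId: string): { orderCount: number; totalCents: number } {
    return this.totals.get(customerId) ?? { orderCount: 0, totalCents: 0 };
  }
}

const spendView = new CustomerSpendView();
const orderCommands = new OrderCommandHandler((e) => spendView.project(e));
orderCommands.placeOrder("o1", "c1", 1500);
orderCommands.placeOrder("o2", "c1", 2500);
```

Queries such as `spendView.get("c1")` never touch the write store, which is what lets each side be scaled and indexed independently.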
Step 6: Rate Limiting
Rate limiting protects systems from abuse, both malicious (DDoS attacks, brute-force login attempts) and accidental (a buggy client stuck in an infinite retry loop). Without it, a single misbehaving client can consume all your resources and deny service to everyone. Most public APIs (GitHub, Stripe, and Twitter, for example) enforce rate limits, typically returning HTTP 429 (Too Many Requests). The algorithms (token bucket, sliding window) come up in both system design and API design interviews.
Algorithms
| Algorithm | How | Best For |
|---|---|---|
| Token Bucket | Tokens refill at fixed rate, each request costs 1 token | Burst-friendly, most APIs |
| Sliding Window | Count requests in rolling time window | Smooth limiting |
| Fixed Window | Count per time interval (e.g., 100/minute) | Simple but has boundary spike issue |
| Leaky Bucket | Requests processed at fixed rate, excess queued (or dropped once the queue is full) | Smooth output rate |
Token Bucket Implementation
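class RateLimiter {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private maxTokens: number = 100, // bucket capacity (maximum burst size)
    private refillRate: number = 10, // tokens per second
  ) {
    this.tokens = maxTokens; // start with a full bucket
    this.lastRefill = Date.now();
  }

  allow(): boolean {
    this.refill();
    // Require a whole token: refill() adds fractional amounts, so a
    // check of `> 0` would let the balance go negative.
    if (this.tokens >= 1) {
      this.tokens--;
      return true;
    }
    return false; // Rate limited
  }

  // Adds tokens for the time elapsed since the last refill, capped at capacity.
  private refill() {
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000; // seconds
    this.tokens = Math.min(this.maxTokens, this.tokens + elapsed * this.refillRate);
    this.lastRefill = now;
  }
}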
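For comparison with the token bucket, here is a sketch of the sliding window approach from the table, in its "log" variant that stores one timestamp per accepted request. `SlidingWindowLimiter` and its parameters are hypothetical names. The clock is injected so the walkthrough is deterministic.

```typescript
// Sliding window log limiter: at most `limit` requests in any rolling `windowMs`.
class SlidingWindowLimiter {
  private timestamps: number[] = [];

  constructor(
    private limit: number,
    private windowMs: number,
    private now: () => number = () => Date.now(), // injectable clock for tests
  ) {}

  allow(): boolean {
    const current = this.now();
    // Keep only the requests that are still inside the rolling window.
    this.timestamps = this.timestamps.filter((t) => t > current - this.windowMs);
    if (this.timestamps.length >= this.limit) {
      return false;
    }
    this.timestamps.push(current);
    return true;
  }
}

// Walkthrough with a fake clock: limit of 2 requests per 1000 ms.
let fakeTime = 0;
const windowLimiter = new SlidingWindowLimiter(2, 1000, () => fakeTime);
const decisions: boolean[] = [];
decisions.push(windowLimiter.allow()); // t=0: allowed
fakeTime = 400;
decisions.push(windowLimiter.allow()); // t=400: allowed
fakeTime = 900;
decisions.push(windowLimiter.allow()); // t=900: rejected, two requests in the last second
fakeTime = 1100;
decisions.push(windowLimiter.allow()); // t=1100: allowed, the t=0 request has expired
```

The log variant is exact but stores one entry per accepted request. A counter-based approximation trades some accuracy for constant memory.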
Step 7: Common System Design Questions — Approach
The Framework (5 Steps)
- Clarify requirements (5 min) — Users? Scale? Core features?
- Back-of-envelope estimation (3 min) — QPS, storage, bandwidth
- High-level design (10 min) — Components, data flow
- Detailed design (15 min) — Deep dive into 2-3 components
- Bottlenecks & trade-offs (5 min) — What fails first? How to handle?
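The estimation step is plain arithmetic, and writing it down once makes it easier to repeat under time pressure. The sketch below uses invented inputs (10M daily users, 20 requests each, a 3x peak factor, 500-byte writes). None of these numbers come from a real system, and `Workload` and `estimate` are hypothetical names.

```typescript
// All inputs are illustrative assumptions, not measurements.
interface Workload {
  dailyActiveUsers: number;
  requestsPerUserPerDay: number;
  peakToAverageRatio: number;
  writesPerDay: number;
  bytesPerWrite: number;
  retentionDays: number;
}

const SECONDS_PER_DAY = 86400;

function estimate(w: Workload) {
  // Average QPS = total daily requests / seconds in a day.
  const averageQps = (w.dailyActiveUsers * w.requestsPerUserPerDay) / SECONDS_PER_DAY;
  // Peak QPS = average scaled by an assumed peak factor.
  const peakQps = averageQps * w.peakToAverageRatio;
  // Storage = writes per day * size per write * retention period.
  const storageBytes = w.writesPerDay * w.bytesPerWrite * w.retentionDays;
  return { averageQps, peakQps, storageBytes, storageTB: storageBytes / 1e12 };
}

const sizing = estimate({
  dailyActiveUsers: 10000000,
  requestsPerUserPerDay: 20,
  peakToAverageRatio: 3,
  writesPerDay: 50000000,
  bytesPerWrite: 500,
  retentionDays: 365,
});
// averageQps is about 2,315; peakQps is about 6,944; storage is about 9.1 TB per year
```

In an interview, rounding 86,400 to 100,000 seconds per day is usually accepted and makes the mental math faster. The point is the order of magnitude, not the third digit.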
Quick Reference: Common Designs
| System | Key Components |
|---|---|
| URL Shortener | Hash function, key-value store, redirect service |
| Chat App | WebSocket servers, message queue, presence service |
| News Feed | Fan-out (push vs pull), ranking algorithm, cache |
| Notification | Queue, worker pool, delivery channels, templates |
| Rate Limiter | Token bucket, Redis (distributed), sliding window |
| File Storage | Chunk upload, metadata DB, CDN, replication |
Interview Questions
- How would you design a URL shortener?
  - Base62 encode an auto-incrementing ID (or hash the URL). Store mapping in key-value store (Redis). Redirect service reads from cache first. Analytics: count redirects per short URL. Scale: shard by short code prefix.
- Explain the Circuit Breaker pattern.
  - Monitors failures to an external service. After threshold failures, "opens" the circuit, so subsequent calls fail immediately (fast failure). After a timeout, allows one test request. If it succeeds, closes circuit. Prevents cascading failures.
- When would you choose event-driven over REST?
  - When consumers don't need immediate response, when multiple services need to react to the same event, when you want loose coupling, or when you need to handle traffic spikes (queue absorbs burst).
- What's the difference between CQRS and event sourcing?
  - CQRS = separate read and write models (different DBs optimized for each). Event sourcing = store events instead of current state. They're often used together but are independent patterns.
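For the URL shortener answer, the Base62 step is worth being able to write from memory. A minimal sketch, assuming numeric IDs within JavaScript's safe integer range. The alphabet order (digits, lowercase, uppercase) is one common choice, not a standard, and `encodeBase62` / `decodeBase62` are hypothetical names.

```typescript
// Alphabet order is an assumption; any fixed ordering of 62 symbols works.
const BASE62_ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

// Encodes a non-negative integer ID as a Base62 short code.
function encodeBase62(id: number): string {
  if (!Number.isSafeInteger(id) || id < 0) {
    throw new Error("id must be a non-negative safe integer");
  }
  if (id === 0) return BASE62_ALPHABET.charAt(0);
  let code = "";
  let n = id;
  while (n > 0) {
    // Prepend the least significant digit, then shift right by one Base62 digit.
    code = BASE62_ALPHABET.charAt(n % 62) + code;
    n = Math.floor(n / 62);
  }
  return code;
}

// Decodes a short code back to the numeric ID used as the storage key.
function decodeBase62(code: string): number {
  let n = 0;
  for (let i = 0; i < code.length; i++) {
    const digit = BASE62_ALPHABET.indexOf(code.charAt(i));
    if (digit < 0) throw new Error("invalid character in short code");
    n = n * 62 + digit;
  }
  return n;
}

const shortCode = encodeBase62(1000000); // "4c92"
const originalId = decodeBase62(shortCode); // 1000000
```

Seven Base62 characters cover 62^7, roughly 3.5 trillion IDs, which is why short codes of that length are a common sizing answer.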