Design Patterns & Architecture — System Design Handbook

TL;DR

Microservices: Independent services, each owning its data. Communication via REST/gRPC/events.
Event-driven: Async communication via queues (Kafka, RabbitMQ). Decouples producers from consumers.
API Gateway: Single entry point, handles auth, rate limiting, routing.
Circuit Breaker: Prevent cascading failures in distributed systems.

Step 1: Monolith vs Microservices

This is the first and most consequential architecture decision for any system. Microservices were popularized by companies like Netflix, Amazon, and Spotify because their monoliths couldn't scale. The limit was not traffic, since a monolith can be scaled horizontally behind a load balancer. It was developer productivity: when 500 engineers deploy from one codebase, merge conflicts and deployment coordination become the bottleneck. Microservices let teams own, deploy, and scale their services independently. For small teams (<10 devs), however, a monolith is almost always the right starting point.

Monolith

┌─────────────────────────────────────┐
│         Single Application          │
│  ┌─────┐ ┌──────┐ ┌───────┐         │
│  │Users│ │Orders│ │Payment│  ...    │
│  └──┬──┘ └──┬───┘ └───┬───┘         │
│     └───────┴─────────┘             │
│      Shared Database                │
└─────────────────────────────────────┘

Microservices

┌──────┐   ┌───────┐   ┌────────┐   ┌──────────┐
│ User │   │ Order │   │Payment │   │Inventory │
│  Svc │   │  Svc  │   │  Svc   │   │   Svc    │
└──┬───┘   └──┬────┘   └──┬─────┘   └────┬─────┘
   │          │           │              │
┌──┴───┐   ┌──┴────┐   ┌──┴─────┐   ┌────┴─────┐
│ DB-1 │   │ DB-2  │   │ DB-3   │   │  DB-4    │
└──────┘   └───────┘   └────────┘   └──────────┘
Each service owns its own data ✅

Decision Framework

Start with Monolith when	Use Microservices when
Small team (<10 devs)	Large org, multiple teams
New product, unclear boundaries	Clear domain boundaries
Simple deployment needs	Independent scaling needed
Speed of development matters	Different tech per service OK
<100K users	>1M users with varied workloads

Step 2: Communication Patterns

Once you split into services, they need to talk to each other — and how they communicate determines your system's reliability, latency, and complexity. Synchronous (REST/gRPC) is simple but creates coupling: if Service B is slow, Service A is slow. Asynchronous messaging (Kafka, RabbitMQ) decouples services but introduces eventual consistency and debugging complexity. In system design interviews, choosing the right communication pattern for each interaction and justifying why is what separates senior from junior answers.

Synchronous (REST/gRPC)

Order Service → [HTTP/gRPC] → Payment Service
                                    ↓
                              Process payment
                                    ↓
               ← [Response] ← Return result

Problem: If Payment is down, Order is blocked
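A common first mitigation for the blocking problem is a client-side timeout, so an unresponsive Payment Service becomes a fast error instead of a hung request. This is a sketch only: `callPaymentService` is a hypothetical stub standing in for a real HTTP/gRPC client, and `withTimeout` is an illustrative helper, not a library API.

```typescript
// Reject if `work` does not settle within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });
  // Whichever promise settles first wins; always clear the timer afterwards.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical stand-in for the Payment Service; resolves after `latencyMs`.
function callPaymentService(latencyMs: number): Promise<string> {
  return new Promise((resolve) => setTimeout(() => resolve("PAID"), latencyMs));
}

// Usage from the Order Service: fail after 100 ms instead of blocking.
withTimeout(callPaymentService(50), 100)
  .then((status) => console.log("payment:", status))
  .catch((err: Error) => console.log("payment failed fast:", err.message));
```

A timeout bounds the damage, but while Payment is down every call still waits the full `ms` before failing; the circuit breaker in Step 4 exists to skip that wait.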

Asynchronous (Event-Driven)

Order Service → [Event: OrderCreated] → Message Queue
                                              ↓
                              Payment Service picks up
                              Inventory Service picks up
                              Notification Service picks up
                              (All independently, at their own pace)
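The fan-out can be illustrated with a tiny in-memory publish/subscribe bus. This is a sketch under simplifying assumptions: `EventBus` is a hypothetical class, dispatch is synchronous, and nothing is persisted, whereas a real broker such as Kafka or RabbitMQ delivers asynchronously and durably. What it does show is the decoupling: the publisher never references its consumers.

```typescript
type OrderCreated = { type: "OrderCreated"; orderId: string; amount: number };
type Handler<E> = (event: E) => void;

// Hypothetical in-memory bus: producers publish by event type only.
class EventBus<E extends { type: string }> {
  private handlers = new Map<string, Handler<E>[]>();

  subscribe(type: E["type"], handler: Handler<E>): void {
    const list = this.handlers.get(type) ?? [];
    list.push(handler);
    this.handlers.set(type, list);
  }

  publish(event: E): void {
    // Each subscriber runs independently; one failing does not stop the others.
    for (const handler of this.handlers.get(event.type) ?? []) {
      try {
        handler(event);
      } catch (err) {
        console.error(`handler failed for ${event.type}:`, err);
      }
    }
  }
}

// Usage: three services react to one event; the Order Service knows none of them.
const bus = new EventBus<OrderCreated>();
const log: string[] = [];
bus.subscribe("OrderCreated", (e) => log.push(`payment:charge ${e.orderId}`));
bus.subscribe("OrderCreated", (e) => log.push(`inventory:reserve ${e.orderId}`));
bus.subscribe("OrderCreated", (e) => log.push(`notification:email ${e.orderId}`));

bus.publish({ type: "OrderCreated", orderId: "o-42", amount: 99 });
```

Adding a fourth consumer (say, analytics) is one more `subscribe` call with no change to the publisher.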

When to Use Which

Pattern	Use When	Trade-off
REST	Simple CRUD, need immediate response	Tight coupling, cascading failures
gRPC	High-performance, service-to-service	More complex setup, binary protocol
Events (async)	Don't need immediate response	Eventually consistent, harder to debug
Saga	Multi-service transactions	Complex compensation logic
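The Saga row is easier to reason about with a concrete shape: a multi-service transaction is split into local steps, each paired with a compensating action that undoes it. Below is a minimal orchestration-style sketch. The step names and the `runSaga` helper are hypothetical, and the steps are synchronous for brevity; real sagas make remote, usually asynchronous calls and must persist their progress so they can resume after a crash.

```typescript
interface SagaStep {
  name: string;
  action: () => void;     // local transaction in one service
  compensate: () => void; // semantically undoes `action` if a later step fails
}

// Runs steps in order; on failure, compensates completed steps in reverse order.
function runSaga(steps: SagaStep[]): { ok: boolean; log: string[] } {
  const log: string[] = [];
  const done: SagaStep[] = [];
  for (const step of steps) {
    try {
      step.action();
      log.push(`done:${step.name}`);
      done.push(step);
    } catch {
      log.push(`failed:${step.name}`);
      for (const prev of done.reverse()) {
        prev.compensate();
        log.push(`compensated:${prev.name}`);
      }
      return { ok: false, log };
    }
  }
  return { ok: true, log };
}

// Usage: payment is declined, so the inventory reservation is released.
let reserved = 0;
const result = runSaga([
  { name: "reserveInventory", action: () => { reserved += 1; }, compensate: () => { reserved -= 1; } },
  { name: "chargePayment", action: () => { throw new Error("card declined"); }, compensate: () => {} },
  { name: "scheduleShipping", action: () => {}, compensate: () => {} },
]);
```

The "complex compensation logic" trade-off shows up in practice because compensations must themselves be safe to retry and cannot always restore the exact prior state (an email already sent stays sent).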

Step 3: API Gateway Pattern

The API Gateway was born from a practical problem: clients shouldn't need to know about your internal service topology. Without a gateway, a mobile app would need to call 8 different services directly, handle authentication with each, and know their addresses. The gateway provides a single entry point that handles cross-cutting concerns (auth, rate limiting, logging, SSL termination) and can aggregate responses from multiple services into one client-friendly payload. Widely used implementations include Netflix Zuul, Kong, and AWS API Gateway.

                    ┌──────────────────────────┐
Mobile App ──────→  │                          │  ──→ User Service
Web App ─────────→  │      API Gateway         │  ──→ Order Service
Third Party ─────→  │                          │  ──→ Product Service
                    └──────────────────────────┘
                    Responsibilities:
                    • Authentication/Authorization
                    • Rate limiting
                    • Request routing
                    • Response aggregation
                    • SSL termination
                    • Logging/monitoring
                    • Request/response transformation
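Two of these responsibilities, authentication and request routing, can be illustrated with a small handler. This is a sketch only: the route table, the `isValidToken` check, and the plain-object request type are all hypothetical stand-ins for real HTTP handling and JWT/OAuth validation.

```typescript
interface GatewayRequest {
  path: string;
  headers: Record<string, string | undefined>;
}

interface GatewayResponse {
  status: number;
  target?: string; // internal service the request would be forwarded to
}

// Hypothetical route table: the longest matching path prefix wins.
const routes: Record<string, string> = {
  "/users": "user-service",
  "/orders": "order-service",
  "/products": "product-service",
};

// Hypothetical token check standing in for real JWT/OAuth validation.
function isValidToken(token: string | undefined): boolean {
  return token === "Bearer demo-token";
}

function handle(req: GatewayRequest): GatewayResponse {
  // Authentication happens once, at the edge, instead of in every service.
  if (!isValidToken(req.headers["authorization"])) {
    return { status: 401 };
  }
  // Routing by path prefix hides the internal topology from clients.
  const prefix = Object.keys(routes)
    .filter((p) => req.path === p || req.path.startsWith(p + "/"))
    .sort((a, b) => b.length - a.length)[0];
  if (prefix === undefined) {
    return { status: 404 };
  }
  // 200 here means "would be forwarded"; a real gateway proxies the request.
  return { status: 200, target: routes[prefix] };
}
```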

Backend for Frontend (BFF)

Mobile App  ──→  Mobile BFF  ──→  Services
Web App     ──→  Web BFF     ──→  Services
Admin Panel ──→  Admin BFF   ──→  Services

Each BFF tailors responses for its client type
(mobile gets less data, web gets richer payloads)
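Much of a BFF's tailoring comes down to projecting one backend payload into client-specific shapes. A minimal sketch, assuming a hypothetical `Product` record returned by a Product Service; the field choices for each view are illustrative, not prescriptive.

```typescript
// Hypothetical full record from the Product Service.
interface Product {
  id: string;
  name: string;
  priceCents: number;
  description: string;
  images: string[];
  reviews: { rating: number; text: string }[];
}

// Mobile BFF: small payload for constrained networks and screens.
function toMobileView(p: Product) {
  return {
    id: p.id,
    name: p.name,
    price: p.priceCents / 100,
    thumbnail: p.images[0] ?? null,
  };
}

// Web BFF: richer payload with full description, gallery, and review summary.
function toWebView(p: Product) {
  const ratings = p.reviews.map((r) => r.rating);
  const averageRating = ratings.length
    ? ratings.reduce((a, b) => a + b, 0) / ratings.length
    : null;
  return {
    id: p.id,
    name: p.name,
    price: p.priceCents / 100,
    description: p.description,
    images: p.images,
    averageRating,
    reviewCount: ratings.length,
  };
}

const product: Product = {
  id: "p1",
  name: "Desk Lamp",
  priceCents: 2599,
  description: "Adjustable LED desk lamp",
  images: ["lamp-1.jpg", "lamp-2.jpg"],
  reviews: [{ rating: 4, text: "Good" }, { rating: 5, text: "Great" }],
};
```

Because each BFF owns its projection, the mobile team can change `toMobileView` without coordinating with the web team.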

Step 4: Circuit Breaker Pattern

The circuit breaker pattern was adopted from electrical engineering: just as a circuit breaker trips to prevent a house fire from a short circuit, this pattern "trips" to prevent cascading failures across services. Without it, when Service B goes down, Service A keeps sending requests, exhausting its thread pool waiting for timeouts — then A goes down, then everything that depends on A goes down. The circuit breaker fails fast after detecting B is unhealthy, giving B time to recover and keeping A responsive with graceful degradation.

Prevents cascading failures when a service is down:

States:
┌────────┐ failures ≥ threshold ┌────────┐   timeout   ┌───────────┐
│ CLOSED │ ───────────────────→ │  OPEN  │ ──────────→ │ HALF-OPEN │
│(normal)│                      │ (fail) │ ←────────── │ (testing) │
└────────┘                      └────────┘   failure   └─────┬─────┘
    ↑                                                        │
    └─────────────────────── success ────────────────────────┘
(While CLOSED, each success resets the failure count.)

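class CircuitBreaker {
  private failures = 0;          // consecutive failures while CLOSED
  private state: 'CLOSED' | 'OPEN' | 'HALF_OPEN' = 'CLOSED';
  private lastFailureTime = 0;   // ms timestamp of the most recent failure
  private probeInFlight = false; // HALF_OPEN lets exactly one trial request through

  constructor(
    private threshold: number = 5,   // consecutive failures before tripping to OPEN
    private timeout: number = 30000  // ms to stay OPEN before allowing a probe
  ) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === 'OPEN') {
      if (Date.now() - this.lastFailureTime >= this.timeout) {
        this.state = 'HALF_OPEN'; // cool-down elapsed: allow a trial request
      } else {
        throw new Error('Circuit is OPEN: service unavailable'); // fail fast
      }
    }

    let isProbe = false;
    if (this.state === 'HALF_OPEN') {
      if (this.probeInFlight) {
        // Another request is already testing the service; keep failing fast.
        throw new Error('Circuit is HALF_OPEN: probe in progress');
      }
      this.probeInFlight = true;
      isProbe = true;
    }

    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    } finally {
      if (isProbe) this.probeInFlight = false;
    }
  }

  private onSuccess() {
    // Any success (including a successful probe) closes the circuit and resets the count.
    this.failures = 0;
    this.state = 'CLOSED';
  }

  private onFailure() {
    this.lastFailureTime = Date.now();
    if (this.state === 'HALF_OPEN') {
      this.state = 'OPEN'; // probe failed: wait another full timeout
      return;
    }
    this.failures++;
    if (this.failures >= this.threshold) {
      this.state = 'OPEN';
    }
  }
}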

Step 5: Event-Driven Architecture

Event-driven architecture emerged because traditional request-response creates tight coupling between services: the producer must know all consumers. With events, a service simply announces "something happened" (OrderCreated, UserRegistered) and doesn't care who listens. This enables CQRS (separate read/write models), event sourcing (a full audit trail, with state rebuilt by replaying events), and adding new features without modifying existing services. It's how Uber tracks rides in real time, how banks maintain transaction history, and how analytics pipelines process billions of events daily.

Event Sourcing

Instead of storing current state, store all events:

Events:
1. UserRegistered { id: "u1", name: "Alice", email: "alice@..." }
2. UserUpdatedEmail { id: "u1", email: "newalice@..." }
3. UserDeactivated { id: "u1" }

Current state = replay all events
Benefits: Full audit trail, can rebuild state at any point in time
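"Replay all events" is a left fold over the event log. The sketch below encodes the three events above (the email values stay elided, exactly as in the list) and assumes a hypothetical `UserState` shape. Replaying only a prefix of the log rebuilds the state as of that point in time.

```typescript
type UserEvent =
  | { type: "UserRegistered"; id: string; name: string; email: string }
  | { type: "UserUpdatedEmail"; id: string; email: string }
  | { type: "UserDeactivated"; id: string };

// Hypothetical state shape; null means the user does not exist yet.
interface UserState {
  id: string;
  name: string;
  email: string;
  active: boolean;
}

// Apply one event to the current state (pure function, no I/O).
function apply(state: UserState | null, event: UserEvent): UserState | null {
  switch (event.type) {
    case "UserRegistered":
      return { id: event.id, name: event.name, email: event.email, active: true };
    case "UserUpdatedEmail":
      return state && { ...state, email: event.email };
    case "UserDeactivated":
      return state && { ...state, active: false };
  }
}

// Current state = fold over the whole log; a prefix gives historical state.
function replay(log: UserEvent[]): UserState | null {
  return log.reduce<UserState | null>(apply, null);
}

const events: UserEvent[] = [
  { type: "UserRegistered", id: "u1", name: "Alice", email: "alice@..." },
  { type: "UserUpdatedEmail", id: "u1", email: "newalice@..." },
  { type: "UserDeactivated", id: "u1" },
];
```

For long-lived entities, systems commonly store periodic snapshots so replay starts from the latest snapshot instead of event 1.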

CQRS (Command Query Responsibility Segregation)

┌────────────────┐         ┌────────────────┐
│   Write Side   │         │   Read Side    │
│  (Commands)    │         │  (Queries)     │
├────────────────┤         ├────────────────┤
│ Validate       │ Events  │ Denormalized   │
│ Business logic │ ──────→ │ Read models    │
│ Write to DB    │         │ Optimized for  │
│                │         │ specific views │
└───────┬────────┘         └────────┬───────┘
        │                           │
   Write DB                    Read DB
(normalized)              (denormalized, fast)
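The split in the diagram can be sketched in one process: the write side validates a command, updates its normalized store, and emits an event; a projector consumes the event and updates a denormalized view that queries read from. Everything here is hypothetical: a `Map` stands in for each database, and the projector runs synchronously, whereas in production it usually consumes from a broker and the read side lags slightly behind the write side.

```typescript
// For brevity the command and the resulting event share one shape.
type OrderPlaced = { orderId: string; customerId: string; totalCents: number };

// Write side: normalized storage keyed by order id.
const writeDb = new Map<string, OrderPlaced>();

// Read side: denormalized per-customer summary, optimized for one screen.
const readDb = new Map<string, { orderCount: number; lifetimeCents: number }>();

// Projector: turns events into read-model updates.
function project(evt: OrderPlaced): void {
  const view = readDb.get(evt.customerId) ?? { orderCount: 0, lifetimeCents: 0 };
  readDb.set(evt.customerId, {
    orderCount: view.orderCount + 1,
    lifetimeCents: view.lifetimeCents + evt.totalCents,
  });
}

// Command handler: validate, apply business rules, write, then emit.
function placeOrder(cmd: OrderPlaced): void {
  if (cmd.totalCents <= 0) throw new Error("total must be positive");
  if (writeDb.has(cmd.orderId)) throw new Error("duplicate order");
  writeDb.set(cmd.orderId, cmd);
  project(cmd); // in production: publish to a broker; the projector runs elsewhere
}

// Query handler: reads only the read model, never the write DB.
function customerSummary(customerId: string) {
  return readDb.get(customerId) ?? { orderCount: 0, lifetimeCents: 0 };
}

placeOrder({ orderId: "o1", customerId: "c1", totalCents: 1500 });
placeOrder({ orderId: "o2", customerId: "c1", totalCents: 2500 });
```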

Step 6: Rate Limiting

Rate limiting was invented to protect systems from abuse, both malicious (DDoS attacks, brute-force login attempts) and accidental (a buggy client in an infinite retry loop). Without rate limiting, a single misbehaving client can consume all your resources and deny service to everyone. Major public APIs (GitHub, Stripe, Twitter) enforce rate limits, typically returning HTTP 429 (Too Many Requests). Understanding the algorithms (token bucket, sliding window) comes up in both system design and API design interviews.

Algorithms

Algorithm	How	Best For
Token Bucket	Tokens refill at fixed rate, each request costs 1 token	Burst-friendly, most APIs
Sliding Window	Count requests in rolling time window	Smooth limiting
Fixed Window	Count per time interval (e.g., 100/minute)	Simple, but a burst straddling a window boundary can pass up to 2× the limit
Leaky Bucket	Requests processed at fixed rate, excess queued	Smooth output rate
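The sliding-window row can be made concrete with a sliding window log, one of several sliding-window variants (another common one approximates the window from two fixed-window counters). A minimal single-process sketch: `SlidingWindowLimiter` is a hypothetical class, and the clock is passed in explicitly so the behavior is deterministic.

```typescript
// Sliding window log: remember timestamps of accepted requests in the window.
class SlidingWindowLimiter {
  private timestamps: number[] = []; // accepted request times, oldest first

  constructor(
    private limit: number,    // max requests per window
    private windowMs: number, // window length in milliseconds
  ) {}

  allow(nowMs: number): boolean {
    // Evict entries outside the rolling window (nowMs - windowMs, nowMs].
    while (this.timestamps.length > 0 && this.timestamps[0] <= nowMs - this.windowMs) {
      this.timestamps.shift();
    }
    if (this.timestamps.length < this.limit) {
      this.timestamps.push(nowMs);
      return true;
    }
    return false; // caller would typically respond with HTTP 429
  }
}

// Usage: 3 requests per 1000 ms.
const limiter = new SlidingWindowLimiter(3, 1000);
const results = [0, 100, 200, 300, 1001].map((t) => limiter.allow(t));
// results: [true, true, true, false, true]
```

Unlike a fixed window, a burst at 997–999 ms followed by another at 1000–1002 ms is capped at 3 in total, because the window slides with each request. The cost is one stored timestamp per accepted request; the token bucket below needs only two numbers per client.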

Token Bucket Implementation

class RateLimiter {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private maxTokens: number = 100,
    private refillRate: number = 10, // tokens per second
  ) {
    this.tokens = maxTokens;
    this.lastRefill = Date.now();
  }

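  allow(): boolean {
    this.refill();
    // Tokens refill fractionally, so require a whole token: checking `> 0`
    // would admit a request holding only, say, 0.3 tokens and drive the balance negative.
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false; // Rate limited: caller would typically respond with HTTP 429
  }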

  private refill() {
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.maxTokens, this.tokens + elapsed * this.refillRate);
    this.lastRefill = now;
  }
}

Step 7: Common System Design Questions — Approach

The Framework (5 Steps)

1. Clarify requirements (5 min): Users? Scale? Core features?
2. Back-of-envelope estimation (3 min): QPS, storage, bandwidth
3. High-level design (10 min): Components, data flow
4. Detailed design (15 min): Deep dive into 2-3 components
5. Bottlenecks & trade-offs (5 min): What fails first? How do you handle it?
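The back-of-envelope estimation step is plain arithmetic, sketched below as a function. Every input (100M daily active users, 10 requests per user per day, a 2× peak-to-average factor, 10% writes at 1 KB each) is a hypothetical assumption chosen for round numbers, not a benchmark.

```typescript
interface Estimate {
  avgQps: number;
  peakQps: number;
  writeStoragePerDayGB: number;
}

const SECONDS_PER_DAY = 86_400;

function estimate(
  dailyActiveUsers: number,
  requestsPerUserPerDay: number,
  peakFactor: number,    // assumed peak-to-average traffic ratio
  writeRatio: number,    // assumed fraction of requests that write data
  bytesPerWrite: number, // assumed size of each stored record
): Estimate {
  const requestsPerDay = dailyActiveUsers * requestsPerUserPerDay;
  const avgQps = requestsPerDay / SECONDS_PER_DAY;
  return {
    avgQps: Math.round(avgQps),
    peakQps: Math.round(avgQps * peakFactor),
    writeStoragePerDayGB: (requestsPerDay * writeRatio * bytesPerWrite) / 1e9,
  };
}

// Hypothetical inputs: 100M DAU, 10 req/user/day, 2x peak, 10% writes, 1 KB each.
const e = estimate(100_000_000, 10, 2, 0.1, 1_000);
// e.avgQps === 11574, e.peakQps === 23148, e.writeStoragePerDayGB ≈ 100
```

In an interview, rounding to roughly 10K average QPS, 20K peak, and 100 GB/day of new data is usually precise enough to drive the design.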

Quick Reference: Common Designs

System	Key Components
URL Shortener	Hash function, key-value store, redirect service
Chat App	WebSocket servers, message queue, presence service
News Feed	Fan-out (push vs pull), ranking algorithm, cache
Notification	Queue, worker pool, delivery channels, templates
Rate Limiter	Token bucket, Redis (distributed), sliding window
File Storage	Chunk upload, metadata DB, CDN, replication

Interview Questions

How would you design a URL shortener?
- Base62-encode an auto-incrementing ID (or hash the URL and handle collisions). Store the short-code-to-URL mapping in a durable key-value store, with a cache such as Redis in front; the redirect service reads from the cache first. Analytics: count redirects per short URL. Scale: shard by short code prefix.
Explain the Circuit Breaker pattern.
- Monitors failures to an external service. After threshold failures, "opens" the circuit — subsequent calls fail immediately (fast failure). After a timeout, allows one test request. If it succeeds, closes circuit. Prevents cascading failures.
When would you choose event-driven over REST?
- When consumers don't need immediate response, when multiple services need to react to the same event, when you want loose coupling, or when you need to handle traffic spikes (queue absorbs burst).
What's the difference between CQRS and event sourcing?
- CQRS = separate read and write models (often, but not necessarily, backed by different databases optimized for each). Event sourcing = store events instead of current state. They're often used together but are independent patterns.
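The Base62 step from the URL-shortener answer is small enough to show in full. A sketch, assuming the alphabet order `0-9a-zA-Z` (implementations differ; any fixed 62-character alphabet works as long as encoding and decoding agree) and non-negative integer IDs no larger than `Number.MAX_SAFE_INTEGER`.

```typescript
const ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

// Encode a non-negative integer ID as a Base62 short code.
function encodeBase62(id: number): string {
  if (!Number.isSafeInteger(id) || id < 0) {
    throw new RangeError("id must be a safe non-negative integer");
  }
  if (id === 0) return ALPHABET[0];
  let code = "";
  while (id > 0) {
    code = ALPHABET[id % 62] + code; // least-significant digit comes out first, so prepend
    id = Math.floor(id / 62);
  }
  return code;
}

// Decode a short code back to the numeric ID (e.g. to look up the stored URL).
function decodeBase62(code: string): number {
  let id = 0;
  for (const ch of code) {
    const digit = ALPHABET.indexOf(ch);
    if (digit === -1) throw new RangeError(`invalid character: ${ch}`);
    id = id * 62 + digit;
  }
  return id;
}
```

Seven characters cover 62^7 = 3,521,614,606,208 IDs (about 3.5 trillion), which is why short codes can stay short even at large scale.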