XplormityXplormity
HomeHandbooks
Browse
Xplormity

TLDR developer handbooks for
seasoned developers.

Handbooks

RustNestJSNext.jsGitDockerTypeScriptReactNode.jsDSASQLSystem DesignTailwind CSS

Site

HomeHandbooksAboutPrivacyTerms

Connect

GitHubTwitterLinkedIn

© 2026 Xplormity. All rights reserved.

HandbooksSystem DesignScalability & Core Concepts

Scalability & Core Concepts

scalabilityload-balancingcachingbasicshandbook

TL;DR

  • Horizontal scaling > vertical scaling for most web apps.
  • Load balancers distribute traffic. CDNs cache static assets at the edge.
  • Caching (Redis, CDN, browser) is the #1 performance lever.
  • Know the tradeoffs: CAP theorem, consistency vs availability.

Step 1: Scaling Fundamentals

Scaling is the reason system design interviews exist. Every startup begins on a single server, and the moment it succeeds, that server falls over under load. The choice between vertical scaling (bigger machine) and horizontal scaling (more machines) shapes your entire architecture. Horizontal won because hardware has a ceiling, it creates redundancy, and cloud providers make adding instances trivial. But it introduces complexity: state management, data consistency, and load distribution — which is everything else in this handbook.

Vertical vs Horizontal Scaling

Vertical Scaling (Scale Up)          Horizontal Scaling (Scale Out)
┌────────────┐                       ┌────┐ ┌────┐ ┌────┐ ┌────┐
│            │                       │ S1 │ │ S2 │ │ S3 │ │ S4 │
│   BIGGER   │                       └────┘ └────┘ └────┘ └────┘
│   SERVER   │                            ↑
│            │                       Load Balancer
└────────────┘                       ┌──────────────────────┐
                                     │     Traffic          │
More CPU, RAM                        └──────────────────────┘
Has limits                           Add more machines
Single point of failure              Redundancy built-in
Vertical Horizontal
Cost Exponential Linear
Limit Hardware ceiling Practically unlimited
Complexity Simple Need load balancing, state management
Downtime Required for upgrade Zero-downtime possible

Step 2: Load Balancing

Load balancers exist because the moment you have multiple servers, you need something to decide which server handles each request. Without one, you'd need DNS round-robin (no health checks, sticky DNS caches) or manual traffic splitting. Load balancers also enable zero-downtime deployments, automatic failover, and SSL termination. Understanding the different algorithms (round-robin, least connections, IP hash) comes up in every system design interview because the choice directly affects session handling and resource utilization.

Algorithms

Algorithm How it Works Best For
Round Robin Rotate through servers 1→2→3→1→... Equal servers, stateless
Weighted Round Robin More traffic to stronger servers Mixed hardware
Least Connections Route to server with fewest active connections Variable request times
IP Hash Same client always hits same server Session affinity
Random Randomly pick a server Simple, surprisingly effective

Architecture

                    ┌─── App Server 1
User → DNS → CDN → LB ─── App Server 2
                    └─── App Server 3
                              │
                         ┌────┴────┐
                         │  Cache  │
                         │ (Redis) │
                         └────┬────┘
                              │
                         ┌────┴────┐
                         │Database │
                         │(Primary)│
                         └────┬────┘
                              │
                    ┌─────────┴─────────┐
                    │ Read Replica 1    │ Read Replica 2
                    └───────────────────┘

Health Checks

# Load balancer health check config
health_check:
  path: /health
  interval: 10s
  timeout: 3s
  unhealthy_threshold: 3   # 3 failures = remove from pool
  healthy_threshold: 2     # 2 successes = add back

Step 3: Caching Strategies

Caching is the single biggest performance lever in any system — it's cheaper and faster to add cache than to optimize queries or add servers. The reason it's so effective: most apps have a heavily skewed access pattern (80/20 rule) where a small fraction of data gets most of the reads. Understanding the cache hierarchy (browser → CDN → Redis → DB) and patterns (cache-aside, write-through, write-behind) is essential because the wrong strategy leads to stale data, thundering herds, or cache stampedes that can bring down your system.

Cache Hierarchy

Browser Cache (fastest, per-user)
    ↓ miss
CDN Cache (fast, per-region)
    ↓ miss
Application Cache - Redis (fast, shared)
    ↓ miss
Database Query Cache
    ↓ miss
Database Disk

Patterns

Cache-Aside (Lazy Loading)

def get_user(user_id):
    # 1. Check cache
    cached = redis.get(f"user:{user_id}")
    if cached:
        return json.loads(cached)

    # 2. Cache miss — fetch from DB
    user = db.query("SELECT * FROM users WHERE id = ?", user_id)

    # 3. Store in cache for next time
    redis.setex(f"user:{user_id}", 3600, json.dumps(user))  # TTL: 1 hour

    return user

Write-Through

def update_user(user_id, data):
    # 1. Write to DB
    db.update("UPDATE users SET ... WHERE id = ?", data, user_id)

    # 2. Update cache immediately
    redis.setex(f"user:{user_id}", 3600, json.dumps(data))

Write-Behind (Write-Back)

def update_user(user_id, data):
    # 1. Write to cache only (fast response)
    redis.setex(f"user:{user_id}", 3600, json.dumps(data))

    # 2. Async worker writes to DB later (batched)
    queue.push({"type": "user_update", "id": user_id, "data": data})

Cache Invalidation

Strategy Pros Cons
TTL (Time-to-Live) Simple, automatic Stale data until expiry
Event-based Immediate freshness Complex, need event system
Version keys Easy rollback More cache keys

Step 4: Database Scaling

Database scaling is where system design gets hard. Your app servers are stateless and easily replicated, but your database holds the truth — and truth is hard to distribute. Read replicas handle read-heavy workloads (most apps are 90%+ reads), but writes still hit a single primary. When that's not enough, sharding splits data across machines, but it makes joins impossible across shards and requires careful shard key selection. This is the domain where "it depends" actually matters, and interviewers want to hear you reason about the tradeoffs.

Read Replicas

Writes → Primary DB (single source of truth)
Reads  → Read Replicas (many, distributed)

Replication lag: 10ms - 1s (eventual consistency)

Sharding (Horizontal Partitioning)

User data split by user_id:
  Shard 1: user_id 1-1M      (Server A)
  Shard 2: user_id 1M-2M     (Server B)
  Shard 3: user_id 2M-3M     (Server C)

Shard key selection is CRITICAL:
- Even distribution
- Queries should hit single shard
- Avoid hot spots

When to Scale What

Symptom Solution
Slow reads Read replicas, caching
Slow writes Sharding, write-behind cache
Too much data Sharding, archiving old data
Complex queries Denormalization, materialized views
Full-text search Elasticsearch/OpenSearch

Step 5: CAP Theorem

The CAP theorem (proved by Brewer in 2000) states that in the presence of a network partition, you must choose between consistency and availability. It matters because distributed systems WILL experience network issues — the question is how your system behaves when they happen. A banking system must choose consistency (reject transactions if unsure), while a social feed chooses availability (show potentially stale data). This framework is used in every system design interview to justify architecture decisions and understand database choices.

    Consistency (C)
        /\
       /  \
      /    \
     / Pick \
    /  Two   \
   /──────────\
Availability    Partition
    (A)        Tolerance (P)

You can only have 2 of 3 (and P is not optional in distributed systems):

Choice Trade-off Example
CP Sacrifices availability during partitions MongoDB, Redis Cluster
AP Sacrifices consistency (eventual) Cassandra, DynamoDB
CA Can't handle network partitions (single node) Traditional RDBMS

Practical Decision

Banking transaction → CP (consistency matters, reject if uncertain)
Social media feed → AP (show slightly stale data, always available)
Shopping cart → AP (merge conflicts later, never lose items)
Inventory count → CP (don't oversell)

Step 6: Content Delivery Networks (CDN)

CDNs exist because the speed of light is a real constraint — a user in Tokyo making a request to a server in Virginia experiences 200ms of unavoidable network latency. CDNs place cached copies of your content at "edge" locations worldwide, reducing latency to 10-30ms for cached content. For static assets, CDNs reduce origin load by 90%+. For modern apps using ISR/SSG (Next.js), CDNs serve entire pages from the edge. Cloudflare, AWS CloudFront, and Vercel's Edge Network are the major players.

Without CDN:
  User (Tokyo) → Server (US-East) = 200ms latency

With CDN:
  User (Tokyo) → CDN Edge (Tokyo) = 20ms latency
  CDN Edge → Origin (cache miss only) = 200ms

What to CDN

Cache Don't Cache
Static assets (JS, CSS, images) Personalized content
API responses (public, rarely changes) Real-time data
HTML pages (SSG/ISR) Authentication responses
Fonts, videos Write operations

Step 7: Back-of-Envelope Calculations

Latency Numbers Every Developer Should Know

Operation Time
L1 cache reference 1 ns
L2 cache reference 4 ns
RAM reference 100 ns
SSD random read 16 μs
HDD seek 4 ms
Network round-trip (same datacenter) 0.5 ms
Network round-trip (cross-continent) 150 ms

Quick Estimation

1 million users, 10% daily active = 100K DAU
100K users × 10 requests/day = 1M requests/day
1M / 86400 seconds ≈ 12 requests/second (average)
Peak = 3-5x average ≈ 50 RPS

Can a single server handle 50 RPS? YES (most can handle 1000+ RPS)
When do you NEED horizontal scaling? 5000+ RPS or high availability requirements

Interview Questions

  1. How would you design a system to handle 10M users?

    • Load balancer → multiple app servers (stateless) → Redis cache → database with read replicas. CDN for static assets. Shard database when single instance is bottleneck. Queue for async work.
  2. Explain the difference between horizontal and vertical scaling.

    • Vertical: bigger machine (more RAM/CPU). Horizontal: more machines behind a load balancer. Horizontal preferred for availability, cost-effectiveness, and practically unlimited growth.
  3. How do you handle session state with multiple servers?

    • Options: Sticky sessions (IP hash, not recommended), centralized session store (Redis), or JWT tokens (stateless — preferred).
  4. What's eventual consistency?

    • After a write, reads may temporarily return stale data. All replicas converge to the same value eventually. Trade-off: higher availability for temporary inconsistency.
Design Patterns & Architecture

On this page

  • TL;DR
  • Step 1: Scaling Fundamentals
  • Vertical vs Horizontal Scaling
  • Step 2: Load Balancing
  • Algorithms
  • Architecture
  • Health Checks
  • Step 3: Caching Strategies
  • Cache Hierarchy
  • Patterns
  • Cache Invalidation
  • Step 4: Database Scaling
  • Read Replicas
  • Sharding (Horizontal Partitioning)
  • When to Scale What
  • Step 5: CAP Theorem
  • Practical Decision
  • Step 6: Content Delivery Networks (CDN)
  • What to CDN
  • Step 7: Back-of-Envelope Calculations
  • Latency Numbers Every Developer Should Know
  • Quick Estimation
  • Interview Questions