Event Loop, Streams & Architecture — Node.js Handbook

TL;DR

Node.js runs your JavaScript on a single thread and uses an event loop for async I/O.
Event loop phases: Timers → Pending Callbacks → Poll → Check → Close (plus an internal idle/prepare step).
Streams process data in chunks (don't load entire file into memory).
Worker Threads for CPU-heavy tasks (don't block the event loop).

Step 1: The Event Loop

The event loop is what makes Node.js's concurrency model work: it is how a runtime that executes JavaScript on a single thread handles thousands of concurrent connections without spawning a thread per request (as Java and C# servers traditionally did). Ryan Dahl created Node.js in 2009 partly out of frustration with Apache's thread-per-connection model, which struggled at around 10K concurrent connections (the "C10K problem"). The event loop delegates network I/O to the OS kernel (using epoll/kqueue/IOCP), hands work that has no non-blocking OS interface (such as file system calls) to libuv's thread pool, and processes callbacks when the work completes. Understanding its phases is critical for debugging timing issues and preventing the cardinal sin of Node: blocking the event loop.

   ┌───────────────────────────┐
┌─→│         timers             │ ← setTimeout, setInterval
│  └─────────────┬─────────────┘
│  ┌─────────────┴─────────────┐
│  │     pending callbacks      │ ← I/O callbacks deferred
│  └─────────────┬─────────────┘
│  ┌─────────────┴─────────────┐
│  │       idle, prepare        │ ← internal use
│  └─────────────┬─────────────┘
│  ┌─────────────┴─────────────┐
│  │          poll              │ ← I/O events (most work happens here)
│  └─────────────┬─────────────┘
│  ┌─────────────┴─────────────┐
│  │          check             │ ← setImmediate
│  └─────────────┬─────────────┘
│  ┌─────────────┴─────────────┐
└──│      close callbacks       │ ← socket.on('close')
   └───────────────────────────┘

process.nextTick → runs as soon as the current callback finishes, before the loop moves on (highest priority)
Promises (.then) → microtask queue (drained right after the nextTick queue, before the next callback)

Execution Order Quiz

console.log('1 - sync');

setTimeout(() => console.log('2 - timeout'), 0);
setImmediate(() => console.log('3 - immediate'));

Promise.resolve().then(() => console.log('4 - promise'));
process.nextTick(() => console.log('5 - nextTick'));

console.log('6 - sync');

// Output:
// 1 - sync
// 6 - sync
// 5 - nextTick       (nextTick queue, drained first)
// 4 - promise        (microtask queue, drained after nextTick)
// 2 - timeout        (timer phase); may swap with 3, since top-level order is not guaranteed
// 3 - immediate      (check phase)
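
The timeout/immediate ambiguity in the quiz only applies to top-level code. As a minimal sketch, assuming the two timers are scheduled from inside an I/O callback (here an `fs.stat` call on the current directory, chosen only because it always completes), the check phase comes right after the poll phase, so the immediate runs first. The name `orderInsideIo` is made up for this example.

```javascript
const fs = require('fs');

// Resolves with the order in which the two callbacks ran.
const orderInsideIo = new Promise((resolve) => {
  const order = [];
  // The fs.stat callback runs from the event loop's I/O handling, so both
  // timers are scheduled from inside the loop rather than from top-level code.
  fs.stat('.', () => {
    setTimeout(() => {
      order.push('timeout');
      resolve(order);
    }, 0);
    setImmediate(() => order.push('immediate'));
  });
});

orderInsideIo.then((order) => console.log(order.join(' -> ')));
// prints "immediate -> timeout"
```

This is why setImmediate is the usual choice when a callback must run right after the current I/O event.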

Don't Block the Event Loop

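const crypto = require('crypto');
// password and salt below are placeholders for values your app supplies.

// ❌ BAD: blocks the event loop (no other requests are processed while hashing)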
app.get('/hash', (req, res) => {
  const hash = crypto.pbkdf2Sync(password, salt, 100000, 64, 'sha512');
  res.json({ hash: hash.toString('hex') });
});

// ✅ GOOD — async version (event loop stays free)
app.get('/hash', async (req, res) => {
  const hash = await new Promise((resolve, reject) => {
    crypto.pbkdf2(password, salt, 100000, 64, 'sha512', (err, key) => {
      if (err) reject(err);
      else resolve(key);
    });
  });
  res.json({ hash: hash.toString('hex') });
});

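// ✅ ALTERNATIVE: worker thread, for CPU-heavy JavaScript that has no async API.
// (crypto.pbkdf2 above already runs on libuv's thread pool, so it does not need a worker.)
// runInWorker is a helper, not shown here, that wraps `new Worker(...)` in a Promise.
const { Worker } = require('worker_threads');
app.get('/hash', async (req, res) => {
  const result = await runInWorker('./hash-worker.js', { password, salt });
  res.json(result);
});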
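
The hand-written Promise wrapper around `crypto.pbkdf2` can also be produced with `util.promisify`. The sketch below is one possible shape, not the only one; `deriveKey` is a hypothetical helper name, and the iteration count, key length and digest are copied from the handlers above.

```javascript
const crypto = require('crypto');
const { promisify } = require('util');

// util.promisify converts an error-first callback API into one that returns a Promise.
const pbkdf2 = promisify(crypto.pbkdf2);

// deriveKey is a hypothetical helper name used only in this sketch.
async function deriveKey(password, salt) {
  // The hashing runs on libuv's thread pool, so the event loop stays free.
  const key = await pbkdf2(password, salt, 100000, 64, 'sha512');
  return key.toString('hex');
}

deriveKey('correct horse', crypto.randomBytes(16)).then((hex) => {
  console.log(hex.length); // prints 128 (64 bytes as hex)
});
```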

Step 2: Streams — Process Data Efficiently

Streams were built into Node.js because the naive approach to file/network I/O (read everything into memory, then process) breaks down with large data. A 2GB file read with readFileSync consumes 2GB of RAM; a stream processes it in 64KB chunks (the default for file streams) with roughly constant memory. Streams are the backbone of Node.js: HTTP requests/responses are streams, file I/O uses streams, compression uses streams. They're how Node handles gigabytes of data on a server with 512MB of RAM. The pipeline() API (callback form since Node 10, Promise form in stream/promises since Node 15) is the modern way to chain them safely, with errors propagated and every stream cleaned up on failure.

Why Streams?

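const fs = require('fs');

// processData and processChunk are placeholders for your own logic.
// (Don't call them `process`: that name is Node's global process object.)

// ❌ BAD: loads entire 2GB file into memory
const data = fs.readFileSync('huge-file.csv');
processData(data); // Memory usage: 2GB+

// ✅ GOOD: processes chunk by chunk (constant memory ~64KB)
const stream = fs.createReadStream('huge-file.csv');
stream.on('data', (chunk) => processChunk(chunk));
stream.on('end', () => console.log('Done'));
stream.on('error', (err) => console.error('Read failed:', err));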
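
Readable streams are also async iterables, which gives a second way to consume chunks: a `for await` loop that pauses the stream while the loop body runs and surfaces stream errors as ordinary exceptions. A minimal sketch, assuming a hypothetical helper `countBytes` and using `Readable.from` as a stand-in for a file stream:

```javascript
const { Readable } = require('stream');

// countBytes is a hypothetical helper: it totals the size of every chunk.
async function countBytes(readable) {
  let total = 0;
  // The loop pulls one chunk at a time; a stream error rejects the function.
  for await (const chunk of readable) {
    total += Buffer.byteLength(chunk);
  }
  return total;
}

// Readable.from builds a stream from an iterable; with a real file you would
// pass fs.createReadStream('huge-file.csv') instead.
countBytes(Readable.from(['name,age\n', 'Ada,36\n'])).then((n) => console.log(n));
// prints 16
```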

Stream Types

Type	Purpose	Example
Readable	Source of data	`fs.createReadStream`, `http.IncomingMessage`
Writable	Destination for data	`fs.createWriteStream`, `http.ServerResponse`
Transform	Modify data passing through	`zlib.createGzip()`, custom parsers
Duplex	Both readable and writable	`net.Socket`, WebSocket

Piping (Connect Streams)

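const fs = require('fs');
const path = require('path');
const zlib = require('zlib');
const { pipeline } = require('stream/promises'); // Node 15+

// Compress a file: read → gzip → write
// (wrapped in a function because CommonJS has no top-level await)
async function compressLog() {
  await pipeline(
    fs.createReadStream('input.log'),      // Readable
    zlib.createGzip(),                     // Transform
    fs.createWriteStream('input.log.gz')   // Writable
  );
}

// HTTP file download with compression
app.get('/download/:file', async (req, res, next) => {
  // basename() strips directory parts so "../" cannot escape ./files
  const file = path.join('./files', path.basename(req.params.file));
  res.setHeader('Content-Encoding', 'gzip');
  try {
    await pipeline(fs.createReadStream(file), zlib.createGzip(), res); // res is a Writable stream!
  } catch (err) {
    // e.g. missing file or client disconnect; without this the rejection is unhandled
    if (!res.headersSent) res.removeHeader('Content-Encoding');
    next(err);
  }
});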

Custom Transform Stream

const { Transform } = require('stream');

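// CSV line parser. A chunk can end mid-line, so the unfinished tail is
// kept in `remainder` and prepended to the next chunk.
let remainder = '';
const toRecord = (line) => {
  const [name, age, city] = line.split(',');
  return { name, age: parseInt(age, 10), city };
};
const csvParser = new Transform({
  objectMode: true,
  transform(chunk, encoding, callback) {
    const lines = (remainder + chunk.toString()).split('\n');
    remainder = lines.pop(); // last piece may be an incomplete line
    for (const line of lines) {
      if (line.trim()) this.push(toRecord(line));
    }
    callback();
  },
  flush(callback) {
    // Emit the final line if the input does not end with a newline
    if (remainder.trim()) this.push(toRecord(remainder));
    callback();
  }
});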

// Usage
fs.createReadStream('data.csv')
  .pipe(csvParser)
  .on('data', (record) => console.log(record));

Step 3: Error Handling Patterns

Error handling in Node.js is notoriously tricky because errors can occur in three contexts: synchronous code (try/catch), callbacks (error-first pattern), and Promises/async-await. An uncaught exception crashes the entire process, and so does an unhandled rejection by default since Node 15, taking down every connected user. These patterns exist because production Node.js apps need to distinguish operational errors (user sent bad data, network timeout) from programmer errors (undefined is not a function), handle each differently, and shut down gracefully when recovery is impossible.

// 1. Async/Await with try-catch
async function getUser(id) {
  try {
    const user = await db.findUser(id);
    if (!user) throw new AppError('User not found', 404);
    return user;
  } catch (error) {
    if (error instanceof AppError) throw error;
    throw new AppError('Database error', 500, error);
  }
}

// 2. Express error middleware
app.use((err, req, res, next) => {
  const status = err.statusCode || 500;
  const message = err.isOperational ? err.message : 'Internal server error';

  // Log for debugging (but don't expose to client)
  if (!err.isOperational) {
    console.error('Unexpected error:', err);
  }

  res.status(status).json({ error: message });
});

// 3. Custom error class
class AppError extends Error {
  constructor(message, statusCode, cause) {
    super(message);
    this.statusCode = statusCode;
    this.isOperational = true; // Expected errors (not bugs)
    this.cause = cause;
  }
}

// 4. Unhandled rejection (catch-all safety net)
process.on('unhandledRejection', (reason) => {
  console.error('Unhandled Rejection:', reason);
  // Graceful shutdown
  server.close(() => process.exit(1));
});
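
The error middleware in pattern 2 only sees errors that reach `next(err)`. In Express 4 a rejected Promise from an async handler is not forwarded automatically (Express 5 does forward it), so a small wrapper is a common companion. The sketch below is one way to write it; `asyncHandler` is a hypothetical name, not an Express API, and the demo uses stand-in `req`/`res` objects instead of a real app.

```javascript
// asyncHandler is a hypothetical helper name, not part of Express.
function asyncHandler(handler) {
  return (req, res, next) => {
    // Starting from Promise.resolve() also captures synchronous throws.
    Promise.resolve()
      .then(() => handler(req, res, next))
      .catch(next); // hand every failure to the error middleware
  };
}

// Usage sketch with stand-in req/res objects instead of a real Express app.
const failing = asyncHandler(async () => {
  throw new Error('boom');
});
failing({}, {}, (err) => console.log('forwarded:', err.message));
// prints "forwarded: boom"
```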

Step 4: Worker Threads

Worker threads were added in Node.js v10.5 (2018) as an experimental feature and became stable in v12. They address the one fundamental weakness of the single-threaded model: CPU-intensive tasks (image processing, encryption, data parsing) block the event loop and make the entire server unresponsive. Before worker threads, the options were spawning child processes (expensive, no shared memory) or using native C++ addons. Worker threads give you true parallelism with shared memory (SharedArrayBuffer) for CPU-bound work while keeping the event loop free for I/O. A worker pool pattern avoids the overhead of creating and destroying a thread per request.

// main.js
const { Worker } = require('worker_threads');

// Spawn a worker for one CPU-heavy task and settle when it reports back
function runHeavyTask(data) {
  return new Promise((resolve, reject) => {
    const worker = new Worker('./worker.js', { workerData: data });
    worker.on('message', resolve);
    worker.on('error', reject);
    worker.on('exit', (code) => {
      // Without this, a worker that exits early leaves the Promise pending forever
      if (code !== 0) reject(new Error(`Worker stopped with exit code ${code}`));
    });
  });
}

app.get('/process', async (req, res, next) => {
  try {
    const result = await runHeavyTask({ numbers: [1, 2, 3, ...largeArray] });
    res.json(result);
  } catch (err) {
    next(err); // forward worker failures to the error middleware
  }
});

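// worker.js
const { parentPort, workerData } = require('worker_threads');

// Deliberately slow recursive Fibonacci, a stand-in for real CPU-heavy work
function fibonacci(n) {
  return n < 2 ? n : fibonacci(n - 1) + fibonacci(n - 2);
}

// CPU-heavy computation (runs in separate thread)
const result = workerData.numbers.reduce((sum, n) => sum + fibonacci(n), 0);
parentPort.postMessage({ result });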
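
The shared-memory point can be illustrated with a SharedArrayBuffer passed through workerData: the worker and the main thread then see the same bytes rather than copies. This is a minimal sketch under stated assumptions: the worker source is inlined with `eval: true` only to keep the example in one file, and `countInWorker` is a hypothetical helper name.

```javascript
const { Worker } = require('worker_threads');

// Inlined worker script (eval: true) so the sketch fits in a single file.
const workerSource = `
  const { workerData, parentPort } = require('worker_threads');
  const counter = new Int32Array(workerData.shared);
  for (let i = 0; i < workerData.increments; i++) {
    Atomics.add(counter, 0, 1); // atomic read-modify-write on shared memory
  }
  parentPort.postMessage('done');
`;

// countInWorker is a hypothetical helper: the worker writes, the main thread reads.
function countInWorker(increments) {
  const shared = new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT);
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, {
      eval: true,
      workerData: { shared, increments },
    });
    // The buffer was shared, not copied, so the main thread sees the result.
    worker.once('message', () => resolve(new Int32Array(shared)[0]));
    worker.once('error', reject);
  });
}

countInWorker(1000).then((value) => console.log(value)); // prints 1000
```

Atomics is used so the same pattern stays correct if several workers update the counter at once.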

Worker Pool (Reuse Workers)

const { Worker } = require('worker_threads');
const os = require('os');

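class WorkerPool {
  // Each worker script must listen with parentPort.on('message', ...) and
  // reply with exactly one postMessage per task.
  constructor(workerFile, poolSize = os.cpus().length) {
    this.workerFile = workerFile;
    this.workers = [];
    this.queue = [];

    for (let i = 0; i < poolSize; i++) {
      this.workers.push({ worker: new Worker(workerFile), busy: false });
    }
  }

  run(data) {
    return new Promise((resolve, reject) => {
      const slot = this.workers.find(w => !w.busy);
      if (!slot) {
        this.queue.push({ data, resolve, reject });
        return;
      }

      slot.busy = true;
      const release = () => {
        slot.busy = false;
        this.processQueue();
      };
      const onMessage = (result) => {
        slot.worker.off('error', onError);
        release();
        resolve(result);
      };
      const onError = (err) => {
        // An uncaught error terminates the worker, so replace it
        slot.worker = new Worker(this.workerFile);
        release();
        reject(err);
      };
      slot.worker.once('message', onMessage);
      slot.worker.once('error', onError);
      slot.worker.postMessage(data);
    });
  }

  processQueue() {
    if (this.queue.length === 0) return;
    const { data, resolve, reject } = this.queue.shift();
    this.run(data).then(resolve).catch(reject);
  }
}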
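
The pool above sends tasks with postMessage to workers that stay alive between tasks, so the worker script has to listen for messages rather than read workerData once. The sketch below shows that worker-side shape with a single reused worker. The inlined source (`eval: true`) and the helper name `sendTask` are assumptions made to keep the example self-contained.

```javascript
const { Worker } = require('worker_threads');

// Long-lived worker: handles every message instead of reading workerData once.
const workerSource = `
  const { parentPort } = require('worker_threads');
  parentPort.on('message', (numbers) => {
    parentPort.postMessage(numbers.reduce((sum, n) => sum + n, 0));
  });
`;

// sendTask is a hypothetical helper: one request, one reply.
function sendTask(worker, data) {
  return new Promise((resolve, reject) => {
    const onMessage = (result) => {
      worker.off('error', onError);
      resolve(result);
    };
    const onError = (err) => {
      worker.off('message', onMessage);
      reject(err);
    };
    worker.once('message', onMessage);
    worker.once('error', onError);
    worker.postMessage(data);
  });
}

async function demo() {
  const worker = new Worker(workerSource, { eval: true });
  const first = await sendTask(worker, [1, 2, 3]);
  const second = await sendTask(worker, [10, 20]); // same thread, reused
  await worker.terminate(); // a listening worker keeps the process alive otherwise
  return [first, second];
}

const demoResult = demo();
demoResult.then((sums) => console.log(sums)); // prints [ 6, 30 ]
```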

Step 5: Clustering & Process Management

Node.js clustering exists because a single Node process runs JavaScript on only one CPU core (the event loop is single-threaded). On a machine with 8 cores, 7 sit mostly idle without clustering. The cluster module forks multiple copies of your application and distributes incoming connections across them, using round-robin by default on every platform except Windows, where distribution is left to the OS. In production, PM2 handles this plus auto-restart on crash, zero-downtime reloads, log management, and startup scripts; it plays a role similar to systemd for Node.js apps.

const cluster = require('cluster');
const os = require('os');

if (cluster.isPrimary) {
  const numCPUs = os.cpus().length;
  console.log(`Primary ${process.pid} starting ${numCPUs} workers`);

  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }

  cluster.on('exit', (worker, code) => {
    console.log(`Worker ${worker.process.pid} died (code: ${code})`);
    cluster.fork(); // Restart dead worker
  });
} else {
  // Worker process — each handles requests
  const app = require('./app');
  app.listen(3000, () => {
    console.log(`Worker ${process.pid} listening on 3000`);
  });
}

Production: Use PM2

# Start with cluster mode
pm2 start app.js -i max  # max = number of CPU cores

# Key PM2 commands
pm2 list                 # Show all processes
pm2 monit               # Real-time monitoring
pm2 reload app          # Zero-downtime restart
pm2 logs                # View logs
pm2 startup             # Auto-start on system boot

Step 6: Common Architecture Patterns

The layered architecture (routes → services → repositories) became a de facto standard for Node.js apps because without it, Express route handlers become 200-line monsters that mix HTTP logic, business rules, and database queries. Separating these concerns makes code testable (mock the database layer), reusable (call the service from a CLI or message consumer), and maintainable (change the database without touching business logic). Many production Node.js backends follow this pattern, whether it is enforced by the framework (NestJS) or applied by convention (Express with good discipline).

Layered Architecture

┌──────────────────────────────────┐
│ Routes / Controllers             │ ← HTTP handling, validation
├──────────────────────────────────┤
│ Services (Business Logic)        │ ← Core logic, orchestration
├──────────────────────────────────┤
│ Repositories / Data Access       │ ← Database queries
├──────────────────────────────────┤
│ Database / External APIs         │ ← Actual data sources
└──────────────────────────────────┘

// routes/users.js — thin controller
router.get('/:id', async (req, res, next) => {
  try {
    const user = await userService.getById(req.params.id);
    res.json(user);
  } catch (err) {
    next(err);
  }
});

// services/userService.js — business logic
class UserService {
  constructor(userRepo, emailService) {
    this.userRepo = userRepo;
    this.emailService = emailService;
  }

  async getById(id) {
    const user = await this.userRepo.findById(id);
    if (!user) throw new AppError('User not found', 404);
    return user;
  }

  async create(data) {
    const user = await this.userRepo.create(data);
    await this.emailService.sendWelcome(user.email);
    return user;
  }
}

// repositories/userRepo.js — data access only
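class UserRepository {
  async findById(id) {
    // Assumes a node-postgres style client, whose query() resolves to { rows }
    const { rows } = await db.query('SELECT * FROM users WHERE id = $1', [id]);
    return rows[0] || null; // null lets the service detect "not found"
  }
}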
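
One payoff of the layering is that the service can be exercised without a database. The sketch below wires a trimmed copy of UserService to an in-memory repository. `InMemoryUserRepo` is a hypothetical stand-in, and the AppError and UserService definitions are shortened copies included only so the example runs on its own.

```javascript
// Trimmed copies of the classes above so the sketch runs on its own.
class AppError extends Error {
  constructor(message, statusCode) {
    super(message);
    this.statusCode = statusCode;
    this.isOperational = true;
  }
}

class UserService {
  constructor(userRepo) {
    this.userRepo = userRepo;
  }

  async getById(id) {
    const user = await this.userRepo.findById(id);
    if (!user) throw new AppError('User not found', 404);
    return user;
  }
}

// InMemoryUserRepo is a hypothetical stand-in with the same findById contract.
class InMemoryUserRepo {
  constructor(users) {
    this.users = new Map(users.map((u) => [u.id, u]));
  }

  async findById(id) {
    return this.users.get(id) || null;
  }
}

// The service only depends on the repository's interface, so swapping it is one line.
const service = new UserService(new InMemoryUserRepo([{ id: '1', name: 'Ada' }]));

service.getById('1').then((user) => console.log(user.name)); // prints "Ada"
service.getById('2').catch((err) => console.log(err.statusCode)); // prints 404
```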

Interview Questions

Explain the Node.js event loop.
- One thread runs JavaScript in a loop of phases: Timers → Pending I/O → Poll → Check → Close. Async I/O is delegated to the OS or to libuv's thread pool. Callbacks are queued and processed in each phase. The process.nextTick queue and Promise microtasks are drained after each callback, before the loop continues.
When would you use Worker Threads vs Cluster?
- Worker Threads: CPU-intensive tasks (parsing, encryption, image processing) in parallel within one process. Cluster: multiple instances of your entire server (each handling requests) so that every CPU core on the machine is used.
What are Streams and when should you use them?
- Streams process data in chunks instead of loading everything into memory. Use for: large files, HTTP responses, real-time data, piping between sources. Four types: Readable, Writable, Transform, Duplex.
What happens if you block the event loop?
- ALL incoming requests wait. No I/O callbacks fire. Timers don't execute. Server appears frozen. Fix: use async APIs, Worker Threads for CPU work, or break work into smaller chunks with setImmediate.
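
The "smaller chunks with setImmediate" fix from the last answer can be sketched as follows. `sumInChunks` and its default `chunkSize` are hypothetical choices for illustration; the point is that the function yields to the event loop between slices so queued I/O callbacks and timers get a turn.

```javascript
// sumInChunks is a hypothetical helper; chunkSize is an arbitrary default.
function sumInChunks(numbers, chunkSize = 10000) {
  return new Promise((resolve) => {
    let index = 0;
    let total = 0;

    function processChunk() {
      const end = Math.min(index + chunkSize, numbers.length);
      for (; index < end; index++) {
        total += numbers[index];
      }
      if (index < numbers.length) {
        setImmediate(processChunk); // yield so pending I/O and timers can run
      } else {
        resolve(total);
      }
    }

    processChunk();
  });
}

const numbers = Array.from({ length: 50000 }, (_, i) => i + 1);
sumInChunks(numbers).then((total) => console.log(total)); // prints 1250025000
```

Chunking keeps the server responsive but does not make the work faster; for sustained CPU load a worker thread is usually the better fit.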