TL;DR
- Window Functions compute across a set of rows related to the current row — without collapsing them.
- CTEs (Common Table Expressions) = named subqueries for readability and recursion.
- Window functions solve: ranking, running totals, moving averages, gap detection, percentiles.
- Master
ROW_NUMBER,RANK,LAG/LEAD,SUM() OVER— they cover 90% of interview questions.
Step 1: Window Function Anatomy
Window functions were added to the SQL standard (SQL:2003) because GROUP BY destroys individual rows — you can get the department average, but you lose each employee's row. Window functions let you compute aggregates while keeping every row visible, answering questions like "what percentile is this employee within their department?" or "what's the running total up to this row?". They revolutionized analytics queries and are now the most powerful and most-tested advanced SQL feature in data engineering and backend interviews.
function_name() OVER (
PARTITION BY column -- Split into groups (optional)
ORDER BY column -- Order within each group
ROWS/RANGE frame_spec -- Which rows to include (optional)
)
Key Difference: GROUP BY vs Window Functions
-- GROUP BY: collapses rows → one row per group
SELECT department, AVG(salary) FROM employees GROUP BY department;
-- Result: 5 rows (one per department)
-- Window function: keeps all rows, adds computed column
SELECT name, department, salary,
AVG(salary) OVER (PARTITION BY department) AS dept_avg
FROM employees;
-- Result: ALL rows, each with their department average
Step 2: Ranking Functions
Ranking functions were invented because "give me the top 3 per group" was one of the most-requested query patterns and was nightmarishly complex without them. Before ROW_NUMBER, getting the top N per group required correlated subqueries or application-side processing. The three variants (ROW_NUMBER for unique ordinals, RANK with gaps for ties, DENSE_RANK without gaps) cover every ranking scenario: leaderboards, pagination, deduplication, and percentile computation. "Top N per group" is arguably the single most common advanced SQL interview question.
-- ROW_NUMBER: unique sequential number (no ties)
-- RANK: same rank for ties, gaps after (1,2,2,4)
-- DENSE_RANK: same rank for ties, no gaps (1,2,2,3)
SELECT
name,
department,
salary,
ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num,
RANK() OVER (ORDER BY salary DESC) AS rank,
DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank
FROM employees;
-- Example output:
-- Alice | Engineering | 150000 | 1 | 1 | 1
-- Bob | Engineering | 150000 | 2 | 1 | 1 ← Same salary
-- Carol | Sales | 140000 | 3 | 3 | 2 ← RANK skips, DENSE_RANK doesn't
-- Dave | Marketing | 130000 | 4 | 4 | 3
Top N Per Group
-- Top 3 earners in each department
WITH ranked AS (
SELECT
name, department, salary,
ROW_NUMBER() OVER (
PARTITION BY department
ORDER BY salary DESC
) AS rn
FROM employees
)
SELECT name, department, salary
FROM ranked
WHERE rn <= 3;
NTILE — Split into Buckets
-- Divide employees into 4 salary quartiles
SELECT
name, salary,
NTILE(4) OVER (ORDER BY salary) AS quartile
FROM employees;
-- quartile 1 = bottom 25%, quartile 4 = top 25%
Step 3: LAG & LEAD — Access Adjacent Rows
LAG and LEAD were created because comparing a row to its previous/next row is fundamental to time-series analysis but was extremely painful with self-joins. Calculating day-over-day revenue change, detecting consecutive events, or finding gaps in sequences all require looking at neighboring rows. Before LAG/LEAD, this meant joining the table to itself with offset conditions — slow and unreadable. These functions power every growth chart, trend analysis, and sequential pattern detection in analytics dashboards.
-- LAG: previous row's value
-- LEAD: next row's value
SELECT
date,
revenue,
LAG(revenue) OVER (ORDER BY date) AS prev_day_revenue,
revenue - LAG(revenue) OVER (ORDER BY date) AS daily_change,
ROUND(
(revenue - LAG(revenue) OVER (ORDER BY date)) * 100.0
/ LAG(revenue) OVER (ORDER BY date), 2
) AS pct_change
FROM daily_revenue
ORDER BY date;
-- Output:
-- 2026-01-01 | 10000 | NULL | NULL | NULL
-- 2026-01-02 | 12000 | 10000 | 2000 | 20.00%
-- 2026-01-03 | 11500 | 12000 | -500 | -4.17%
Month-over-Month Growth
WITH monthly AS (
SELECT
DATE_TRUNC('month', created_at) AS month,
SUM(total) AS revenue
FROM orders
GROUP BY DATE_TRUNC('month', created_at)
)
SELECT
month,
revenue,
LAG(revenue) OVER (ORDER BY month) AS prev_month,
ROUND((revenue - LAG(revenue) OVER (ORDER BY month)) * 100.0
/ LAG(revenue) OVER (ORDER BY month), 1) AS growth_pct
FROM monthly
ORDER BY month;
Step 4: Running Totals & Moving Averages
Running totals and moving averages are the foundation of financial reporting, inventory tracking, and time-series analysis. Before window functions with frame specifications, computing a running total required a self-join with a <= condition or cursor-based iteration — both O(n²). SUM() OVER (ORDER BY date) does it in a single linear pass. Moving averages smooth out noise in data ("what's the 7-day average revenue?") and are used in stock analysis, capacity planning, and any dashboard showing trends over time.
-- Running total
SELECT
date,
revenue,
SUM(revenue) OVER (ORDER BY date) AS running_total
FROM daily_revenue;
-- Running total per month (resets each month)
SELECT
date,
revenue,
SUM(revenue) OVER (
PARTITION BY DATE_TRUNC('month', date)
ORDER BY date
) AS monthly_running_total
FROM daily_revenue;
-- 7-day moving average
SELECT
date,
revenue,
ROUND(AVG(revenue) OVER (
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
), 2) AS moving_avg_7d
FROM daily_revenue;
-- Cumulative percentage
SELECT
department,
salary,
ROUND(
SUM(salary) OVER (ORDER BY salary DESC) * 100.0
/ SUM(salary) OVER (), 2
) AS cumulative_pct
FROM employees;
Step 5: Frame Specifications
Frame specifications give you precise control over which rows a window function considers — "the 6 rows before the current one" or "from the start of the partition to here" or "3 days preceding to 3 days following". They were added because default window behavior (all rows in partition, or all rows up to current) doesn't cover every use case. Understanding ROWS vs RANGE is critical: ROWS counts physical rows (predictable), while RANGE groups logically equal values (handles duplicates differently). This is the advanced-level window function knowledge that distinguishes expert SQL users.
-- ROWS BETWEEN defines which rows the window includes:
-- All rows from start to current (default for ORDER BY)
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
-- Previous 2 + current + next 2 (5-row window)
ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING
-- Current row to end
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
-- All rows in partition (no ORDER BY needed)
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
ROWS vs RANGE
-- ROWS: physical rows (exact count)
-- RANGE: logical range (values that are equal grouped together)
-- If dates have duplicates:
-- ROWS BETWEEN 6 PRECEDING = exactly 6 previous rows
-- RANGE BETWEEN INTERVAL '6 days' PRECEDING = all rows within 6 days
SELECT date, revenue,
SUM(revenue) OVER (
ORDER BY date
RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW
) AS trailing_7d_revenue
FROM daily_revenue;
Step 6: CTEs (Common Table Expressions)
CTEs (the WITH clause) were added to SQL because deeply nested subqueries become unreadable after 2-3 levels. They let you name intermediate result sets and reference them like tables, creating a top-down readable flow instead of inside-out nesting. Recursive CTEs go further — they enable hierarchical queries (org charts, category trees, bill of materials) that would otherwise require application-side loops or database-specific extensions. CTEs are the key to writing maintainable, self-documenting analytical SQL.
Basic CTE
-- Named subquery — much more readable than nested subqueries
WITH high_value_customers AS (
SELECT user_id, SUM(total) AS lifetime_value
FROM orders
GROUP BY user_id
HAVING SUM(total) > 5000
),
recent_orders AS (
SELECT user_id, COUNT(*) AS recent_count
FROM orders
WHERE created_at > NOW() - INTERVAL '30 days'
GROUP BY user_id
)
SELECT
u.name,
hvc.lifetime_value,
COALESCE(ro.recent_count, 0) AS orders_last_30d
FROM high_value_customers hvc
JOIN users u ON hvc.user_id = u.id
LEFT JOIN recent_orders ro ON hvc.user_id = ro.user_id
ORDER BY hvc.lifetime_value DESC;
Recursive CTE
-- Organizational hierarchy
WITH RECURSIVE org_tree AS (
-- Base case: top-level managers (no manager)
SELECT id, name, manager_id, 1 AS level
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive case: employees under each manager
SELECT e.id, e.name, e.manager_id, ot.level + 1
FROM employees e
JOIN org_tree ot ON e.manager_id = ot.id
)
SELECT
REPEAT(' ', level - 1) || name AS org_chart,
level
FROM org_tree
ORDER BY level, name;
-- Output:
-- CEO | 1
-- VP Engineering | 2
-- Senior Engineer | 3
-- Junior Engineer | 3
-- VP Sales | 2
-- Sales Manager | 3
Recursive: Generate Date Series
-- Generate all dates in a range (fill gaps)
WITH RECURSIVE dates AS (
SELECT DATE '2026-01-01' AS date
UNION ALL
SELECT date + 1 FROM dates WHERE date < '2026-01-31'
)
SELECT
d.date,
COALESCE(SUM(o.total), 0) AS revenue
FROM dates d
LEFT JOIN orders o ON DATE(o.created_at) = d.date
GROUP BY d.date
ORDER BY d.date;
Step 7: Advanced Patterns
Gap & Island Detection
-- Find consecutive login streaks
WITH numbered AS (
SELECT
user_id,
login_date,
login_date - (ROW_NUMBER() OVER (
PARTITION BY user_id ORDER BY login_date
))::int AS group_id
FROM user_logins
)
SELECT
user_id,
MIN(login_date) AS streak_start,
MAX(login_date) AS streak_end,
COUNT(*) AS streak_length
FROM numbered
GROUP BY user_id, group_id
HAVING COUNT(*) >= 3 -- Streaks of 3+ days
ORDER BY streak_length DESC;
Percentile / Median
-- Median salary per department
SELECT DISTINCT
department,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary)
OVER (PARTITION BY department) AS median_salary
FROM employees;
First/Last Value
-- Most recent order per customer
SELECT DISTINCT
user_id,
FIRST_VALUE(total) OVER (
PARTITION BY user_id ORDER BY created_at DESC
) AS most_recent_order_total,
LAST_VALUE(total) OVER (
PARTITION BY user_id
ORDER BY created_at
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS latest_order_total
FROM orders;
Interview Questions
-
What's the difference between ROW_NUMBER, RANK, and DENSE_RANK?
- ROW_NUMBER: always unique (1,2,3,4). RANK: ties get same number, gaps after (1,2,2,4). DENSE_RANK: ties get same number, no gaps (1,2,2,3).
-
When would you use a CTE over a subquery?
- When the same subquery is referenced multiple times, for readability, for recursive queries (hierarchy traversal), and when you want to name intermediate steps logically.
-
How do you calculate a running total?
SUM(amount) OVER (ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW). The ORDER BY + default frame gives cumulative sum.
-
Explain the frame specification in window functions.
- Defines which rows relative to current row are included in the window calculation.
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW= 3-row window. Default with ORDER BY is "from start to current row".
- Defines which rows relative to current row are included in the window calculation.