In this interactive SQL tutorial, we’ll learn how to find the third transaction for every user in a dataset. This is a common SQL interview question that tests your ability to rank and filter records using window functions. By the end of this tutorial, you’ll have a solid grasp of how to tackle similar problems using SQL.
Learning Objectives
- Use
ROW_NUMBER()
to rank records within partitions. - Filter records based on their rank to extract specific transactions.
- Understand how to apply window functions to solve real-world SQL interview questions.
Problem Statement
You are given a table named transactions
that contains details of Uber transactions. This table includes the following columns:
user_id
(integer): The unique identifier for each user.spend
(decimal): The amount spent on each transaction.transaction_date
(timestamp): The date and time of the transaction.
Task: Write an SQL query to obtain the third transaction of every user. The output should include user_id
, spend
, and transaction_date
.
Example Input Data
user_id | spend | transaction_date |
---|---|---|
111 | 100.50 | 01/08/2022 12:00:00 |
111 | 55.00 | 01/10/2022 12:00:00 |
121 | 36.00 | 01/18/2022 12:00:00 |
145 | 24.99 | 01/26/2022 12:00:00 |
111 | 89.60 | 02/05/2022 12:00:00 |
Expected Output
user_id | spend | transaction_date |
---|---|---|
111 | 89.60 | 02/05/2022 12:00:00 |
Step-by-Step Solution
Let’s break down the problem into a series of exercises to help you understand how to solve it.
Step 1: Using ROW_NUMBER()
to Rank Transactions
To find the third transaction for each user, we need to rank their transactions in chronological order. The ROW_NUMBER()
function allows us to do this effectively.
Interactive Exercise 1: Rank Transactions
Task: Write a query to rank all transactions for each user using ROW_NUMBER()
.
SELECT
user_id,
spend,
transaction_date,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY transaction_date) AS row_num
FROM transactions;
- Try It: Run this query and observe how
ROW_NUMBER()
assigns a rank to each transaction for every user based on thetransaction_date
.
Explanation
ROW_NUMBER()
: This window function generates a sequential rank for each row within eachuser_id
partition.PARTITION BY user_id
: Divides the dataset into partitions based onuser_id
.ORDER BY transaction_date
: Orders the transactions within each partition bytransaction_date
in ascending order.
Step 2: Filtering to Get the Third Transaction
Now that you have the transactions ranked, the next step is to filter out only the third transaction for each user.
Interactive Exercise 2: Select the Third Transaction
Task: Modify the query to filter out only the transactions where the rank is 3.
SELECT user_id, spend, transaction_date
FROM (
SELECT
user_id,
spend,
transaction_date,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY transaction_date) AS row_num
FROM transactions
) AS ranked_transactions
WHERE row_num = 3;
- Try It: Execute this query to see the third transaction for each user.
Explanation
- Subquery: The inner query ranks the transactions as you did in Step 1.
- Filtering: The outer query filters the ranked transactions to include only those where
row_num = 3
.
Additional Exercises
Exercise 3: Handling Different Scenarios
- Try It: Modify the query to find the second or fourth transaction instead of the third.
- Challenge: What happens if a user has fewer than three transactions? How would you modify the query to handle such cases?
Summary
In this tutorial, you learned how to:
- Use
ROW_NUMBER()
to rank transactions for each user. - Filter the ranked data to extract the third transaction for every user.
- Apply these concepts to solve real-world SQL interview problems efficiently.
By mastering window functions like ROW_NUMBER()
, you’ll be well-prepared to tackle similar SQL challenges in interviews and real-world scenarios.