Finding the Third Transaction of Every User: An SQL Tutorial

In this interactive SQL tutorial, we’ll learn how to find the third transaction for every user in a dataset. This is a common SQL interview question that tests your ability to rank and filter records using window functions. By the end of this tutorial, you’ll have a solid grasp of how to tackle similar problems using SQL.

Learning Objectives

  • Use ROW_NUMBER() to rank records within partitions.
  • Filter records based on their rank to extract specific transactions.
  • Understand how to apply window functions to solve real-world SQL interview questions.

Problem Statement

You are given a table named transactions that contains details of Uber transactions. This table includes the following columns:

  • user_id (integer): The unique identifier for each user.
  • spend (decimal): The amount spent on each transaction.
  • transaction_date (timestamp): The date and time of the transaction.

Task: Write an SQL query to obtain the third transaction of every user. The output should include user_id, spend, and transaction_date.

Example Input Data

user_idspendtransaction_date
111100.5001/08/2022 12:00:00
11155.0001/10/2022 12:00:00
12136.0001/18/2022 12:00:00
14524.9901/26/2022 12:00:00
11189.6002/05/2022 12:00:00

Expected Output

user_idspendtransaction_date
11189.6002/05/2022 12:00:00

Step-by-Step Solution

Let’s break down the problem into a series of exercises to help you understand how to solve it.


Step 1: Using ROW_NUMBER() to Rank Transactions

To find the third transaction for each user, we need to rank their transactions in chronological order. The ROW_NUMBER() function allows us to do this effectively.

Interactive Exercise 1: Rank Transactions

Task: Write a query to rank all transactions for each user using ROW_NUMBER().

SELECT 
    user_id,
    spend,
    transaction_date,
    ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY transaction_date) AS row_num
FROM transactions;
  • Try It: Run this query and observe how ROW_NUMBER() assigns a rank to each transaction for every user based on the transaction_date.

Explanation

  • ROW_NUMBER(): This window function generates a sequential rank for each row within each user_id partition.
  • PARTITION BY user_id: Divides the dataset into partitions based on user_id.
  • ORDER BY transaction_date: Orders the transactions within each partition by transaction_date in ascending order.

Step 2: Filtering to Get the Third Transaction

Now that you have the transactions ranked, the next step is to filter out only the third transaction for each user.

Interactive Exercise 2: Select the Third Transaction

Task: Modify the query to filter out only the transactions where the rank is 3.

SELECT user_id, spend, transaction_date
FROM (
    SELECT 
        user_id,
        spend,
        transaction_date,
        ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY transaction_date) AS row_num
    FROM transactions
) AS ranked_transactions
WHERE row_num = 3;
  • Try It: Execute this query to see the third transaction for each user.

Explanation

  • Subquery: The inner query ranks the transactions as you did in Step 1.
  • Filtering: The outer query filters the ranked transactions to include only those where row_num = 3.

Additional Exercises

Exercise 3: Handling Different Scenarios

  • Try It: Modify the query to find the second or fourth transaction instead of the third.
  • Challenge: What happens if a user has fewer than three transactions? How would you modify the query to handle such cases?

Summary

In this tutorial, you learned how to:

  • Use ROW_NUMBER() to rank transactions for each user.
  • Filter the ranked data to extract the third transaction for every user.
  • Apply these concepts to solve real-world SQL interview problems efficiently.

By mastering window functions like ROW_NUMBER(), you’ll be well-prepared to tackle similar SQL challenges in interviews and real-world scenarios.