Imagine you’re analyzing user activity on Microsoft Teams, and you want to find out who the most active users were in terms of sending messages during August 2022. In this problem, we’re tasked with identifying the top 2 users who sent the highest number of messages during that month. Let’s break down the problem and write an SQL query to solve it.
Problem Statement
Given a table named messages
containing information about messages sent on Microsoft Teams, identify the top 2 users who sent the highest number of messages in August 2022. Display these users’ IDs along with the total number of messages they sent. The result should be sorted in descending order based on the count of messages.
Table: messages
Column Name | Type |
---|---|
message_id | integer |
sender_id | integer |
receiver_id | integer |
content | varchar |
sent_date | datetime |
Example Input:
message_id | sender_id | receiver_id | content | sent_date |
---|---|---|---|---|
901 | 3601 | 4500 | You up? | 08/03/2022 00:00:00 |
902 | 4500 | 3601 | Only if you’re buying | 08/03/2022 00:00:00 |
743 | 3601 | 8752 | Let’s take this offline | 06/14/2022 00:00:00 |
922 | 3601 | 4500 | Get on the call | 08/10/2022 00:00:00 |
Example Output:
sender_id | message_count |
---|---|
3601 | 2 |
4500 | 1 |
In this output:
- 3601 is the top sender with 2 messages sent in August 2022.
- 4500 is the second top sender with 1 message sent in August 2022.
Step-by-Step Solution
- Filter for August 2022: Start by filtering the
messages
table to include only the rows where thesent_date
falls within August 2022. - Count Messages per Sender: Group the filtered results by
sender_id
and count the number of messages each user sent. - Order and Limit Results: Sort the result in descending order of the message count and limit the output to the top 2 users.
SQL Query
Here’s how you can write the SQL query to find the top 2 power users:
SELECT sender_id, COUNT(message_id) AS message_count
FROM messages
WHERE sent_date BETWEEN '2022-08-01' AND '2022-08-31'
GROUP BY sender_id
ORDER BY message_count DESC
LIMIT 2;
Explanation of the Query
- Filtering (WHERE clause):
- The
WHERE sent_date BETWEEN '2022-08-01' AND '2022-08-31'
condition filters themessages
table to include only the rows where thesent_date
falls within August 2022.
- Grouping and Counting (GROUP BY clause):
- The
GROUP BY sender_id
groups the rows bysender_id
, so we can count the number of messages each user sent. COUNT(message_id) AS message_count
counts the number of messages for each user.
- Sorting and Limiting (ORDER BY and LIMIT clauses):
ORDER BY message_count DESC
sorts the results in descending order based on the number of messages sent.LIMIT 2
restricts the output to the top 2 users.
Assumptions and Considerations
- The assumption is that no two users have sent the same number of messages in August 2022. This means there will be a clear top 2 without ties.
- The query uses the
BETWEEN
clause for date filtering, ensuring we only consider messages within the specified date range (August 1 to August 31, 2022).
Conclusion
This query helps identify the top users based on their messaging activity over a specific period. By utilizing SQL functions like GROUP BY
, ORDER BY
, and LIMIT
, you can efficiently analyze user behavior and extract valuable insights. Whether you’re preparing for an interview or working on a real-world analytics task, mastering such SQL queries is essential for handling data and drawing actionable conclusions.