In this SQL tutorial, we’ll learn how to calculate and compare the time spent sending versus opening snaps on Snapchat, grouped by age buckets. This problem tests your ability to use SQL aggregation, conditional sums, and joins to generate insightful metrics. By the end of this tutorial, you’ll know how to break down activity metrics and calculate percentages efficiently in SQL.
Learning Objectives
- Learn how to join tables to bring together different data points.
- Use
SUM()
withCASE
statements to conditionally aggregate data. - Calculate percentages and format them to two decimal places using SQL.
- Apply these techniques to group and analyze user activity data.
Problem Statement
You have two tables:
activities
: Contains records of Snapchat user activities, including time spent sending, opening, and chatting.age_breakdown
: Maps users to age groups.
Task: Write an SQL query to obtain a breakdown of the time spent sending versus opening snaps as a percentage of the total time spent on these activities, grouped by age group. Round the percentage to 2 decimal places.
Schema
activities
Table
Column Name | Type | Description |
---|---|---|
activity_id | integer | The unique ID of each activity. |
user_id | integer | The ID of the user performing the activity. |
activity_type | string | The type of activity (‘send’, ‘open’, ‘chat’). |
time_spent | float | The time spent on the activity. |
activity_date | datetime | The date of the activity. |
Example Input for activities
activity_id | user_id | activity_type | time_spent | activity_date |
---|---|---|---|---|
7274 | 123 | open | 4.50 | 06/22/2022 12:00:00 |
2425 | 123 | send | 3.50 | 06/22/2022 12:00:00 |
1413 | 456 | send | 5.67 | 06/23/2022 12:00:00 |
1414 | 789 | chat | 11.00 | 06/25/2022 12:00:00 |
2536 | 456 | open | 3.00 | 06/25/2022 12:00:00 |
age_breakdown
Table
Column Name | Type | Description |
---|---|---|
user_id | integer | The ID of the user. |
age_bucket | string | The age group of the user. |
Example Input for age_breakdown
user_id | age_bucket |
---|---|
123 | 31-35 |
456 | 26-30 |
789 | 21-25 |
Expected Output
age_bucket | send_perc | open_perc |
---|---|---|
26-30 | 65.40 | 34.60 |
31-35 | 43.75 | 56.25 |
Step-by-Step Solution
Let’s break this problem into a series of steps to make it more manageable.
Step 1: Join the Tables
To get a complete dataset for analysis, we need to join the activities
table with the age_breakdown
table. This will help us group the activity data by age buckets.
Interactive Exercise 1: Join Tables to Combine Data
Task: Write a query to join the activities
and age_breakdown
tables.
SELECT
a.user_id,
ab.age_bucket,
a.activity_type,
a.time_spent
FROM activities a
JOIN age_breakdown ab ON a.user_id = ab.user_id;
- Try It: Run this query to see a combined dataset of user activities along with their age buckets.
Explanation
JOIN
: Combines rows from theactivities
table with theage_breakdown
table where theuser_id
matches.
Step 2: Calculate Total Time Spent on Sending and Opening Snaps
Now, let’s calculate the total time spent on “send” and “open” activities for each age bucket. We use SUM()
with a CASE
statement to conditionally sum the time spent on each activity type.
Interactive Exercise 2: Calculate Time Spent on Each Activity
Task: Write a query to calculate the total time spent sending and opening snaps, grouped by age buckets.
SELECT
ab.age_bucket,
SUM(CASE WHEN a.activity_type = 'send' THEN a.time_spent ELSE 0 END) AS total_send_time,
SUM(CASE WHEN a.activity_type = 'open' THEN a.time_spent ELSE 0 END) AS total_open_time
FROM activities a
JOIN age_breakdown ab ON a.user_id = ab.user_id
GROUP BY ab.age_bucket;
- Try It: Run this query to see the total time spent on sending and opening snaps for each age bucket.
Explanation
CASE
Statements: Conditionals used to selectively sumtime_spent
based onactivity_type
.SUM()
: Aggregates the total time for each activity type.GROUP BY ab.age_bucket
: Groups the results by age bucket.
Step 3: Calculate Percentages and Round Them
Now that we have the total time spent on sending and opening snaps, we need to calculate the percentages of total time spent for each activity type and round them to two decimal places.
Interactive Exercise 3: Calculate Percentages
Task: Write a query to calculate the percentages of time spent sending and opening snaps.
SELECT
ab.age_bucket,
ROUND(
SUM(CASE WHEN a.activity_type = 'send' THEN a.time_spent ELSE 0 END) /
SUM(CASE WHEN a.activity_type IN ('send', 'open') THEN a.time_spent ELSE 0 END) * 100.0, 2
) AS send_perc,
ROUND(
SUM(CASE WHEN a.activity_type = 'open' THEN a.time_spent ELSE 0 END) /
SUM(CASE WHEN a.activity_type IN ('send', 'open') THEN a.time_spent ELSE 0 END) * 100.0, 2
) AS open_perc
FROM activities a
JOIN age_breakdown ab ON a.user_id = ab.user_id
GROUP BY ab.age_bucket;
- Try It: Execute this query to see the percentage breakdown of time spent on sending and opening snaps, grouped by age bucket.
Explanation
- Percentage Calculation:
time_spent / (time_spent send + time_spent open) * 100.0
ensures a floating-point division to avoid integer division issues. ROUND()
: Rounds the calculated percentages to two decimal places for better readability.
Summary
In this tutorial, you learned how to:
- Join tables to enrich your dataset for analysis.
- Use
SUM()
withCASE
statements to conditionally aggregate data. - Calculate percentages of total time spent and round them to two decimal places.
Final Query Recap:
SELECT
ab.age_bucket,
ROUND(
SUM(CASE WHEN a.activity_type = 'send' THEN a.time_spent ELSE 0 END) /
SUM(CASE WHEN a.activity_type IN ('send', 'open') THEN a.time_spent ELSE 0 END) * 100.0, 2
) AS send_perc,
ROUND(
SUM(CASE WHEN a.activity_type = 'open' THEN a.time_spent ELSE 0 END) /
SUM(CASE WHEN a.activity_type IN ('send', 'open') THEN a.time_spent ELSE 0 END) * 100.0, 2
) AS open_perc
FROM activities a
JOIN age_breakdown ab ON a.user_id = ab.user_id
GROUP BY ab.age_bucket;
Additional Exercises
Exercise 1: Include Other Activity Types
- Task: Modify the query to include other activity types such as ‘chat’. How does this change the breakdown?
Conclusion
By following this tutorial, you now know how to use SQL to analyze user behavior on social media platforms. You can apply these techniques to other problems involving conditional aggregation, percentage calculations, and grouping by categories.