192. Aggregating analytics events into daily activity
Before we move forward and calculate retention curve (D1-D30 retention rates for a daily cohort) or retention matrix (D1-D30 retention rates for multiple cohorts) let’s learn an important trick. Namely, generating activity data.
Why bother? Same as with activation rate, to calculate retention rate we need to define a metric that tells us that the user is retained. We also could say that if a user is active on a certain day we consider such a user retained on that day.
User activity data based on analytics events
When calculating retention rate, it often helps to imagine a table where we record dates when a user was active. With such table calculations become really simple:
user_id
signup_date
activity_date
2134
2018-02-01
2018-02-01
2134
2018-02-01
2018-02-02
2134
2018-02-01
2018-02-03
2135
2018-02-01
2018-02-01
2135
2018-02-01
2018-02-02
2137
2018-02-02
2018-02-02
It’s quiz time! Go ahead and write a query that generates this table from mobile_analytics.events table. Consider a user active if there’s any event happened on this date. Show the query.
Here we go:
WITH users_daily_events AS (
SELECT
u.user_id,
u.created_at::date AS signup_date,
e.created_at::date AS activity_date,
COUNT(*) AS events_counts
FROM mobile_analytics.events u
LEFT JOIN mobile_analytics.events e
ON e.user_id = u.user_id
WHERE
u.action = 'signup'
GROUP BY 1, 2, 3
ORDER BY signup_date ASC, user_id ASC
)
SELECT
user_id,
signup_date,
activity_date
FROM users_daily_events
As you can see, it’s a combo of a subquery and our favorite grouping and counting technique. We aggregated all mobile events by user, signup date and the date mobile event was tracked.
This table will be the basis of our further retention analysis.
User activity data without analytics events
Before we continue with retention curves, let’s talk briefly about the nature of user activity data. Remember our trick with generating events based on production database tables? It allowed us to calculate activation rate without analytics events. Think for a minute, is there a way we could generate activity data based on tables form production database?
Because chances are, we don’t have mobile analytics events. What can we do instead? As you can see, the data in the books_users table is not enough:
SELECT *
FROM books_users
It contains only the timestamp when a user started reading a book. We definitely need something else.
Short answer: it’s impossible to generate user activity data based on existing production tables in the Bindle warehouse. The solution requires some technical support. We need to create a table in the production database and every time a user hits any API endpoint (for login, for example) we’d write to this table that this user was active. As you can see, it’s a very simplified version of a mobile analytics system that would do a good job for retention analytics. It certainly won’t allow us to zoom in and see how users interact with our app (screenviews and interactions often don’t hit API endpoints).
We’re all ready to level up our retention rate game and calculate our first retention curve. See you in the next lesson!
SQL Habit is a course (or, as some of the students say, “business simulator”). It’s based on a story of a fictional startup called Bindle. You’ll play a role of their Data Analyst and solve real-life challenges from Business, Marketing, and Product Management.
SQL Habit course is made of bite-sized lessons (you’re looking at one atm) and exercises. They always have a real-life setting and detailed explanations. You can immediately apply everything you’ve learned at work.
“well worth the money”
Fluent in SQL in a month
Master Data Analysis with SQL with real life examples from Product Management, Marketing, Finance and more.