Hi @tks
That’s a really great and deep question, this whole chapter is about finding events that aren’t obviously present in our data.
First of all, I’d suggest we prettify your query a little bit and follow SQL Habit’s style guide:
SELECT
COUNT(DISTINCT(app_version)) AS releases_count
FROM adjust.callbacks
WHERE
to_char(created_at, 'yyyy-mm') = '2018-02'
Now it’s a bit clearer and the result column alias tells us what do we want to calculate. We’re ready to dive deep into the problem!
Let’s look at our raw Adjust callbacks data:
SELECT *
FROM adjust.callbacks
Every row in this table is already some kind of event. We can figure out the kind by looking at the activity_kind
column, it has values click, install or signup.
As you can see, there’s no app install event. Adjust sends us callbacks (events) only in three cases – when someone clicks our link, install our app or signs up for Bindle.
You see where it’s going? Your query counts the number of app versions that users used (clicked links, installed or signed up for) in Feb, 2018. If someone installed an app version from January it’ll be counted as a February release, so here’s our bug. Busted!
Inferring release date
We know that adjust.callbacks
doesn’t have app releases data, we simply don’t track such event.
Since we don’t have such an event, we need to infer it – look at our events and figure out some statistic that would tell us when the app was released.
In our case it’s the first timestamp when we see a specific app version:
WITH app_releases AS (
SELECT
app_version,
MIN(created_at) AS app_released_at
FROM adjust.callbacks
GROUP BY 1
)
SELECT *
FROM app_releases
For me such queries are a thing of beauty – we just produced a whole new table of events from a simingly unrelated data, isn’t it awesome?
Now we just need to calculate releases from February, 2018 and the job is done.
Thank you for great questions, @tks!