I am not sure why my query returned as error for exercise 201.
WITH ab_test_categorization AS (
SELECT
user_id,
action,
custom_parameters ->> 'ab_test_name' AS ab_test_name,
custom_parameters ->> 'ab_test_variation' AS ab_test_variation,
created_at AS categorized_at
FROM mobile_analytics.events
WHERE
custom_parameters ->> 'ab_test_name' IS NOT NULL
AND action IN ('signup', 'add-library-entry')
)
SELECT
ab_test_variation,
action,
COUNT(DISTINCT(user_id)),
100.0 * COUNT(DISTINCT(CASE WHEN ab_test_variation = 'control' AND action = 'add-library-entry' THEN user_id END)) / COUNT(DISTINCT(CASE WHEN ab_test_variation = 'control' AND action = 'signup' THEN user_id END))
FROM ab_test_categorization
WHERE
ab_test_name = 'longer_onboarding_201803'
GROUP BY 1, 2
The bit at the final query
100.0 * COUNT(DISTINCT(CASE WHEN ab_test_variation = 'control' AND action = 'add-library-entry' THEN user_id END)) / COUNT(DISTINCT(CASE WHEN ab_test_variation = 'control' AND action = 'signup' THEN user_id END))
returned ERROR: division by zero
.
I tried to further investigate myself by checking both the numerator and denominator part of the calculation independently and it works fine. I managed to answer the question by doing manually calculation from the table without the line of calculation that I got an error above.
Can you explain the error that I made above?
AB-test categorization. Part 1 - When categorization happens
When it’s mentioned that “A great way to categorize users is during the signup process”, is this specifically for testing onboarding flow changes or all changes?
The lesson also cautions against “categorizing users when they visit this page or screen” that’s subject to AB test changes. But why? Wouldn’t you want to assign users to their A/B variant at the point of change as not all users will go through the funnel and you risk requiring a larger sample size? The concern I’m flagging is also outlined here: https://medium.com/@foundinblank/why-it-matters-where-you-randomize-users-in-a-b-experiments-5570c7585944.
Many thanks, the exercises have been interesting and useful!
So I tried to recreate the results for Exercise 185 Activation rate per country but without using the mobile_analytics.events
table.
The results can’t be replicated due to differences in tables i.e. number of unique users in mobile_analytics.events table: 2508 number of unique iphone users in devices table: 4223
But is the logic of my query attempt correct?
WITH mob_readers_and_users AS (
SELECT
country,
COUNT(DISTINCT CASE WHEN last_page > 0 THEN d.user_id END) AS reader_num,
COUNT(DISTINCT d.user_id) AS user_num
FROM users u
LEFT JOIN books_users b
ON u.id = b.user_id
LEFT JOIN devices d
ON u.id = d.user_id
WHERE
device_type = 'iphone'
GROUP BY 1
)
SELECT
country,
reader_num,
ROUND(100.0 * reader_num / user_num, 2) AS activation_rate
FROM mob_readers_and_users
ORDER BY 3 DESC
I got the right answer (and same SQL results) for the exercise even though I counted event action ‘Signup’ instead of DISTINCT pageview_id.
As a rule of thumb, I understand that it’s preferable to count unique identifiers for conversion rates. But are there any risks to using my method below?
My query is below for reference:
SELECT p.device_type, 100.0* COUNT(action) / COUNT(DISTINCT visitor_id) AS cvr
FROM web_analytics.pageviews p
LEFT JOIN
web_analytics.events e
ON p.pageview_id = e.pageview_id
AND e.action = 'Signup'
WHERE url LIKE '%/books/%'
GROUP BY 1
ORDER BY 2 DESC
Hello everyone, I am in exercise 30: counting number of customers per country in a month.
The solution to the exercise has this query:
SELECT
COUNT(*)
FROM users
WHERE
signup_date BETWEEN '2018-01-01' AND '2018-01-31'
AND status = 'customer'
AND country = 'in'
but I performed a different query:
SELECT
COUNT(CASE WHEN status ='customer' THEN id END) AS count_customer_in
FROM users
WHERE country = 'in'
AND
signup_date BETWEEN '2018-01-01' AND '2018-01-31';
that led me to the result.
My question is: Would it be good practice to query it in this last way? Regards.
Hi Anatoli!
My query did take me to the right answer but the CTR values weren’t the same as your query.
My query was like this:
SELECT
device_type,
(100.0 * count(distinct s.pageview_id) / count(distinct p.visitor_id)) ctr
FROM web_analytics.pageviews p
LEFT JOIN (select pageview_id from web_analytics.events where category = 'Signup Button' and action = 'Click') s
ON p.pageview_id = s.pageview_id
where p.url like '%books%'
group by 1
order by 2 desc
Could you please explain what I’m missing? Thank you!
On exercise number 40….. newbie question…… the solution used COUNT() , but I use COUNT(book_id) (because i think the question is about the book lol)… is there a rule when to use COUNT() or when to use COUNT(specific column) or they will be always the same?
And what if there is a NULL
value in that specific column, so COUNT(*)
will always be the better choice?
I am slightly confused about the soft activation calculation.
In chpt 183, soft activation rate is calculated by Count cohort users who have a specific event divided by Count all cohort users.
But in chpt 184, it is calculated as Count cohort users who have a specific event divided by signup users.
As per my understanding specific users event should be divided with signup users.
Can you please clarify this?
I created the following query for the exercise on lesson 114. Detecting campaign with the highest CPC:
SELECT
utm_campaign,
amount / clicks AS CPC
FROM marketing_spends
GROUP BY 1
ORDER BY 2 DESC
However this returns an error: column marketing_spends.amount
must appear in the GROUP BY
.
Why is it required to include it on the group by clause? Seems that the following query runs without errors:
SELECT
utm_campaign,
amount / clicks AS CPC
FROM marketing_spends
GROUP BY 1, 2
ORDER BY 2 DESC
Thank you!
When I run Query 1 query, I expected the query will return me only rows with refunded = False
. But it also returns rows with refunded = Null
.
SELECT *
FROM users u
LEFT JOIN purchases p
ON u.id = p.user_id
AND refunded = FALSE
But when I run query 2, it returns me rows with refunded = false
(no null rows):
SELECT u.id, status, refunded
FROM users u
LEFT JOIN purchases p
ON u.id = p.user_id
WHERE
refunded = FALSE
Can someone please explain me this?