FORUM SQL Habit

SQL Habit

A place to ask questions about the SQL Habit Course content, UX of the website or suggest new features šŸ¤“

Lesson 26 part 1 and part 2

This might be a bit silly, I am new to SQL so pardon my simple queries. I was practicing calculating purchase rate in lesson 26 and I decided to create my own queries. The question my own query should answer is ā€œWhat is the % of users who are trial users and signed up in the first 2 weeks of Bindle existanceā€.

So I wrote a query that calculates this (Query 1), and I wanted to check this with another query (Query 2) to see if the result is correct. I canā€™t figure out why I am missing 1% of the users when I run the 2nd query.

Query 1

Calcualate % of the users who are trial users and signed up in the first 2 weeks of Bindle existance. This query produces a result of 36%:

SELECT
  100 * COUNT(CASE WHEN status = 'trial' THEN id END) /
  COUNT(*) AS trial_users_first_2_weeks_Jan_2018
FROM users
WHERE
  signup_date BETWEEN '2018-01-01' AND '2018-01-14'  

Query 2

Below query should return 64% as the query above returned 36% but it returns 63% and I donā€™t know why.

SELECT
  100 * COUNT(CASE WHEN status = 'customer' OR status = 'free' THEN id END) /
  COUNT(*) AS non_trial_users_first_2_weeks_Jan_2018
FROM users
WHERE
  signup_date BETWEEN '2018-01-01' AND '2018-01-14'

Query 3

Below query shows all unique values for status column in the user table for the same time frame as previous queries. I used it to figure out how to structure second query.

SELECT DISTINCT status
FROM users
WHERE 
  signup_date BETWEEN '2018-01-01' AND '2018-01-14'

Thank you.

105. 'Logical fallacy' explanation of my alternative query using 'where' clause

In 105. a great and important point is made in the difference of limiting a query in the join conditions vs. limiting a query in the where clause.

I understand the scenario of why limiting the query in the join condition makes sense (filtering the purchases table to only consider all purchases that were not refunded) and I see how using the where clause ultimately returns wrong output (filtering in the where clause will ignore the ā€˜negative spaceā€™ of all the NULLs in purchase columns for users who did not make a purchase).

I tried challenging my understanding a bit and came up with the ā€˜bā€™ CTE below (ā€˜aā€™ CTE is the correct query as described in the exercise). My question is then: how am I making a logical error in trying to utilise the where clause in the ā€˜bā€™ CTE below? As far as I can see, the difference in counts between ā€˜aā€™ and ā€˜bā€™ is 104 rows (purchases that were refunded?).

WITH a AS (
  SELECT
    SUM(amount) / COUNT(DISTINCT(u.id)) AS ARPU
  FROM users u
  LEFT JOIN purchases p
  ON u.id = p.user_id
  AND refunded = FALSE
),

b AS (
  SELECT
    SUM(amount) / COUNT(DISTINCT(u.id)) AS ARPU
  FROM users as u
  LEFT JOIN purchases AS p
  ON u.id = p.user_id
  WHERE refunded IS NULL OR refunded = FALSE
)

SELECT * FROM b
-- SELECT * FROM a

/*
The 'a' CTE above is correct. I can see that 'b' CTE is not correct,
but I'm unable to wrap my mind around why, for the 'b' version, the
'amount' is greater in relation to the distinct user count.
*/