FORUM Mobile Analytics AB-TEST : Exercise 201

AB-TEST : Exercise 201

I am not sure why my query returned as error for exercise 201.

WITH ab_test_categorization AS (
  SELECT
    user_id,
    action,
    custom_parameters ->> 'ab_test_name' AS ab_test_name,
    custom_parameters ->> 'ab_test_variation' AS ab_test_variation,
    created_at AS categorized_at
  FROM mobile_analytics.events
  WHERE
    custom_parameters ->> 'ab_test_name' IS NOT NULL
    AND action IN ('signup', 'add-library-entry')
)    

SELECT
  ab_test_variation,
  action,
  COUNT(DISTINCT(user_id)),
  100.0 * COUNT(DISTINCT(CASE WHEN ab_test_variation = 'control' AND action = 'add-library-entry' THEN user_id END)) / COUNT(DISTINCT(CASE WHEN ab_test_variation = 'control' AND action = 'signup' THEN user_id END))
FROM ab_test_categorization
WHERE
  ab_test_name = 'longer_onboarding_201803'
GROUP BY 1, 2

The bit at the final query

100.0 * COUNT(DISTINCT(CASE WHEN ab_test_variation = 'control' AND action = 'add-library-entry' THEN user_id END)) / COUNT(DISTINCT(CASE WHEN ab_test_variation = 'control' AND action = 'signup' THEN user_id END))

returned ERROR: division by zero.

I tried to further investigate myself by checking both the numerator and denominator part of the calculation independently and it works fine. I managed to answer the question by doing manually calculation from the table without the line of calculation that I got an error above.

Can you explain the error that I made above?

REPLIES

Hey @zofranmohd 👋

Oh I love these kind of questions! There’re a plenty on the Forum already, I wonder if we should do a recap or even a chapter on query debugging. 🤔

You have a nice query with nominator and denominator in a single expression. The division by zero error calls us to break it down into a table like so:

WITH ab_test_categorization AS (
  SELECT
    user_id,
    action,
    custom_parameters ->> 'ab_test_name' AS ab_test_name,
    custom_parameters ->> 'ab_test_variation' AS ab_test_variation,
    created_at AS categorized_at
  FROM mobile_analytics.events
  WHERE
    custom_parameters ->> 'ab_test_name' IS NOT NULL
    AND action IN ('signup', 'add-library-entry')
)    

SELECT
  ab_test_variation,
  action,  
  COUNT(DISTINCT(CASE WHEN ab_test_variation = 'control' AND action = 'add-library-entry' THEN user_id END)) AS action_count,
  COUNT(DISTINCT(CASE WHEN ab_test_variation = 'control' AND action = 'signup' THEN user_id END)) AS signup_count
FROM ab_test_categorization
WHERE
  ab_test_name = 'longer_onboarding_201803'
GROUP BY 1, 2

As you can see, the signup_count column has 0 values, hence the division by zero error.

Surpressing the error with NULLIF

One way is to use the NULLIF function to return NULL when the denominator is 0.

🔍 Head to SQL Habit MDN Docs for more info on NULLIF here.

Here’s a trivial breaking example:

SELECT 2 / 0

Here’s a version with NULLIF that returns NULL instead of an error:

SELECT 2 / NULLIF(0, 0)

You can wrap your denominator in NULLIF(..., 0) and you’ll get NULL-s in problematic cases.

Solving the problem

Using NULLIF is always a good idea, especially when you can have 0 values in your data. In your case, there’s actually a small bug that causes the problem.

Look closely at the table with nominator and denominator. ☝

As you can see, there’re 0 values in every single row — sometimes in nominator, sometimes in denominator. This is the root cause of the problem — you’re grouping by action and using condition counting COUNT(CASE WHEN action = ...) with different actions at the same time. In every action group only one action will be present, so the other one will be 0 and cause the error.

I hope that helps @zofranmohd 🙏 Let me know if you have any questions!

WRITE A REPLY