Fundamentals of Data Analysis If/else logic via CASE statement

27. If/else logic via CASE statement

“Who are these people?” asked Sandra laughing, “So many users signed up! It would be so great to know more about them, for example how old are they. We do ask for age in our signup flow, I bet we can query this data!”

Time to explore one more column in the users table – age. Inside the age column we store a number that users type in after they signed up. It could be used in many ways – we can look at product usage or marketing efforts per age group. It helps better understand your audience (and adjust copy or product strategy based on data) or make sure you target the right audience (imagine you want 30+ people to use your product and your marketing campaigns only bring you 15-year-olds. It’s better to turn these campaigns off).

Calculating age group with CASE statement

Age itself isn’t very helpful – we’ll most likely have all ages between 10 and 100. Usually, we assign an age group to each user and then use groups for analysis. Typical age groups are:

  • Youth (< 17)
  • Young adult (17-25)
  • Adult (26-35)
  • Middle-aged (36-45)
  • Aged (46+)

(makes me shiver when I’m writing this, I’m 30 🙈). All righty, now we have our age groups defined let’s write a query that will assign a group to each user. To do this for each row we’d need to perform the following calculation: “If a user’s age is less than 16 – set age_group as ‘youth’; if age is between 18 and 25 set_age_group to ‘young_adult’ … else set age_group to ‘aged’.” In programming (and in SQL as well) it’s often referred to as if/else logic.

In SQL if/else logic is implemented via the CASE statement:

SELECT 
  CASE 
    WHEN age <= 16 THEN 'youth'
    WHEN age BETWEEN 17 AND 25 THEN 'young_adult'
    WHEN age BETWEEN 26 AND 35 THEN 'adult'
    WHEN age BETWEEN 36 AND 45 THEN 'middle_aged'
    ELSE 'aged'
  END AS age_group,
  *
FROM users

🙀 I know this query might be overwhelming, shoot me a message via chat 👉 or leave feedback after you finish the lesson.

Let’s break down what’s going on in this query:

  • we’re selecting all columns from the users table using a wildcard *
  • we’re adding a new column named age_group using alias operator AS age_group
  • we’re using comparison operators to determine which age group to set inside new age_group column (less than < and BETWEEN
  • note how the CASE statement ends with ELSE keyword – it works only if all previous conditions are not valid

🔍 If you run this query ☝ you’ll see that the age_group column will be the first one, followed by the rest of the users table columns (because we used the wildcard * in the SELECT statement). We can totally change the order and have age_group in the end like this:

SELECT
  *,
  CASE 
    WHEN age <= 16 THEN 'youth'
    WHEN age BETWEEN 17 AND 25 THEN 'young_adult'
    WHEN age BETWEEN 26 AND 35 THEN 'adult'
    WHEN age BETWEEN 36 AND 45 THEN 'middle_aged'
    ELSE 'aged'
  END AS age_group
FROM users

💡 Just a refresher: we can select multiple columns from the table, we just need to list them in the SELECT statement and separate with commas. From that point of view the age_group is just yet another column we’re adding to our result set.

☝ We can organize our query’s result as we want by simply specifying necessary columns in the SELECT statement. When it comes to calculating new custom columns (like age_group) I always add them in the beginning, because usually they’re the most important ones.

Case statement

Let’s look closely at the CASE statement:

Here are more details on how it works:

  • it always starts with CASE keyword and ends with END
  • conditions (condition_1, condition_2, etc) are the same filters we write in WHERE clause
  • when the CASE statement is evaluated it checks conditions top to bottom until the first match and then stops

If/else logic gives you so much power – we’ll use CASE statement a lot when calculating rates, percentages, segment users, etc. Go ahead and run these queries in the Playground ☝ 🚀

Anatoli Makarevich, author of SQL Habit About SQL Habit

Hi, it’s Anatoli, the author of SQL Habit. 👋

SQL Habit is a course (or, as some of the students say, “business simulator”). It’s based on a story of a fictional startup called Bindle. You’ll play a role of their Data Analyst 📊 and solve real-life challenges from Business, Marketing, and Product Management.

SQL Habit course is made of bite-sized lessons (you’re looking at one atm) and exercises. They always have a real-life setting and detailed explanations. You can immediately apply everything you’ve learned at work. 🚀

“well worth the money”

Fluent in SQL in a month

Master Data Analysis with SQL with real life examples from Product Management, Marketing, Finance and more.
-- Type your query here, for example this one -- lists all records from users table: SELECT * FROM users
Loading chart... ⏳