Fundamentals of Data Analysis Better filtering records with ranges

24. Better filtering records with ranges

Let’s look at the query that counts signups within the first week of Bindle’s existence:

SELECT
  COUNT(*)
FROM users
WHERE
  signup_date >= '2018-01-01' 
  AND signup_date <= '2018-01-07'

As you can see we used the signup_date column twice which might be sometimes confusing (imagine you have 15 filters they don’t follow any particular order). ⚠

BETWEEN operator

SQL is 47+ years old, there’s definitely a better way to filter records with ranges: using BETWEEN operator. The query above ☝ could be rewritten like so:

SELECT
  COUNT(*)
FROM users
WHERE
  signup_date BETWEEN '2018-01-01' AND '2018-01-07'

BETWEEN operator filters records within a given range of numbers or dates.

⚠ BETWEEN operator is inclusive – both lower and higher values of the range will be included in the query’s output. ⚠ Which is exactly what we want! Now our query resembles English more than Mathematics 😄

BETWEEN and datetime = ⚠

Be careful when using BETWEEN with timestamp (datetime) columns. Compare the results of these similarly looking 2 queries:

SELECT
  COUNT(*)
FROM users
WHERE
  signup_date BETWEEN '2018-01-01' AND '2018-01-07'
SELECT
  COUNT(*)
FROM users
WHERE
  created_at BETWEEN '2018-01-01' AND '2018-01-07'

The only difference is what column we’re using for our range: signup_date is a date column and created_at is a timestamp (datetime) column.

The trick here is the conversion of the range dates ('2018-01-01' and '2018-01-07') to timestamps which SQL engine does behind the scenes:

SELECT 
  '2018-01-01'::timestamp,
  '2018-01-07'::timestamp

To debug this and similar cases it often helps to go through all the records manually and look for a pattern (you’ll see there’re no records on Jan, 7):

SELECT
  *
FROM users
WHERE
  created_at BETWEEN '2018-01-01' AND '2018-01-07'
ORDER BY created_at DESC

As you can see, this query returns less records that the one where we filter using the signup_date column. Look closely at the sorted result sets of these 2 queries – in the query with the created_at column there’re no records from the 7th of February. That’s because the upper bound is <= 2018-01-07 00:00:00. 💥

🎩 🐰

Anatoli Makarevich, author of SQL Habit About SQL Habit

Hi, it’s Anatoli, the author of SQL Habit. 👋

SQL Habit is a course (or, as some of the students say, “business simulator”). It’s based on a story of a fictional startup called Bindle. You’ll play a role of their Data Analyst 📊 and solve real-life challenges from Business, Marketing, and Product Management.

SQL Habit course is made of bite-sized lessons (you’re looking at one atm) and exercises. They always have a real-life setting and detailed explanations. You can immediately apply everything you’ve learned at work. 🚀

“well worth the money”

Fluent in SQL in a month

Master Data Analysis with SQL with real life examples from Product Management, Marketing, Finance and more.
-- Type your query here, for example this one -- lists all records from users table: SELECT * FROM users
Loading chart... ⏳