Product Analytics. Part 1 Cleaning data: filtering out empty values

144. Cleaning data: filtering out empty values

We’ve talked about how important it is to validate data before building any reports. S**t in – S**t out.

With JSON columns it’s quite easy to make a mistake: we’ve already seen how filtering for empty JSON values can be misleading. Compare these two queries:

SELECT COUNT(*) FROM web_analytics.pageviews WHERE custom_parameters IS NOT NULL;
SELECT COUNT(*) FROM web_analytics.pageviews WHERE custom_parameters::text != '{}';

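Quite different numbers, right? The first query still counts rows that hold an empty object ({}), while the second one excludes them. Imagine calculating a conversion rate or a similar metric from the wrong number: the result could be dangerously misleading.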
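
To see where the gap between the two counts comes from, one option is to break the rows down by category in a single query. This is only a sketch: it assumes PostgreSQL (the `::text` cast above suggests it) and that custom_parameters is a JSONB column, so an empty object always casts to exactly '{}'. With a plain JSON column the cast keeps the original whitespace, so a value stored as '{ }' would not match. The column aliases are made up for illustration.

```sql
-- Break pageviews down by the state of custom_parameters.
-- Assumes PostgreSQL and a JSONB column; aliases are illustrative.
SELECT
  COUNT(*) AS total_pageviews,
  -- rows with no value at all
  COUNT(*) FILTER (WHERE custom_parameters IS NULL) AS null_parameters,
  -- rows that hold an empty JSON object
  COUNT(*) FILTER (WHERE custom_parameters::text = '{}') AS empty_object_parameters,
  -- rows that actually carry parameters
  COUNT(*) FILTER (
    WHERE custom_parameters IS NOT NULL
      AND custom_parameters::text != '{}'
  ) AS non_empty_parameters
FROM web_analytics.pageviews;
```

The three filtered counts should add up to total_pageviews, which makes for a quick sanity check before using any of them in a report.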

...


Anatoli Makarevich, author of SQL Habit

