FORUM Product Analytics Identifying the most popular book in the catalogue

Identifying the most popular book in the catalogue

Hi, I’m doing the exercise “Identifying the most popular book in the catalogue”, I actually arrived to the same answers, altough with a little different query, I want to know if my result as pure lucky or is just another way to solve it. My query was:

SELECT
  name,
  COUNT(user_id) AS number_users
FROM books_users bu
INNER JOIN books b
  ON bu.book_id = b.id
GROUP BY 1
ORDER BY 2 DESC, 1 ASC

An the solution was the following:

SELECT
  name,
  COUNT(DISTINCT(bu.user_id))
FROM books_users bu
INNER JOIN books b
  ON bu.book_id = b.id
GROUP BY 1
ORDER BY 2 DESC, name ASC

Also, I want to know what’s the “general rule” to do inner joins, is there a difference which table I use first and which I join?

Thanks!!

REPLIES

Hey @ecasanova95 👋

That’s a really great question, let’s see what we have here.

INNER JOIN

The order of tables in INNER JOIN does not matter.

💡 INNER JOIN returns rows that have matching values (bu.book_id = b.id) in both tables, so it doesn’t matter in which order we join them – the result will be the same.

COUNT() vs COUNT(DISTINCT())

Your solution is absolutely correct. 👍

As you can see, the difference between these 2 solution is having DISTINCT() inside the COUNT() function.

If for some reason there’re duplicated records inside the books_users table – COUNT() will count the same user twice. That’s why I always recommend adding DISTINCT() to count users. 💡

Here’s a real example of that scenario. Imagine that Bindle’s website has a button “Add to the library” – a user clicks on it and we add a new record to the books_users table. If there’s a bug on the website and users are allowed to click this button multiple times – boom, we have duplicated records. 💣

WRITE A REPLY