Hi @coachpacman
That’s a great question, a cornerstone of Data Analysis with SQL. In my experience, once you get the intuition in the INNER vs LEFT JOIN
topic everything else becomes trivial.
Let’s try to arrive at the deepest level of understanding step by step.
Step 1: left and right tables
You always join one table to another. I always imagine two rectangles put together: .
You have your left table () and you join the right table to it (). In your case , your left table is the books
table and your right table is the books_users
table.
Step 2: join condition
Once we’ve decided on which tables to join, we need to define the join condition (ON ...
).
The join condition is a boolean expression that returns either TRUE
or FALSE
. Here’re some examples:
-- Example 1: Foreign key.
ON u.book_id = b.id
-- Example 2: Foreign key with a filter.
ON u.book_id = b.id
AND b.genre IN ('Biography')
-- Example 3
ON TRUE -- All records will be joined to all records. Don't try this at home!
The join condition is defined for all join types, it’ll be evaluated for all pairs of records.
Step 3: INNER JOIN behavior
Take another look at the illustration from the INNER JOIN lesson. The idea is that INNER JOIN joins two records (from the left and right tables) and keeps them in the result set only when the join condition is true.
If the join condition is false, then neither left nor right records won’t make it to the result set.
That’s why INNER JOIN is always pictured as the intersection between 2 circles.
Step 4: LEFT JOIN behavior
Now let’s look at the LEFT JOIN lesson. What your intuition tells you?
The LEFT JOIN
keeps all records from the left table:
- if the join condition is true, the right record will be joined to the left record and added to the result set
- If the join condition is false, the right record won’t be joined BUT a blank record will be joined to the left record (all columns are
NULL
-s)
That’s why on the diagram the left circle is always full.
Step 5: picking the right left table
Haha no pan intended
When it comes to the INNER JOIN
, it doesn’t matter which table is left or right – the result set will be the same in any case. Intersection between 2 circles won’t change when we switch them, right?
It matters a lot which table is the left table when using LEFT JOIN
. Remember, all records from the left table will be in the result set. That’s why we need to choose the left table based on what we’re calculating or researching.
Step 6: your example
First of all, a little reminder: the COUNT()
function counts only non-NULL values. Try it out:
SELECT COUNT(1)
vs
SELECT COUNT(NULL)
That’s why these 2 queries will return 2 different results:
SELECT COUNT(user_id)
FROM books b
LEFT JOIN books_users u
ON u.book_id = b.id
SELECT COUNT(*)
FROM books b
LEFT JOIN books_users u
ON u.book_id = b.id
Because of the LEFT JOIN
logic, we’ll have some NULL
records from the books_users
table (books that haven’t been read by any users).
I believe that was the source of your confusion, because if we’d use COUNT(*)
the results would be different:
SELECT COUNT(*)
FROM books b
LEFT JOIN books_users u
ON u.book_id = b.id
-- 1739
vs
SELECT COUNT(*)
FROM books b
INNER JOIN books_users u
ON u.book_id = b.id
-- 1667
Step 7: homework
Now that we’re so deep into JOINs, I’d like you to try and change the left and right tables and play with the results:
SELECT COUNT(*)
FROM books_users u
INNER JOIN books b
ON u.book_id = b.id
Also try to simply eyeball the result set and see if your intuition picks it up, that helps me a lot to detect patterns:
SELECT *
FROM books_users u
INNER JOIN books b
ON u.book_id = b.id
I hope that was helpful, have fun!