Fundamentals of Data Analysis Limiting number of records produced by query

11. Limiting number of records produced by query

“Wooow, we have thousands of users registered, look at these users!” the girls were screaming and jumping out of joy. “I’m really glad that we can query the tables so easily and check what fields are there. But do we need to print all records every time? It makes it slower, is there a way to print out, say, 5 records?” asked Linh.

Indeed there is 😊 There’s a special keyword LIMIT that tells SQL how many records do we want to be printed. For example, this query will print the first 5 records:

SELECT *
FROM users
LIMIT 5
Run query

Let’s combine all the things we have learned so far, can you tell what this query is doing?

SELECT *
FROM users
ORDER BY created_at DESC
LIMIT 5
Run query

This query could be interpreted like Give me all columns of 5 latest records in the users table, simply 5 latest signups.

Note, that we used the created_at datetime column to sort records by precise timestamps. An alternative would be to sort records by signup_date column. Keep in mind that signup_date doesn’t have information about hour/minute or second of a signup because it’s a column of type date ⚠

Now go and practice these queries in the Playground 👏

Anatoli Makarevich, author of SQL Habit About SQL Habit

Hi, it’s Anatoli, the author of SQL Habit. 👋

SQL Habit is a course (or, as some of the students say, “business simulator”). It’s based on a story of a fictional startup called Bindle. You’ll play a role of their Data Analyst 📊 and solve real-life challenges from Business, Marketing, and Product Management.

SQL Habit course is made of bite-sized lessons (you’re looking at one atm) and exercises. They always have a real-life setting and detailed explanations. You can immediately apply everything you’ve learned at work. 🚀

“well worth the money”

Fluent in SQL in a month

Master Data Analysis with SQL with real life examples from Product Management, Marketing, Finance and more.
-- Type your query here, for example this one
-- lists all records from users table:
הההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההה
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Loading chart... ⏳

This data warehouse belongs to a fictional startup named Bindle. Bindle is a web and mobile app for reading books, it has subscription business model. The SQL Habit Course is based on Bindle’s story. 📖

Bindle’s data warehouse contains everything needed to run a modern Internet company – web and mobile analytics, marketing data, AB-test data, etc. 📊

users
Column Type Description
id integer Unique identifier of user.
email text User’s email, unique.
first_name text User’s first name.
last_name text User’s last name.
country text User’s signup country.
signup_date date Date when user signed up.
created_at datetime Timestamp when user record was created, we can treat it as signup date and time.
status text What status user has in Bindle, could be free (can read only free books) or customer (user who purchased a subscription, can read all books).
age integer User’s age.
referrer_id integer id of another user who referred this user (this is usually set when users sign up via referral link).
visitor_id text Identifier of a user in the web_analytics.pageviews table. Generated by a web analytics system and stored in a cookie.
utm_source text utm_source in URL when user signed up, used for marketing attribution
utm_campaign text utm_campaign in URL when user signed up, used for marketing attribution
utm_term text utm_term in URL when user signed up, used for marketing attribution
utm_content text utm_content in URL when user signed up, used for marketing attribution
utm_medium text utm_medium in URL when user signed up, used for marketing attribution
adjust_tracker text Adjust tracker in case user signed up via an Adjust link https://app.adjust.com/gxel3d1.
adjust_campaign text The value of campaign paramenter in Adjust URL.
adjust_adgroup text The value of adgroup paramenter in Adjust URL.
adjust_creative text The value of creative paramenter in Adjust URL.
profiles
Column Type Description
id integer Unique identifier of a profile.
user_id integer User’s id.
about text Information about a user.
interests text User’s interests. Comma separated list of tags.
avatar_url text URL of an avatar user uploaded. Check some of them out 😉.
postal_code text A postal code of a user for books delivery.
accounts
Column Type Description
id integer Unique identifier of an account.
user_id integer User’s id.
platform text Account’s platform. Bindle allows email and facebook signups.
email text Email attached to the account. Note that different platforms could have different emails.
created_at datetime Timestamp when this account was created. created_at of the first account is user’s signup timestamp.
books
Column Type Description
id integer Unique identifier of the book.
name text Name of the book.
slug text Identifer of a book used in URLs. For example https://www.bindle.com/books/final-future. Usually generated from book’s name.
genre text Book’s genre.
pages_count integer Number of pages in the book.
books_users
Column Type Description
user_id integer User’s id.
book_id integer Book’s id.
last_page integer The number of the last page user read in the book.
created_at datetime When user started reading the book.
products
Column Type Description
id integer Unique identifier of a product.
name text Name of a product.
price float Price of a product
purchases
Column Type Description
id integer Unique identifier of purchase.
user_id integer id of a user who made the purchase.
product_id integer id of a product inside products table.
amount float How much money user paid. The number might vary since users could apply discounts. Amount is always in US dollars.
refunded boolean Status of a purchase, we receive money on the bank account only if purchase wasn’t refunded.
created_at datetime When purchase was made.
marketing_spends
Column Type Description
id integer Unique identifier of spend, just a primary key in a table.
spent_at date The date of a spend. Spend data is usually reported by date.
amount float Amount of money in USD.
clicks integer Number of clicks on the ad on this date.
utm_source text utm_source of marketing campaign.
utm_campaign text utm_campaign of marketing campaign.
utm_term text utm_term of marketing campaign.
utm_content text utm_content of marketing campaign.
utm_medium text utm_medium of marketing campaign.
devices
Column Type Description
id integer Unique identifier of a user’s device.
user_id integer id of user who uses this device.
device_type text Type of the device, could be browser (for users who are using Bindle website) or iphone (users who are reading via Bindle app).
connected_at datetime Timestamp when user started using this device. Device with the earliest connected_at field is the device which user used for signing up.
version text For browser devices it’s a User Agent. For iPhone devices it’s a version of user’s iPhone and a version of iOS separated by comma.
web_analytics.pageviews
Column Type Description
url text URL of the visited page.
referer_url text URL of the previous page where user clicked on a link with URL. ☝️.
screen_resolution text Screen resolution of a user’s device. Example: 1024x1366.
device_type text Type of a user’s device. Could be mobile, tablet or desktop.
custom_parameters JSON All custom parameters of a pageview in a key-value format. Could be added per page, for example we might add ab_test_variation key to keep track of what AB-test variation user had seen.
pageview_id text Unique identifier of a pageview.
user_id integer If user is logged in – ID of a user in users table.
visitor_id text Unique identifier of a visitor. A fingerprint used to keep track of guest visitors who haven’t had signed up.
created_at datetime Timestamp of a pageview.
country text Country derived from user’s IP address.
web_analytics.events
Column Type Description
category text Category parameter of an event, for example Signup.
action text Action parameter of an event, for example Click.
name text Name parameter of an event, for example Signup for free.
pageview_id text Unique identifier of a record inside pageviews table. All events happen within one pageview. ⚠️
created_at datetime Timestamp of an event.
mobile_analytics.events
Column Type Description
event_id text Unique identifier of an event.
category text Category parameter of an event, for example onboarding.
action text Action parameter of an event, for example screenview.
name text Name parameter of an event, for example bindle-content.
screen_resolution text Resolution of a user’s smartphone, for example 375x812.
device_type text Model of a a user’s smartphone and the version of the OS, for example iPhone 7,12.1.0.
user_id integer If user is logged in – ID of a user in users table.
adid string Unique identifier of a user’s smartphone, same as in Adjust callbacks table.
country text Country derived from user’s IP address.
custom_parameters JSON All custom parameters of an even in a key-value format.
created_at datetime Timestamp of an event.
app_version text Version of the Bindle app a user is using, for example 1.1.1.
adjust.callbacks
Column Type Description
id integer Unique ID of an Adjust callback.
tracker text Adjust’s tracker parameter. For example https://app.adjust.com/gxel3d1.
created_at datetime Timestamp of a callback.
campaign_name text The value of campaign paramenter in Adjust URL.
adgroup_name text The value of adgroup paramenter in Adjust URL.
creative_name text The value of creative paramenter in Adjust URL.
label text The value of label paramenter in Adjust URL.
device_name text Model and OS version of a user’s device.
app_version text Bindle’s app version at the moment.
activity_kind text Predefined Adjust event, could be click, install, event (means custom event and event_name will be present) or else.
event_name text Custom Adjust event; Bindle has custom signup event.
adid text Adjust’s device ID.
user_id integer ID of a user (from users table). Present for signup event.
country text Country derived from user’s IP address.
helpers.dates
Column Type Description
id integer Unique identifier of a date. Just a primary key, an index of the table.
date date A date. By joining the very sparse timeline data to the consecutive range of dates we won’t have gaps.

All databases on SQL Habit are built with PostgreSQL.

Keywords
Column Description
SELECT

Specifies the columns of the result set.

SELECT id, email
FROM users
DISTINCT

Filters out duplicate records from the result set.

SELECT DISTINCT(country)
FROM users
CASE

If/else logic of SQL.

SELECT CASE WHEN age > 21 THEN 'adult' ELSE 'youngster' END
FROM users

It could be used without ELSE keyword (the default ELSE value is NULL):

SELECT COUNT(CASE WHEN status = 'customer' THEN id END)
FROM users

The CASE statement goes through all conditions and returns the first value when a condition is met (or ELSE value):

SELECT
  CASE
  WHEN age < 12 THEN 'child'
  WHEN age < 21 THEN 'teenager' ELSE 'adult'
  END AS age_category
FROM users
FROM

Specifies the tables we’re querying.

SELECT *
FROM purchases
WHERE

Specifies one or multiple filters.

SELECT *
FROM users
WHERE
  country = 'us'
INNER JOIN ... ON ...

Returns records that have matching values in both left (the one in FROM clause) and right (the one in INNER JOIN) tables.

SELECT *
FROM purchases p
INNER JOIN users u
  ON u.id = p.user_id
LEFT JOIN ... ON ...

Selects all records from the left table and matching records from the right table.

SELECT *
FROM users u
LEFT JOIN purchases p
  ON p.user_id = u.id
GROUP BY

Groups records and returns a row for each group – result of an aggregate function.

SELECT
  country,
  COUNT(*)
FROM users
GROUP BY 1
ORDER BY

Sorts records in the result set. The default order is ASC.

SELECT *
FROM purchases
ORDER BY created_at DESC
LIMIT

Sets the max number of records in the result set.

SELECT *
FROM books
LIMIT 10
BETWEEN ... AND ...

Filters the records within a range. ⚠️ It’s an inclusive filter, basically a shortcut for >= ... AND <= ...

SELECT *
FROM users
WHERE
  signup_date BETWEEN '2018-01-01' AND '2018-02-01'
AND

Combines two filter conditions. Evaluates to TRUE only if both conditions are TRUE.

SELECT *
FROM users
WHERE
  country = 'us'
  AND status = 'customer'
OR

Combines two filter conditions. Evaluates to TRUE if any condition is TRUE. In a query with many filters make sure to wrap two conditions with OR in brackets to avoid interferance with other filters.

SELECT *
FROM marketing_spends
WHERE
  utm_source = 'twitter'
  OR utm_source = 'facebook'
LIKE

Filters records by matching a string pattern. Symbol % stands for any character (wildcard).

SELECT *
FROM users
WHERE
  email LIKE '%gmail.com'
IN

Filters records that are present in a list or a subquery.

SELECT *
FROM users
WHERE
  country IN ('us', 'ca', 'au', 'gb')
IS NULL

Evaluates to TRUE if the value is NULL.

SELECT *
FROM users
WHERE
  utm_campaign IS NULL
IS NOT NULL

Evaluates to TRUE if the value is not NULL.

SELECT *
FROM users
WHERE
  utm_campaign IS NOT NULL
WITH ... AS ()

Defines a CTE (Common Table Expression) or simply a subquery.

WITH customers AS (
  SELECT *
  FROM users
  WHERE
    status = 'customer'
)

SELECT *
FROM customers
Date and time functions
Column Description
NOW()

Returns the current timestamp.

SELECT NOW()
-- Example result: 2020-02-22 22:16:16.634526+02
DATE_PART(part, timestamp/interval)

Returns the part of a timestamp or an interval as a number.

SELECT DATE_PART('day', '2020-02-14'::timestamp)
-- Result: 14
DATE_TRUNC(part, timestamp/interval)

Returns the part of a timestamp as a rounded timestamp.

SELECT DATE_TRUNC('day', '2020-02-14'::timestamp)
-- Result: 2020-02-14 00:00:00
TO_CHAR(timestamp, format)

Returns a formatted timestamp as a string. Possible patterns that could be used in a format argument:

Pattern Meaning and example
YYYY 4-digit year: 2022 or 1988.
YY 2-digit year: 22 or 88
MM 2-digit month: 11 or 05
MON 3-char month name: AUG or JUL
DD 2-digit day: 31 or 04
HH12 2-digit hour number in 0-11 format: 03 or 11
HH24 2-digit hour number in the 0-23 format: 03 or 22
MI Minutes in a 2-digit format, 01 or 59
SELECT TO_CHAR('2020-02-14'::timestamp, 'YYYY-MM')
-- Result: 2020-02

The whole list of patterns could be found on the official PostgreSQL website.

TO_DATE(string, date_format)

Converts a string into a date.

SELECT TO_DATE('2020-02-14', 'YYYY-MM-DD')
-- Result: 2020-02-14
SELECT TO_DATE('02/14/2020', 'MM/DD/YYYY')
-- Result: 2020-02-14
Numeric functions
Column Description
ABS(number)

Returns the absolute value of the number.

SELECT ABS(-100)
-- Result: 100
RANDOM()

Returns a random float number between 0 and 1.

SELECT RANDOM()
-- Exampe result: 0.8502873764373362
ROUND(number, precision = 0)

Rounds a number to a certain precision.

SELECT ROUND(3.141592)
-- Result: 3

SELECT ROUND(3.141592, 4)
-- Result: 3.1416
CEIL(number)

Round a number to the closest higher integer.

SELECT CEIL(3.1415)
-- Result: 4
FLOOR(number)

Round a number to the closest lower integer.

SELECT FLOOR(3.1415)
-- Result: 3
TRUNC(number, precision = 0)

Truncates the number to a precision without rounding it.

SELECT TRUNC(3.141592, 5)
-- Result: 3.14159

SELECT TRUNC(3.141592)
-- Result: 3
LEAST(val1, val2, …)

Returns the smallest value from the list of arguments.

SELECT LEAST(2, 1, 3)
-- Result: 1
GREATEST(val1, val2, …)

Returns the highest value from the list of arguments.

SELECT GREATEST(1, 3, 2)
-- Result: 3
NULL functions
Column Description
COALESCE(val1, val2, ...)

Returns the first non-NULL input argument. Used to specify default values.

SELECT COALESCE(NULL, NULL, 'foo', 'bar')
-- Result: foo
NULLIF(val1, val2)

Returns NULL if arguments are equal. Returns the first argument otherwise.

SELECT
  NULLIF(123, 123),
  NULLIF('2020-02-14', '2020-02-15')
-- Result: NULL, '2020-02-14'
Window functions
Column Description
ROW_NUMBER()

Gives each record in the partition a number (starting with 1).

SELECT
  ROW_NUMBER() OVER(PARTITION BY visitor_id ORDER BY created_at ASC),
  *
FROM web_analytics.pageviews
LAG(column, offset = 1)

Returns the column value (or NULL) from the previous row in the partition (or a number of rows specified by the offset argument).

SELECT
  user_id,
  created_at,
  amount,
  LAG(amount) OVER(PARTITION BY user_id ORDER BY created_at ASC)
FROM purchases
LEAD(column, offset = 1)

Returns the column value (or NULL) from the next row in the partition (or a number of rows specified by the offset argument).

SELECT
  visitor_id,
  created_at,
  url,
  LEAD(url) OVER(PARTITION BY visitor_id ORDER BY created_at ASC)
FROM web_analytics.pageviews
FIRST_VALUE(column)

Return value of the first row in the partition.

SELECT
  FIRST_VALUE(url) OVER(PARTITION BY visitor_id ORDER BY created_at ASC),
  *
FROM web_analytics.pageviews
LAST_VALUE(column)

Return value of the last row in the partition.

SELECT
  LAST_VALUE(url) OVER(PARTITION BY visitor_id ORDER BY created_at ASC),
  *
FROM web_analytics.pageviews
COUNT(statement)

Returns the number of rows in the partition. ⚠️ NULL values are ignored.

SELECT
  COUNT(*) OVER (PARTITION BY country),
  *
FROM users
SUM(statement)

Returns the sum of values in the partition.

SELECT
  SUM(amount) OVER (),
  *
FROM purchases
AVG(statement)

Returns the average value in the partition.

SELECT
  AVG(age) OVER(PARTITION BY country),
  *
FROM users
MAX(statement)

Returns the max value in the partition.

SELECT
  MAX(age) OVER(PARTITION BY country),
  *
FROM users
MIN(statement)

Returns the min value in the partition.

SELECT
  MIN(age) OVER(PARTITION BY country),
  *
FROM users
Aggregate functions
Column Description
COUNT(statement)

Returns the number of rows. ⚠️ NULL values are ignored.

SELECT
  country,
  COUNT(*)
FROM users
GROUP BY 1
SUM(statement)

Returns the sum of values.

SELECT
  SUM(amount)
FROM purchases
AVG(statement)

Returns the average value.

SELECT
  AVG(age)
FROM users
MAX(statement)

Returns the max value.

SELECT
  country,
  MAX(age)
FROM users
GROUP BY 1
MIN(statement)

Returns the min value.

SELECT
  country,
  MIN(age)
FROM users
GROUP BY 1
Set operators
Column Description
UNION

Combines the results of two queries (stacks two result sets vertically). ⚠️ It filters out duplicate records.

SELECT *
FROM table_1
UNION
SELECT *
FROM table_2
UNION ALL

Combines the results of two queries. It keeps duplicate records.

SELECT *
FROM table_1
UNION ALL
SELECT *
FROM table_2
EXCEPT

Returns records from the first query that are not returned by the second query. ⚠️ It filters out duplicate records.

SELECT *
FROM table_1
EXCEPT
SELECT *
FROM table_2
EXCEPT ALL

Returns records from the first query that are not returned by the second query. It keeps duplicate records.

SELECT *
FROM table_1
EXCEPT ALL
SELECT *
FROM table_2
INTERSECT

Returns records that are returned by both the first and second queries. ⚠️ It filters out duplicate records.

SELECT *
FROM table_1
INTERSECT
SELECT *
FROM table_2
INTERSECT ALL

Returns records that are returned by both the first and second queries. It keeps duplicate records.

SELECT *
FROM table_1
INTERSECT ALL
SELECT *
FROM table_2
String functions
Column Description
CONCAT(str1, str2, ...)

Combines all input strings to one string.

SELECT CONCAT('Hello', ' ', 'world', '!')
-- Result: Hello world!
LENGTH(text)

Returns a number of characters in the input text.

SELECT LENGTH('Hello world!')
-- Result: 12
LOWER(text)

Changes the case of text to the lower case.

SELECT LOWER('Hello world!')
-- Result: hello world!
UPPER(text)

Changes the case of text to the upper case.

SELECT LOWER('Hello world!')
-- Result: HELLO WORLD!
REVERSE(text)

Reverses the input text.

SELECT REVERSE('foobar')
-- Result: raboof
LEFT(text, number)

Trims the number of characters from the beginning of a text.

SELECT LEFT('foobar', 3)
-- Result: foo
RIGHT(text, number)

Trims the number of characters from the end of a text.

SELECT RIGHT('foobar', 3)
-- Result: bar
LPAD(text, length, text_to_add)

Adds new characters to the beginning of a text input until it has the specified length.

SELECT LPAD('8319', 16, 'X')
-- Result: XXXXXXXXXXXX8319
RPAD(text, length, text_to_add)

Adds new characters to the end of a text input until it has the specified length.

SELECT RPAD('1248', 8, '_')
-- Result: 1248____
SPLIT_PART(text, substring, number)

Splits a text input by a substring into multiple parts and returns the specified part. The parts numbers start with 1.

SELECT SPLIT_PART('foo_bar', '_', 2)
-- Result: bar
REPEAT(text, number)

Repeats a text input a number of times.

SELECT REPEAT('abc', 3)
-- Result: abcabcabc
REPLACE(text, substring, new_string)

Replaces a substring in a text with a new string.

SELECT REPLACE('foobar', 'foo', '')
-- Result: bar
REGEXP_REPLACE(text, pattern, new_string)

Replaces substrings of a text which match an input pattern with a new string.

SELECT REGEXP_REPLACE('foobar@gmail.com', '\A\w+@', '')
-- Result: gmail.com
SUBSTRING(text_input, regex)

Returns the first matched group.

SELECT SUBSTRING('Sally (engineer)', '(\w+)\s\((\w+)\)')
-- Result: Sally
REGEXP_MATCHES(text_input, regex)

Returns an array of all captured substrings.

SELECT regexp_matches('Sally (engineer)', '(\w+)\s\((\w+)\)')
-- Result: {Sally,engineer}