This data warehouse belongs to a fictional startup named Bindle. Bindle is a web and mobile app for reading books, it has subscription business model. The SQL Habit Course is based on Bindle’s story.
Bindle’s data warehouse contains everything needed to run a modern Internet company – web and mobile analytics, marketing data, AB-test data, etc.
Column | Type | Description |
---|---|---|
id | integer | Unique identifier of user. |
text | User’s email, unique. | |
first_name | text | User’s first name. |
last_name | text | User’s last name. |
country | text | User’s signup country. |
signup_date | date | Date when user signed up. |
created_at | datetime | Timestamp when user record was created, we can treat it as signup date and time. |
status | text | What status user has in Bindle, could be free (can read only free books) or customer (user who purchased a subscription, can read all books). |
age | integer | User’s age. |
referrer_id | integer | id of another user who referred this user (this is usually set when users sign up via referral link). |
visitor_id | text | Identifier of a user in the web_analytics.pageviews table. Generated by a web analytics system and stored in a cookie. |
utm_source | text | utm_source in URL when user signed up, used for marketing attribution |
utm_campaign | text | utm_campaign in URL when user signed up, used for marketing attribution |
utm_term | text | utm_term in URL when user signed up, used for marketing attribution |
utm_content | text | utm_content in URL when user signed up, used for marketing attribution |
utm_medium | text | utm_medium in URL when user signed up, used for marketing attribution |
adjust_tracker | text | Adjust tracker in case user signed up via an Adjust link https://app.adjust.com/gxel3d1. |
adjust_campaign | text | The value of campaign paramenter in Adjust URL. |
adjust_adgroup | text | The value of adgroup paramenter in Adjust URL. |
adjust_creative | text | The value of creative paramenter in Adjust URL. |
Column | Type | Description |
---|---|---|
id | integer | Unique identifier of a profile. |
user_id | integer | User’s id. |
about | text | Information about a user. |
interests | text | User’s interests. Comma separated list of tags. |
avatar_url | text | URL of an avatar user uploaded. Check some of them out ![]() |
postal_code | text | A postal code of a user for books delivery. |
Column | Type | Description |
---|---|---|
id | integer | Unique identifier of an account. |
user_id | integer | User’s id. |
platform | text | Account’s platform. Bindle allows email and facebook signups. |
text | Email attached to the account. Note that different platforms could have different emails. | |
created_at | datetime | Timestamp when this account was created. created_at of the first account is user’s signup timestamp. |
Column | Type | Description |
---|---|---|
id | integer | Unique identifier of the book. |
name | text | Name of the book. |
slug | text | Identifer of a book used in URLs. For example https://www.bindle.com/books/final-future. Usually generated from book’s name. |
genre | text | Book’s genre. |
pages_count | integer | Number of pages in the book. |
Column | Type | Description |
---|---|---|
user_id | integer | User’s id. |
book_id | integer | Book’s id. |
last_page | integer | The number of the last page user read in the book. |
created_at | datetime | When user started reading the book. |
Column | Type | Description |
---|---|---|
id | integer | Unique identifier of a product. |
name | text | Name of a product. |
price | float | Price of a product |
Column | Type | Description |
---|---|---|
id | integer | Unique identifier of purchase. |
user_id | integer | id of a user who made the purchase. |
product_id | integer | id of a product inside products table. |
amount | float | How much money user paid. The number might vary since users could apply discounts. Amount is always in US dollars. |
refunded | boolean | Status of a purchase, we receive money on the bank account only if purchase wasn’t refunded. |
created_at | datetime | When purchase was made. |
Column | Type | Description |
---|---|---|
id | integer | Unique identifier of spend, just a primary key in a table. |
spent_at | date | The date of a spend. Spend data is usually reported by date. |
amount | float | Amount of money in USD. |
clicks | integer | Number of clicks on the ad on this date. |
utm_source | text | utm_source of marketing campaign. |
utm_campaign | text | utm_campaign of marketing campaign. |
utm_term | text | utm_term of marketing campaign. |
utm_content | text | utm_content of marketing campaign. |
utm_medium | text | utm_medium of marketing campaign. |
Column | Type | Description |
---|---|---|
id | integer | Unique identifier of a user’s device. |
user_id | integer | id of user who uses this device. |
device_type | text | Type of the device, could be browser (for users who are using Bindle website) or iphone (users who are reading via Bindle app). |
connected_at | datetime | Timestamp when user started using this device. Device with the earliest connected_at field is the device which user used for signing up. |
version | text | For browser devices it’s a User Agent. For iPhone devices it’s a version of user’s iPhone and a version of iOS separated by comma. |
Column | Type | Description |
---|---|---|
url | text | URL of the visited page. |
referer_url | text | URL of the previous page where user clicked on a link with URL. ![]() |
screen_resolution | text | Screen resolution of a user’s device. Example: 1024x1366. |
device_type | text | Type of a user’s device. Could be mobile, tablet or desktop. |
custom_parameters | JSON | All custom parameters of a pageview in a key-value format. Could be added per page, for example we might add ab_test_variation key to keep track of what AB-test variation user had seen. |
pageview_id | text | Unique identifier of a pageview. |
user_id | integer | If user is logged in – ID of a user in users table. |
visitor_id | text | Unique identifier of a visitor. A fingerprint used to keep track of guest visitors who haven’t had signed up. |
created_at | datetime | Timestamp of a pageview. |
country | text | Country derived from user’s IP address. |
Column | Type | Description |
---|---|---|
category | text | Category parameter of an event, for example Signup. |
action | text | Action parameter of an event, for example Click. |
name | text | Name parameter of an event, for example Signup for free. |
pageview_id | text | Unique identifier of a record inside pageviews table. All events happen within one pageview. ![]() |
created_at | datetime | Timestamp of an event. |
Column | Type | Description |
---|---|---|
event_id | text | Unique identifier of an event. |
category | text | Category parameter of an event, for example onboarding. |
action | text | Action parameter of an event, for example screenview. |
name | text | Name parameter of an event, for example bindle-content. |
screen_resolution | text | Resolution of a user’s smartphone, for example 375x812. |
device_type | text | Model of a a user’s smartphone and the version of the OS, for example iPhone 7,12.1.0. |
user_id | integer | If user is logged in – ID of a user in users table. |
adid | string | Unique identifier of a user’s smartphone, same as in Adjust callbacks table. |
country | text | Country derived from user’s IP address. |
custom_parameters | JSON | All custom parameters of an even in a key-value format. |
created_at | datetime | Timestamp of an event. |
app_version | text | Version of the Bindle app a user is using, for example 1.1.1. |
Column | Type | Description |
---|---|---|
id | integer | Unique ID of an Adjust callback. |
tracker | text | Adjust’s tracker parameter. For example https://app.adjust.com/gxel3d1. |
created_at | datetime | Timestamp of a callback. |
campaign_name | text | The value of campaign paramenter in Adjust URL. |
adgroup_name | text | The value of adgroup paramenter in Adjust URL. |
creative_name | text | The value of creative paramenter in Adjust URL. |
label | text | The value of label paramenter in Adjust URL. |
device_name | text | Model and OS version of a user’s device. |
app_version | text | Bindle’s app version at the moment. |
activity_kind | text | Predefined Adjust event, could be click, install, event (means custom event and event_name will be present) or else. |
event_name | text | Custom Adjust event; Bindle has custom signup event. |
adid | text | Adjust’s device ID. |
user_id | integer | ID of a user (from users table). Present for signup event. |
country | text | Country derived from user’s IP address. |
Column | Type | Description |
---|---|---|
id | integer | Unique identifier of a date. Just a primary key, an index of the table. |
date | date | A date. By joining the very sparse timeline data to the consecutive range of dates we won’t have gaps. |
All databases on SQL Habit are built with PostgreSQL.
Column | Description |
---|---|
SELECT | Specifies the columns of the result set.
|
DISTINCT | Filters out duplicate records from the result set.
|
CASE | If/else logic of SQL.
It could be used without ELSE keyword (the default ELSE value is NULL):
The
|
FROM | Specifies the tables we’re querying.
|
WHERE | Specifies one or multiple filters.
|
INNER JOIN ... ON ... | Returns records that have matching values in both left (the one in FROM clause) and right (the one in INNER JOIN) tables.
|
LEFT JOIN ... ON ... | Selects all records from the left table and matching records from the right table.
|
GROUP BY | Groups records and returns a row for each group – result of an aggregate function.
|
ORDER BY | Sorts records in the result set. The default order is ASC.
|
LIMIT | Sets the max number of records in the result set.
|
BETWEEN ... AND ... | Filters the records within a range.
|
AND | Combines two filter conditions. Evaluates to TRUE only if both conditions are TRUE.
|
OR | Combines two filter conditions. Evaluates to TRUE if any condition is TRUE. In a query with many filters make sure to wrap two conditions with OR in brackets to avoid interferance with other filters.
|
LIKE | Filters records by matching a string pattern. Symbol % stands for any character (wildcard).
|
IN | Filters records that are present in a list or a subquery.
|
IS NULL | Evaluates to TRUE if the value is NULL.
|
IS NOT NULL | Evaluates to TRUE if the value is not NULL.
|
WITH ... AS () | Defines a CTE (Common Table Expression) or simply a subquery.
|
Column | Description | ||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
NOW() | Returns the current timestamp.
| ||||||||||||||||||
DATE_PART(part, timestamp/interval) | Returns the part of a timestamp or an interval as a number.
| ||||||||||||||||||
DATE_TRUNC(part, timestamp/interval) | Returns the part of a timestamp as a rounded timestamp.
| ||||||||||||||||||
TO_CHAR(timestamp, format) | Returns a formatted timestamp as a string. Possible patterns that could be used in a format argument:
The whole list of patterns could be found on the official PostgreSQL website. | ||||||||||||||||||
TO_DATE(string, date_format) | Converts a string into a date.
|
Column | Description |
---|---|
ABS(number) | Returns the absolute value of the number.
|
RANDOM() | Returns a random float number between 0 and 1.
|
ROUND(number, precision = 0) | Rounds a number to a certain precision.
|
CEIL(number) | Round a number to the closest higher integer.
|
FLOOR(number) | Round a number to the closest lower integer.
|
TRUNC(number, precision = 0) | Truncates the number to a precision without rounding it.
|
LEAST(val1, val2, …) | Returns the smallest value from the list of arguments.
|
GREATEST(val1, val2, …) | Returns the highest value from the list of arguments.
|
Column | Description |
---|---|
COALESCE(val1, val2, ...) | Returns the first non-NULL input argument. Used to specify default values.
|
NULLIF(val1, val2) | Returns NULL if arguments are equal. Returns the first argument otherwise.
|
Column | Description |
---|---|
ROW_NUMBER() | Gives each record in the partition a number (starting with 1).
|
LAG(column, offset = 1) | Returns the column value (or NULL) from the previous row in the partition (or a number of rows specified by the offset argument).
|
LEAD(column, offset = 1) | Returns the column value (or NULL) from the next row in the partition (or a number of rows specified by the offset argument).
|
FIRST_VALUE(column) | Return value of the first row in the partition.
|
LAST_VALUE(column) | Return value of the last row in the partition.
|
COUNT(statement) | Returns the number of rows in the partition.
|
SUM(statement) | Returns the sum of values in the partition.
|
AVG(statement) | Returns the average value in the partition.
|
MAX(statement) | Returns the max value in the partition.
|
MIN(statement) | Returns the min value in the partition.
|
Column | Description |
---|---|
COUNT(statement) | Returns the number of rows.
|
SUM(statement) | Returns the sum of values.
|
AVG(statement) | Returns the average value.
|
MAX(statement) | Returns the max value.
|
MIN(statement) | Returns the min value.
|
Column | Description |
---|---|
UNION | Combines the results of two queries (stacks two result sets vertically).
|
UNION ALL | Combines the results of two queries. It keeps duplicate records.
|
EXCEPT | Returns records from the first query that are not returned by the second query.
|
EXCEPT ALL | Returns records from the first query that are not returned by the second query. It keeps duplicate records.
|
INTERSECT | Returns records that are returned by both the first and second queries.
|
INTERSECT ALL | Returns records that are returned by both the first and second queries. It keeps duplicate records.
|
Column | Description |
---|---|
CONCAT(str1, str2, ...) | Combines all input strings to one string.
|
LENGTH(text) | Returns a number of characters in the input text.
|
LOWER(text) | Changes the case of text to the lower case.
|
UPPER(text) | Changes the case of text to the upper case.
|
REVERSE(text) | Reverses the input text.
|
LEFT(text, number) | Trims the number of characters from the beginning of a text.
|
RIGHT(text, number) | Trims the number of characters from the end of a text.
|
LPAD(text, length, text_to_add) | Adds new characters to the beginning of a text input until it has the specified length.
|
RPAD(text, length, text_to_add) | Adds new characters to the end of a text input until it has the specified length.
|
SPLIT_PART(text, substring, number) | Splits a text input by a substring into multiple parts and returns the specified part. The parts numbers start with 1.
|
REPEAT(text, number) | Repeats a text input a number of times.
|
REPLACE(text, substring, new_string) | Replaces a substring in a text with a new string.
|
REGEXP_REPLACE(text, pattern, new_string) | Replaces substrings of a text which match an input pattern with a new string.
|
SUBSTRING(text_input, regex) | Returns the first matched group.
|
REGEXP_MATCHES(text_input, regex) | Returns an array of all captured substrings.
|