Database Fundamentals

SQL 101
Structured Query Language

SQL is the standard language for managing relational databases. It allows you to create, modify, and query data efficiently. Essential knowledge for data engineering, analysis, and backend development.

Author: Michaël Bettan

Topic: Relational Databases

SQL Basics & Sublanguages

Use Cases

Defining database structures (tables, views, etc.)
Adding, updating, and deleting data
Retrieving data with precise filtering and sorting
Controlling access to your database

History & Popular RDBMS

Developed in the 1970s at IBM (originally SEQUEL).
Standardized by ANSI and ISO in the 1980s.
Continues to evolve with new features and standards.
Commercial: Oracle, SQL Server, Db2
Open-Source: MySQL, PostgreSQL

SQL commands are grouped into distinct sublanguages based on their primary function within the database.

DDL (Data Definition Language)

Defining the database structure (schema).

CREATE: Create databases, tables, views, indexes.
CTAS: CREATE TABLE AS SELECT by directly selecting data from an existing table or query.
ALTER: Modify existing database objects (e.g., add a column to a table).
DROP: Delete database objects.
TRUNCATE: Remove all data from a table, but keep the table structure.

DML (Data Manipulation Language)

Manipulating data within the database.

SELECT: Retrieve data from one or more tables.
INSERT: Add new data into a table.
UPDATE: Modify existing data in a table.
DELETE: Remove data from a table.
MERGE: conditionally update, insert, or delete rows

DQL (Data Query Language)

Focuses solely on retrieving data.

SELECT: core command for querying data, with various clauses for filtering, sorting, and joining.

DCL & TCL

Data Control Language (DCL): Controlling access.

GRANT: Give users specific privileges (e.g., read, write).
REVOKE: Take away user privileges.

Transaction Control Language (TCL): Managing transactions (logical units of work).

COMMIT: Save changes made within a transaction.
ROLLBACK: Undo changes made within a transaction.
SAVEPOINT: Create points within a transaction to potentially roll back to.

Data Types & Operators

Every column in a database table must have a defined data type. Operators allow you to perform calculations and comparisons on these data types.

Numeric Types: INT: Whole numbers.
DECIMAL(p, s): Fixed-point numbers with precision p (total digits) and scale s (digits after the decimal).
FLOAT: Single-precision floating-point numbers.
DOUBLE: Double-precision floating-point numbers.

String Types: CHAR(n): Fixed-length strings with a maximum length of n characters.
VARCHAR(n): Variable-length strings with a maximum length of n characters.
TEXT: Large amounts of text data.

Date & Time Types: DATE: Stores a date (YYYY-MM-DD).
TIME: Stores a time (HH:MM:SS).
DATETIME: Stores both date and time.
TIMESTAMP: Stores a timestamp, which is a specific point in time.

Other Data Types: BOOLEAN: True or False values.
BLOB: Binary Large Object, for storing binary data (e.g., images).

Operators

Math: +, -, *, /, % (modulo)
Comparison: =, !=, >, <, >=, <=
String: || (concatenation), LIKE, NOT LIKE (pattern matching)
Variables: (Vary by database system, often prefixed with @ or :)
Logical: AND, OR, NOT
Bitwise: & (AND), | (OR), ^ (XOR), ~ (NOT)
Miscellaneous: IN, BETWEEN, IS NULL, IS NOT NULL
Comments: Single-line: -- This is a comment | Multi-line: /* This is a multi-line comment */

Operator Precedence

SQL follows general mathematical order of operations (parentheses first, then multiplication/division, then addition/subtraction, etc.).

Querying & Search Conditions

Search Conditions & Clauses

WHERE clause: Filters data based on specified conditions (e.g., WHERE age > 25).
FROM & JOIN: Specifies the table(s) to retrieve data from and how to join them if multiple tables are involved.
GROUP BY: Groups rows with the same values in one or more columns.
HAVING: Filters grouped data (used with GROUP BY).
ORDER BY: Sorts the result set by one or more columns (ascending ASC or descending DESC).
TOP / LIMIT: Limits the number of rows returned (e.g., TOP 10 or LIMIT 10).
INTO: (Used in some database systems to insert data into a new table).

Aggregate Functions

GROUP BY is necessary when you want to apply aggregate functions to subsets of your data. For the entire dataset, not needed.

SUM: Total of numeric values.
AVG: Average of numeric values.
MIN/MAX: Retrieve the minimum or maximum value.
COUNT: Count the number of rows.
ROUND: Round a numeric value to a specified number of decimal places.
ARRAY_AGG: gathers values from multiple rows and puts them into an array.

Conditional Queries & SubQueries

Conditional:

CASE: Conditionally return different values.
COALESCE: Return the first non-null value.
CAST: Convert a value to a different data type.

SubQueries:

IN/EXISTS: Test for the presence of rows in a subquery.
Correlated Subquery: Subquery that references columns from the outer query.
CTE (WITH): Define temporary result sets.

Joins & Combining Data

Joins (inclusive)

(INNER) JOIN: returns rows with matching values in both tables
LEFT JOIN: returns all rows from the left table and matching rows from the right
RIGHT JOIN: returns all rows from the right table and matching rows from the left
FULL (OUTER) JOIN: returns all rows from both sides even if join condition is not met
CROSS JOIN: returns all possible combinations of rows (cartesian product)
Self-join: is a join where a table is joined with itself. It allows you to compare rows within the same table. Useful when you need to compare or relate data from different rows of the same table.

Combine Data

UNION: Combine results of two queries, removing duplicates.
UNION ALL: Combine results without removing duplicates. (SQL operator that appends together rows from different result sets).
EXCEPT: Return rows from the first query not found in the second.
INTERSECT: Return rows common to both queries.

Union vs Join

Union combines the rows of two or more tables with similar structures (same number of columns and compatible data types). Removes duplicate rows by default (unless UNION ALL). Think of it as stacking tables on top of each other.

Join combines columns from two or more tables based on a related column between them. Link information from different tables. Determine which rows are included in the output based on matches in the join condition.

Advanced Functions & Techniques

String Functions

UPPER/LOWER: Convert text to uppercase or lowercase.
LENGTH: Calculate the length of a string.
SUBSTR: Extract a substring from a string.
Concatenation: Combine strings together.

Mathematical Functions

SAFE_DIVIDE(x, y): divides, but returns NULL if dividing by zero
ROUND(x [, N]): rounds x to N decimal places (or nearest integer if N is omitted)
MOD(x, y): gives the remainder of x divided by y.
ABS(x): returns the positive version of x.

Window & Ranking Functions

Window Functions:

PARTITION BY: divide the result set into partitions for calculations.
ORDER BY: define the order of rows within a partition.

Ranking data:

RANK: Assign ranks to rows, leaving gaps for ties.
DENSE_RANK: Assign ranks without gaps for ties.
ROW_NUMBER: Assign a unique number to each row.

Miscellaneous Concepts

Table wildcards: Enable you to query multiple tables using concise SQL statements. Wildcard tables are available only in standard SQL.

DISTINCT: Operates on columns: The keyword is used to return unique rows based on the values in specified columns. It eliminates duplicate rows. You can't directly apply DISTINCT to a single column while selecting other columns. DISTINCT works on the entire row, considering all the columns you specify in the SELECT statement.

NEST: Combines values from multiple rows into a single row with an array (like creating those smaller boxes).

UNNEST: Expands an array into multiple rows (like taking the toys out of the smaller boxes).

STRUCT: Lets you organize related data within a single column, making your data more structured and easier to work with.

LAG: Window function in SQL used to access data from a previous row (or rows) within the same result set without needing to perform a self-join. It’s helpful when you need to compare the current row with the previous one in a specific order.

Resources & Assets

Video Content

Watch the SQL 101 blueprint video on YouTube for a visual walkthrough.

Watch on YouTube

Downloadable Assets

High-Res Infographic Slidedeck (PDF) Podcast Episode

Self-Assessment Questions

Q1. What does DDL stand for and give an example command?

Data Definition Language. Examples: CREATE, ALTER, DROP, TRUNCATE.

Q2. What is the difference between UNION and UNION ALL?

UNION combines results and removes duplicates, while UNION ALL combines results without removing duplicates.

Q3. Which join returns all rows from both tables even if there is no match?

FULL (OUTER) JOIN.

Q4. What is the purpose of the GROUP BY clause?

To group rows with the same values in one or more columns, typically for use with aggregate functions.

Q5. What is the difference between RANK and DENSE_RANK window functions?

RANK leaves gaps in the ranking sequence when there are ties, while DENSE_RANK does not leave gaps.

Study Progress — Module Mastery 100%

SQL 101Structured Query Language

SQL Basics & Sublanguages

Use Cases

History & Popular RDBMS

DDL (Data Definition Language)

DML (Data Manipulation Language)

DQL (Data Query Language)

DCL & TCL

Data Types & Operators

Operators

Operator Precedence

Querying & Search Conditions

Search Conditions & Clauses

Aggregate Functions

Conditional Queries & SubQueries

Joins & Combining Data

Joins (inclusive)

Combine Data

Union vs Join

Advanced Functions & Techniques

String Functions

Mathematical Functions

Window & Ranking Functions

Miscellaneous Concepts

Resources & Assets

Video Content

Downloadable Assets

Self-Assessment Questions

SQL 101
Structured Query Language