×
Database Fundamentals

SQL 101
Structured Query Language

SQL is the standard language for managing relational databases. It allows you to create, modify, and query data efficiently. Essential knowledge for data engineering, analysis, and backend development.

Author: Michaël Bettan
Topic: Relational Databases
01

SQL Basics & Sublanguages

Use Cases

  • Defining database structures (tables, views, etc.)
  • Adding, updating, and deleting data
  • Retrieving data with precise filtering and sorting
  • Controlling access to your database

History & Popular RDBMS

  • Developed in the 1970s at IBM (originally SEQUEL).
  • Standardized by ANSI and ISO in the 1980s.
  • Continues to evolve with new features and standards.
  • Commercial: Oracle, SQL Server, Db2
  • Open-Source: MySQL, PostgreSQL

SQL commands are grouped into distinct sublanguages based on their primary function within the database.

DDL (Data Definition Language)

Defining the database structure (schema).

  • CREATE: Create databases, tables, views, indexes.
    CTAS: CREATE TABLE AS SELECT by directly selecting data from an existing table or query.
  • ALTER: Modify existing database objects (e.g., add a column to a table).
  • DROP: Delete database objects.
  • TRUNCATE: Remove all data from a table, but keep the table structure.

DML (Data Manipulation Language)

Manipulating data within the database.

  • SELECT: Retrieve data from one or more tables.
  • INSERT: Add new data into a table.
  • UPDATE: Modify existing data in a table.
  • DELETE: Remove data from a table.
  • MERGE: conditionally update, insert, or delete rows

DQL (Data Query Language)

Focuses solely on retrieving data.

  • SELECT: core command for querying data, with various clauses for filtering, sorting, and joining.

DCL & TCL

Data Control Language (DCL): Controlling access.

  • GRANT: Give users specific privileges (e.g., read, write).
  • REVOKE: Take away user privileges.

Transaction Control Language (TCL): Managing transactions (logical units of work).

  • COMMIT: Save changes made within a transaction.
  • ROLLBACK: Undo changes made within a transaction.
  • SAVEPOINT: Create points within a transaction to potentially roll back to.
02

Data Types & Operators

Every column in a database table must have a defined data type. Operators allow you to perform calculations and comparisons on these data types.

Numeric Types
INT: Whole numbers.
DECIMAL(p, s): Fixed-point numbers with precision p (total digits) and scale s (digits after the decimal).
FLOAT: Single-precision floating-point numbers.
DOUBLE: Double-precision floating-point numbers.
String Types
CHAR(n): Fixed-length strings with a maximum length of n characters.
VARCHAR(n): Variable-length strings with a maximum length of n characters.
TEXT: Large amounts of text data.
Date & Time Types
DATE: Stores a date (YYYY-MM-DD).
TIME: Stores a time (HH:MM:SS).
DATETIME: Stores both date and time.
TIMESTAMP: Stores a timestamp, which is a specific point in time.
Other Data Types
BOOLEAN: True or False values.
BLOB: Binary Large Object, for storing binary data (e.g., images).

Operators

Operator Precedence

SQL follows general mathematical order of operations (parentheses first, then multiplication/division, then addition/subtraction, etc.).

03

Querying & Search Conditions

Search Conditions & Clauses

Aggregate Functions

GROUP BY is necessary when you want to apply aggregate functions to subsets of your data. For the entire dataset, not needed.

  • SUM: Total of numeric values.
  • AVG: Average of numeric values.
  • MIN/MAX: Retrieve the minimum or maximum value.
  • COUNT: Count the number of rows.
  • ROUND: Round a numeric value to a specified number of decimal places.
  • ARRAY_AGG: gathers values from multiple rows and puts them into an array.

Conditional Queries & SubQueries

Conditional:

  • CASE: Conditionally return different values.
  • COALESCE: Return the first non-null value.
  • CAST: Convert a value to a different data type.

SubQueries:

  • IN/EXISTS: Test for the presence of rows in a subquery.
  • Correlated Subquery: Subquery that references columns from the outer query.
  • CTE (WITH): Define temporary result sets.
04

Joins & Combining Data

Joins (inclusive)

  • (INNER) JOIN: returns rows with matching values in both tables
  • LEFT JOIN: returns all rows from the left table and matching rows from the right
  • RIGHT JOIN: returns all rows from the right table and matching rows from the left
  • FULL (OUTER) JOIN: returns all rows from both sides even if join condition is not met
  • CROSS JOIN: returns all possible combinations of rows (cartesian product)
  • Self-join: is a join where a table is joined with itself. It allows you to compare rows within the same table. Useful when you need to compare or relate data from different rows of the same table.

Combine Data

  • UNION: Combine results of two queries, removing duplicates.
  • UNION ALL: Combine results without removing duplicates. (SQL operator that appends together rows from different result sets).
  • EXCEPT: Return rows from the first query not found in the second.
  • INTERSECT: Return rows common to both queries.

Union vs Join

Union combines the rows of two or more tables with similar structures (same number of columns and compatible data types). Removes duplicate rows by default (unless UNION ALL). Think of it as stacking tables on top of each other.

Join combines columns from two or more tables based on a related column between them. Link information from different tables. Determine which rows are included in the output based on matches in the join condition.

05

Advanced Functions & Techniques

String Functions

  • UPPER/LOWER: Convert text to uppercase or lowercase.
  • LENGTH: Calculate the length of a string.
  • SUBSTR: Extract a substring from a string.
  • Concatenation: Combine strings together.

Mathematical Functions

  • SAFE_DIVIDE(x, y): divides, but returns NULL if dividing by zero
  • ROUND(x [, N]): rounds x to N decimal places (or nearest integer if N is omitted)
  • MOD(x, y): gives the remainder of x divided by y.
  • ABS(x): returns the positive version of x.

Window & Ranking Functions

Window Functions:

  • PARTITION BY: divide the result set into partitions for calculations.
  • ORDER BY: define the order of rows within a partition.

Ranking data:

  • RANK: Assign ranks to rows, leaving gaps for ties.
  • DENSE_RANK: Assign ranks without gaps for ties.
  • ROW_NUMBER: Assign a unique number to each row.

Miscellaneous Concepts

Table wildcards
Enable you to query multiple tables using concise SQL statements. Wildcard tables are available only in standard SQL.
DISTINCT
Operates on columns: The keyword is used to return unique rows based on the values in specified columns. It eliminates duplicate rows. You can't directly apply DISTINCT to a single column while selecting other columns. DISTINCT works on the entire row, considering all the columns you specify in the SELECT statement.
NEST
Combines values from multiple rows into a single row with an array (like creating those smaller boxes).
UNNEST
Expands an array into multiple rows (like taking the toys out of the smaller boxes).
STRUCT
Lets you organize related data within a single column, making your data more structured and easier to work with.
LAG
Window function in SQL used to access data from a previous row (or rows) within the same result set without needing to perform a self-join. It’s helpful when you need to compare the current row with the previous one in a specific order.

Resources & Assets

Video Content

Watch the SQL 101 blueprint video on YouTube for a visual walkthrough.

Watch on YouTube

Self-Assessment Questions

Q1. What does DDL stand for and give an example command?

Data Definition Language. Examples: CREATE, ALTER, DROP, TRUNCATE.

Q2. What is the difference between UNION and UNION ALL?

UNION combines results and removes duplicates, while UNION ALL combines results without removing duplicates.

Q3. Which join returns all rows from both tables even if there is no match?

FULL (OUTER) JOIN.

Q4. What is the purpose of the GROUP BY clause?

To group rows with the same values in one or more columns, typically for use with aggregate functions.

Q5. What is the difference between RANK and DENSE_RANK window functions?

RANK leaves gaps in the ranking sequence when there are ties, while DENSE_RANK does not leave gaps.
Study Progress — Module Mastery 100%