Subqueries, or nested SELECT statements, are queries written inside another query.
They are often
used to perform intermediate calculations or to filter data before applying additional logic. Here's a
breakdown of how they work and examples of common use cases.
Syntax
A subquery is usually enclosed in parentheses and can appear in:
• WHERE Clause: To filter data based on another query.
• FROM Clause: To create a temporary table for further queries.
• SELECT Clause: To calculate values for each row.
SELECT column1, column2
FROM table1
WHERE column3 = (SELECT column4 FROM table2 WHERE condition);
Types of Subqueries
1. Single-row Subquery: Returns one value (a single row and column).
2. Multi-row Subquery: Returns multiple rows (a single column).
3. Multi-column Subquery: Returns multiple columns (and possibly multiple rows).
4. Correlated Subquery: References columns from the outer query and executes repeatedly for
each row of the outer query.
Examples
1. Subquery in the WHERE Clause
Find employees who earn more than the average salary.
SELECT employee_name
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
2. Subquery in the FROM Clause
Create a temporary result set to calculate aggregates.
SELECT department_id, MAX(average_salary)
FROM (SELECT department_id, AVG(salary) AS average_salary
FROM employees
GROUP BY department_id) AS dept_avg;
3. Subquery in the SELECT Clause
Retrieve each employee's salary as a percentage of the total salary.
SELECT employee_name,
(salary / (SELECT SUM(salary) FROM employees)) * 100 AS salary_percentage
FROM employees;
4. Correlated Subquery
Find employees whose salaries are greater than the average salary in their department.
SELECT employee_name
FROM employees e1
WHERE salary > (SELECT AVG(salary)
FROM employees e2
WHERE e1.department_id = e2.department_id);
Key Considerations
1. Performance: Subqueries can be slower than joins or Common Table Expressions (CTEs) for
large datasets.
2. Nesting Levels: Too many nested subqueries can make queries harder to read and maintain.
3. Use of EXISTS or IN:
o Use EXISTS for checking the existence of rows.
o Use IN when comparing a value with multiple possible matches.
Commonly used SQL functions for data analysis.
SQL offers a wide range of functions that are extremely useful for data analysis. These functions can
be categorized into aggregate functions, string functions, date/time functions, and window
functions. Here's a breakdown:
1. Aggregate Functions
Used to perform calculations on a group of rows and return a single result.
Function Description Example
SUM() Calculates the total sum of a column. SELECT SUM(sales) FROM orders;
AVG() Calculates the average value. SELECT AVG(salary) FROM employees;
Function Description Example
COUNT() Counts rows or non-NULL values. SELECT COUNT(*) FROM customers;
MIN() Finds the smallest value. SELECT MIN(price) FROM products;
MAX() Finds the largest value. SELECT MAX(price) FROM products;
Concatenates values within a group SELECT GROUP_CONCAT(name) FROM
GROUP_CONCAT()
(MySQL). students;
2. String Functions
Used to manipulate and format string data.
Function Description Example
SELECT CONCAT(first_name, ' ',
CONCAT() Combines multiple strings.
last_name);
UPPER() / Converts text to uppercase or
SELECT UPPER(name) FROM customers;
LOWER() lowercase.
SELECT SUBSTRING(name, 1, 3) FROM
SUBSTRING() Extracts part of a string.
users;
Removes spaces or characters from
TRIM() SELECT TRIM(' hello ');
ends.
SELECT CHAR_LENGTH(name) FROM
CHAR_LENGTH() Returns the length of a string.
students;
Replaces a substring with another
REPLACE() SELECT REPLACE(name, 'John', 'Jane');
string.
3. Date and Time Functions
Used for date/time manipulation and formatting.
Function Description Example
NOW() Returns the current timestamp. SELECT NOW();
CURDATE() Returns the current date. SELECT CURDATE();
Calculates days between two SELECT DATEDIFF('2024-12-01', '2024-11-
DATEDIFF()
dates. 01');
DATE_ADD() / Adds or subtracts days to a SELECT DATE_ADD(NOW(), INTERVAL 7
Function Description Example
DATE_SUB() date. DAY);
YEAR(), MONTH(), DAY() Extracts components of a date. SELECT YEAR(order_date) FROM orders;
FORMAT() Formats a date or number. SELECT FORMAT(NOW(), 'yyyy-MM-dd');
4. Mathematical Functions
Used for numerical operations.
Function Description Example
ROUND() Rounds a number to a specified number of digits. SELECT ROUND(price, 2) FROM products;
FLOOR() Rounds down to the nearest integer. SELECT FLOOR(9.99);
CEIL() Rounds up to the nearest integer. SELECT CEIL(9.01);
ABS() Returns the absolute value. SELECT ABS(-5);
MOD() Returns the remainder of a division. SELECT MOD(10, 3);
POWER() Raises a number to a power. SELECT POWER(2, 3);
5. Window Functions
Perform calculations across a set of table rows related to the current row.
Function Description Example
Assigns a unique number to SELECT ROW_NUMBER() OVER (ORDER BY
ROW_NUMBER()
each row. salary);
Assigns a rank, skipping SELECT RANK() OVER (PARTITION BY dept
RANK()
numbers for ties. ORDER BY salary);
Assigns a rank without SELECT DENSE_RANK() OVER (PARTITION BY
DENSE_RANK()
skipping numbers. dept ORDER BY salary);
Divides rows into specified
NTILE() SELECT NTILE(4) OVER (ORDER BY salary);
buckets.
Accesses subsequent or prior SELECT salary, LEAD(salary) OVER (ORDER BY
LEAD() / LAG()
rows. salary);
SUM() / AVG() with Computes aggregates within
SELECT SUM(salary) OVER (PARTITION BY dept);
OVER a window.
6. Conditional Functions
Used to apply logic in SQL queries.
Function Description Example
SELECT CASE WHEN salary > 50000 THEN 'High' ELSE
CASE Implements conditional logic.
'Low' END FROM employees;
Returns the first non-NULL
COALESCE() SELECT COALESCE(phone, 'N/A') FROM users;
value.
Returns NULL if two
NULLIF() SELECT NULLIF(10, 10);
expressions are equal.
Combining Functions for Advanced Analysis
You can combine these functions to analyze data more effectively:
COde
SELECT department,
COUNT(employee_id) AS employee_count,
AVG(salary) AS avg_salary,
MAX(hire_date) AS latest_hire
FROM employees
GROUP BY department
HAVING AVG(salary) > 50000;