SQL Subqueries
Basic - Intermediate
Announcements
• HW2 and WQ2 released
– Both due next Tuesday
• Please fill in the Azure questionnaire by
tonight!
– See HW2 writeup for details
CSE 414 - Spring 2018 2
Simple Aggregations
Five basic aggregate operations in SQL
select count(*) from Purchase
select sum(quantity) from Purchase
select avg(price) from Purchase
select max(quantity) from Purchase
select min(quantity) from Purchase
Except count, all aggregations apply to a single attribute
CSE 414 - Spring 2018 3
Everything in SELECT must be
either a GROUP-BY attribute, or an aggregate
Need to be Careful…
SELECT product, Product Price Quantity
max(quantity)
FROM Purchase Bagel 3 20
GROUP BY product Bagel 1.50 20
SELECT product, quantity Banana 0.5 50
FROM Purchase
Banana 2 10
GROUP BY product
-- what does this mean? Banana 4 10
Product Max(quantity) Product Quantity
Bagel 20 Bagel 20
Banana 50 Banana ??
CSE 414 - Spring 2018 4
Semantics of SQL With
Group-By
SELECT S
FROM R1,…,Rn
WHERE C1 FWGHOS
GROUP BY a1,…,ak
HAVING C2
Evaluation steps:
1. Evaluate FROM-WHERE using Nested Loop Semantics
2. Group by the attributes a1,…,ak
3. Apply condition C2 to each group (may have aggregates)
4. Compute aggregates in S and return the result 5
CSE 414 - Spring 2018
Purchase(pid, product, price, quantity, month)
Exercise FWGHOS
Compute the total income per month
Show only months with less than 10 items sold
Order by quantity sold and display as “TotalSold”
SELECT month, sum(price*quantity),
sum(quantity) as TotalSold
FROM Purchase
GROUP BY month
HAVING sum(quantity) < 10
ORDER BY sum(quantity)
CSE 414 - Spring 2018 11
WHERE vs HAVING
• WHERE condition is applied to individual rows
– The rows may or may not contribute to the aggregate
– No aggregates allowed here
• HAVING condition is applied to the entire group
– Only applicable if GROUP BY is involved
– Entire group is returned, or removed
– May use aggregate functions on the group
CSE 414 - Spring 2018 12
Product(pid,pname,manufacturer)
Purchase(id,product_id,price,month)
Aggregate + Join
For each manufacturer, compute how many products
with price > $100 they sold
CSE 414 - Spring 2018 15
Product(pid,pname,manufacturer)
Purchase(id,product_id,price,month)
Aggregate + Join
For each manufacturer, compute how many products
with price > $100 they sold
Problem: manufacturer is in Product, price is in Purchase...
CSE 414 - Spring 2018 16
Product(pid,pname,manufacturer)
Purchase(id,product_id,price,month)
Aggregate + Join
For each manufacturer, compute how many products
with price > $100 they sold
Problem: manufacturer is in Product, price is in Purchase...
manu
-- step 1: think about their join ... price ...
facturer
SELECT ... Hitachi 150
FROM Product x, Purchase y Canon 300
WHERE x.pid = y.product_id
Hitachi 180
and y.price > 100
CSE 414 - Spring 2018 17
Product(pid,pname,manufacturer)
Purchase(id,product_id,price,month)
Aggregate + Join
For each manufacturer, compute how many products
with price > $100 they sold
Problem: manufacturer is in Product, price is in Purchase...
manu
-- step 1: think about their join ... price ...
facturer
SELECT ... Hitachi 150
FROM Product x, Purchase y Canon 300
WHERE x.pid = y.product_id
Hitachi 180
and y.price > 100
-- step 2: do the group-by on the join
manu
count(*)
SELECT x.manufacturer, count(*) facturer
FROM Product x, Purchase y Hitachi 2
WHERE x.pid = y.product_id Canon 1
and y.price > 100
...
GROUP BY x.manufacturer
Product(pid,pname,manufacturer)
Purchase(id,product_id,price,month)
Aggregate + Join
Variant:
For each manufacturer, compute how many products
with price > $100 they sold in each month
SELECT x.manufacturer, y.month, count(*)
FROM Product x, Purchase y
WHERE x.pid = y.product_id
and y.price > 100 manu
month count(*)
GROUP BY x.manufacturer, y.month facturer
Hitachi Jan 2
Hitachi Feb 1
Canon Jan 3
...
CSE 414 - Spring 2018
FWGHOS
Including Empty Groups
• In the result of a group by query, there
is one row per group in the result
Count(*) is
never 0
SELECT x.manufacturer, count(*)
FROM Product x, Purchase y
WHERE x.pname = y.product
GROUP BY x.manufacturer
CSE 414 - Spring 2018 20
Including Empty Groups
SELECT x.manufacturer, count(y.pid)
FROM Product x LEFT OUTER JOIN Purchase y
ON x.pname = y.product
GROUP BY x.manufacturer
Count(pid) is 0
when all pid’s in
the group are
NULL
CSE 414 - Spring 2018 21
What we have in our SQL toolbox
• Projections (SELECT * / SELECT c1, c2, …)
• Selections (aka filtering) (WHERE cond)
• Joins (inner and outer)
• Aggregates
• Group by
• Inserts, updates, and deletes
Make sure you read the textbook!
CSE 414 - Spring 2018 22
Subqueries
• A subquery is a SQL query nested inside a larger query
• Such inner-outer queries are called nested queries
• A subquery may occur in:
– A SELECT clause
– A FROM clause
– A WHERE clause
• Rule of thumb: avoid nested queries when possible
– But sometimes it’s impossible, as we will see
FWGHOS
CSE 414 - Spring 2018 23
Subqueries…
• Can return a single value to be included in a
SELECT clause
• Can return a relation to be included in the
FROM clause, aliased using a tuple variable
• Can return a single value to be compared
with another value in a WHERE clause
• Can return a relation to be used in the WHERE
or HAVING clause under an existential
quantifier
CSE 414 - Spring 2018 24
1. Subqueries in SELECT
Product (pname, price, cid)
Company (cid, cname, city)
For each product return the city where it is manufactured
SELECT X.pname, (SELECT Y.city “correlated
FROM Company Y subquery”
WHERE Y.cid=X.cid) as City
FROM Product X
What happens if the subquery returns more than one city?
We get a runtime error
(and SQLite simply ignores the extra values…)
CSE 414 - Spring 2018 25
Product (pname, price, cid)
Company (cid, cname, city)
1. Subqueries in SELECT
Whenever possible, don’t use a nested queries:
SELECT X.pname, (SELECT Y.city
FROM Company Y
WHERE Y.cid=X.cid) as City
FROM Product X =
We have
SELECT X.pname, Y.city “unnested”
FROM Product X, Company Y
WHERE X.cid=Y.cid the query
CSE 414 - Spring 2018 26
Product (pname, price, cid)
Company (cid, cname, city)
1. Subqueries in SELECT
Compute the number of products made by each company
SELECT DISTINCT C.cname, (SELECT count(*)
FROM Product P
WHERE P.cid=C.cid)
FROM Company C
CSE 414 - Spring 2018 27
Product (pname, price, cid)
Company (cid, cname, city)
1. Subqueries in SELECT
Compute the number of products made by each company
SELECT DISTINCT C.cname, (SELECT count(*)
FROM Product P
WHERE P.cid=C.cid)
FROM Company C
SELECT C.cname, count(*)
Better: we can FROM Company C, Product P
unnest using a WHERE C.cid=P.cid
GROUP BY GROUP BY C.cname
CSE 414 - Spring 2018 28
Product (pname, price, cid)
Company (cid, cname, city)
1. Subqueries in SELECT
But are these really equivalent?
SELECT DISTINCT C.cname, (SELECT count(*)
FROM Product P
WHERE P.cid=C.cid)
FROM Company C
SELECT C.cname, count(*)
FROM Company C, Product P
WHERE C.cid=P.cid
GROUP BY C.cname
CSE 414 - Spring 2018 29
Product (pname, price, cid)
Company (cid, cname, city)
1. Subqueries in SELECT
But are these really equivalent?
SELECT DISTINCT C.cname, (SELECT count(*)
FROM Product P
WHERE P.cid=C.cid)
FROM Company C
SELECT C.cname, count(*)
FROM Company C, Product P No! Different results if a
WHERE C.cid=P.cid company has no products
GROUP BY C.cname
SELECT C.cname, count(pname)
FROM Company C LEFT OUTER JOIN Product P
ON C.cid=P.cid
GROUP BY C.cname
CSE 414 - Spring 2018 30
Product (pname, price, cid)
Company (cid, cname, city)
2. Subqueries in FROM
Find all products whose prices is > 20 and < 500
SELECT X.pname
FROM (SELECT *
FROM Product AS Y
WHERE price > 20) as X
WHERE X.price < 500
CSE 414 - Spring 2018 31
Product (pname, price, cid)
Company (cid, cname, city)
2. Subqueries in FROM
Find all products whose prices is > 20 and < 500
SELECT X.pname
FROM (SELECT *
FROM Product AS Y
WHERE price > 20) as X
WHERE X.price < 500
Try unnest this query !
CSE 414 - Spring 2018 32
Product (pname, price, cid)
Company (cid, cname, city)
2. Subqueries in FROM
Find all products whose prices is > 20 and < 500
SELECT X.pname
FROM (SELECT *
FROM Product AS Y
Side note: This is not a
WHERE price > 20) as X correlated subquery. (why?)
WHERE X.price < 500
Try unnest this query !
CSE 414 - Spring 2018 33
2. Subqueries in FROM
Sometimes we need to compute an
intermediate table only to use it later in a
SELECT-FROM-WHERE
• Option 1: use a subquery in the FROM
clause
• Option 2: use the WITH clause
– See textbook for details
CSE 414 - Spring 2018 34
Product (pname, price, cid)
Company (cid, cname, city)
2. Subqueries in FROM
SELECT X.pname
FROM (SELECT *
FROM Product AS Y
WHERE price > 20) as X
WHERE X.price < 500
A subquery whose
=
result we called myTable
WITH myTable AS (SELECT * FROM Product AS Y WHERE price > 20)
SELECT X.pname
FROM myTable as X
WHERE X.price < 500
CSE 414 - Spring 2018 35