Introduction to Database Systems
Lectures 4: Joins & Aggregation
(Ch. 6.1-6.4)
1
Outline
• Inner joins (6.2, review)
• Outer joins (6.3.8)
• Aggregations (6.4.3 – 6.4.6)
2
UNIQUE
• PRIMARY KEY adds implicit “NOT NULL” constraint
while UNIQUE does not
– you would have to add this explicitly for UNIQUE:
CREATE TABLE Company(
name VARCHAR(20) NOT NULL, …
UNIQUE (name));
• You almost always want to do this (in real schemas)
– SQL Server behaves strangely with NULL & UNIQUE
– otherwise, think through NULL for every query
– you can remove the NOT NULL constraint later
3
(Inner) Joins
SELECT a1, a2, …, an
FROM R1, R2, …, Rm
WHERE Cond
for t1 in R1: (Nested loop
for t2 in R2: semantics)
...
for tm in Rm:
if Cond(t1.a1, t1.a2, …):
output(t1.a1, t1.a2, …, tm.an)
4
(Inner) joins
Company(cname, country)
Product(pname, price, category, manufacturer)
– manufacturer is foreign key
SELECT DISTINCT cname
FROM Product, Company
WHERE country = ‘USA’ AND category = ‘gadget’AND
manufacturer = cname
5
(Inner) joins
SELECT DISTINCT cname
FROM Product, Company
WHERE country = ‘USA’ AND category = ‘gadget’AND
manufacturer = cname
Product Company
pname category manufacturer cname country
Gizmo gadget GizmoWorks GizmoWorks USA
Camera Photo Hitachi Canon Japan
OneClick Photo Hitachi Hitachi Japan
6
(Inner) joins
SELECT DISTINCT cname
FROM Product, Company
WHERE country = ‘USA’ AND category = ‘gadget’AND
manufacturer = cname
Product Company
pname category manufacturer cname country
Gizmo gadget GizmoWorks GizmoWorks USA
Camera Photo Hitachi Canon Japan
OneClick Photo Hitachi Hitachi Japan
pname category manufacturer cname country
Gizmo gadget GizmoWorks GizmoWorks USA
7
(Inner) joins
SELECT DISTINCT cname
FROM Product, Company
WHERE country = ‘USA’ AND category = ‘gadget’AND
manufacturer = cname
Product Company
pname category manufacturer cname country
Gizmo gadget GizmoWorks GizmoWorks USA
Camera Photo Hitachi Canon Japan
OneClick Photo Hitachi Hitachi Japan
Not output because country != ‘USA’
(also cname != manufacturer)
8
(Inner) joins
SELECT DISTINCT cname
FROM Product, Company
WHERE country = ‘USA’ AND category = ‘gadget’AND
manufacturer = cname
Product Company
pname category manufacturer cname country
Gizmo gadget GizmoWorks GizmoWorks USA
Camera Photo Hitachi Canon Japan
OneClick Photo Hitachi Hitachi Japan
Not output because country != ‘USA’
9
(Inner) joins
SELECT DISTINCT cname
FROM Product, Company
WHERE country = ‘USA’ AND category = ‘gadget’AND
manufacturer = cname
Product Company
pname category manufacturer cname country
Gizmo gadget GizmoWorks GizmoWorks USA
Camera Photo Hitachi Canon Japan
OneClick Photo Hitachi Hitachi Japan
Not output because category != ‘gadget’ (and …)
10
(Inner) joins
SELECT DISTINCT cname
FROM Product, Company
WHERE country = ‘USA’ AND category = ‘gadget’AND
manufacturer = cname
Product Company
pname category manufacturer cname country
Gizmo gadget GizmoWorks GizmoWorks USA
Camera Photo Hitachi Canon Japan
OneClick Photo Hitachi Hitachi Japan
Not output because category != ‘gadget’
11
(Inner) joins
SELECT DISTINCT cname
FROM Product, Company
WHERE country = ‘USA’ AND category = ‘gadget’AND
manufacturer = cname
Product Company
pname category manufacturer cname country
Gizmo gadget GizmoWorks GizmoWorks USA
Camera Photo Hitachi Canon Japan
OneClick Photo Hitachi Hitachi Japan
Not output because category != ‘gadget’
12
(Inner) joins
SELECT DISTINCT cname
FROM Product, Company
WHERE country = ‘USA’ AND category = ‘gadget’AND
manufacturer = cname
Product Company
pname category manufacturer cname country
Gizmo gadget GizmoWorks GizmoWorks USA
Camera Photo Hitachi Canon Japan
OneClick Photo Hitachi Hitachi Japan
Not output because category != ‘gadget’ (with any Company)
13
(Inner) joins
SELECT DISTINCT cname
FROM Product, Company
WHERE country = ‘USA’ AND category = ‘gadget’AND
manufacturer = cname
Product Company
pname category manufacturer cname country
Gizmo gadget GizmoWorks GizmoWorks USA
Camera Photo Hitachi Canon Japan
OneClick Photo Hitachi Hitachi Japan
restrict to category = ‘gadget’
14
(Inner) joins
SELECT DISTINCT cname
FROM Product, Company
WHERE country = ‘USA’ AND category = ‘gadget’AND
manufacturer = cname
Product (where category = ‘gadget’) Company
pname category manufacturer cname country
Gizmo gadget GizmoWorks GizmoWorks USA
Canon Japan
Hitachi Japan
restrict to country = ‘USA’
15
(Inner) joins
SELECT DISTINCT cname
FROM Product, Company
WHERE country = ‘USA’ AND category = ‘gadget’AND
manufacturer = cname
Product (where category = ‘gadget’) Company (where country = ‘USA’)
pname category manufacturer cname country
Gizmo gadget GizmoWorks GizmoWorks USA
Now only one combination to consider
(Query optimizers do this too.)
16
(Inner) joins
SELECT DISTINCT cname
FROM Product, Company
WHERE country = ‘USA’ AND category = ‘gadget’AND
manufacturer = cname
Alternative syntax:
SELECT DISTINCT cname
FROM Product JOIN Company ON
country = ‘USA’ AND category = ‘gadget’AND
manufacturer = cname
Emphasizes that the predicate is part of the join.
17
Self-Joins and Tuple Variables
• Ex: find companies that manufacture both products in
the ‘gadgets’ category and in the ‘photo’ category
• Just joining Company with Product is insufficient:
need to join Company with Product with Product
FROM Company, Product, Product
• When a relation occurs twice in the FROM clause we
call it a self-join; in that case every column name in
Product is ambiguous (why?)
– are you referring to the tuple in the 2nd or 3rd loop?
19
Name Conflicts we used cname / pname
to avoid this problem
• When a name is ambiguous, qualify it:
WHERE Company.name = Product.name AND …
• For self-join, we need to distinguish tables:
FROM Product x, Product y, Company
• These new names are called “tuple variables”
– can think of as name for the variable of each loop
– can also write “Company AS C” etc.
– can make SQL query shorter: C.name vs. Company.name
19
Self-joins
SELECT DISTINCT z.cname
FROM Product x, Product y, Company z
WHERE z.country = ‘USA’
AND x.category = ‘gadget’
AND y.category = ‘photo’
AND x.manufacturer = cname
AND y.manufacturer = cname;
Product Company
pname category manufacturer cname country
Gizmo gadget GizmoWorks GizmoWorks USA
SingleTouch photo Hitachi Hitachi Japan
MultiTouch photo GizmoWorks
20
Self-joins
SELECT DISTINCT z.cname
FROM Product x, Product y, Company z
WHERE z.country = ‘USA’
AND x.category = ‘gadget’
AND y.category = ‘photo’
AND x.manufacturer = cname
AND y.manufacturer = cname;
Product Company
x pname category manufacturer cname country
Gizmo gadget GizmoWorks GizmoWorks USA
SingleTouch photo Hitachi Hitachi Japan
MultiTouch photo GizmoWorks
21
Self-joins
SELECT DISTINCT z.cname
FROM Product x, Product y, Company z
WHERE z.country = ‘USA’
AND x.category = ‘gadget’
AND y.category = ‘photo’
AND x.manufacturer = cname
AND y.manufacturer = cname;
Product Company
x pname category manufacturer cname country
y Gizmo gadget GizmoWorks GizmoWorks USA
SingleTouch photo Hitachi Hitachi Japan
MultiTouch photo GizmoWorks
22
Self-joins
SELECT DISTINCT z.cname
FROM Product x, Product y, Company z
WHERE z.country = ‘USA’
AND x.category = ‘gadget’
AND y.category = ‘photo’
AND x.manufacturer = cname
AND y.manufacturer = cname;
Product Company
x pname category manufacturer cname country z
y Gizmo gadget GizmoWorks GizmoWorks USA
SingleTouch photo Hitachi Hitachi Japan
MultiTouch photo GizmoWorks
restrict to country = ‘USA’
Not output because y.category != ‘photo’ 24
Self-joins
SELECT DISTINCT z.cname
FROM Product x, Product y, Company z
WHERE z.country = ‘USA’
AND x.category = ‘gadget’
AND y.category = ‘photo’
AND x.manufacturer = cname
AND y.manufacturer = cname;
Product Company
x pname category manufacturer cname country z
Gizmo gadget GizmoWorks GizmoWorks USA
y SingleTouch photo Hitachi Hitachi Japan
MultiTouch photo GizmoWorks
Not output because y.manufacturer != cname
25
Self-joins
SELECT DISTINCT z.cname
FROM Product x, Product y, Company z
WHERE z.country = ‘USA’
AND x.category = ‘gadget’
AND y.category = ‘photo’
AND x.manufacturer = cname
AND y.manufacturer = cname;
Product Company
x pname category manufacturer cname country z
Gizmo gadget GizmoWorks GizmoWorks USA
SingleTouch photo Hitachi Hitachi Japan
y MultiTouch photo GizmoWorks
x.pname x.category x.manufacturer y.pname y.category y.manufacturer z.cname z.country
Gizmo gadget GizmoWorks MultiTouch Photo GizmoWorks GizmoWorks 26USA
Outer joins
Product(name, category)
Purchase(prodName, store) -- prodName is foreign key
SELECT Product.name, …, Purchase.store
FROM Product, Purchase
WHERE Product.name = Purchase.prodName
Or equivalently:
SELECT Product.name, …, Purchase.store
FROM Product JOIN Purchase ON
Product.name = Purchase.prodName
But some Products may not be not listed. Why? 27
Outer joins
Product(name, category)
Purchase(prodName, store) -- prodName is foreign key
If we want to include products that never sold,
then we need an “outer join”:
SELECT Product.name, …, Purchase.store
FROM Product LEFT OUTER JOIN Purchase ON
Product.name = Purchase.prodName
28
SELECT Product.name, Purchase.store
FROM Product JOIN Purchase ON
Product.name = Purchase.prodName
Product Purchase
Name Category ProdName Store
Gizmo gadget Gizmo Wiz
Camera Photo Camera Ritz
OneClick Photo Camera Wiz
28
SELECT Product.name, Purchase.store
FROM Product JOIN Purchase ON
Product.name = Purchase.prodName
Product Purchase
Name Category ProdName Store
Gizmo gadget Gizmo Wiz
Camera Photo Camera Ritz
OneClick Photo Camera Wiz
29
SELECT Product.name, Purchase.store
FROM Product JOIN Purchase ON
Product.name = Purchase.prodName
Product Purchase
Name Category ProdName Store
Gizmo gadget Gizmo Wiz
Camera Photo Camera Ritz
OneClick Photo Camera Wiz
Name Store
Gizmo Wiz
30
SELECT Product.name, Purchase.store
FROM Product JOIN Purchase ON
Product.name = Purchase.prodName
Product Purchase
Name Category ProdName Store
Gizmo gadget Gizmo Wiz
Camera Photo Camera Ritz
OneClick Photo Camera Wiz
Name Store
Gizmo Wiz
31
SELECT Product.name, Purchase.store
FROM Product JOIN Purchase ON
Product.name = Purchase.prodName
Product Purchase
Name Category ProdName Store
Gizmo gadget Gizmo Wiz
Camera Photo Camera Ritz
OneClick Photo Camera Wiz
Name Store
Gizmo Wiz
32
SELECT Product.name, Purchase.store
FROM Product JOIN Purchase ON
Product.name = Purchase.prodName
Product Purchase
Name Category ProdName Store
Gizmo gadget Gizmo Wiz
Camera Photo Camera Ritz
OneClick Photo Camera Wiz
Name Store
Gizmo Wiz
33
SELECT Product.name, Purchase.store
FROM Product JOIN Purchase ON
Product.name = Purchase.prodName
Product Purchase
Name Category ProdName Store
Gizmo gadget Gizmo Wiz
Camera Photo Camera Ritz
OneClick Photo Camera Wiz
Name Store
Gizmo Wiz
Camera Ritz
34
SELECT Product.name, Purchase.store
FROM Product JOIN Purchase ON
Product.name = Purchase.prodName
Product Purchase
Name Category ProdName Store
Gizmo gadget Gizmo Wiz
Camera Photo Camera Ritz
OneClick Photo Camera Wiz
Name Store
Gizmo Wiz
Camera Ritz
Camera Wiz
35
SELECT Product.name, Purchase.store
FROM Product JOIN Purchase ON
Product.name = Purchase.prodName
Product Purchase
Name Category ProdName Store
Gizmo gadget Gizmo Wiz
Camera Photo Camera Ritz
OneClick Photo Camera Wiz
Name Store
Gizmo Wiz
Camera Ritz
Camera Wiz
36
SELECT Product.name, Purchase.store
FROM Product LEFT OUTER JOIN Purchase ON
Product.name = Purchase.prodName
Product Purchase
Name Category ProdName Store
Gizmo gadget Gizmo Wiz
Camera Photo Camera Ritz
OneClick Photo Camera Wiz
Name Store
Gizmo Wiz
Camera Ritz
Camera Wiz
37
SELECT Product.name, Purchase.store
FROM Product LEFT OUTER JOIN Purchase ON
Product.name = Purchase.prodName
Product Purchase
Name Category ProdName Store
Gizmo gadget Gizmo Wiz
Camera Photo Camera Ritz
OneClick Photo Camera Wiz
Name Store
Gizmo Wiz
Camera Ritz
Camera Wiz
OneClick NULL
38
SELECT Product.name, Purchase.store
FROM Product RIGHT OUTER JOIN Purchase ON
Product.name = Purchase.prodName
Product Purchase
Name Category
ProdName Store
Gizmo gadget
Gizmo Wiz
Camera Photo
Camera Ritz
OneClick Photo
Camera Wiz
Name Store Phone Foo
Gizmo Wiz
Camera Ritz
Camera Wiz
NULL Foo
39
SELECT Product.name, Purchase.store
FROM Product FULL OUTER JOIN Purchase ON
Product.name = Purchase.prodName
Product Purchase
Name Category
ProdName Store
Gizmo gadget
Gizmo Wiz
Camera Photo
Camera Ritz
OneClick Photo
Camera Wiz
Name Store Phone Foo
Gizmo Wiz
Camera Ritz
Camera Wiz
OneClick NULL
l
NULL Foo
40
Outer Joins
• Left outer join:
– Include the left tuple even if there’s no match
• Right outer join:
– Include the right tuple even if there’s no match
• Full outer join:
– Include both left and right tuples even if there’s no
match
• (Also something called a UNION JOIN, though it’s rarely used.)
• (Actually, all of these are used much more rarely than inner joins.)
41
Outer Joins Example
See lec04-sql-outer-joins.sql…
42
Aggregation in SQL
>sqlite3 lecture04
sqlite> create table Purchase(
pid int primary key,
product text,
price float, Other DBMSs have
quantity int, other ways of
month varchar(15)); importing data
sqlite> -- download data.txt
sqlite> .import lec04-data.txt Purchase
43
Comment about SQLite
• One cannot load NULL values such that they
are actually loaded as null values
• So we need to use two steps:
– Load null values using some type of special value
– Update the special values to actual null values
update Purchase
set price = null
where price = ‘null’
44
Simple Aggregations
Five basic aggregate operations in SQL
select count(*) from Purchase
select sum(quantity) from Purchase
select avg(price) from Purchase
select max(quantity) from Purchase
select min(quantity) from Purchase
Except count, all aggregations apply to a single value
45
Aggregates and NULL Values
Null values are not used in aggregates
insert into Purchase
values(12, 'gadget', NULL, NULL, 'april')
Let’s try the following
select count(*) from Purchase
select count(quantity) from Purchase
select sum(quantity) from Purchase
select sum(quantity)
from Purchase
where quantity is not null; 46
Aggregates and NULL Values
Null values are not used in aggregates
insert into Purchase
values(12, 'gadget', NULL, NULL, 'april')
Let’s try the following
select count(*) from Purchase
-- NULL is counted in count(*)
select count(quantity) from Purchase
-- NULL is ignored in count(quantity)
select sum(quantity) from Purchase
select sum(quantity)
from Purchase
where quantity is not null;
-- “is not null” is redundant
47
Counting Duplicates
COUNT applies to duplicates, unless otherwise stated:
SELECT Count(product) same as Count(*) if no nulls
FROM Purchase
WHERE price > 4.99
We probably want:
SELECT Count(DISTINCT product)
FROM Purchase
WHERE price> 4.99
48
More Examples
SELECT Sum(price * quantity)
FROM Purchase
What do
SELECT Sum(price * quantity) they mean ?
FROM Purchase
WHERE product = ‘bagel’
49
Simple Aggregations
Purchase Product Price Quantity
Bagel 3 20
Bagel 1.50 20
Banana 0.5 50
Banana 2 10
Banana 4 10
SELECT Sum(price * quantity)
FROM Purchase 90 (= 60+30)
WHERE product = ‘Bagel’
50
Simple Aggregations
Purchase Product Price Quantity
Bagel 3 20
Bagel 1.50 20
Banana 0.5 50
Banana 2 10
Banana 4 10
SELECT Sum(price * quantity)
FROM Purchase 90 (= 60+30)
WHERE product = ‘Bagel’
51
More Examples
How can we find the average revenue per sale?
SELECT sum(price * quantity) / count(*)
FROM Purchase
WHERE product = ‘bagel’
How can we find the average price of a bagel sold?
SELECT sum(price * quantity) / sum(quantity)
FROM Purchase
WHERE product = ‘bagel’
52
More Examples
SELECT sum(price * quantity) / count(*)
FROM Purchase
WHERE product = ‘bagel’
SELECT sum(price * quantity) / sum(quantity)
FROM Purchase
WHERE product = ‘bagel’
What happens if there are NULLs in price or quantity?
Lesson: disallow NULLs unless you need to handle them
53