Lecture 16:
Relational Algebra
Monday, May 10, 2010
Dan Suciu -- 444 Spring 2010 1
Outline
Relational Algebra:
• Chapters 5.1 and 5.2
Dan Suciu -- 444 Spring 2010 2
The WHAT and the HOW
• In SQL we write WHAT we want to get form
the data
• The database system needs to figure out
HOW to get the data we want
• The passage from WHAT to HOW goes
through the Relational Algebra
Data
Dan Independence
Suciu -- 444 Spring 2010 3
SQL = WHAT
Product(pid, name, price)
Purchase(pid, cid, store)
Customer(cid, name, city)
SELECT DISTINCT x.name, z.name
FROM Product x, Purchase y, Customer z
WHERE x.pid = y.pid and y.cid = y.cid and
x.price > 100 and z.city = ‘Seattle’
It’s clear WHATDanwe want,
Suciu unclear
-- 444 Spring 2010 HOW to get it4
Relational Algebra = HOW
Final answer
Product(pid, name, price) δ
Purchase(pid, cid, store) T4(name,name)
Customer(cid, name, city)
Π
x.name,z.name
T3(. . . )
T2( . . . .) σ
price>100 and city=‘Seattle’
T1(pid,name,price,pid,cid,store) cid=cid
Temporary tables pid=pid
T1, T2, . . . Customer
5
Product Purchase
Relational Algebra = HOW
The order is now clearly specified:
Iterate over PRODUCT…
…join with PURCHASE…
…join with CUSTOMER…
…select tuples with Price>100 and
City=‘Seattle’…
…eliminate duplicates…
…and that’s the final answer !
Dan Suciu -- 444 Spring 2010 6
Sets v.s. Bags
• Sets: {a,b,c}, {a,d,e,f}, { }, . . .
• Bags: {a, a, b, c}, {b, b, b, b, b}, . . .
Relational Algebra has two semantics:
• Set semantics
• Bag semantics
Dan Suciu -- 444 Spring 2010 7
Extended Algebra Operators
• Union ∪, intersection ∩, difference -
• Selection σ
• Projection Π
• Join ⨝
• Rename ρ
• Duplicate elimination δ
• Grouping and aggregation γ
• Sorting τ
Dan Suciu -- 444 Spring 2010 8
Relational Algebra (1/3)
The Basic Five operators:
• Union: ∪
• Difference: -
• Selection: σ
• Projection: Π
• Join: ⨝
Dan Suciu -- 444 Spring 2010 9
Relational Algebra (2/3)
Derived or auxiliary operators:
• Renaming: ρ
• Intersection, complement
• Variations of joins
– natural, equi-join, theta join,
semi-join, cartesian product
Dan Suciu -- 444 Spring 2010 10
Relational Algebra (3/3)
Extensions for bags:
• Duplicate elimination: δ
• Group by: γ
• Sorting: τ
Dan Suciu -- 444 Spring 2010 11
Union and Difference
R1 ∪ R2
R1 – R2
What do they mean over bags ?
Dan Suciu -- 444 Spring 2010 12
What about Intersection ?
• Derived operator using minus
R1 ∩ R2 = R1 – (R1 – R2)
• Derived using join (will explain later)
R1 ∩ R2 = R1 ⨝ R2
Dan Suciu -- 444 Spring 2010 13
Selection
• Returns all tuples which satisfy a
condition
σc(R)
• Examples
– σSalary > 40000 (Employee)
– σname = “Smith” (Employee)
• The condition c can be =, <, ≤, >, ≥, <>
Dan Suciu -- 444 Spring 2010 14
Employee
SSN Name Salary
1234545 John 200000
5423341 Smith 600000
4352342 Fred 500000
σSalary > 40000 (Employee)
SSN Name Salary
5423341 Smith 600000
4352342 Fred 500000
Dan Suciu -- 444 Spring 2010 15
Projection
• Eliminates columns
Π A1,…,An (R)
• Example: project social-security number
and names:
– Π SSN, Name (Employee)
– Answer(SSN, Name)
Semantics differs over set or over bags
Dan Suciu -- 444 Spring 2010 16
Employee
SSN Name Salary
1234545 John 200000
5423341 John 600000
4352342 John 200000
Π Name,Salary (Employee)
Name Salary Name Salary
John 20000 John 20000
John 60000 John 60000
John 20000
Bag semantics
Set semantics
Which is Dan
more efficient to implement ?
Suciu -- 444 Spring 2010 17
Cartesian Product
• Each tuple in R1 with each tuple in R2
R1 × R2
• Very rare in practice; mainly used to
express joins
Dan Suciu -- 444 Spring 2010 18
Employee
Dependent
Name
SSN
EmpSSN
DepName
John
999999999
999999999
Emily
Tony
777777777
777777777
Joe
Employee ✕ Dependent
Name
SSN
EmpSSN
DepName
John
999999999
999999999
Emily
John
999999999
777777777
Joe
Tony
777777777
999999999
Emily
Tony
777777777
777777777
Joe
Dan Suciu -- 444 Spring 2010 19
Renaming
• Changes the schema, not the instance
ρ B1,…,Bn (R)
• Example:
– ρN, S(Employee) Answer(N, S)
Dan Suciu -- 444 Spring 2010 20
Natural Join
R1 ⨝ R2
• Meaning: R1⨝ R2 = ΠA(σ(R1 × R2))
• Where:
– The selection σ checks equality of all
common attributes
– The projection eliminates the duplicate
common attributes
Dan Suciu -- 444 Spring 2010 21
Natural Join
R
A B S
B C
X Y Z U
X Z V W
Y Z Z V
Z V
A B C
R ⨝ S =
X Z U
ΠABC(σR.B=S.B(R × S))
X Z V
Y Z U
Y Z V
Z V W
Dan Suciu -- 444 Spring 2010 22
Natural Join
• Given the schemas R(A, B, C, D), S(A, C,
E), what is the schema of R ⨝ S ?
• Given R(A, B, C), S(D, E), what is R ⨝ S ?
• Given R(A, B), S(A, B), what is R ⨝ S ?
Dan Suciu -- 444 Spring 2010 23
Theta Join
• A join that involves a predicate
R1 ⨝θ R2 = σ θ (R1 × R2)
• Here θ can be any condition
Dan Suciu -- 444 Spring 2010 24
Eq-join
• A theta join where θ is an equality
R1 ⨝A=B R2 = σA=B (R1 × R2)
• This is by far the most used variant of
join in practice
Dan Suciu -- 444 Spring 2010 25
So Which Join Is It ?
• When we write R ⨝ S we usually mean
an eq-join, but we often omit the
equality predicate when it is clear from
the context
Dan Suciu -- 444 Spring 2010 26
Semijoin
R ⋉C S = Π A1,…,An (R ⨝C S)
• Where A1, …, An are the attributes in R
Dan Suciu -- 444 Spring 2010 27
Semijoins in Distributed
Databases
Dependent
Employee
EmpSSN DepName Age Stuff
SSN Name Stuff ... ... ... ....
... ... .... network
Employee ⨝SSN=EmpSSN (σ age>71 (Dependent))
Task: compute the query with minimum amount of data transfer
28
Semijoins in Distributed
Databases
Dependent
Employee
EmpSSN DepName Age Stuff
SSN Name Stuff ... ... ... ....
... ... .... network
Employee ⨝SSN=EmpSSN (σ age>71 (Dependent))
T(SSN) = Π SSN σ age>71 (Dependents)
29
Semijoins in Distributed
Databases
Dependent
Employee
EmpSSN DepName Age Stuff
SSN Name Stuff ... ... ... ....
... ... .... network
Employee ⨝SSN=EmpSSN (σ age>71 (Dependent))
T(SSN) = Π SSN σ age>71 (Dependents)
R = Employee ⨝SSN=EmpSSN T
= Employee ⋉SSN=EmpSSN (σ age>71 (Dependents))
30
Semijoins in Distributed
Databases
Dependent
Employee
EmpSSN DepName Age Stuff
SSN Name Stuff ... ... ... ....
... ... .... network
Employee ⨝SSN=EmpSSN (σ age>71 (Dependent))
T(SSN) = Π SSN σ age>71 (Dependents)
R = Employee ⋉SSN=EmpSSN T
Answer = R ⨝SSN=EmpSSN Dependents
31
Joins R US
• The join operation in all its variants (eq-
join, natural join, semi-join, outer-join) is
at the heart of relational database
systems
• WHY ?
Dan Suciu -- 444 Spring 2010 32
Operators on Bags
• Duplicate elimination δ
δ(R) = select distinct * from R
• Grouping γ
γA,sum(B) = select A,sum(B) from R group by A
• Sorting τ
Dan Suciu -- 444 Spring 2010 33
Complex RA Expressions
γ u.name, count(*)
x.ssn=y.buyer-ssn
y.pid=u.pid
y.seller-ssn=z.ssn
Π ssn Π pid
σname=fred σname=gizmo
Person x Purchase y Person z Product u
Dan Suciu -- 444 Spring 2010 34
RA = Dataflow Program
• Several operations, plus strictly
specified order
• In RDBMS the dataflow graph is always
a tree
• Novel applications (s.a. PIG), dataflow
graph may be a DAG
Dan Suciu -- 444 Spring 2010 35
Limitations of RA
• Cannot compute “transitive closure”
Name1 Name2 Relationship
Fred Mary Father
Mary Joe Cousin
Mary Bill Spouse
Nancy Lou Sister
• Find all direct and indirect relatives of Fred
• Cannot express in RA !!! Need to write Java program
• Remember the Bacon number ? Needs TC too !
Dan Suciu -- 444 Spring 2010 36