Database Systems
Save this note as PDF
In our Part 1 note we ony ooked into querying from one tabe. Often, however, the data
we need to answer a question wi be stored in mutipe tabes. To query from two tabes
and combine the resuts we use a join.
Cross Join
The simpest join is caed cross join, which is aso known as a cross product or a
cartesian product. A cross join is the resut of combining every row from the eft tabe
with every row from the right tabe. To do a cross join, simpy comma separate the
tabes you woud ike to join. Here is an exampe:
SELECT *
FROM courses, enrollment;
If the courses tabe ooked ike this:
num name
CS186 DB
CS188 AI
CS189 ML
And the enroment tabe ooked ike this:
c_num students
CS186 700
CS188 800
The resut of the query woud be:
num name c_num students
num name c_num students
CS186 DB CS186 700
CS186 DB CS188 800
CS188 AI CS186 700
CS188 AI CS188 800
CS189 ML CS186 700
CS189 ML CS188 800
The cartesian product often contains much more information than we are actuay
interested in. Letʼs say we wanted a of the information about a course (num, name, and
num of students enroed in it). We cannot just bindy join every row from the eft tabe
with every row from the right tabe. There are rows that have two different courses in
them! To account for this we wi add a join condition in the WHERE cause to ensure that
each row is ony about one cass.
To get the enroment information for a course propery we want to make sure that num
in the courses tabe is equa to c_num in the enroment tabe because they are the same
thing. The correct query is:
SELECT *
FROM courses, enrollment;
WHERE num = c_num
which produces:
num name c_num students
CS186 DB CS186 700
CS188 AI CS188 800
Notice that CS189, which was present in the courses tabe but not in the enrollment tabe,
is not incuded. Since it does not appear as a c_num vaue in enroment, it cannot fufi
the join condition num = c_num .
If we reay want CS189 to appear anyway, there are ways to do this that we wi discuss
ater.
Inner Join
The cross join works great, but it seems a itte soppy. We are incuding join ogic in the
WHERE cause. It can be difficut to find what the join condition is. In contrast, the inner
join aows you to specify the condition in the ON cause. Here is the syntax:
SELECT column_name(s)
FROM table1
INNER JOIN table2
ON table1_column_name = table2_column_name;
The table1_column_name = table2_column_name is the join condition. Letʼs write the query
that gets us a of the course information as an inner join:
SELECT *
FROM courses INNER JOIN enrollment
ON num = c_num ;
This query is ogicay the exact same query as the query we ran in the previous
section. The inner join is essentiay syntatic sugar for a cross join with a join condition
in the WHERE cause ike we demonstrated before.
Outer Joins
Now etʼs address the probem we encountered before when we eft out CS189 because
it did not have any enroment information. This situation happens frequenty. We sti
want to keep a the data from a reation even if it does not have a “matchˮ in the tabe
we are joining it with. To fix this probem we wi use a eft outer join. The eft outer join
makes sure that every row from the eft tabe wi appear in the output. If a row does not
have any matches with the right tabe, the row is sti incuded and the coumns from the
right tabe are fied in with NULL . Letʼs fix our query:
SELECT *
FROM courses LEFT OUTER JOIN enrollment
ON num = c_num;
This wi produce the foowing output:
num name c_num students
CS186 DB CS186 700
CS188 AI CS188 800
CS189 ML NULL NULL
Notice that CS189 is now incuded and the coumns that shoud be from the right tabe
(c_num, students) are NULL .
The right outer join is the exact same thing as the eft outer join but it keeps a the rows
from the right tabe instead of the eft tabe. The foowing query is identica to the query
above that uses the eft outer join:
SELECT *
FROM enrollment RIGHT OUTER JOIN courses
ON num = c_num;
Notice that I fipped the order of the joins and changed LEFT to RIGHT because now
courses is on the right side.
Letʼs say we add a row to our enroment tabe now:
c_num students
CS186 700
CS188 800
CS160 400
But we sti want to present a of the information that we have. If we just use a eft or a
right join we have to pick between using a of the information in the eft tabe or a of
the information in the right tabe. With what we know so far, it is impossibe for us to
incude the information that we have about both CS189 and CS160 because they occur
in different tabes and do not have matches in the other tabe. To fix this we can use the
fu outer join which guarantees that a rows from each tabe wi appear in the output.
If a row from either tabe does not have a match it wi sti show up in the output and the
coumns from the other tabe in the join wi be NULL .
To incude a of data we have etʼs change the query to be:
SELECT *
FROM courses FULL OUTER JOIN enrollment
ON num = c_num;
which produces the foowing output:
num name c_num students
CS186 DB CS186 700
CS188 AI CS188 800
CS189 ML NULL NULL
NULL NULL CS160 400
Name Conficts
Up to this point our tabes have had coumns with different names. But what happens if
we change the enroment tabe so that itʼs c_num coumn is now caed num ?
num students
CS186 700
CS188 800
CS160 400
Now there is a num coumn in both tabes, so simpy using num in your query is
ambiguous. We now have to specify which tabeʼs coumn we are referring to. To do
this, we put the tabe name and a period in front of the coumn name. Here is an
exampe of doing an inner join of the two tabes now:
SELECT *
FROM courses INNER JOIN enrollment
ON courses.num = enrollment.num
The resut is:
num name num students
num name num students
CS186 DB CS186 700
CS188 AI CS188 800
It can be annoying to type out the entire tabe name each time we refer to it, so instead
we can aias the tabe name. This aows us to rename the tabe for the rest of the query
as something ese (usuay ony a few characters). To do this, after isting the tabe in
the FROM we add AS <alias name> . Here is an equivaent query that uses aiases:
SELECT *
FROM courses AS c INNER JOIN enrollment AS e
ON c.num = e.num;
The resut is:
num name num students
CS186 DB CS186 700
CS188 AI CS188 800
Aiases can aso be used in the SELECT cause to rename the coumn names of the
output. If we execute the foowing query:
SELECT c.num AS num1, c.name, e.num AS num2, e.students
FROM courses AS c INNER JOIN enrollment AS e
ON c.num = e.num;
The output wi be:
num1 name num2 students
CS186 DB CS186 700
CS188 AI CS188 800
Natura Join
Often in reationa databases, the coumns you want to join on wi have the same name.
To make it easier to write queries, SQL has the natura join which automaticay does an
equijoin (equijoin = checks if coumns are equivaent) on coumns with the same name
in different tabes. The foowing query is the same as expicity doing an inner join on
the num coumns in each tabe:
SELECT *
FROM courses NATURAL JOIN enrollment;
The join condition: courses.num = enrollment.num is impicit. Whie this is convenient,
natura joins are not often used in practice because they are confusing to read and
because adding coumns that are not reated to the query can change the output.
Subqueries
Subqueries aow you to write more powerfu queries. Letʼs ook at an exampe…
Letʼs say you want to find the course num of every course that has a higher than
average num of students. You cannot incude an aggregation expression (ike AVG) in
the WHERE cause because aggregation happens after rows have been fitered. This
may seem chaenging at first, but sub-queries make it easy:
SELECT num
FROM enrollment;
WHERE students >= (
SELECT AVG(students)
FROM enrollment;
);
The output of this query is:
num
CS186
CS188
The inner subquery cacuated the average and returned one row. The outer query
compared the students vaue for each row to what the subquery returned to determine if
the row shoud be kept. Note that this query woud be invaid if the subquery returned
more than one row because >= is meaningess for more than one number. If it returned
more than one row we woud have to use a set operator ike ALL .
Correated Subqueries
The subquery can aso be correated with the outer query. Each row essentiay gets
pugged in to the subquery and then the subquery uses the vaues of that row. To
iustrate this point, etʼs write a query that returns a of the casses that appear in both
tabes.
SELECT *
FROM classes
WHERE EXISTS (
SELECT *
FROM enrollment
WHERE classes.num = enrollment.num
);
As expected, this query returns:
num name
CS186 AI
CS188 DB
Letʼs start by examining the subquery. It compares the classes.num (the num of the cass
from the current row) to every enrollment.num and returns the row if they match.
Therefore, the ony rows that wi ever be returned are rows with casses that occur in
each tabe. The EXISTS keyword is a set operator that returns true if any rows are
returned by the subquery and fase if otherwise. For CS186 and CS188 it wi return true
(because a row is returned by the subquery), but for CS189 it wi return fase. There are
a ot of other set operators you shoud know (incuding ANY , ALL , UNION , INTERSECT ,
DIFFERENCE , IN ) but we wi not cover any others in this note (there is penty of
documentation for these operators onine).
Subqueries in the From
You can aso use subqueries in the FROM cause. This ets you create a temporary tabe
to query from. Here is an exampe:
SELECT *
FROM (
SELECT num
FROM classes
) AS a
WHERE num = 'CS186';
Returns:
num
CS186
The subquery returns ony the num coumn of the origina tabe, so ony the num coumn
wi appear in the output. One thing to note is that subqueries in the FROM cannot usuay
be correated with other tabes isted in the FROM . There is a work around for this, but it is
out of scope for this course. A ceaner way of doing this is using common tabe
expressions (or views if you want to reuse the temporary tabe in other queries) but we
wi not cover this in the note.
Subquery Factoring
Subquery factoring can simpify queries by giving a subquery a name to be used ater.
To do this, we use the WITH cause:
WITH courseEnrollment AS (
SELECT c.num AS num1, c.name, e.num AS num2, e.students
FROM courses AS c INNER JOIN enrollment AS e
ON c.num = e.num;
We can then treat courseEnrollment as if it were its own tabe in a future query:
SELECT num1, name, students
FROM courseEnrollment
WHERE students > 700;
Returns:
num1 name students
CS188 AI 800
The subquery returns ony the num coumn of the origina tabe, so ony the num coumn
wi appear in the output. One thing to note is that subqueries in the FROM cannot usuay
be correated with other tabes isted in the FROM . There is a work around for this, but it is
out of scope for this course. A ceaner way of doing this is using common tabe
expressions (or views if you want to reuse the temporary tabe in other queries) but we
wi not cover this in the note.
Practice Questions
We wi reuse the dogs tabe from part 1
CREATE TABLE dogs (
dogid integer,
ownerid integer,
name varchar,
breed varchar,
age integer,
PRIMARY KEY (dogid),
FOREIGN KEY (ownerid) REFERENCES users(userid)
);
and add an owners tabe that ooks ike this:
CREATE TABLE users (
userid integer,
name varchar,
age integer,
PRIMARY KEY (userid)
);
The users own dogs. The ownerid coumn in the dogs tabe corresponds to the userid
coumn of the users tabe ( ownerid is a foreign key that references the users tabe).
1 Write a query that ists the names of a the dogs that “Josh Hugˮ owns.
2 Write the query above using a different kind of join (i.e. if you used an INNER JOIN,
try using a cross join with the join condition in the WHERE.
3 Write a query that finds the name of the user and the number of dogs that user owns
for the user that owns the most dogs in the database. Assume that there are no ties
(i.e. this query shoud ony return 1 user). Users may share the same name.
4 Now write the same query again, but you can no onger assume that there are no
ties.
Soutions
1
SELECT dogs.name
FROM dogs INNER JOIN users ON dogs.ownerid = users.userid
WHERE users.name="Josh Hug";
We now need information from both tabes (the dog name is ony in the dogs tabe
and the owner name is ony in the users tabe). The join condition is
dogs.ownerid=users.userid because we ony want to get rows with the dog and its
owner in it. Finay we add the predicate to the WHERE cause to ony get Joshʼs
dogs.
2
SELECT dogs.name
FROM dogs, users
WHERE dogs.ownerid = users.userid and users.name="Josh Hug";
We first do a cross join and then add our join condition to the WHERE cause.
3
SELECT users.name, COUNT(*)
FROM users INNER JOIN dogs on users.userid = dogs.ownerid
GROUP BY users.userid, users.name
ORDER BY COUNT(*) DESC
LIMIT 1;
Simiary to question 2 in the part 1 notes, we can use an ORDER BY combined with a
LIMIT to seect the first n most rows (with n being 1 in this case). We GROUP BY the
name because we want our groups to be a about one user. We have to incude
userid in the GROUP BY, because users may share the same name.
4
SELECT users.name, COUNT(*)
FROM users INNER JOIN dogs ON users.userid = dogs.ownerid
GROUP BY users.userid, users.name
HAVING COUNT(*) >= all(
SELECT COUNT(*)
FROM dogs
GROUP BY ownerid
The inner query gets the number of dogs owned by each owner. The owner(s) with
the max number of dogs must have a number of dogs that is >= a these rows in
order to be the max. We put this condition in the HAVING rather than the WHERE
cause because it pertains to the groups not the individua rows.