[go: up one dir, main page]

0% found this document useful (0 votes)
25 views12 pages

SQL 2

The document discusses various types of SQL joins used to query data from multiple tables, including cross joins, inner joins, outer joins, and natural joins. It explains how to structure queries to combine data effectively while addressing potential issues like name conflicts and the use of subqueries for more complex queries. Additionally, it highlights the importance of specifying join conditions and the differences between different join types to ensure accurate data retrieval.

Uploaded by

shsocial1
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
25 views12 pages

SQL 2

The document discusses various types of SQL joins used to query data from multiple tables, including cross joins, inner joins, outer joins, and natural joins. It explains how to structure queries to combine data effectively while addressing potential issues like name conflicts and the use of subqueries for more complex queries. Additionally, it highlights the importance of specifying join conditions and the differences between different join types to ensure accurate data retrieval.

Uploaded by

shsocial1
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 12

Database Systems

Save this note as PDF

In our Part 1 note we ony ooked into querying from one tabe. Often, however, the data
we need to answer a question wi be stored in mutipe tabes. To query from two tabes
and combine the resuts we use a join.

Cross Join
The simpest join is caed cross join, which is aso known as a cross product or a
cartesian product. A cross join is the resut of combining every row from the eft tabe
with every row from the right tabe. To do a cross join, simpy comma separate the
tabes you woud ike to join. Here is an exampe:

SELECT *

FROM courses, enrollment;

If the courses tabe ooked ike this:

num name

CS186 DB

CS188 AI

CS189 ML

And the enroment tabe ooked ike this:

c_num students

CS186 700

CS188 800

The resut of the query woud be:

num name c_num students


num name c_num students

CS186 DB CS186 700

CS186 DB CS188 800

CS188 AI CS186 700

CS188 AI CS188 800

CS189 ML CS186 700

CS189 ML CS188 800

The cartesian product often contains much more information than we are actuay
interested in. Letʼs say we wanted a of the information about a course (num, name, and
num of students enroed in it). We cannot just bindy join every row from the eft tabe
with every row from the right tabe. There are rows that have two different courses in
them! To account for this we wi add a join condition in the WHERE cause to ensure that
each row is ony about one cass.

To get the enroment information for a course propery we want to make sure that num

in the courses tabe is equa to c_num in the enroment tabe because they are the same
thing. The correct query is:

SELECT *

FROM courses, enrollment;

WHERE num = c_num

which produces:

num name c_num students

CS186 DB CS186 700

CS188 AI CS188 800

Notice that CS189, which was present in the courses tabe but not in the enrollment tabe,
is not incuded. Since it does not appear as a c_num vaue in enroment, it cannot fufi
the join condition num = c_num .

If we reay want CS189 to appear anyway, there are ways to do this that we wi discuss
ater.

Inner Join
The cross join works great, but it seems a itte soppy. We are incuding join ogic in the
WHERE cause. It can be difficut to find what the join condition is. In contrast, the inner
join aows you to specify the condition in the ON cause. Here is the syntax:

SELECT column_name(s)

FROM table1

INNER JOIN table2

ON table1_column_name = table2_column_name;

The table1_column_name = table2_column_name is the join condition. Letʼs write the query
that gets us a of the course information as an inner join:

SELECT *

FROM courses INNER JOIN enrollment

ON num = c_num ;

This query is ogicay the exact same query as the query we ran in the previous
section. The inner join is essentiay syntatic sugar for a cross join with a join condition
in the WHERE cause ike we demonstrated before.

Outer Joins
Now etʼs address the probem we encountered before when we eft out CS189 because
it did not have any enroment information. This situation happens frequenty. We sti
want to keep a the data from a reation even if it does not have a “matchˮ in the tabe
we are joining it with. To fix this probem we wi use a eft outer join. The eft outer join
makes sure that every row from the eft tabe wi appear in the output. If a row does not
have any matches with the right tabe, the row is sti incuded and the coumns from the
right tabe are fied in with NULL . Letʼs fix our query:

SELECT *

FROM courses LEFT OUTER JOIN enrollment

ON num = c_num;

This wi produce the foowing output:


num name c_num students

CS186 DB CS186 700

CS188 AI CS188 800

CS189 ML NULL NULL

Notice that CS189 is now incuded and the coumns that shoud be from the right tabe
(c_num, students) are NULL .

The right outer join is the exact same thing as the eft outer join but it keeps a the rows
from the right tabe instead of the eft tabe. The foowing query is identica to the query
above that uses the eft outer join:

SELECT *

FROM enrollment RIGHT OUTER JOIN courses

ON num = c_num;

Notice that I fipped the order of the joins and changed LEFT to RIGHT because now
courses is on the right side.

Letʼs say we add a row to our enroment tabe now:

c_num students

CS186 700

CS188 800

CS160 400

But we sti want to present a of the information that we have. If we just use a eft or a
right join we have to pick between using a of the information in the eft tabe or a of
the information in the right tabe. With what we know so far, it is impossibe for us to
incude the information that we have about both CS189 and CS160 because they occur
in different tabes and do not have matches in the other tabe. To fix this we can use the
fu outer join which guarantees that a rows from each tabe wi appear in the output.
If a row from either tabe does not have a match it wi sti show up in the output and the
coumns from the other tabe in the join wi be NULL .

To incude a of data we have etʼs change the query to be:


SELECT *

FROM courses FULL OUTER JOIN enrollment

ON num = c_num;

which produces the foowing output:

num name c_num students

CS186 DB CS186 700

CS188 AI CS188 800

CS189 ML NULL NULL

NULL NULL CS160 400

Name Conficts
Up to this point our tabes have had coumns with different names. But what happens if
we change the enroment tabe so that itʼs c_num coumn is now caed num ?

num students

CS186 700

CS188 800

CS160 400

Now there is a num coumn in both tabes, so simpy using num in your query is
ambiguous. We now have to specify which tabeʼs coumn we are referring to. To do
this, we put the tabe name and a period in front of the coumn name. Here is an
exampe of doing an inner join of the two tabes now:

SELECT *

FROM courses INNER JOIN enrollment

ON courses.num = enrollment.num

The resut is:

num name num students


num name num students

CS186 DB CS186 700

CS188 AI CS188 800

It can be annoying to type out the entire tabe name each time we refer to it, so instead
we can aias the tabe name. This aows us to rename the tabe for the rest of the query
as something ese (usuay ony a few characters). To do this, after isting the tabe in
the FROM we add AS <alias name> . Here is an equivaent query that uses aiases:

SELECT *

FROM courses AS c INNER JOIN enrollment AS e

ON c.num = e.num;

The resut is:

num name num students

CS186 DB CS186 700

CS188 AI CS188 800

Aiases can aso be used in the SELECT cause to rename the coumn names of the
output. If we execute the foowing query:

SELECT c.num AS num1, c.name, e.num AS num2, e.students

FROM courses AS c INNER JOIN enrollment AS e

ON c.num = e.num;

The output wi be:

num1 name num2 students

CS186 DB CS186 700

CS188 AI CS188 800

Natura Join
Often in reationa databases, the coumns you want to join on wi have the same name.
To make it easier to write queries, SQL has the natura join which automaticay does an
equijoin (equijoin = checks if coumns are equivaent) on coumns with the same name
in different tabes. The foowing query is the same as expicity doing an inner join on
the num coumns in each tabe:

SELECT *

FROM courses NATURAL JOIN enrollment;

The join condition: courses.num = enrollment.num is impicit. Whie this is convenient,


natura joins are not often used in practice because they are confusing to read and
because adding coumns that are not reated to the query can change the output.

Subqueries
Subqueries aow you to write more powerfu queries. Letʼs ook at an exampe…

Letʼs say you want to find the course num of every course that has a higher than
average num of students. You cannot incude an aggregation expression (ike AVG) in
the WHERE cause because aggregation happens after rows have been fitered. This
may seem chaenging at first, but sub-queries make it easy:

SELECT num

FROM enrollment;

WHERE students >= (

SELECT AVG(students)

FROM enrollment;

);

The output of this query is:

num

CS186

CS188

The inner subquery cacuated the average and returned one row. The outer query
compared the students vaue for each row to what the subquery returned to determine if
the row shoud be kept. Note that this query woud be invaid if the subquery returned
more than one row because >= is meaningess for more than one number. If it returned
more than one row we woud have to use a set operator ike ALL .

Correated Subqueries
The subquery can aso be correated with the outer query. Each row essentiay gets
pugged in to the subquery and then the subquery uses the vaues of that row. To
iustrate this point, etʼs write a query that returns a of the casses that appear in both
tabes.

SELECT *

FROM classes

WHERE EXISTS (

SELECT *

FROM enrollment

WHERE classes.num = enrollment.num

);

As expected, this query returns:

num name

CS186 AI

CS188 DB

Letʼs start by examining the subquery. It compares the classes.num (the num of the cass
from the current row) to every enrollment.num and returns the row if they match.
Therefore, the ony rows that wi ever be returned are rows with casses that occur in
each tabe. The EXISTS keyword is a set operator that returns true if any rows are
returned by the subquery and fase if otherwise. For CS186 and CS188 it wi return true
(because a row is returned by the subquery), but for CS189 it wi return fase. There are
a ot of other set operators you shoud know (incuding ANY , ALL , UNION , INTERSECT ,

DIFFERENCE , IN ) but we wi not cover any others in this note (there is penty of
documentation for these operators onine).

Subqueries in the From


You can aso use subqueries in the FROM cause. This ets you create a temporary tabe
to query from. Here is an exampe:
SELECT *

FROM (

SELECT num

FROM classes

) AS a

WHERE num = 'CS186';

Returns:

num

CS186

The subquery returns ony the num coumn of the origina tabe, so ony the num coumn
wi appear in the output. One thing to note is that subqueries in the FROM cannot usuay
be correated with other tabes isted in the FROM . There is a work around for this, but it is
out of scope for this course. A ceaner way of doing this is using common tabe
expressions (or views if you want to reuse the temporary tabe in other queries) but we
wi not cover this in the note.

Subquery Factoring
Subquery factoring can simpify queries by giving a subquery a name to be used ater.
To do this, we use the WITH cause:

WITH courseEnrollment AS (

SELECT c.num AS num1, c.name, e.num AS num2, e.students

FROM courses AS c INNER JOIN enrollment AS e

ON c.num = e.num;

We can then treat courseEnrollment as if it were its own tabe in a future query:

SELECT num1, name, students

FROM courseEnrollment

WHERE students > 700;

Returns:
num1 name students

CS188 AI 800

The subquery returns ony the num coumn of the origina tabe, so ony the num coumn
wi appear in the output. One thing to note is that subqueries in the FROM cannot usuay
be correated with other tabes isted in the FROM . There is a work around for this, but it is
out of scope for this course. A ceaner way of doing this is using common tabe
expressions (or views if you want to reuse the temporary tabe in other queries) but we
wi not cover this in the note.

Practice Questions
We wi reuse the dogs tabe from part 1

CREATE TABLE dogs (

dogid integer,

ownerid integer,

name varchar,

breed varchar,

age integer,

PRIMARY KEY (dogid),

FOREIGN KEY (ownerid) REFERENCES users(userid)

);

and add an owners tabe that ooks ike this:

CREATE TABLE users (

userid integer,

name varchar,

age integer,

PRIMARY KEY (userid)

);

The users own dogs. The ownerid coumn in the dogs tabe corresponds to the userid

coumn of the users tabe ( ownerid is a foreign key that references the users tabe).

1 Write a query that ists the names of a the dogs that “Josh Hugˮ owns.
2 Write the query above using a different kind of join (i.e. if you used an INNER JOIN,
try using a cross join with the join condition in the WHERE.
3 Write a query that finds the name of the user and the number of dogs that user owns
for the user that owns the most dogs in the database. Assume that there are no ties
(i.e. this query shoud ony return 1 user). Users may share the same name.
4 Now write the same query again, but you can no onger assume that there are no
ties.

Soutions
1
SELECT dogs.name

FROM dogs INNER JOIN users ON dogs.ownerid = users.userid

WHERE users.name="Josh Hug";

We now need information from both tabes (the dog name is ony in the dogs tabe
and the owner name is ony in the users tabe). The join condition is
dogs.ownerid=users.userid because we ony want to get rows with the dog and its
owner in it. Finay we add the predicate to the WHERE cause to ony get Joshʼs
dogs.

2
SELECT dogs.name

FROM dogs, users

WHERE dogs.ownerid = users.userid and users.name="Josh Hug";

We first do a cross join and then add our join condition to the WHERE cause.

3
SELECT users.name, COUNT(*)

FROM users INNER JOIN dogs on users.userid = dogs.ownerid

GROUP BY users.userid, users.name

ORDER BY COUNT(*) DESC

LIMIT 1;

Simiary to question 2 in the part 1 notes, we can use an ORDER BY combined with a
LIMIT to seect the first n most rows (with n being 1 in this case). We GROUP BY the
name because we want our groups to be a about one user. We have to incude
userid in the GROUP BY, because users may share the same name.

4
SELECT users.name, COUNT(*)

FROM users INNER JOIN dogs ON users.userid = dogs.ownerid


GROUP BY users.userid, users.name

HAVING COUNT(*) >= all(

SELECT COUNT(*)

FROM dogs

GROUP BY ownerid

The inner query gets the number of dogs owned by each owner. The owner(s) with
the max number of dogs must have a number of dogs that is >= a these rows in
order to be the max. We put this condition in the HAVING rather than the WHERE
cause because it pertains to the groups not the individua rows.

You might also like