Introduction to
Relational Model
Outline
Structure of Relational Databases
Database Schema and Instance
Keys
Relational Query Languages
The Relational Algebra
Example of a Relation: Instructor relation
attributes
(or columns)
tuples
(or rows)
Example of a Relation: teaches relation
Attribute
The set of allowed values for each attribute is called the
domain of the attribute
Roll #: Alphanumeric string
Fname, Lname: Alphabetic string
DoB: Date
Passport #: String (Letter followed by 7 digits)- nullable
Aadhaar #: 12-digit number
Department: Alphabetic string
Attribute values are (normally) required to be atomic;
that is, indivisible
The special value null is a member of every domain. It
indicates that the value is “unknown”
Students
Roll # Fname Lname DoB Passport # Aadhaar # Department
15CS1026 Lalit Dubey 27-Mar-1997 L4032464 172861749239 Computer
16EE3029 Jatin Chopra 17-Nov-1996 null 152861649112 Mathematics
Database Schema and Instance
A1, A2, A3…..,An are attributes
R = (A1, A2, A3…..,An) is a relation schema
Example:
instructor = (ID, name, dept_name, salary)
Formally, for given sets D1, D2,…….Dn, a relation r is a subset of
D1 x D2 x…….x Dn
Thus, a relation is a set of n-tuples (a1, a2, ……., an) where
each ai ϵ Di
The current values (relation instance) of a relation are specified
by a table.
An element t of r is a tuple, represented by a row in a table.
Database Schema and Instance
Database schema -- is the logical structure of the database.
Database instance -- is a snapshot of the data in the
database at a given instant of time.
Example:
schema: instructor (ID, name, dept_name, salary)
Instance:
Relations are Unordered
Order of tuples is irrelevant (tuples may be stored in an
arbitrary order)
Example: instructor relation with unordered tuples
Keys
Let K R
K is a super key of R if values for K are sufficient to identify a
unique tuple of each possible relation r(R)
Example: {ID} and {ID, name} are both superkeys of instructor.
Superkey K is a candidate key if K is minimal
Example: {ID} is a candidate key for Instructor
One of the candidate keys is selected to be the primary key.
A surrogate key (or synthetic key) in a database is a unique
identifier for an entity set in the modeled world (or database)
The surrogate key is not derived from application data, unlike a natural
(or business) key, which is derived from application data
Keys
Super Key: Roll#, {Roll#, DoB}, …………
Candidate Keys: Roll#, Aadhaar#, {Fname, Lname}
Passport# cannot be a key. Why?
Primary Key: Roll#
Secondary/Alternate Key: {Fname, Lname}, Aadhaar#
Simple Key: Consists of a single attribute
Composite Key: {Fname, Lname}
Consists of more than one attribute to uniquely identify an entity
One or more of the attributes, which make up the key, are not simple
keys in their own right
Students
Roll # Fname Lname DoB Passport # Aadhaar # Department
15CS1026 Lalit Dubey 27-Mar-1997 L4032464 172861749239 Computer
16EE3029 Jatin Chopra 17-Nov-1996 null 152861649112 Mathematics
16EC2016 Smriti Mongra 23-Dec-1996 G3200125 190852699005 Electronics
16CE4038 Ramdin Minz 21-Jan-1995 X8300127 165721079131 Civil
Keys
Foreign key constraint: Value in one relation must appear in another
relation
Referencing relation (Jo kar rha hai)
Enrolment: Foreign keys- Roll#, Course#
Referenced relations (Jo use ho rha hai)
Students, Courses
A compound key consists of more than one attribute to uniquely
identify an entity occurrence
Each attribute, which makes up the key, is a simple key in its own right
{Roll#, Course#}
Students
Roll # Fname Lname DoB Passport # Aadhaar # Department
Courses
Course# Course Name Credits L-T-P Department
Enrolment
Roll # Course# Instructor ID
Relational Query Languages
Procedural vs. Non-procedural (or Declarative) languages
Procedural programming requires that the programmer tell
the computer what to do
That is, how to get the output from given inputs
The programmer must know an appropriate algorithm
Non-procedural programming: The programmer does not
provide a specific procedure for obtaining output
The programmer only need to know what relationships hold between
input and output
Example: Root of n is m such that m2 = n
Relational Query Languages
“Pure” query languages:
Relational algebra
Tuple relational calculus
Domain relational calculus
The above 3 pure languages are equivalent in computing
power
We will concentrate here on relational algebra
Relational Algebra
A query language consisting of a set of operations that take
one or two relations as input and produces a new relation
as the output
Basic operators
Select:
Project:
Union:
Set difference: –
Cartesian product: x
Rename:
These operators take one or two relations as inputs and
produce a new relation as output
Select Operation-Selection of rows (tuples)
The select operation selects tuples that satisfy a given predicate
Notation: p(r)
p is called the selection predicate
Example: Select those tuples of the instructor relation where the
instructor is from the “Physics” department
Query
dept_name=“Physics” (instructor)
Result
Select Operation (Cont.)
We allow comparisons using
=, , >, , <,
in the selection predicate.
We can combine several predicates into a larger predicate by using
the connectives:
(and), (or), (not)
Example: Find the instructors in Physics with a salary greater than
$90,000, we write:
dept_name=“Physics” ^ salary > 90000 (instructor)
Select predicate may include comparisons between two attributes
Example: Find all departments whose name is the same as
their building name:
dept_name=building (department)
Select Operation (Cont..)
Relatio
A B C D
nr
1 7
5 7
12 3
23 10
A=B ^ D > 5 (r)
A B C D
1 7
23 10
Project Operation- selection of columns (attributes)
A unary operation that returns its argument relation, with
certain attributes left out
Notation:
A1,A2,A3 ….Ak (r)
where A1, A2 are attribute names and r is a relation name
The result is defined as the relation of k columns obtained
by erasing the columns that are not listed
Duplicate rows removed from result, since relations are
sets
Project Operation (Cont..)
A B C
Relation r:
10 1
20 1
30 1
40 2
• A,C (r) A C A C
1 1
1 = 1
1 2
2
Project Operation (Cont.)
Example: Eliminate the dept_name attribute of instructor
Query:
ID, name, salary (instructor)
Result:
Union Operation
The union operation allows us to combine two relations
Notation: r s
For r s to be valid:
1. r, s must have the same arity (same number of attributes)
2. The attribute domains must be compatible (Example: 2nd
column of r deals with the same type of values as the 2nd column
of s does)
Union of two relations
Relations r, s: A B A B
1 2
2 3
1 s
r
A B
r s:
1
2
1
3
Union Operation (Cont.)
Example: Find course id of all courses taught in the Fall 2009
semester, or in the Spring 2010 semester, or in both from section
relation
course_id ( semester=“Fall” Λ year=2009 (section))
course_id ( semester=“Spring” Λ year=2010 (section))
Union Operation (Cont.)
Union Operation (Cont.)
Result of:
course_id ( semester=“Fall” Λ year=2009 (section))
course_id ( semester=“Spring” Λ year=2010 (section))
Set difference of two relations
Relations r, s:
A B A B
1 2
2 3
1 s
r
r – s:
A B
1
1
Set Difference Operation
The set-difference operation allows us to find tuples that are in one
relation but are not in another
Notation: r – s
Set differences must be taken between compatible relations
r and s must have the same arity (same number of attributes)
Attribute domains of r and s must be compatible
Example: Find course id of all courses taught in the Fall 2009
semester, but not in the Spring 2010 semester
course_id ( semester=“Fall” Λ year=2009 (section)) −
course_id ( semester=“Spring” Λ year=2010 (section))
Cartesian-Product Operation
Relations r, s:
A B C D E
1 10 a
10 a
2 20 b
r 10 b
r x s: s
A B C D E
1 10 a
1 10 a
1 20 b
1 10 b
2 10 a
2 10 a
2 20 b
2 10 b
Cartesian-Product Operation
The Cartesian-product operation (denoted by X) allows us to
combine information from any two relations
Example: The Cartesian product of the relations instructor and
teaches is written as:
instructor X teaches
We construct tuples of the resultant relation from each possible
pair of tuples- one from the instructor relation and one from the
teaches relation
Since the instructor ID appears in both relations, we distinguish
between these attributes by attaching the name of the relations to
the attribute from which the attribute originally came
instructor.ID
teaches.ID
The Rename Operation
Allows us to give a name corresponding to the results of relational-
algebra expressions
Also allows us to give more than one names to a relation
Example:
x (E)
returns the expression E under the name X
If a relational-algebra expression E has arity n, then
x ( A ,A
1 2 ,..., An )
(E )
returns the result of expression E under the name X, and with the
attributes renamed to A1 , A2 , …., An
Composition of Operations
We can build expressions using multiple operations
Example: A=C(r x s) rxs
r s A B C D E
A B C D E 1 10 a
1 10 a
1 10 a 1 20 b
10 a 1 10 b
2 20 b 2 10 a
10 b 2 10 a
2 20 b
2 10 b
A B C D E
A=C(r x s)
1 10 a
2 10 a
2 20 b
Composition of Operations
The result of a relational-algebra operation produces relation.
Therefore the relational-algebra operations can be composed
together into a relational-algebra expression
Example: Find the names of all instructors from the Physics
department
name( dept_name =“Physics” (instructor))
Instead of giving the name of a relation as the argument of
the projection operation, here we have given an expression
Additional Relational-Algebra Operations
Additional Operations
Set intersection
Natural join
Assignment Operation
Aggregation
Outer Join
Division
Set intersection of two relations
Relation r, s: A B A B
1 2
2 3
1
s
r
A B
rs 2
Set Intersection Operation
The set-intersection operation allows us to find tuples that
are in both the input relations
Notation: r s
Assume:
r, s have the same arity
Attributes of r and s are compatible
Example: Find course id of all courses taught in both the Fall
2009 and the Spring 2010 semesters
course_id ( semester=“Fall” Λ year=2009 (section))
course_id ( semester=“Spring” Λ year=2010 (section))
Result
The Natural-Join Operation
The Cartesian-Product
instructor X teaches
associates every tuple of instructor with every tuple of teaches
Most of the resulting rows have information about instructors who did
NOT teach a particular course
To get only those tuples of “instructor X teaches” that pertain to
instructors and the courses that they taught, we write:
instructor.id = teaches.id (instructor x teaches))
We get only those tuples of “instructor X teaches” that pertain to
instructors and the courses that they taught
The result of this expression, shown in the next slide
The Natural-Join Operation (Cont.)
The table corresponding to:
instructor.id = teaches.id (instructor x teaches))
The Natural-Join Operation (Cont.)
The natural join is a binary operation that allows us to combine
certain selections and a Cartesian product into one operation
It forms a Cartesian product of its two arguments, performs a
selection on those attributes that appear in both relation
schemas, and finally removes duplicate attributes
Notation: r s
Example:
r = (A, B, C, D)
s = (E, B, D)
Resultant schema = (A, B, C, D, E)
r s is defined as:
r.A, r.B, r.C, r.D, s.E (r.B = s.B r.D = s.D (r x s))
The Natural-Join Operation – Example
Relations r, s:
A B C D B D E
1 a 1 a
2 a 3 a
4 b 1 a
1 a 2 b
2 b 3 b
r s
r s
A B C D E
1 a
1 a
1 a
1 a
2 b
The Theta Join Operation
The theta join operation is a variant of the natural-join
operation that allows us to combine a selection and a
Cartesian product into a single operation
The theta join operation is defined as follows:
The Assignment Operation
It is convenient at many times to write a relational-algebra
expression by assigning parts of it to temporary relation variables
The assignment operation is denoted by and works like
assignment in a programming language
Example: Find all instructor details in the “Physics” and “Music”
department
Physics dept_name=“Physics” (instructor)
Music dept_name=“Music” (instructor)
result = Physics Music
With the assignment operation, a query can be written as a
sequential program consisting of a series of assignments followed by
an expression whose value is displayed as the result of the query
Aggregate Functions and Operations
Aggregate functions: Take a collection of
values and returns a single value as a result
avg: Returns average value
min: Returns minimum value
max: Returns maximum value
sum: Returns sum of values
count: Returns number of values
The general form of the aggregation operation
is as follows:
Aggregate Functions – Example
Relation r:
A B C
7
7
3
10
g sum(c) (r) sum(c )
27
Aggregate Operation – Example
There are circumstances where we would like to
apply the aggregate function not to a single set of
tuples, but instead to a group of sets of tuples.
branch_nameaccount_number balance
e.g. Perryridge A-102 400
Perryridge
Relation account A-201 900
Brighton
grouped by
A-217
branch-name:
750
Brighton A-215 750
Redwood A-222 700
branch_name g sum(balance) (account)
branch_name sum(balance)
Perryridge 1300
Brighton 1500
Redwood 700
Aggregate Functions (Cont.)
Result of aggregation does not have a name
We can use rename operation to give it a name
For convenience, we permit renaming as part of aggregate operation
branch_name g sum(balance) as sum_balance
(account)
Aggregate Functions (Cont.)
There are cases where we must eliminate multiple occurrences of
a value before computing an aggregate function
If we want to eliminate duplicates, we use hyphenated string
“distinct” at the end of the “count” function
Example: Find the total number of instructors who teach a course
in the Spring 2010 semester using teaches table
In this case, an instructor counts only once, regardless of the number
of course sections that the instructor teaches
Outer Join
An extension of the join operation that avoids loss of information
Computes the join and then adds tuples from one relation that does not
match tuples in the other relation to the result of the join
Use null values:
null signifies that the value is unknown or does not exist
Outer Join – Example
Relation loan
loan_numberbranch_nameamount
L-170 Downtown 3000
L-230 Redwood 4000
L-260 Perryridge 1700
Relation borrower
customer_name
loan_number
Jones L-170
Smith L-230
Hayes L-155
Outer Join – Example
Join: loan borrower
loan_number branch_nameamount customer_name
L-170 Downtown 3000 Jones
L-230 Redwood 4000 Smith
• Left Outer Join: Takes all tuples in the left relation that did
not match with any tuple in the right relation, pads the
tuples with null values for all other attributes from the right
relation, and adds them to the result of the natural join
loan borrower
loan_number branch_nameamount customer_name
L-170 Downtown 3000 Jones
L-230 Redwood 4000 Smith
L-260 Perryridge 1700 null
Outer Join – Example
• Right Outer Join: Takes all tuples in the right relation
that did not match with any tuple in the left relation,
pads the tuples with null values for all other attributes
from the left relation, and adds them to the result of
the natural join
loan borrower
loan_number branch_nameamount customer_name
L-170 Downtown 3000 Jones
L-230 Redwood 4000 Smith
L-155 null null Hayes
Outer Join – Example
• Full Outer Join: Does both the left and right outer join
operations
loan borrower
loan_number branch_nameamount customer_name
L-170 Downtown 3000 Jones
L-230 Redwood 4000 Smith
L-260 Perryridge 1700 null
L-155 null null Hayes
Division Operation
r
s
Notation:
Let r and s be relations on schemas R
and S respectively where
R = (A1, …, Am , B1, …, Bn )
S = (B1, …, Bn)
The result of r s is a relation on schema
R – S = (A1, …, Am)
r s = { t | t R-S (r) u s
( tu r ) }
Division Operation – Example
Relations r, s:
A B B
1
1
2
3 2
1 s
1
1
3
4
6
1
2
r s: A r
Another Division Example
Relations r,
s: A B C D E D E
a a 1 a 1
a a 1 b 1
a b 1 s
a a 1
a b 3
a a 1
a b 1
a b 1
r
r s:
A B C
a
a
End