[go: up one dir, main page]

0% found this document useful (0 votes)
9 views52 pages

02 - Relational Model

Uploaded by

Ashraf Bawer
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
9 views52 pages

02 - Relational Model

Uploaded by

Ashraf Bawer
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 52

Data Management and Analysis

Giuseppe Perelli
perelli@di.uniroma1.it

Slide credits: Prof. Maria de Marsico, Prof. Marina Moscarini, and Prof. Maurizio Mancini, Sapienza University of Rome
Relational model

∙ proposed by E. F. Codd in 1970 to promote


data independence

∙ available in real DBMS in 1981 (not easy to


implement independence with efficiency
and reliability!).

2
Relational model

● based on the mathematical notion of relation

● relations translate naturally into tables

● data and relationships/associations between


data of different sets (relations/tables) are
represented as values

● careful with the different meanings of


relation and relationship...
3
Definitions 1

∙ domain: a (possibly infinite) set of values


∙ examples:
− the set of integers is a domain
− the set of decimal numbers is a domain
− the set of character strings of length = 20
is a domain
− {0,1} is a domain

4
Definitions 2

let D1,D2,.....Dk be domains, not necessarily distinct;


the Cartesian product of these domains, denoted by:

D1 x D2 x..... x Dk
is the set

{(v1, v2, ....., vk) | v1∊D1, v2∊D2, ..... Vk∊Dk}

ordered list of values such that belongs to

5
Definitions 3

∙ mathematical relation is any subset of the Cartesian


product of one or more domains

● a relation that is a subset of the Cartesian product of k


domains is said to be of degree k

● the elements of a relation are called tuples (or n-uples or


ennuples): the number of tuples of a relation is its cardinality

● each tuple of a relation of degree k has k ordered


components (the i-th value comes from the i-th domain) but
there is no ordering among tuples

● the tuples of a relation are all distinct (at least for a value)...
as a relation is a set!

6
Definitions -
example

∙ suppose k = 2

• D1 = {white, black}, D2 = {0, 1, 2}

D1 X D2 = {(white, 0), (white, 1), (white, 2), (black, 0),


(black, 1), (black, 2)}

so, {(white, 0), (black, 0), (black, 2)} is a relation of


degree 2, cardinality 3 and with tuples {(white, 0),
(black, 0), (black, 2)}

{(black, 0), (black, 2)} is a relation of degree 2,


cardinality 2 and with tuples {(black, 0), (black, 2)}
7
Report - example

• D1={a,b} ax
• D2={x,y,z} ay
az
• Cartesian product D1 × D2 ax
by
bz

ax
• relation r ⊆ D1 × D2 az
ay
8
Report - example

∙ almost always we use well-known domains


also used in programming languages

• the following is a relation with two 4-ple on


String, String, Integer, Real

Paolo Rossi 2 26.5


Mario Bianchi 10 28.7

9
Notation

∙ given r, a relation of degree k


∙ given t, a tuple in r
∙ if i is an integer in {1,...,k}
t[i] (or t.i) indicates the i-th component of t

∙ example:
- r = {(0,a), (0,c),(1,b)}
- t = (0,a) is a tuple of r 0 a
- t[2] = a
- t[1] = 0
0 c
- t[1,2] = (0, a) 1 b
10
Relations and Tables

∙how do we interpret the data in the table?

Paolo Rossi 2 26.5


Mario Bianchi 10 28.7

11
Relations and Tables

∙solution:
∙ assign names to the table and the columns

Student Name Surname Exams Avg

Paolo Rossi 2 26.5


Mario Bianchi 10 28.7

from data to information!


12
Relations and Tables

∙ an attribute is defined by name A and its domain


which we denote by dom(A)

∙ let R be a set of attributes: an ennuple (tuple) on R is


a function defined on R that associates with each
attribute A in R an element of dom(A)

• if t is an ennuple in R and A is an attribute in R, then


by t(A) we will denote the value taken by the
function t at the attribute A

13
Reports and Tables

relation schema

relation name attributes


Student Name Surname Exams Avg

14
Reports and Tables

Student Name Surname Exams Avg

tuples Paolo Rossi 2 26.5


Mario Bianchi 10 28.7

relation instance
15
Relations and tables: summary

• a relation can be regarded as a table in which each row is


a distinct tuple and each column corresponds to a
component (with homogeneous values, i.e., coming from
the same domain)

• the columns correspond to the domains D1,D2,.....Dk


having unique self-explanatory names within the table

• the pair (attribute name, domain) is called an attribute;


the set of attributes of a relation is called schema

• if R is a relation and its attributes are


A1, A2,.....,Ak, the schema is often indicated as:
R(A1, A2,.....,Ak)
16
Schemas and instances

• relation schema: a relation name R with a set of attribute names:


R(A1, A2,.....,Ak)

• the schema of a relation is invariant over time, and describes its


structure (intentional aspect)

• the instance of a relation with schema R(X): a set r of tuples on X

• the instance contains the current values, which can also change
very quickly (extensional aspect)

17
Relations and databases

• database schema: a set of relation


schemas with different names

• relational database schema: set


R1, R2, ..., Rn of schemas

• relational database with schema R1, R2, ...,


Rn: set r1, r2, ..., rn where ri is a relation
instance with schema Ri

18
Example

∙ Schema: Info_City(City,Region,Population)

• Info_City instance:

City Region Population


Rome Lazio 3000000
Milan Lombardy 1500000
Genoa Liguria 800000
Pisa Tuscany 150000
19
Relations and Tables

∙ in the definition of a relational model, the components of


a relation are indicated by the names of attributes,
rather than by their position

• t[Ai] indicates the value of the attribute with name Ai of


the tuple t

• if t is the second tuple in the previous example, then


t[Region] = Lombardy

• if Y is a subset of attributes of the schema X of a relation


(Y⊆X) then t[Y] is the subset of values in the tuple t that
correspond to attributes in Y (also called restriction of t)

20
To recap

∙ Object = tuple (implemented as a record)


∙ Fields = Information of interest ->
schema of the relation

• Subject = "Staff Member"


• Information of interest = Code, Surname,
First name, Role, Hiring year
CODE SURNAME NAME ROLE HIRING

COD1 Rossi Mario Analyst 1995


21
To recap

∙ Table = Set of tuples of homogeneous type


a particular INSTANCE of the relation

∙ STAFF table = Set of tuples of type "Staff


Member".
CODE SURNAM NAME ROLE HIRING
E
COD1 Rossi Mario Analyst 1995

COD2 Bianchi Peter Analyst 1990


COD3 Neri Paolo Administrator 1985

22
The model is based on values

∙ in the relational model, references between


data in different relations are represented
by means of domain values that appear in
the ennuples

23
students Matric Surname e Name Date of birth
6554 Rossi Mario 05/12/1978
8765 Neri Paolo 03/11/1976
9283 Greens Luisa 12/11/1979
3456 Rossi Maria 01/02/1978

exams Student Grade Course


3456 30 04
3456 24 02
courses 9283 28 01
Code Title Lecture
6554 26
Chemistry 01
01 rMario
02 Math
Bruni
04 Chemistry
Verdi
Null values

• NULL values represent lack of information or the fact that the


information is not applicable
• e.g. phone number:
− the person doesn't have a phone
− I don't know if the person has a phone
− the person has a phone, but I don't know the number

• we can't avoid to enter a value (the tuple must adhere to the


schema)! we can define a default value...

• warning: some attributes should never assume null values


− student ID/matriculation
− examination grades

25
Null value

∙ bad habit: "unused" domain values


− they could be used later
− they might skew the calculations (we know an
employee's salary, but a 0 weighs in an average
value calculation like any other number)
∙ special value: NULL
∙ NULL: polymorphic value = does not belong to any
domain but can replace values in any domain
∙ two NULL values, even on the same domain, are
considered different
• Warning: NULL is not 0 (integer)

26
Too many null values

students Matric Surname e Name Date of birth

6554 Rossi Mario 05/12/1978


9283 Greens Luisa 12/11/1979
NULL Rossi Maria 01/02/1978

exams
Student Vote Course courses
NULL 30 NULL Course Title Lecturer
NULL 24 02 01 NULL NULL
Math
9283 28 01
02 Chemis 02
try
04 01

28
An "incorrect" database

EMPLOYEE
CODE SURNAME NAME ROLE HIRING DEPT
COD1 Rossi Mario Analyst 1795 01

COD2 Bianchi Luigi Analyst 1990 05

COD2 Neri Paolo Admin 1985 01

DEPARTMENT
NUMBER NAME
01 Management what's wrong? syntactically, it is
correct...
02 Administration
28
An "incorrect" database

EMPLOYEE
CODE SURNAME NAME ROLE HIRING DEPT
COD1 Rossi Mario Analyst 1795 01

COD2 Bianchi Luigi Analyst 1990 05

COD2 Neri Paolo Administ 1985 01


rator
DEPARTMENT
NUMBER NAME
01 Management

02 Administration
29
An "incorrect" database

Exams Student Grade Honor Course

276545 32 01
276545 30 Yes 02
787643 27 Yes 03
739430 24 04

Students Matric Surname Name

276545 Rossi Mario


787643 Neri Piero
787643 Bianchi Luca

30
Integrity constraints

• integrity constraint: property that must be satisfied by


every instance of the database

• constraints describe specific properties of the scope,


and therefore of the information related to it modeled
through the database

• a database instance is correct if it satisfies all integrity


constraints associated with its schema

31
How to avoid "violations"

EMPLOYEE
CODE SURNAME NAME ROLE HIRING DEPT

COD1 Rossi Mario Analyst 1795 01

COD2 Bianchi Peter Analyst 1990 05

COD2 Neri Paolo Administ 1985 01


rator
DEPARTMENT
(HIRING > 1980)
NUMBER NAME
01 Management CODE UNIQUE
DEPT REFERENCES DEPARTMENT.NUMBER
02 Administration
32
How to avoid "violations"

Exams Student Grade Honor Course


(Grade>=18) AND (Grade<=31)
276545 32 01
276545 30 Yes 02 (Vote=30) OR NOT (Honor="yes")
787643 27 Yes 03 intra-relational constraints
739430 24 04

Student references Students.Matric inter-relational constraints

Matric Surname Name


Students Unique Matric
276545 Rossi Mario
787643 Neri Piero
787643 Bianchi Luca
33
Integrity constraints

• intra-relational constraints: defined on


single attribute values (domain) or
between values of the same tuple or
between tuples of the same relation

• interrelational constraints: defined


between multiple relations

34
Integrity constraints

∙ Domain Constraints
− HIRING > 1980
− (Grade>=18) AND (Grade<=31)
∙ Tuple Constraints
– (Grade=31) OR NOT (Honor = "yes")
∙ Uniqueness constraints
– Matric UNIQUE
∙ Constraints between values in tuples of different relations
– DEPT REFERENCES DEPARTMENT.NUMBER
∙ Primary key constraints
– unique
– not null
∙ Value existence constraints
– not null
35
Keys 1

∙ we need to uniquely identify the tuples of an


instance

• a relation key (not necessarily unique) is an


attribute or set of attributes that uniquely
identifies a tuple

36
Keys 2

• a set X of attributes of a relation R is a key


of R if it satisfies the following conditions:

• 1) for each instance of R, there do not exist


two distinct tuples t1 and t2 having the same
values for all attributes in X, i.e., such that
t1[X] = t2[X]

• 2) there does not exist a proper subset of X


satisfying condition 1)

37
Example

∙ instance of Staff:
ID SURNAME NAME ROLE HIRING
COD1 Rossi Mario Analyst 1995

COD2 Bianchi Peter Analyst 1990

COD3 Neri Paolo Admin 1985

Key(s)? based on this instance, any attribute except


Role could be a key.... but it is correct?

38
Example

∙ Info_City report request

City Region Population


Rome Lazio 3000000
Milan Lombardy 1500000
Genoa Liguria 800000
Pisa Tuscany 150000
Key? does it depend... on the "size" of the city?
39
Keys 3

• a relation could have several alternative keys

• we choose the most used one or the one


consisting of a smaller number of attributes =
primary key

• the primary key does not allow null values

• there is always at least one key. Why? there


can't be identical tuples!

• the keys allow us to refer to data in different


tables
40
Example

∙ instance of Staff
TAXCODE CODE SURNAME NAME ROLE HIRING
CSR... COD1 Rossi Mario Analyst 1995

BA... COD2 Bianchi Peter Analyst 1990

NRI... COD3 Neri Paolo Admin 1985

- keys?
- according to the definition, is it possible that
(Surname, Role) is a key?
41
Interrelational Constraints

∙ referential integrity constraint (foreign key):


portions of information in different relations
are associated by (the same) key values
∙ the first relation refers to the value of an
attribute or set of attributes that should
appear in the second relation
∙ a referential integrity constraint between
the X attributes of a relation R1 and another
relation R2 forces the values on X in R1 to
appear as values of the primary key of R2
42
Road violations
Code Date Officer Prov Plate
3987
34321 1/2/95 3987
3295 MI 39548K
53524 4/3/95 3295
3295 TO E39548
64521 5/4/96 3295
9345 PR 839548
73321 5/2/98 9345 PR 839548
Officers ID Surname Name
3987 Bianchi Luca
3295 Gialli Piero
9345 Rossi Mario
7543 Verdi Gino 4
4
Road violations
Code Date Officer Pro Plate
M v 39548K
34321 1/2/95 3987 ITMI 39548K
E3954
53524 4/3/95 3295 PTO
O 8E39548
39548
64521 5/4/96 3295 PPR
R 839548
839548
73321 5/2/98 9345 RPR 839548
Cars Prov Plate Surname Name
MMI E39548 Rossi
39548K Mario
IT
TO F34268 Rossi
E3954 Mario
O
PR
P 8 39548 Neri
839548 Luigi
44
Interrelational Constraints

∙ referential integrity constraints between:


− the Officer attribute of the relation Road
violations and the attribute ID (key) of the relation
Officers
− the attributes Prov and Plate of Road violations
and the attributes Prov and Plate (key) of the
relation Cars

45
Breach of referential integrity
constraint

Road violations
Code Date OfficerPro Plate
v
34321 1/2/95 3987 TMI E3954
39548K
53524 4/3/95 3295 OTO 8E39548
64521 5/4/96 3295 PR 839548
Cars
73321 Prov Surname
Plate 9345
5/2/98 PR Name
839548
MI E39548 Rossi
E3954 Mario
BEWARE OF
CONSTRAINTS TO
T 8 34268 Rossi
F Mario
ON MULTIPLE
ATTRIBUTES PR
O 839548 Neri Luigi
46
Referential integrity constraints:
comments

● they play a key role in the concept of a


"values-based model" (the relational model)

● in the presence of null values the constraints


can be made less restrictive

● it is possible to define compensatory


"actions" following violations

● beware of constraints on multiple attributes

47
Referential integrity and null
values

Employees ID a Project Name


34321 Rossi IDEA
The 53524 Neri XYZ
referential 64521 Greens NULL
integrity
constraint 73032 Bianchi IDEA
is not Projects
violated by Name Start Duratio Cost
the NULL
n
value
IDEA 01/2000 36 200
XYZ 07/2001 24 120
48

BOH 09/2001 24 150


Multiple Constraints on Multiple
Attributes

● not all properties of interest can be represented by


explicit constraints in the logic model ...

● ...but that will be covered in unit 2

● how do we formalize the constraints?

● intra-relational constraints ⊇ relations between


attribute values of the same tuple -> functional
dependencies defined on the same schema

49
Functional dependencies

∙ a functional dependency establishes a semantic link


between two non-empty sets of attributes X and Y
belonging to a schema R
∙ this constraint is written X → Y and is read:
● "X determines Y"

50
Functional dependencies: example

∙ suppose we have a relation scheme Flights (Code,


Day, Pilot, Time)
∙ with the constraints, dictated by common sense:
− a flight with a certain code always leaves at the same time
− there is only one flight with a given pilot, on a given day at a
given time
− there is only one pilot of a given flight on a given day
∙ constraints give raise to the functional dependencies:
− Code → Time
− {Day, Pilot, Time} → Code
− {Code, Day} → Pilot

51
Functional dependencies: are satisfied if...

∙ we say that a relation r with schema R satisfies the


functional dependence X → Y if:

− (i) the functional dependence X → Y is applicable to R,


in the sense that both X and Y are subsets of R;
− (ii) ennuples in r that are identical on X are also identical
on Y, i.e., for each pair of ennuples t1 and t2 in r:
t1[X] = t2[X] ⇒ t1[Y] = t2[Y].

important note: right arrow means that if tuples are equal


on X, then they must also be equal on Y

52

You might also like