[go: up one dir, main page]

0% found this document useful (0 votes)
38 views59 pages

Unit 1-DBP

The document discusses the relational data model and relational database concepts. It defines key terms like entities, attributes, relationships, and ER diagrams. It explains how ER diagrams model entities as objects and relationships between entities, and how they are converted to relational tables. The document also defines relational model concepts like relations, tuples, domains, keys, and provides an example of a student relation and its attributes.

Uploaded by

Hafiz Rahman
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
38 views59 pages

Unit 1-DBP

The document discusses the relational data model and relational database concepts. It defines key terms like entities, attributes, relationships, and ER diagrams. It explains how ER diagrams model entities as objects and relationships between entities, and how they are converted to relational tables. The document also defines relational model concepts like relations, tuples, domains, keys, and provides an example of a student relation and its attributes.

Uploaded by

Hafiz Rahman
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 59

Unit – 1 Relational Data Model

What is Data?
In simple words, data can be facts related to any object in consideration. For example, your name,
age, height, weight, etc. are some data related to you. A picture, image, file, pdf, etc. can also be
considered data.

Define Database :
A database is an organized collection of structured information, or data, typically stored
electronically in a computer system. A database is usually controlled by a database management
system (DBMS). ... The data can then be easily accessed, managed, modified, updated, controlled,
and organized.

ER model
o ER model stands for an Entity-Relationship model. It is a high-level data model. This
model is used to define the data elements and relationship for a specified system.
o It develops a conceptual design for the database. It also develops a very simple and easy to
design view of data.
o In ER modeling, the database structure is portrayed as a diagram called an entity-
relationship diagram.

For example, Suppose we design a school database. In this database, the student will be an entity
with attributes like address, name, id, age, etc. The address can be another entity with attributes
like city, street name, pin code, etc and there will be a relationship between them.

Component of ER Diagram

1. Entity:
An entity may be any object, class, person or place. In the ER diagram, an entity can be
represented as rectangles.

Consider an organization as an example- manager, product, employee, department etc. can be


taken as an entity.

a. Weak Entity
10 SecQL CREATE TABLE

An entity that depends on another entity called a weak entity. The weak entity doesn't contain any
key attribute of its own. The weak entity is represented by a double rectangle.
2. Attribute
The attribute is used to describe the property of an entity. Eclipse is used to represent an attribute.

For example, id, age, contact number, name, etc. can be attributes of a student.

a. Key Attribute

The key attribute is used to represent the main characteristics of an entity. It represents a primary
key. The key attribute is represented by an ellipse with the text underlined.

b. Composite Attribute

An attribute that composed of many other attributes is known as a composite attribute. The
composite attribute is represented by an ellipse, and those ellipses are connected with an ellipse.

c. Multivalued Attribute

An attribute can have more than one value. These attributes are known as a multivalued attribute.
The double oval is used to represent multivalued attribute.

For example, a student can have more than one phone number.

d. Derived Attribute
An attribute that can be derived from other attribute is known as a derived attribute. It can be
represented by a dashed ellipse.

For example, A person's age changes over time and can be derived from another attribute like
Date of birth.

3. Relationship
A relationship is used to describe the relation between entities. Diamond or rhombus is used to represent the
relationship.

Types of relationship are as follows:

a. One-to-One Relationship

When only one instance of an entity is associated with the relationship, then it is known as one to one relationship.

For example, A female can marry to one male, and a male can marry to one female.

b. One-to-many relationship
When only one instance of the entity on the left, and more than one instance of an entity on the right associates
with the relationship then this is known as a one-to-many relationship.

For example, Scientist can invent many inventions, but the invention is done by the only specific scientist.

c. Many-to-one relationship

When more than one instance of the entity on the left, and only one instance of an entity on the right associates
with the relationship then it is known as a many-to-one relationship.

For example, Student enrolls for only one course, but a course can have many students.

d. Many-to-many relationship

When more than one instance of the entity on the left, and more than one instance of an entity on the right
associates with the relationship then it is known as a many-to-many relationship.

For example, Employee can assign by many projects and project can have many employees.

Participation Constraints
 Total Participation − Each entity is involved in the relationship. Total participation is
represented by double lines.
 Partial participation − Not all entities are involved in the relationship. Partial participation is
represented by single lines.
Example-Total

Here,

 Double line between the entity set “Student” and relationship set “Enrolled in” signifies total participation.
 It specifies that each student must be enrolled in at least one course.

Example-

Here,

 Single line between the entity set “Course” and relationship set “Enrolled in” signifies partial participation.
 It specifies that there might exist some courses for which no enrollments are made.

What is Relational Model?


Relational Model (RM) represents the database as a collection of relations. A relation is nothing but
a table of values. Every row in the table represents a collection of related data values. These rows in
the table denote a real-world entity or relationship.
The table name and column names are helpful to interpret the meaning of values in each row. The
data are represented as a set of relations. In the relational model, data are stored as tables. However,
the physical storage of the data is independent of the way the data are logically organized.
Relational Model concept
Relational model can represent as a table with columns and rows. Each row is known as a tuple.
Each table of the column has a name or attribute.

Domain: It contains a set of atomic values that an attribute can take.

 The domain of Marital Status has a set of possibilities: Married, Single, Divorced.
 The domain of Shift has the set of all possible days: {Mon, Tue, Wed…}.
 The domain of Salary is the set of all floating-point numbers greater than 0 and less than
200,000.
 The domain of First Name is the set of character strings that represents names of people.

Attribute: It contains the name of a column in a particular table. Each attribute Ai must have a
domain, dom(Ai)

Relational instance: In the relational database system, the relational instance is represented by a
finite set of tuples. Relation instances do not have duplicate tuples.

Relational schema: A relational schema contains the name of the relation and name of all
columns or attributes.

Relational key: In the relational key, each row has one or more attributes. It can identify the row
in the relation uniquely.

Degree: The degree is the number of attributes in a table.

Example: STUDENT Relation

NAME ROLL_NO PHONE_NO ADDRESS AGE

Ram 14795 7305758992 Noida 24


Shyam 12839 9026288936 Delhi 35

Laxman 33289 8583287182 Gurugram 20

Mahesh 27857 7086819134 Ghaziabad 27

Ganesh 17282 9028 9i3988 Delhi 40

o In the given table, NAME, ROLL_NO, PHONE_NO, ADDRESS, and AGE are the
attributes.
o The instance of schema STUDENT has 5 tuples.
o t3 = <Laxman, 33289, 8583287182, Gurugram, 20>

Properties of Relations

o Name of the relation is distinct from all other relations.


o Each relation cell contains exactly one atomic (single) value
o Each attribute contains a distinct name
o Attribute domain has no significance
o tuple has no duplicate value
o Order of tuple can have a different sequence

Keys :
atomic value: each value in the domain is indivisible as far as the relational model is
concernedattribute: principle storage unit in a database

column: see attribute

degree: number of attributes in a table

domain: the original sets of atomic values used to model data; a set of acceptable values that a
column is allowed to contain

field: see attribute

file:see relation

record: contains fields that are related; see tuple

relation: a subset of the Cartesian product of a list of domains characterized by a name; the technical
term for table or file

row: see tuple


structured query language (SQL): the standard database access language

table:see relation

tuple: a technical term for row or record


Degree Of Relationship :

Denotes the number of entity types that participate in a relationship

Types:

1)Unary Relationship:

Exists when there is an association with only one entity.

Eg : PERSON is Married to

PERSON is entity

Married to is relationship

2)Binary Relationship :

Exists when there is an association among 2 entities.

Eg)PUBLISHER publishes BOOK

3)Ternary Relationship :

Exists when there is an association among 3 entities

Eg)SUBJECT teaches TEACHER teaches STUDENT

Mapping from ER Model to Relational Model


After designing the ER diagram of system, we need to convert it to Relational models which can
directly be implemented by any RDBMS like Oracle, MySQL etc. In this article we will discuss how to
convert ER diagram to Relational Model for different scenarios.
In data modelling terms, cardinality is how one table relates to another.
Relationship constraints :
1.Cardinality Ratio: Maximum number of relationship instances that an entity can
participate in.
Case 1: Binary Relationship with 1:1 cardinality with total participation of an entity
A person has 0 or 1 passport number and Passport is always owned by 1 person. So it is 1:1 cardinality
with full participation constraint from Passport.

First Convert each entity and relationship to tables. Person table corresponds to Person Entity with
key as Per-Id. Similarly Passport table corresponds to Passport Entity with key as Pass-No. HashTable
represents relationship between Person and Passport (Which person has which passport). So it will take
attribute Per-Id from Person and Pass-No from Passport.

Person Has Passport

Per-Id Other Person Attribute Per-Id Pass-No Pass-No Other PassportAttribute

PR1 – PR1 PS1 PS1 –

PR2 – PR2 PS2 PS2 –

PR3 –

Table 1
As we can see from Table 1, each Per-Id and Pass-No has only one entry in Hashtable. So we can
merge all three tables into 1 with attributes shown in Table 2. Each Per-Id will be unique and not null.
So it will be the key. Pass-No can’t be key because for some person, it can be NULL.

Per-Id Other Person Attribute Pass-No Other PassportAttribute

Table 2
Case 2: Binary Relationship with 1:1 cardinality and partial participation of both entities
A male marries 0 or 1 female and vice versa as well. So it is 1:1 cardinality with partial participation
constraint from both. First Convert each entity and relationship to tables. Male table corresponds to
Male Entity with key as M-Id. Similarly Female table corresponds to Female Entity with key as F-Id.
Marry Table represents relationship between Male and Female (Which Male marries which female). So
it will take attribute M-Id from Male and F-Id from Female.

Male Marry Female

M-Id Other Male Attribute M-Id F-Id F-Id Other FemaleAttribute

M1 – M1 F2 F1 –

M2 – M2 F1 F2 –

M3 – F3 –

Table 3
As we can see from Table 3, some males and some females do not marry. If we merge 3 tables into 1,
for some M-Id, F-Id will be NULL. So there is no attribute which is always not NULL. So we can’t
merge all three tables into 1. We can convert into 2 tables. In table 4, M-Id who are married will have
F-Id associated. For others, it will be NULL. Table 5 will have information of all females. Primary
Keys have been underlined.

M-Id Other Male Attribute F-Id

Table 4

F-Id Other FemaleAttribute


Table 5
Note: Binary relationship with 1:1 cardinality will have 2 table if partial participation of both entities in
the relationship. If atleast 1 entity has total participation, number of tables required will be 1.
Case 3: Binary Relationship with n: 1 cardinality

In this scenario, every student can enroll only in one elective course but for an elective course there can
be more than one student. First Convert each entity and relationship to tables. Student table
corresponds to Student Entity with key as S-Id. Similarly Elective_Course table corresponds to
Elective_Course Entity with key as E-Id. Enrolls Table represents relationship between Student and
Elective_Course (Which student enrolls in which course). So it will take attribute S-Id from and Student
E-Id from Elective_Course.

Student Enrolls Elective_Course

S-Id Other Student Attribute S-Id E-Id E-Id Other Elective CourseAttribute

S1 – S1 E1 E1 –

S2 – S2 E2 E2 –

S3 – S3 E1 E3 –

S4 – S4 E1

Table 6
As we can see from Table 6, S-Id is not repeating in Enrolls Table. So it can be considered as a key of
Enrolls table. Both Student and Enrolls Table’s key is same; we can merge it as a single table. The
resultant tables are shown in Table 7 and Table 8. Primary Keys have been underlined.

S-Id Other Student Attribute E-Id


Table 7

E-Id Other Elective CourseAttribute

Table 8
Case 4: Binary Relationship with m: n cardinality

In this scenario, every student can enroll in more than 1 compulsory course and for a compulsory course
there can be more than 1 student. First Convert each entity and relationship to tables. Student table
corresponds to Student Entity with key as S-Id. Similarly Compulsory_Courses table corresponds to
Compulsory Courses Entity with key as C-Id. Enrolls Table represents relationship between Student
and Compulsory_Courses (Which student enrolls in which course). So it will take attribute S -Id from
Person and C-Id from Compulsory_Courses.

Student Enrolls Compulsory_Courses

S-Id Other Student Attribute S-Id C-Id C-Id Other Compulsory CourseAttribute

S1 – S1 C1 C1 –

S2 – S1 C2 C2 –

S3 – S3 C1 C3 –

S4 – S4 C3 C4 –

S4 C2

S3 C3

Table 9
As we can see from Table 9, S-Id and C-Id both are repeating in Enrolls Table. But its combination is
unique; so it can be considered as a key of Enrolls table. All tables’ keys are different, these can’t be
merged. Primary Keys of all tables have been underlined.
Case 5: Binary Relationship with weak entity

In this scenario, an employee can have many dependents and one dependent can depend on one
employee. A dependent does not have any existence without an employee (e.g; you as a child can be
dependent of your father in his company). So it will be a weak entity and its participation will always
be total. Weak Entity does not have key of its own. So its key will be combination of key of its
identifying entity (E-Id of Employee in this case) and its partial key (D-Name).
First Convert each entity and relationship to tables. Employee table corresponds to Employee Entity
with key as E-Id. Similarly Dependents table corresponds to Dependent Entity with key as D-Name
and E-Id. HashTable represents relationship between Employee and Dependents (Which employee has
which dependents). So it will take attribute E-Id from Employee and D-Name from Dependents.

Employee Has Dependents

E-Id Other Employee Attribute E-Id D-Name D-Name E-Id Other DependentsAttribute

E1 – E1 RAM RAM E1 –

E2 – E1 SRINI SRINI E1 –

E3 – E2 RAM RAM E2 –

E3 ASHISH ASHISH E3 –

Table 10
As we can see from Table 10, E-Id, D-Name is key for Has as well as Dependents Table. So we can
merge these two into 1. So the resultant tables are shown in Tables 11 and 12. Primary Keys of all
tables have been underlined.
E-Id Other Employee Attribute

Table 11

D-Name E-Id Other DependentsAttribute

Table 12
ER Model, when conceptualized into diagrams, gives a good overview of entity-relationship, which is easier to
understand. ER diagrams can be mapped to relational schema, that is, it is possible to create relational schema
using ER diagram. We cannot import all the ER constraints into relational model, but an approximate schema can
be generated.
There are several processes and algorithms available to convert ER Diagrams into Relational Schema. Some of
them are automated and some of them are manual. We may focus here on the mapping diagram contents to
relational basics.
ER diagrams mainly comprise of −

 Entity and its attributes


 Relationship, which is association among entities.

Mapping Entity
An entity is a real-world object with some attributes.

Mapping Process (Algorithm)


 Create table for each entity.
 Entity's attributes should become fields of tables with their respective data types.
 Declare primary key.

Mapping Relationship
A relationship is an association among entities.
Mapping Process
 Create table for a relationship.
 Add the primary keys of all participating Entities as fields of table with their respective data types.
 If relationship has any attribute, add each attribute as field of table.
 Declare a primary key composing all the primary keys of participating entities.
 Declare all foreign key constraints.

Mapping Weak Entity Sets


A weak entity set is one which does not have any primary key associated with it.

Mapping Process
 Create table for weak entity set.
 Add all its attributes to table as field.
 Add the primary key of identifying entity set.
 Declare all foreign key constraints.

Mapping Hierarchical Entities


ER specialization or generalization comes in the form of hierarchical entity sets.
Mapping Process
 Create tables for all higher-level entities.
 Create tables for lower-level entities.
 Add primary keys of higher-level entities in the table of lower-level entities.
 In lower-level tables, add all other attributes of lower-level entities.
 Declare primary key of higher-level table and the primary key for lower-level table.
 Declare foreign key constraints.

Generalization, Specialization and Aggregation in ER Model


Generalization –
Generalization is the process of extracting common properties from a set of entities and create
a generalized entity from it. It is a bottom-up approach in which two or more entities can be
generalized to a higher level entity if they have some attributes in common. For Example,
STUDENT and FACULTY can be generalized to a higher level entity called PERSON as
shown in Figure 1. In this case, common attributes like P_NAME, P_ADD become part of
higher entity (PERSON) and specialized attributes like S_FEE become part of specialized
entity (STUDENT).
Specialization –
In specialization, an entity is divided into sub-entities based on their characteristics. It is a top-
down approach where higher level entity is specialized into two or more lower level entities.
For Example, EMPLOYEE entity in an Employee management system can be specialized into
DEVELOPER, TESTER etc. as shown in Figure 2. In this case, common attributes like
E_NAME, E_SAL etc. become part of higher entity (EMPLOYEE) and specialized attributes
like TES_TYPE become part of specialized entity (TESTER).
Aggregation –
An ER diagram is not capable of representing relationship between an entity and a
relationship which may be required in some scenarios. In those cases, a relationship wi th its
corresponding entities is aggregated into a higher level entity. Aggregation is an abstraction
through which we can represent relationships as higher level entity sets.
For Example, Employee working for a project may require some machinery. So, REQUIRE
relationship is needed between relationship WORKS_FOR and entity MACHINERY. Using
aggregation, WORKS_FOR relationship with its entities EMPLOYEE and PROJECT is
aggregated into single entity and relationship REQUIRE is created between aggregated entity
and MACHINERY.
Relational Algebra
Relational database systems are expected to be equipped with a query language that can assist its users to query
the database instances. There are two kinds of query languages − relational algebra and relational calculus.

Relational Algebra
Relational Algebra is procedural query language, which takes Relation as input and generate relation as
output. Relational algebra mainly provides theoretical foundation for relational databases and SQL.

It uses operators to perform queries. An operator can be either unary or binary. They accept relations as their
input and yield relations as their output. Relational algebra is performed recursively on a relation and
intermediate results are also considered relations.
Relational algebra is a procedural query language that works on relational model. The purpose of a query
language is to retrieve data from database or perform various operations such as insert, update, delete on
the data. When I say that relational algebra is a procedural query language, it means that it tells what data
to be retrieved and how to be retrieved.

On the other hand relational calculus is a non-procedural query language, which means it tells what data
to be retrieved but doesn’t tell how to retrieve it.

The fundamental operations of relational algebra are as follows −

 Select
 Project
 Union
 Set different
 Cartesian product
 Rename
We will discuss all these operations in the following sections.

Select Operation (σ)


It selects tuples that satisfy the given predicate from a relation.
Notation − σp(r)
Where σ stands for selection predicate and r stands for relation. p is prepositional logic formula which may use
connectors like and, or, and not. These terms may use relational operators like : =, ≠, ≥, < , >, ≤.
For example −
σsubject = "database"(Books)
Output − Selects tuples from books where subject is 'database'.
σsubject = "database" and price = "450"(Books)
Output − Selects tuples from books where subject is 'database' and 'price' is 450.
σsubject = "database" and price = "450" or year > "2010"(Books)
Output − Selects tuples from books where subject is 'database' and 'price' is 450 or those books published after
2010.
Selection is used to select required tuples of the relations.
for the above relation
σ (c>3)R
will select the tuples which have c more than 3.
Note: selection operator only selects the required tuples but does not display them. For displaying, data
projection operator is used.
For the above selected tuples, to display we need to use projection also.
PBM :
R
(A B C)
----------
1 2 4
2 2 3
3 2 3
4 3 4

π (σ (c>3)R ) will show following tuples.

A B C
-------
1 2 4
4 3 4
Project Operation (∏)
It projects column(s) that satisfy a given predicate.
Notation − ∏A , A , A (r)
1 2 n

Where A , A , A are attribute names of relation r.


1 2 n

Duplicate rows are automatically eliminated, as relation is a set.


For example −
∏subject, author (Books)
Selects and projects columns named as subject and author from the relation Books.
Projection is used to project required column data from a relation.
Example :
R
(A B C)
----------
1 2 4
2 2 3
3 2 3
4 3 4
π (BC)
B C
-----
2 4
2 3
3 4
Note: By Default projection removes duplicate data.

Union Operation (∪)


Union operation in relational algebra is same as union operation in set theory, only constraint is for
union of two relation both relation must have same set of Attributes.

It performs binary union between two given relations and is defined as −


r ∪ s = { t | t ∈ r or t ∈ s}
Notation − r U s
Where r and s are either database relations or relation result set (temporary relation).
For a union operation to be valid, the following conditions must hold −

 r, and s must have the same number of attributes.


 Attribute domains must be compatible.
 Duplicate tuples are automatically eliminated.
∏ author (Books) ∪ ∏ author (Articles)
Output − Projects the names of the authors who have either written a book or an article or both.
Syntax
∏regno(R1) ∪ ∏regno(R2)
It displays all the regno of R1 and R2.
Example
Consider two tables R1 and R2 −
Table R1 is as follows −

Regno Branch Section

1 CSE A

2 ECE B

3 MECH B

4 CIVIL A

5 CSE B

Table R2 is as follows −

Regno Branch Section

1 CIVIL A

2 CSE A

3 ECE B

To display all the regno of R1 and R2, use the following command −
∏regno(R1) ∪ ∏regno(R2)
Output
Regno

Union
Union combines two different results obtained by a query into a single result in the form of a table.
However, the results should be similar if union is to be applied on them. Union removes all duplicates, if
any from the data and only displays distinct values. If duplicate values are required in the resultant data,
then UNION ALL is used.
An example of union is −
Select Student_Name from Art_Students
UNION
Select Student_Name from Dance_Students
This will display the names of all the students in the table Art_Students and Dance_Students i.e John,
Mary, Damon and Matt.

Intersection
The intersection operator gives the common data values between the two data sets that are intersected.
The two data sets that are intersected should be similar for the intersection operator to work.
Intersection also removes all duplicates before displaying the result.
An example of intersection is −
Select Student_Name from Art_Students
INTERSECT
Select Student_Name from Dance_Students
This will display the names of the students in the table Art_Students and in the table Dance_Students
i.e all the students that have taken both art and dance classes .Those are Mary and Damon in this
example

Intersection Operator (∩)


Intersection operator is denoted by ∩ symbol and it is used to select common rows (tuples) from two
tables (relations).
Lets say we have two relations R1 and R2 both have same columns and we want to select all those
tuples(rows) that are present in both the relations, then in that case we can apply intersection operation on
these two relations R1 ∩ R2.

Note: Only those rows that are present in both the tables will appear in the result set.

Syntax of Intersection Operator (∩)

table_name1 ∩ table_name2
Intersection Operator (∩) Example
Lets take the same example that we have taken above.
Table 1: COURSE

Course_Id Student_Name Student_Id


--------- ------------ ----------
C101 Aditya S901
C104 Aditya S901
C106 Steve S911
C109 Paul S921
C115 Lucy S931
Table 2: STUDENT

Student_Id Student_Name Student_Age


------------ ---------- -----------
S901 Aditya 19
S911 Steve 18
S921 Paul 19
S931 Lucy 17
S941 Carl 16
S951 Rick 18
Query:

∏ Student_Name (COURSE) ∩ ∏ Student_Name (STUDENT)


Output:

Student_Name
------------
Aditya
Steve
Paul
Lucy

Set Difference (−)


Set Difference in relational algebra is same set difference operation as in set theory with the constraint
that both relation should have same set of attributes.

The result of set difference operation is tuples, which are present in one relation but are not in the second relation.
Notation − r − s
Finds all the tuples that are present in r but not in s.
∏ author (Books) − ∏ author (Articles)
Output − Provides the name of authors who have written books but not articles.
Minus (-): Minus on two relations R1 and R2 can only be computed if R1 and R2 are union
compatible. Minus operator when applied on two relations as R1-R2 will give a relation with
tuples which are in R1 but not in R2. Syntax:
Relation1 - Relation2
Find person who are student but not employee, we can use minus operator like:
STUDENT – EMPLOYEE
Table 1: EMPLOYEE
EMP_NO NAME ADDRESS PHONE AGE

1 RAM DELHI 9455123451 18

5 NARESH HISAR 9782918192 22

6 SWETA RANCHI 9852617621 21

4 SURESH DELHI 9156768971 18

Table 2: STUDENT
ROLL_NO NAME ADDRESS PHONE AGE

1 RAM DELHI 9455123451 18

2 RAMESH GURGAON 9652431543 18

3 SUJIT ROHTAK 9156253131 20

4 SURESH DELHI 9156768971 18

RESULT:
ROLL_NO NAME ADDRESS PHONE AGE

2 RAMESH GURGAON 9652431543 18

3 SUJIT ROHTAK 9156253131 20

Set difference
The set difference operators takes the two sets and returns the values that are in the first set but not the
second set.
An example of set difference is −
Select Student_Name from Art_Students
MINUS
Select Student_Name from Dance_Students
This will display the names of all the students in table Art_Students but not in table Dance_Students i.e
the students who are taking art classes but not dance classes.

Cartesian Product (Χ)


Combines information of two different relations into one.
Notation − r Χ s
Where r and s are relations and their output will be defined as −
r Χ s = { q t | q ∈ r and t ∈ s}
σauthor = 'tutorialspoint'(Books Χ Articles)
Output − Yields a relation, which shows all the books and articles written by tutorialspoint.
Cross product between two relations let say A and B, so cross product between A X B will results all
the attributes of A followed by each attribute of B. Each record of A will pairs with every record of B.
below is the example
A B
(Name Age Sex ) (Id Course)
------------------ -------------
Ram 14 M 1 DS
Sona 15 F 2 DBMS
kim 20 M

AXB
Name Age Sex Id Course
---------------------------------
Ram 14 M 1 DS
Ram 14 M 2 DBMS
Sona 15 F 1 DS
Sona 15 F 2 DBMS
Kim 20 M 1 DS
Kim 20 M 2 DBMS
Note: if A has ‘n’ tuples and B has ‘m’ tuples then A X B will have ‘n*m’ tuples.

Rename Operation (ρ)


Rename is a unary operation used for renaming attributes of a relation.
ρ (a/b)R will rename the attribute ‘b’ of relation by ‘a’.
The results of relational algebra are also relations but without any name. The rename operation allows us to
rename the output relation. 'rename' operation is denoted with small Greek letter rho ρ.
Notation − ρ x (E)
Where the result of expression E is saved with name of x.
Natural Join (⋈)
Natural join is a binary operator. Natural join between two or more relations will result set of all
combination of tuples where they have equal common attribute.
Let us see below example

Emp Dep
(Name Id Dept_name ) (Dept_name Manager)
------------------------ ---------------------
A 120 IT Sale Y
B 125 HR Prod Z
C 110 Sale IT A
D 111 IT

Emp ⋈ Dep

Name Id Dept_name Manager


-------------------------------
A 120 IT A
C 110 Sale Y
D 111 IT A

Conditional Join
Conditional join works similar to natural join. In natural join, by default condition is equal between
common attribute while in conditional join we can specify the any condition such as greater than, less
than, not equal
Let us see below example
R S
(ID Sex Marks) (ID Sex Marks)
------------------ --------------------
1 F 45 10 M 20
2 F 55 11 M 22
3 F 60 12 M 59
Join between R And S with condition R.marks >= S.marks

R.ID R.Sex R.Marks S.ID S.Sex S.Marks


-----------------------------------------------
1 F 45 10 M 20
1 F 45 11 M 22
2 F 55 10 M 20
2 F 55 11 M 22
3 F 60 10 M 20
3 F 60 11 M 22
3 F 60 12 M 59
The relations used to understand extended operators are STUDENT, STUDENT_SPORTS,
ALL_SPORTS and EMPLOYEE which are shown in Table 1, Table 2, Table 3 and Table 4
respectively.
STUDENT
ROLL_NO NAME ADDRESS PHONE AGE

1 RAM DELHI 9455123451 18

2 RAMESH GURGAON 9652431543 18

3 SUJIT ROHTAK 9156253131 20

4 SURESH DELHI 9156768971 18

Table 1
STUDENT_SPORTS
ROLL_NO SPORTS

1 Badminton

2 Cricket

2 Badminton

4 Badminton

Table 2
ALL_SPORTS
SPORTS

Badminton

Cricket

Table 3
EMPLOYEE
EMP_NO NAME ADDRESS PHONE AGE

1 RAM DELHI 9455123451 18

5 NARESH HISAR 9782918192 22

6 SWETA RANCHI 9852617621 21

4 SURESH DELHI 9156768971 18

Table 4
Intersection (∩): Intersection on two relations R1 and R2 can only be computed if R1 and R2
are union compatible (These two relation should have same number of attributes and
corresponding attributes in two relations have same domain). Intersection operator when
applied on two relations as R1∩R2 will give a relation with tuples which are in R1 as well as
R2. Syntax:
Relation1 ∩ Relation2
Example: Find a person who is student as well as employee- STUDENT ∩ EMPLOYEE
In terms of basic operators (union and minus) :
STUDENT ∩ EMPLOYEE = STUDENT + EMPLOYEE - (STUDENT U EMPLOYEE)
RESULT:
ROLL_NO NAME ADDRESS PHONE AGE

1 RAM DELHI 9455123451 18

4 SURESH DELHI 9156768971 18

Conditional Join(⋈c): Conditional Join is used when you want to join two or more relation
based on some conditions. Example: Select students whose ROLL_NO is greater than
EMP_NO of employees
STUDENT⋈c STUDENT.ROLL_NO>EMPLOYEE.EMP_NOEMPLOYEE
In terms of basic operators (cross product and selection) :
σ (STUDENT.ROLL_NO>EMPLOYEE.EMP_NO) (STUDENT×EMPLOYEE)
RESULT:
ROLL_ ADDRE AG EMP_ NAM ADDRE AG
NO NAME SS PHONE E NO E SS PHONE E

RAME GURGA 9652431 9455123


2 SH ON 543 18 1 RAM DELHI 451 18

ROHTA 9156253 9455123


3 SUJIT K 131 20 1 RAM DELHI 451 18

SURE 9156768 9455123


4 SH DELHI 971 18 1 RAM DELHI 451 18
Equijoin(⋈): Equijoin is a special case of conditional join where only equality condition
holds between a pair of attributes. As values of two attributes will be equal in result of equijoin,
only one attribute will be appeared in result.
Example:Select students whose ROLL_NO is equal to EMP_NO of employees
STUDENT⋈STUDENT.ROLL_NO=EMPLOYEE.EMP_NOEMPLOYEE
In terms of basic operators (cross product, selection and projection) :
∏(STUDENT.ROLL_NO, STUDENT.NAME, STUDENT.ADDRESS, STUDENT.PHONE, STUDENT.AGE EMPLOYEE.NAME, EMPLOYEE.ADDRESS, EMPLOYEE.PHONE, EMPLOYEE>AGE) (σ
(STUDENT.ROLL_NO=EMPLOYEE.EMP_NO) (STUDENT×EMPLOYEE))
RESULT:

ROLL_N ADDRES AG ADDRES AG


O NAME S PHONE E NAME S PHONE E

94551234 94551234
1 RAM DELHI 51 18 RAM DELHI 51 18

SURES 91567689 SURES 91567689


4 H DELHI 71 18 H DELHI 71 18

Natural Join(⋈): It is a special case of equijoin in which equality condition hold on all
attributes which have same name in relations R and S (relations on which join operation is
applied). While applying natural join on two relations, there is no need to write equality
condition explicitly. Natural Join will also return the similar attributes only once as their value
will be same in resulting relation.
Example: Select students whose ROLL_NO is equal to ROLL_NO of STUDENT_SPORTS as:
STUDENT⋈STUDENT_SPORTS
In terms of basic operators (cross product, selection and projection) :
∏(STUDENT.ROLL_NO, STUDENT.NAME, STUDENT.ADDRESS, STUDENT.PHONE, STUDENT.AGE STUDENT_SPORTS.SPORTS) (σ (STUDENT.ROLL_NO=STUDENT_SPORTS.ROLL_NO)
(STUDENT×STUDENT_SPORTS))
RESULT:
ROLL_NO NAME ADDRESS PHONE AGE SPORTS

1 RAM DELHI 9455123451 18 Badminton

2 RAMESH GURGAON 9652431543 18 Cricket

2 RAMESH GURGAON 9652431543 18 Badminton

4 SURESH DELHI 9156768971 18 Badminton

Natural Join is by default inner join because the tuples which does not satisfy the conditions of
join does not appear in result set. e.g.; The tuple having ROLL_NO 3 in STUDENT does not
match with any tuple in STUDENT_SPORTS, so it has not been a part of result set.
Left Outer Join(⟕): When applying join on two relations R and S, some tuples of R or S does
not appear in result set which does not satisfy the join conditions. But Left Outer Joins gives
all tuples of R in the result set. The tuples of R which do not satisfy join condition will have
values as NULL for attributes of S.
Example:Select students whose ROLL_NO is greater than EMP_NO of employees and details
of other students as well
STUDENT&#10197STUDENT.ROLL_NO>EMPLOYEE.EMP_NOEMPLOYEE
RESULT
ROLL_ ADDRE AG EMP_ NAM ADDRE AG
NO NAME SS PHONE E NO E SS PHONE E

RAME GURGA 9652431 9455123


2 SH ON 543 18 1 RAM DELHI 451 18

ROHTA 9156253 9455123


3 SUJIT K 131 20 1 RAM DELHI 451 18

SURE 9156768 9455123


4 SH DELHI 971 18 1 RAM DELHI 451 18

9455123 NUL NUL


1 RAM DELHI 451 18 NULL L NULL NULL L

Right Outer Join(⟖): When applying join on two relations R and S, some tuples of R or S
does not appear in result set which does not satisfy the join conditions. But Right Outer Joins
gives all tuples of S in the result set. The tuples of S which do not satisfy join condition will
have values as NULL for attributes of R.
Example: Select students whose ROLL_NO is greater than EMP_NO of employees and
details of other Employees as well
STUDENT⟖STUDENT.ROLL_NO>EMPLOYEE.EMP_NOEMPLOYEE
RESULT:
ROLL_ ADDRE AG EMP_ ADDRE AG
NO NAME SS PHONE E NO NAME SS PHONE E

RAME GURGA 9652431 9455123


2 SH ON 543 18 1 RAM DELHI 451 18

ROHTA 9156253 9455123


3 SUJIT K 131 20 1 RAM DELHI 451 18

SURE 9156768 9455123


4 SH DELHI 971 18 1 RAM DELHI 451 18

NU NARE 9782918
NULL NULL NULL NULL LL 5 SH HISAR 192 22

NU SWET RANCH 9852617


NULL NULL NULL NULL LL 6 A I 621 21

NU SURE 9156768
NULL NULL NULL NULL LL 4 SH DELHI 971 18
Full Outer Join(⟗): When applying join on two relations R and S, some tuples of R or S does
not appear in result set which does not satisfy the join conditions. But Full Outer Joins gives all
tuples of S and all tuples of R in the result set. The tuples of S which do not satisfy join
condition will have values as NULL for attributes of R and vice versa.
Example:Select students whose ROLL_NO is greater than EMP_NO of employees and details
of other Employees as well and other Students as well
STUDENT⟗STUDENT.ROLL_NO>EMPLOYEE.EMP_NOEMPLOYEE
RESULT:
ROLL_ ADDRE AG EMP_ ADDRE AG
NO NAME SS PHONE E NO NAME SS PHONE E

RAME GURGA 9652431 9455123


2 SH ON 543 18 1 RAM DELHI 451 18

ROHTA 9156253 9455123


3 SUJIT K 131 20 1 RAM DELHI 451 18

SURE 9156768 9455123


4 SH DELHI 971 18 1 RAM DELHI 451 18

NU NARE 9782918
NULL NULL NULL NULL LL 5 SH HISAR 192 22

NU SWET RANCH 9852617


NULL NULL NULL NULL LL 6 A I 621 21

NU SURE 9156768
NULL NULL NULL NULL LL 4 SH DELHI 971 18

9455123 NU
1 RAM DELHI 451 18 NULL NULL NULL NULL LL

Division Operator (÷): Division operator A÷B can be applied if and only if:
 Attributes of B is proper subset of Attributes of A.
 The relation returned by division operator will have attributes = (All attributes of A –
All Attributes of B)
 The relation returned by division operator will return those tuples from relation A
which are associated to every B’s tuple.
Consider the relation STUDENT_SPORTS and ALL_SPORTS given in Table 2 and Table 3
above.
To apply division operator as
STUDENT_SPORTS÷ ALL_SPORTS
 The operation is valid as attributes in ALL_SPORTS is a proper subset of attributes
in STUDENT_SPORTS.
 The attributes in resulting relation will have attributes {ROLL_NO,SPORTS}-
{SPORTS}=ROLL_NO
 The tuples in resulting relation will have those ROLL_NO which are associated with
all B’s tuple {Badminton, Cricket}. ROLL_NO 1 and 4 are associated to Badminton
only. ROLL_NO 2 is associated to all tuples of B. So the resulting relation will be:
ROLL_NO

Different Types of SQL JOINs


Here are the different types of the JOINs in SQL:

 (INNER) JOIN: Returns records that have matching values in both tables
 LEFT (OUTER) JOIN: Returns all records from the left table, and the matched records from
the right table
 RIGHT (OUTER) JOIN: Returns all records from the right table, and the matched records
from the left table
 FULL (OUTER) JOIN: Returns all records when there is a match in either left or right table

Relational Calculus
In contrast to Relational Algebra, Relational Calculus is a non-procedural query language, that is, it tells what to
do but never explains how to do it.
Relational calculus exists in two forms −

Tuple Relational Calculus (TRC)


Filtering variable ranges over tuples
Notation − {T | Condition}
Returns all tuples T that satisfies a condition.
For example −
{ T.name | Author(T) AND T.article = 'database' }
Output − Returns tuples with 'name' from Author who has written article on 'database'.
TRC can be quantified. We can use Existential (∃) and Universal Quantifiers (∀).
For example −
{ R| ∃T ∈ Authors(T.article='database' AND R.name=T.name)}
Output − The above query will yield the same result as the previous one.

Domain Relational Calculus (DRC)


In DRC, the filtering variable uses the domain of attributes instead of entire tuple values (as done in TRC,
mentioned above).
Notation −
{ a , a , a , ..., a | P (a , a , a , ... ,a )}
1 2 3 n 1 2 3 n

Where a1, a2 are attributes and P stands for formulae built by inner attributes.
For example −
{< article, page, subject > | ∈ TutorialsPoint ∧ subject = 'database'}
Output − Yields Article, Page, and Subject from the relation TutorialsPoint, where subject is database.
Just like TRC, DRC can also be written using existential and universal quantifiers. DRC also involves relational
operators.
The expression power of Tuple Relation Calculus and Domain Relation Calculus is equivalent to Relational
Algebra.

SQL
o SQL stands for Structured Query Language. It is used for storing and managing data in relational database
management system (RDMS).
o It is a standard language for Relational Database System. It enables a user to create, read, update and delete relational
databases and tables.
o All the RDBMS like MySQL, Informix, Oracle, MS Access and SQL Server use SQL as their standard database
language.
o SQL allows users to query the database in a number of ways, using English-like statements.

Rules:
SQL follows the following rules:

o Structure query language is not case sensitive. Generally, keywords of SQL are written in uppercase.
o Statements of SQL are dependent on text lines. We can use a single SQL statement on one or multiple text line.
o Using the SQL statements, you can perform most of the actions in a database.
o SQL depends on tuple relational calculus and relational algebra.

SQL process:
o When an SQL command is executing for any RDBMS, then the system figure out the best way to carry out the
request and the SQL engine determines that how to interpret the task.
o In the process, various components are included. These components can be optimization Engine, Query engine,
Query dispatcher, classic, etc.
o All the non-SQL queries are handled by the classic query engine, but SQL query engine won't handle logical files.
SQL is a language to operate databases; it includes database creation, deletion, fetching rows, modifying rows,
etc. SQL is an ANSI (American National Standards Institute) standard language, but there are many different
versions of the SQL language.

Purpose of the Query Optimizer


The optimizer attempts to generate the most optimal execution plan for a SQL statement.

The optimizer choose the plan with the lowest cost among all considered candidate plans. The optimizer uses available
statistics to calculate cost. For a specific query in a given environment, the cost computation accounts for factors of query
execution such as I/O, CPU, and communication.

For example, a query might request information about employees who are managers. If the optimizer statistics indicate
that 80% of employees are managers, then the optimizer may decide that a full table scan is most efficient. However, if
statistics indicate that very few employees are managers, then reading an index followed by a table access by rowid may be
more efficient than a full table scan.

Because the database has many internal statistics and tools at its disposal, the optimizer is usually in a better position than
the user to determine the optimal method of statement execution. For this reason, all SQL statements use the optimizer.

What is SQL?
SQL is Structured Query Language, which is a computer language for storing, manipulating and retrieving data
stored in a relational database.
SQL is the standard language for Relational Database System. All the Relational Database Management Systems
(RDMS) like MySQL, MS Access, Oracle, Sybase, Informix, Postgres and SQL Server use SQL as their standard
database language.
Also, they are using different dialects, such as −

 MS SQL Server using T-SQL,


 Oracle using PL/SQL,
 MS Access version of SQL is called JET SQL (native format) etc.

Why SQL?
SQL is widely popular because it offers the following advantages −
 Allows users to access data in the relational database management systems.
 Allows users to describe the data.
 Allows users to define the data in a database and manipulate that data.
 Allows to embed within other languages using SQL modules, libraries & pre-compilers.
 Allows users to create and drop databases and tables.
 Allows users to create view, stored procedure, functions in a database.
 Allows users to set permissions on tables, procedures and views.

A Brief History of SQL


 1970 − Dr. Edgar F. "Ted" Codd of IBM is known as the father of relational databases. He described a
relational model for databases.
 1974 − Structured Query Language appeared.
 1978 − IBM worked to develop Codd's ideas and released a product named System/R.
 1986 − IBM developed the first prototype of relational database and standardized by ANSI. The first
relational database was released by Relational Software which later came to be known as Oracle.

SQL Process
When you are executing an SQL command for any RDBMS, the system determines the best way to carry out
your request and SQL engine figures out how to interpret the task.
There are various components included in this process.
These components are −

 Query Dispatcher
 Optimization Engines
 Classic Query Engine
 SQL Query Engine, etc.
A classic query engine handles all the non-SQL queries, but a SQL query engine won't handle logical files.
Following is a simple diagram showing the SQL Architecture −
SQL Commands
The standard SQL commands to interact with relational databases are CREATE, SELECT, INSERT, UPDATE,
DELETE and DROP. These commands can be classified into the following groups based on their nature −

Types of SQL Commands


There are five types of SQL commands: DDL, DML, DCL, TCL, and DQL.
1. Data Definition Language (DDL)
o DDL changes the structure of the table like creating a table, deleting a table, altering a table, etc.
o All the command of DDL are auto-committed that means it permanently save all the changes in the
database.

Here are some commands that come under DDL:

o CREATE
o ALTER
o DROP
o TRUNCATE

a. CREATE It is used to create a new table in the database.

Syntax:

1. CREATE TABLE TABLE_NAME (COLUMN_NAME DATATYPES[,....]);

Example:

1. CREATE TABLE EMPLOYEE(Name VARCHAR2(20), Email VARCHAR2(100), DOB DATE);

b. DROP: It is used to delete both the structure and record stored in the table.

Syntax
1. DROP TABLE table_name;

Example

1. DROP TABLE EMPLOYEE;

c. ALTER: It is used to alter the structure of the database. This change could be either to modify the
characteristics of an existing attribute or probably to add a new attribute.

Syntax:

To add a new column in the table

1. ALTER TABLE table_name ADD column_name COLUMN-definition;

To modify existing column in the table:

1. ALTER TABLE table_name MODIFY(column_definitions....);

EXAMPLE

1. ALTER TABLE STU_DETAILS ADD(ADDRESS VARCHAR2(20));


2. ALTER TABLE STU_DETAILS MODIFY (NAME VARCHAR2(20));

d. TRUNCATE: It is used to delete all the rows from the table and free the space containing the table.

Syntax:

1. TRUNCATE TABLE table_name;

Example:

1. TRUNCATE TABLE EMPLOYEE;


2. Data Manipulation Language
o DML commands are used to modify the database. It is responsible for all form of changes in the database.
o The command of DML is not auto-committed that means it can't permanently save all the changes in the
database. They can be rollback.

Here are some commands that come under DML:

o INSERT
o UPDATE
o DELETE

a. INSERT: The INSERT statement is a SQL query. It is used to insert data into the row of a table.

Syntax:
1. INSERT INTO TABLE_NAME
2. (col1, col2, col3,.... col N)
3. VALUES (value1, value2, value3, .... valueN);

Or

1. INSERT INTO TABLE_NAME


2. VALUES (value1, value2, value3, .... valueN);

For example:

1. INSERT INTO javatpoint (Author, Subject) VALUES ("Sonoo", "DBMS");

b. UPDATE: This command is used to update or modify the value of a column in the table.

Syntax:

1. UPDATE table_name SET [column_name1= value1,...column_nameN = valueN] [WHERE CONDITIO


N]

For example:

1. UPDATE students
2. SET User_Name = 'Sonoo'
3. WHERE Student_Id = '3'

c. DELETE: It is used to remove one or more row from a table.

Syntax:

1. DELETE FROM table_name [WHERE condition];

For example:

1. DELETE FROM javatpoint


2. WHERE Author="Sonoo";
3. Data Control Language
DCL commands are used to grant and take back authority from any database user.

Here are some commands that come under DCL:

o Grant
o Revoke

a. Grant: It is used to give user access privileges to a database.

Example
1. GRANT SELECT, UPDATE ON MY_TABLE TO SOME_USER, ANOTHER_USER;

b. Revoke: It is used to take back permissions from the user.

Example

1. REVOKE SELECT, UPDATE ON MY_TABLE FROM USER1, USER2;


4. Transaction Control Language
TCL commands can only use with DML commands like INSERT, DELETE and UPDATE only.

These operations are automatically committed in the database that's why they cannot be used while
creating tables or dropping them.

Here are some commands that come under TCL:

o COMMIT
o ROLLBACK
o SAVEPOINT

a. Commit: Commit command is used to save all the transactions to the database.

Syntax:

1. COMMIT;

Example:

1. DELETE FROM CUSTOMERS


2. WHERE AGE = 25;
3. COMMIT;

b. Rollback: Rollback command is used to undo transactions that have not already been saved to the
database.

Syntax:

1. ROLLBACK;

Example:

1. DELETE FROM CUSTOMERS


2. WHERE AGE = 25;
3. ROLLBACK;

c. SAVEPOINT: It is used to roll the transaction back to a certain point without rolling back the entire
transaction.

Syntax:
1. SAVEPOINT SAVEPOINT_NAME;
5. Data Query Language
DQL is used to fetch the data from the database.

It uses only one command:

o SELECT

a. SELECT: This is the same as the projection operation of relational algebra. It is used to select the
attribute based on the condition described by WHERE clause.

Syntax:

1. SELECT expressions
2. FROM TABLES
3. WHERE conditions;

For example:

1. SELECT emp_name
2. FROM employee
3. WHERE age > 20;

Transaction Control Language :

1. COMMIT-
COMMIT in SQL is a transaction control language that is used to permanently save the
changes done in the transaction in tables/databases. The database cannot regain its previous
state after its execution of commit.
Example: Consider the following STAFF table with records:
STAFF

sql>
SELECT *
FROM Staff
WHERE Allowance = 400;

sql> COMMIT;
Output:

So, the SELECT statement produced the output consisting of three rows.
2. ROLLBACK
ROLLBACK in SQL is a transactional control language that is used to undo the transactions
that have not been saved in the database. The command is only been used to undo changes
since the last COMMIT.
Example: Consider the following STAFF table with records:
STAFF

sql>
SELECT *
FROM EMPLOYEES
WHERE ALLOWANCE = 400;

sql> ROLLBACK;
Output:

So, the SELECT statement produced the same output with the ROLLBACK command.

Difference between COMMIT and ROLLBACK


COMMIT ROLLBACK

COMMIT permanently saves the changes made by the ROLLBACK undo the changes made by the
1. current transaction. current transaction.

2.
The transaction can not undo changes after COMMIT Transaction reaches its previous state after
COMMIT ROLLBACK

execution. ROLLBACK.

When the transaction is aborted, ROLLBACK


3. When the transaction is successful, COMMIT is applied. occurs.

Char vs Varchar
The basic difference between Char and Varchar is that: char stores only fixed-length character
string data types whereas varchar stores variable-length string where an upper limit of length is
specified.

Prime Attribute and Non Prime Attribute :


An attribute that is not part of any candidate key is known as non-prime attribute. An attribute that is a part of
one of the candidate keys is known as prime attribute.

Normalization in DBMS: 1NF, 2NF, 3NF and BCNF in Database

DBMS Keys
o Keys play an important role in the relational database.
o It is used to uniquely identify any record or row of data from the table. It is also used to establish and
identify relationships between tables.

For example: In Student table, ID is used as a key because it is unique for each student. In PERSON
table, passport_number, license_number, SSN are keys since they are unique for each person.

Types of key:
1. Primary key
o It is the first key used to identify one and only one instance of an entity uniquely. An entity can contain
multiple keys, as we saw in the PERSON table. The key which is most suitable from those lists becomes a
primary key.
o In the EMPLOYEE table, ID can be the primary key since it is unique for each employee. In the EMPLOYEE
table, we can even select License_Number and Passport_Number as primary keys since they are also
unique.
o For each entity, the primary key selection is based on requirements and developers.

2. Candidate key
o A candidate key is an attribute or set of attributes that can uniquely identify a tuple.
o Except for the primary key, the remaining attributes are considered a candidate key. The candidate keys
are as strong as the primary key.

For example: In the EMPLOYEE table, id is best suited for the primary key. The rest of the attributes, like
SSN, Passport_Number, License_Number, etc., are considered a candidate key.
3. Super Key
Super key is an attribute set that can uniquely identify a tuple. A super key is a superset of a candidate
key.

For example: In the above EMPLOYEE table, for(EMPLOEE_ID, EMPLOYEE_NAME), the name of two
employees can be the same, but their EMPLYEE_ID can't be the same. Hence, this combination can also
be a key.

The super key would be EMPLOYEE-ID (EMPLOYEE_ID, EMPLOYEE-NAME), etc.

4. Foreign key
o Foreign keys are the column of the table used to point to the primary key of another table.
o Every employee works in a specific department in a company, and employee and department are two
different entities. So we can't store the department's information in the employee table. That's why we link
these two tables through the primary key of one table.
o We add the primary key of the DEPARTMENT table, Department_Id, as a new attribute in the EMPLOYEE
table.
o In the EMPLOYEE table, Department_Id is the foreign key, and both the tables are related.
5. Alternate key
There may be one or more attributes or a combination of attributes that uniquely identify each tuple in
a relation. These attributes or combinations of the attributes are called the candidate keys. One key is
chosen as the primary key from these candidate keys, and the remaining candidate key, if it exists, is
termed the alternate key. In other words, the total number of the alternate keys is the total number of
candidate keys minus the primary key. The alternate key may or may not exist. If there is only one
candidate key in a relation, it does not have an alternate key.

For example, employee relation has two attributes, Employee_Id and PAN_No, that act as candidate
keys. In this relation, Employee_Id is chosen as the primary key, so the other candidate key, PAN_No,
acts as the Alternate key.

6. Composite key
Whenever a primary key consists of more than one attribute, it is known as a composite key. This key is
also known as Concatenated Key.

For example, in employee relations, we assume that an employee may be assigned multiple roles, and
an employee may work on multiple projects simultaneously. So the primary key will be composed of all
three attributes, namely Emp_ID, Emp_role, and Proj_ID in combination. So these attributes act as a
composite key since the primary key comprises more than one attribute.
7. Artificial key
The key created using arbitrarily assigned data are known as artificial keys. These keys are created when
a primary key is large and complex and has no relationship with many other relations. The data values of
the artificial keys are usually numbered in a serial order.

For example, the primary key, which is composed of Emp_ID, Emp_role, and Proj_ID, is large in
employee relations. So it would be better to add a new virtual attribute to identify each tuple in the
relation uniquely.

Normalization in DBMS: 1NF, 2NF, 3NF and BCNF in Database

Normalization is a process of organizing the data in database to avoid data redundancy, insertion
anomaly, update anomaly & deletion anomaly. Let’s discuss about anomalies first then we will discuss
normal forms with examples.

Anomalies in DBMS
There are three types of anomalies that occur when the database is not normalized. These are – Insertion,
update and deletion anomaly. Let’s take an example to understand this.
Example: Suppose a manufacturing company stores the employee details in a table named employee that
has four attributes: emp_id for storing employee’s id, emp_name for storing employee’s name,
emp_address for storing employee’s address and emp_dept for storing the department details in which the
employee works. At some point of time the table looks like this:

emp_id emp_name emp_address emp_dept

101 Rick Delhi D001

101 Rick Delhi D002

123 Maggie Agra D890

166 Glenn Chennai D900

166 Glenn Chennai D004

The above table is not normalized. We will see the problems that we face when a table is not normalized.

Update anomaly: In the above table we have two rows for employee Rick as he belongs to two
departments of the company. If we want to update the address of Rick then we have to update the same in
two rows or the data will become inconsistent. If somehow, the correct address gets updated in one
department but not in other then as per the database, Rick would be having two different addresses, which
is not correct and would lead to inconsistent data.

Insert anomaly: Suppose a new employee joins the company, who is under training and currently not
assigned to any department then we would not be able to insert the data into the table if emp_dept field
doesn’t allow nulls.

Delete anomaly: Suppose, if at a point of time the company closes the department D890 then deleting the
rows that are having emp_dept as D890 would also delete the information of employee Maggie since she
is assigned only to this department.

To overcome these anomalies we need to normalize the data. In the next section we will discuss about
normalization.
Normalization
Here are the most commonly used normal forms:

 First normal form(1NF)


 Second normal form(2NF)
 Third normal form(3NF)
 Boyce & Codd normal form (BCNF)

First normal form (1NF)


As per the rule of first normal form, an attribute (column) of a table cannot hold multiple values. It should
hold only atomic values.

Example: Suppose a company wants to store the names and contact details of its employees. It creates a
table that looks like this:

emp_id emp_name emp_address emp_mobile

101 Herschel New Delhi 8912312390

8812121212

102 Jon Kanpur

9900012222

103 Ron Chennai 7778881212

9990000123

104 Lester Bangalore


8123450987

Two employees (Jon & Lester) are having two mobile numbers so the company stored them in the same
field as you can see in the table above.

This table is not in 1NF as the rule says “each attribute of a table must have atomic (single) values”, the
emp_mobile values for employees Jon & Lester violates that rule.

To make the table complies with 1NF we should have the data like this:
emp_id emp_name emp_address emp_mobile

101 Herschel New Delhi 8912312390

102 Jon Kanpur 8812121212

102 Jon Kanpur 9900012222

103 Ron Chennai 7778881212

104 Lester Bangalore 9990000123

104 Lester Bangalore 8123450987

Second normal form (2NF)


A table is said to be in 2NF if both the following conditions hold:

 Table is in 1NF (First normal form)


 No non-prime attribute is dependent on the proper subset of any candidate key of table.

An attribute that is not part of any candidate key is known as non-prime attribute.

Example: Suppose a school wants to store the data of teachers and the subjects they teach. They create a
table that looks like this: Since a teacher can teach more than one subjects, the table can have multiple
rows for a same teacher.

teacher_id Subject teacher_age


111 Maths 38

111 Physics 38

222 Biology 38

333 Physics 40

333 Chemistry 40

Candidate Keys: {teacher_id, subject}


Non prime attribute: teacher_age

The table is in 1 NF because each attribute has atomic values. However, it is not in 2NF because non
prime attribute teacher_age is dependent on teacher_id alone which is a proper subset of candidate key.
This violates the rule for 2NF as the rule says “no non-prime attribute is dependent on the proper subset
of any candidate key of the table”.

To make the table complies with 2NF we can break it in two tables like this:
teacher_details table:

teacher_id teacher_age

111 38

222 38

333 40

teacher_subject table:
teacher_id subject

111 Maths

111 Physics

222 Biology

333 Physics

333 Chemistry

Now the tables comply with Second normal form (2NF).

Third Normal form (3NF)


A table design is said to be in 3NF if both the following conditions hold:

 Table must be in 2NF


 Transitive functional dependency of non-prime attribute on any super key should be
removed.

An attribute that is not part of any candidate key is known as non-prime attribute.

In other words 3NF can be explained like this: A table is in 3NF if it is in 2NF and for each functional
dependency X-> Y at least one of the following conditions hold:

 X is a super key of table


 Y is a prime attribute of table

An attribute that is a part of one of the candidate keys is known as prime attribute.

Example: Suppose a company wants to store the complete address of each employee, they create a table
named employee_details that looks like this:
emp_id emp_name emp_zip emp_state emp_city emp_district

1001 John 282005 UP Agra Dayal Bagh

1002 Ajeet 222008 TN Chennai M-City

1006 Lora 282007 TN Chennai Urrapakkam

1101 Lilly 292008 UK Pauri Bhagwan

1201 Steve 222999 MP Gwalior Ratan

Super keys: {emp_id}, {emp_id, emp_name}, {emp_id, emp_name, emp_zip}…so on


Candidate Keys: {emp_id}
Non-prime attributes: all attributes except emp_id are non-prime as they are not part of any candidate
keys.

Here, emp_state, emp_city & emp_district dependent on emp_zip. And, emp_zip is dependent on emp_id
that makes non-prime attributes (emp_state, emp_city & emp_district) transitively dependent on super
key (emp_id). This violates the rule of 3NF.

To make this table complies with 3NF we have to break the table into two tables to remove the transitive
dependency:

employee table:

emp_id emp_name emp_zip

1001 John 282005


1002 Ajeet 222008

1006 Lora 282007

1101 Lilly 292008

1201 Steve 222999

employee_zip table:

emp_zip emp_state emp_city emp_district

282005 UP Agra Dayal Bagh

222008 TN Chennai M-City

282007 TN Chennai Urrapakkam

292008 UK Pauri Bhagwan

222999 MP Gwalior Ratan

Boyce Codd normal form (BCNF)


It is an advance version of 3NF that’s why it is also referred as 3.5NF. BCNF is stricter than 3NF. A table
complies with BCNF if it is in 3NF and for every functional dependency X->Y, X should be the super
key of the table.
Example: Suppose there is a company wherein employees work in more than one department. They
store the data like this:

emp_id emp_nationality emp_dept dept_type dept_no_of_emp

1001 Austrian Production and planning D001 200

1001 Austrian Stores D001 250

1002 American design and technical support D134 100

1002 American Purchasing department D134 600

Functional dependencies in the table above:


emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}

Candidate key: {emp_id, emp_dept}

The table is not in BCNF as neither emp_id nor emp_dept alone are keys.

To make the table comply with BCNF we can break the table in three tables like this:
emp_nationality table:

emp_id emp_nationality

1001 Austrian

1002 American

emp_dept table:
emp_dept dept_type dept_no_of_emp

Production and planning D001 200

Stores D001 250

design and technical support D134 100

Purchasing department D134 600

emp_dept_mapping table:

emp_id emp_dept

1001 Production and planning

1001 Stores

1002 design and technical support

1002 Purchasing department

Functional dependencies:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}

Candidate keys:
For first table: emp_id
For second table: emp_dept
For third table: {emp_id, emp_dept}

This is now in BCNF as in both the functional dependencies left side part is a key.

Functional dependency in DBMS


The attributes of a table is said to be dependent on each other when an attribute of a table uniquely
identifies another attribute of the same table.

For example: Suppose we have a student table with attributes: Stu_Id, Stu_Name, Stu_Age. Here Stu_Id
attribute uniquely identifies the Stu_Name attribute of student table because if we know the student id we
can tell the student name associated with it. This is known as functional dependency and can be written as
Stu_Id->Stu_Name or in words we can say Stu_Name is functionally dependent on Stu_Id.

Formally:
If column A of a table uniquely identifies the column B of same table then it can represented as A->B
(Attribute B is functionally dependent on attribute A)

DBMS Schema
Definition of schema: Design of a database is called the schema. Schema is of three types: Physical
schema, logical schema and view schema.

For example: In the following diagram, we have a schema that shows the relationship between three
tables: Course, Student and Section. The diagram only shows the design of the database, it doesn’t show
the data present in those tables. Schema is only a structural view(design) of a database as shown in the
diagram below.

The design of a database at physical level is called physical schema, how the data stored in blocks of
storage is described at this level.

Design of database at logical level is called logical schema, programmers and database administrators
work at this level, at this level data can be described as certain types of data records gets stored in data
structures, however the internal details such as implementation of data structure is hidden at this level
(available at physical level).

Design of database at view level is called view schema. This generally describes end user interaction with
database systems.

DBMS Instance
Definition of instance: The data stored in database at a particular moment of time is called instance of
database. Database schema defines the variable declarations in tables that belong to a particular database;
the value of these variables at a moment of time is called the instance of that database.

You might also like