Unit 1-DBP
Unit 1-DBP
What is Data?
In simple words, data can be facts related to any object in consideration. For example, your name,
age, height, weight, etc. are some data related to you. A picture, image, file, pdf, etc. can also be
considered data.
Define Database :
A database is an organized collection of structured information, or data, typically stored
electronically in a computer system. A database is usually controlled by a database management
system (DBMS). ... The data can then be easily accessed, managed, modified, updated, controlled,
and organized.
ER model
o ER model stands for an Entity-Relationship model. It is a high-level data model. This
model is used to define the data elements and relationship for a specified system.
o It develops a conceptual design for the database. It also develops a very simple and easy to
design view of data.
o In ER modeling, the database structure is portrayed as a diagram called an entity-
relationship diagram.
For example, Suppose we design a school database. In this database, the student will be an entity
with attributes like address, name, id, age, etc. The address can be another entity with attributes
like city, street name, pin code, etc and there will be a relationship between them.
Component of ER Diagram
1. Entity:
An entity may be any object, class, person or place. In the ER diagram, an entity can be
represented as rectangles.
a. Weak Entity
10 SecQL CREATE TABLE
An entity that depends on another entity called a weak entity. The weak entity doesn't contain any
key attribute of its own. The weak entity is represented by a double rectangle.
2. Attribute
The attribute is used to describe the property of an entity. Eclipse is used to represent an attribute.
For example, id, age, contact number, name, etc. can be attributes of a student.
a. Key Attribute
The key attribute is used to represent the main characteristics of an entity. It represents a primary
key. The key attribute is represented by an ellipse with the text underlined.
b. Composite Attribute
An attribute that composed of many other attributes is known as a composite attribute. The
composite attribute is represented by an ellipse, and those ellipses are connected with an ellipse.
c. Multivalued Attribute
An attribute can have more than one value. These attributes are known as a multivalued attribute.
The double oval is used to represent multivalued attribute.
For example, a student can have more than one phone number.
d. Derived Attribute
An attribute that can be derived from other attribute is known as a derived attribute. It can be
represented by a dashed ellipse.
For example, A person's age changes over time and can be derived from another attribute like
Date of birth.
3. Relationship
A relationship is used to describe the relation between entities. Diamond or rhombus is used to represent the
relationship.
a. One-to-One Relationship
When only one instance of an entity is associated with the relationship, then it is known as one to one relationship.
For example, A female can marry to one male, and a male can marry to one female.
b. One-to-many relationship
When only one instance of the entity on the left, and more than one instance of an entity on the right associates
with the relationship then this is known as a one-to-many relationship.
For example, Scientist can invent many inventions, but the invention is done by the only specific scientist.
c. Many-to-one relationship
When more than one instance of the entity on the left, and only one instance of an entity on the right associates
with the relationship then it is known as a many-to-one relationship.
For example, Student enrolls for only one course, but a course can have many students.
d. Many-to-many relationship
When more than one instance of the entity on the left, and more than one instance of an entity on the right
associates with the relationship then it is known as a many-to-many relationship.
For example, Employee can assign by many projects and project can have many employees.
Participation Constraints
Total Participation − Each entity is involved in the relationship. Total participation is
represented by double lines.
Partial participation − Not all entities are involved in the relationship. Partial participation is
represented by single lines.
Example-Total
Here,
Double line between the entity set “Student” and relationship set “Enrolled in” signifies total participation.
It specifies that each student must be enrolled in at least one course.
Example-
Here,
Single line between the entity set “Course” and relationship set “Enrolled in” signifies partial participation.
It specifies that there might exist some courses for which no enrollments are made.
The domain of Marital Status has a set of possibilities: Married, Single, Divorced.
The domain of Shift has the set of all possible days: {Mon, Tue, Wed…}.
The domain of Salary is the set of all floating-point numbers greater than 0 and less than
200,000.
The domain of First Name is the set of character strings that represents names of people.
Attribute: It contains the name of a column in a particular table. Each attribute Ai must have a
domain, dom(Ai)
Relational instance: In the relational database system, the relational instance is represented by a
finite set of tuples. Relation instances do not have duplicate tuples.
Relational schema: A relational schema contains the name of the relation and name of all
columns or attributes.
Relational key: In the relational key, each row has one or more attributes. It can identify the row
in the relation uniquely.
o In the given table, NAME, ROLL_NO, PHONE_NO, ADDRESS, and AGE are the
attributes.
o The instance of schema STUDENT has 5 tuples.
o t3 = <Laxman, 33289, 8583287182, Gurugram, 20>
Properties of Relations
Keys :
atomic value: each value in the domain is indivisible as far as the relational model is
concernedattribute: principle storage unit in a database
domain: the original sets of atomic values used to model data; a set of acceptable values that a
column is allowed to contain
file:see relation
relation: a subset of the Cartesian product of a list of domains characterized by a name; the technical
term for table or file
table:see relation
Types:
1)Unary Relationship:
Eg : PERSON is Married to
PERSON is entity
Married to is relationship
2)Binary Relationship :
3)Ternary Relationship :
First Convert each entity and relationship to tables. Person table corresponds to Person Entity with
key as Per-Id. Similarly Passport table corresponds to Passport Entity with key as Pass-No. HashTable
represents relationship between Person and Passport (Which person has which passport). So it will take
attribute Per-Id from Person and Pass-No from Passport.
PR3 –
Table 1
As we can see from Table 1, each Per-Id and Pass-No has only one entry in Hashtable. So we can
merge all three tables into 1 with attributes shown in Table 2. Each Per-Id will be unique and not null.
So it will be the key. Pass-No can’t be key because for some person, it can be NULL.
Table 2
Case 2: Binary Relationship with 1:1 cardinality and partial participation of both entities
A male marries 0 or 1 female and vice versa as well. So it is 1:1 cardinality with partial participation
constraint from both. First Convert each entity and relationship to tables. Male table corresponds to
Male Entity with key as M-Id. Similarly Female table corresponds to Female Entity with key as F-Id.
Marry Table represents relationship between Male and Female (Which Male marries which female). So
it will take attribute M-Id from Male and F-Id from Female.
M1 – M1 F2 F1 –
M2 – M2 F1 F2 –
M3 – F3 –
Table 3
As we can see from Table 3, some males and some females do not marry. If we merge 3 tables into 1,
for some M-Id, F-Id will be NULL. So there is no attribute which is always not NULL. So we can’t
merge all three tables into 1. We can convert into 2 tables. In table 4, M-Id who are married will have
F-Id associated. For others, it will be NULL. Table 5 will have information of all females. Primary
Keys have been underlined.
Table 4
In this scenario, every student can enroll only in one elective course but for an elective course there can
be more than one student. First Convert each entity and relationship to tables. Student table
corresponds to Student Entity with key as S-Id. Similarly Elective_Course table corresponds to
Elective_Course Entity with key as E-Id. Enrolls Table represents relationship between Student and
Elective_Course (Which student enrolls in which course). So it will take attribute S-Id from and Student
E-Id from Elective_Course.
S-Id Other Student Attribute S-Id E-Id E-Id Other Elective CourseAttribute
S1 – S1 E1 E1 –
S2 – S2 E2 E2 –
S3 – S3 E1 E3 –
S4 – S4 E1
Table 6
As we can see from Table 6, S-Id is not repeating in Enrolls Table. So it can be considered as a key of
Enrolls table. Both Student and Enrolls Table’s key is same; we can merge it as a single table. The
resultant tables are shown in Table 7 and Table 8. Primary Keys have been underlined.
Table 8
Case 4: Binary Relationship with m: n cardinality
In this scenario, every student can enroll in more than 1 compulsory course and for a compulsory course
there can be more than 1 student. First Convert each entity and relationship to tables. Student table
corresponds to Student Entity with key as S-Id. Similarly Compulsory_Courses table corresponds to
Compulsory Courses Entity with key as C-Id. Enrolls Table represents relationship between Student
and Compulsory_Courses (Which student enrolls in which course). So it will take attribute S -Id from
Person and C-Id from Compulsory_Courses.
S-Id Other Student Attribute S-Id C-Id C-Id Other Compulsory CourseAttribute
S1 – S1 C1 C1 –
S2 – S1 C2 C2 –
S3 – S3 C1 C3 –
S4 – S4 C3 C4 –
S4 C2
S3 C3
Table 9
As we can see from Table 9, S-Id and C-Id both are repeating in Enrolls Table. But its combination is
unique; so it can be considered as a key of Enrolls table. All tables’ keys are different, these can’t be
merged. Primary Keys of all tables have been underlined.
Case 5: Binary Relationship with weak entity
In this scenario, an employee can have many dependents and one dependent can depend on one
employee. A dependent does not have any existence without an employee (e.g; you as a child can be
dependent of your father in his company). So it will be a weak entity and its participation will always
be total. Weak Entity does not have key of its own. So its key will be combination of key of its
identifying entity (E-Id of Employee in this case) and its partial key (D-Name).
First Convert each entity and relationship to tables. Employee table corresponds to Employee Entity
with key as E-Id. Similarly Dependents table corresponds to Dependent Entity with key as D-Name
and E-Id. HashTable represents relationship between Employee and Dependents (Which employee has
which dependents). So it will take attribute E-Id from Employee and D-Name from Dependents.
E-Id Other Employee Attribute E-Id D-Name D-Name E-Id Other DependentsAttribute
E1 – E1 RAM RAM E1 –
E2 – E1 SRINI SRINI E1 –
E3 – E2 RAM RAM E2 –
E3 ASHISH ASHISH E3 –
Table 10
As we can see from Table 10, E-Id, D-Name is key for Has as well as Dependents Table. So we can
merge these two into 1. So the resultant tables are shown in Tables 11 and 12. Primary Keys of all
tables have been underlined.
E-Id Other Employee Attribute
Table 11
Table 12
ER Model, when conceptualized into diagrams, gives a good overview of entity-relationship, which is easier to
understand. ER diagrams can be mapped to relational schema, that is, it is possible to create relational schema
using ER diagram. We cannot import all the ER constraints into relational model, but an approximate schema can
be generated.
There are several processes and algorithms available to convert ER Diagrams into Relational Schema. Some of
them are automated and some of them are manual. We may focus here on the mapping diagram contents to
relational basics.
ER diagrams mainly comprise of −
Mapping Entity
An entity is a real-world object with some attributes.
Mapping Relationship
A relationship is an association among entities.
Mapping Process
Create table for a relationship.
Add the primary keys of all participating Entities as fields of table with their respective data types.
If relationship has any attribute, add each attribute as field of table.
Declare a primary key composing all the primary keys of participating entities.
Declare all foreign key constraints.
Mapping Process
Create table for weak entity set.
Add all its attributes to table as field.
Add the primary key of identifying entity set.
Declare all foreign key constraints.
Relational Algebra
Relational Algebra is procedural query language, which takes Relation as input and generate relation as
output. Relational algebra mainly provides theoretical foundation for relational databases and SQL.
It uses operators to perform queries. An operator can be either unary or binary. They accept relations as their
input and yield relations as their output. Relational algebra is performed recursively on a relation and
intermediate results are also considered relations.
Relational algebra is a procedural query language that works on relational model. The purpose of a query
language is to retrieve data from database or perform various operations such as insert, update, delete on
the data. When I say that relational algebra is a procedural query language, it means that it tells what data
to be retrieved and how to be retrieved.
On the other hand relational calculus is a non-procedural query language, which means it tells what data
to be retrieved but doesn’t tell how to retrieve it.
Select
Project
Union
Set different
Cartesian product
Rename
We will discuss all these operations in the following sections.
A B C
-------
1 2 4
4 3 4
Project Operation (∏)
It projects column(s) that satisfy a given predicate.
Notation − ∏A , A , A (r)
1 2 n
1 CSE A
2 ECE B
3 MECH B
4 CIVIL A
5 CSE B
Table R2 is as follows −
1 CIVIL A
2 CSE A
3 ECE B
To display all the regno of R1 and R2, use the following command −
∏regno(R1) ∪ ∏regno(R2)
Output
Regno
Union
Union combines two different results obtained by a query into a single result in the form of a table.
However, the results should be similar if union is to be applied on them. Union removes all duplicates, if
any from the data and only displays distinct values. If duplicate values are required in the resultant data,
then UNION ALL is used.
An example of union is −
Select Student_Name from Art_Students
UNION
Select Student_Name from Dance_Students
This will display the names of all the students in the table Art_Students and Dance_Students i.e John,
Mary, Damon and Matt.
Intersection
The intersection operator gives the common data values between the two data sets that are intersected.
The two data sets that are intersected should be similar for the intersection operator to work.
Intersection also removes all duplicates before displaying the result.
An example of intersection is −
Select Student_Name from Art_Students
INTERSECT
Select Student_Name from Dance_Students
This will display the names of the students in the table Art_Students and in the table Dance_Students
i.e all the students that have taken both art and dance classes .Those are Mary and Damon in this
example
Note: Only those rows that are present in both the tables will appear in the result set.
table_name1 ∩ table_name2
Intersection Operator (∩) Example
Lets take the same example that we have taken above.
Table 1: COURSE
Student_Name
------------
Aditya
Steve
Paul
Lucy
The result of set difference operation is tuples, which are present in one relation but are not in the second relation.
Notation − r − s
Finds all the tuples that are present in r but not in s.
∏ author (Books) − ∏ author (Articles)
Output − Provides the name of authors who have written books but not articles.
Minus (-): Minus on two relations R1 and R2 can only be computed if R1 and R2 are union
compatible. Minus operator when applied on two relations as R1-R2 will give a relation with
tuples which are in R1 but not in R2. Syntax:
Relation1 - Relation2
Find person who are student but not employee, we can use minus operator like:
STUDENT – EMPLOYEE
Table 1: EMPLOYEE
EMP_NO NAME ADDRESS PHONE AGE
Table 2: STUDENT
ROLL_NO NAME ADDRESS PHONE AGE
RESULT:
ROLL_NO NAME ADDRESS PHONE AGE
Set difference
The set difference operators takes the two sets and returns the values that are in the first set but not the
second set.
An example of set difference is −
Select Student_Name from Art_Students
MINUS
Select Student_Name from Dance_Students
This will display the names of all the students in table Art_Students but not in table Dance_Students i.e
the students who are taking art classes but not dance classes.
AXB
Name Age Sex Id Course
---------------------------------
Ram 14 M 1 DS
Ram 14 M 2 DBMS
Sona 15 F 1 DS
Sona 15 F 2 DBMS
Kim 20 M 1 DS
Kim 20 M 2 DBMS
Note: if A has ‘n’ tuples and B has ‘m’ tuples then A X B will have ‘n*m’ tuples.
Emp Dep
(Name Id Dept_name ) (Dept_name Manager)
------------------------ ---------------------
A 120 IT Sale Y
B 125 HR Prod Z
C 110 Sale IT A
D 111 IT
Emp ⋈ Dep
Conditional Join
Conditional join works similar to natural join. In natural join, by default condition is equal between
common attribute while in conditional join we can specify the any condition such as greater than, less
than, not equal
Let us see below example
R S
(ID Sex Marks) (ID Sex Marks)
------------------ --------------------
1 F 45 10 M 20
2 F 55 11 M 22
3 F 60 12 M 59
Join between R And S with condition R.marks >= S.marks
Table 1
STUDENT_SPORTS
ROLL_NO SPORTS
1 Badminton
2 Cricket
2 Badminton
4 Badminton
Table 2
ALL_SPORTS
SPORTS
Badminton
Cricket
Table 3
EMPLOYEE
EMP_NO NAME ADDRESS PHONE AGE
Table 4
Intersection (∩): Intersection on two relations R1 and R2 can only be computed if R1 and R2
are union compatible (These two relation should have same number of attributes and
corresponding attributes in two relations have same domain). Intersection operator when
applied on two relations as R1∩R2 will give a relation with tuples which are in R1 as well as
R2. Syntax:
Relation1 ∩ Relation2
Example: Find a person who is student as well as employee- STUDENT ∩ EMPLOYEE
In terms of basic operators (union and minus) :
STUDENT ∩ EMPLOYEE = STUDENT + EMPLOYEE - (STUDENT U EMPLOYEE)
RESULT:
ROLL_NO NAME ADDRESS PHONE AGE
Conditional Join(⋈c): Conditional Join is used when you want to join two or more relation
based on some conditions. Example: Select students whose ROLL_NO is greater than
EMP_NO of employees
STUDENT⋈c STUDENT.ROLL_NO>EMPLOYEE.EMP_NOEMPLOYEE
In terms of basic operators (cross product and selection) :
σ (STUDENT.ROLL_NO>EMPLOYEE.EMP_NO) (STUDENT×EMPLOYEE)
RESULT:
ROLL_ ADDRE AG EMP_ NAM ADDRE AG
NO NAME SS PHONE E NO E SS PHONE E
94551234 94551234
1 RAM DELHI 51 18 RAM DELHI 51 18
Natural Join(⋈): It is a special case of equijoin in which equality condition hold on all
attributes which have same name in relations R and S (relations on which join operation is
applied). While applying natural join on two relations, there is no need to write equality
condition explicitly. Natural Join will also return the similar attributes only once as their value
will be same in resulting relation.
Example: Select students whose ROLL_NO is equal to ROLL_NO of STUDENT_SPORTS as:
STUDENT⋈STUDENT_SPORTS
In terms of basic operators (cross product, selection and projection) :
∏(STUDENT.ROLL_NO, STUDENT.NAME, STUDENT.ADDRESS, STUDENT.PHONE, STUDENT.AGE STUDENT_SPORTS.SPORTS) (σ (STUDENT.ROLL_NO=STUDENT_SPORTS.ROLL_NO)
(STUDENT×STUDENT_SPORTS))
RESULT:
ROLL_NO NAME ADDRESS PHONE AGE SPORTS
Natural Join is by default inner join because the tuples which does not satisfy the conditions of
join does not appear in result set. e.g.; The tuple having ROLL_NO 3 in STUDENT does not
match with any tuple in STUDENT_SPORTS, so it has not been a part of result set.
Left Outer Join(⟕): When applying join on two relations R and S, some tuples of R or S does
not appear in result set which does not satisfy the join conditions. But Left Outer Joins gives
all tuples of R in the result set. The tuples of R which do not satisfy join condition will have
values as NULL for attributes of S.
Example:Select students whose ROLL_NO is greater than EMP_NO of employees and details
of other students as well
STUDENT⟕STUDENT.ROLL_NO>EMPLOYEE.EMP_NOEMPLOYEE
RESULT
ROLL_ ADDRE AG EMP_ NAM ADDRE AG
NO NAME SS PHONE E NO E SS PHONE E
Right Outer Join(⟖): When applying join on two relations R and S, some tuples of R or S
does not appear in result set which does not satisfy the join conditions. But Right Outer Joins
gives all tuples of S in the result set. The tuples of S which do not satisfy join condition will
have values as NULL for attributes of R.
Example: Select students whose ROLL_NO is greater than EMP_NO of employees and
details of other Employees as well
STUDENT⟖STUDENT.ROLL_NO>EMPLOYEE.EMP_NOEMPLOYEE
RESULT:
ROLL_ ADDRE AG EMP_ ADDRE AG
NO NAME SS PHONE E NO NAME SS PHONE E
NU NARE 9782918
NULL NULL NULL NULL LL 5 SH HISAR 192 22
NU SURE 9156768
NULL NULL NULL NULL LL 4 SH DELHI 971 18
Full Outer Join(⟗): When applying join on two relations R and S, some tuples of R or S does
not appear in result set which does not satisfy the join conditions. But Full Outer Joins gives all
tuples of S and all tuples of R in the result set. The tuples of S which do not satisfy join
condition will have values as NULL for attributes of R and vice versa.
Example:Select students whose ROLL_NO is greater than EMP_NO of employees and details
of other Employees as well and other Students as well
STUDENT⟗STUDENT.ROLL_NO>EMPLOYEE.EMP_NOEMPLOYEE
RESULT:
ROLL_ ADDRE AG EMP_ ADDRE AG
NO NAME SS PHONE E NO NAME SS PHONE E
NU NARE 9782918
NULL NULL NULL NULL LL 5 SH HISAR 192 22
NU SURE 9156768
NULL NULL NULL NULL LL 4 SH DELHI 971 18
9455123 NU
1 RAM DELHI 451 18 NULL NULL NULL NULL LL
Division Operator (÷): Division operator A÷B can be applied if and only if:
Attributes of B is proper subset of Attributes of A.
The relation returned by division operator will have attributes = (All attributes of A –
All Attributes of B)
The relation returned by division operator will return those tuples from relation A
which are associated to every B’s tuple.
Consider the relation STUDENT_SPORTS and ALL_SPORTS given in Table 2 and Table 3
above.
To apply division operator as
STUDENT_SPORTS÷ ALL_SPORTS
The operation is valid as attributes in ALL_SPORTS is a proper subset of attributes
in STUDENT_SPORTS.
The attributes in resulting relation will have attributes {ROLL_NO,SPORTS}-
{SPORTS}=ROLL_NO
The tuples in resulting relation will have those ROLL_NO which are associated with
all B’s tuple {Badminton, Cricket}. ROLL_NO 1 and 4 are associated to Badminton
only. ROLL_NO 2 is associated to all tuples of B. So the resulting relation will be:
ROLL_NO
(INNER) JOIN: Returns records that have matching values in both tables
LEFT (OUTER) JOIN: Returns all records from the left table, and the matched records from
the right table
RIGHT (OUTER) JOIN: Returns all records from the right table, and the matched records
from the left table
FULL (OUTER) JOIN: Returns all records when there is a match in either left or right table
Relational Calculus
In contrast to Relational Algebra, Relational Calculus is a non-procedural query language, that is, it tells what to
do but never explains how to do it.
Relational calculus exists in two forms −
Where a1, a2 are attributes and P stands for formulae built by inner attributes.
For example −
{< article, page, subject > | ∈ TutorialsPoint ∧ subject = 'database'}
Output − Yields Article, Page, and Subject from the relation TutorialsPoint, where subject is database.
Just like TRC, DRC can also be written using existential and universal quantifiers. DRC also involves relational
operators.
The expression power of Tuple Relation Calculus and Domain Relation Calculus is equivalent to Relational
Algebra.
SQL
o SQL stands for Structured Query Language. It is used for storing and managing data in relational database
management system (RDMS).
o It is a standard language for Relational Database System. It enables a user to create, read, update and delete relational
databases and tables.
o All the RDBMS like MySQL, Informix, Oracle, MS Access and SQL Server use SQL as their standard database
language.
o SQL allows users to query the database in a number of ways, using English-like statements.
Rules:
SQL follows the following rules:
o Structure query language is not case sensitive. Generally, keywords of SQL are written in uppercase.
o Statements of SQL are dependent on text lines. We can use a single SQL statement on one or multiple text line.
o Using the SQL statements, you can perform most of the actions in a database.
o SQL depends on tuple relational calculus and relational algebra.
SQL process:
o When an SQL command is executing for any RDBMS, then the system figure out the best way to carry out the
request and the SQL engine determines that how to interpret the task.
o In the process, various components are included. These components can be optimization Engine, Query engine,
Query dispatcher, classic, etc.
o All the non-SQL queries are handled by the classic query engine, but SQL query engine won't handle logical files.
SQL is a language to operate databases; it includes database creation, deletion, fetching rows, modifying rows,
etc. SQL is an ANSI (American National Standards Institute) standard language, but there are many different
versions of the SQL language.
The optimizer choose the plan with the lowest cost among all considered candidate plans. The optimizer uses available
statistics to calculate cost. For a specific query in a given environment, the cost computation accounts for factors of query
execution such as I/O, CPU, and communication.
For example, a query might request information about employees who are managers. If the optimizer statistics indicate
that 80% of employees are managers, then the optimizer may decide that a full table scan is most efficient. However, if
statistics indicate that very few employees are managers, then reading an index followed by a table access by rowid may be
more efficient than a full table scan.
Because the database has many internal statistics and tools at its disposal, the optimizer is usually in a better position than
the user to determine the optimal method of statement execution. For this reason, all SQL statements use the optimizer.
What is SQL?
SQL is Structured Query Language, which is a computer language for storing, manipulating and retrieving data
stored in a relational database.
SQL is the standard language for Relational Database System. All the Relational Database Management Systems
(RDMS) like MySQL, MS Access, Oracle, Sybase, Informix, Postgres and SQL Server use SQL as their standard
database language.
Also, they are using different dialects, such as −
Why SQL?
SQL is widely popular because it offers the following advantages −
Allows users to access data in the relational database management systems.
Allows users to describe the data.
Allows users to define the data in a database and manipulate that data.
Allows to embed within other languages using SQL modules, libraries & pre-compilers.
Allows users to create and drop databases and tables.
Allows users to create view, stored procedure, functions in a database.
Allows users to set permissions on tables, procedures and views.
SQL Process
When you are executing an SQL command for any RDBMS, the system determines the best way to carry out
your request and SQL engine figures out how to interpret the task.
There are various components included in this process.
These components are −
Query Dispatcher
Optimization Engines
Classic Query Engine
SQL Query Engine, etc.
A classic query engine handles all the non-SQL queries, but a SQL query engine won't handle logical files.
Following is a simple diagram showing the SQL Architecture −
SQL Commands
The standard SQL commands to interact with relational databases are CREATE, SELECT, INSERT, UPDATE,
DELETE and DROP. These commands can be classified into the following groups based on their nature −
o CREATE
o ALTER
o DROP
o TRUNCATE
Syntax:
Example:
b. DROP: It is used to delete both the structure and record stored in the table.
Syntax
1. DROP TABLE table_name;
Example
c. ALTER: It is used to alter the structure of the database. This change could be either to modify the
characteristics of an existing attribute or probably to add a new attribute.
Syntax:
EXAMPLE
d. TRUNCATE: It is used to delete all the rows from the table and free the space containing the table.
Syntax:
Example:
o INSERT
o UPDATE
o DELETE
a. INSERT: The INSERT statement is a SQL query. It is used to insert data into the row of a table.
Syntax:
1. INSERT INTO TABLE_NAME
2. (col1, col2, col3,.... col N)
3. VALUES (value1, value2, value3, .... valueN);
Or
For example:
b. UPDATE: This command is used to update or modify the value of a column in the table.
Syntax:
For example:
1. UPDATE students
2. SET User_Name = 'Sonoo'
3. WHERE Student_Id = '3'
Syntax:
For example:
o Grant
o Revoke
Example
1. GRANT SELECT, UPDATE ON MY_TABLE TO SOME_USER, ANOTHER_USER;
Example
These operations are automatically committed in the database that's why they cannot be used while
creating tables or dropping them.
o COMMIT
o ROLLBACK
o SAVEPOINT
a. Commit: Commit command is used to save all the transactions to the database.
Syntax:
1. COMMIT;
Example:
b. Rollback: Rollback command is used to undo transactions that have not already been saved to the
database.
Syntax:
1. ROLLBACK;
Example:
c. SAVEPOINT: It is used to roll the transaction back to a certain point without rolling back the entire
transaction.
Syntax:
1. SAVEPOINT SAVEPOINT_NAME;
5. Data Query Language
DQL is used to fetch the data from the database.
o SELECT
a. SELECT: This is the same as the projection operation of relational algebra. It is used to select the
attribute based on the condition described by WHERE clause.
Syntax:
1. SELECT expressions
2. FROM TABLES
3. WHERE conditions;
For example:
1. SELECT emp_name
2. FROM employee
3. WHERE age > 20;
1. COMMIT-
COMMIT in SQL is a transaction control language that is used to permanently save the
changes done in the transaction in tables/databases. The database cannot regain its previous
state after its execution of commit.
Example: Consider the following STAFF table with records:
STAFF
sql>
SELECT *
FROM Staff
WHERE Allowance = 400;
sql> COMMIT;
Output:
So, the SELECT statement produced the output consisting of three rows.
2. ROLLBACK
ROLLBACK in SQL is a transactional control language that is used to undo the transactions
that have not been saved in the database. The command is only been used to undo changes
since the last COMMIT.
Example: Consider the following STAFF table with records:
STAFF
sql>
SELECT *
FROM EMPLOYEES
WHERE ALLOWANCE = 400;
sql> ROLLBACK;
Output:
So, the SELECT statement produced the same output with the ROLLBACK command.
COMMIT permanently saves the changes made by the ROLLBACK undo the changes made by the
1. current transaction. current transaction.
2.
The transaction can not undo changes after COMMIT Transaction reaches its previous state after
COMMIT ROLLBACK
execution. ROLLBACK.
Char vs Varchar
The basic difference between Char and Varchar is that: char stores only fixed-length character
string data types whereas varchar stores variable-length string where an upper limit of length is
specified.
DBMS Keys
o Keys play an important role in the relational database.
o It is used to uniquely identify any record or row of data from the table. It is also used to establish and
identify relationships between tables.
For example: In Student table, ID is used as a key because it is unique for each student. In PERSON
table, passport_number, license_number, SSN are keys since they are unique for each person.
Types of key:
1. Primary key
o It is the first key used to identify one and only one instance of an entity uniquely. An entity can contain
multiple keys, as we saw in the PERSON table. The key which is most suitable from those lists becomes a
primary key.
o In the EMPLOYEE table, ID can be the primary key since it is unique for each employee. In the EMPLOYEE
table, we can even select License_Number and Passport_Number as primary keys since they are also
unique.
o For each entity, the primary key selection is based on requirements and developers.
2. Candidate key
o A candidate key is an attribute or set of attributes that can uniquely identify a tuple.
o Except for the primary key, the remaining attributes are considered a candidate key. The candidate keys
are as strong as the primary key.
For example: In the EMPLOYEE table, id is best suited for the primary key. The rest of the attributes, like
SSN, Passport_Number, License_Number, etc., are considered a candidate key.
3. Super Key
Super key is an attribute set that can uniquely identify a tuple. A super key is a superset of a candidate
key.
For example: In the above EMPLOYEE table, for(EMPLOEE_ID, EMPLOYEE_NAME), the name of two
employees can be the same, but their EMPLYEE_ID can't be the same. Hence, this combination can also
be a key.
4. Foreign key
o Foreign keys are the column of the table used to point to the primary key of another table.
o Every employee works in a specific department in a company, and employee and department are two
different entities. So we can't store the department's information in the employee table. That's why we link
these two tables through the primary key of one table.
o We add the primary key of the DEPARTMENT table, Department_Id, as a new attribute in the EMPLOYEE
table.
o In the EMPLOYEE table, Department_Id is the foreign key, and both the tables are related.
5. Alternate key
There may be one or more attributes or a combination of attributes that uniquely identify each tuple in
a relation. These attributes or combinations of the attributes are called the candidate keys. One key is
chosen as the primary key from these candidate keys, and the remaining candidate key, if it exists, is
termed the alternate key. In other words, the total number of the alternate keys is the total number of
candidate keys minus the primary key. The alternate key may or may not exist. If there is only one
candidate key in a relation, it does not have an alternate key.
For example, employee relation has two attributes, Employee_Id and PAN_No, that act as candidate
keys. In this relation, Employee_Id is chosen as the primary key, so the other candidate key, PAN_No,
acts as the Alternate key.
6. Composite key
Whenever a primary key consists of more than one attribute, it is known as a composite key. This key is
also known as Concatenated Key.
For example, in employee relations, we assume that an employee may be assigned multiple roles, and
an employee may work on multiple projects simultaneously. So the primary key will be composed of all
three attributes, namely Emp_ID, Emp_role, and Proj_ID in combination. So these attributes act as a
composite key since the primary key comprises more than one attribute.
7. Artificial key
The key created using arbitrarily assigned data are known as artificial keys. These keys are created when
a primary key is large and complex and has no relationship with many other relations. The data values of
the artificial keys are usually numbered in a serial order.
For example, the primary key, which is composed of Emp_ID, Emp_role, and Proj_ID, is large in
employee relations. So it would be better to add a new virtual attribute to identify each tuple in the
relation uniquely.
Normalization is a process of organizing the data in database to avoid data redundancy, insertion
anomaly, update anomaly & deletion anomaly. Let’s discuss about anomalies first then we will discuss
normal forms with examples.
Anomalies in DBMS
There are three types of anomalies that occur when the database is not normalized. These are – Insertion,
update and deletion anomaly. Let’s take an example to understand this.
Example: Suppose a manufacturing company stores the employee details in a table named employee that
has four attributes: emp_id for storing employee’s id, emp_name for storing employee’s name,
emp_address for storing employee’s address and emp_dept for storing the department details in which the
employee works. At some point of time the table looks like this:
The above table is not normalized. We will see the problems that we face when a table is not normalized.
Update anomaly: In the above table we have two rows for employee Rick as he belongs to two
departments of the company. If we want to update the address of Rick then we have to update the same in
two rows or the data will become inconsistent. If somehow, the correct address gets updated in one
department but not in other then as per the database, Rick would be having two different addresses, which
is not correct and would lead to inconsistent data.
Insert anomaly: Suppose a new employee joins the company, who is under training and currently not
assigned to any department then we would not be able to insert the data into the table if emp_dept field
doesn’t allow nulls.
Delete anomaly: Suppose, if at a point of time the company closes the department D890 then deleting the
rows that are having emp_dept as D890 would also delete the information of employee Maggie since she
is assigned only to this department.
To overcome these anomalies we need to normalize the data. In the next section we will discuss about
normalization.
Normalization
Here are the most commonly used normal forms:
Example: Suppose a company wants to store the names and contact details of its employees. It creates a
table that looks like this:
8812121212
9900012222
9990000123
Two employees (Jon & Lester) are having two mobile numbers so the company stored them in the same
field as you can see in the table above.
This table is not in 1NF as the rule says “each attribute of a table must have atomic (single) values”, the
emp_mobile values for employees Jon & Lester violates that rule.
To make the table complies with 1NF we should have the data like this:
emp_id emp_name emp_address emp_mobile
An attribute that is not part of any candidate key is known as non-prime attribute.
Example: Suppose a school wants to store the data of teachers and the subjects they teach. They create a
table that looks like this: Since a teacher can teach more than one subjects, the table can have multiple
rows for a same teacher.
111 Physics 38
222 Biology 38
333 Physics 40
333 Chemistry 40
The table is in 1 NF because each attribute has atomic values. However, it is not in 2NF because non
prime attribute teacher_age is dependent on teacher_id alone which is a proper subset of candidate key.
This violates the rule for 2NF as the rule says “no non-prime attribute is dependent on the proper subset
of any candidate key of the table”.
To make the table complies with 2NF we can break it in two tables like this:
teacher_details table:
teacher_id teacher_age
111 38
222 38
333 40
teacher_subject table:
teacher_id subject
111 Maths
111 Physics
222 Biology
333 Physics
333 Chemistry
An attribute that is not part of any candidate key is known as non-prime attribute.
In other words 3NF can be explained like this: A table is in 3NF if it is in 2NF and for each functional
dependency X-> Y at least one of the following conditions hold:
An attribute that is a part of one of the candidate keys is known as prime attribute.
Example: Suppose a company wants to store the complete address of each employee, they create a table
named employee_details that looks like this:
emp_id emp_name emp_zip emp_state emp_city emp_district
Here, emp_state, emp_city & emp_district dependent on emp_zip. And, emp_zip is dependent on emp_id
that makes non-prime attributes (emp_state, emp_city & emp_district) transitively dependent on super
key (emp_id). This violates the rule of 3NF.
To make this table complies with 3NF we have to break the table into two tables to remove the transitive
dependency:
employee table:
employee_zip table:
The table is not in BCNF as neither emp_id nor emp_dept alone are keys.
To make the table comply with BCNF we can break the table in three tables like this:
emp_nationality table:
emp_id emp_nationality
1001 Austrian
1002 American
emp_dept table:
emp_dept dept_type dept_no_of_emp
emp_dept_mapping table:
emp_id emp_dept
1001 Stores
Functional dependencies:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}
Candidate keys:
For first table: emp_id
For second table: emp_dept
For third table: {emp_id, emp_dept}
This is now in BCNF as in both the functional dependencies left side part is a key.
For example: Suppose we have a student table with attributes: Stu_Id, Stu_Name, Stu_Age. Here Stu_Id
attribute uniquely identifies the Stu_Name attribute of student table because if we know the student id we
can tell the student name associated with it. This is known as functional dependency and can be written as
Stu_Id->Stu_Name or in words we can say Stu_Name is functionally dependent on Stu_Id.
Formally:
If column A of a table uniquely identifies the column B of same table then it can represented as A->B
(Attribute B is functionally dependent on attribute A)
DBMS Schema
Definition of schema: Design of a database is called the schema. Schema is of three types: Physical
schema, logical schema and view schema.
For example: In the following diagram, we have a schema that shows the relationship between three
tables: Course, Student and Section. The diagram only shows the design of the database, it doesn’t show
the data present in those tables. Schema is only a structural view(design) of a database as shown in the
diagram below.
The design of a database at physical level is called physical schema, how the data stored in blocks of
storage is described at this level.
Design of database at logical level is called logical schema, programmers and database administrators
work at this level, at this level data can be described as certain types of data records gets stored in data
structures, however the internal details such as implementation of data structure is hidden at this level
(available at physical level).
Design of database at view level is called view schema. This generally describes end user interaction with
database systems.
DBMS Instance
Definition of instance: The data stored in database at a particular moment of time is called instance of
database. Database schema defines the variable declarations in tables that belong to a particular database;
the value of these variables at a moment of time is called the instance of that database.