Chapter 1
Chapter 1
Data
• Data is nothing but raw and unprocessed facts and statistics
stored or free flowing over a network.
• Processed data becomes information.
• For example: Classroom temperature is 0 degree Celsius.
Data vs Information
Database
• The database is a collection of inter-related data.
• It is also used to organize the data in the form of a table,
schema, views, and reports, etc.
• Using the database, you can easily retrieve, insert, and delete
the information. (CRUD)
• For example: Class database, Library database.
Database Example
Database System
• A system that keeps record of data, and manages it.
• The main purpose of database systems is to manage the data.
• Consider a university that keeps the data of students, teachers,
courses, books etc. To manage this data we need to store this
data somewhere where we can add new data, delete unused
data, update outdated data, retrieve data, to perform these
operations on data we need a Database management system
that allows us to store the data in such a way so that all these
operations can be performed on the data efficiently.
What before DBMS?
• File System
Database Management System
• A DBMS is software that allows creation, definition and
manipulation of database, allowing users to store, process and
analyze data easily.
• DBMS provides us with an interface or a tool, to perform
various operations like creating database, storing data in it,
updating data, creating tables in the database and a lot more.
• DBMS also provides protection and security to the databases.
• It also maintains data consistency in case of multiple users.
• Example: My SQL, Oracle.
Database Management System
Lets elaborate DBMS with example of Veda.
Applications of DBMS
Applications of DBMS
• Banking: all transactions
• Airlines: reservations, schedules
• Universities: registration, grades
• Sales: customers, products, purchases
• Online retailers: order tracking, customized recommendations
• Manufacturing: production, inventory, orders, supply chain
• Human resources: employee records, salaries, tax deductions
DBMS Characteristics
DBMS Characteristics
• Data stored into Tables: Data is never directly stored into the
database. Data is stored into tables, created inside the database.
• Reduced Redundancy: In the modern world hard drives are
very cheap, but earlier when hard drives were too expensive,
unnecessary repetition of data in database was a big problem.
But DBMS follows Normalisation which divides the data in
such a way that repetition is minimum.
• Data Consistency: On Live data, i.e. data that is being
continuosly updated and added, maintaining the consistency of
data can become a challenge. But DBMS handles it all by itself.
DBMS Characteristics
• Support Multiple user and Concurrent Access: DBMS
allows multiple users to work on it (update, insert, delete data)
at the same time and still manages to maintain the data
consistency.
• Query Language: DBMS provides users with a simple Query
language, using which data can be easily fetched, inserted,
deleted and updated in a database.
Advantages of DBMS
• Controls data redundancy: It can control data redundancy
because it stores all the data in one single database file and that
recorded data is placed in the database.
• Data sharing: In DBMS, the authorized users of an
organization can share the data among multiple users.
• Easily Maintenance: It can be easily maintainable due to the
centralized nature of the database system.
Advantages of DBMS
• Reduce time: It reduces development time and maintenance
need.
• Backup: It provides backup and recovery subsystems which
create automatic backup of data from hardware and software
failures and restores the data if required.
• Multiple user interface: It provides different types of user
interfaces like graphical user interfaces, application program
interfaces
Disadvantages of DBMS
• Cost of Hardware and Software: It requires a high speed of
data processor and large memory size to run DBMS software.
• Size: It occupies a large space of disks and large memory to run
them efficiently.
• Complexity: Database system creates additional complexity
and requirements.
• Higher impact of failure: Failure is highly impacted the
database because in most of the organization, all the data stored
in a single database and if the database is damaged due to
electric failure or database corruption then the data may be lost
forever.
• A field consists of a grouping of characters.
• A data field represents an attribute (a characteristic or quality) of
some
entity (object, person, place, or event).
Record
• A record represents a collection of attributes that describe a real-
world entity.
• A record consists of fields, with each field describing an attribute
of the entity.
Object
• A database object is any defined object in a database that is used to
store or reference data.
• Anything which we make from create command is known as
object.
Keys in DBMS
• KEYS in DBMS is an attribute or set of attributes which helps
you to identify a row(tuple) in a relation(table).
• They allow you to find the relation between two tables.
• Keys help you uniquely identify a row in a table by a
combination of one or more columns in that table.
Why we need a Key?
• Keys help you to identify any row of data in a table. In a real-
world application, a table could contain thousands of records.
Moreover, the records could be duplicated.
• Keys ensure that you can uniquely identify a table record
despite these challenges.
• Allows you to establish a relationship between and identify the
relation between tables.
• Help you to enforce identity and integrity in the relationship.
Types of Keys in DBMS
• Super Key - A super key is a group of single or multiple keys
which identifies rows in a table.
• Primary Key - is a column or group of columns in a table that
uniquely identify every row in that table.
• Candidate Key - is a set of attributes that uniquely identify
tuples in a table. Candidate Key is a super key with no repeated
attributes.
• Foreign Key - is a column that creates a relationship between
two tables. The purpose of Foreign keys is to maintain data
integrity and allow navigation between two different instances
of an entity.
What is primary key?
• A primary is a single column value used to identify a database
record uniquely.
It has following attributes
• A primary key cannot be NULL
• A primary key value must be unique
• The primary key values should rarely be changed
• The primary key must be given a value when a new record is
inserted.
Primary Key
Create Primary Key
• CREATE TABLE Persons
(
ID int NOT NULL,
FirstName varchar(255),
LastName varchar(255) NOT NULL,
Age int,
PRIMARY KEY (ID)
);
Super Key
Candidate Key
Alternate Key
• An alternate key is the secondary candidate key that contains
all the property of a candidate key but is an alternate option.
Alternate Key
• So, the primary key is Emp_Id. Now, the remaining two
candidate keys are Emp_SSN and Emp_email. Therefore,
Emp_SSN and Emp_Email are the alternate keys.
What is programming language?
DBMS languages
• Database languages are used to read, update and store data in a
database. There are several such languages that can be used for
this purpose; one of them is SQL (Structured Query Language).
DDL(Data Definition Language)
• In order to get the list of all the databases, you can use SHOW
DATABASES statement.
Example: SQL> SHOW DATABASES
CREATE TABLE
• Syntax: CREATE TABLE table_name
(
column1 data_type(size),
column2 data_type(size),
column3 data_type(size),
....
);
CREATE TABLE
Examples:
DROP TABLE table_name;
Here, table_name is the name of the table to be deleted.
DROP DATABASE database_name;
Here, database_name is the name of the database to be deleted.
TRUNCATE
• It is used to remove all records from a table, including all
spaces.
• Syntax: TRUNCATE TABLE table_name;
DROP vs TRUNCATE
• Truncate preserves the structure of the table for future use,
unlike drop table where the table is deleted with its full
structure.
• Table or Database deletion using DROP statement cannot be
rolled back, so it must be used wisely.
ALTER(ADD, DROP, MODIFY)
• ALTER TABLE is used to add, delete/drop or modify columns
in the existing table. It is also used to add and drop various
constraints on the existing table.
• ALTER TABLE – ADD: ADD is used to add columns into the
existing table.
• Syntax: ALTER TABLE table_name
ADD (Columnname_1 datatype,
Columnname_2 datatype,
… Columnname_n datatype);
ALTER TABLE – DROP
1. Parent-Child Relationship
• A parent node exists for each child node. However, a parent node
might have several child nodes. It is not permitted to have more
than one parent.
2. One-to-many Relationship
• The data is organized in a tree-like form, with the datatypes
having a one-to-many relationship. There can only be one path
from any node to its parent. For example, in the preceding
example, there is only one way to get to the node ‘sneakers’,
which is through the ‘men’s shoes’ node.
Features of a Hierarchical Model
3. Deletion Problem
• When a parent node is removed, the child node is removed as
well.
4. Pointers
• Pointers are used to connect the parent and child nodes and to
traverse and navigate between the stored data. The ‘shoes’ node
in the above example points to the two additional nodes,
‘women’s shoes’ and ‘men’s shoes.’
• Pros of Hierarchical Model
• A tree-like structure is incredibly straightforward and quick to
navigate.
• Any modification to the parent node is reflected automatically in
the child node, ensuring data integrity.
• Cons of Hierarchical Model
• Relationships that are complex are not supported.
• Because it only supports one parent per child node, if we have a
complex relationship in which a child node needs to have two
parents, we won’t be able to describe it using this model.
• When a parent node is removed, the child node is removed as well.
Network database
• In this model, entities are organized in a graph which can be
accessed through several paths.
• The network database model allows each child to have multiple
parents.
Features of a Network Model
1. Multiple Paths
• There may be several paths to the same record due to the increased number
of relationships. It allows for quick and easy data access.
2. The Ability to Merge More Relationships
• Data is more connected in this model since there are more relationships.
This paradigm can handle many-to-many as well as one-to-one
relationships.
3. Circular Linked List
• The circular linked list is used to perform operations on the network
model. The present position is kept up to date with the help of a software,
and it navigates through the records based on the relationship.
• Pros of Network Model
• In comparison to the hierarchical model, data can be retrieved faster. This
is because the data in the network model is more related, and there may be
more than one path to a given node. As a result, the data can be accessed in
a variety of ways.
• Data integrity is present since there is a parent-child relationship. Any
changes to the parent record are mirrored in the child record.
• Cons of Network Model
• As the number of relationships to be managed grows, the system may get
increasingly complicated. To operate with the model, a user must have a
thorough understanding of it.
• Any alteration, such as an update, deletion, or insertion, is extremely
difficult.
Relational model
• It represents the database as a collection of relations.
• A relation is nothing but a table of values.
• Every row in the table represents a collection of related data
values.
Entity-Relationship Model
• ER model stands for an Entity-Relationship model. It is a high-
level data model. This model is used to define the data elements
and relationship for a specified system.
• It develops a conceptual design for the database. It also
develops a very simple and easy to design view of data.
Features of an Entity-Relationship Model
1. Update Anomaly
2. Insert Anomaly
3. Delete Anomaly
Update Anomaly
Update Anomaly
• For student Muthu, we have two columns in the above table as
he belongs to two clubs at the college. If we want to change
Muthu's address, we must update it twice otherwise the data
will be inconsistent.
• When the correct address gets updated in one club but not in
another, Muthu would possess two different addresses, which is
not acceptable and could result in inconsistent data.
• So, the update anomaly is when an update of a single data
value requires multiple rows of data to be updated.
Insert Anomaly
Insert Anomaly
• For example, in the above table if a new student named Nanda
has joined the college and he has no department affiliation as
the club allows intake of students only from second year. Then
we can't insert the data of Nanda into the table since
the stu_club field cannot accept null values.
• So, Insertion Anomaly refers to when one cannot insert a new
tuple into a relationship due to lack of data.
Delete Anomaly
Delete Anomaly
• Suppose, for instance, the college at some point closes the club
crypto, then deleting the rows that contain s_club as crypto
would also delete the information of student Mukesh since he
belongs only to this department.
• So, the delete anomaly refers to the situation where the
deletion of data results in the unintended loss of some other
important data.
Functional Dependency
• In a relational database management, functional dependency is
a concept that specifies the relationship between two sets of
attributes where one attribute determines the value of another
attribute.
• It is denoted as X → Y, where the attribute set on the left side
of the arrow, X is called Determinant, and Y is called
the Dependent.
Functional Dependency
Some valid functional dependencies
We have divided our 1NF table into two tables viz. Table 1 and Table2.
Table 1 contains member information.
Table 2 contains information on movies rented.
We have introduced a new column called Membership_id which is the primary key
for table 1. Records can be uniquely identified in Table 1 using membership id.
What is Foreign Key?
• In Table 2, Membership_ID is the Foreign Key
Foreign Key references the primary key of another Table. It helps connect your
Tables
•A foreign key can have a different name from its primary key
•It ensures rows in one table have corresponding rows in another
•Unlike the Primary key, they do not have to be unique. Most often they aren’t
•Foreign keys can be null even though primary keys can not.
Why do you need a foreign key?
You will only be able to insert values into your foreign key that exist in
the unique key in the parent table. This helps in referential integrity.
3NF (Third Normal Form)
• Rule 1- Be in 2NF
• Rule 2- Has no transitive functional dependencies.
• It removes the column that are not dependent on primary key.
• To move our 2NF table into 3NF, we again need to again divide
our table.
• We have again divided our tables and created a new table which
stores Salutations. There are no transitive functional
dependencies, and hence our table is in 3NF
3NF (Third Normal Form)
Advantages:
• Since all data is stored at a single location only thus it is easier to
access and coordinate data.
• The centralized database has very minimal data redundancy since all
data is stored in a single place.
• It is cheaper in comparison to all other databases available.
Disadvantages:
• The data traffic in the case of a centralized database is more.
• If any kind of system failure occurs in the centralized system then
the entire data will be destroyed.
Distributed Database
Distributed Database
• A distributed database is basically a type of database which
consists of multiple databases that are connected with each
other and are spread across different physical locations.
• The data that is stored in various physical locations can thus be
managed independently of other physical locations.
• The communication between databases at different physical
locations is thus done by a computer network.
Distributed Database
Advantages:
• This database can be easily expanded as data is already spread across
different physical locations.
• The distributed database can easily be accessed from different networks.
• This database is more secure in comparison to a centralized database.
Disadvantages:
• This database is very costly and is difficult to maintain because of its
complexity.
• In this database, it is difficult to provide a uniform view to users since it
is spread across different physical locations.
Database Security
• Database security refers to the collective measures used to protect
and secure a database or database management software from
unauthorized access and malicious cyber threats and attacks.
• Database security procedures are aimed at protecting not just the
data inside the database, but the database management system and
all the applications that access it from intrusion, misuse of data, and
damage.
• It is a broad term that includes a multitude of processes, tools and
methodologies that ensure security within a database environment.
Database Security
Database Security
• Database security is designed to protect the Confidentiality,
Integrity & Availability of data of database.
1. Confidentiality is roughly equivalent to privacy.
Confidentiality measures are designed to prevent sensitive
information from unauthorized access attempts. It is common for
data to be categorized according to the amount and type of
damage that could be done if it fell into the wrong hands. More or
less stringent measures can then be implemented according to
those categories.
Database Security