RDBMS: B.Com CA III Semester, Paper-III
GDC Kamareddy, D. Balakrishna
consistency as compared to earlier forms of data storing applications like file-
processing systems.
Query Language − A DBMS is equipped with a query language, which makes it
more efficient to retrieve and manipulate data. A user can apply as many
different filtering options as required to retrieve a set of data; this was
traditionally not possible with file-processing systems. (A small SQL sketch
is given after this list.)
ACID Properties − DBMS follows the concepts
of Atomicity, Consistency, Isolation, and Durability (normally shortened as
ACID). These concepts are applied to transactions, which manipulate data in
a database. ACID properties help the database stay healthy in multi-
transactional environments and in case of failure.
Multiuser and Concurrent Access − A DBMS supports a multi-user environment
and allows users to access and manipulate data in parallel. Although there are
restrictions on transactions when multiple users attempt to handle the same
data item, users are always unaware of them.
Multiple views − DBMS offers multiple views for different users. A user who
is in the Sales department will have a different view of the database than a person
working in the Production department. This feature enables users to have a
focused view of the database according to their requirements.
Security − Features like multiple views offer security to some extent, since
users are unable to access data of other users and departments. DBMS offers
methods to impose constraints while entering data into the database and
retrieving the same at a later stage. DBMS offers many different levels of
security features, which enable multiple users to have different views with
different features. For example, a user in the Sales department cannot see the
data that belongs to the Purchase department. Additionally, how much data of
the Sales department should be displayed to the user can also be managed.
Since a DBMS is not saved on disk in the same way as a traditional file system,
it is very hard for miscreants to break into the data.
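A minimal sketch of the kind of filtering a query language allows, referred to in the
Query Language point above. The table and column names (employees, department,
salary) are assumptions made only for illustration.
-- Retrieve only the rows that satisfy both filter conditions
SELECT emp_name, salary
FROM employees
WHERE department = 'Sales'
  AND salary > 30000;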
Users
A typical DBMS has users with different rights and permissions who use it for
different purposes. Some users retrieve data and some back it up. The users of a
DBMS can be broadly categorized as follows −
Drawbacks of File system:
Data Isolation: Because data are scattered in various files, and files may be in
different formats, writing new application programs to retrieve the appropriate
data is difficult.
Duplication of data – Redundant data
Dependency on application programs – Changing files would lead to change
in application programs.
Advantage of DBMS over file based system
There are several advantages of a database management system over a file system. A
few of them are as follows:
No redundant data – Redundancy removed by data normalization
Data Consistency and Integrity – data normalization takes care of it too
Secure – Each user has a different set of access
Privacy – Limited access
Easy access to data
Easy recovery
Flexible
Disadvantages of DBMS:
DBMS implementation cost is high compared to the file system
Complexity: Database systems are complex to understand
Performance: Database systems are generic, making them suitable for various
applications. However, this generality can affect their performance for some
applications.
Database Approach :
The database approach is an improvement on the shared file solution as the
use of a database management system (DBMS) provides facilities for
querying, data security and integrity, and allows simultaneous access to data by a
number of different users.
Logical DBMS Architecture
Data are actually stored as bits, or numbers and strings, but it is extremely difficult to
work with the variety and complexity of data at this level.
It is helpful to view data at different levels of abstraction.
Schema:
This is the term for a description of the data organization at some level. Each
level has its own schema.
We will be concerned with three forms of schemas:
Internal or physical
Conceptual or logical
External or user view
Internal Data Level
The physical schema of the internal level describes details of how data is stored:
files, indices, etc. on the random access disk system. It also typically describes the
record layout of files and type of files (hash, b-tree, flat).
Conceptual Data Level
Also referred to as the logical level when the conceptual schema is implemented for a
particular database architecture.
Hides storage details of the internal/physical level.
In the relational model, the conceptual schema presents data as a set of tables.
External Data Level
An external schema specifies a view of the data in terms of the conceptual level. It is
tailored to the needs of a particular category of users. Portions of the stored data should
not be seen by some users; hiding them begins to implement a level of security and
simplifies the view for these users.
In the relational model, the external schema also presents data as a set of relations.
Examples:
Students should not see faculty salaries.
Faculty should not see billing or payment data.
Three level architecture of DBMS or logical DBMS architecture
The design of a DBMS depends on its architecture. It can be centralized or
decentralized or hierarchical. The architecture of a DBMS can be seen as either
single tier or multi-tier. An n-tier architecture divides the whole system into related
but independent n modules, which can be independently modified, altered, changed,
or replaced.
In 1-tier architecture, the DBMS is the only entity where the user directly sits on the
DBMS and uses it. Any changes done here will directly be done on the DBMS itself.
It does not provide handy tools for end-users. Database designers and programmers
normally prefer to use single-tier architecture.
If the architecture of DBMS is 2-tier, then it must have an application through which
the DBMS can be accessed. Programmers use 2-tier architecture where they access
the DBMS by means of an application. Here the application tier is entirely
independent of the database in terms of operation, design, and programming.
3-tier Architecture
A 3-tier architecture separates its tiers from each other based on the complexity
of the users and how they use the data present in the database. It is the most widely
used architecture to design a DBMS.
Database (Data) Tier − At this tier, the database resides along with its query
processing languages. We also have the relations that define the data and their
constraints at this level.
Application (Middle) Tier − At this tier reside the application server and the
programs that access the database. For a user, this application tier presents an
abstracted view of the database. End-users are unaware of any existence of the
database beyond the application. At the other end, the database tier is not
aware of any other user beyond the application tier. Hence, the application
layer sits in the middle and acts as a mediator between the end-user and the
database.
User (Presentation) Tier − End-users operate on this tier and they know
nothing about any existence of the database beyond this layer. At this layer,
multiple views of the database can be provided by the application. All views
are generated by applications that reside in the application tier.
Multiple-tier database architecture is highly modifiable, as almost all its components
are independent and can be changed independently.
Need for three level architecture
DBA should be able to change database storage structures without affecting the
user’s views.
Internal structure of database should be unaffected by changes to physical
aspects of storage.
DBA should be able to change conceptual structure of database without
affecting all users.
Database Administrator (DBA) Functions & Role
Database administrators (DBAs) use specialized software to store and organize data.
The role may include capacity planning, installation, configuration, database design,
migration, performance monitoring, security, troubleshooting, as well as backup and
data recovery.
A database administrator’s (DBA) primary job is to ensure that data is available,
protected from loss and corruption, and easily accessible as needed. Below are some
of the chief responsibilities that make up the day-to-day work of a DBA.
1. Software installation and Maintenance
A DBA often collaborates on the initial installation and configuration of a new database
such as Oracle or SQL Server. The system administrator sets up hardware and deploys the
operating system for the database server; then the DBA installs the database software
and configures it for use.
2. Data Extraction, Transformation and Loading
Known as ETL, data extraction, transformation, and loading refers to efficiently
importing large volumes of data that have been extracted from multiple systems into a
data warehouse environment.
3. Specialized Data Handling
Today’s databases can be massive and may contain unstructured data types such as
images, documents, or sound and video files. Managing a very large database (VLDB)
may require higher-level skills and additional monitoring and tuning to maintain
efficiency.
4. Database Backup and Recovery
DBAs create backup and recovery plans and procedures based on industry best
practices, then make sure that the necessary steps are followed. Backups cost time and
money, so the DBA may have to persuade management to take necessary precautions
to preserve data.
System admins or other personnel may actually create the backups, but it is the DBA’s
responsibility to make sure that everything is done on schedule.
5. Security
A DBA needs to know potential weaknesses of the database software and the
company’s overall system and work to minimise risks. No system is one hundred per
cent immune to attacks, but implementing best practices can minimise risks.
6. Authentication
Setting up employee access is an important aspect of database security. DBAs control
who has access and what type of access they are allowed. For instance, a user may
have permission to see only certain pieces of information, or they may be denied the
ability to make changes to the system.
7. Capacity Planning
The DBA needs to know how large the database currently is and how fast it is
growing in order to make predictions about future needs. Storage refers to how much
room the database takes up in server and backup space. Capacity refers to usage level.
8. Performance Monitoring
Monitoring databases for performance issues is part of the on-going system
maintenance a DBA performs. If some part of the system is slowing down processing,
the DBA may need to make configuration changes to the software or add additional
hardware capacity. Many types of monitoring tools are available, and part of the
DBA’s job is to understand what they need to track to improve the system.
9. Database Tuning
Performance monitoring shows where the database should be tweaked to operate as
efficiently as possible. The physical configuration, the way the database is indexed,
and how queries are handled can all have a dramatic effect on database performance.
10. Troubleshooting
DBAs are on call for troubleshooting in case of any problems. Whether they need to
quickly restore lost data or correct an issue to minimise damage, a DBA needs to
quickly understand and respond to problems when they occur.
Data files indices
We know that data is stored in the form of records. Every record has a key field,
which helps it to be recognized uniquely.
Indexing is a data structure technique to efficiently retrieve records from the database
files based on some attributes on which the indexing has been done. Indexing in
database systems is similar to what we see in books.
Indexing is defined based on its indexing attributes. Indexing can be of the following
types −
Primary Index − Primary index is defined on an ordered data file. The data
file is ordered on a key field. The key field is generally the primary key of the
relation.
Secondary Index − Secondary index may be generated from a field which is a
candidate key and has a unique value in every record, or a non-key with
duplicate values.
Clustering Index − Clustering index is defined on an ordered data file. The
data file is ordered on a non-key field.
Data Dictionary
A data dictionary is a file or a set of files that contains a database's metadata.
The data dictionary contains records about other objects in the database, such
as data ownership, data relationships to other objects, and other data.
Types of Database :
There are several types of database management systems. Here is a list of some
common database management systems:
1. Hierarchical databases
2. Network databases
3. Relational databases
4. Object-oriented databases
5. ER model databases
1. Hierarchical Databases
In the hierarchical database management system (hierarchical DBMS) model, data is
stored in parent-child relationship nodes. In a hierarchical database, besides
actual data, records also contain information about their groups of parent/child
relationships.
In a hierarchical database model, data is organized into a tree-like structure. The data
is stored in the form of a collection of fields, where each field contains only one value. The
records are linked to each other via links into a parent-child relationship. In a
hierarchical database model, each child record has only one parent. A parent can have
multiple children.
2. Network Databases
Network database management systems (network DBMSs) use a network
structure to create relationships between entities. Network databases are mainly used
on large digital computers. Network databases are like hierarchical databases, but unlike
hierarchical databases where one node can have only one parent, a network node can
have relationships with multiple entities. A network database looks more like a cobweb
or an interconnected network of records.
3. Relational Databases
In relational database management systems (RDBMS), the relationship between data
is relational and data is stored in tabular form of columns and rows. Each column of a
table represents an attribute and each row in a table represents a record. Each field in a
table represents a data value.
Structured Query Language (SQL) is the language used to query an RDBMS,
including inserting, updating, deleting, and searching records.
Relational databases work on the principle that each table has a key field that uniquely
identifies each row, and that these key fields can be used to connect one table of data
to another, as the sketch below illustrates.
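A minimal sketch of connecting two tables through a key field, as mentioned above.
The table and column names (students, courses, course_id) are assumptions made only
for illustration.
-- Each students row stores course_id, the key of a row in courses;
-- the join below connects the two tables through that key field.
SELECT s.student_name, c.course_name
FROM students s
JOIN courses c ON s.course_id = c.course_id;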
4. Object-Oriented Model
This model builds on the functionality of object-oriented programming. It involves
more than the storage of programming-language objects: object DBMSs extend the
semantics of languages such as C++ and Java, providing full-featured database
programming capability while retaining native language compatibility. They add
database functionality to object programming languages. This approach unifies
application and database development into a consistent data model and language
environment. Applications require less code, use more natural data modeling, and
code bases are easier to maintain. Object developers can write complete database
applications with a modest amount of additional effort.
ER Models :
An ER model is typically implemented as a database. In a simple relational database
implementation, each row of a table represents one instance of an entity type, and each
field in a table represents an attribute type. In a relational database a relationship
between entities is implemented by storing the primary key of one entity as a pointer
or "foreign key" in the table of another entity.
The entity-relationship model was developed by Peter Chen in 1976.
DBMS Database Models
A Database model defines the logical design and structure of a database and defines
how data will be stored, accessed and updated in a database management system.
While the Relational Model is the most widely used database model, there are other
models too:
Hierarchical Model
Network Model
Entity-relationship Model
Relational Model
Hierarchical Model
This database model organises data into a tree-like structure, with a single root to
which all the other data is linked. The hierarchy starts from the root data and
expands like a tree, adding child nodes to the parent nodes.
In this model, a child node will have only a single parent node.
This model efficiently describes many real-world relationships like the index of a book,
recipes, etc.
In the hierarchical model, data is organised into a tree-like structure with a one-to-many
relationship between two different types of data; for example, one department can
have many courses, many professors and of course many students.
Network Model
This is an extension of the hierarchical model. In this model data is organised more
like a graph, and records are allowed to have more than one parent node.
In this database model, data is more related, as more relationships can be established
between records. Also, as the data is more related, accessing the data is easier and
faster. This database model was used to map many-to-many data relationships.
This was the most widely used database model before the relational model was
introduced.
Entity-relationship Model
In this database model, relationships are created by dividing object of interest into
entity and its characteristics into attributes.
Different entities are related using relationships.
E-R Models are defined to represent the relationships into pictorial form to make it
easier for different stakeholders to understand.
This model is good to design a database, which can then be turned into tables in
relational model(explained below).
Let's take an example: if we have to design a School Database, then Student will be
an entity with attributes name, age, address, etc. As Address is generally complex, it
can be another entity with attributes street name, pincode, city, etc., and there will be
a relationship between them.
Relationships can also be of different types.
Relational Model
In this model, data is organised in two-dimensional tables and the relationship is
maintained by storing a common field.
This model was introduced by E. F. Codd in 1970, and since then it has been the most
widely used database model; in fact, we can say it is almost the only database model
used around the world.
The basic structure of data in the relational model is tables. All the information related
to a particular type is stored in rows of that table.
Hence, tables are also known as relations in relational model.
In the coming sections we will learn how to design tables, normalize them to reduce
data redundancy and how to use Structured Query Language to access data from tables.
Domains :
A domain is defined as the set of all unique values permitted for an attribute. For
example, a domain of date is the set of all possible valid dates, a domain of integer is
all possible whole numbers, a domain of day-of-week is Monday, Tuesday ...
Tuple :
A table has rows and columns, where rows represents records and columns represent
the attributes. Tuple − A single row of a table, which contains a single record for that
relation is called a tuple. Relation instance − A finite set of tuples in the
relational database system represents relation instance.
Relation :
A general term used in database design is a “relational database"—but a database
relation is not the same thing and does not imply, as its name suggests, a relationship
between tables. A database relation simply refers to an individual table in a relational
database.
In a relational database, the table is a relation because it stores the relation between
data in its column-row format. The columns are the table's attributes, while the rows
represent the data records. A single row is known as a tuple to database designers.
Keys :
o Keys play an important role in the relational database.
o It is used to uniquely identify any record or row of data from the table. It is also
used to establish and identify relationships between tables.
For example: In Student table, ID is used as a key because it is unique for each
student. In PERSON table, passport_number, license_number, SSN are keys since
they are unique for each person.
Types of key:
1. Primary key
o It is the first key used to identify one and only one instance of an
entity uniquely. An entity can contain multiple keys, as we saw in the PERSON
table. The key that is most suitable from that list becomes the primary key.
o In the EMPLOYEE table, ID can be the primary key since it is unique for each
employee. In the EMPLOYEE table, we could even select License_Number or
Passport_Number as the primary key since they are also unique.
o For each entity, selection of the primary key is based on requirements and the
developer.
2. Candidate key
o A candidate key is an attribute or set of attributes that can uniquely identify
a tuple.
o The remaining candidate attributes, except for the primary key, are considered
candidate keys. The candidate keys are as strong as the primary key.
For example: In the EMPLOYEE table, id is best suited for the primary key. The rest of
the attributes, like SSN, Passport_Number, License_Number, etc., are considered
candidate keys.
3. Super Key
A super key is a set of attributes that can uniquely identify a tuple. A super key is a
superset of a candidate key.
For example: In the above EMPLOYEE table, for (EMPLOYEE_ID,
EMPLOYEE_NAME), the names of two employees can be the same, but their
EMPLOYEE_ID can't be the same. Hence, this combination can also be a key.
Super keys would be EMPLOYEE_ID, (EMPLOYEE_ID, EMPLOYEE_NAME),
etc.
4. Foreign key
o Foreign keys are the columns of a table that point to the primary
key of another table.
o In a company, every employee works in a specific department, and employee
and department are two different entities. So we can't store the information of
the department in the employee table. That's why we link these two tables
through the primary key of one table.
o We add the primary key of the DEPARTMENT table, Department_Id, as a new
attribute in the EMPLOYEE table.
o Now in the EMPLOYEE table, Department_Id is the foreign key, and both the
tables are related, as the sketch below shows.
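A minimal SQL sketch of the DEPARTMENT/EMPLOYEE link described above. The data
types and the columns other than ID and Department_Id are assumptions made only for
illustration.
-- DEPARTMENT holds the primary key that EMPLOYEE references
CREATE TABLE DEPARTMENT (
    Department_Id INT PRIMARY KEY,
    Dept_Name     VARCHAR(30)          -- assumed column
);
CREATE TABLE EMPLOYEE (
    ID            INT PRIMARY KEY,     -- primary key of EMPLOYEE
    Emp_Name      VARCHAR(30),         -- assumed column
    Department_Id INT,
    FOREIGN KEY (Department_Id) REFERENCES DEPARTMENT(Department_Id)
);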
Relational Constraints :
Relational integrity constraints refer to conditions that must hold for a valid relation.
These integrity constraints are derived from the rules in the mini-world that the
database represents.
There are many types of integrity constraints. Constraints on a relational database
management system are mostly divided into three main categories:
1. Domain constraints
2. Key constraints
3. Referential integrity constraints
Domain Constraints
Domain constraints are violated if an attribute value does not appear in the
corresponding domain or is not of the appropriate data type.
Domain constraints specify that, within each tuple, the value of each attribute must
belong to the domain of that attribute. This is specified as data types, which include the
standard data types: integers, real numbers, characters, Booleans, variable-length strings, etc.
Example (standard SQL requires a data type in a domain definition):
CREATE DOMAIN CustomerName AS VARCHAR(30)
    CHECK (VALUE IS NOT NULL);
The example shown demonstrates creating a domain constraint such that
CustomerName is not NULL.
Key constraints
An attribute that can uniquely identify a tuple in a relation is called the key of the
table. The value of the attribute for different tuples in the relation has to be unique.
Example:
In the table shown below, CustomerID is a key attribute of the Customer table. Each
customer is most likely to have a single key value; CustomerID = 1 belongs only to the
CustomerName 'Google'.
CustomerID   CustomerName   Status
1            Google         Active
2            Amazon         Active
3            Apple          Inactive
Referential integrity constraints
Referential integrity constraints are based on the concept of foreign keys. A foreign
key is an important attribute of a relation that refers to another relation. A referential
integrity constraint arises when one relation refers to a key attribute of a different (or
the same) relation; that key value must exist in the referenced table.
Example (a sketch is given below):
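A minimal sketch of a referential integrity constraint and the kind of insert it rejects;
the Customers/Orders tables and their columns are assumptions made only for illustration.
CREATE TABLE Customers (
    CustomerID   INT PRIMARY KEY,
    CustomerName VARCHAR(30)
);
CREATE TABLE Orders (
    OrderID    INT PRIMARY KEY,
    CustomerID INT,
    FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
);
-- Accepted only if a Customers row with CustomerID = 1 already exists;
-- otherwise the referential integrity constraint rejects the insert.
INSERT INTO Orders (OrderID, CustomerID) VALUES (100, 1);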
1. SELECT originally called RESTRICT
2. PROJECT
3. JOIN
4. PRODUCT
5. UNION
6. INTERSECT
7. DIFFERENCE
8. DIVIDE
The most important of these are (1), (2), (3) and (8), which, together with some
other aggregate functions, are powerful enough to answer a wide range of queries.
The eight operators will be described as general procedures - i.e. not in the syntax
of SQL or any other relational language. The important point is that they define the
result required rather than the detailed process of obtaining it - what but not how.
1.SELECT
RESTRICTS the rows chosen from a table to those entries with specified attribute
values.
SELECT item
FROM stock_level
WHERE quantity > 100
constructs a new, logical table - an unnamed relation - with a single column (item)
containing all rows from stock_level that satisfy the WHERE clause.
2.PROJECT
Selects rows made up of a sub-set of columns from a table.
PROJECT stock_item
OVER item AND description
produces a new logical table where each row contains only two columns - item and
description. The new table will only contain distinct rows from stock_item; i.e. any
duplicate rows so formed will be eliminated.
3.JOIN
Associates entries from two tables on the basis of matching column values.
JOIN stock_item
WITH stock_level
OVER item
It is not necessary for there to be a one-to-one relationship between entries in two
tables to be joined - entries which do not match anything will be eliminated from
the result, and entries from one table which match several entries in the other will
be duplicated the required number of times.
4.PRODUCT
Builds a relation from two specified relations consisting of all possible
combinations of rows, one from each of the two relations.
For example, consider two relations, A and B, consisting of rows:
A: a        B: d        =>   A product B: a d
   b           e                           a e
   c                                       b d
                                           b e
                                           c d
                                           c e
5.UNION
Builds a relation consisting of all rows appearing in either or both of the two
relations.
For example, consider two relations, A and B, consisting of rows:
A: a        B: a        =>   A union B: a
   b           e                         b
   c                                     c
                                         e
6.INTERSECT
Builds a relation consisting of all rows appearing in both of the two relations.
For example, consider two relations, A and B, consisting of rows:
A: a        B: a        =>   A intersect B: a
   b           e
   c
7.DIFFERENCE
Builds a relation consisting of all rows appearing in the first and not in the second
of the two relations.
For example, consider two relations, A and B, consisting of rows:
A: a        B: a        =>   A - B: b      and   B - A: e
   b           e                    c
   c
8.DIVIDE
Takes two relations, one binary and one unary, and builds a relation consisting of
all values of one column of the binary relation that match, in the other column, all
values in the unary relation.
A: a x      B: x        =>   A divide B: a
   a y         y
   a z
   b x
   c y
Of the relational operators (4) to (8) defined by Codd, the most important is
DIVIDE. For example, suppose table A contains a list of suppliers and
commodities, and table B a list of all commodities bought by a company. Dividing A
by B produces a table listing the suppliers who sell all of those commodities.
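Relational DIVIDE has no direct keyword in SQL; a common sketch expresses it with a
doubly nested NOT EXISTS. The table and column names (supplies, commodities_bought)
are assumptions made only for illustration.
-- Suppliers for which no bought commodity is missing from their
-- supplies rows, i.e. A divide B
SELECT DISTINCT s.supplier
FROM supplies s
WHERE NOT EXISTS (
    SELECT 1 FROM commodities_bought b
    WHERE NOT EXISTS (
        SELECT 1 FROM supplies s2
        WHERE s2.supplier = s.supplier
          AND s2.commodity = b.commodity
    )
);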
An Entity–relationship model (ER model) describes the structure of a database with
the help of a diagram, which is known as Entity Relationship Diagram (ER
Diagram). An ER model is a design or blueprint of a database that can later be
implemented as a database. The main components of E-R model are: entity set and
relationship set.
What is an Entity Relationship Diagram (ER Diagram)?
An ER diagram shows the relationship among entity sets. An entity set is a
group of similar entities and these entities can have attributes. In terms of DBMS, an
entity is a table or attribute of a table in database, so by showing relationship among
tables and their attributes, an ER diagram shows the complete logical structure of a
database. Let's have a look at a simple ER diagram to understand this concept.
A simple ER Diagram:
In the following diagram we have two entities Student and College and their
relationship. The relationship between Student and College is many to one as a college
can have many students however a student cannot study in multiple colleges at the
same time. Student entity has attributes such as Stu_Id, Stu_Name & Stu_Addr and
College entity has attributes such as Col_ID & Col_Name.
Here are the geometric shapes and their meanings in an E-R diagram. We will
discuss these terms in detail in the next section (Components of an ER Diagram) of this
guide, so don't worry too much about these terms now; just go through them once.
Rectangle: Represents Entity sets.
Ellipses: Attributes
Diamonds: Relationship Set
Lines: They link attributes to Entity Sets and Entity sets to Relationship Set
Double Ellipses: Multivalued Attributes
Dashed Ellipses: Derived Attributes
Double Rectangles: Weak Entity Sets
Double Lines: Total participation of an entity in a relationship set
Components of an ER Diagram
Weak Entity:
An entity that cannot be uniquely identified by its own attributes and relies on the
relationship with other entity is called weak entity. The weak entity is represented by a
double rectangle. For example – a bank account cannot be uniquely identified without
knowing the bank to which the account belongs, so bank account is a weak entity.
Definition of Strong Entity :
The Strong Entity is the one whose existence does not depend on the existence of
any other entity in a schema. It is denoted by a single rectangle. A strong entity
always has the primary key in the set of attributes that describes the strong entity. It
indicates that each entity in a strong entity set can be uniquely identified.
2. Attribute
An attribute describes the property of an entity. An attribute is represented as Oval in
an ER diagram. There are four types of attributes:
1. Key attribute
2. Composite attribute
3. Multivalued attribute
4. Derived attribute
1. Key attribute:
A key attribute can uniquely identify an entity from an entity set. For example, student
roll number can uniquely identify a student from a set of students. Key attribute is
represented by oval same as other attributes however the text of key attribute is
underlined.
2. Composite attribute:
An attribute that is a combination of other attributes is known as composite attribute.
For example, In student entity, the student address is a composite attribute as an
address is composed of other attributes such as pin code, state, country.
3. Multivalued attribute:
An attribute that can hold multiple values is known as a multivalued attribute. It is
represented by double ovals in an ER diagram. For example, a person can have
more than one phone number, so the phone number attribute is multivalued.
4. Derived attribute:
A derived attribute is one whose value is dynamic and derived from another attribute.
It is represented by dashed oval in an ER Diagram. For example – Person age is a
derived attribute as it changes over time and can be derived from another attribute
(Date of birth).
E-R diagram with multivalued and derived attributes:
3. Relationship
A relationship is represented by diamond shape in ER diagram, it shows the
relationship among entities. There are four types of relationships:
1. One to One
2. One to Many
3. Many to One
4. Many to Many
2. One to Many Relationship
When a single instance of an entity is associated with more than one instance of
another entity, it is called a one-to-many relationship. For example, a customer can
place many orders, but an order cannot be placed by many customers.
In the diagram, total participation (shown by double lines) means that each college
must have at least one associated student.
Taking the following simple ER diagram:
The initial relational schema is expressed in the following format, writing the table
names with the attribute list inside parentheses, as shown below:
Persons(personid, name, lastname, email)
Persons and Phones are tables; name, lastname, etc. are table columns (attributes).
A CREATE TABLE sketch is given below.
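A minimal sketch of how the Persons and Phones tables could be created in SQL from
this schema. The Phones columns and all data types are assumptions made only for
illustration.
CREATE TABLE Persons (
    personid INT PRIMARY KEY,
    name     VARCHAR(30),
    lastname VARCHAR(30),
    email    VARCHAR(100)
);
-- A multivalued phone attribute is mapped to its own table with a
-- foreign key back to Persons (assumed design).
CREATE TABLE Phones (
    personid INT,
    phone    VARCHAR(20),
    PRIMARY KEY (personid, phone),
    FOREIGN KEY (personid) REFERENCES Persons(personid)
);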
UNIT-II DATABASE INTEGRITY AND NORMALISATION
Relational Database Integrity :
Data integrity is enforced in both hierarchical and relational database models. The
following three integrity constraints are used in a relational database structure to
achieve data integrity:
Entity Integrity: This is concerned with the concept of primary keys. The rule
states that every table must have its own primary key and that each has to be
unique and not null.
Referential Integrity: This is the concept of foreign keys. The rule states that
the foreign key value can be in two states. The first state is that the foreign key
value would refer to a primary key value of another table, or it can be null.
Being null could simply mean that there are no relationships, or that the
relationship is unknown.
Domain Integrity: This states that all columns in a relational database are in a
defined domain.
The concept of data integrity ensures that all data in a database can be traced and
connected to other data. This ensures that everything is recoverable and searchable.
Having a single, well-defined and well-controlled data integrity system increases
stability, performance, reusability and maintainability. If one of these features cannot
be implemented in the database, it must be implemented through the software.
The Problem of redundancy in Database
Redundancy means having multiple copies of the same data in the database. This
problem arises when a database is not normalized. Suppose the attributes of a student
details table are: student id, student name, college name, college rank, course opted.
It can be observed that the values of the attributes college name, college rank, and
course are repeated, which can lead to problems. Problems caused by redundancy are:
insertion anomaly, deletion anomaly, and updation anomaly.
1. Insertion Anomaly –
If details of a student have to be inserted whose course has not yet been decided,
then insertion will not be possible until the course is decided for the student.
This problem occurs when the insertion of a data record is not possible without
adding some additional unrelated data to the record.
2. Deletion Anomaly –
If the details of students in this table are deleted, then the details of the college will
also get deleted, which should not occur by common sense.
This anomaly happens when deletion of a data record results in losing some
unrelated information that was stored as part of the record that was deleted
from a table.
3. Updation Anomaly –
Suppose the rank of the college changes; then changes will have to be made all over
the database, which will be time-consuming and computationally costly.
If the update does not occur in all places, the database will be left in an inconsistent state.
Normalisation :
Normalization is a process of organizing the data in database to avoid data
redundancy, insertion anomaly, update anomaly & deletion anomaly.
Anomalies in DBMS
There are three types of anomalies that occur when the database is not normalized.
These are – Insertion, update and deletion anomaly
Example: Suppose a manufacturing company stores the employee details in a table
named employee that has four attributes: emp_id for storing employee’s id,
emp_name for storing employee’s name, emp_address for storing employee’s address
and emp_dept for storing the department details in which the employee works. At
some point of time the table looks like this:
The above table is not normalized. We will see the problems that we face when a table
is not normalized.
Update anomaly: In the above table we have two rows for employee Rick as he
belongs to two departments of the company. If we want to update the address of Rick
then we have to update the same in two rows or the data will become inconsistent. If
somehow, the correct address gets updated in one department but not in other then as
per the database, Rick would be having two different addresses, which is not correct
and would lead to inconsistent data.
Insert anomaly: Suppose a new employee joins the company, who is under training
and currently not assigned to any department then we would not be able to insert the
data into the table if emp_dept field doesn’t allow nulls.
Delete anomaly: Suppose, if at a point of time the company closes the department
D890 then deleting the rows that are having emp_dept as D890 would also delete the
information of employee Maggie since she is assigned only to this department.
To overcome these anomalies we need to normalize the data
Normalization
Here are the most commonly used normal forms:
First normal form(1NF)
Second normal form(2NF)
Third normal form(3NF)
Boyce & Codd normal form (BCNF)
First normal form (1NF)
As per the rule of first normal form, an attribute (column) of a table cannot hold
multiple values. It should hold only atomic values.
Example: Suppose a company wants to store the names and contact details of its
employees. It creates a table that looks like this:
emp_id   emp_name   emp_address   emp_mobile
102      Jon        Kanpur        8812121212, 9900012222
104      Lester     Bangalore     9990000123, 8123450987
Two employees (Jon & Lester) have two mobile numbers each, so the company stored
them in the same field, as you can see in the table above.
This table is not in 1NF, as the rule says "each attribute of a table must have atomic
(single) values"; the emp_mobile values for employees Jon & Lester violate that rule.
To make the table comply with 1NF, each mobile number should be stored in a separate
row, with a single value per field.
Second normal form (2NF) example: consider the following table of teachers, the
subjects they teach and their ages:
teacher_id   subject     teacher_age
111          Maths       38
111          Physics     38
222          Biology     38
333          Physics     40
333          Chemistry   40
Candidate Keys: {teacher_id,subject}
Non prime attribute: teacher_age
The table is in 1 NF because each attribute has atomic values. However, it is not in
2NF because non prime attribute teacher_age is dependent on teacher_id alone which
is a proper subset of candidate key. This violates the rule for 2NF as the rule says
“no non-prime attribute is dependent on the proper subset of any candidate key of the
table”.
To make the table comply with 2NF, we can break it into two tables like this:
teacher_details table:
teacher_id teacher_age
111 38
222 38
333 40
teacher_subject table:
teacher_id Subject
111 Maths
111 Physics
222 Biology
333 Physics
333 Chemistry
In other words 3NF can be explained like this: A table is in 3NF if it is in 2NF and for
each functional dependency X-> Y at least one of the following conditions hold:
X is a super key of table
Y is a prime attribute of table
An attribute that is a part of one of the candidate keys is known as prime attribute.
Example: Suppose a company wants to store the complete address of each employee,
they create a table named employee_details that looks like this:
employee table:
Functional dependencies in the table above:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}
Candidate key: {emp_id, emp_dept}
The table is not in BCNF as neither emp_id nor emp_dept alone are keys.
To make the table comply with BCNF we can break the table in three tables like this:
emp_nationality table:
emp_id emp_nationality
1001 Austrian
1002 American
emp_dept table:
emp_dept_mapping table:
emp_id emp_dept
1001 stores
Candidate keys after the decomposition:
For the first table (emp_nationality): emp_id
For the second table (emp_dept): emp_dept
For the third table (emp_dept_mapping): {emp_id, emp_dept}
This is now in BCNF, as in both of the functional dependencies the left-hand side is a
key. A CREATE TABLE sketch of this decomposition is given below.
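A minimal SQL sketch of the BCNF decomposition described above. The data types,
and the dept_type and dept_no_of_emp definitions, are assumptions made only for
illustration.
CREATE TABLE emp_nationality (
    emp_id          INT PRIMARY KEY,
    emp_nationality VARCHAR(30)
);
CREATE TABLE emp_dept (
    emp_dept       VARCHAR(30) PRIMARY KEY,
    dept_type      VARCHAR(10),    -- assumed column definition
    dept_no_of_emp INT
);
CREATE TABLE emp_dept_mapping (
    emp_id   INT,
    emp_dept VARCHAR(30),
    PRIMARY KEY (emp_id, emp_dept),
    FOREIGN KEY (emp_id)   REFERENCES emp_nationality(emp_id),
    FOREIGN KEY (emp_dept) REFERENCES emp_dept(emp_dept)
);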
Storage of Database on Hard Disk :
Databases are stored in file formats, which contain records. At physical level,
the actual data is stored in electromagnetic format on some device. These storage
devices can be broadly categorized into three types −
Primary Storage − The memory storage that is directly accessible to the CPU
comes under this category. CPU's internal memory (registers), fast memory
(cache), and main memory (RAM) are directly accessible to the CPU, as they
are all placed on the motherboard or CPU chipset. This storage is typically
very small, ultra-fast, and volatile. Primary storage requires continuous power
supply in order to maintain its state. In case of a power failure, all its data is
lost.
Secondary Storage − Secondary storage devices are used to store data for
future use or as backup. Secondary storage includes memory devices that are
not a part of the CPU chipset or motherboard, for example, magnetic disks,
optical disks (DVD, CD, etc.), hard disks, flash drives, and magnetic tapes.
Tertiary Storage − Tertiary storage is used to store huge volumes of data.
Since such storage devices are external to the computer system, they are the
slowest in speed. These storage devices are mostly used to take the back up of
an entire system. Optical disks and magnetic tapes are widely used as tertiary
storage.
Memory Hierarchy :
A computer system has a well-defined hierarchy of memory. A CPU has direct
access to its main memory as well as its inbuilt registers. The access time of main
memory is obviously greater than the CPU cycle time, so to minimize this speed
mismatch, cache memory is introduced. Cache memory provides the fastest access
time and it contains data that is most frequently accessed by the CPU.
The memory with the fastest access is the costliest one. Larger storage devices
offer slow speed and they are less expensive, however they can store huge volumes
of data as compared to CPU registers or cache memory.
Magnetic Disks :
Hard disk drives are the most common secondary storage devices in present
computer systems. These are called magnetic disks because they use the concept of
magnetization to store information. Hard disks consist of metal disks coated with
magnetizable material. These disks are placed vertically on a spindle. A read/write
head moves in between the disks and is used to magnetize or de-magnetize the spot
under it. A magnetized spot can be recognized as 0 (zero) or 1 (one).
Hard disks are formatted in a well-defined order to store data efficiently. A
hard disk plate has many concentric circles on it, called tracks. Every track is further
divided into sectors. A sector on a hard disk typically stores 512 bytes of data.
File Organisation and Its Types :
Related data and information are stored collectively in file formats. A file is a
sequence of records stored in binary format. A disk drive is formatted into several
blocks that can store records. File records are mapped onto those disk blocks.
File Organization :
File Organization defines how file records are mapped onto disk blocks. We
have four types of File Organization to organize file records −
Types of Indexes :
We know that data is stored in the form of records. Every record has a key
field, which helps it to be recognized uniquely.
Indexing is a data structure technique to efficiently retrieve records from the database
files based on some attributes on which the indexing has been done. Indexing in
database systems is similar to what we see in books.
Indexing is defined based on its indexing attributes. Indexing can be of the following
types −
Primary Index − Primary index is defined on an ordered data file. The data
file is ordered on a key field. The key field is generally the primary key of the
relation.
Secondary Index − Secondary index may be generated from a field which is a
candidate key and has a unique value in every record, or a non-key with
duplicate values.
Clustering Index − Clustering index is defined on an ordered data file. The
data file is ordered on a non-key field.
Ordered Indexing is of two types −
Dense Index
Sparse Index
Dense Index :
In dense index, there is an index record for every search key value in the
database. This makes searching faster but requires more space to store index records
itself. Index records contain search key value and a pointer to the actual record on the
disk.
Sparse Index :
In sparse index, index records are not created for every search key. An index
record here contains a search key and an actual pointer to the data on the disk. To
search a record, we first proceed by index record and reach at the actual location of
the data. If the data we are looking for is not where we directly reach by following
the index, then the system starts sequential search until the desired data is found.
Multilevel Index :
Index records comprise search-key values and data pointers. Multilevel index
is stored on the disk along with the actual database files. As the size of the database
grows, so does the size of the indices. There is an immense need to keep the index
records in the main memory so as to speed up the search operations. If single-level
index is used, then a large size index cannot be kept in memory which leads to
multiple disk accesses.
Multi-level Index helps in breaking down the index into several smaller indices
in order to make the outermost level so small that it can be saved in a single disk
block, which can easily be accommodated anywhere in the main memory.
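A minimal sketch of how such indexes are typically declared in SQL. The students table
and its columns are assumptions made only for illustration, and clustered-index syntax
varies between database systems.
-- Primary index: most systems build one automatically on the primary key
CREATE TABLE students (
    roll_no INT PRIMARY KEY,     -- ordered key field
    name    VARCHAR(30),
    class   VARCHAR(10)
);
-- Secondary index on a non-key attribute that may contain duplicate values
CREATE INDEX idx_students_class ON students (class);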
Scenario 1: sort the file based on roll number (this is the key; a clustered index can
also be created on it), and invert the file on some other attribute, e.g. Class (MCA-I,
MCA-II, MCA-III, ...., MCA-VI).
Now if I wish to access the records of students from MCA-VI, I will locate the
first record and then follow the linked list sequentially. This technique works well if:
the attribute on which I am inverting is a primary key or has nearly unique
values, or
the need is to access everybody in a particular class. The longer the list, the
clumsier the scan becomes. The list can be flexible in size.
Selection/Search – go in a particular order;
Insertion – will be easy in append mode;
Deletion – 1) re-arrange pointers, or 2) mark as deleted and re-arrange pointers at a
later point of time.
Multi-list File Organisation :
Multi-list file organisation is a multi-index linked file organisation. A linked
file organisation is a logical organisation where the physical ordering of records is not of
concern. In a linked organisation the sequence of records is governed by links that
identify the next record in the sequence. Linking of records can be unordered, but such a
linking is very costly for searching information in a file. Thus, it may be a good idea to
link records in order of increasing primary key. This facilitates deletion and
insertion algorithms and also helps search performance. In addition to
imposing an order during linking, search through a file can be further facilitated by
producing primary and secondary indexes. All these ideas are supported in the multi-list
file organisation. Let us describe these concepts further with the help of an example.
Consider the employee data as given in Figure. The record numbers are given
as alphabets for better explanation. Suppose that the Empid is the key field of the data
records. Let us describe the Multi-list file organisation for the data file.
Record Number   Empid   Name       Job                 Qualification   Gender   City        Married/Single   Salary
A               800     Jain       Software Engineer   B. Tech.        Male     New Delhi   Single           15,000/-
B               500     Inder      Software Manager    B. Tech.        Female   New Delhi   Married          18,000/-
C               900     Rashi      Software Manager    MCA             Female   Mumbai      Single           16,000/-
D               700     Gurpreet   Software Engineer   B. Tech.        Male     Mumbai      Married          12,000/-
E               600     Meena      Software Manager    MCA             Female   Mumbai      Single           13,000/-
Figure: Sample data for Employee file
Since the primary key of the file is Empid, the linked order of records should be
defined as B (500), E (600), D (700), A (800), C (900). However, as the file size grows,
the search performance of the file would deteriorate. Therefore, we can build a
primary index on the file (please note that in this file the records are in the logical
series and tied together using links and not physical placement, thus, the primary
index will be a linked index file rather than block indexes).
Inverted File Organisation :
The inverted file may be the database file itself, rather than its index. It is the
most popular data structure used in document retrieval systems, used on a large scale
for example in search engines.
UNIT-III: STRUCTURED QUERY LANGUAGE (SQL)
SQL command :
o SQL commands are instructions used to communicate with the database. They are
also used to perform specific tasks, functions, and queries on data.
o SQL can perform various tasks like creating a table, adding data to tables, dropping
a table, modifying a table, and setting permissions for users.
Types of SQL Command:
Example:
CREATE TABLE EMPLOYEE (Name VARCHAR2(20), Email VARCHAR2(100), DOB DATE);
b. DROP: It is used to delete both the structure and record stored in the table.
Syntax
DROP TABLE table_name;
Example
DROP TABLE EMPLOYEE;
c. ALTER: It is used to alter the structure of the database. This change could be either
to modify the characteristics of an existing attribute or probably to add a new attribute.
Syntax:
To add a new column in the table
ALTER TABLE table_name ADD column_name COLUMN-definition;
To modify existing column in the table:
ALTER TABLE table_name MODIFY (COLUMN DEFINITION....);
EXAMPLE
ALTER TABLE STU_DETAILS ADD (ADDRESS VARCHAR2(20));
ALTER TABLE STU_DETAILS MODIFY (NAME VARCHAR2(20));
d. TRUNCATE: It is used to delete all the rows from the table and free the space
containing the table.
Syntax:
TRUNCATE TABLE table_name;
Example:
TRUNCATE TABLE EMPLOYEE;
2. Data Manipulation Language :
o DML commands are used to modify the database. It is responsible for all form
of changes in the database.
o DML commands are not auto-committed, which means they do not permanently
save the changes to the database automatically; the changes can be rolled back.
Here are some commands that come under DML:
o INSERT
o UPDATE
o DELETE
a. INSERT: The INSERT statement is a SQL query. It is used to insert data into the
row of a table.
Syntax:
INSERT INTO TABLE_NAME
(col1, col2, col3, .... colN)
VALUES (value1, value2, value3, .... valueN);
Or
INSERT INTO TABLE_NAME
VALUES (value1, value2, value3, .... valueN);
For example:
INSERT INTO javatpoint (Author, Subject) VALUES ('Sonoo', 'DBMS');
b. UPDATE: This command is used to update or modify the value of a column in the
table.
Syntax:
UPDATE table_name SET [column_name1 = value1, ... column_nameN = valueN] [WHERE CONDITION];
For example:
UPDATE students
SET User_Name = 'Sonoo'
WHERE Student_Id = '3';
c. DELETE: It is used to remove one or more row from a table.
Syntax:
DELETE FROM table_name [WHERE condition];
For example:
DELETE FROM javatpoint
WHERE Author = 'Sonoo';
3. Data Control Language :
DCL commands are used to grant and take back authority from any database user.
Here are some commands that come under DCL:
o Grant
o Revoke
a. Grant: It is used to give user access privileges to a database.
Example
GRANT SELECT, UPDATE ON MY_TABLE TO SOME_USER, ANOTHER_USER;
b. Revoke: It is used to take back permissions from the user.
Example
REVOKE SELECT, UPDATE ON MY_TABLE FROM USER1, USER2;
4. Transaction Control Language :
TCL commands can be used only with DML commands like INSERT, DELETE and
UPDATE.
DDL operations are automatically committed in the database, which is why TCL
commands cannot be used while creating tables or dropping them.
Here are some commands that come under TCL:
o COMMIT
o ROLLBACK
o SAVEPOINT
a. Commit: Commit command is used to save all the transactions to the database.
Syntax:
COMMIT;
Example:
DELETE FROM CUSTOMERS
WHERE AGE = 25;
COMMIT;
b. Rollback: Rollback command is used to undo transactions that have not already
been saved to the database.
Syntax:
ROLLBACK;
Example:
DELETE FROM CUSTOMERS
WHERE AGE = 25;
ROLLBACK;
c. SAVEPOINT: It is used to roll the transaction back to a certain point without
rolling back the entire transaction.
Syntax:
SAVEPOINT SAVEPOINT_NAME;
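Example (a minimal sketch of SAVEPOINT together with ROLLBACK TO SAVEPOINT,
reusing the CUSTOMERS example above; the savepoint name and the second DELETE
are assumptions made only for illustration):
DELETE FROM CUSTOMERS WHERE AGE = 25;
SAVEPOINT after_first_delete;
DELETE FROM CUSTOMERS WHERE AGE = 30;
ROLLBACK TO SAVEPOINT after_first_delete;   -- undoes only the second DELETE
COMMIT;                                     -- makes the first DELETE permanent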
5. Data Query Language :
DQL is used to fetch the data from the database.
It uses only one command:
o SELECT
a. SELECT: This is the same as the projection operation of relational algebra. It is
used to select the attribute based on the condition described by WHERE clause.
Syntax:
SELECT expressions
FROM tables
WHERE conditions;
For example:
SELECT emp_name
FROM employee
WHERE age > 20;
Queries using Order by :
The SQL ORDER BY Keyword
The ORDER BY keyword is used to sort the result-set in ascending or descending
order.
The ORDER BY keyword sorts the records in ascending order by default. To sort
the records in descending order, use the DESC keyword.
ORDER BY Syntax
SELECT column1, column2, ...
FROM table_name
ORDER BY column1, column2, ... ASC|DESC;
Demo Database
Below is a selection from the "Customers" table in the Northwind sample database:
CustomerID   CustomerName                          ContactName          Address                          City          PostalCode   Country
1            Alfreds Futterkiste                   Maria Anders         Obere Str. 57                    Berlin        12209        Germany
2            Ana Trujillo Emparedados y helados    Ana Trujillo         Avda. de la Constitución 2222    México D.F.   05021        Mexico
3            Antonio Moreno Taquería               Antonio Moreno       Mataderos 2312                   México D.F.   05023        Mexico
4            Around the Horn                       Thomas Hardy         120 Hanover Sq.                  London        WA1 1DP      UK
5            Berglunds snabbköp                    Christina Berglund   Berguvsvägen 8                   Luleå         S-958 22     Sweden
ORDER BY Example
The following SQL statement selects all customers from the "Customers" table,
sorted by the "Country" column:
Example
SELECT * FROM Customers
ORDER BY Country;
Where clause :
The WHERE clause is used to filter records.
The WHERE clause is used to extract only those records that fulfill a specified
condition.
WHERE Syntax
SELECT column1, column2, ...
FROM table_name
WHERE condition;
Note: The WHERE clause is not only used in SELECT statements; it is also used in
UPDATE, DELETE statements, etc.
Example :
SELECT * FROM Customers
WHERE Country='Mexico';
Group by :
The GROUP BY statement is often used with aggregate functions (COUNT,
MAX, MIN, SUM, AVG) to group the result-set by one or more columns.
GROUP BY Syntax
SELECT column_name(s)
FROM table_name
WHERE condition
GROUP BY column_name(s)
ORDER BY column_name(s);
SQL GROUP BY Examples
The following SQL statement lists the number of customers in each country:
Example
SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country;
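To list the countries with the most customers first, the grouped result can also be sorted on the aggregate (using the same Customers table):
SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country
ORDER BY COUNT(CustomerID) DESC;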
Nested Queries :
A Subquery or Inner query or a Nested query is a query within another SQL
query and embedded within the WHERE clause.
A subquery is used to return data that will be used in the main query as a condition to
further restrict the data to be retrieved.
Subqueries can be used with the SELECT, INSERT, UPDATE, and DELETE
statements along with the operators like =, <, >, >=, <=, IN, BETWEEN, etc.
There are a few rules that subqueries must follow −
Subqueries must be enclosed within parentheses.
A subquery can have only one column in the SELECT clause, unless multiple
columns are in the main query for the subquery to compare its selected
columns.
An ORDER BY command cannot be used in a subquery, although the main
query can use an ORDER BY. The GROUP BY command can be used to
perform the same function as the ORDER BY in a subquery.
Subqueries that return more than one row can only be used with multiple value
operators such as the IN operator.
The SELECT list cannot include any references to values that evaluate to a
BLOB, ARRAY, CLOB, or NCLOB.
A subquery cannot be immediately enclosed in a set function.
The BETWEEN operator cannot be used with a subquery. However, the
BETWEEN operator can be used within the subquery.
Subqueries with the SELECT Statement
Subqueries are most frequently used with the SELECT statement. The basic syntax is
as follows –
SELECT column_name [, column_name ]
FROM table1 [, table2 ]
WHERE column_name OPERATOR
(SELECT column_name [, column_name ]
FROM table1 [, table2 ]
[WHERE])
Example
Consider the CUSTOMERS table having the following records −
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 35 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
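The records shown next could be produced, for instance, by a subquery that picks the IDs of customers earning more than 4500 (the 4500 threshold is only an assumed example chosen to match the result):
SQL> SELECT *
FROM CUSTOMERS
WHERE ID IN (SELECT ID
FROM CUSTOMERS
WHERE SALARY > 4500) ;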
+----+----------+-----+---------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+---------+----------+
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+---------+----------+
Subqueries with the INSERT Statement
Subqueries also can be used with INSERT statements. The INSERT statement
uses the data returned from the subquery to insert into another table. The selected
data in the subquery can be modified with any of the character, date or number
functions.
The basic syntax is as follows.
INSERT INTO table_name [ (column1 [, column2 ]) ]
SELECT [ *|column1 [, column2 ] ]
FROM table1 [, table2 ]
[ WHERE VALUE OPERATOR ]
Example
Consider a table CUSTOMERS_BKP with similar structure as CUSTOMERS table.
Now to copy the complete CUSTOMERS table into the CUSTOMERS_BKP table,
you can use the following syntax.
SQL> INSERT INTO CUSTOMERS_BKP
SELECT * FROM CUSTOMERS
WHERE ID IN (SELECT ID
FROM CUSTOMERS) ;
Subqueries with the UPDATE Statement
The subquery can be used in conjunction with the UPDATE statement. Either single
or multiple columns in a table can be updated when using a subquery with the
UPDATE statement.
The basic syntax is as follows.
UPDATE table
SET column_name = new_value
[ WHERE OPERATOR [ VALUE ]
(SELECT COLUMN_NAME
FROM TABLE_NAME)
[ WHERE ] ]
Example
Assuming we have a CUSTOMERS_BKP table available, which is a backup of the
CUSTOMERS table, the following example updates SALARY to 0.25 times its value in
the CUSTOMERS table for all the customers whose AGE is greater than or equal to 27.
SQL> UPDATE CUSTOMERS
SET SALARY = SALARY * 0.25
WHERE AGE IN (SELECT AGE FROM CUSTOMERS_BKP
WHERE AGE >= 27 );
This would impact two rows and finally the CUSTOMERS table would have the
following records.
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 35 | Ahmedabad | 125.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 2125.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
Subqueries with the DELETE Statement
The subquery can be used in conjunction with the DELETE statement, just as with the
statements described above.
The basic syntax is as follows.
DELETE FROM TABLE_NAME
[ WHERE OPERATOR [ VALUE ]
(SELECT COLUMN_NAME
FROM TABLE_NAME)
[ WHERE ] ]
Example
Assuming we have a CUSTOMERS_BKP table available which is a backup of the
CUSTOMERS table. The following example deletes the records from the
CUSTOMERS table for all the customers whose AGE is greater than or equal to 27.
SQL> DELETE FROM CUSTOMERS
WHERE AGE IN (SELECT AGE FROM CUSTOMERS_BKP
WHERE AGE >= 27 );
This would impact two rows and finally the CUSTOMERS table would have the
following records.
+----+----------+-----+---------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+---------+----------+
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+---------+----------+
Joins :
We understand the benefits of taking a Cartesian product of two relations,
which gives us all the possible tuples that are paired together. However, it might not be
feasible in certain cases to take a Cartesian product when we encounter huge
relations with thousands of tuples having a considerably large number of attributes.
Join is a combination of a Cartesian product followed by a selection process. A Join
operation pairs two tuples from different relations, if and only if a given join
condition is satisfied.
We will briefly describe various join types in the following sections.
Theta (θ) Join :
Theta join combines tuples from different relations provided they satisfy the theta
condition. The join condition is denoted by the symbol θ.
Notation
R1 ⋈θ R2
R1 and R2 are relations having attributes (A1, A2, ..., An) and (B1, B2, ..., Bn) such
that the attributes don’t have anything in common, that is, R1 ∩ R2 = Φ.
Theta join can use all kinds of comparison operators.
Student
SID   Name   Std
101   Alex   10
102   Maria  11
Subjects
Class  Subject
10     Math
10     English
11     Music
11     Sports
Student_Detail = STUDENT ⋈ (Student.Std = Subjects.Class) SUBJECTS
Student_Detail
SID   Name   Std   Class  Subject
101   Alex   10    10     Math
101   Alex   10    10     English
102   Maria  11    11     Music
102   Maria  11    11     Sports
Equijoin
When Theta join uses only equality comparison operator, it is said to be
equijoin. The above example corresponds to equijoin.
Natural Join (⋈)
Natural join does not use any comparison operator. It does not concatenate the
way a Cartesian product does. We can perform a Natural Join only if there is at least
one common attribute that exists between two relations. In addition, the attributes
must have the same name and domain.
Natural join acts on those matching attributes where the values of the attributes in both
the relations are the same.
Courses
CID   Course       Dept
CS01  Database     CS
ME01  Mechanics    ME
EE01  Electronics  EE
HoD
Dept  Head
CS    Alex
ME    Maya
EE    Mira
Courses ⋈ HoD
Dept  CID   Course       Head
CS    CS01  Database     Alex
ME    ME01  Mechanics    Maya
EE    EE01  Electronics  Mira
Outer Joins :
Theta Join, Equijoin, and Natural Join are called inner joins. An inner join
includes only those tuples with matching attributes and the rest are discarded in the
resulting relation. Therefore, we need to use outer joins to include all the tuples from
the participating relations in the resulting relation. There are three kinds of outer joins
− left outer join, right outer join, and full outer join.
Left
A     B
100   Database
101   Mechanics
102   Electronics
Right
A     B
100   Alex
102   Maya
104   Mira
Left Outer Join (Left ⟕ Right)
A     B            C     D
100   Database     100   Alex
101   Mechanics    ---   ---
102   Electronics  102   Maya
Right Outer Join (Left ⟖ Right)
A     B            C     D
100   Database     100   Alex
102   Electronics  102   Maya
---   ---          104   Mira
Full Outer Join (Left ⟗ Right)
A     B            C     D
100   Database     100   Alex
101   Mechanics    ---   ---
102   Electronics  102   Maya
---   ---          104   Mira
The SQL Joins clause is used to combine records from two or more tables in a
database. A JOIN is a means for combining fields from two tables by using values
common to each.
Consider the following two tables −
Table 1 − CUSTOMERS Table
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
Table 2 − ORDERS Table
+-----+---------------------+-------------+--------+
|OID | DATE | CUSTOMER_ID | AMOUNT |
+-----+---------------------+-------------+--------+
| 102 | 2009-10-08 00:00:00 | 3 | 3000 |
| 100 | 2009-10-08 00:00:00 | 3 | 1500 |
| 101 | 2009-11-20 00:00:00 | 2 | 1560 |
| 103 | 2008-05-20 00:00:00 | 4 | 2060 |
+-----+---------------------+-------------+--------+
Now, let us join these two tables in our SELECT statement as shown below.
SQL> SELECT ID, NAME, AGE, AMOUNT
FROM CUSTOMERS, ORDERS
WHERE CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
This would produce the following result.
+----+----------+-----+--------+
| ID | NAME | AGE | AMOUNT |
+----+----------+-----+--------+
| 3 | kaushik | 23 | 3000 |
| 3 | kaushik | 23 | 1500 |
| 2 | Khilan | 25 | 1560 |
| 4 | Chaitali | 25 | 2060 |
+----+----------+-----+--------+
Here, it is noticeable that the join is performed in the WHERE clause. Several
operators can be used to join tables, such as =, <, >, <>, <=, >=, !=, BETWEEN,
LIKE, and NOT; however, the most common operator is the equal-to symbol (=).
There are different types of joins available in SQL −
INNER JOIN − returns rows when there is a match in both tables.
LEFT JOIN − returns all rows from the left table, even if there are no matches
in the right table.
RIGHT JOIN − returns all rows from the right table, even if there are no
matches in the left table.
FULL JOIN − returns rows when there is a match in one of the tables.
SELF JOIN − is used to join a table to itself as if the table were two tables,
temporarily renaming at least one table in the SQL statement.
CARTESIAN JOIN − returns the Cartesian product of the sets of records from
the two or more joined tables.
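As an illustrative sketch using the CUSTOMERS and ORDERS tables above, the same result as the WHERE-clause join can be written with the explicit JOIN syntax, and a LEFT JOIN additionally keeps customers that have no orders:
-- INNER JOIN: only customers that have at least one order
SELECT ID, NAME, AGE, AMOUNT
FROM CUSTOMERS
INNER JOIN ORDERS ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
-- LEFT JOIN: all customers; AMOUNT is NULL where no matching order exists
SELECT ID, NAME, AGE, AMOUNT
FROM CUSTOMERS
LEFT JOIN ORDERS ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;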
Views :
SQL CREATE VIEW Statement
In SQL, a view is a virtual table based on the result-set of an SQL statement.
A view contains rows and columns, just like a real table. The fields in a view are
fields from one or more real tables in the database.
You can add SQL functions, WHERE, and JOIN statements to a view and present the
data as if the data were coming from one single table.
CREATE VIEW Syntax
CREATE VIEW view_name AS
SELECT column1, column2, ...
FROM table_name
WHERE condition;
Note: A view always shows up-to-date data! The database engine recreates the data,
using the view's SQL statement, every time a user queries a view.
SQL CREATE VIEW Examples
The following SQL creates a view that shows all customers from Brazil:
Example
CREATE VIEW [Brazil Customers] AS
SELECT CustomerName, ContactName
FROM Customers
WHERE Country = 'Brazil';
We can query the view above as follows:
Example
SELECT * FROM [Brazil Customers];
The following SQL creates a view that selects every product in the "Products" table
with a price higher than the average price:
CREATE VIEW [Products Above Average Price] AS
SELECT ProductName, Price
FROM Products
WHERE Price > (SELECT AVG(Price) FROM Products);
We can query the view above as follows:
Example
SELECT * FROM [Products Above Average Price];
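A view that is no longer needed can be removed with the DROP VIEW command; for example, to drop the first view created above:
DROP VIEW [Brazil Customers];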
Sequences :
Sequence is a feature supported by some database systems to produce unique
values on demand. Some DBMSs, such as MySQL, support AUTO_INCREMENT in
place of sequences.
AUTO_INCREMENT is applied to a column; it automatically increments the
column value by 1 each time a new record is inserted into the table.
Sequence is also somewhat similar to AUTO_INCREMENT but it has some
additional features too.
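As a small sketch of the MySQL approach mentioned above (the table name students is only an example):
-- MySQL: the ID column is generated automatically, starting at 1
CREATE TABLE students (
ID INT AUTO_INCREMENT PRIMARY KEY,
NAME VARCHAR(50)
);
INSERT INTO students (NAME) VALUES ('abhi');   -- automatically gets ID 1
INSERT INTO students (NAME) VALUES ('adam');   -- automatically gets ID 2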
Creating a Sequence :
Syntax to create a sequence is,
CREATE SEQUENCE sequence-name
START WITH initial-value
INCREMENT BY increment-value
MAXVALUE maximum-value
CYCLE | NOCYCLE;
The initial-value specifies the starting value for the Sequence.
The increment-value is the value by which sequence will be incremented.
The maximum-value specifies the upper limit or the maximum value upto
which sequence will increment itself.
The keyword CYCLE specifies that when the sequence exceeds its maximum value,
it will restart its cycle from the beginning.
NOCYCLE specifies that if the sequence exceeds MAXVALUE, an
error will be thrown.
Using Sequence in SQL Query :
Let's start by creating a sequence, which will start from 1, increment by 1 with a
maximum value of 999.
CREATE SEQUENCE seq_1
START WITH 1
INCREMENT BY 1
MAXVALUE 999
CYCLE;
Now let's use the sequence that we just created above.
Below we have a class table,
ID NAME
1 abhi
2 adam
4 alex
The SQL query will be,
INSERT INTO class VALUES(seq_1.nextval, 'anu');
The resulting table will look like this:
ID NAME
1 abhi
2 adam
4 alex
1 anu
Once you use nextval, the sequence will increment even if you don't insert any record
into the table.
Indexes :
Indexes are created on tables just to access data from them faster. When you create an
index on a table, SELECT queries on that table become faster, while INSERT
queries become a bit slower.
When we index a table, an index for that table is created; it is basically a data-
structure-based technique to access data faster.
CREATE INDEX index_name ON table_name (column_name) ;
The above statement creates an index on the given column of a table.
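For instance, to speed up lookups of customers by name (assuming the CUSTOMERS table used in earlier examples), and to remove the index later:
CREATE INDEX idx_customers_name ON CUSTOMERS (NAME);
-- removing the index (Oracle form; the exact DROP INDEX syntax varies slightly between DBMSs)
DROP INDEX idx_customers_name;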
Synonyms :
A synonym is an alternative name for objects such as tables, views, sequences,
stored procedures, and other database objects. You generally use synonyms when you
are granting access to an object from another schema and you don't want the users to
have to worry about knowing which schema owns the object.
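A minimal sketch in Oracle-style syntax, assuming a table EMP owned by the schema SCOTT and a synonym name EMP_SYN chosen for illustration:
CREATE SYNONYM EMP_SYN FOR SCOTT.EMP;
-- users can now query the table without knowing the owning schema
SELECT * FROM EMP_SYN;
DROP SYNONYM EMP_SYN;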
Exclusive lock:
o In an exclusive lock, the data item can be both read and written by the transaction
that holds the lock.
o This lock is exclusive; while it is held, multiple transactions cannot modify the
same data simultaneously.
There are four types of lock protocols available:
1. Simplistic lock protocol
It is the simplest way of locking data during a transaction. Simplistic lock-based
protocols allow all the transactions to get a lock on the data before an insert, delete or
update on it. The data item is unlocked after the transaction completes.
2. Pre-claiming Lock Protocol
o Pre-claiming lock protocols evaluate the transaction to list all the data items
on which they need locks.
o Before initiating the execution of the transaction, it requests the DBMS for
locks on all those data items.
o If all the locks are granted, then this protocol allows the transaction to begin.
When the transaction is completed, it releases all the locks.
o If all the locks are not granted, then the transaction rolls
back and waits until all the locks are granted.
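As a rough illustration of a transaction explicitly acquiring locks on the data items it is about to change (standard SELECT ... FOR UPDATE syntax; the CUSTOMERS table is reused from earlier examples):
-- lock the rows that will be modified
SELECT * FROM CUSTOMERS WHERE AGE >= 27 FOR UPDATE;
UPDATE CUSTOMERS SET SALARY = SALARY * 1.10 WHERE AGE >= 27;
COMMIT;   -- all locks held by the transaction are released here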
3. Two-phase locking (2PL)
o The two-phase locking protocol divides the execution of a transaction into three parts.
o In the first part, when the execution of the transaction starts, it seeks permission for
the locks it requires.
o In the second part, the transaction acquires all the locks. The third phase starts as
soon as the transaction releases its first lock.
o In the third phase, the transaction cannot demand any new locks. It only
releases the acquired locks.
Serialisable Schedules :
o The serializability of schedules is used to find non-serial schedules that allow
the transactions to execute concurrently without interfering with one another.
o It identifies which schedules are correct when executions of the transaction
have interleaving of their operations.
o A non-serial schedule will be serializable if its result is equal to the result of its
transactions executed serially.
For example, Schedule A and Schedule B (in which the transactions execute one after
another) are serial schedules, while Schedule C and Schedule D (in which the
operations of the transactions are interleaved) are non-serial schedules.
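In SQL, an application can ask the DBMS to enforce serializable behaviour for a transaction by setting its isolation level; a minimal sketch, reusing the CUSTOMERS table:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT SUM(SALARY) FROM CUSTOMERS;
UPDATE CUSTOMERS SET SALARY = SALARY + 100 WHERE ID = 3;
COMMIT;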
Deadlock and its Prevention :
A deadlock is a condition where two or more transactions are waiting
indefinitely for one another to give up locks. Deadlock is said to be one of the most
feared complications in a DBMS, as no task ever gets finished and everything remains
in a waiting state forever.
For example: In the student table, transaction T1 holds a lock on some rows and
needs to update some rows in the grade table. Simultaneously, transaction T2 holds
locks on some rows in the grade table and needs to update the rows in the Student
table held by Transaction T1.
Now the main problem arises: transaction T1 is waiting for T2 to release its
lock and, similarly, transaction T2 is waiting for T1 to release its lock. All activities
come to a standstill and remain so until the DBMS detects the deadlock and aborts one
of the transactions.
Deadlock Avoidance :
o When a database can get stuck in a deadlock state, it is better to avoid the
deadlock rather than aborting or restarting the database, as that is a waste of time
and resources.
o Deadlock avoidance mechanism is used to detect any deadlock situation in
advance. A method like "wait for graph" is used for detecting the deadlock
situation but this method is suitable only for the smaller database. For the larger
database, deadlock prevention method can be used.
Deadlock Detection :
In a database, when a transaction waits indefinitely to obtain a lock, the DBMS
should detect whether the transaction is involved in a deadlock or not. The lock
manager maintains a wait-for graph to detect deadlock cycles in the database.
Wait for Graph
o This is a suitable method for deadlock detection. In this method, a graph is
created based on the transactions and their locks. If the created graph has a cycle
or closed loop, then there is a deadlock.
o The wait-for graph is maintained by the system for every transaction that is
waiting for some data held by the others. The system keeps checking whether
there is any cycle in the graph.
o In the scenario described above, the wait-for graph would contain an edge from
T1 to T2 and an edge from T2 to T1, forming a cycle and therefore indicating a
deadlock.
Deadlock Prevention
o Deadlock prevention method is suitable for a large database. If the resources
are allocated in such a way that deadlock never occurs, then the deadlock can
be prevented.
o The database management system analyzes the operations of the transaction
to determine whether they can create a deadlock situation or not. If they can, then
the DBMS never allows that transaction to be executed.
Optimistic Concurrency Control :
Optimistic concurrency control (OCC) is a concurrency control method applied to
transactional systems such as relational database management systems and software
transactional memory. While running, transactions use data resources without
acquiring locks on those resources; before committing, each transaction verifies that
no other transaction has modified the data it has read.
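As a rough sketch of how applications often approximate optimistic concurrency control in SQL (the ACCOUNTS table, its VERSION column and the values used are all hypothetical):
-- read phase: note the current version along with the data
SELECT BALANCE, VERSION FROM ACCOUNTS WHERE ID = 10;
-- validation/write phase: succeed only if no other transaction changed the row meanwhile
UPDATE ACCOUNTS
SET BALANCE = 900, VERSION = VERSION + 1
WHERE ID = 10 AND VERSION = 7;   -- 7 is the version read earlier
-- if the UPDATE affects 0 rows, validation failed: roll back and retry the transaction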
Database Recovery and Security:
DBMS is a highly complex system with hundreds of transactions being executed
every second. The durability and robustness of a DBMS depend on its complex
architecture and its underlying hardware and system software. If it fails or crashes
amid transactions, it is expected that the system would follow some sort of algorithm
or techniques to recover lost data.
Failure Classification
To see where the problem has occurred, we generalize a failure into various
categories, as follows −
Transaction failure
A transaction has to abort when it fails to execute or when it reaches a point from
where it can’t go any further. This is called transaction failure where only a few
transactions or processes are hurt.
Reasons for a transaction failure could be −
Logical errors − Where a transaction cannot complete because it has some
code error or any internal error condition.
System errors − Where the database system itself terminates an active
transaction because the DBMS is not able to execute it, or it has to stop
because of some system condition. For example, in case of deadlock or
resource unavailability, the system aborts an active transaction.
System Crash
There are problems − external to the system − that may cause the system to stop
abruptly and crash. For example, an interruption in the power supply may cause the
failure of the underlying hardware or software.
Examples may include operating system errors.
Disk Failure
In the early days of technology evolution, it was a common problem that hard-disk
drives or storage drives used to fail frequently.
Disk failures include the formation of bad sectors, inaccessibility of the disk, disk head
crashes or any other failure that destroys all or a part of the disk storage.
Data security is an imperative aspect of any database system. It is of particular
importance in distributed systems because of the large number of users, fragmented and
replicated data, multiple sites and distributed control.
Threats in a Database
Availability loss − Availability loss refers to non-availability of database
objects by legitimate users.
Integrity loss − Integrity loss occurs when unacceptable operations are
performed upon the database either accidentally or maliciously. This may
happen while creating, inserting, updating or deleting data. It results in
corrupted data leading to incorrect decisions.
Confidentiality loss − Confidentiality loss occurs due to unauthorized or
unintentional disclosure of confidential information. It may result in illegal
actions, security threats and loss in public confidence.
Measures of Control
The measures of control can be broadly divided into the following categories −
Access Control − Access control includes security mechanisms in a database
management system to protect against unauthorized access. A user can gain
access to the database after clearing the login process through only valid user
accounts. Each user account is password protected.
Flow Control − Distributed systems encompass a lot of data flow from one
site to another and also within a site. Flow control prevents data from being
transferred in such a way that it can be accessed by unauthorized agents. A
flow policy lists out the channels through which information can flow. It also
defines security classes for data as well as transactions.
Data Encryption − Data encryption refers to coding data when sensitive data
is to be communicated over public channels. Even if an unauthorized agent
gains access to the data, they cannot understand it since it is in an
incomprehensible format.
Backup & Recovery Techniques :
Simply speaking, a backup is a copy of data. This copy includes important parts of
your database such as the control file and data files. A backup is a safeguard against
unexpected data loss and application errors; should you lose your original data, you
can use the backup to make it available again.
Backups are divided into physical backups and logical backups. Physical backups,
which are the primary concern of this guide, are copies of physical database files. In
contrast, logical backups contain data that you extract using the Oracle Export utility
and store in a binary file. You can use logical backups to supplement physical
backups. You can make physical backups using either the Oracle8i Recovery Manager
utility or O/S utilities.
To restore a physical backup is to reconstruct it and make it available to the Oracle
server. To recover a restored data file is to update it using redo records, i.e., records of
changes made to the database after the backup was taken. If you use Recovery
Manager (RMAN), you can also recover restored data files with an incremental
backup, which is a backup of a data file that contains only changed data blocks.
Oracle performs crash recovery and instance recovery automatically after an instance
failure. Instance recovery is an automatic procedure that involves two distinct
operations: rolling forward the backup to a more current time by applying online redo
records and rolling back all changes made in uncommitted transactions to their
original state.
In contrast to instance recovery, media recovery requires you to issue recovery
commands. If you use RMAN, then you issue the recover command to apply archived
redo logs or incremental backups to the data files. RMAN automatically selects the
appropriate incremental backups or redo logs and applies them. If you use SQL*Plus,
you can issue the RECOVER or ALTER DATABASE RECOVER statements to
apply the archived logs.
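A minimal sketch of the commands mentioned above (exact options depend on how the database and RMAN are configured):
RMAN> BACKUP DATABASE;      -- take a physical backup with Recovery Manager
RMAN> RESTORE DATABASE;     -- reconstruct the data files from a backup
RMAN> RECOVER DATABASE;     -- apply archived/incremental changes to the restored files
SQL> RECOVER DATABASE;      -- the corresponding media-recovery command in SQL*Plus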
Authorization :
Authorization is a security mechanism used to determine user/client privileges or
access levels related to system resources, including computer programs, files, services,
data and application features. Authorization is normally preceded by authentication
for user identity verification.
UNIT-V: DISTRIBUTED AND CLIENT SERVER DATABASES
Need for Distributed Database Systems :
This chapter introduces the concept of DDBMS. In a distributed database,
there are a number of databases that may be geographically distributed all over the
world. A distributed DBMS manages the distributed database in a manner so that it
appears as a single database to users. In the later part of the chapter, we go on to
study the factors that lead to distributed databases, along with their advantages and disadvantages.
A distributed database is a collection of multiple interconnected databases, which
are spread physically across various locations that communicate via a computer
network.
Features :
Databases in the collection are logically interrelated with each other. Often
they represent a single logical database.
Data is physically stored across multiple sites. Data in each site can be
managed by a DBMS independent of the other sites.
The processors in the sites are connected via a network. They do not have any
multiprocessor configuration.
A distributed database is not a loosely connected file system.
A distributed database incorporates transaction processing, but it is not
synonymous with a transaction processing system.
Distributed Database Management System
A distributed database management system (DDBMS) is a centralized software
system that manages a distributed database in a manner as if it were all stored in a
single location.
Features :
It is used to create, retrieve, update and delete distributed databases.
It synchronizes the database periodically and provides access mechanisms by
the virtue of which the distribution becomes transparent to the users.
It ensures that the data modified at any site is universally updated.
It is used in application areas where large volumes of data are processed and
accessed by numerous users simultaneously.
It is designed for heterogeneous database platforms.
It maintains confidentiality and data integrity of the databases.
Factors Encouraging DDBMS :
The following factors encourage moving over to DDBMS −
Distributed Nature of Organizational Units − Most organizations in the
current times are subdivided into multiple units that are physically distributed
over the globe. Each unit requires its own set of local data. Thus, the overall
database of the organization becomes distributed.
Need for Sharing of Data − The multiple organizational units often need to
communicate with each other and share their data and resources. This
demands common databases or replicated databases that should be used in a
synchronized manner.
Support for Both OLTP and OLAP − Online Transaction Processing
(OLTP) and Online Analytical Processing (OLAP) work upon diversified
systems which may have common data. Distributed database systems aid both
these processing by providing synchronized data.
Database Recovery − One of the common techniques used in DDBMS is
replication of data across different sites. Replication of data automatically
helps in data recovery if database in any site is damaged. Users can access
data from other sites while the damaged site is being reconstructed. Thus,
database failure may become almost inconspicuous to users.
Support for Multiple Application Software − Most organizations use a
variety of application software each with its specific database support.
DDBMS provides a uniform functionality for using the same data among
different platforms.
Advantages of Distributed Databases :
Following are the advantages of distributed databases over centralized
databases.
Modular Development − If the system needs to be expanded to new locations or
new units, in centralized database systems, the action requires substantial efforts and
disruption in the existing functioning. However, in distributed databases, the work
simply requires adding new computers and local data to the new site and finally
connecting them to the distributed system, with no interruption in current functions.
More Reliable − In case of database failures, the total system of centralized
databases comes to a halt. However, in distributed systems, when a component fails,
the functioning of the system continues, though possibly at reduced performance. Hence
a DDBMS is more reliable.
Better Response − If data is distributed in an efficient manner, then user requests
can be met from local data itself, thus providing faster response. On the other hand,
in centralized systems, all queries have to pass through the central computer for
processing, which increases the response time.
Lower Communication Cost − In distributed database systems, if data is located
locally where it is mostly used, then the communication costs for data manipulation
can be minimized. This is not feasible in centralized systems.
Adversities of Distributed Databases :
Following are some of the adversities associated with distributed databases.
Need for complex and expensive software − DDBMS demands complex and
often expensive software to provide data transparency and co-ordination
across the several sites.
Processing overhead − Even simple operations may require a large number of
communications and additional calculations to provide uniformity in data
across the sites.
Data integrity − The need for updating data at multiple sites poses problems of
data integrity.
Overheads for improper data distribution − Responsiveness of queries is
largely dependent upon proper data distribution. Improper data distribution
often leads to very slow response to user requests.
Disadvantages of Data Distribution :
There are following disadvantages of DDBMSs:
Complexity
A distributed DBMS that hides the distributed nature from the user and provides
an acceptable level of performance, reliability and availability is inherently more complex
than a centralized DBMS. The fact that data can be replicated also adds an extra level
of complexity to the distributed DBMS. If the software does not handle data
replication adequately, there will be degradation in availability, reliability and
performance compared with the centralized system, and the advantages we cited above
will become disadvantages.
Cost
Increased complexity means that we can expect the procurement and maintenance
costs for a DDBMS to be higher than those for a centralized DBMS. Furthermore, a
distributed
DBMS requires additional hardware to establish a network between sites. There
are ongoing communication costs incurred with the use of this network. There are also
additional labor costs to manage and maintain the local DBMSs and the underlying
network.
Security
In a centralized system, access to the data can be easily controlled. However, in a
distributed DBMS not only does access to replicated data have to be controlled in
multiple locations but also the network itself has to be made secure. In the past,
networks were regarded as an insecure communication medium. Although this is still
partially true, significant developments have been made to make networks more
secure.
Integrity control more difficult
Database integrity refers to the validity and consistency of stored data. Integrity is
usually expressed in terms of constraints, which are consistency rules that the database
is not permitted to violate. Enforcing integrity constraints generally requires access to
a large amount of data that defines the constraints. In a distributed DBMS, the
communication and processing costs that are required to enforce integrity constraints
are high as compared to centralized system.
Lack of Standards
Although distributed DBMSs depend on effective communication, we are only
now starting to see the appearance of standard communication and data
access protocols. This lack of standards has significantly limited the potential of
distributed DBMSs. There are also no tools or methodologies to help users convert a
centralized DBMS into a distributed DBMS.
Lack of experience
General-purpose distributed DBMSs have not been widely accepted, although
many of the protocols and problems are well understood. Consequently, we do not yet
have the same level of experience in industry as we have with centralized DBMSs.
For a prospective adopter of this technology, this may be a significant deterrent.
Database design more complex
Besides the normal difficulties of designing a centralized database, the design of a
distributed database has to take account of fragmentation of data, allocation of
fragmentation to specific sites, and data replication.
Structure of Distributed Database
DDBMS architectures are generally developed depending on three parameters −
Distribution − It states the physical distribution of data across the different
sites.
Autonomy − It indicates the distribution of control of the database system and
the degree to which each constituent DBMS can operate independently.
Heterogeneity − It refers to the uniformity or dissimilarity of the data models,
system components and databases.
Architectural Models :
Some of the common architectural models are −
Client - Server Architecture for DDBMS
Peer - to - Peer Architecture for DDBMS
Multi - DBMS Architecture
Client - Server Architecture for DDBMS
This is a two-level architecture where the functionality is divided into servers and
clients. The server functions primarily encompass data management, query
processing, optimization and transaction management. Client functions include
mainly user interface. However, they have some functions like consistency checking
and transaction management.
The two different client-server architectures are −
Single Server Multiple Client
Multiple Server Multiple Client
Design Alternatives
Data Replication :
It is simply copying data from one database server to another server so that all
the users can share the same data without any inconsistency. The result is a
distributed database in which users can access data relevant to their tasks without
interfering with the work of others.
The distribution design alternatives for the tables in a DDBMS are as follows −
Non-replicated and non-fragmented
Fully replicated
Partially replicated
Fragmented
Mixed
Non-replicated & Non-fragmented
In this design alternative, different tables are placed at different sites. Data is
placed so that it is at a close proximity to the site where it is used most. It is most
suitable for database systems where the percentage of queries needed to join
information in tables placed at different sites is low. If an appropriate distribution
strategy is adopted, then this design alternative helps to reduce the communication
cost during data processing.
Fully Replicated
In this design alternative, at each site, one copy of all the database tables is stored.
Since each site has its own copy of the entire database, queries are very fast,
requiring negligible communication cost. On the contrary, the massive redundancy in
data requires huge cost during update operations. Hence, this is suitable for systems
where a large number of queries is required to be handled whereas the number of
database updates is low.
Partially Replicated
Copies of tables or portions of tables are stored at different sites. The distribution
of the tables is done in accordance with the frequency of access. This takes into
consideration the fact that the frequency of accessing the tables varies considerably
from site to site. The number of copies of the tables (or portions) depends on how
frequently the access queries execute and the sites which generate the access queries.
Fragmented
In this design, a table is divided into two or more pieces referred to as fragments
or partitions, and each fragment can be stored at different sites. This considers the
fact that it seldom happens that all data stored in a table is required at a given site.
Moreover, fragmentation increases parallelism and provides better disaster recovery.
Here, there is only one copy of each fragment in the system, i.e. no redundant data.
The three fragmentation techniques are −
Vertical fragmentation
Horizontal fragmentation
Hybrid fragmentation
Mixed Distribution
This is a combination of fragmentation and partial replications. Here, the tables
are initially fragmented in any form (horizontal or vertical), and then these fragments
are partially replicated across the different sites according to the frequency of
accessing the fragments.
Client Server Databases:
Client/Server architecture of a database system has two logical components,
namely the client and the server. Clients are generally personal computers or workstations,
whereas the server is a large workstation, a mini-range computer system or
a mainframe computer system. The applications and tools of the DBMS run on one or
more client platforms, while the DBMS software resides on the server. The server
computer is called the back end and the client's computer is called the front end. These server
and client computers are connected into a network. The applications and tools act as
clients of the DBMS, making requests for its services. The DBMS, in turn, processes
these requests and returns the results to the client(s). The client handles the
Graphical User Interface (GUI) and does computations and other
programming of interest to the end user. The server handles the parts of the job that are
common to many clients, for example, database access and updates.
Advantages of Client/Server Database System :
• Client/Server systems provide less expensive platforms to support applications that
had previously been running only on large and expensive mini or mainframe
computers.
• Clients offer an icon-based, menu-driven interface, which is superior to the traditional
command-line, dumb-terminal interface typical of mini and mainframe computer
systems.
• A Client/Server environment facilitates more productive work by the users and
better use of existing data.