lOMoAR cPSD| 43926606
Subject Name: RDBMS AND ORACLE
Department: COMPUTER APPLICATIONS
Class: III – BCA
Semester: V
Unit – I: Database Concepts: A Relational approach: Database – Relationships – DBMS –
Relational Data Model – Integrity Rules – Theoretical Relational Languages. Database
Design: Data Modeling and Normalization: Data Modeling – Dependency – Database Design
– Normal forms – Dependency Diagrams – De -normalization – Another Example of
Normalization
RELATIONAL APPROACH
Database: An Introduction
• A databases is an electronic store of data
• The basic terms used to describe the structure of a database: A
person, place, event or item is called an entity.
The facts describing an entity are known as data. For example, if you were a registrar in a
college, you would like to have all the information about the students. Each student is an entity
in such a scenario.
Each entity can be described by its characteristics, which are known as attributes. For
example, some of the likely attributes for a college student are student identification number,
last name and so on.
All the related entities are collected together to form an entity set.
A database is a collection of entity sets. For example, college‟s database may include
information about entities such as student, faculty, course, term and so on.
The entities in a database are likely to interact with other entities. The interactions between the
entity set are called relationships.
lOMoAR cPSD| 43926606
Relationships
The relationships are classified into three categories:
1. One-to-one relationships: A one-to-one relationship is written as 1:1 in short form. It
exists between tow entity sets, X and Y, if an entity in entity set X has only one matching
entity in entity set Y and vice versa. For example a department in a college has one
chairperson and a chairperson chairs one department in a college
2. One-to-many relationship: a one-to-many relationship is written as 1:M. It exists
between two entity sets, X and Y, if an entiry in entity set X has many matching entities
in entity et Y but an entity in entity set Y has only one matching entity in entity set X.
For example an employee works for one department , but a department has many
employees..
3. Many-to-many relationship: A many-to-many relationship is written as M:N or M:M.
It exists between two entity sets, X and Y, if an entity in entity set X has many matching
entities in entity set Y and an entity in entity set Y has many matching entities in entity
set X. For example an employee works on many projects and a project has many
employees.
DATABASE MANAGEMENT SYSTEM (DBMS)
The database system consists of the following components:
1. A database management system (DBMS) software package such as Microsoft Access,
Visual Fox Pro, Microsoft SQL – server, Oracle.
2. A user developed and implemented database or databases that include tables or data
dictionary and other database objects.
3. Custom applications such as data-entry forms, reports, queries, blocks and programs
4. Computer hardware – personal computers, minicomputers and mainframes in a network
environment.
5. Software – an operating system and a network operating system
6. Personnel – a database administrator, or database designer/analyst, a programmer, and
end users.
lOMoAR cPSD| 43926606
User
Applications DBMS Database
OS Software
Hardware
Database system
A DBMS based on the relational model is also known as Relational Database Management
System (RDBMS). An RDBMS no only manages data but is also responsible for other
important functions:
1. It manages the data and relationship stored in the databse. It creates a data
dictionary as a user creates a database. The data dictionary is a system structure
that stores Metadata ( Data about data).
2. It manages all day to-day transaction
3. It performs bookkeeping duties, so the user has data independence at the
applicatioin level.
4. It transforms logical data requests to match physical data structures. Where a user
requests data, The RDBMS searches through data dictionary, filters out
unnecessary data, and displays the results in a readable and understandable form.
5. it allows user to specity validation rules.
6. It secutes access through passwords, encryption, and restricted user rights
7. It provides backup and recovery procedure for physical security of data
8. It allows users to share data with data- locking capabilities
9. It provides import and export utilities to use data created in other databases or
spreadsheet software or to use data in other software.
10. It enables users to join tables to view information stored in different tables within
the database.
lOMoAR cPSD| 43926606
THE RELATIONAL DATA MODEL
E.F. Codd Developed the relational database model in 1970. The model is based on
mathematical set theory, and it uses a relation as building blocks the database.
The relation is represented by a two – dimensional, flat structure known as a table
In Relational terminology, a row is also referred to as a tuple.
The number of columns in a table is called the degree.
The set of all possible values that a column my have is called the domain of that column.
PROJ2002
PROJNO LOC CUSTOMER
1 Miami Stocks
3 Trenton Smith
5 Phoenix Robins
6 Edison Shaw
7 Seattle Douglas
PROJ2003
PROJNO LOC CUSTOMER
1 Miami Stocks
2 Orlando Allen
3 Trenton Smith
4 Trenton Jones
PRJPARTS
PROJNO PARTNO QTY
1 11 20
lOMoAR cPSD| 43926606
2 33 5
3 11 7
1 22 10
2 11 3
PARTS
PARTNO PARTDESC VENDOR COST
11 Nut Richards 19.95
22 Bolt Black 5.00
33 Washer Mobley 55.99
DEPARTMENT
DEPTNO DEPTNAME
10 PRODUCTION
20 SUPPLIES
30 MARKETING
EMPLOYEE
EMPNO ENAME DEPTNO PROJNO SALARY
101 Carter 10 1 25000
102 Albert 20 3 37000
103 Breen 30 6 50500
104 Gould 20 5 23700
105 Barker 10 7 75000
Relational Terminology File System Terminology
lOMoAR cPSD| 43926606
Entity Set or Table or Relation File
Entity or Row or Tuple Record
Attribute or Column Field
Terminology Comparison
1. A key is a minimal set of column used to uniquely define any row in a table
2. A single column is used as a unique identifier, it is known as a primary key
3. A combination of columns is used as a unique identifier, it is known as composite
key.
4. In a relational database, tables are related to each other through a common column.
A column in a table that references a column in another table is known as a foreign
key.
INTEGRITY RULES
In any database managed by an RDBMS, it is very important that the data in the underlying
tables be consistent. If consistency is compromised, the data are not usable. This need led to
pioneers of database field to formulate two integrity rules:
1. Entity integrity: No column in primary key may be null. The primary key provides the
means of uniquely identifying a row or an entity. A value means a value that is not
known, not entered, not defined or not applicable. A zero or a space is not considered
to be a null value. If the primary key value is a null value in a row, we do not have
enough information about the row to uniquely identify.
2. Referential integrity: A foreign key value may be null value or it must exists as a value
of a primary key in the referenced table.
THEORETICAL RELATIONAL LANGUAGES
E .F. Codd suggested two theoretical relational languages to use with the relational model:
1. Relational algebra, a procedural language 2.
Relational calculus, a non procedural language.
lOMoAR cPSD| 43926606
Relational Algebra
Relational algebra is a procedural language, because the user accomplishes desired
results by using a set of operations in a sequence. It uses set operations on table to produce new
resulting tables.
The nine operations used by relational algebra are:
1. Union
2. Intersection
3. Difference
4. Projection
5. Selection
6. Product
7. Assignment
8. Join
9. Division
Union:
The union of two tables results in retrieval of all rows that are in one or both tables. The
duplicate rows are eliminated from the resulting table. The resulting table does not contain two
rows with identical data values. There is a basic requirement to perform a union operation on
two tables
1. Both tables must have the same degree
2. The domains of the corresponding columns in two tables must be same
TABLE_A = PROJ2002 U PROJ2003
TABLE_A
PROJNO LOC CUSTOMER
1 Miami Stocks
2 Orlando Allen
3 Trenton Smith
lOMoAR cPSD| 43926606
4 Charlotte Jones
5 Phoenix Robins
6 Edison Shaw
7 Seattle Douglas
Intersection
The intersection of two tables produces a table with rows that are in both tables. The
two tables must be union compatible to perform an intersection on them
TABLE_B = PROJ2002 ∩ PROJ2003
TABLE_B
PROJNO LOC CUSTOMER
1 Miami Stocks
3 Trenton Smith
Difference
The difference of two tables produces a table with rows that are present in the first table
but not in the second table. The difference can be performed on union-compatible tables only.
TABLE_C = PROJ2002 – PROJ2003
TABLE_C
PROJNO LOC CUSTOMER
5 Phoenix Robins
6 Edison Shaw
lOMoAR cPSD| 43926606
7 Seattle Douglas
Projection
The projection operation allows us to create a table based on desirable columns from
all existing columns in a table.
TABLE_E = Parts (PartDesc, Cost)
PARTDESC COST
Nut 19.95
Bolt 5.00
Washer 55.99
Selection
The selection operation selects rows from a table based on a condition or conditions.
The conditional operators ( =, <>,>,>=,<,<=) and the logical operators (AND, OR, NOT) are
used along with columns and values to create conditions.
TABLE_F = Sel(PARTS: Cost>10.00)
TABLE_F
PARTNO PARTDESC VENDOR COST
11 Nut Richards 19.95
33 Washer Mobley 55.99
Product
A product of two tables is a combination everything in both tables. It is also known as
a Cartesian product. It can cause huge results with big tables.
First table X rows, Second table has Y rows, the resulting product has X * Y rows
First table M columns, Second Table has N columns, the resulting product has M + N Columns
DEPARTMENT
DEPTNAME
PRODUCTION
SUPPLIES
MARKETING
EMPLOYEE
ENAME
Carter
Albert
TABLE_G = EMPLOYEE * DEPARTMENT
TABLE_G
Ename DeptName
Carter PRODUCTION
Carter SUPPLIES
Carter MARKETING
Albert PRODUCTION
Albert SUPPLIES
Albert MARKETING
Assignment
This operation creates a new table from existing tables.
TABLE_A = PROJ2002 U PROJ2003
Join
This join is one of the most important operations because of its ability to get related
data form a number of tables. The join is based on common set of values.
TABLE_H = join (EMPLOYEE, DEPARTMENT : Deptno = Deptno)
TABLE_H
EMPNO ENAME DEPTNO PRJNO SALARY DEPTNAME
101 Carter 10 1 25000 PRODUCTION
102 Albert 20 3 37000 SUPPLIES
103 Breen 30 6 50500 MARKETING
104 Gould 20 5 23700 SUPPLIES
105 Barker 10 7 75000 PRODUCTION
Division
The division operation is the most difficult operation to comprehend. In relational
algebra, it identifies, it identifies rows in one table that have a certain relationship to all rows
in another table. Let us consider the following two tables:
PROJ
PROJNO
PRJPARTS
PROJNO PARTNO
1 11
2 33
3 11
1 22
2 11
TABLE_I = PRJPARTS / PROJ
TABLE_I
PARTNO
11
Example:
PRJPARTS
PROJNO PARTNO QTY
1 11 20
2 33 5
3 11 7
1 22 10
2 11 3
Referring to the tables in the above figure, find projects where part number 11 is used.
Solution:
M = Sel ( PRJPARTS : partno=11)
N=M (projno)
Relational Calculus
Relational calculus is a nonprocedural language. The programmer specifies the data
requirement, and the system generates the operations needed to produce a table with the
required data. In this section will try to understand relational calculus briefly with sample
example using the general syntax.
Result = (column list): Expression
The list of columns is on the left of the colon, and the expressions (and conditions) are on the
right.
Example:
PRJPARTS
PROJNO PARTNO QTY
1 11 20
2 33 5
3 11 7
1 22 10
2 11 3
Referring to the tables in the above figure, find projects where part number 11 is used.
Solution:
(r.projno) : r in PRJPARTS and r.partno = 11
DATABASE DESIGN: DATA MODELING & NORMALIZATION
DATA MODELING
A model is a simplified version of real-life, complex objects. Databases are complex
and data modeling is a tool to represent the various components and their relationships. The
Entity-relationship (E-R) model is a very popular modeling tool among many.
The E-R Model provides:
1. An excellent communication tool
2. A simple graphical representation of data
The E-R model uses E-R Diagram (ERD) for graphical representation of the database
components.
1. An entity (or an entity set) is represented by rectangle
2. A line represents relationships between the two entities
3. The name of the relationship is an active verb is lowercase letters, for ex. Works ,
manages and employs.
Entity Representation in an E-R Diagram:
EMPLOYEE
Representation of relationship in an E-R diagram
1:1
M: N
1: M
The types of relationships ( 1:1, 1:M, M:N) between entities are called connectivity or
multiplicity.
The relationship between two entities can be given using the lower and upper limits.
This information is called the cardinality. The cardinality is written next to each entity in the
form (n, m), where n is the minimum number and m is the maximum number.
Supervises
EMPLOYEE DEPARTMENT
(1, 1)
(1, 1)
Employees
DIVISION FACULTY
(1, N)
(1, 1)
Contains
INVOICE ITEM
(1, N) (1, N)
Entity, relationship, connectivity and Cardinality
An employee can be part of zero or more than one department, and an item may not be
in any invoice! These types of decisions are known as business rules
The decomposition from M:N to 1:M involves a third entity, knows as composite entity
or an associative entity.
Some of the other elements considered in the database design are:
1. Simple attributes – attributes that cannot be subdivided; for example, last name, city or
gender
2. Composite attributes – attributes that can be subdivided, into atomic form; for example,
a full name can be subdivided into the last name, first name and middle name
3. Single – valued attributes – attributes with a single value; for example, employee ID ,
Social security number or date of birth
4. Multivalued attributes – attributes with multiple values; for example, degree codes or
course registration. The multivalued attributes have to be given special consideration.
DEPENDENCY
There are three types of dependencies in a table:
1. Total or full dependency : A nonkey column dependent on all primary key columns
shows total dependency
2. Partial dependency: In partial dependency; a nonkey column is dependent on part of
primary key.
3. Transitive dependency: In transitive dependency; a nonkey column is dependent on
another nonkey column.
Invoice table and its Columns:
Invno Invdate Custno Itemno Custname Itemname Itemprice Qty
1001 04/14/03 212 1 Starks Screw $2.25 5
1001 04/14/03 212 3 Starks Bolt $3.99 5
1001 04/14/03 212 5 Starks Washer $1.99 9
1002 04/17/03 225 1 Connors Screw $2.25 2
1002 04/17/03 225 2 Connors Nut $5.00 3
1003 04/17/03 239 1 Kapur Screw $2.25 7
1003 04/17/03 239 2 Kapur Nut $5.00 1
1004 04/18/03 211 4 Garcia Hammer $9.99 5
For Example in the invoice table, itemname and itemprice are nonkey columns that are
dependent only on a part of a primary key column itemno. They are not dependent on the invno
column. Similary, the nonkey column invdate is dependent only on invno. They are partially
dependent on the primary key columns. The nonkey column custname is not dependent on any
primary dey column but is dependent on another nonkey column custno. It is said to have
transitive dependency. The nonkey column qty is dependent on both invno and itemno, so it is
said to have full dependency.
DATABASE DESIGN
Normal forms
Data are repeated from row to row. For example invdate, custno, and custname are repeated for
same invno. The item name is entered repeatedly from invoice to invoice. There is a large
amount of redundant data in a table with just eight rows. Redundant date can pose a huge
problem in databases. The redundancy may also lead to anomalies
Anomalies
A deletion anomaly results when the deletion of information bout one entiy leads to the deletion
of information about another entity. For example in above table if an invoice for customer
Garcia is removed, information aout item number4 is also deleted.
An insertion anomaly occurs when he information about an entity cannot be inserted unless the
information about another entity is known. For example, if the company buys a new item, this
information cannot be entered unless an invoice is created for a customer with that new item.
An update anomaly can occur if the items price changes or a new price.
Normalization is a technique to reduce redundancy. It is a decomposition process to split
tables.
First Normal Form (1NF)
A table is said to be in first normal form, or can be labeled 1NF, if the following condition
exists:
1. The primary key is defined. This includes a composite key if a single column cannot be
used as primary key. In our Invoice table, invno and itemid are defined as the composite
primary key components.
2. All nonkey columns show functional dependency on the primary key components. If
you know the invoice number and the item number, you can find out remaining
columns.
3. The table contains no multivalued columns. In a single valued column, the intersection
of a row and a column returns only one value. In a normalization table, the intersection
of row and a column is a single value.
Second Normal Form (2NF)
A table is said to be in second normal form, or 2NF, if the following requirements are satisfied
1. All 1NF requirements are full filled
2. There is no partial dependency
Partial dependency exists in a table in which nonkey columns are partially dependent on part
of a composite key.
Third Normal Form (3NF)
A table is said to be in third normal form, or 3NF, if the following requirements are satisfied
1. All 2NF requirements are fulfilled
2. There is no transitive dependency
A table that has transitive dependency is not in 3NF, but it needs to be decomposed further to
achieve 3NF. However, a table in 2NF that does not contain any transitive dependency does
not need any further decomposition and is automatically in 3NF.
Other, higher normal forms are defined in some databases texts. Boyce-Codd normal form
(BCNF), fourth normal form (4NF), fifth normal form (5NF), and domain key normal form
(DKNF).
DEPENDENCY DIAGRAM
A dependency diagram is used to show total (full), partial and transitive dependies in a table:
1. The primary key components are highlighted.. They are in bold letters and in boxes
with a darker border. The primary key components are conneted to each other using a
bracket.
2. The total and functional dependencies are shown with arrows drawn above the boxes
3. The partial and transitive dependencies are shown with arrows at the bottom of the
diagram.
Denormalization
Denormalization is the process of attempting to optimize the performance of a database
by adding redundant data or by grouping data. In some cases, denormalization helps cover up
the inefficiencies inherent in relational database software. A relational normalized database
imposes a heavy access load over physical storage of data even if it is well tuned for high
performance.
A normalized design will often store different but related pieces of information in separate
logical tables (called relations). If these relations are stored physically as separate disk files,
completing a database query that draws information from several relations
(a join operation) can be slow. If many relations are joined, it may be prohibitively slow. There
are two strategies for dealing with this. The preferred method is to keep the logical design
normalized, but allow the DBMS to store additional redundant information on disk to optimize
query response. In this case it is the DBMS software's responsibility to ensure that any
redundant copies are kept consistent. This method is often implemented in SQL as indexed
views (MS SQL) or materialized views (Oracle). A view represents information in a format
convenient for querying, and the index ensures that queries against the view are optimized.
The more usual approach is to demoralize the logical data design. With care this can achieve a
similar improvement in query response, but at a cost-it is now the database designer's
responsibility to ensure that the demoralized database does not become inconsistent. This is
done by creating rules in the database called constraints, which specify how the redundant
copies of information must be kept synchronized. It is the increase in logical complexity of the
database design and the added complexity of the additional constraints that make this approach
hazardous. Sometimes a denormalized database under heavy write load may actually offer
worse performance than its functionally equivalent normalized counterpart.
A denormalized data model is not the same as a data model that has not been normalized, and
denormalization should only take place after a satisfactory level of normalization has taken
place and that any required constraints and/or rules have been created to deal with the inherent
anomalies in the design. For example, all the relations are in third normal form and any
relations with join and multi-valued dependencies are handled appropriately. Uses of
Denormalization
Databases intended for Online Transaction Processing (OLTP) are typically more normalized
than databases intended for Online Analytical Processing (OLAP). OLTP Applications are
characterized by a high volume of small transactions such as updating a sales record at a super
market checkout counter. The expectation is that each transaction will leave the database in a
consistent state. By contrast, databases intended for OLAP operations are primarily "read
mostly" databases. OLAP applications tend to extract historical data that has accumulated over
a long period of time. For such databases, redundant or "denormalized" data may facilitate
Business Intelligence applications. Specifically, dimensional tables in a star schema often
contain denormalized data.
Denormalization is also used to improve performance on smaller computers as in computerized
cash-registers and mobile devices, since these may use the data for look-up only (e.g. price
lookups).
Denormalization may also be used when no RDBMS exists for a platform (such as Palm), or
no changes are to be made to the data and a swift response is crucial.
Points to remember
➢ An entity‟s characteristics are known as columns. ➢ In a relational model, a
row is known as a tuple.
➢ The degree is the number of columns in a table, and the domain is a set of all
possible values for a column.
➢ The types of relationships are called connectivity.
➢ Attributes can be single valued or multivalued.
EXPECTED QUESTIONS
1. Define Entity,key,null,domain.
2. What are tow integrity rules of the relational model?
3. What are different types of keys?what is their use?
4. Difference between union and intersection.
5. What do we mean by union compatible?which operations require tables to be union
compatible with each other?
6. What is meant by normalization? How to normalize the table?
7. Discuss about theoretical relational languages.
8. How to log in to SQL *plus?