Sem 3 Dbms Unit 2
Sem 3 Dbms Unit 2
Unit-2
UNIT- II
Secondary key
Composite key.
Super Key:-
Super Key is defined as a set of attributes within a table that uniquely
identifies each record within a table. Super Key is a superset of
Candidate key.
Candidate Key
Candidate keys are defined as the set of fields from which primary key
can be selected. It is an attribute or set of attribute that can act as a
primary key for a table to uniquely identify each record in that table.
Primary Key
Primary key is a candidate key that is most appropriate to become
main key of the table. It is a key that uniquely identify each record in a
table.
Example: Studno StuName fee group
Stuno is primary key
Secondary or Alternative key
The candidate key which is not selected for primary key is known as
secondary keys or alternative keys
Composite Key
A Key that consists of two or more attributes that uniquely identify an entity
occurrence is called Composite key. But any attribute that makes up the
Composite key is not a simple key in its own.
Example: E_id E-Id Amount E-id is composite
Data Integrity
Data Integrity validates the data before getting stored in the columns
of the table. SQL Server supports four type of data integrity:
Entity Integrity
Entity Integrity can be enforced through indexes, UNIQUE constraints
and PRIMARY KEY constraints
Domain Integrity
Domain integrity validates data for a column of the table. It can be
enforced using:
o Foreign key constraints,
o Check constraints,
o Default definitions
o NOT NULL.
Referential Integrity
FOREIGN KEY and CHECK constraints are used to enforce Referential
Integrity.
User-Defined Integrity
It enables you to create business logic which is not possible to
develop using system constraints. You can use stored procedure,
trigger and functions to create user-defined integrity
No duplicate tuples. A relation cannot contain two or more tuples which have the
same values for all the attributes. i.e., In any relation, every row is unique.
There is only one value for each attribute of a tuple. The tuple should have only one
value.
Tuples are unordered. The order of rows in a relation is immaterial. One is free to
display a relation in any convenient way.
*****
Q) Explain Relational or Referential Integrity?
Referential integrity is a relational database concept in which multiple tables share a
relationship based on the data stored in the tables, and that relationship must remain consistent.
Relational Integrity is also called Referential Integrity. Tables are related using data values.
Usually, these are the values of numeric keys instead of text fields. Every database relation is
involving just two tables, a parent table and a child table. Relational databases implement RI
using primary keys and foreign keys. The parent table contains the primary key and the
child table contains the foreign key which references (or "points" to, if you prefer) the parent
table.
Referential Integrity rule:
The referential integrity rule says that the database must not contain any unmatched
foreign key values. This implies that:
A row may not be added to a table with a foreign key unless the referenced value exists
in the referenced table.
If the value in a table that's referenced by a foreign key is changed (or the entire row is
deleted), the rows in the table with the foreign key must not be "leave alone."
In general, there are three options available when a referenced primary key value changes
or a row is deleted. The options are:
Disallow. The change is completely disallowed.
Cascade. For updates, the change is cascaded to all dependent tables. For deletions, the
rows in all dependent tables are deleted.
Nullify. For deletions, the dependent foreign key values are set to Null.
Syntax:
CREATE TABLE child ( column definitions,
[constraint name] FOREIGN KEY (fk columns) REFERENCES parent [(pk
columns)] ON DELETE
CASCADE | SET NULL | SET DEFAULT | NO ACTION ON UPDATE
CASCADE | SET NULL | SET DEFAULT | NO ACTION
)
Examples:
1. create table branch ( bname char(15) not null, bcity char(30), assets integer, primary
key (bname) check (assets>= 0))
2. create table account ( account# char(10) not null, bname char(15), balance integer,
primary key (account#)
foreign key (bname) references branch, check (balance>= 0))
******
Relational Algebra
A query language is a language in which user requests to retrieve some information
from the database. The query languages are considered as higher level languages
than programming languages. Query languages are of two types,
Procedural Language
Non-Procedural Language
In procedural language, the user has to describe the specific procedure to retrieve the
information from the database.
Example: The Relational Algebra is a procedural language.
In non-procedural language, the user retrieves the information from the database
without describing the specific procedure to retrieve it.
Example: The Tuple Relational Calculus(TRC) and the Domain Relational
Calculus(DRC) are non-procedural languages.
Limitations Of Relational Algebra
Although relational algebra seems powerful enough for most practical purposes, there
are some simple and natural operators on relations which cannot be expressed by relational
algebra.
It consists of a set of operations that take one or two relations (tables) as input and
produce a new relation, on the request of the user to retrieve the specific information,
as the output.
The relational algebra contains the following operations,
1) Selection 2) Projection 3) Union
4) Rename 5) Set-Difference 6) Cartesian product
7) Intersection 8) Join 9) Divide
The Selection, Projection and Rename operations are called unary operations
because they operate only on one relation.
The other operations operate on pairs of relations and are therefore called binary
operations.
Selection ( ) operation:
The Selection is a relational algebra operation that uses a condition to select rows from
a relation.
A new relation (output) is created from another existing relation by selecting only rows
requested by the user that satisfy a specified condition.
The lower greek letter ‘sigma ’ is used to denote selection
operation. Syntax: Selection condition (relation_name)
For example, to list the regno> 102 from
Student_Table.
σRegno>102(Student_table)
i.e., a new relation (output) is created from another existing relation by selecting only
those columns requested by the user from projection and is denoted by letter pi (π )
For example, to get a name from Student_Table.
πName(Student_Table)
Union
It combines the similar columns from two tables into one resultant table.
All columns that are participating in the UNION operation should be Union Compatible.
This operator combines the records from both the tables into one.
If there are duplicate values as a result, then it eliminates the duplicate.
The resulting records will be from both table and distinct.
Intersection:
This operator is used to pick the records from both the tables which are common to
them.
In other words it picks only the duplicate records from the tables.
Even though it selects duplicate records from the table, each duplicate record will be
displayed only once in the result set.
It should have UNION Compatible columns to run the query with this operator.
Minus
This operator is used to display the records that are present only in the first table or
query, and doesn’t present in second table / query.
It basically subtracts the first query results from the
second. πCust-name (Borrower) - πCust-name
(Depositor)
PRODUCT
This command would show all possible pairs of rows from both tables being used.
This command can also be referred to as the Cartesian Product.
Consider the following two tables
JOIN
Join is combination of Cartesian product followed by selection process.
This operator takes two or more tables and combines them into one table.
This can be used in combination with other commands to get specific information.
There are several types of the Join command.
The Natural Join, Equi-jion, Theta Join, Left Outer Join and Right Outer Join.
DIVIDE
DIVIDE has specific requirements of the table.
One of the tables can only have one column and the other table must have two columns
only.
Q) Write all about Relational Calculus?
Relational calculus is an alternative to relational algebra.
In contrast to the algebra, which is procedural, the relational calculus is non-procedural
or declarative.
It allows user to describe the set of answers without showing procedure about how they
should be computed.
Relational calculus has a big influence on the design of commercial query languages
such as SQL and QBE (Query-by Example).
Relational calculus is of two types,
Tuple Relational Calculus (TRC)
Domain Relational Calculus (DRC)
Variables in TRC takes tuples (rows) as values and TRC had strong influence on SQL.
Variables in DRC takes fields (attributes) as values and DRC had strong influence on
QBE.
Tuple Relational Calculus (TRC)
Normalization
Normalization is the process of organizing the data in the database.
Normalization is used to minimize the redundancy from a relation or set of
relations. It is also used to eliminate undesirable characteristics like Insertion,
Update, and Deletion Anomalies.
Normalization divides the larger table into smaller and links them using
relationships.
The normal form is used to reduce redundancy from the database table.
Why do we need Normalization?
The main reason for normalizing the relations is removing these anomalies.
Failure to eliminate anomalies leads to data redundancy and can cause data integrity
and other problems as the database grows. Normalization consists of a series of
guidelines that helps to guide you in creating a good database structure.
Data modification anomalies can be categorized into three types:
Insertion Anomaly: Insertion Anomaly refers to when one cannot insert a new
tuple into a relationship due to lack of data.
Deletion Anomaly: The delete anomaly refers to the situation where the deletion
of data results in the unintended loss of some other important data.
Updatation Anomaly: The update anomaly is when an update of a single data
value requires multiple rows of data to be updated.
Types of Normal Forms:
Following are the various types of Normal forms:
Normal Form Description
1NF A relation is in 1NF if it contains an atomic value.
2NF A relation will be in 2NF if it is in 1NF and all non-key attributes are fully functional
dependent on the primary key.
3NF A relation will be in 3NF if it is in 2NF and no transition dependency exists.
BCNF A stronger definition of 3NF is known as Boyce Codd's normal form.
4NF A relation will be in 4NF if it is in Boyce Codd's normal form and has no multi-
valued dependency.
5NF A relation is in 5NF. If it is in 4NF and does not contain any join dependency,
joining should be lossless.
TEACHER_SUBJECT table:
TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Math
83 Computer
A relation will be in 3NF if it is in 2NF and not contain any transitive partial dependency.
3NF is used to reduce the data duplication. It is also used to achieve the data integrity.
If there is no transitive dependency for non-prime attributes, then the relation must be in
third normal form.
A relation is in third normal form if it holds atleast one of the following conditions for
every non-trivial function dependency X → Y.
1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.
Example:
EMPLOYEE_DETAIL table:
EMP_ID EMP_NAME EMP_ZIP EMP_STATE EMP_CITY
222 Harry 201010 UP Noida
333 Stephan 02228 US Boston
444 Lan 60007 US Chicago
555 Katharine 06389 UK Norwich
666 John 462007 MP Bhopal
Super key in the table above:
06389 UK Norwich
462007 MP Bhopal
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate keys:
For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}
Now, this is in BCNF because left side part of both the functional dependencies is a key.
21 Computer Dancing
21 Math Singing
34 Chemistry Dancing
74 Biology Cricket
59 Physics Hockey
The given STUDENT table is in 3NF, but the COURSE and HOBBY are two
independent entity. Hence, there is no relationship between COURSE and HOBBY.
In the STUDENT relation, a student with STU_ID, 21 contains two
courses, Computer and Math and two hobbies, Dancing and Singing. So there is a
Multi-valued dependency on STU_ID, which leads to unnecessary repetition of data.
So to make the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE
STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
STUDENT_HOBBY
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
In the above table, John takes both Computer and Math class for Semester 1 but he
doesn't take Math class for Semester 2. In this case, combination of all these fields
required to identify a valid data.
Suppose we add a new Semester as Semester 3 but do not know about the subject and
who will be taking that subject so we leave Lecturer and Subject as NULL. But all three
columns together acts as a primary key, so we can't leave other two columns blank.
So to make the above table into 5NF, we can decompose it into three relations P1, P2 &
P3:
P1
SEMESTER SUBJECT
Semester 1 Computer
Semester 1 Math
Semester 1 Chemistry
Semester 2 Math
P2
SUBJECT LECTURER
Computer Anshika
Computer John
Math John
Math Akash
Chemistry Praveen
P3
SEMSTER LECTURER
Semester 1 Anshika
Semester 1 John
Semester 1 John
Semester 2 Akash
Semester 1 Praveen