2020 DBMS

1. Data and Information

Usually, the terms “data” and “information” are used interchangeably. However, there is a subtle difference between the two. In a nutshell, data can be a number, symbol, character, word, code, graph, etc. On the other hand, information is data put into context. Information is utilised by humans in some significant way (such as to make decisions, forecasts, etc.).

A basic example is a computer: it uses programming scripts, formulas, or software applications to turn data into information.

Data vs. Information

 Data is unorganised and unrefined facts; information comprises processed, organised data presented in a meaningful context.
 Data is an individual unit that contains raw materials which do not carry any specific meaning; information is a group of data that collectively carries a logical meaning.
 Data doesn’t depend on information; information depends on data.
 Raw data alone is insufficient for decision making; information is sufficient for decision making.
 An example of data is a student’s test score; the average score of a class is information derived from the given data.

2. Different types of data models used:


A data model in a Database Management System (DBMS) is the set of concepts and tools used to summarize the description of the database. Data models provide us with a transparent picture of the data, which helps us in creating an actual database. They take us from the design of the data to its proper implementation.
Data models are basically classified into 3 types:
1. Conceptual Data Model
2. Representational Data Model
3. Physical Data Model

1. Conceptual Data Model

The conceptual data model describes the database at a very high level and is useful for understanding the needs or requirements of the database. It is this model that is used in the requirement-gathering process, i.e. before the database designers start making a particular database. One such popular model is the entity/relationship model (ER model). The E/R model specializes in entities, relationships, and the attributes that are used by database designers. In terms of this concept, a discussion can be held even with non-computer-science (non-technical) users and stakeholders, and their requirements can be understood.
Entity-Relationship Model (ER Model): It is a high-level data model which is used to define the data and the relationships between them. It is basically a conceptual design of a database from which it is easy to design the view of the data.
Components of ER Model:
1. Entity: An entity refers to a real-world object. It can be a person, place, object, class, etc. Entities are represented by rectangles in an ER diagram.
2. Attributes: An attribute can be defined as a description of the entity. Attributes are represented by ellipses in an ER diagram. For a Student, attributes can be Age, Roll Number, or Marks.
3. Relationship: Relationships are used to define relations among different entities. They are represented by diamonds (rhombuses) in an ER diagram.

Characteristics of a conceptual data model

 Offers Organization-wide coverage of the business concepts.


 This type of data model is designed and developed for a business audience.
 The conceptual model is developed independently of hardware specifications like data storage capacity,
location or software specifications like DBMS vendor and technology. The focus is to represent data as a
user will see it in the “real world.”
Conceptual data models, also known as domain models, create a common vocabulary for all stakeholders by establishing basic concepts and scope.

2. Representational Data Model

This type of data model is used to represent only the logical part of the database and does not represent the physical structure of the database. The representational data model allows us to focus primarily on the design part of the database. A popular representational model is the Relational model, which consists of Relational Algebra and Relational Calculus. In the Relational model, we basically use tables to represent our data and the relationships between them. It is a theoretical concept whose practical implementation is done in the Physical Data Model.
The advantage of using a representational data model is that it provides the foundation on which the Physical model is built.

3. Physical Data Model

The Physical Data Model is used to practically implement the Relational Data Model. Ultimately, all data in a database is stored physically on a secondary storage device such as discs and tapes. This is stored in the form of files, records, and certain other data structures. It has all the information on the format in which the files are present and the structure of the databases, the presence of external data structures, and their relation to each other. Here, we basically save tables in memory so they can be accessed efficiently. In order to come up with a good physical model, we have to refine the relational model. Structured Query Language (SQL) is used to practically implement Relational Algebra.
This data model describes HOW the system will be implemented using a specific DBMS. This model is typically created by DBAs and developers. The purpose is the actual implementation of the database.

Characteristics of a physical data model:

 The physical data model describes the data needed for a single project or application, though it may be integrated with other physical data models based on project scope.
 The data model contains relationships between tables, addressing the cardinality and nullability of the relationships.
 It is developed for a specific version of a DBMS, location, data storage or technology to be used in the project.
 Columns should have exact datatypes, lengths and default values assigned.
 Primary and foreign keys, views, indexes, access profiles, authorizations, etc. are defined.
Some Other Data Models

1. Hierarchical Model

The hierarchical model is one of the oldest data models; it was developed by IBM in the 1960s. In a hierarchical model, data is viewed as a collection of tables, or we can say segments, that form a hierarchical relation. The data is organized into a tree-like structure where each record has one parent record and possibly many children. Segments connected by logical associations form a fan structure with multiple branches, and these logical associations are directional.

2. Network Model

The Network Model was formalized by the Database Task group in the 1960s. This model is the generalization
of the hierarchical model. This model can consist of multiple parent segments and these segments are
grouped as levels but there exists a logical association between the segments belonging to any level. Mostly,
there exists a many-to-many logical association between any of the two segments.

3. Object-Oriented Data Model

In the Object-Oriented Data Model, data and their relationships are contained in a single structure which is
referred to as an object in this data model. In this, real-world problems are represented as objects with different
attributes. All objects have multiple relationships between them. Basically, it is a combination of Object
Oriented programming and a Relational Database Model.

4. Flat Data Model

The flat data model basically consists of a two-dimensional array of data elements that does not contain any duplicate elements. This data model has one drawback: it cannot store a large amount of data, that is, the tables cannot be of large size.

5. Context Data Model

The context data model is simply a data model which consists of more than one data model. For example, a context data model may combine the ER model, the object-oriented data model, etc. This model allows users to do the things that each of its constituent data models can do.

6. Semi-Structured Data Model

Semi-Structured data models deal with the data in a flexible way. Some entities may have extra attributes and
some entities may have some missing attributes. Basically, you can represent data here in a flexible way.
Advantages of Data Models
1. Data Models help us in representing data accurately.
2. It helps us in finding the missing data and also in minimizing Data Redundancy.
3. Data Model provides data security in a better way.
4. The data model should be detailed enough to be used for building the physical database.
5. The information in the data model can be used for defining the relationship between tables, primary and
foreign keys, and stored procedures.
Disadvantages of Data Models
1. In the case of a vast database, it can become difficult to understand the data model.
2. You must have proper knowledge of SQL to use physical models.
3. Even a small change made in the structure requires modification of the entire application.
4. There is no set data manipulation language in DBMS.
5. To develop a data model, one should know the physical characteristics of the stored data.

Conclusion

 Data modeling is the process of developing a data model for the data to be stored in a database.
 Data models ensure consistency in naming conventions, default values, semantics, and security, while ensuring the quality of the data.
 Data Model structure helps to define the relational tables, primary and foreign keys and stored
procedures.
 There are three types of data models: conceptual, logical, and physical.
 The main aim of the conceptual model is to establish the entities, their attributes, and their relationships.
 The logical data model defines the structure of the data elements and sets the relationships between them.
 A physical data model describes the database-specific implementation of the data model.
 The main goal of designing a data model is to make certain that data objects offered by the functional team are represented accurately.
 The biggest drawback is that even a small change made in the structure requires modification of the entire application.

3. Differentiate DDL and DML


DDL (Data definition language) – Data definition language is used to design and modify the structure of a
database.
Common DDL commands are:
a. Create – used to create databases and database objects such as tables.
b. Alter – used to modify the structure of the database.
c. Drop – used to delete databases and database tables.

DML (Data manipulation language) – Data manipulation language provides commands for manipulating data
in databases.
Common DML commands are
a. Select – This command is used to display information from the database.
b. Insert – This command is used to insert new records in the database.
c. Delete – This command is used to delete records from the database.
d. Update – This command is used to modify records in the database.
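
A brief sketch of these commands in use (the employee table and its columns are illustrative):

-- DDL: create and modify the structure
CREATE TABLE employee (
    emp_id INT PRIMARY KEY,
    name   VARCHAR(50),
    salary DECIMAL(10, 2)
);
ALTER TABLE employee ADD department VARCHAR(30);

-- DML: manipulate the data inside that structure
INSERT INTO employee (emp_id, name, salary) VALUES (1, 'Avi', 50000);
SELECT name, salary FROM employee WHERE salary > 40000;
UPDATE employee SET salary = 55000 WHERE emp_id = 1;
DELETE FROM employee WHERE emp_id = 1;

-- DDL: remove the table entirely
DROP TABLE employee;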

4. Difference between entity, entity set and entity type

The ER model is a very important concept in DBMS, used for modeling the logical view of the system from a data perspective. Entity, Entity Type, and Entity Set are all very important concepts of the ER model. Below, we will understand the difference between them.
1. Entity : An entity is a thing in the real world with an independent existence. An entity can exist independently, is distinguishable from other objects, and can be identified uniquely.
An entity can be of two types :
 Tangible Entity : Entities that exist in the real world physically. Example: Person, car, etc.
 Intangible Entity : Entities that exist only logically and have no physical existence. Example: Bank
Account, etc.
Example :
 A student with a particular roll number is an entity.
 A company with a particular registration number is an entity.
Note :
 An entity may be concrete like a student, a book, or abstract like a holiday or a particular concept.
 An entity is represented by a set of attributes.
 In a particular relation in RDBMS, a particular record is called an entity.
2. Entity Type : It refers to the category that a particular entity belongs to.
Example :
 A table named student in a university database.
 A table named employee in a company database.
Note :
 The category of a particular entity in the relation in RDBMS is called the entity type.
 It is represented by the name of the table and its schema.
3. Entity Set : An entity set is a collection or set of all entities of a particular entity type at any point in time.
The type of all the entities should be the same.

Example :
 The collection of all the students from the student table at a particular instant of time is an example of an
entity set.
 The collection of all the employees from the employee table at a particular instant of time is an example
of an entity set.
Note :
 Entity sets need not be disjoint. For example, the entity set of Article Writer (all content creators for
Website A) and the entity set of Article Reader (all people who read the article of Website A) may have
members in common.
 The collection of all the entities in the relation of RDBMS is called an entity set.

Relation With Table :


Consider a table student as follows :
Table Name : Student
Student_ID Student_Name Student_Age Student_Gender

1 Avi 19 M

2 Ayush 23 M

3 Nikhil 21 M

4 Riya 16 F

Entity : Each row is an entity.


Example :

1 Avi 19 M

Entity Type : Each entity belongs to the student type. Hence, the type of entity here is a student.
Entity Set : The complete data set of all entities is called entity set. For the above table, the records with
student id 1, 2, 3, 4 are the entity set.

Difference Table :

Entity: A thing in the real world with independent existence. Any particular row (a record) in a relation (table) is known as an entity.
Entity Type: A category of a particular entity. The name of a relation (table) in RDBMS is an entity type.
Entity Set: The set of all entities of a particular entity type. All the rows of a relation (table) in RDBMS form an entity set.

5. Why tuples in a relation are not ordered?


A relation is defined as a set of tuples. Mathematically, the elements of a set have no order among them; hence, tuples in a relation do not have any particular order. In other words, a relation is not sensitive to the ordering of tuples. Tuple ordering is not part of the relation definition, because a relation attempts to represent facts at a logical or abstract level.
For example, tuples in the STUDENT relation could be logically ordered by name, roll number, address, age, or some other attribute.
6. Explain Except Operator

This operator is also called the MINUS operator. The EXCEPT operation finds the difference between two queries; the result comprises the rows that belong only to the first query.

In SQL, EXCEPT returns those tuples that are returned by the first SELECT operation and not returned by the second SELECT operation.
Example:
Select all employee names in the ‘Sales’ department who are not involved in a project called ‘Product X’.
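
One possible formulation, sketched under assumed illustrative tables EMPLOYEE(emp_id, name, department), WORKS_ON(emp_id, project_id) and PROJECT(project_id, pname):

-- Names of Sales employees...
SELECT E.name
FROM EMPLOYEE E
WHERE E.department = 'Sales'
EXCEPT
-- ...minus the names of those who work on 'Product X'
SELECT E.name
FROM EMPLOYEE E
JOIN WORKS_ON W ON W.emp_id = E.emp_id
JOIN PROJECT P ON P.project_id = W.project_id
WHERE P.pname = 'Product X';

(Oracle spells this operator MINUS instead of EXCEPT.)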

7. List different types of join operations:


A SQL join operation combines rows from two or more tables based on a related column between them.
There are different types of join operations in SQL:

 Inner join or join

 Left join or left outer join

 Right join or right outer join

 Full outer join


INNER JOIN (or simply JOIN): It returns the records that have matching values in both tables.
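
A sketch of each join type, assuming two illustrative tables Customers(id, name) and Orders(id, customer_id, amount):

-- INNER JOIN: only customers that have matching orders
SELECT c.name, o.amount
FROM Customers c
INNER JOIN Orders o ON o.customer_id = c.id;

-- LEFT JOIN: all customers, with order data where it exists (NULLs otherwise)
SELECT c.name, o.amount
FROM Customers c
LEFT JOIN Orders o ON o.customer_id = c.id;

-- RIGHT JOIN: all orders, with customer data where it exists
SELECT c.name, o.amount
FROM Customers c
RIGHT JOIN Orders o ON o.customer_id = c.id;

-- FULL OUTER JOIN: all rows from both sides, matched where possible
SELECT c.name, o.amount
FROM Customers c
FULL OUTER JOIN Orders o ON o.customer_id = c.id;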
8. Syntax of aggregate functions:
A SQL aggregate function performs a calculation on a set of values and returns a single value. For example, the average function (AVG) takes a list of values and returns their average.

The following are the commonly used SQL aggregate functions:

 AVG() – returns the average of a set.


 COUNT() – returns the number of items in a set.
 MAX() – returns the maximum value in a set.
 MIN() – returns the minimum value in a set.
 SUM() – returns the sum of all or distinct values in a set.

AVG

The AVG() function returns the average values in a set. The following illustrates the syntax of
the AVG() function:
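
AVG([ALL | DISTINCT] expression)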

The ALL keyword instructs the AVG() function to calculate the average of all values while
the DISTINCT keyword forces the function to operate on distinct values only. By default, the ALL option is
used.

The following example shows how to use the AVG() function to calculate the average salary of each
department:
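
A sketch, assuming an illustrative employees table with department_id and salary columns:

SELECT department_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id;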

MIN

The MIN() function returns the minimum value of a set. The following illustrates the syntax of
the MIN() function:
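
MIN(expression)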
For example, the following statement returns the minimum salary of the employees in each department:
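
Again using the illustrative employees table:

SELECT department_id, MIN(salary) AS min_salary
FROM employees
GROUP BY department_id;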

MAX

The MAX() function returns the maximum value of a set. The MAX() function has the following syntax:
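
MAX(expression)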
COUNT

The COUNT() function returns the number of items in a set. The following shows the syntax of
the COUNT() function:
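
COUNT([ALL | DISTINCT] expression)   -- or COUNT(*) to count all rows in the set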
SUM
The SUM() function returns the sum of all values. The following illustrates the syntax of the SUM() function:
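
SUM([ALL | DISTINCT] expression)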

9. Why should we avoid null values in a relation?


Null values in a relation should be avoided as much as possible because they can cause problems with data integrity and consistency. NULLs represent missing or unknown values, which can lead to ambiguity and confusion when querying the database. They break the familiar two-valued (True or False) logic and require a three-valued logic.
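
A small illustration of the three-valued logic problem (the student table and phone column are illustrative):

-- Suppose one row has phone = NULL.
SELECT name FROM student WHERE phone = '123';    -- NULL = '123' is UNKNOWN, so the row is filtered out
SELECT name FROM student WHERE phone <> '123';   -- NULL <> '123' is also UNKNOWN, so the row is still filtered out
SELECT name FROM student WHERE phone IS NULL;    -- IS NULL is the only predicate that matches the missing value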

10. What is clustering index?


Databases are a crucial component of modern computing, providing a structured way to store, manage, and
retrieve vast amounts of data. As the size of databases increases, it becomes increasingly important to have
an efficient indexing mechanism that can quickly search and retrieve data. Clustering indexing is one such
mechanism that has been designed to meet this need.
What is Clustering Indexing?
Clustering indexing is a database indexing technique that is used to physically arrange the data in a table based on the values of the clustered index key. This means that the rows in the table are stored on disk in the same order as the clustered index key. With a clustered index, the database can retrieve data more efficiently because it doesn’t have to scan the entire table to find the data it needs. Instead, it can use the clustered index to quickly locate the data, resulting in faster query execution times and improved overall performance.
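
For instance, in SQL Server a clustered index can be created explicitly (the table and column names are illustrative):

CREATE CLUSTERED INDEX ix_employee_emp_id
ON employee (emp_id);

-- In MySQL (InnoDB), the PRIMARY KEY serves as the clustered index by default.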
Advantages
 Improved Query Performance: Clustering indexing results in faster query performance, as the data is
stored in a way that makes it easier to retrieve the desired information. This is because the index is built
based on the clustered data, reducing the number of disk I/Os required to retrieve the data.
 Reduced Disk Space Usage: Clustering indexing reduces the amount of disk space required to store the
index. This is because the index contains only the information necessary to retrieve the data, rather than
storing a copy of the data itself.
 Better Handling of Complex Queries: Clustering indexing provides better performance for complex
queries that involve multiple columns. This is because the data is stored in a way that makes it easier to
retrieve the relevant information.
 Efficient Sequential Inserts: clustering indexing can give good insert performance when new records arrive in clustered-key order, since they can be appended without reorganizing the existing pages.
 Improved Data Retrieval: Clustering indexing can also improve the efficiency of data retrieval operations.
In a clustered index, the data is stored in a logical order, which makes it easier to locate and retrieve the
data. This can result in faster data retrieval times, particularly for large databases.
Disadvantages
 Increased Complexity: Clustering indexing is a more complex technology compared to other indexing
mechanisms, such as B-Tree indexing.
 Reduced Update Performance: Clustering indexing can result in reduced update performance, as the
database must reorganize the data to reflect the changes.
 Limited to One Clustered Index: A table can have only one clustered index, as having multiple clustered
indexes would result in conflicting physical orderings of the data.

11. What is rollback?
A rollback is the operation of restoring a database to a previous state by canceling a specific transaction or
transaction set. Rollbacks are either performed automatically by database systems or manually by users.
When a database user changes a data field but has not yet saved the change, the data is stored in a
temporary state or transaction log. Users querying the unsaved data see the unchanged values. The action of
saving the data is a commit; this allows subsequent queries for this data to show the new values.

However, a user may decide not to save the data. Under this condition, a rollback command manipulates the
data to discard any changes made by the user, and does so without communicating this to the user. Thus, a
rollback occurs when a user begins changing data, realizes the wrong record is being updated and then
cancels the operation to undo any pending changes.

Rollbacks also may be issued automatically after a server or database crash, e.g. after a sudden power loss.
When the database restarts, all logged transactions are reviewed; then all pending transactions are rolled
back, allowing users to reenter and save appropriate changes.
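
A minimal sketch of a manual rollback (the accounts table is illustrative; the exact transaction-start keyword varies by DBMS):

START TRANSACTION;                 -- BEGIN in some systems
UPDATE accounts
SET balance = balance - 100
WHERE account_id = 42;
-- The wrong record was updated; discard the pending change:
ROLLBACK;
-- Had the change been correct, COMMIT; would have saved it instead.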

12. Granting or Revoking a privilege:


Privileges: Privileges define the access rights provided to a user on a database object. DCL commands are used to enforce database security in a multi-user database environment. The two types of DCL commands are GRANT and REVOKE. Only database administrators or owners of a database object can grant or revoke privileges on it.
SQL GRANT Command
SQL GRANT is a command used to provide access or privileges on the database objects to the users.
The Syntax for the GRANT command is:
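
GRANT privilege_name
ON object_name
TO {user_name | PUBLIC | role_name}
[WITH GRANT OPTION];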

 privilege_name is the access right or privilege granted to the user. Some of the access rights are ALL,
EXECUTE, and SELECT.
 object_name is the name of a database object like TABLE, VIEW, STORED PROC and SEQUENCE.
 user_name is the name of the user to whom an access right is being granted.
 PUBLIC is used to grant access rights to all users.
 ROLES are a set of privileges grouped together.
 WITH GRANT OPTION - allows a user to grant access rights to other users.
For example: GRANT SELECT ON employee TO user1; grants SELECT permission on the employee table to user1. You should use the WITH GRANT option carefully: if you GRANT SELECT privilege on the employee table to user1 using the WITH GRANT option, then user1 can GRANT SELECT privilege on the employee table to another user, such as user2. Later, if you REVOKE the SELECT privilege on employee from user1, user2 will still have SELECT privilege on the employee table.
SQL REVOKE Command:
The REVOKE command removes user access rights or privileges to the database objects.
The Syntax for the REVOKE command is:
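
REVOKE privilege_name
ON object_name
FROM {user_name | PUBLIC | role_name};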

For example: REVOKE SELECT ON employee FROM user1; revokes the SELECT privilege on the employee table from user1. When you REVOKE the SELECT privilege on a table from a user, the user will no longer be able to SELECT data from that table. However, if the user has received SELECT privileges on that table from more than one user, he/she can SELECT from that table until everyone who granted the permission revokes it. You cannot REVOKE privileges if they were not initially granted by you.

13. Data independence:


Data independence is a property of a DBMS that lets you change the database schema at one level of the database system without having to change the schema at the next higher level. Data independence helps you to keep data separated from all the programs that make use of it.

You can use this stored data for computing and presentation. In many systems, data independence is an
essential function for components of the system.

The database has 3 levels:

1. Physical/Internal
2. Conceptual
3. External

Types of Data Independence


In DBMS there are two types of data independence

1. Physical data independence


2. Logical data independence.

Physical Data Independence


Physical data independence helps you to separate the conceptual level from the internal/physical level. It allows you to give a logical description of the database without the need to specify physical structures. Compared to logical independence, physical data independence is easy to achieve.

With physical independence, you can easily change the physical storage structures or devices without affecting the conceptual schema. Any change made is absorbed by the mapping between the conceptual and internal levels. Physical data independence is achieved by the presence of the internal level of the database and the mapping from the conceptual level of the database to the internal level.

Examples of changes under Physical Data Independence


Due to physical independence, none of the changes below will affect the conceptual layer.

 Using a new storage device like Hard Drive or Magnetic Tapes


 Modifying the file organization technique in the Database
 Switching to different data structures.
 Changing the access method.
 Modifying indexes.
 Changes to compression techniques or hashing algorithms.
 Change of Location of Database from say C drive to D Drive

Logical Data Independence


Logical data independence is the ability to change the conceptual schema without changing

1. External views
2. External APIs or programs

Any change made will be absorbed by the mapping between external and conceptual levels.

When compared to Physical Data independence, it is challenging to achieve logical data independence.

Examples of changes under Logical Data Independence


Due to logical independence, none of the changes below will affect the external layer.

1. Add/Modify/Delete a new attribute, entity or relationship is possible without a rewrite of existing


application programs
2. Merging two records into one
3. Breaking an existing record into two or more records

Difference between Physical and Logical Data Independence


 Logical data independence is mainly concerned with the structure or the changing of data definitions; physical data independence is mainly concerned with the storage of the data.
 Logical data independence is difficult to achieve, because retrieval of data depends heavily on the logical structure of the data; physical data independence is comparatively easy to achieve.
 If new fields are added to or deleted from the database, you need to make changes in the application program; a change at the physical level usually does not require any change at the application program level.
 Modification at the logical level is significant whenever the logical structure of the database is changed; modifications made at the internal level may or may not be needed to improve the performance of the structure.
 Logical data independence is concerned with the conceptual schema; physical data independence is concerned with the internal schema.
 Example of a logical change: adding/modifying/deleting an attribute. Example of a physical change: a change in compression techniques, hashing algorithms, storage devices, etc.

Importance of Data Independence

 Helps you to improve the quality of the data.
 Database system maintenance becomes affordable.
 Enforcement of standards and improvement in database security.
 You don’t need to alter data structures in application programs.
 Permits developers to focus on the general structure of the database rather than worrying about the internal implementation.
 It helps you keep the data in a consistent, undamaged state.
 Database inconsistency is vastly reduced.
 Modifications at the physical level can easily be made when needed to improve the performance of the system.

Summary

 Data independence is the property of a DBMS that helps you to change the database schema at one level of a database system without having to change the schema at the next higher level.
 Two levels of data independence are 1) Physical and 2) Logical
 Physical data independence helps you to separate conceptual levels from the internal/physical levels
 Logical data independence is the ability to change the conceptual schema without changing external views or application programs.
 When compared to Physical Data independence, it is challenging to achieve logical data independence
 Data Independence Helps you to improve the quality of the data

14. Uses of Query Evaluation Engine:

When a query is parsed, generally two things happen:

 All the equivalent action plans are generated.
 For each generated equivalent action plan, a query order tree is created.
The query order tree is then evaluated.

Phases of Query Evaluation:

 Parser/Translator
 Optimizer
 Execution Plan
 Query Execution Engine
1. Parser/Translator: When we submit any query to run, the task of the parser is to check the syntax and verify the query. It checks whether the query syntax is correct or not, and it also checks the query semantics. A SQL command is a high-level language, so we need to convert it to a low-level language: the translator converts the SQL query into a relational algebra expression, i.e. from a high-level language to a low-level language.
2. Optimizer: When we submit any query, it can be converted into various equivalent relational algebra expressions. The optimizer selects the expression (plan) with the lowest cost; for this, the optimizer uses statistics about the data.
3. Execution Plan: In this phase, the database decides the order of execution of given query instructions. For
example, consider the following query:
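
(an illustrative query; the student table is assumed)

SELECT name
FROM student
WHERE age > 18;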

Here database will first execute From statement then Where and finally Select statement. So the order of
execution for this query is as follows:
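
1. FROM student
2. WHERE age > 18
3. SELECT name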

4. Query Execution Engine: The query execution engine takes a physical query plan, that is execution plan,
executes the plan, and returns the result. Examples of query engines are Apache Spark, Apache Drill, Presto,
etc. So basically the query execution engine interprets the SQL command, accesses data from the database, and finally returns the answer to the given SQL query. There are many functions performed by the query execution engine, some of them being:
 It acts as a dispatcher for all commands in the execution plan. It iterates through all the commands in the
plan until the batch is complete.
 And it interacts with the storage engine to retrieve and update data from tables and indexes.

15. Structural constraints:

Structural Constraints are also called Structural properties of a database management system (DBMS).
Cardinality ratios and participation constraints taken together are called structural constraints. The name “constraints” refers to the fact that such limitations must be imposed on the data for the DBMS to be consistent with the requirements.
The structural constraints are represented by min-max notation. This is a pair of numbers (m, n) that appears on the connecting line between an entity and its relationship. The minimum number of times the entity can appear in the relationship is represented by m, whereas the maximum number of times it can appear is denoted by n. If m is 0, the entity participates in the relationship only partially; if m is greater than or equal to 1, it denotes total participation of the entity. For example, if every employee must work for exactly one department, the pair on the Employee side of the WORKS_FOR relationship is (1, 1). Note: the number of times an entity participates in a relationship is the same as the number of appearances of the entity in the relationship’s tuples.

16.Implicit and Explicit constraints:

Database Constraints can be categorized into 3 main categories:


1. Constraints that are inherent in the data model itself are called Implicit Constraints. They are restrictions we impose on the relational database.
2. Constraints that are directly applied in the schemas of the data model, by specifying them in the DDL (Data Definition Language), are called Schema-Based Constraints or Explicit Constraints.
3. Constraints that cannot be directly applied in the schemas of the data model. We call these Application
based or Semantic Constraints.

17. Different forms of SELECT command

SELECT COMMAND: It is used for viewing or retrieving data from a table. We can use logical operators as well as relational operators within the SELECT command.
The FROM clause is also mandatory. It identifies one or more tables and/or views from which to retrieve the
column data displayed in a result table.
The WHERE clause is optional and carries out the relational select operation. It specifies which rows are to be
selected.
The GROUP BY clause is optional. It organizes data into groups by one or more column names listed in the
SELECT clause.
The optional HAVING clause sets conditions regarding which groups to include in a result table. The groups are
specified by the GROUP BY clause.
The ORDER BY clause is optional. It sorts query results by one or more columns in ascending or descending
order.
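
A sketch putting all of these clauses together (the employee table is illustrative):

SELECT department, AVG(salary) AS avg_salary   -- mandatory: projection
FROM employee                                  -- mandatory: source table(s)
WHERE status = 'ACTIVE'                        -- optional: row selection
GROUP BY department                            -- optional: grouping
HAVING AVG(salary) > 40000                     -- optional: filter on groups
ORDER BY avg_salary DESC;                      -- optional: sorting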

18. Purpose of Order By Clause with example:


The ORDER BY clause in SQL is used to sort the result set in ascending or descending order.
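
For example (a reconstruction of the statement the next sentence describes):

SELECT *
FROM Customers
ORDER BY country;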

Here, the SQL command selects all rows from the Customers table and then sorts them in ascending order
by country.

SQL ORDER BY Syntax

The syntax of the SQL ORDER BY statement is:

SELECT column1, column2, ...


FROM table
ORDER BY columnA, columnB, ...;

Here,

 column1, column2, ... are the columns to be included in the result set

 table is the name of the table from where the rows are selected

 columnA, columnB, ... are the column(s) based on which the rows will be ordered


ORDER BY ASC (Ascending Order)

We can use the ASC keyword to explicitly sort selected records in ascending order. For example,
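
SELECT *
FROM Customers
ORDER BY age ASC;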

Here, the SQL command selects all the rows from the Customers table and then sorts them in ascending order by age.
Note: The ORDER BY clause sorts the result set in ascending order by default, even without the ASC keyword.

ORDER BY DESC (Descending Order)

We use the DESC keyword to sort the selected records in descending order. For example,
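
SELECT *
FROM Customers
ORDER BY age DESC;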
Here, the SQL command selects all the customers and then sorts them in descending order by age.
ORDER BY With Multiple Columns

We can also use ORDER BY with multiple columns. For example,
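
SELECT *
FROM Customers
ORDER BY first_name, age;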

Here, the SQL command selects all the records and then sorts them by first_name. If the first_name repeats
more than once, it sorts those records by age.
ORDER BY With WHERE

We can also use ORDER BY with the SELECT WHERE clause. For example,
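
SELECT last_name, age
FROM Customers
WHERE country <> 'UK'
ORDER BY last_name DESC;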

Here,

 The SQL command first selects the last_name and age fields from the Customers table if their country is

not UK.
 Then, the selected records are sorted in descending order by their last_name.
19. General anomalies and functional dependency in a relation

A functional dependency (FD) is a relationship between two attributes, typically between the PK and other
non-key attributes within a table. For any relation R, attribute Y is functionally dependent on attribute X (usually
the PK), if for every valid instance of X, that value of X uniquely determines the value of Y. This relationship is
indicated by the representation below:

X → Y

The left side of the above FD diagram is called the determinant, and the right side is the dependent. Here are
a few examples.

In the first example, below, SIN determines Name, Address and Birthdate. Given SIN, we can determine any
of the other attributes within the table.

SIN → Name, Address, Birthdate

For the second example, SIN and Course determine the date completed (DateCompleted). This must also
work for a composite PK.
SIN, Course → DateCompleted

The third example indicates that ISBN determines Title.


ISBN → Title
Anomalies in the relational model refer to inconsistencies or errors that can arise when working with relational
databases, specifically in the context of data insertion, deletion, and modification. There are different types of
anomalies that can occur in referencing and referenced relations which can be discussed as:
These anomalies can be categorized into three types:
1. Insertion Anomalies
2. Deletion Anomalies
3. Update Anomalies.
How Are Anomalies Caused in DBMS?
Database anomalies are the faults in the database caused due to poor management of storing everything in
the flat database. It can be removed with the process of Normalization, which generally splits the database
which results in reducing the anomalies in the database.

STUDENT Table
STUD_NO STUD_NAME STUD_PHONE STUD_STATE STUD_COUNTRY STUD_AGE

1 RAM 9716271721 Haryana India 20

2 RAM 9898291281 Punjab India 19

3 SUJIT 7898291981 Rajasthan India 18

4 SURESH Punjab India 21

Table 1

STUDENT_COURSE
STUD_NO COURSE_NO COURSE_NAME

1 C1 DBMS

2 C2 Computer Networks

1 C2 Computer Networks

Table 2
Insertion anomaly: If a tuple is inserted into the referencing relation and the referencing attribute value is not present in the referenced attribute, the insertion will not be allowed.
Example: If we try to insert a record into STUDENT_COURSE with STUD_NO = 7, it will not be allowed.
Deletion and updation anomaly: If a tuple is deleted or updated in the referenced relation and the referenced attribute value is used by a referencing attribute in the referencing relation, deleting the tuple from the referenced relation will not be allowed.
Example: If we want to update the record with STUD_NO = 1, we have to update both rows of STUDENT_COURSE. If we try to delete the record from STUDENT with STUD_NO = 1, it will not be allowed, because STUD_NO = 1 is referenced in STUDENT_COURSE.
To avoid this, the following clauses can be used in the table definition (see the declaration sketch after this list):
 ON DELETE/UPDATE SET NULL: If a tuple is deleted or updated from referenced relation and the
referenced attribute value is used by referencing attribute in referencing relation, it will delete/update the
tuple from referenced relation and set the value of referencing attribute to NULL.
 ON DELETE/UPDATE CASCADE: If a tuple is deleted or updated from referenced relation and the
referenced attribute value is used by referencing attribute in referencing relation, it will delete/update the
tuple from referenced relation and referencing relation as well.
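
A sketch of how these referential actions are declared, reusing the STUDENT and STUDENT_COURSE tables above (column types are illustrative):

CREATE TABLE STUDENT (
    STUD_NO   INT PRIMARY KEY,
    STUD_NAME VARCHAR(50)
);

CREATE TABLE STUDENT_COURSE (
    STUD_NO     INT,
    COURSE_NO   VARCHAR(10),
    COURSE_NAME VARCHAR(50),
    FOREIGN KEY (STUD_NO) REFERENCES STUDENT (STUD_NO)
        ON DELETE CASCADE    -- deleting a student also deletes their course rows
        ON UPDATE CASCADE    -- updating a STUD_NO propagates to the course rows
);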

How these Anomalies Occur

 Insertion Anomalies: These anomalies occur when it is not possible to insert data into a database
because the required fields are missing or because the data is incomplete. For example, if a database
requires that every record has a primary key, but no value is provided for a particular record, it cannot be
inserted into the database.
 Deletion anomalies: These anomalies occur when deleting a record from a database and can result in
the unintentional loss of data. For example, if a database contains information about customers and orders,
deleting a customer record may also delete all the orders associated with that customer.
 Update anomalies: These anomalies occur when modifying data in a database and can result in
inconsistencies or errors. For example, if a database contains information about employees and their
salaries, updating an employee’s salary in one record but not in all related records could lead to incorrect
calculations and reporting.
Removal of Anomalies
These anomalies can be avoided or minimized by designing databases that adhere to the principles of
normalization. Normalization involves organizing data into tables and applying rules to ensure data is stored
in a consistent and efficient manner. By reducing data redundancy and ensuring data integrity, normalization
helps to eliminate anomalies and improve the overall quality of the database
According to E. F. Codd, the inventor of the relational database, the goals of normalization include:
 it helps in removing all the repeated data from the database.
 it helps in removing undesirable deletion, insertion, and update anomalies.
 it helps in making a proper and useful relationship between tables.

FAQs
1. What is Normalization?
Normalization is the process of splitting the tables into smaller ones so as to remove anomalies in the database. It helps in reducing redundancy in the database.
2. What are Anomalies in the Relational Model?
An anomaly is a fault that is present in the database which occurs because of poor maintenance and poor storing of the data in a flat database. Normalization is the process of removing anomalies from the database.
3. How can Anomalies be removed?
Anomalies can be removed with the process of Normalization. Normalization involves organizing data into tables and applying rules to ensure data is stored in a consistent and efficient manner.

Advantages of the relational model:

Data Integrity: Relational databases enforce data integrity through various constraints such as primary keys, foreign keys, and referential integrity rules, ensuring that the data is accurate and consistent.
Scalability: Relational databases are highly scalable and can handle large amounts of data without sacrificing performance.
Flexibility: The relational model allows for flexible querying of data, making it easier to retrieve specific information and generate reports.
Security: Relational databases provide robust security features to protect data from unauthorized access.

Disadvantages of the relational model:

Redundancy: The relational model can result in data redundancy, where the same data is stored in multiple places, leading to inefficiency and potential data inconsistencies.
Complexity: Creating and maintaining a relational database can be complex and time-consuming, requiring specialized knowledge and skills.
Performance: As the size of the database grows, performance can suffer due to the need to join multiple tables to retrieve information.
Inability to handle unstructured data: The relational model is not well-suited for handling unstructured or semi-structured data, such as images, videos, and text documents.

20. Explain 2NF with examples


First Normal Form (1NF) does not eliminate redundancy; rather, it eliminates repeating groups. Instead of having multiple columns of the same kind of data in a record (0NF or unnormalized form), you remove the repeated information into a separate relation and represent it as rows. This is what constitutes 1NF.
Second Normal Form (2NF) is based on the concept of full functional dependency. 2NF applies to relations with composite keys, that is, relations with a primary key composed of two or more attributes; a relation with a single-attribute primary key is automatically in at least 2NF. A relation that is not in 2NF may suffer from update anomalies.
To be in 2NF, a relation must be in 1NF and must not contain any partial dependency, i.e. no non-prime attribute (an attribute that is not part of any candidate key) may depend on a proper subset of any candidate key of the table. Equivalently, a relation is in 2NF when it is in 1NF and every non-primary-key attribute is fully functionally dependent on the primary key.
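
As an example, consider the STUDENT_COURSE relation from question 19, whose candidate key is the composite (STUD_NO, COURSE_NO). COURSE_NAME depends only on COURSE_NO, a proper subset of the key, so the relation is not in 2NF. A sketch of the 2NF decomposition (column types are illustrative):

-- Not in 2NF: COURSE_NAME is partially dependent on the key
-- STUDENT_COURSE(STUD_NO, COURSE_NO, COURSE_NAME)

-- 2NF decomposition: move the partially dependent attribute into its own relation
CREATE TABLE ENROLLMENT (
    STUD_NO   INT,
    COURSE_NO VARCHAR(10),
    PRIMARY KEY (STUD_NO, COURSE_NO)
);

CREATE TABLE COURSE (
    COURSE_NO   VARCHAR(10) PRIMARY KEY,
    COURSE_NAME VARCHAR(50)
);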
21. Control Measures to provide security of data in a database
Database security assures the security of databases against threats. It is concerned with information security
control that involves data protection, database applications or stored functions protection, database systems
protection, database servers, and the associated network link protection. Recently, security threats in mobile
databases have become popular and we need to develop a solution to avoid those threats. A mobile database
is a specialized class of distributed systems. Due to hardware constraints and its distributed nature, security
challenges in mobile development have been raised. Security should be assured in its operating system,
database and network. Traditional database security cannot deal with malicious attacks by persons with legitimate identities and is not cost-effective for users having different security requirements. The multilayer security model
with user, OS, DBMS and transaction level intrusion tolerance integrates redundancy and various technology
by adopting integral security strategy and service-oriented intrusion tolerance technology. Several techniques,
such as encryption and electronic signatures, protect data transmissions across websites. For data protection
enforcement of access control policies based on data contents, subject qualifications and characteristics, and
other relevant contextual information, such as time mechanisms, are used. The semantics of data is
considered to specify active access control policies.
The network database system provides an open environment for information storage and management with
massive data. A significant loss will occur if data loss, illegal tampering or code loss happens within the network database. The central database security risks are unauthorized or unintended activity or misuse
by authorized database users, database administrators, or network or system managers, or by unauthorized
users or hackers, inappropriate access to sensitive data, metadata or functions within databases, or
inappropriate changes to the database programs, structures or security configurations. Also, malware
infections causing incidents such as unauthorized access, leakage or disclosure of personal or proprietary
data, deletion of or damage to the data or programs, interruption or denial of authorized access to the
database, attacks on other systems and the unanticipated failure of database services may occur in the
database. The performance overload constraints and capacity issues result in the inability of authorized users
to use databases as intended. The physical damage to database servers caused by computer room fires or
floods, overheating, lightning, accidental liquid spills, static discharge, electronic breakdowns or equipment
failures and obsolescence also contribute as risks. The design flaws and programming bugs in databases and
the associated programs and systems create various security vulnerabilities for unauthorized privilege
escalation, data loss or corruption, and performance degradation that may occur in databases. Also, data
corruption and loss caused by the entry of invalid data or commands, mistakes in database or system
administration processes, and criminal damage pose security problems in databases.

Database Security Issues: Case Studies

Issues depend on Security types and database threats.


Security Types include:
1. Legal and ethical issues regarding the right to access certain information. Some information may be private
and cannot be accessed by unauthorized persons.
2. Policy issues at the governmental, institutional, or corporate level as to what kinds of information should not
be made publicly available, for example, credit ratings.
3. System-related issues include system levels at which various security functions should be enforced, for
example, whether a security function should be handled at the physical hardware, operating system, and
DBMS levels.
4. The need in some organizations to identify multiple security levels and to categorize the data and users
based on these classifications, for example, top secret, secret, confidential, and unclassified. The
organisation’s security policy concerning permitting access to various data types must be enforced.

Database Threats include:


1. Loss of integrity – Database integrity refers to the requirement that information be protected from improper modification, including creation, insertion, modification, changing of data status, and deletion. Integrity is lost if unauthorized changes are made to the data by either intentional or accidental acts. If the loss of system or data integrity is not corrected, continued use of the contaminated system or corrupted data will result in inaccuracy, fraud or erroneous decisions.
2. Loss of availability – Database availability refers to making objects available to a human user or a program
to which they have a legitimate right.
3. Loss of confidentiality – Data confidentiality refers to the protection of data from unauthorized disclosure. The impact of disclosure of confidential information can range up to violation of data privacy laws. Unauthorized, unanticipated or unintentional disclosure could result in a loss of public confidence or legal action against the organization.

Database Security Control Measures


Four main control measures are used to provide data security in databases. They are :
1. Access control – The security mechanism of a DBMS must include provisions for restricting access to the
database as a whole. This function is called access control and is handled by creating user accounts and
passwords to control the DBMS’s log-in process.
2. Inference control – Statistical databases provide statistical information or summaries of values based on
various criteria. Security for statistical databases must ensure that information about individuals cannot be
accessed. It is possible to deduce or infer specific facts concerning individuals from queries that involve only
summary statistics on groups. Consequently, this must not be permitted either. This problem is called
statistical database security, and corresponding control measures are called
inference control measures.
3. Flow control – It prevents information from flowing in ways that let it reach unauthorized users. Covert channels allow information to flow implicitly in ways that violate an organisation’s security policy.
4. Data encryption protects sensitive data transmitted via some communication network. Encryption can
provide additional protection for sensitive portions of a database. The data is encoded using some coding
algorithm. An unauthorized user who accesses encoded data will have difficulty deciphering it, but authorized
users are given decoding or decryption algorithms to interpret the data. Encryption techniques that are very difficult to decode without a key have been developed for military applications.

Here are a few additional ideas for database security control measures.

 Implementing multi-factor authentication for database access


 Regularly updating and patching the database software
 Conducting regular security audits and vulnerability assessments
 Limiting user access to the database to only those who require it
 Encrypting sensitive data within the database
 Implementing role-based access control (RBAC) to ensure users only have access to necessary data
 Monitoring database activity for suspicious behaviour
 Protecting the database from SQL injection attacks
 Implementing database backup and recovery procedures
 Creating and enforcing strong password policies for database users

Challenges of Database Security


Due to the vast growth in the speed of threats to databases, research efforts need to be devoted to issues such as:

1. Data quality
The database community needs techniques and organizational solutions to assess and attest to data quality.
These techniques include simple mechanisms such as quality stamps posted on websites. It also requires
techniques to provide more effective integrity semantics verification and tools for assessing data quality based
on techniques such as record linkage. Application-level recovery techniques are also needed for automatically
repairing incorrect data. The Extract Transform Load tools widely used to load data in data warehouses are
grappling with these issues.

2. Intellectual property rights


With the widespread use of the Internet and Intranets, data’s legal and informational aspects are becoming
organisations’ primary concerns. Watermarking techniques for relational data have recently been proposed to
address these concerns. The primary purpose of digital watermarking is to protect content from unauthorized
duplication and
distribution by enabling provable ownership of the content. It has traditionally relied upon the availability of a
large noise domain within which the object can be altered while retaining its essential properties. However,
research is needed to assess the robustness of such techniques and investigate different approaches to
preventing intellectual property rights violations.

3. Database Survivability
Database systems must operate and continue their functions, even with reduced capabilities, despite disruptive events such as information warfare attacks. A DBMS should make every effort to prevent an attack and, in the event one occurs, should be able to do the following:
a. Confinement – Take immediate action to eliminate the attackers’ access to the system and to isolate or
contain the problem to prevent further spread.
b. Damage assessment – Determine the extent of the problem, including failed functions and corrupted data.
c. Reconfiguration – Reconfigure to allow the operation to continue in a degraded mode while recovery
proceeds.
d. Repair – Repair corrupted or lost data and repair or reinstall failed system functions to reestablish a normal level of operation.
e. Fault treatment – To the extent possible, identify the weakness exploited in the attack and take steps to
prevent a recurrence.

The goal of the information warfare attacker is to damage the organization’s operation and the fulfilment of its mission through the disruption of its information systems. The target of an attack may be the system itself or its data.
While the attacks that bring the system down outright are severe and dramatic, they must also be well timed to
achieve the attackers’ goal since attacks will receive immediate and concentrated attention to get the system
back to operational condition, diagnose how the attack took place
and install preventive measures.

Conclusion
Factors such as the evolution of security concerns, the disintermediation of data access, and new computing paradigms and applications, such as grid-based computing and on-demand business, have introduced new security requirements and new contexts in which to apply and possibly extend current approaches to achieving data security.
These information security measures include access control, auditing, authentication, encryption, integrity
controls, backups and application security. The security designs for specific database systems specify security
administration and management functions such as administration and reporting of user access rights, log
management and analysis, database replication or synchronization and backups, along with various business-
driven information security controls within the database programs and functions, for example, data entry
validation and audit trails. Also, various security-related activities, such as manual controls, are generally
incorporated into the procedures and guidelines relating to database design, development, configuration, use,
management and maintenance.
22. Database users and how they interact with the database:
Database users are categorized based upon their interaction with the database. The following are the main types of database users in a DBMS.
1. Database Administrator (DBA) : Database Administrator (DBA) is a person/team who defines the
schema and also controls the 3 levels of the database. The DBA creates a new account ID and password for a user who needs to access the database. The DBA is also responsible for providing security to the database, allowing only authorized users to access/modify the database. The DBA is responsible for problems such as security breaches and poor system response time.
 DBA also monitors the recovery and backup and provide technical support.
 The DBA has a DBA account in the DBMS, which is called a system or superuser account.
 DBA repairs damage caused due to hardware and/or software failures.
 DBA is the one having privileges to perform DCL (Data Control Language) operations such as
GRANT and REVOKE, to allow/restrict a particular user from accessing the database.
2. Naive / Parametric End Users : Parametric end users are unsophisticated users who don’t have any DBMS knowledge but frequently use database applications in their daily life to get the desired results. For example, railway ticket-booking users are naive users. Clerks in any bank are naive users because they don’t have any DBMS knowledge but still use the database to perform their given tasks.
3. System Analyst :
System Analyst is a user who analyzes the requirements of parametric end users. They check whether
all the requirements of end users are satisfied.
4. Sophisticated Users : Sophisticated users can be engineers, scientists, or business analysts who are
familiar with the database. They can develop their own database applications according to their
requirements. They don’t write the program code; instead they interact with the database by writing SQL
queries directly through the query processor.
5. Database Designers : Database designers are the users who design the structure of the database, which
includes tables, indexes, views, triggers, stored procedures and constraints that are usually enforced
before the database is created or populated with data. He/she controls what data must be stored and
how the data items are to be related. It is the responsibility of database designers to understand the
requirements of different user groups and then create a design which satisfies the needs of all the user
groups.
6. Application Programmers : Application programmers, also referred to as system analysts or simply
software engineers, are the back-end programmers who write the code for the application programs.
They are computer professionals. These programs could be written in programming languages such
as Visual Basic, Developer, C, FORTRAN, COBOL etc. Application programmers design, debug, test,
and maintain a set of programs called “canned transactions” for the naive (parametric) users in order to
interact with the database.
7. Casual Users / Temporary Users : Casual users are users who occasionally use/access the
database, but each time they access it they require new information; for example,
a middle- or higher-level manager.
8. Specialized Users : Specialized users are sophisticated users who write
specialized database applications that do not fit into the traditional data-processing
framework. Among these applications are computer-aided design
systems, knowledge-base and expert systems, etc.
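
The following is a minimal sketch of the DCL operations mentioned in the DBA item above; the user name
exam_cell and the Student table are illustrative assumptions, not part of the original question:

-- Allow a user to read and update the Student table.
GRANT SELECT, UPDATE ON Student TO exam_cell;

-- Later, withdraw the update privilege while keeping read access.
REVOKE UPDATE ON Student FROM exam_cell;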

23. Naming and displaying conventions used for ER programs:


Document the Naming Convention in Your ERD

The first best practice for naming conventions in data modeling is to write down all the criteria defining the
adopted naming convention. So that it is always visible and at hand, it should be included as a text annotation
together with the entity-relationship diagram (ERD). If you use a database design tool like Vertabelo,
your naming conventions can be documented in virtual sticky notes
that will always be attached to your ERDs. This and other database modeling tips can save you time
and effort when doing data modeling work.

Use Meaningful Names


Another basic naming convention practice is to use meaningful names. Avoid abstract, cryptic, or coded
combinations of words.
Specify the Language

In software systems that are used and developed in a single country, there’s probably no need to specify the
language. But if a system involves designers or developers of different nationalities (which is becoming
increasingly common), language choice is not trivial. In such situations, the language used to name schema
objects must be clearly specified in the naming convention and must be respected. This way, there will be no
chance of finding names in different languages.

Best Practices for Naming Tables and Columns

Database tables represent real-world entities, so it is appropriate to use nouns when choosing their names.
When considering database table naming conventions, you must make a decision that seems trivial but is
actually crucial: use plural or singular nouns for the names. (In the case of database column naming
conventions, this problem does not arise, as column names are always singular.)

Plural or Singular Table Names

Some say that the singular should be used because tables represent a single entity, not a collection of things.
For example: Client, Item, Order, Invoice, etc. Others prefer to use the plural, considering that the table is
a container for a collection of things. So, with the same criteria as they would label a box for storing toys, they
give the tables plural names: Customers, Items, Orders, Invoices.

Alphabet and Character Sets

While it is possible to use spaces and any printable character to name tables and other database objects, this
practice is strongly discouraged. The use of spaces and special characters requires that object names be
enclosed in delimiters so that they do not invalidate SQL statements.
For example, if you have a table in your schema called Used Cars (with a space between Used and Cars),
you’d have to write the table name between delimiters in any query that uses it.
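
For instance, a query against this table might look like the following (double quotes are the standard SQL
delimiter; SQL Server uses square brackets and MySQL uses backticks):

SELECT * FROM "Used Cars";
-- In SQL Server: SELECT * FROM [Used Cars];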

Case Sensitivity

SQL keywords are case-insensitive, and in most database engines object names are too by default, so the
case of object names is usually irrelevant when writing queries or issuing SQL commands to your database.
However, a good practice for database schema naming is to clearly define case criteria for object names. The
criteria adopted will affect the readability of the schema, its neatness, and the interpretation of its elements.

For example, you can use names with all capital letters for tables and upper/lower case for columns. This will
make it easier to identify objects in database diagrams as well as in SQL statements and other database tools.

Compound Names

The ideal name for any database object should strike the optimal balance between synthesis and self-
explanation. Ideally, each name should convey what the object represents in the real world, in as close to a
single word as possible. This is easy to achieve in some cases, particularly when you create a
conceptual schema with tables that contain information on tangible
elements: Users, Employees, Roles, Payrolls, etc.

But as you get a bit more detailed in your diagrams, you will come across elements that cannot be self-
explanatory with a single word. You will then have to define names for objects that represent, for example,
roles per user, payroll items, ID numbers, joining dates, and many others.

Another good schema naming practice is to adopt clear criteria for the use of compound names. Otherwise,
each designer or programmer will use their own criteria – and your schema will quickly become full of random
names.

There are two popular naming options for using compound names. You can either use camel case
(e.g. PayrollItems or DateOfBirth) or you can separate the words with an underscore
(e.g. PAYROLL_ITEMS or DATE_OF_BIRTH).

Using underscore as a word separator in compound names is the way to go for capitalized names; it ensures
the names can be easily read at a glance.
Abbreviations

Using abbreviations for object names is inadvisable, but so is using names that are too long. Ideally, a middle
ground should be found. For example, I recommend only abbreviating object names if the name exceeds 20
characters.

If you make heavy use of abbreviations because many objects in your schema have long names, the list of
abbreviations to be used should be explicit for all users of your schema. In addition, this list should be part of
the naming convention of your schema. You can add a sticky note to your diagrams where you explicitly detail
all abbreviations used along with their extended meaning.

As a general rule, though, abbreviations and prefixes should be used sparingly; applied inconsistently, they
only cause confusion for everyone who needs to work with the database.

Prefixes and Suffixes

Some people use prefixes or suffixes to denote an element’s type so that it can be easily identified without
referencing the schema description or the database metadata. For example, they add the prefix T_ to all
tables and V_ to all views. Or they add a suffix that denotes the data type of each column.

Using suffixes or prefixes may result in two objects of different types with similar names. For example, you
could easily have a table named T_CUSTOMERS and a view named V_CUSTOMERS. Whoever has to
write a query may not know which of the two should be used and what the difference is between them.

Remember that a view name should indicate its purpose. It would be more helpful if the view name were, for
example, NEW_CUSTOMERS, indicating that it is a subset of the CUSTOMERS table.

Using a suffix indicating the data type of each column does not add useful information. The exception is when
you need to use a counter-intuitive data type for a column. For example, if you need a column to store a date
in integer format, then you could use the int suffix and name the column something like Date_int.

Another common (but discouraged!) practice is to prefix each column name with an abbreviation of the table
name. This adds unnecessary redundancy and makes queries difficult to read and write.
Prefixes for Naming Dependent Objects

A prefix denoting the type of object is considered good practice when naming table- or column-dependent
objects (e.g. indexes, triggers, or constraints). Such objects are not usually represented in database diagrams,
instead being mentioned in schema metadata queries, in logs or execution plans, or in error messages thrown
by the database engine. Some commonly used prefixes are:

 PK for primary key constraints.
 FK for foreign key constraints.
 UK for unique key constraints.
 IX for indexes.
 TG for triggers.

The suggested way to use these prefixes is to concatenate them with the table name and an additional
element denoting the function the constraint performs. In a foreign key constraint, we might indicate the table
at the other end of the constraint; in an index, we might indicate the column names that compose this index. A
foreign key constraint between the Customers table and the Orders table could be
named FK_Customers_Orders .
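
As a sketch of this convention (the column names Customer_Id and Id are illustrative assumptions):

ALTER TABLE Orders
  ADD CONSTRAINT FK_Customers_Orders
  FOREIGN KEY (Customer_Id) REFERENCES Customers (Id);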

This way of naming dependent objects makes it easier to relate the object to the table(s) on which it depends.
This is important when it is mentioned in an execution log or error message.

Since you don’t usually have to write the names of dependent objects (like foreign keys or indexes) in SQL
statements, it’s not really important if they are long or do not meet the same naming criteria as objects like
tables, fields, or views.

Prefixes for Grouping Objects by Functional Areas

Another commonly accepted use for prefixes is to quickly distinguish sets of objects that belong to a functional
or logical area of the schema. For example, in a data warehouse schema prefixes let us distinguish dimension
tables from fact tables. They can also distinguish tables with “cold data” from tables with “hot data”, if this kind
of distinction is a top priority in your schema.

In a schema that is used by different applications, prefixes can help to easily identify the tables that are used
by each application. For example, you can establish that tables starting with INV belong to an invoicing app
and those starting with PAY belong to a payroll app.

As I recommended above for abbreviations, these prefixes need to be made explicit in the database diagram,
either through sticky notes or some other form of documentation. This form of grouping by prefix will make it
easier to manage object permissions according to the application that uses them.

Naming Conventions for Views

It is quite common to create views in a database schema to facilitate the writing of queries by solving common
filtering criteria. In a schema that stores membership information, for example, you could create a view of
approved memberships. This would save database developers the task of finding out what conditions a
membership must meet to be approved.
Naming Views

For the above reason, it is common for view names to consist of the table name plus a qualifier designating
the purpose of that view. Since views named in this way often have compound names, you should use the
criteria you’ve adopted for compound names. In the example above, the view might be
called ApprovedMemberships or APPROVED_MEMBERSHIPS, depending on the criteria chosen for
compound names. In turn, you could create a view of memberships pending approval
called PendingMemberships or PENDING_MEMBERSHIPS.
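
As a sketch, assuming a Memberships table with a Status column (both names are illustrative):

CREATE VIEW ApprovedMemberships AS
  SELECT * FROM Memberships WHERE Status = 'APPROVED';

CREATE VIEW PendingMemberships AS
  SELECT * FROM Memberships WHERE Status = 'PENDING';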

Since views are used as if they were tables, it is good practice that their names follow the same naming
convention as table names – e.g. if you use all uppercase for table names, you should also use all uppercase
for view names.

It is a good practice to name views after their “mother” table (when there is one), adding a qualifier that
designates the purpose of that view.

Making Views Visible

It’s important to make views known. Anyone who uses the database for querying or design work should know
that there are views that can simplify their work.

One way to force users to use views is to restrict access to tables. This ensures that users use the views and
not the tables and that there is no confusion about how to filter the data to get subsets of the tables.

In the case of the membership schema mentioned above, you can restrict access to the Memberships table
and only provide access to the ApprovedMemberships and PendingMemberships views. This ensures that
no one has to define what criteria to use to determine whether a membership is approved or pending.
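
A sketch of this access pattern, assuming a reporting_user account exists:

-- Hide the base table and expose only the views.
REVOKE SELECT ON Memberships FROM reporting_user;
GRANT SELECT ON ApprovedMemberships TO reporting_user;
GRANT SELECT ON PendingMemberships TO reporting_user;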

It is also good practice to include the views in the database diagram and explain their usefulness with sticky
notes. Any user looking at the diagram will also be aware of the existence of the views.
Compliance and Practicality for Database Naming Conventions

Naming convention criteria cannot be enforced by the database engine. This means that compliance must be
overseen by a designer who controls the work of anyone who has permissions to create or modify the
structure of a database. If no one is charged with overseeing naming convention adherence, it is of no use.
While intelligent database design tools such as Vertabelo help ensure that certain naming criteria are met, full
monitoring of the criteria requires a trained human eye.

On the other hand, the best way to enforce the criteria of a naming convention is for those criteria to be useful
and practical. If they are not, users will comply with them reluctantly and drop them as soon as they can. If you
have been given the task of defining a database schema naming convention, it is important that you create it
with the purpose of benefiting the users. And make sure all users are clear about those benefits so they’ll
comply with the convention without protest.

24. DDL commands used in SQL:

DDL Commands in SQL

DDL is an abbreviation of Data Definition Language.

The DDL Commands in Structured Query Language are used to create and modify the schema of the database
and its objects. The syntax of DDL commands is predefined for describing the data. The commands of Data
Definition Language deal with how the data should exist in the database.

Following are the five DDL commands in SQL:

1. CREATE Command
2. DROP Command
3. ALTER Command
4. TRUNCATE Command
5. RENAME Command

CREATE Command

CREATE is a DDL command used to create databases, tables, triggers and other database objects.

Examples of CREATE Command in SQL

Example 1: This example describes how to create a new database using the CREATE DDL command.

Syntax to Create a Database:

CREATE DATABASE Database_Name;

Suppose, you want to create a Books database in the SQL database. To do this, you have to write the following
DDL Command:

CREATE DATABASE Books;

Example 2: This example describes how to create a new table using the CREATE DDL command.
Syntax to create a new table:
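
CREATE TABLE table_name (
  column_name1 data_type,
  column_name2 data_type,
  ...
);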

Suppose, you want to create a Student table with five columns in the SQL database. To do this, you have to
write the following DDL command:
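
One possible version; the five column names and their types are illustrative assumptions:

CREATE TABLE Student (
  Stud_ID INT,
  Name VARCHAR(50),
  DOB DATE,
  City VARCHAR(30),
  Marks INT
);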

Example 3: This example describes how to create a new index using the CREATE DDL command.

Syntax to Create a new index:
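
CREATE INDEX index_name ON table_name (column_name1, column_name2, ...);

For example, to create an index on the Name column of the Student table (an illustrative assumption):

CREATE INDEX idx_student_name ON Student (Name);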


DROP Command

DROP is a DDL command used to delete/remove the database objects from the SQL database. We can easily
remove the entire table, view, or index from the database using this DDL command.

Examples of DROP Command in SQL

Example 1: This example describes how to remove a database from the SQL database.

Syntax to remove a database:
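
DROP DATABASE database_name;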

Suppose, you want to delete the Books database from the SQL database. To do this, you have to write the
following DDL command:
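
DROP DATABASE Books;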

ALTER Command

ALTER is a DDL command which modifies the existing structure of database objects such as tables, and
thereby changes the schema of the database.
We can also add and drop constraints of a table using the ALTER command.

Examples of ALTER Command in SQL

Example 1: This example shows how to add a new field to the existing table.
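
Syntax to add a new column:

ALTER TABLE table_name ADD column_name data_type;

For instance, to add an Email column to the Student table (the column name is an illustrative assumption):

ALTER TABLE Student ADD Email VARCHAR(100);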

RENAME Command

RENAME is a DDL command which is used to change the name of a database table.

Syntax of RENAME command
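
In MySQL, the command takes the following form (other systems instead use ALTER TABLE ... RENAME TO,
or the sp_rename procedure in SQL Server):

RENAME TABLE old_table_name TO new_table_name;

For example, to rename the Student table to Student_Details (an illustrative name):

RENAME TABLE Student TO Student_Details;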


b. SQL statement which creates a STUDENT table consisting of name and mark

CREATE TABLE Student (
  id INT,
  name VARCHAR(50),
  mark INT
);

25. Summarize normal forms based on primary keys and corresponding normalization process:

In database management systems (DBMS), normal forms are a series of guidelines that help to ensure that
the design of a database is efficient, organized, and free from data anomalies. There are several levels of
normalization, each with its own set of guidelines, known as normal forms.

Here are the important points regarding normal forms in DBMS:

1. First Normal Form (1NF): This is the most basic level of normalization. In 1NF, each table cell should
contain only a single value, and each column should have a unique name. The first normal form helps to
eliminate duplicate data and simplify queries.
2. Second Normal Form (2NF): 2NF eliminates redundant data by requiring that each non-key attribute be
dependent on the primary key. This means that each column should be directly related to the primary
key, and not to other columns.
3. Third Normal Form (3NF): 3NF builds on 2NF by requiring that all non-key attributes are independent of
each other. This means that each column should be directly related to the primary key, and not to any
other columns in the same table.
4. Boyce-Codd Normal Form (BCNF): BCNF is a stricter form of 3NF that ensures that each determinant in
a table is a candidate key. In other words, BCNF ensures that each non-key attribute is dependent only
on the candidate key.
5. Fourth Normal Form (4NF): 4NF is a further refinement of BCNF that ensures that a table does not
contain any multi-valued dependencies.
6. Fifth Normal Form (5NF): 5NF is the highest level of normalization and involves decomposing a table
into smaller tables to remove data redundancy and improve data integrity.
Normal forms help to reduce data redundancy, increase data consistency, and improve database
performance. However, higher levels of normalization can lead to more complex database designs and
queries. It is important to strike a balance between normalization and practicality when designing a database.

The advantages of using normal forms in DBMS include:

 Reduced data redundancy: Normalization helps to eliminate duplicate data in tables, reducing the
amount of storage space needed and improving database efficiency.
 Improved data consistency: Normalization ensures that data is stored in a consistent and organized
manner, reducing the risk of data inconsistencies and errors.
 Simplified database design: Normalization provides guidelines for organizing tables and data
relationships, making it easier to design and maintain a database.
 Improved query performance: Normalized tables are typically easier to search and retrieve data from,
resulting in faster query performance.
 Easier database maintenance: Normalization reduces the complexity of a database by breaking it down
into smaller, more manageable tables, making it easier to add, modify, and delete data.
Overall, using normal forms in DBMS helps to improve data quality, increase database efficiency, and
simplify database design and maintenance.

1. First Normal Form –

If a relation contains a composite or multi-valued attribute, it violates first normal form; conversely, a relation
is in first normal form if it does not contain any composite or multi-valued attribute. That is, a relation is in
first normal form if every attribute in that relation is a single-valued attribute.
 Example 1 – Relation STUDENT in table 1 is not in 1NF because of the multi-valued attribute
STUD_PHONE. Its decomposition into 1NF has been shown in table 2.
 Example 2 –

ID Name Courses
------------------
1 A c1, c2
2 E c3
3 M c2, c3
 In the above table Course is a multi-valued attribute so it is not in 1NF. Below Table is in 1NF as there is
no multi-valued attribute
ID Name Course
------------------
1 A c1
1 A c2
2 E c3
3 M c2
3 M c3

2. Second Normal Form –

To be in second normal form, a relation must be in first normal form and must not contain any partial
dependency. A relation is in 2NF if it has no partial dependency, i.e., no non-prime attribute (an attribute
which is not part of any candidate key) is dependent on any proper subset of any candidate key of the
table. Partial Dependency – If a proper subset of a candidate key determines a non-prime attribute, it is
called a partial dependency.
 Example 1 – Consider table-3 as following below.
STUD_NO COURSE_NO COURSE_FEE
1 C1 1000
2 C2 1500
1 C4 2000
4 C3 1000
4 C1 1000
2 C5 2000
 {Note that there are many courses having the same course fee.} Here, COURSE_FEE alone cannot
decide the value of COURSE_NO or STUD_NO; COURSE_FEE together with STUD_NO cannot decide
the value of COURSE_NO; and COURSE_FEE together with COURSE_NO cannot decide the value of
STUD_NO. Hence, COURSE_FEE is a non-prime attribute, as it does not belong to the only
candidate key {STUD_NO, COURSE_NO}. But COURSE_NO -> COURSE_FEE, i.e., COURSE_FEE is
dependent on COURSE_NO, which is a proper subset of the candidate key. Since the non-prime attribute
COURSE_FEE is dependent on a proper subset of the candidate key, this is a partial dependency and
so the relation is not in 2NF. To convert the above relation to 2NF, we need to split the table into two
tables: Table 1 (STUD_NO, COURSE_NO) and Table 2 (COURSE_NO, COURSE_FEE).
Table 1                    Table 2
STUD_NO COURSE_NO          COURSE_NO COURSE_FEE
1       C1                 C1        1000
2       C2                 C2        1500
1       C4                 C3        1000
4       C3                 C4        2000
4       C1                 C5        2000
2       C5

 NOTE: 2NF tries to reduce the redundant data getting stored in memory. For instance, if there are
100 students taking the C1 course, we don’t need to store its fee as 1000 for all 100 records; instead,
we can store it once in the second table as the course fee for C1 being 1000. (A SQL sketch of this
decomposition follows Example 2.)
 Example 2 – Consider the following functional dependencies in relation R(A, B, C, D):
AB -> C [A and B together determine C]
BC -> D [B and C together determine D]
 In the above relation, AB is the only candidate key and there is no partial dependency, i.e., no proper
subset of AB determines any non-prime attribute. Hence the relation is in 2NF.
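
The following is a sketch of the Example 1 decomposition expressed as SQL tables; the table names
Student_Course and Course_Fee are illustrative assumptions:

CREATE TABLE Course_Fee (
  COURSE_NO VARCHAR(10) PRIMARY KEY,
  COURSE_FEE INT
);

CREATE TABLE Student_Course (
  STUD_NO INT,
  COURSE_NO VARCHAR(10),
  PRIMARY KEY (STUD_NO, COURSE_NO),
  FOREIGN KEY (COURSE_NO) REFERENCES Course_Fee (COURSE_NO)
);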
3. Third Normal Form –

A relation is in third normal form if it is in 2NF and, for every non-trivial functional dependency X -> Y, at
least one of the following holds:
1. X is a super key.
2. Y is a prime attribute (each element of Y is part of some candidate key).
 Example 1 – In relation STUDENT given in Table 4, FD set: {STUD_NO -> STUD_NAME, STUD_NO ->
STUD_STATE, STUD_STATE -> STUD_COUNTRY, STUD_NO -> STUD_AGE}; Candidate Key:
{STUD_NO}. For this relation, STUD_NO -> STUD_STATE and STUD_STATE -> STUD_COUNTRY
hold, so STUD_COUNTRY is transitively dependent on STUD_NO. This violates the
third normal form. To convert it to third normal form, we decompose the relation STUDENT
(STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_COUNTRY, STUD_AGE) as:
STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_AGE) and STATE_COUNTRY
(STATE, COUNTRY).
 Example 2 – Consider relation R(A, B, C, D, E) with FDs A -> BC, CD -> E, B -> D, E -> A. All possible
candidate keys of this relation are {A, E, CD, BC}. All the attributes on the right sides of the functional
dependencies are prime, so the relation is in 3NF.
4. Boyce-Codd Normal Form (BCNF) –

A relation is in BCNF if, for every non-trivial functional dependency X -> Y, X is a super key.
 Example 1 – Find the highest normal form of a relation R(A, B, C, D, E) with FD set {BC -> D,
AC -> BE, B -> E}.
Step 1. As we can see, (AC)+ = {A, C, B, E, D}, but no proper subset of AC can determine all
attributes of the relation, so AC is a candidate key. A and C cannot be derived from any other
attribute of the relation, so there is only one candidate key: {AC}.
Step 2. Prime attributes are those that are part of a candidate key: {A, C} in this example. The
others are non-prime: {B, D, E} in this example.
Step 3. The relation R is in 1st normal form, as a relational DBMS does not allow multi-valued or
composite attributes. The relation is in 2nd normal form because BC -> D is in 2nd normal form
(BC is not a proper subset of candidate key AC), AC -> BE is in 2nd normal form (AC is a
candidate key), and B -> E is in 2nd normal form (B is not a proper subset of candidate key AC).
The relation is not in 3rd normal form because in BC -> D neither BC is a super key nor D a prime
attribute, and in B -> E neither B is a super key nor E a prime attribute; to satisfy 3rd normal form,
either the LHS of an FD should be a super key or the RHS should be a prime attribute. So the
highest normal form of the relation is 2nd normal form.
 Example 2 – Consider relation R(A, B, C) with FDs A -> BC and B -> A. A and B are both super
keys, so the above relation is in BCNF.

Applications of normal forms in DBMS:

Data consistency: Normal forms ensure that data is consistent and does not contain any redundant
information. This helps to prevent inconsistencies and errors in the database.
Data redundancy: Normal forms minimize data redundancy by organizing data into tables that contain only
unique data. This reduces the amount of storage space required for the database and makes it easier to
manage.
Query performance: Normal forms can improve query performance by reducing the number of joins
required to retrieve data. This helps to speed up query processing and improve overall system performance.
Database maintenance: Normal forms make it easier to maintain the database by reducing the amount of
redundant data that needs to be updated, deleted, or modified. This helps to improve database management
and reduce the risk of errors or inconsistencies.
Database design: Normal forms provide guidelines for designing databases that are efficient, flexible, and
scalable. This helps to ensure that the database can be easily modified, updated, or expanded as needed.
Some important points about normal forms:
1. BCNF is free from redundancy.
2. If a relation is in BCNF, then 3NF is also satisfied.
3. If all attributes of relation are prime attribute, then the relation is always in 3NF.
4. A relation in a Relational Database is always and at least in 1NF form.
5. Every Binary Relation ( a Relation with only 2 attributes ) is always in BCNF.
6. If a Relation has only singleton candidate keys( i.e. every candidate key consists of only 1 attribute),
then the Relation is always in 2NF( because no Partial functional dependency possible).
7. Decomposing to BCNF may not preserve functional dependencies. In that case, go for BCNF only
if the lost FD(s) are not required; otherwise, normalize up to 3NF only.
8. There are many more Normal forms that exist after BCNF, like 4NF and more. But in real world database
systems it’s generally not required to go beyond BCNF.
